Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMORO-2811][Bug][FLINK]: flink unified catalog doesn't support mixed_hive #2814

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Aireed
Copy link
Contributor

@Aireed Aireed commented May 7, 2024

Why are the changes needed?

Close #2811 .

Brief change log

create table if not exists xxxx (
....
);

A unifiedCatalog which metastore is hms.

  • when create a mixed-hive table:

    • If there is a table in mixed-hive format with the same name, the program can execute correctly with no error thrown and the table is not overwrited;
    • If there is a table in hive format with the same name, the program will throw an error,
    • If there is no table with the same name, a mixed-hive table will be created.
  • when create a iceberg table

    • if there is a table in iceberg format with the same name, the program can execute correctly with no error ,and the table is not overwrited;
    • If there is no table with the same name, a iceberg table will be created.
  • when create a mixed-iceberg table

    • If there is a table in iceberg format with the same name, the program can execute correctly with no error thrown and the table is not overwrited;
    • If there is a table in hive format with the same name, the program should thrown an error to indicate the table does exist;
    • If there is no table with the same name, an iceberg table will be created.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@github-actions github-actions bot added module:mixed-flink Flink moduel for Mixed Format module:mixed-hive Hive moduel for Mixed Format labels May 7, 2024
@Aireed
Copy link
Contributor Author

Aireed commented May 7, 2024

cc @YesOrNo828 @xieyi888 PTAL.

Comment on lines +229 to +244
if (!configuration.contains(TABLE_FORMAT)) {
// if user doesn't specify the table format, we try to load the table and get the table
// format.
try {
AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
LOG.warn(
"We can't load table {}.{} through unifiedCatalog, use default format {}",
tablePath.getDatabaseName(),
tablePath.getObjectName(),
format.name());
}
}
final TableFormat catalogFormat = format;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used to create a table. How to load the table before creating it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used to create a table. How to load the table before creating it?

  public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ignoreIfExists)

In the Flink SQL scenario, when users are writing SQL, this situation arises when they use "create table if not exists" to define a table that already exists. The parameter "ignoreIfExists" is used to ignore the existence of the table.

AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
Copy link
Contributor

@xieyi888 xieyi888 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

catch (org.apache.amoro.NoSuchTableException e) 

would be better

Comment on lines +229 to +247
if (!configuration.contains(TABLE_FORMAT)) {
// if user doesn't specify the table format, we try to load the table and get the table
// format.
try {
AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
LOG.warn(
"We can't load table {}.{} through unifiedCatalog, use default format {}",
tablePath.getDatabaseName(),
tablePath.getObjectName(),
format.name());
}
}
final TableFormat catalogFormat = format;
AbstractCatalog catalog =
getOriginalCatalog(format).orElseGet(() -> createOriginalCatalog(tableIdentifier, format));
getOriginalCatalog(format)
.orElseGet(() -> createOriginalCatalog(tableIdentifier, catalogFormat));
Copy link
Contributor

@xieyi888 xieyi888 May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If table exists, we load table and get tableFormat for it, Otherwise, we use default tableFormat to create table, which is MIXED_ICEBERG.

Did we should force user to add 'table.format' option in create table statement here?

Copy link
Contributor Author

@Aireed Aireed May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we should force user to add 'table.format' option in create table statement?

Good idea. If the table cannot be loaded via unifiedCatalog and no format is specified, we should report an error to prompt the user to specify the format. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could support a property of catalog which define a defult table format when creating a table without specific the format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could support a property of catalog which define a defult table format when creating a table without specific the format.

IMO, Configuring the format on both the table and the catalog at the same time can easily mislead users. The "Unified Catalog" is designed to allow users not to worry about the underlying format (except when it is created).

Let's listen to some other opinions. WDYT @YesOrNo828 @xieyi888

@@ -221,8 +226,25 @@ public void createTable(ObjectPath tablePath, CatalogBaseTable table, boolean ig
TableIdentifier tableIdentifier =
TableIdentifier.of(
unifiedCatalog.name(), tablePath.getDatabaseName(), tablePath.getObjectName());
if (!configuration.contains(TABLE_FORMAT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!configuration.contains(TABLE_FORMAT)) {
if (!configuration.contains(TABLE_FORMAT) && ignoreIfExists && unfiiedCatalog.tableExists()) {

Comment on lines +229 to +247
if (!configuration.contains(TABLE_FORMAT)) {
// if user doesn't specify the table format, we try to load the table and get the table
// format.
try {
AmoroTable amoroTable =
unifiedCatalog.loadTable(tableIdentifier.getDatabase(), tableIdentifier.getTableName());
format = amoroTable.format();
} catch (Throwable t) {
LOG.warn(
"We can't load table {}.{} through unifiedCatalog, use default format {}",
tablePath.getDatabaseName(),
tablePath.getObjectName(),
format.name());
}
}
final TableFormat catalogFormat = format;
AbstractCatalog catalog =
getOriginalCatalog(format).orElseGet(() -> createOriginalCatalog(tableIdentifier, format));
getOriginalCatalog(format)
.orElseGet(() -> createOriginalCatalog(tableIdentifier, catalogFormat));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could support a property of catalog which define a defult table format when creating a table without specific the format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:mixed-flink Flink moduel for Mixed Format module:mixed-hive Hive moduel for Mixed Format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug][FLINK]: flink unified catalog doesn't support mixed_hive
3 participants