Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-34993] parser changes to support model CRUD sql #24769

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lihaosky
Copy link

What is the purpose of the change

Support model CRUD sql syntax

Brief change log

Add following syntax for model

  • create model
  • drop model
  • show model
  • show create model
  • alter model

Verifying this change

Unit test for parser

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (n)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs)

@lihaosky
Copy link
Author

cc @twalthr , @wuchong

@flinkbot
Copy link
Collaborator

flinkbot commented May 10, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @lihaosky. I added some comments. Esp. the support for temporary and error behavior for missing implementation should be solved before merging this.

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>

<configuration>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be required. See comment below.

@@ -2412,6 +2440,8 @@ SqlDrop SqlDropExtended(Span s, boolean replace) :
drop = SqlDropDatabase(s, replace)
|
drop = SqlDropFunction(s, replace, isTemporary)
|
drop = SqlDropModel(s) /** temporary model is not supported **/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we add this as part of this PR? Should be straight forward to add.

}

/**
* CREATE MODEL [IF NOT EXIST] modelName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add temporary as part of this PR, should be easily copyable from CREATE TABLE?

}
|
<SET>
modelOptionList = TableProperties()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename TableProperties() to ObjectOptions() or just Options() for consistency. I guess we are also using TableProperties() for the catalog WITH clause? we should leave the code base in a better shape with every PR.


import static java.util.Objects.requireNonNull;

/** {@link SqlNode} to describe the CREATE MODEL syntax. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also give a full syntax example in the JavaDoc here similar to SqlAlterModel

* <pre>{@code
* CREATE MODEL my_model WITH (
* ‘task’ = ‘regression’,
* ‘algorithm’: ‘boosted_tree’,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colon is also not supported. A similar to SqlAlterModel would be more appropriate here.


/**
* DESCRIBE MODEL [EXTENDED] [[catalogName.] dataBasesName].sqlIdentifier sql call. Here we add Rich
* in className to follow the convention of SqlDescribeTable, which only had it to distinguish from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use {@link when referring to a class like SqlDescribeTable this help the reader to quickly navigate the code base

+ " 'key2'='value2'\n"
+ " ) as select f1, f2 from t1\n")
.ok(
"CREATE MODEL IF NOT EXISTS `M1` WITH (\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test the error behavior when a user adds explicit columns to input and output using CREATE MODEL AS

"-//Puppy Crawl//DTD Suppressions 1.1//EN"
"http://www.puppycrawl.com/dtds/suppressions_1_1.dtd">

<suppressions>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a dedicated file? Shouldn't adding this to the regular Flink suppressions be enough?

@@ -147,6 +148,7 @@ class FlinkPlannerImpl(
|| sqlNode.isInstanceOf[SqlShowProcedures]
|| sqlNode.isInstanceOf[SqlShowJobs]
|| sqlNode.isInstanceOf[SqlDescribeJob]
|| sqlNode.isInstanceOf[SqlRichDescribeModel]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the current error behavior when people start using the syntax without an implementation for it? Does Flink generate a helpful message? Like "Unsupported operatin: CREATE MODEL AS"? It's unlikely that we will provide implementation for all keywords until feature freeze in 3 weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants