[Feature][SQL Config] Add SQL config adapter #6757

Merged
merged 32 commits into from
May 11, 2024
Commits
74a2234
[Bug] Fix negative constant error in SQLTransform
Mar 15, 2024
28f4dac
add e2e test case
Mar 15, 2024
266234b
Merge branch 'apache:dev' into dev
rewerma Mar 19, 2024
3056cef
Merge branch 'apache:dev' into dev
rewerma Apr 24, 2024
bb677a8
[Feature][SQL Config] Add SQL config adapter
Apr 25, 2024
fdd43ee
add license header
Apr 25, 2024
1157878
add another syntax of sink sql config for jdbc test
Apr 25, 2024
9235c8e
fix e2e test config util and remove ST4
Apr 26, 2024
34dcd33
fix e2e test spark2 config builder
Apr 26, 2024
5e18f21
add insert select source_table syntax for SQL config
Apr 26, 2024
b538fd6
optimize code
Apr 26, 2024
f5d1695
optimize code
Apr 26, 2024
3d3fde1
fix dead line
Apr 27, 2024
38617cd
fix dead link
Apr 27, 2024
a13ef7b
fix sql transform type error
Apr 27, 2024
77235d6
optimize code
Apr 27, 2024
0ad2d90
fix e2e test
Apr 27, 2024
c8e7c56
Merge branch 'apache:dev' into dev
rewerma Apr 28, 2024
5551efc
Merge branch 'dev' into freature/sql-config
Apr 28, 2024
2db4d14
Merge branch 'apache:dev' into dev
rewerma May 6, 2024
efe60e3
Merge branch 'dev' into freature/sql-config
May 6, 2024
d5effec
fix conflict code
May 6, 2024
bd9e0f2
Merge branch 'apache:dev' into freature/sql-config
rewerma May 6, 2024
74351e9
Merge branch 'apache:dev' into dev
rewerma May 6, 2024
ba70acf
Merge branch 'dev' into freature/sql-config
May 6, 2024
9c9aa2a
Merge remote-tracking branch 'origin/freature/sql-config' into freatu…
May 6, 2024
98c2d80
Merge branch 'apache:dev' into dev
rewerma May 7, 2024
f72ee56
Merge branch 'dev' into freature/sql-config
May 7, 2024
05d679b
Merge branch 'apache:dev' into dev
rewerma May 7, 2024
5136ae6
Merge branch 'dev' into freature/sql-config
May 7, 2024
0f66475
fix spelling mistake of sql-config.md
May 8, 2024
d6b3a64
add some test case
May 8, 2024
2 changes: 2 additions & 0 deletions docs/en/concept/config.md
@@ -12,6 +12,8 @@ configure the Config file.
The main format of the Config file is `hocon`; for more details on this format, refer to the [HOCON-GUIDE](https://github.com/lightbend/config/blob/main/HOCON.md).
The `json` format is also supported, but the name of the config file must end with `.json`.

The `SQL` format is also supported; for details, refer to the [SQL configuration](sql-config.md) file.

## Example

Before you read on, you can find config file
189 changes: 189 additions & 0 deletions docs/en/concept/sql-config.md
@@ -0,0 +1,189 @@
# SQL Configuration File

## Structure of SQL Configuration File

The `SQL` configuration file appears as follows.

### SQL

```sql
/* config
env {
parallelism = 1
job.mode = "BATCH"
}
*/

CREATE TABLE source_table WITH (
'connector'='jdbc',
'type'='source',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'query' = 'select * from source',
'properties'= '{
useSSL = false,
rewriteBatchedStatements = true
}'
);

CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type'='sink',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'seatunnel',
'table' = 'sink'
);

INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

## Explanation of `SQL` Configuration File

### General Configuration in SQL File

```sql
/* config
env {
parallelism = 1
job.mode = "BATCH"
}
*/
```

In the `SQL` file, common configuration sections are defined using `/* config */` comments. Inside, common configurations like `env` can be defined using `HOCON` format.
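
As a further illustration, the same comment block could carry other `env` settings. The sketch below assumes a streaming job; `checkpoint.interval` is an `env` option that does not appear in the example above, and all values are placeholders rather than recommendations.

```sql
/* config
env {
  parallelism = 2
  job.mode = "STREAMING"
  # assumed option, not part of the original example
  checkpoint.interval = 10000
}
*/
```

Because the block is an ordinary SQL comment, the remaining statements in the file are unaffected by it.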

### SOURCE SQL Syntax

```sql
CREATE TABLE source_table WITH (
'connector'='jdbc',
'type'='source',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'query' = 'select * from source',
'properties' = '{
useSSL = false,
rewriteBatchedStatements = true
}'
);
```

* Using `CREATE TABLE ... WITH (...)` syntax creates a mapping for the source table. The `TABLE` name is the name of the source-mapped table, and the `WITH` clause contains the source-related configuration parameters.
* The `WITH` clause has two fixed parameters, `connector` and `type`, which give the connector plugin name (such as `jdbc` or `FakeSource`) and the source type (fixed as `source`), respectively.
* The remaining parameters are the configuration parameters of the corresponding connector plugin, written in the form `'key' = 'value',`.
* If `'value'` is a sub-configuration, you can use a string in `HOCON` format directly. Note: within a `HOCON` sub-configuration, the property items must be separated by `,`, like this:

```sql
'properties' = '{
useSSL = false,
rewriteBatchedStatements = true
}'
```

* If `'value'` itself contains a `'`, it must be escaped as `''`, like this:

```sql
'query' = 'select * from source where name = ''Joy Ding'''
```
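
The rules above apply to connectors other than `jdbc` as well. The following is a minimal sketch of a `FakeSource` mapping. It assumes that the connector's usual `row.num` and `schema` options carry over unchanged into the `'key' = 'value'` form; the table name `fake_source` and the field list are made up for illustration. Whether the parser accepts `--` comments is not confirmed here, so drop them if in doubt.

```sql
-- Hypothetical mapping: option names assumed from the FakeSource connector.
CREATE TABLE fake_source WITH (
  'connector' = 'FakeSource',
  'type' = 'source',
  'row.num' = '100',
  'schema' = '{
    fields {
      id = "int",
      name = "string",
      age = "int",
      email = "string"
    }
  }'
);
```

Note that the entries inside `fields` are separated by `,`, as required for `HOCON` sub-configurations.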

### SINK SQL Syntax

```sql
CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type'='sink',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'seatunnel',
'table' = 'sink'
);
```

* Using `CREATE TABLE ... WITH (...)` syntax creates a mapping for the target table. The `TABLE` name is the name of the target-mapped table, and the `WITH` clause contains the sink-related configuration parameters.
* The `WITH` clause has two fixed parameters, `connector` and `type`, which give the connector plugin name (such as `jdbc` or `console`) and the target type (fixed as `sink`), respectively.
* The remaining parameters are the configuration parameters of the corresponding connector plugin, written in the form `'key' = 'value',`.
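
For quick experiments, a sink that only prints rows can be mapped in the same way. This sketch assumes that the `console` connector needs no options beyond the two fixed parameters; the table name `console_sink` is hypothetical, and the `--` comment is for the reader only.

```sql
-- Assumption: the console sink requires no connector-specific options.
CREATE TABLE console_sink WITH (
  'connector' = 'console',
  'type' = 'sink'
);

INSERT INTO console_sink SELECT id, name, age, email FROM source_table;
```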

### INSERT INTO SELECT Syntax

```sql
INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

* The `SELECT ... FROM` part names the source-mapped table. For the syntax of the `SELECT` part, see the `query` configuration item of the [SQL-transform](../transform-v2/sql.md).
* The `INSERT INTO` part names the target-mapped table.
* Note: This syntax does **not support** specifying fields in `INSERT`, like this: `INSERT INTO sink_table (id, name, age, email) SELECT id, name, age, email FROM source_table;`
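
Since the column list cannot appear after `INSERT INTO`, any column selection or reshaping goes in the `SELECT` part. The sketch below assumes the `SELECT` part accepts the same expressions as the SQL transform's `query` option; the `UPPER` call and the `WHERE` filter are illustrative, not taken from this page.

```sql
-- Illustrative only: UPPER(...) and the WHERE filter are assumptions about
-- what the SQL transform accepts; check the SQL-transform page before use.
INSERT INTO sink_table
SELECT id, UPPER(name) AS name, age, email
FROM source_table
WHERE age >= 18;
```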

### INSERT INTO SELECT TABLE Syntax

```sql
INSERT INTO sink_table SELECT source_table;
```

* The `SELECT` part directly uses the name of the source-mapped table, indicating that all data from the source table will be inserted into the target table.
* Using this syntax does not generate related `transform` configurations. This syntax is generally used in multi-table synchronization scenarios. For example:

```sql
CREATE TABLE source_table WITH (
'connector'='jdbc',
'type' = 'source',
'url' = 'jdbc:mysql://127.0.0.1:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'table_list' = '[
{
table_path = "source.table1"
},
{
table_path = "source.table2",
query = "select * from source.table2"
}
]'
);

CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type' = 'sink',
'url' = 'jdbc:mysql://127.0.0.1:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'sink'
);

INSERT INTO sink_table SELECT source_table;
```

### CREATE TABLE AS Syntax

```sql
CREATE TABLE temp1 AS SELECT id, name, age, email FROM source_table;
```

* This syntax creates a temporary table from the result of a `SELECT` query, which can then be used in `INSERT INTO` operations.
* For the syntax of the `SELECT` part, see the `query` configuration item of the [SQL-transform](../transform-v2/sql.md).

```sql
CREATE TABLE temp1 AS SELECT id, name, age, email FROM source_table;

INSERT INTO sink_table SELECT * FROM temp1;
```

## Example of SQL Configuration File Submission

```bash
./bin/seatunnel.sh --config ./config/sample.sql
```

4 changes: 4 additions & 0 deletions docs/zh/concept/config.md
@@ -18,6 +18,10 @@ BTW, we also support the `json` format, but you should know that the name of the
The main format of the config file is `hocon`; for more information about this format, refer to the [HOCON-GUIDE](https://github.com/lightbend/config/blob/main/HOCON.md).
The `json` format is also supported, but the name of the config file must end with `.json`.

The `SQL` format is also supported; for details, refer to the [SQL configuration file](sql-config.md).

## Example

Before you read on, you can find config file examples [here](https://github.com/apache/seatunnel/tree/dev/config), in the `config` directory of the release package.
189 changes: 189 additions & 0 deletions docs/zh/concept/sql-config.md
@@ -0,0 +1,189 @@
# SQL Configuration File

## Structure of SQL Configuration File

The `SQL` configuration file looks like the following.

### SQL

```sql
/* config
env {
parallelism = 1
job.mode = "BATCH"
}
*/

CREATE TABLE source_table WITH (
'connector'='jdbc',
'type'='source',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'query' = 'select * from source',
'properties'= '{
useSSL = false,
rewriteBatchedStatements = true
}'
);

CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type'='sink',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'seatunnel',
'table' = 'sink'
);

INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

## Explanation of `SQL` Configuration File

### General Configuration

```sql
/* config
env {
parallelism = 1
job.mode = "BATCH"
}
*/
```

In the `SQL` file, the general configuration section is defined in a `/* config */` comment; inside it, general settings such as `env` can be defined in `hocon` format.

### SOURCE SQL Syntax

```sql
CREATE TABLE source_table WITH (
'connector'='jdbc',
'type'='source',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'query' = 'select * from source',
'properties' = '{
useSSL = false,
rewriteBatchedStatements = true
}'
);
```

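* Using the `CREATE TABLE ... WITH (...)` syntax creates a source table mapping. The `TABLE` name is the name of the source-mapped table, and the `WITH` clause holds the source-related configuration parameters.
* The `WITH` clause has two fixed parameters, `connector` and `type`, which give the connector plugin name (such as `jdbc` or `FakeSource`) and the source type (fixed as `source`), respectively.
* For the other parameter names, refer to the configuration parameters of the corresponding connector plugin, but write them in the form `'key' = 'value',`.
* If `'value'` is a sub-configuration, a string in `hocon` format can be used directly. Note: in a `hocon` sub-configuration, the property items must be separated by `,`. For example: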

```sql
'properties' = '{
useSSL = false,
rewriteBatchedStatements = true
}'
```

* If `'` is used inside `'value'`, it must be escaped as `''`. For example:

```sql
'query' = 'select * from source where name = ''Joy Ding'''
```

### SINK SQL Syntax

```sql
CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type'='sink',
'url' = 'jdbc:mysql://localhost:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'seatunnel',
'table' = 'sink'
);
```

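* Using the `CREATE TABLE ... WITH (...)` syntax creates a target table mapping. The `TABLE` name is the name of the target-mapped table, and the `WITH` clause holds the sink-related configuration parameters.
* The `WITH` clause has two fixed parameters, `connector` and `type`, which give the connector plugin name (such as `jdbc` or `console`) and the target type (fixed as `sink`), respectively.
* For the other parameter names, refer to the configuration parameters of the corresponding connector plugin, but write them in the form `'key' = 'value',`.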

### INSERT INTO SELECT Syntax

```sql
INSERT INTO sink_table SELECT id, name, age, email FROM source_table;
```

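* The `SELECT FROM` part is the table name of the source-mapped table. For the syntax of the `SELECT` part, see the `query` configuration item of the [SQL-transform](../transform-v2/sql.md).
* The `INSERT INTO` part is the table name of the target-mapped table.
* Note: This syntax does **not support** specifying fields in `INSERT`, like this: `INSERT INTO sink_table (id, name, age, email) SELECT id, name, age, email FROM source_table;`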

### INSERT INTO SELECT TABLE Syntax

```sql
INSERT INTO sink_table SELECT source_table;
```

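* The `SELECT` part uses the name of the source-mapped table directly, meaning that all data from the source table is inserted into the target table.
* This syntax does not generate any `transform` configuration. It is generally used in multi-table synchronization scenarios. For example: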

```sql
CREATE TABLE source_table WITH (
'connector'='jdbc',
'type' = 'source',
'url' = 'jdbc:mysql://127.0.0.1:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'table_list' = '[
{
table_path = "source.table1"
},
{
table_path = "source.table2",
query = "select * from source.table2"
}
]'
);

CREATE TABLE sink_table WITH (
'connector'='jdbc',
'type' = 'sink',
'url' = 'jdbc:mysql://127.0.0.1:3306/seatunnel',
'driver' = 'com.mysql.cj.jdbc.Driver',
'user' = 'root',
'password' = '123456',
'generate_sink_sql' = 'true',
'database' = 'sink'
);

INSERT INTO sink_table SELECT source_table;
```

### CREATE TABLE AS Syntax

```sql
CREATE TABLE temp1 AS SELECT id, name, age, email FROM source_table;
```

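* This syntax turns the result of a `SELECT` query into a temporary table for use in `INSERT INTO` operations.
* For the syntax of the `SELECT` part, see the `query` configuration item of the [SQL-transform](../transform-v2/sql.md).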

```sql
CREATE TABLE temp1 AS SELECT id, name, age, email FROM source_table;

INSERT INTO sink_table SELECT * FROM temp1;
```

## Example of SQL Configuration File Submission

```bash
./bin/seatunnel.sh --config ./config/sample.sql
```