InfluxDB source add read by chunk fix doc
zhoulonghua committed May 7, 2024
1 parent 0ef871f commit bcf7659
Showing 3 changed files with 9 additions and 9 deletions.
9 changes: 3 additions & 6 deletions docs/en/connector-v2/source/InfluxDB.md
@@ -209,12 +209,9 @@ source {
```

> Tips:
> - Chunked queries address the case where no suitable split column can be found for partitioned querying, yet the data volume is large.
Therefore, if a split_column is configured or chunk_size = 0, chunked queries will not be performed.
> - When using chunked queries, the parallelism of the source can only be set to 1, yet reads remain fast, which puts pressure on the downstream.
It is recommended to increase the downstream parallelism or output rate to reduce backpressure and improve performance.
> - Chunked queries put pressure on the InfluxDB database itself, in proportion to the data volume.
In tests, when SeaTunnel synchronized more than 20GB of data, the memory usage of InfluxDB increased by over 10GB.
> - Chunked queries address the case where no suitable split column can be found for partitioned querying, yet the data volume is large. Therefore, if a split_column is configured or chunk_size = 0, chunked queries will not be performed.
> - When using chunked queries, the parallelism of the source can only be set to 1, yet reads remain fast, which puts pressure on the downstream. It is recommended to increase the downstream parallelism or output rate to reduce backpressure and improve performance.
> - Chunked queries put pressure on the InfluxDB database itself, in proportion to the data volume. In tests, when SeaTunnel synchronized more than 20GB of data, the memory usage of InfluxDB increased by over 10GB.
## Changelog

8 changes: 5 additions & 3 deletions docs/zh/connector-v2/source/InfluxDB.md
@@ -20,7 +20,7 @@

## Options

| name | type | required | default value |
| name | type | required | default value |
|--------------------|--------|----------|---------------|
| url | string | yes | - |
| sql | string | yes | - |
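@@ -90,7 +90,6 @@ InfluxDB split column
> - InfluxDB's time field cannot be used as the split key, because time fields cannot take part in arithmetic.
> - Currently, split_column supports only Integer types; float, string, date, and other types are not supported.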

### upper_bound [long]

Upper bound of the split column's data
@@ -136,7 +135,7 @@ Query timeout for InfluxDB, in seconds

### common options

Common plugin parameters; see [Common Options](common-options.md)
Common plugin parameters; see [Common Options](common-options.md)

## Examples

@@ -167,6 +166,7 @@ source {
```

Example without partitioned (split) queries

```hocon
source {
@@ -187,6 +187,7 @@ source {
```

Example using chunked queries

```hocon
source {
InfluxDB {
@@ -204,6 +205,7 @@ source {
}
}
```

> Tips:
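> - Chunked queries address the case where no suitable split column can be found for partitioned queries, yet the data volume is large. Therefore, if split_column is configured or chunk_size = 0, chunked queries are not performed.
> - When using chunked queries, the source parallelism can only be 1, yet reads are still fast and will put pressure on the downstream. It is recommended to increase the downstream parallelism or output rate to reduce backpressure and improve performance.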
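
The tips above can be sketched as a job config. This is only an illustrative sketch and is not part of the commit: the host, query, database name, schema fields, the `chunk_size` value, the table name `influx_rows`, and the `Console` sink with `parallelism = 4` are all assumptions chosen for the example.

```hocon
# Illustrative sketch only; every value below is a placeholder.
source {
  InfluxDB {
    url = "http://influxdb-host:8086"                 # placeholder host
    sql = "select label, value, rt, time from test"   # placeholder query
    database = "test"                                 # placeholder database
    # No split_column is configured and chunk_size is non-zero,
    # so the chunked-query path described in the tips applies.
    chunk_size = 100000
    # Chunked queries run with a source parallelism of 1.
    parallelism = 1
    result_table_name = "influx_rows"                 # hypothetical table name
    schema {
      fields {
        label = STRING
        value = INT
        rt = STRING
        time = BIGINT
      }
    }
  }
}

sink {
  Console {
    source_table_name = "influx_rows"
    # Raise downstream parallelism to relieve backpressure.
    parallelism = 4
  }
}
```

Setting `chunk_size = 0` or adding a `split_column` to this sketch would, per the tips, turn chunked queries off.
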
1 change: 1 addition & 0 deletions docs/zh/connector-v2/source/common-options.md
@@ -12,6 +12,7 @@
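When `result_table_name` is not specified, the data processed by this plugin will not be registered as a dataset `(dataStream/dataset)` that other plugins can access directly, nor will it be called a temporary table `(table)`

When `result_table_name` is specified, the data processed by this plugin will be registered as a dataset `(dataStream/dataset)` that other plugins can access directly, also called a temporary table `(table)`. The dataset `(dataStream/dataset)` registered here can be accessed directly by other plugins by specifying `source_table_name`.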

### parallelism [int]

If `parallelism` is not specified, the `parallelism` from env is used by default
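
A minimal sketch of that fallback, assuming a job-wide `parallelism` in the `env` block and a per-plugin value on one source; the numbers are illustrative and the connector's other options are elided.

```hocon
# Illustrative sketch only; values are placeholders.
env {
  parallelism = 2      # job-wide default used by plugins that set no parallelism
}

source {
  InfluxDB {
    # ... connector options elided ...
    parallelism = 1    # applies to this source only, instead of the env value
  }
}
```
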