[FLINK-32082][docs] Documentation of checkpoint file-merging #24766
Thanks for the PR.
PTAL my comments.
docs/content/docs/dev/datastream/fault-tolerance/checkpointing.md
Outdated
to be written into a single file, reducing the number of file creations and file deletions, helping to alleviate the pressure
of file system metadata management and file flooding problem. The unified file merging mechanism can be enabled by setting
the property `state.checkpoints.file-merging.enabled` to `true`. **Note** that enabling this mechanism may lead to space amplification,
that is, the actual occupation on the file system will be larger than the checkpoint size. `state.checkpoints.file-merging.max-space-amplification`
IIUC, the metric of checkpoint size should be consistent, right?
If it is being compared with the previous checkpoint size or with the actual state size, let's say so explicitly and adjust it to `before checkpoint size` or `actual state size`.
Right, thanks for the suggestion. I adjusted it to `actual state size`.
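To make the wording in this thread concrete, space amplification can be read as the ratio between the bytes occupied on the file system and the actual state size. The snippet below is a minimal sketch under that assumed definition; the helper names are hypothetical, and it does not describe how Flink itself enforces `state.checkpoints.file-merging.max-space-amplification`.

```python
def space_amplification(occupied_bytes: int, state_bytes: int) -> float:
    """Ratio of on-disk occupation to actual state size (assumed definition)."""
    if state_bytes <= 0:
        raise ValueError("state_bytes must be positive")
    return occupied_bytes / state_bytes


def exceeds_limit(occupied_bytes: int, state_bytes: int, max_amplification: float) -> bool:
    """True when the occupation is above the configured upper bound (hypothetical check)."""
    return space_amplification(occupied_bytes, state_bytes) > max_amplification
```

For example, 300 bytes on disk holding 100 bytes of live state gives an amplification of 3.0, which is above an upper bound of 2.0.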
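## Unified checkpoint file-merging mechanism

Flink 1.20 introduces a unified checkpoint file-merging mechanism that allows scattered checkpoint files to be written into the same file, reducing the number of checkpoint file creations and deletions,
which helps relieve the pressure of file system metadata management and solve the file flooding problem (文件洪泛问题). The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.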
`文件洪泛问题` ("file flooding problem") does not seem to be a common expression in Chinese.
How about describing it more directly?
I changed it to `文件过多问题` ("too many files problem").
Thanks for the PR, I left some comments.
@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after all operators have rea
without waiting for periodic triggering, but the job will need to wait for this final checkpoint
to be completed.

## Unify file merging mechanism for checkpoints
How about adding `(Experimental)` in the title?
The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
Suggested change:
- and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
+ and file deletions, which alleviates the pressure of file system metadata management raised by the file flooding problem during checkpoints.
will be larger than actual state size. `state.checkpoints.file-merging.max-space-amplification`
can be used to limit the upper bound of space amplification.

This mechanism is applicable to keyed state, operator state and channel state in Flink. Subtask level granular merging is
Suggested change:
- This mechanism is applicable to keyed state, operator state and channel state in Flink. Subtask level granular merging is
+ This mechanism is applicable to keyed state, operator state and channel state in Flink. Merging at subtask level is
The unified file merging mechanism also supports file merging across checkpoints, which can be enabled by setting
`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
Suggested change:
- The unified file merging mechanism also supports file merging across checkpoints, which can be enabled by setting
- `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
+ This feature also supports merging files across checkpoints. To enable this, set
+ `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
The unified file merging mechanism also supports file merging across checkpoints, which can be enabled by setting
`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.

This mechanism introduces a file pool to handle concurrent writing scenarios. The blocking mode can be
Suggested change:
- This mechanism introduces a file pool to handle concurrent writing scenarios. The blocking mode can be
+ This mechanism introduces a file pool to handle concurrent writing scenarios. There are two modes....... The blocking mode...... while the non-blocking modes...... . This can be configured via ``.

Could you add some description of the modes, instead of only talking about enabling the option?
## Unify file merging mechanism for checkpoints

The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
Suggested change:
- which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
+ which allows scattered small checkpoint files to be written into larger files, reducing the number of file creations
The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
The unified file merging mechanism can be enabled by setting the property `state.checkpoints.file-merging.enabled` to `true`.
Suggested change:
- The unified file merging mechanism can be enabled by setting the property `state.checkpoints.file-merging.enabled` to `true`.
+ The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
The unified file merging mechanism can be enabled by setting the property `state.checkpoints.file-merging.enabled` to `true`.
**Note** that enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system
Suggested change:
- **Note** that enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system
+ **Note** that as a trade-off, enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system
file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
`state.checkpoints.file-merging.pool-blocking`.
Suggested change:
- file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
- `state.checkpoints.file-merging.pool-blocking`.
+ file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
+ setting `state.checkpoints.file-merging.pool-blocking` as `true` for blocking or `false` for non-blocking.
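The difference between the two pool modes described in this thread can be illustrated with a toy model. This is only a sketch, not Flink's implementation: the `FilePool` class, the integer "file ids", and the `max_files` cap are all hypothetical, introduced so that the blocking pool has a point at which it starts waiting for returned files.

```python
import itertools
import queue
import threading


class FilePool:
    """Toy file pool; physical files are modelled as integer ids (hypothetical)."""

    def __init__(self, blocking: bool, max_files: int = 2):
        self.blocking = blocking
        self.max_files = max_files   # assumed cap, only consulted in blocking mode
        self.created = 0             # number of physical files created so far
        self._ids = itertools.count()
        self._idle = queue.Queue()   # files that were returned and can be reused
        self._lock = threading.Lock()

    def request(self, timeout=None) -> int:
        # Reuse a returned file when one is available.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            # Non-blocking: always create a new file right away.
            # Blocking: create only while below the assumed cap.
            if not self.blocking or self.created < self.max_files:
                self.created += 1
                return next(self._ids)
        # Blocking mode at the cap: wait until a file is returned.
        # Raises queue.Empty if nothing is returned within `timeout`.
        return self._idle.get(timeout=timeout)

    def give_back(self, file_id: int) -> None:
        self._idle.put(file_id)
```

In this model, a non-blocking pool that receives five requests with nothing returned in between creates five files, whereas a blocking pool with `max_files=2` creates two and makes the third caller wait until `give_back` is called.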
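The `state.checkpoints.file-merging.max-subtasks-per-file` option configures the maximum number of subtasks allowed to write to a single file.

The unified file-merging mechanism also supports merging files across checkpoints, which is enabled by setting `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
The mechanism introduces a file pool to handle concurrent writes. The file pool has two modes: a non-blocking pool immediately returns a physical file for each file request, which creates many physical files when requests are frequent; while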
Add an empty line above?
Thanks for the update! LGTM
What is the purpose of the change
This PR adds documentation of checkpoint file-merging.
Brief change log
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
`@Public(Evolving)`: (no)
Documentation