[FLINK-32082][docs] Documentation of checkpoint file-merging #24766

Merged
merged 1 commit into from May 16, 2024

Conversation

fredia
Contributor

@fredia fredia commented May 10, 2024

What is the purpose of the change

This PR adds documentation of checkpoint file-merging.

Brief change log

  • add documentation of checkpoint file-merging

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (docs)

@fredia fredia requested review from masteryhx and Zakelly May 10, 2024 03:42
@flinkbot
Collaborator

flinkbot commented May 10, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Contributor

@masteryhx masteryhx left a comment

Thanks for the PR.
PTAL my comments.

to be written into a single file, reducing the number of file creations and file deletions, helping to alleviate the pressure
of file system metadata management and file flooding problem. The unified file merging mechanism can be enabled by setting
the property `state.checkpoints.file-merging.enabled` to `true`. **Note** that enabling this mechanism may lead to space amplification,
that is, the actual occupation on the file system will be larger than the checkpoint size. `state.checkpoints.file-merging.max-space-amplification`
Contributor

IIUC, the checkpoint size metric should stay consistent, right?
If it is compared with the size before merging or with the actual state, let's change it to "checkpoint size before merging" or "actual state size".

Contributor Author

Right, thanks for the suggestion. I changed it to "actual state size".
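To make the term concrete, here is a minimal sketch of what "space amplification" compares, assuming it is measured as the ratio of bytes occupied on the file system to the actual state size. The class and method names are hypothetical and not part of Flink, and how Flink itself interprets `state.checkpoints.file-merging.max-space-amplification` is not stated in this thread.

```java
// Hypothetical helper, not part of Flink: illustrates what "space amplification" compares.
final class SpaceAmplificationSketch {

    /** Ratio of bytes occupied on the file system to bytes of actual state (assumed definition). */
    static double amplification(long physicalBytes, long actualStateBytes) {
        if (actualStateBytes <= 0) {
            throw new IllegalArgumentException("actualStateBytes must be positive");
        }
        return (double) physicalBytes / actualStateBytes;
    }

    /** True when the ratio exceeds the given upper bound. */
    static boolean exceedsBound(long physicalBytes, long actualStateBytes, double maxAmplification) {
        return amplification(physicalBytes, actualStateBytes) > maxAmplification;
    }

    public static void main(String[] args) {
        // Merged files occupy 300 MiB while the actual state is only 100 MiB.
        long physical = 300L << 20;
        long actual = 100L << 20;
        System.out.println(amplification(physical, actual));     // 3.0
        System.out.println(exceedsBound(physical, actual, 2.0)); // true
    }
}
```

With an illustrative bound of 2.0, the example above would count as exceeding the limit.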

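## Unified file merging mechanism for checkpoints

Flink 1.20 introduces a unified checkpoint file merging mechanism, which allows scattered checkpoint files to be written into the same file, reducing the number of checkpoint file creations and deletions.
This helps relieve the pressure of file system metadata management and solve the file flooding problem (文件洪泛问题). The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.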
Contributor

文件洪泛问题 ("file flooding problem") does not seem to be a common expression in Chinese.
How about describing it more directly?

Contributor Author

I changed it to 文件过多问题 ("the problem of too many files").

Contributor

@Zakelly Zakelly left a comment

Thanks for the PR, I left some comments.

@@ -292,4 +292,25 @@ The final checkpoint would be triggered immediately after all operators have rea
without waiting for periodic triggering, but the job will need to wait for this final checkpoint
to be completed.

## Unify file merging mechanism for checkpoints
Contributor

How about adding "(Experimental)" to the title?


The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
Contributor

Suggested change
- and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
+ and file deletions, which alleviates the pressure of file system metadata management raised by the file flooding problem during checkpoints.

will be larger than actual state size. `state.checkpoints.file-merging.max-space-amplification`
can be used to limit the upper bound of space amplification.

This mechanism is applicable to keyed state, operator state and channel state in Flink. Subtask level granular merging is
Contributor

Suggested change
- This mechanism is applicable to keyed state, operator state and channel state in Flink. Subtask level granular merging is
+ This mechanism is applicable to keyed state, operator state and channel state in Flink. Merging at subtask level is

Comment on lines 309 to 310
The unified file merging mechanism also supports file merging across checkpoints, which can be enabled by setting
`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
Contributor

Suggested change
- The unified file merging mechanism also supports file merging across checkpoints, which can be enabled by setting
- `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
+ This feature also supports merging files across checkpoints. To enable this, set
+ `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.

The unified file merging mechanism also supports file merging across checkpoints, which can be enabled by setting
`state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.

This mechanism introduces a file pool to handle concurrent writing scenarios. The blocking mode can be
Contributor

Suggested change
- This mechanism introduces a file pool to handle concurrent writing scenarios. The blocking mode can be
+ This mechanism introduces a file pool to handle concurrent writing scenarios. There are two modes... The blocking mode... while the non-blocking mode... This can be configured via ``.

Could you add a description of each mode, instead of only explaining how to enable the option?

## Unify file merging mechanism for checkpoints

The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
Contributor

Suggested change
- which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
+ which allows scattered small checkpoint files to be written into larger files, reducing the number of file creations

The unified file merging mechanism for checkpointing is introduced to Flink 1.20 as an MVP ("minimum viable product") feature,
which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
The unified file merging mechanism can be enabled by setting the property `state.checkpoints.file-merging.enabled` to `true`.
Contributor

Suggested change
- The unified file merging mechanism can be enabled by setting the property `state.checkpoints.file-merging.enabled` to `true`.
+ The mechanism can be enabled by setting `state.checkpoints.file-merging.enabled` to `true`.

which allows scattered small checkpoint files to be written into a single file, reducing the number of file creations
and file deletions, helping to alleviate the pressure of file system metadata management and file flooding problem.
The unified file merging mechanism can be enabled by setting the property `state.checkpoints.file-merging.enabled` to `true`.
**Note** that enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system
Contributor

Suggested change
- **Note** that enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system
+ **Note** that as a trade-off, enabling this mechanism may lead to space amplification, that is, the actual occupation on the file system

Comment on lines 314 to 315
file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
`state.checkpoints.file-merging.pool-blocking`.
Contributor

Suggested change
- file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
- `state.checkpoints.file-merging.pool-blocking`.
+ file frequently; while the blocking mode will be blocked until there are returned files available in the file pool. This can be configured via
+ setting `state.checkpoints.file-merging.pool-blocking` as `true` for blocking or `false` for non-blocking.
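The difference between the two pool modes can be sketched roughly as follows. This is an illustration under assumptions, not Flink's implementation: the class names, the `FileFactory` stand-in for physical file creation, and the fixed pool size in the blocking variant are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Flink code: contrasts the two file pool modes.
final class FilePoolSketch {

    /** Stands in for physical file creation and counts how many files were created. */
    static final class FileFactory {
        private final AtomicInteger created = new AtomicInteger();
        String create() { return "physical-file-" + created.incrementAndGet(); }
        int createdCount() { return created.get(); }
    }

    /** Non-blocking mode: never waits; creates a new file when no returned file is idle. */
    static final class NonBlockingPool {
        private final ConcurrentLinkedQueue<String> idle = new ConcurrentLinkedQueue<>();
        private final FileFactory factory;
        NonBlockingPool(FileFactory factory) { this.factory = factory; }
        String poll() {
            String file = idle.poll();
            return file != null ? file : factory.create();
        }
        void giveBack(String file) { idle.add(file); }
    }

    /** Blocking mode: a fixed set of files (assumed); callers wait until one is returned. */
    static final class BlockingPool {
        private final BlockingQueue<String> idle;
        BlockingPool(FileFactory factory, int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(factory.create());
            }
        }
        String take() throws InterruptedException { return idle.take(); }
        void giveBack(String file) { idle.add(file); }
    }

    public static void main(String[] args) throws InterruptedException {
        FileFactory f1 = new FileFactory();
        NonBlockingPool nonBlocking = new NonBlockingPool(f1);
        for (int i = 0; i < 5; i++) {
            nonBlocking.poll(); // five requests before any file is returned
        }
        System.out.println(f1.createdCount()); // 5

        FileFactory f2 = new FileFactory();
        BlockingPool blocking = new BlockingPool(f2, 2);
        String first = blocking.take();
        blocking.take();
        blocking.giveBack(first); // without this return, the next take() would wait
        blocking.take();
        System.out.println(f2.createdCount()); // 2
    }
}
```

The trade-off the sketch tries to show: the non-blocking pool never delays a writer but its file count grows with request frequency, while the blocking pool keeps the file count bounded at the cost of waiting.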

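The `state.checkpoints.file-merging.max-subtasks-per-file` option configures the maximum number of subtasks allowed to write to a single file.

The unified file merging mechanism also supports merging files across checkpoints, which is enabled by setting `state.checkpoints.file-merging.across-checkpoint-boundary` to `true`.
This mechanism introduces a file pool to handle concurrent writes. The file pool has two modes: a non-blocking file pool immediately returns a physical file for each file request, which creates many physical files when requests are frequent; while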
Contributor

Add an empty line above?

Contributor

@Zakelly Zakelly left a comment

Thanks for the update! LGTM
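
For reference, the option keys named in this review can be collected in one place. The sketch below only assembles them into flat `key: value` lines; the keys come from the thread, but every value is illustrative rather than a documented default, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: gathers the option keys named in this review.
// All values are illustrative, not documented defaults.
final class FileMergingOptionsSketch {

    static Map<String, String> options() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("state.checkpoints.file-merging.enabled", "true");
        conf.put("state.checkpoints.file-merging.max-space-amplification", "2.0");
        conf.put("state.checkpoints.file-merging.max-subtasks-per-file", "4");
        conf.put("state.checkpoints.file-merging.across-checkpoint-boundary", "true");
        conf.put("state.checkpoints.file-merging.pool-blocking", "false");
        return conf;
    }

    /** Renders the options as flat "key: value" lines, in insertion order. */
    static String render(Map<String, String> conf) {
        return conf.entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(render(options()));
    }
}
```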

@fredia fredia merged commit a8cf2ba into apache:master May 16, 2024
4 participants