All workflow steps need to execute on the same kubernetes node #294

Open
stefannica opened this issue Nov 15, 2021 · 0 comments

With the current Tekton-based workflow backend, all steps in a workflow have to run on the same Kubernetes node. Tekton enforces this limitation because every step mounts the PVC backing the pipeline workspace where the codeset is cloned.

This can seriously hurt performance, and it can even cause pipeline runs to fail at pod scheduling, e.g. when a workflow step requires resources (GPUs/CPUs) that are not available on the node where the pipeline run is scheduled.
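
To make the failure mode concrete, here is a toy model in Python. It is only a sketch, not Tekton or FuseML code: the `Node` and `Step` types, the node and step names, and the "node is picked by the first step" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpus: int
    gpus: int


@dataclass(frozen=True)
class Step:
    name: str
    cpus: int = 1
    gpus: int = 0


def fits(step: Step, node: Node) -> bool:
    """Steps run in sequence, so each one only has to fit on the node by itself."""
    return step.cpus <= node.cpus and step.gpus <= node.gpus


def schedule_pinned(steps, nodes):
    """Toy assumption: the node is chosen for the first step, then all steps must use it."""
    node = next((n for n in nodes if fits(steps[0], n)), None)
    if node is None:
        raise RuntimeError(f"no node fits step {steps[0].name!r}")
    placement = {}
    for step in steps:
        if not fits(step, node):
            raise RuntimeError(f"step {step.name!r} cannot be scheduled on {node.name!r}")
        placement[step.name] = node.name
    return placement


def schedule_free(steps, nodes):
    """Each step is placed independently on the first node that fits it."""
    placement = {}
    for step in steps:
        node = next((n for n in nodes if fits(step, n)), None)
        if node is None:
            raise RuntimeError(f"no node fits step {step.name!r}")
        placement[step.name] = node.name
    return placement


# Hypothetical cluster and workflow, for illustration only.
nodes = [Node("cpu-node", cpus=8, gpus=0), Node("gpu-node", cpus=4, gpus=1)]
steps = [Step("prepare", cpus=2), Step("train", cpus=2, gpus=1)]
```

In this model `schedule_free(steps, nodes)` places `prepare` on `cpu-node` and `train` on `gpu-node`, while `schedule_pinned(steps, nodes)` raises because `train` is stuck on the GPU-less node chosen for `prepare`.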

Luckily, all steps in a workflow are executed in sequence (there's no support for parallel workflow steps yet), so at least this doesn't reduce the degree of parallelism of FuseML workflows.

A different strategy should be investigated, for example one that doesn't automatically mount the workspace PVC in steps that don't need it. Alternatively, distributed storage should be used when available.
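
Both ideas can be sketched as a small decision rule. The following Python is only an illustration under stated assumptions: `StepSpec`, its `uses_codeset` flag, and the step names are hypothetical and not part of FuseML or Tekton. The access mode strings are the standard Kubernetes PVC access modes.

```python
from dataclasses import dataclass

# Standard Kubernetes PVC access modes.
READ_WRITE_ONCE = "ReadWriteOnce"   # volume can be mounted by a single node
READ_WRITE_MANY = "ReadWriteMany"   # volume can be mounted by many nodes


@dataclass(frozen=True)
class StepSpec:
    name: str
    uses_codeset: bool  # hypothetical flag: does this step read the cloned codeset?


def pinned_steps(steps, access_mode):
    """Return the names of the steps that would still have to share one node.

    A step is pinned only if it mounts the workspace PVC and that PVC
    cannot be mounted from more than one node.
    """
    if access_mode != READ_WRITE_ONCE:
        return []  # distributed storage: no step needs to be co-located
    return [s.name for s in steps if s.uses_codeset]


# Hypothetical workflow, for illustration only.
workflow = [
    StepSpec("builder", uses_codeset=True),
    StepSpec("trainer", uses_codeset=False),
    StepSpec("predictor", uses_codeset=False),
]
```

With a `ReadWriteOnce` volume only `builder` stays pinned to the workspace node in this example; with a `ReadWriteMany` volume no step is pinned.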

stefannica added this to Backlog in FuseML Project Board via automation on Nov 15, 2021
stefannica changed the title from "All workflow step need to execute on the same kubernetes node" to "All workflow steps need to execute on the same kubernetes node" on Nov 15, 2021