
Update Optimization Performance docs #5278

Merged

Conversation

@davidmirror-ops (Contributor) commented Apr 23, 2024

Tracking issue

Closes #4611

Why are the changes needed?

This PR aims to bring clarity and to guide users in improving Propeller performance through a better understanding of how it works.

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link


codecov bot commented Apr 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.23%. Comparing base (5bed5cc) to head (234c98f).
Report is 591 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5278      +/-   ##
==========================================
+ Coverage   58.93%   60.23%   +1.29%     
==========================================
  Files         645      646       +1     
  Lines       55380    45664    -9716     
==========================================
- Hits        32638    27505    -5133     
+ Misses      20159    15569    -4590     
- Partials     2583     2590       +7     
Flag Coverage Δ
unittests ?
unittests-datacatalog 69.31% <ø> (?)
unittests-flyteadmin 58.90% <ø> (?)
unittests-flytecopilot 17.79% <ø> (?)
unittests-flyteidl 79.30% <ø> (?)
unittests-flyteplugins 61.94% <ø> (?)
unittests-flytepropeller 57.32% <ø> (?)
unittests-flytestdlib 65.75% <ø> (?)

Flags with carried forward coverage won't be shown.

@davidmirror-ops marked this pull request as ready for review on May 9, 2024 00:02
@davidmirror-ops changed the title from "[WIP] Update Optimization Performance page" to "Update Optimization Performance page" on May 9, 2024
@davidmirror-ops changed the title from "Update Optimization Performance page" to "Update Optimization Performance docs" on May 9, 2024
@neverett (Contributor) left a comment

Left some questions and feedback, let me know if I can help clarify anything!

==========================
`FlytePropeller <https://pkg.go.dev/github.com/flyteorg/FlytePropeller>`_ is the core engine of Flyte that executes the workflows for Flyte.
It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
Contributor:

I recommend making this an unordered list to make it clear that order doesn't matter here.

`FlytePropeller <https://pkg.go.dev/github.com/flyteorg/FlytePropeller>`_ is the core engine of Flyte that executes the workflows for Flyte.
It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
b. When a workflow definition is compiled, the resulting DAG structure is traversed by the controller and the goal is to gracefully transition each task to Success.
Contributor:

Is "Success" the actual status emitted? If so, maybe put it in backticks.

It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
b. When a workflow definition is compiled, the resulting DAG structure is traversed by the controller and the goal is to gracefully transition each task to Success.
c. Node executions are performed by various FlytePlugins; a diverse collection of operations spanning Kubernetes and other remote services. FlytePropeller is only responsible for effectively monitoring and managing these executions.
Contributor:

I'm a bit confused by this sentence: does this mean that FlytePlugins are a diverse collection of operations spanning K8s and other remote services, or are these distinct things?

Contributor Author:

Yeah, this was inherited from the old doc but isn't very clear. I'll try to improve it.


One of the base assumptions of FlytePropeller is that every workflow is independent and can be executed by a completely distinct process, without a need for communication with other processes. Meanwhile, one workflow tracks the dependencies between tasks using a DAG structure and hence constantly needs synchronization.
Currently, FlytePropeller executes Workflows by using an event loop to periodically track and amend the execution status. Within each iteration, a single thread requests the state of Workflow nodes and performs operations (i.e., scheduling node executions, handling failures, etc) to gracefully transition a Workflow from the observed state to the desired state (i.e., Success). Consequently, actual node executions are performed by various FlytePlugins, a diverse collection of operations spanning k8s and other remote services, and FlytePropeller is only responsible for effectively monitoring and managing these executions.
In the following sections you will learn how Flyte takes care of the correct and reliable execution of workflows through multiple stages, and what strategies you can apply to help the system efficiently handle increasing load.
Contributor:

Suggested change
In the following sections you will learn how Flyte takes care of the correct and reliable execution of workflows through multiple stages, and what strategies you can apply to help the system efficiently handle increasing load.
In the following sections you will learn how Flyte takes care of the correct and reliable execution of workflows through multiple stages and what strategies you can apply to help the system efficiently handle increasing load.

:header-rows: 1
.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/configuration/perf_optimization/propeller-perf-lifecycle-01.png

The ``Worker`` is the independent, lightweight and idempotent process that interacts with all the components in the Propeller controller to drive executions.
Contributor:

Suggested change
The ``Worker`` is the independent, lightweight and idempotent process that interacts with all the components in the Propeller controller to drive executions.
The ``Worker`` is the independent, lightweight, and idempotent process that interacts with all the components in the Propeller controller to drive executions.
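The number of these Workers directly bounds how many workflow evaluations the event loop can run in parallel, so it is one of the first settings to tune. A minimal sketch of the relevant FlytePropeller configuration, assuming the ``workers`` and ``workflow-reeval-duration`` keys; the values are illustrative, not recommendations:

propeller:
  workers: 40                     # goroutines evaluating workflows concurrently; illustrative value
  workflow-reeval-duration: 30s   # how often a workflow is re-enqueued for evaluation; illustrative value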


The Hash Shard Strategy, denoted by "type: Hash" in the configuration below, uses consistent hashing to evenly distribute FlyteWorkflows over managed FlytePropeller instances. This configuration requires a "shard-count" variable which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the shard-count is a manual step, it is not auto-scaling.
The Hash Shard Strategy, denoted by ``type: Hash`` in the configuration below, uses consistent hashing to evenly distribute FlyteWorkflows over managed FlytePropeller instances. This configuration requires a ``shard-count`` variable which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the ``shard-count`` is a manual step, it is not auto-scaling.
Contributor:

Suggested change
The Hash Shard Strategy, denoted by ``type: Hash`` in the configuration below, uses consistent hashing to evenly distribute FlyteWorkflows over managed FlytePropeller instances. This configuration requires a ``shard-count`` variable which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the ``shard-count`` is a manual step, it is not auto-scaling.
The hash shard strategy, denoted by ``type: Hash`` in the configuration below, uses consistent hashing to evenly distribute Flyte workflows over managed FlytePropeller instances. This configuration requires a ``shard-count`` variable, which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the ``shard-count`` is a manual step; it is not auto-scaling.

@@ -211,7 +269,7 @@ The Hash Shard Strategy, denoted by "type: Hash" in the configuration below, use
type: Hash # use the "hash" shard strategy
shard-count: 4 # the total number of shards
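
For context, the two keys in the hunk above sit under FlytePropeller Manager's ``shard`` block; a minimal sketch of the surrounding configuration, assuming the ``manager.shard`` layout used by the page under review:

manager:
  shard:
    type: Hash       # consistent hashing over the managed FlytePropeller instances
    shard-count: 4   # number of managed instances; changing it is a manual step, not auto-scaling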

The Project and Domain Shard Strategies, denoted by "type: project" and "type: domain" respectively, use the FlyteWorkflow project and domain metadata to shard FlyteWorkflows. These Shard Strategies are configured using a "per-shard-mapping" option, which is a list of ID lists. Each element in the "per-shard-mapping" list defines a new shard and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. "*") is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and on that shard their must be only one ID, namely the wildcard.
The Project and Domain Shard Strategies, denoted by ``type: project`` and ``type: domain`` respectively, use the FlyteWorkflow project and domain metadata to shard FlyteWorkflows. These Shard Strategies are configured using a ``per-shard-mapping`` option, which is a list of IDs. Each element in the ``per-shard-mapping`` list defines a new shard, and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. ``*``) is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and, on that shard, there must be only one ID, namely the wildcard.
Contributor:

Suggested change
The Project and Domain Shard Strategies, denoted by ``type: project`` and ``type: domain`` respectively, use the FlyteWorkflow project and domain metadata to shard FlyteWorkflows. These Shard Strategies are configured using a ``per-shard-mapping`` option, which is a list of IDs. Each element in the ``per-shard-mapping`` list defines a new shard, and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. ``*``) is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and, on that shard, there must be only one ID, namely the wildcard.
The project and domain shard strategies, denoted by ``type: project`` and ``type: domain`` respectively, use the Flyte workflow project and domain metadata to shard Flyte workflows. These shard strategies are configured using a ``per-shard-mapping`` option, which is a list of IDs. Each element in the ``per-shard-mapping`` list defines a new shard, and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. ``*``) is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and, on that shard, there must be only one ID, namely the wildcard.
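
A minimal sketch of what a project shard strategy could look like with ``per-shard-mapping``, assuming the same ``manager.shard`` layout as above; the project names are placeholders:

manager:
  shard:
    type: Project          # or Domain, to shard by domain metadata instead
    per-shard-mapping:
      - ids:
          - flytesnacks    # the first shard manages only this project
      - ids:
          - flyteexamples  # the second shard manages these two projects
          - flytelabs
      - ids:
          - "*"            # the wildcard shard manages every ID not covered above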

@@ -248,7 +306,7 @@ The Project and Domain Shard Strategies, denoted by "type: project" and "type: d

Multi-Cluster mode
===================
In our experience at Lyft, we saw that the Kubernetes cluster would have problems before FlytePropeller or FlyteAdmin would have impact. Thus Flyte supports adding multiple dataplane clusters by default. Each dataplane cluster, has one or more FlytePropellers running in them, and flyteadmin manages the routing and assigning of workloads to these clusters.
If the K8s cluster itself becomes a performance bottleneck, Flyte supports adding multiple K8s dataplane clusters by default. Each dataplane cluster has one or more FlytePropellers running in them, and flyteadmin manages the routing and assigning of workloads to these clusters.
Contributor:

Suggested change
If the K8s cluster itself becomes a performance bottleneck, Flyte supports adding multiple K8s dataplane clusters by default. Each dataplane cluster has one or more FlytePropellers running in them, and flyteadmin manages the routing and assigning of workloads to these clusters.
If the K8s cluster itself becomes a performance bottleneck, Flyte supports adding multiple K8s dataplane clusters by default. Each dataplane cluster has one or more FlytePropellers running in it, and flyteadmin manages the routing and assigning of workloads to these clusters.
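
As a rough sketch of how data plane clusters are registered with flyteadmin, assuming the ``clusters.clusterConfigs`` layout from Flyte's multi-cluster deployment guide; the name, endpoint, and credential paths are placeholders:

clusters:
  clusterConfigs:
    - name: "dataplane_1"
      endpoint: "your-dataplane-cluster:443"                     # placeholder endpoint
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/var/run/credentials/dataplane_1_token"      # placeholder path
        certPath: "/var/run/credentials/dataplane_1_cacert"      # placeholder path

flyteadmin then uses this list to route and assign workloads to the registered clusters.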

@@ -257,8 +315,8 @@ Improving etcd Performance
Offloading Static Workflow Information from CRD
-----------------------------------------------

Flyte uses a k8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition, for example tasks and subworkflows that are involved and the dependencies between nodes, but also includes the execution status of the workflow. The latter information (ie. runtime status) is dynamic, meaning changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (ie. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.
Flyte uses a K8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition, for example tasks and subworkflows that are involved and the dependencies between nodes. It also includes the execution status of the workflow. The latter information (ie. runtime status) is dynamic, and changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (ie. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.
Contributor:

Suggested change
Flyte uses a K8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition, for example tasks and subworkflows that are involved and the dependencies between nodes. It also includes the execution status of the workflow. The latter information (ie. runtime status) is dynamic, and changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (ie. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.
Flyte uses a K8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition (for example, tasks and subworkflows that are involved, and the dependencies between nodes). It also includes the execution status of the workflow. The latter information (i.e. runtime status) is dynamic, and changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (i.e. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.


CRDs are stored within etcd, a key-value datastore heavily used in kubernetes. Etcd requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of etcd, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases this means we are frequently rewriting a large CRD. In addition to poor read / write performance in etcd this update may be restricted by a hard limit on the overall CRD size.
CRDs are stored within ``etcd``, which requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of ``etcd``, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases this means we are frequently rewriting a large CRD. In addition to poor read / write performance in ``etcd``, these updates may be restricted by a hard limit on the overall CRD size.
Contributor:

Suggested change
CRDs are stored within ``etcd``, which requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of ``etcd``, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases this means we are frequently rewriting a large CRD. In addition to poor read / write performance in ``etcd``, these updates may be restricted by a hard limit on the overall CRD size.
CRDs are stored within ``etcd``, which requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of ``etcd``, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes, we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases, this means we are frequently rewriting a large CRD. In addition to poor read / write performance in ``etcd``, these updates may be restricted by a hard limit on the overall CRD size.
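
For completeness on the offloading discussion: to the best of my knowledge this behavior is controlled by a single FlyteAdmin flag, assumed here to be ``useOffloadedWorkflowClosure`` (verify against the current configuration reference):

flyteadmin:
  useOffloadedWorkflowClosure: true   # store the static workflow closure in the blob store instead of the CRD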

@hamersaw (Contributor) left a comment

looks accurate to me! thanks for cleaning this up - it feels much more useful.

It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
b. When a workflow definition is compiled, the resulting DAG structure is traversed by the controller and the goal is to gracefully transition each task to Success.
c. Node executions are performed by various FlytePlugins; a diverse collection of operations spanning Kubernetes and other remote services. FlytePropeller is only responsible for effectively monitoring and managing these executions.
Contributor:

I suspect we should be precise with our terminology here. Technically, task executions use FlytePlugins. Node executions can be dynamic nodes, subworkflows, gate nodes, array nodes, etc. A TaskNode is a node execution that then uses FlytePlugins.

timeout: 30s # Refers to timeout when talking with kubeapi server
kube-client-config:
qps: 100 # Refers to max rate of requests (queries per second) to kube-apiserver
burst: 25 # refers to max burst rate.
Contributor:

burst should be >= qps. We did a little dive into the code here and realized we have been misconfiguring this for a while.

Contributor Author:

Uh good finding. I tried to rephrase this section, to the best of my knowledge. I think I'll need to also update the configuration reference page in a separate PR.
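
Following the point above that burst should be greater than or equal to qps, a hedged sketch of how the snippet could be adjusted; the values are illustrative only, and where these keys live depends on your deployment's config layout:

kube-client-config:
  qps: 100      # steady-state maximum queries per second to kube-apiserver
  burst: 120    # short-term burst ceiling; keep this >= qps
  timeout: 30s  # client-side timeout for calls to kube-apiserver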

Sharded scale-out
-------------------
FlytePropeller Manager is a new component introduced as part of `this RFC <https://github.com/flyteorg/flyte/blob/master/rfc/system/1483-flytepropeller-horizontal-scaling.md>`_ to facilitate horizontal scaling of FlytePropeller through sharding. Effectively, the Manager is responsible for maintaining liveness and proper configuration over a collection of FlytePropeller instances. This scheme uses k8s label selectors to deterministically assign FlyteWorkflow CRD responsibilities to FlytePropeller instances, effectively distributing processing load over the shards.
FlytePropeller Manager is a new component introduced to facilitate horizontal scaling of FlytePropeller through sharding. Effectively, the Manager is responsible for maintaining liveness and proper configuration over a collection of FlytePropeller instances. This scheme uses K8s label selectors to deterministically assign FlyteWorkflow CRD responsibilities to FlytePropeller instances, effectively distributing processing load over the shards.
Contributor:

+1

@davidmirror-ops force-pushed the dx685-propeller-arch-docs branch 2 times, most recently from 367b641 to 917f986 on May 15, 2024 21:47
Signed-off-by: davidmirror-ops <david.espejo@union.ai>
@davidmirror-ops merged commit 519080b into flyteorg:master on May 16, 2024
49 of 50 checks passed

Successfully merging this pull request may close these issues.

[Docs] provide clarity on the cache that flyte propeller uses