
Update Optimization Performance docs #5278

Merged

Conversation

@davidmirror-ops (Contributor) commented Apr 23, 2024

Tracking issue

Closes #4611

Why are the changes needed?

This PR aims to bring clarity and to guide users in improving Propeller performance through a better understanding of how it works.

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link


codecov bot commented Apr 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.23%. Comparing base (5bed5cc) to head (234c98f).
Report is 591 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5278      +/-   ##
==========================================
+ Coverage   58.93%   60.23%   +1.29%     
==========================================
  Files         645      646       +1     
  Lines       55380    45664    -9716     
==========================================
- Hits        32638    27505    -5133     
+ Misses      20159    15569    -4590     
- Partials     2583     2590       +7     
Flag Coverage Δ
unittests ?
unittests-datacatalog 69.31% <ø> (?)
unittests-flyteadmin 58.90% <ø> (?)
unittests-flytecopilot 17.79% <ø> (?)
unittests-flyteidl 79.30% <ø> (?)
unittests-flyteplugins 61.94% <ø> (?)
unittests-flytepropeller 57.32% <ø> (?)
unittests-flytestdlib 65.75% <ø> (?)

Flags with carried forward coverage won't be shown.

@davidmirror-ops marked this pull request as ready for review on May 9, 2024 00:02
@davidmirror-ops changed the title from "[WIP] Update Optimization Performance page" to "Update Optimization Performance page" on May 9, 2024
@davidmirror-ops changed the title from "Update Optimization Performance page" to "Update Optimization Performance docs" on May 9, 2024
@neverett (Contributor) left a comment

Left some questions and feedback, let me know if I can help clarify anything!

==========================
`FlytePropeller <https://pkg.go.dev/github.com/flyteorg/FlytePropeller>`_ is the core engine of Flyte that executes the workflows for Flyte.
It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
Contributor:

I recommend making this an unordered list to make it clear that order doesn't matter here.

`FlytePropeller <https://pkg.go.dev/github.com/flyteorg/FlytePropeller>`_ is the core engine of Flyte that executes the workflows for Flyte.
It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
b. When a workflow definition is compiled, the resulting DAG structure is traversed by the controller and the goal is to gracefully transition each task to Success.
Contributor:

Is "Success" the actual status emitted? If so, maybe put it in backticks.

It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
b. When a workflow definition is compiled, the resulting DAG structure is traversed by the controller and the goal is to gracefully transition each task to Success.
c. Node executions are performed by various FlytePlugins; a diverse collection of operations spanning Kubernetes and other remote services. FlytePropeller is only responsible for effectively monitoring and managing these executions.
Contributor:

I'm a bit confused by this sentence: does this mean that FlytePlugins are a diverse collection of operations spanning K8s and other remote services, or are these distinct things?

Contributor Author:

Yeah, this was inherited from the old doc but isn't very clear. I'll try to improve it.


One of the base assumptions of FlytePropeller is that every workflow is independent and can be executed by a completely distinct process, without a need for communication with other processes. Meanwhile, one workflow tracks the dependencies between tasks using a DAG structure and hence constantly needs synchronization.
Currently, FlytePropeller executes Workflows by using an event loop to periodically track and amend the execution status. Within each iteration, a single thread requests the state of Workflow nodes and performs operations (i.e., scheduling node executions, handling failures, etc) to gracefully transition a Workflow from the observed state to the desired state (i.e., Success). Consequently, actual node executions are performed by various FlytePlugins, a diverse collection of operations spanning k8s and other remote services, and FlytePropeller is only responsible for effectively monitoring and managing these executions.
In the following sections you will learn how Flyte takes care of the correct and reliable execution of workflows through multiple stages, and what strategies you can apply to help the system efficiently handle increasing load.
Contributor:

Suggested change
In the following sections you will learn how Flyte takes care of the correct and reliable execution of workflows through multiple stages, and what strategies you can apply to help the system efficiently handle increasing load.
In the following sections you will learn how Flyte takes care of the correct and reliable execution of workflows through multiple stages and what strategies you can apply to help the system efficiently handle increasing load.

:header-rows: 1
.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/configuration/perf_optimization/propeller-perf-lifecycle-01.png

The ``Worker`` is the independent, lightweight and idempotent process that interacts with all the components in the Propeller controller to drive executions.
Contributor:

Suggested change
The ``Worker`` is the independent, lightweight and idempotent process that interacts with all the components in the Propeller controller to drive executions.
The ``Worker`` is the independent, lightweight, and idempotent process that interacts with all the components in the Propeller controller to drive executions.
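The number of these Workers directly bounds how many workflow evaluations the event loop can run in parallel, so it is one of the first settings to tune. A minimal sketch of the relevant FlytePropeller configuration, assuming the ``workers`` and ``workflow-reeval-duration`` keys; the values are illustrative, not recommendations:

propeller:
  workers: 40                     # goroutines evaluating workflows concurrently; illustrative value
  workflow-reeval-duration: 30s   # how often a workflow is re-enqueued for evaluation; illustrative value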


The Hash Shard Strategy, denoted by "type: Hash" in the configuration below, uses consistent hashing to evenly distribute FlyteWorkflows over managed FlytePropeller instances. This configuration requires a "shard-count" variable which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the shard-count is a manual step, it is not auto-scaling.
The Hash Shard Strategy, denoted by ``type: Hash`` in the configuration below, uses consistent hashing to evenly distribute FlyteWorkflows over managed FlytePropeller instances. This configuration requires a ``shard-count`` variable which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the ``shard-count`` is a manual step, it is not auto-scaling.
Contributor:

Suggested change
The Hash Shard Strategy, denoted by ``type: Hash`` in the configuration below, uses consistent hashing to evenly distribute FlyteWorkflows over managed FlytePropeller instances. This configuration requires a ``shard-count`` variable which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the ``shard-count`` is a manual step, it is not auto-scaling.
The hash shard strategy, denoted by ``type: Hash`` in the configuration below, uses consistent hashing to evenly distribute Flyte workflows over managed FlytePropeller instances. This configuration requires a ``shard-count`` variable, which defines the number of managed FlytePropeller instances. You may change the shard count without impacting existing workflows. Note that changing the ``shard-count`` is a manual step; it is not auto-scaling.

@@ -211,7 +269,7 @@ The Hash Shard Strategy, denoted by "type: Hash" in the configuration below, use
type: Hash # use the "hash" shard strategy
shard-count: 4 # the total number of shards
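
For context, the two keys in the hunk above sit under FlytePropeller Manager's ``shard`` block; a minimal sketch of the surrounding configuration, assuming the ``manager.shard`` layout used by the page under review:

manager:
  shard:
    type: Hash       # consistent hashing over the managed FlytePropeller instances
    shard-count: 4   # number of managed instances; changing it is a manual step, not auto-scaling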

The Project and Domain Shard Strategies, denoted by "type: project" and "type: domain" respectively, use the FlyteWorkflow project and domain metadata to shard FlyteWorkflows. These Shard Strategies are configured using a "per-shard-mapping" option, which is a list of ID lists. Each element in the "per-shard-mapping" list defines a new shard and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. "*") is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and on that shard their must be only one ID, namely the wildcard.
The Project and Domain Shard Strategies, denoted by ``type: project`` and ``type: domain`` respectively, use the FlyteWorkflow project and domain metadata to shard FlyteWorkflows. These Shard Strategies are configured using a ``per-shard-mapping`` option, which is a list of IDs. Each element in the ``per-shard-mapping`` list defines a new shard, and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. ``*``) is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and, on that shard, there must be only one ID, namely the wildcard.
Contributor:

Suggested change
The Project and Domain Shard Strategies, denoted by ``type: project`` and ``type: domain`` respectively, use the FlyteWorkflow project and domain metadata to shard FlyteWorkflows. These Shard Strategies are configured using a ``per-shard-mapping`` option, which is a list of IDs. Each element in the ``per-shard-mapping`` list defines a new shard, and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. ``*``) is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and, on that shard, there must be only one ID, namely the wildcard.
The project and domain shard strategies, denoted by ``type: project`` and ``type: domain`` respectively, use the Flyte workflow project and domain metadata to shard Flyte workflows. These shard strategies are configured using a ``per-shard-mapping`` option, which is a list of IDs. Each element in the ``per-shard-mapping`` list defines a new shard, and the ID list assigns responsibility for the specified IDs to that shard. A shard configured as a single wildcard ID (i.e. ``*``) is responsible for all IDs that are not covered by other shards. Only a single shard may be configured with a wildcard ID and, on that shard, there must be only one ID, namely the wildcard.
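
A minimal sketch of what a project shard strategy could look like with ``per-shard-mapping``, assuming the same ``manager.shard`` layout as above; the project names are placeholders:

manager:
  shard:
    type: Project          # or Domain, to shard by domain metadata instead
    per-shard-mapping:
      - ids:
          - flytesnacks    # the first shard manages only this project
      - ids:
          - flyteexamples  # the second shard manages these two projects
          - flytelabs
      - ids:
          - "*"            # the wildcard shard manages every ID not covered above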

@@ -248,7 +306,7 @@ The Project and Domain Shard Strategies, denoted by "type: project" and "type: d

Multi-Cluster mode
===================
In our experience at Lyft, we saw that the Kubernetes cluster would have problems before FlytePropeller or FlyteAdmin would have impact. Thus Flyte supports adding multiple dataplane clusters by default. Each dataplane cluster, has one or more FlytePropellers running in them, and flyteadmin manages the routing and assigning of workloads to these clusters.
If the K8s cluster itself becomes a performance bottleneck, Flyte supports adding multiple K8s dataplane clusters by default. Each dataplane cluster has one or more FlytePropellers running in them, and flyteadmin manages the routing and assigning of workloads to these clusters.
Contributor:

Suggested change
If the K8s cluster itself becomes a performance bottleneck, Flyte supports adding multiple K8s dataplane clusters by default. Each dataplane cluster has one or more FlytePropellers running in them, and flyteadmin manages the routing and assigning of workloads to these clusters.
If the K8s cluster itself becomes a performance bottleneck, Flyte supports adding multiple K8s dataplane clusters by default. Each dataplane cluster has one or more FlytePropellers running in it, and flyteadmin manages the routing and assigning of workloads to these clusters.
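
As a rough sketch of how data plane clusters are registered with flyteadmin, assuming the ``clusters.clusterConfigs`` layout from Flyte's multi-cluster deployment guide; the name, endpoint, and credential paths are placeholders:

clusters:
  clusterConfigs:
    - name: "dataplane_1"
      endpoint: "your-dataplane-cluster:443"                     # placeholder endpoint
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/var/run/credentials/dataplane_1_token"      # placeholder path
        certPath: "/var/run/credentials/dataplane_1_cacert"      # placeholder path

flyteadmin then uses this list to route and assign workloads to the registered clusters.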

@@ -257,8 +315,8 @@ Improving etcd Performance
Offloading Static Workflow Information from CRD
-----------------------------------------------

Flyte uses a k8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition, for example tasks and subworkflows that are involved and the dependencies between nodes, but also includes the execution status of the workflow. The latter information (ie. runtime status) is dynamic, meaning changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (ie. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.
Flyte uses a K8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition, for example tasks and subworkflows that are involved and the dependencies between nodes. It also includes the execution status of the workflow. The latter information (ie. runtime status) is dynamic, and changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (ie. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.
Contributor:

Suggested change
Flyte uses a K8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition, for example tasks and subworkflows that are involved and the dependencies between nodes. It also includes the execution status of the workflow. The latter information (ie. runtime status) is dynamic, and changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (ie. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.
Flyte uses a K8s CRD (Custom Resource Definition) to store and track workflow executions. This resource includes the workflow definition (for example, tasks and subworkflows that are involved, and the dependencies between nodes). It also includes the execution status of the workflow. The latter information (i.e. runtime status) is dynamic, and changes during the workflow's execution as nodes transition phases and the workflow execution progresses. However, the former information (i.e. workflow definition) remains static, meaning it will never change and is only consulted to retrieve node definitions and workflow dependencies.


CRDs are stored within etcd, a key-value datastore heavily used in kubernetes. Etcd requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of etcd, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases this means we are frequently rewriting a large CRD. In addition to poor read / write performance in etcd this update may be restricted by a hard limit on the overall CRD size.
CRDs are stored within ``etcd``, which requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of ``etcd``, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases this means we are frequently rewriting a large CRD. In addition to poor read / write performance in ``etcd``, these updates may be restricted by a hard limit on the overall CRD size.
Contributor:

Suggested change
CRDs are stored within ``etcd``, which requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of ``etcd``, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases this means we are frequently rewriting a large CRD. In addition to poor read / write performance in ``etcd``, these updates may be restricted by a hard limit on the overall CRD size.
CRDs are stored within ``etcd``, which requires a complete rewrite of the value data every time a single field changes. Consequently, the read / write performance of ``etcd``, as with all key-value stores, is strongly correlated with the size of the data. In Flyte's case, to guarantee only-once execution of nodes, we need to persist workflow state by updating the CRD at every node phase change. As the size of a workflow increases, this means we are frequently rewriting a large CRD. In addition to poor read / write performance in ``etcd``, these updates may be restricted by a hard limit on the overall CRD size.
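
For completeness on the offloading discussion: to the best of my knowledge this behavior is controlled by a single FlyteAdmin flag, assumed here to be ``useOffloadedWorkflowClosure`` (verify against the current configuration reference):

flyteadmin:
  useOffloadedWorkflowClosure: true   # store the static workflow closure in the blob store instead of the CRD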

@hamersaw (Contributor) left a comment

looks accurate to me! thanks for cleaning this up - it feels much more useful.

It is designed as a `Kubernetes Controller <https://kubernetes.io/docs/concepts/architecture/controller/>`_, where the desired state is specified as a FlyteWorkflow `Custom Resource <https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/>`_.
a. Every workflow execution is independent and can be performed by a completely distinct process.
b. When a workflow definition is compiled, the resulting DAG structure is traversed by the controller and the goal is to gracefully transition each task to Success.
c. Node executions are performed by various FlytePlugins; a diverse collection of operations spanning Kubernetes and other remote services. FlytePropeller is only responsible for effectively monitoring and managing these executions.
Contributor:

I suspect we should be precise with our terminology here. Technically, task executions use FlytePlugins. Node executions can be dynamic nodes, subworkflows, gate nodes, array nodes, etc. A TaskNode is a node execution that then uses FlytePlugins.

timeout: 30s # Refers to timeout when talking with kubeapi server
kube-client-config:
qps: 100 # Refers to max rate of requests (queries per second) to kube-apiserver
burst: 25 # refers to max burst rate.
Contributor:

burst should be >= qps. We did a little dive into the code here and realized we have been misconfiguring this for a while.

Contributor Author:

Uh good finding. I tried to rephrase this section, to the best of my knowledge. I think I'll need to also update the configuration reference page in a separate PR.
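
Following the point above that burst should be greater than or equal to qps, a hedged sketch of how the snippet could be adjusted; the values are illustrative only, and where these keys live depends on your deployment's config layout:

kube-client-config:
  qps: 100      # steady-state maximum queries per second to kube-apiserver
  burst: 120    # short-term burst ceiling; keep this >= qps
  timeout: 30s  # client-side timeout for calls to kube-apiserver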

Sharded scale-out
-------------------
FlytePropeller Manager is a new component introduced as part of `this RFC <https://github.com/flyteorg/flyte/blob/master/rfc/system/1483-flytepropeller-horizontal-scaling.md>`_ to facilitate horizontal scaling of FlytePropeller through sharding. Effectively, the Manager is responsible for maintaining liveness and proper configuration over a collection of FlytePropeller instances. This scheme uses k8s label selectors to deterministically assign FlyteWorkflow CRD responsibilities to FlytePropeller instances, effectively distributing processing load over the shards.
FlytePropeller Manager is a new component introduced to facilitate horizontal scaling of FlytePropeller through sharding. Effectively, the Manager is responsible for maintaining liveness and proper configuration over a collection of FlytePropeller instances. This scheme uses K8s label selectors to deterministically assign FlyteWorkflow CRD responsibilities to FlytePropeller instances, effectively distributing processing load over the shards.
Contributor:

+1

@davidmirror-ops force-pushed the dx685-propeller-arch-docs branch 2 times, most recently from 367b641 to 917f986 on May 15, 2024 21:47
Signed-off-by: davidmirror-ops <david.espejo@union.ai>
@davidmirror-ops merged commit 519080b into flyteorg:master on May 16, 2024
49 of 50 checks passed

Successfully merging this pull request may close these issues.

[Docs] provide clarity on the cache that flyte propeller uses