Add workflows to verify if examples are valid #2014

Open
tenzen-y opened this issue Mar 8, 2024 · 4 comments
Comments

@tenzen-y
Member

tenzen-y commented Mar 8, 2024

We have many examples that help users understand how to run training jobs.
However, nothing currently verifies that these examples are valid, so I propose adding CI workflows that check that the examples work.

The Katib workflows would be a good reference for implementing this in the training-operator: https://github.com/kubeflow/katib/blob/master/.github/workflows/e2e-test-pytorch-mnist.yaml

/good-first-issue


@tenzen-y:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

@shivas1516

shivas1516 commented Mar 10, 2024

I'd like to work on this issue and add a GitHub Actions workflow for the training-operator examples. It matches my experience level.
Any guidance you can provide would be greatly appreciated and would help me make progress faster.

/assign

@shivas1516

@tenzen-y Is it necessary to add e2e tests to the workflow to verify the Training Operator examples, as in Katib? Could you provide some additional information on this? It would help me solve this issue.

@tenzen-y
Member Author

> @tenzen-y Is it necessary to add e2e tests to the workflow to verify the Training Operator examples, as in Katib? Could you provide some additional information on this? It would help me solve this issue.

We need to implement the following steps in the script:

  1. Build example and operator images
  2. Start KinD cluster
  3. Load built images into the cluster
  4. Set up the TrainingOperator
  5. Create a Job that uses the built images
  6. Verify that the created Job succeeded.
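
The six steps above could be sketched roughly as follows. This is only an illustration, not the script the project uses: every image name, path, cluster name, and Job name in `ExampleConfig` is a hypothetical placeholder, and the `Succeeded` condition passed to `kubectl wait` is an assumption about how the Job reports completion. The sketch only builds and prints the commands by default, so it can be reviewed before anything is executed.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    # All values are hypothetical placeholders; adjust them to the real repository layout.
    operator_image: str = "example.local/training-operator:e2e"
    example_image: str = "example.local/pytorch-mnist:e2e"
    operator_context: str = "."
    example_context: str = "examples/pytorch/mnist"
    cluster_name: str = "examples-e2e"
    operator_manifests: str = "manifests/overlays/standalone"
    job_manifest: str = "examples/pytorch/mnist/job.yaml"
    job_resource: str = "pytorchjob/pytorch-mnist"
    timeout: str = "600s"


def build_commands(cfg):
    """Return one argv list per command, in the order of the six steps."""
    return [
        # 1. Build example and operator images
        ["docker", "build", "-t", cfg.operator_image, cfg.operator_context],
        ["docker", "build", "-t", cfg.example_image, cfg.example_context],
        # 2. Start KinD cluster
        ["kind", "create", "cluster", "--name", cfg.cluster_name],
        # 3. Load built images into the cluster
        ["kind", "load", "docker-image", cfg.operator_image, cfg.example_image,
         "--name", cfg.cluster_name],
        # 4. Set up the operator from kustomize manifests (path is an assumption)
        ["kubectl", "apply", "-k", cfg.operator_manifests],
        # 5. Create a Job from the example manifest
        ["kubectl", "apply", "-f", cfg.job_manifest],
        # 6. Block until the Job reports the assumed "Succeeded" condition
        ["kubectl", "wait", "--for=condition=Succeeded", cfg.job_resource,
         f"--timeout={cfg.timeout}"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step, which fails the CI job.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(ExampleConfig()), dry_run=True)
```

A real workflow would likely also need to point the operator Deployment and the Job manifest at the freshly built images, and wait for the operator to become ready before step 5. Both are left out here to keep the sketch short.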
