
[k8s] Ambiguity when GPU labels overlap with an existing accelerator #3562

Open
romilbhardwaj opened this issue May 17, 2024 · 0 comments
Labels
k8s Kubernetes related items


@romilbhardwaj
Collaborator

A user reported the following on a cluster whose nodes had been manually labeled with the GPU name NVIDIA-RTX-A6000:

SKYPILOT_DEBUG=1 sky launch --cloud kubernetes --gpus a6000:1 ./mistral.yaml
D 05-17 10:20:50 skypilot_config.py:136] Using config path: /home/amgmt/.sky/config.yaml
D 05-17 10:20:50 skypilot_config.py:140] Config loaded:
D 05-17 10:20:50 skypilot_config.py:140] {'kubernetes': {'ports': 'loadbalancer'}}
D 05-17 10:20:50 skypilot_config.py:150] Config syntax check passed.
Task from YAML spec: ./mistral.yaml
Traceback (most recent call last):
  File "/usr/local/bin/sky", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/sky/utils/common_utils.py", line 350, in _record
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sky/cli.py", line 1198, in invoke
    return super().invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sky/utils/common_utils.py", line 371, in _record
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sky/cli.py", line 1421, in launch
    task_or_dag = _make_task_or_dag_from_entrypoint_with_overrides(
  File "/usr/local/lib/python3.8/dist-packages/sky/cli.py", line 1177, in _make_task_or_dag_from_entrypoint_with_overrides
    task.set_resources_override(override_params)
  File "/usr/local/lib/python3.8/dist-packages/sky/task.py", line 626, in set_resources_override
    new_resources = res.copy(**override_params)
  File "/usr/local/lib/python3.8/dist-packages/sky/resources.py", line 1053, in copy
    resources = Resources(
  File "/usr/local/lib/python3.8/dist-packages/sky/resources.py", line 201, in __init__
    self._set_accelerators(accelerators, accelerator_args)
  File "/usr/local/lib/python3.8/dist-packages/sky/resources.py", line 506, in _set_accelerators
    accelerators = {
  File "/usr/local/lib/python3.8/dist-packages/sky/resources.py", line 507, in <dictcomp>
    accelerator_registry.canonicalize_accelerator_name(
  File "/usr/local/lib/python3.8/dist-packages/sky/utils/accelerator_registry.py", line 117, in canonicalize_accelerator_name
    raise ValueError(f'Accelerator name {accelerator!r} is ambiguous. '
ValueError: Accelerator name 'a6000' is ambiguous. Please choose one of ['A6000', 'NVIDIA-RTX-A6000'].
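The error suggests that the requested name is matched case-insensitively against every known accelerator name, including the name that came from the cluster's node labels, and that `a6000` matches more than one of them. The snippet below is a simplified, hypothetical model of that resolution step; it is not SkyPilot's `canonicalize_accelerator_name`. It assumes substring matching and a hard-coded candidate list, and shows how a manual label such as `NVIDIA-RTX-A6000` collides with the catalog name `A6000`.

```python
from typing import List


def resolve_accelerator(requested: str, known_names: List[str]) -> str:
    """Hypothetical model of accelerator-name canonicalization.

    Assumption: a requested name matches every known name that contains it
    as a case-insensitive substring. This is a simplification for
    illustration, not SkyPilot's actual implementation.
    """
    needle = requested.lower()
    matches = [name for name in known_names if needle in name.lower()]
    if not matches:
        # Unknown names are passed through unchanged.
        return requested
    if len(matches) == 1:
        return matches[0]
    # More than one candidate: refuse to guess, as in the reported error.
    raise ValueError(f'Accelerator name {requested!r} is ambiguous. '
                     f'Please choose one of {matches}.')


# Hypothetical inputs: catalog names plus one name taken from node labels.
known = ['A100', 'A6000', 'NVIDIA-RTX-A6000']

print(resolve_accelerator('a100', known))  # unambiguous
try:
    resolve_accelerator('a6000', known)
except ValueError as e:
    print(e)
```

In this model, `a100` resolves cleanly, while `a6000` reproduces the same message as the traceback because both `A6000` and `NVIDIA-RTX-A6000` contain it.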

We should probably detect this case and, when nodes have been labeled manually, suggest that users label them with canonical accelerator names (such as A6000) instead.
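One possible shape for that detection, sketched under assumptions: the node GPU label values and the canonical accelerator names are already available as plain strings (how SkyPilot would collect them is not shown), and "overlap" means a canonical name appears inside a label, ignoring case. The helpers `find_overlapping_labels` and `labeling_hint` are hypothetical names, not existing SkyPilot functions.

```python
from typing import Dict, Iterable, List


def find_overlapping_labels(label_values: Iterable[str],
                            canonical_names: Iterable[str]
                            ) -> Dict[str, List[str]]:
    """Map each non-canonical GPU label to the canonical names it contains.

    Hypothetical helper: 'overlap' means a canonical name appears as a
    case-insensitive substring of the label, e.g. 'NVIDIA-RTX-A6000'
    contains 'A6000'.
    """
    canonical = list(canonical_names)
    canonical_lower = {name.lower() for name in canonical}
    overlaps: Dict[str, List[str]] = {}
    for label in label_values:
        if label.lower() in canonical_lower:
            continue  # Already a canonical name; nothing to report.
        hits = [name for name in canonical if name.lower() in label.lower()]
        if hits:
            overlaps[label] = hits
    return overlaps


def labeling_hint(overlaps: Dict[str, List[str]]) -> str:
    """Build a user-facing hint for each overlapping label (hypothetical)."""
    lines = []
    for label, names in overlaps.items():
        lines.append(f'GPU label {label!r} overlaps with accelerator name(s) '
                     f'{names}. If you labeled nodes manually, consider '
                     f'using the canonical name instead.')
    return '\n'.join(lines)


# Hypothetical cluster labels and canonical names.
labels = ['NVIDIA-RTX-A6000', 'A100']
overlaps = find_overlapping_labels(labels, ['A100', 'A6000', 'H100'])
print(labeling_hint(overlaps))
```

Here only `NVIDIA-RTX-A6000` is reported, since `A100` is already canonical. Substring matching is a deliberately loose heuristic; a real check might need a curated alias table to avoid false positives.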

@romilbhardwaj romilbhardwaj added the k8s Kubernetes related items label May 17, 2024