AssertionError: zero stage 1 requires an optimizer #987

Open
yonglianglan opened this issue Jul 4, 2023 · 3 comments
Labels: bug, good first issue, help wanted

Comments

@yonglianglan

An error occurred when running the evaluation code. The command was:
python ./deepy.py evaluate.py xxxx.yml --eval_tasks piqa

[screenshot: traceback ending in AssertionError: zero stage 1 requires an optimizer]

Training was run in multi-machine mode, while evaluation is run in single-machine mode.

Has anyone had a similar issue? Thanks!

@yonglianglan added the bug label on Jul 4, 2023
@StellaAthena
Member

This is a known issue that is awkward to handle. Our current recommendation is to set ZeRO stage 0 when calling the evaluation script. We are working on integrating DeepSpeed Inference which will solve this issue and substantially accelerate inference tasks as well.
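For reference, a minimal override in the evaluation .yml might look like the following (a sketch; the exact surrounding keys depend on your config):

  "zero_optimization": {
    "stage": 0
  },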

@StellaAthena added the good first issue and help wanted labels on Jul 31, 2023
@vsabavat

vsabavat commented Nov 14, 2023

Is this bug resolved? How do we pass or set the ZeRO stage? I also see the same error during inference.

python ./deepy.py generate.py -d configs 125M local_setup text_generation

  File "generate.py", line 91, in <module>
    main()
  File "generate.py", line 33, in main
    model, neox_args = setup_for_inference_or_eval(use_cache=True)
  File "/localhome/local-vsabavat/ai/training/gpt-neox/megatron/utils.py", line 448, in setup_for_inference_or_eval
    model, _, _ = setup_model_and_optimizer(
  File "/localhome/local-vsabavat/ai/training/gpt-neox/megatron/training.py", line 647, in setup_model_and_optimizer
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/__init__.py", line 186, in initialize
    engine = PipelineEngine(args=args,
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/pipe/engine.py", line 68, in __init__
    super().__init__(*super_args, **super_kwargs)
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
    self.optimizer = self._configure_zero_optimizer(optimizer=None)
  File "/localhome/local-vsabavat/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1468, in _configure_zero_optimizer
    assert not isinstance(optimizer, DummyOptim), "zero stage {} requires an optimizer".format(zero_stage)
AssertionError: zero stage 1 requires an optimizer
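For context, the assertion fires because the inference/eval path builds the model without an optimizer, while ZeRO stages >= 1 need a real optimizer whose state they can partition. A minimal sketch of the failure mode, assuming a deepspeed-launched process and a DeepSpeed version like the one in the traceback above (not the actual gpt-neox code):

import torch
import deepspeed

model = torch.nn.Linear(8, 8)

# ZeRO stage >= 1 partitions optimizer state across ranks, so DeepSpeed
# refuses to build the ZeRO wrapper when no optimizer is supplied.
engine, *_ = deepspeed.initialize(
    model=model,
    optimizer=None,  # the inference/eval path passes no optimizer
    config={
        "train_batch_size": 1,
        "zero_optimization": {"stage": 1},
    },
)
# -> AssertionError: zero stage 1 requires an optimizer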

@AIproj
Contributor

AIproj commented Nov 27, 2023

@vsabavat In one of your .yml config files you should have something that looks like this:

  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 1260000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1260000000,
    "contiguous_gradients": true,
    "cpu_offload": false
  },

In my example, the stage is set to 1; changing that value changes the ZeRO stage (e.g. "stage": 0 for evaluation, as recommended above).
