
Add group_size 7 and fix compat with Yi 1.5 34b #246

Closed
wants to merge 3 commits into from

Conversation

Qubitium
Contributor

@Qubitium Qubitium commented May 14, 2024

Allow users to compile for group_size 7 and have it compatible with Yi 1.0/1.5 34B models.

Fix #181

I did not modify the CI script to include group_size 7 by default, since it would make the already long compile time even longer. Users who want Yi-34B compat can either compile only group_size 7 via the env var FLASHINFER_GROUP_SIZES (fastest, maybe 10-15 minutes) or add 7 to the existing 1,4,6,8 for full compat with other models (a very slow compile, measured in hours).
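For context, the group size here is the GQA ratio of query heads to KV heads that the kernels are templated on. A minimal sketch of where the 7 comes from (the Yi-34B head counts used below are taken from its published config and should be treated as assumptions):

```python
def group_size(num_qo_heads: int, num_kv_heads: int) -> int:
    # FlashInfer compiles a kernel per GQA group size, i.e. how many
    # query heads share one KV head.
    assert num_qo_heads % num_kv_heads == 0, "heads must divide evenly"
    return num_qo_heads // num_kv_heads

# Yi-34B reportedly uses 56 query heads and 8 KV heads (assumed values),
# which yields the group size 7 this PR adds.
print(group_size(56, 8))  # 7
```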

Yi 1.5 is a great model, and this will help engines (e.g. sglang) that use flashinfer to deploy it.

Tests

  • PASS Load and run with Sglang + Yi 1.5 34B
  • FAILED stability test with temp=0.7. Output contains nan/inf.

@xuzhenqi
Contributor

@Qubitium Did you test the correctness of the Yi model? I tried this method before and found that the BatchPrefill kernel does not return correct outputs. I also fixed the BatchPrefill kernel in #223.

@Qubitium
Contributor Author

Qubitium commented May 15, 2024

@Qubitium Did you test the correctness of the Yi model? I tried this method before and found that the BatchPrefill kernel does not return correct outputs. I also fixed the BatchPrefill kernel in #223.

@xuzhenqi Going to do some expanded human eval using this PR later today and will let you know the results. Btw, #223 looks great and appears to be a more generic, mid-term solution to this group-size issue until dynamic group sizes are implemented.

@Qubitium
Contributor Author

@xuzhenqi You are right. There is output instability when we use temperature=0.7 coupled with sglang:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model.py", line 187, in exposed_step
    self.forward_step()
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model.py", line 202, in forward_step
    self.forward_fill_batch(new_batch)
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model.py", line 424, in forward_fill_batch
    next_token_ids, _ = batch.sample(logits)
                        ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/infer_batch.py", line 552, in sample
    sampled_index = torch.multinomial(probs_sort, num_samples=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Now we will compile and test your #223 PR instead.

Successfully merging this pull request may close these issues.

[BUG] model Yi-34B compat