microsoft / DeepSpeed-MII Public

Issues: microsoft/DeepSpeed-MII

167 Open 106 Closed

Issues list

Support LLava next stronger

#483 opened May 20, 2024 by thesby

How can I use the same prompt to produce the same text output as vllm

#482 opened May 19, 2024 by Greatpanc

Tf32 support

#481 opened May 17, 2024 by Chasapas

Can DeepSpeed-MII load quantized int4 or int8 models?

#479 opened May 16, 2024 by wangyongpenga

Does deepspeed-mii support prefix_allowed_tokens_fn?

#477 opened May 14, 2024 by zcakzhuu

[REQUEST] LLAMA-3 support

#475 opened May 13, 2024 by MRYingLEE

[REQUEST] Mixtral-8x22B support

#474 opened May 11, 2024 by y-live-koba

Cannot run Yi-34B-Chat => ValueError: Unsupported q_ratio: 7

#472 opened May 9, 2024 by joeking11829

BUG in run_batch_processing

#471 opened May 8, 2024 by zhihui96

ValueError: Unsupported model type phi3

#469 opened Apr 30, 2024 by abpani

error when using Qwen1.5-32B

#468 opened Apr 29, 2024 by puppet101

Performance with vllm

#467 opened Apr 26, 2024 by littletomatodonkey

Only running one replica even though setting many replicas

#465 opened Apr 24, 2024 by thesby

RuntimeError: The server socket has failed to listen on any local network address

#464 opened Apr 19, 2024 by thesby

[FEATURE] Access to logits and final hidden layer

#463 opened Apr 17, 2024 by lshamis

How is the prompt segmentation specifically implemented for Dynamic SplitFuse? Is there any code implementation or code snippet?

#462 opened Apr 17, 2024 by wenyangchou

How do I launch the API on a graphics card other than cuda:0

#460 opened Apr 15, 2024 by Stark-zheng

Is openai compatible server still working?

#459 opened Apr 15, 2024 by RobinQu

how can I use deepspeed to split the model to submit GPU?

#458 opened Apr 9, 2024 by WanBenLe

[FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII

#457 opened Apr 4, 2024 by freQuensy23-coder

Add support for DBRX

#455 opened Apr 2, 2024 by azaccor

Any plans for production-ready services?

#454 opened Apr 2, 2024 by SeungminHeo

Limit VRAM usage in serving the model

#453 opened Mar 31, 2024 by risedangel

inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii

#452 opened Mar 31, 2024 by Andronixs

How can I use this library with langchain or llama_index?

#450 opened Mar 31, 2024 by risedangel
