Issues: intel-analytics/ipex-llm

Improve First Token Latency for multi-GPU projects (by flash attention or alternative)
#10897 opened Apr 26, 2024 by moutainriver

RuntimeError: "fused_dropout" not implemented for 'Byte' when running TRL PPO fine-tuning
#10854 opened Apr 23, 2024 by Jasonzzt

GPU hang when switching between Llama2 and Llama3 on ARC770 [user issue]
#10852 opened Apr 23, 2024 by moutainriver

Running 2 x A770 with Ollama, inference responses slow down dramatically [user issue]
#10847 opened Apr 22, 2024 by digitalscream

[MTL][Qwen] model fails with RuntimeError: "normal_kernel_cpu" not implemented for 'Byte'
#10826 opened Apr 22, 2024 by juan-OY

Unable to run llama_cpp example from quickstart guide (PI_ERROR_BUILD_PROGRAM_FAILURE) [user issue]
#10819 opened Apr 20, 2024 by tristan-k

chatglm3-6b with fp8, 1k input, 512 output, and batch 64 fails in the all-in-one benchmark tool [user issue]
#10818 opened Apr 20, 2024 by Fred-cell

Flex 170 GPU: Ollama unable to detect GPU, and sycl-ls also not detecting it [user issue]
#10801 opened Apr 18, 2024 by shailesh837

IPEX-LLM older Ollama serve hangs after 5 minutes on Intel Arc GPU 770 [user issue]
#10800 opened Apr 18, 2024 by shailesh837

Model output is different when using default optimize_model [user issue]
#10782 opened Apr 17, 2024 by vishnumadhu365

deepspeed_optimize_model_gpu Qwen/Qwen-7B-Chat [user issue]
#10763 opened Apr 15, 2024 by kevin-t-tang