
Are there any specific optimizations in KServe to support LLM inference? #3623

Open
Jeffwan opened this issue Apr 22, 2024 · 2 comments


Jeffwan commented Apr 22, 2024

Hi community,

I am wondering whether there are any specific optimizations in KServe to support LLM applications. Is there a feature list?

Author

Jeffwan commented Apr 25, 2024

/kind question

Member

yuzisun commented May 4, 2024

Hi @Jeffwan! KServe's position is still LLM inference orchestration and serverless autoscaling, so we are focusing on supporting the OpenAI protocol by integrating optimized LLM serving runtimes like vLLM, TensorRT-LLM, and TGI, improving LLM container cold-start time, and enabling autoscaling based on custom metrics such as the number of input/output tokens. We will update our roadmap with a list of the features we have planned.
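
To illustrate what the OpenAI protocol support looks like from the client side, here is a minimal sketch of querying a vLLM-backed InferenceService through an OpenAI-compatible endpoint. The base URL, route, and model name are hypothetical placeholders, not values confirmed in this thread.

```python
# Minimal sketch: calling a vLLM-backed KServe InferenceService via its
# OpenAI-compatible chat completions API. The base_url, model name, and
# api_key below are hypothetical placeholders -- adjust to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://llama-demo.default.example.com/openai/v1",  # assumed route
    api_key="not-needed",  # the endpoint may not require a real key
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical model served by vLLM
    messages=[{"role": "user", "content": "Summarize what KServe does."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```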
