LFX Mentorship (Jun-Aug, 2024): Create a search-enabled API server for local LLMs #3372
Comments
Hey juntao, I'm interested in this mentorship. When will the pretest and application details be announced? Thanks.
@suryyyansh Thank you so much for your interest. The application has not officially started. If you would like to get a head start, I would ask you to build the rag-api-server yourself and find something (code or docs) that you can improve upon.
@juntao I was able to build the rag-api-server, but there are no open issues to solve in the repository, and the code and docs seem good to me. Can you suggest some improvements I could make to prove my competency for the LFX Mentorship? Thanks.
Hey @juntao, so far I have implemented and tested the endpoints with three models and was able to run them without any errors.
I have shared samples of my setups, and the outputs I got, in the comments below.
RAG with a different LLM: Instead of using the default LLM mentioned in the docs, Llama-3-8B-Instruct-GGUF, I used Phi-3-mini-4k-instruct with All-MiniLM-L6-v2-Embedding-GGUF for embeddings. I got a Qdrant instance running and used the default.
RAG with a different LLM and a custom embedded knowledge base: In this setup, I changed the LLM to a smaller model, TinyLlama-1.1B-Chat-v1.0, and created a small custom knowledge-base embedding on the topic of Microsoft. Here are the outputs:
Perhaps you could create a video recording to show how it works? Thanks. @harsh-ps-2003
Hey @juntao, I had my PR merged on this issue, and on my local fork I have implemented functionality to assert that the qdrant service is currently active when running the server. I'm currently working on implementing an interface to work with different search APIs (You.com, Tavily, etc.), in a similar vein to the plugin system for wasi-nn. Is there anything else I should be doing? Please let me know.
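A check like the one described above can be as simple as verifying that something is listening on Qdrant's port before the server starts. The sketch below is illustrative only, not the code from the PR; the function name and error handling are assumptions, and it uses only the Rust standard library.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Return true if a TCP connection to the given address succeeds within
/// the timeout. (Hypothetical helper; the merged PR may check differently.)
fn qdrant_reachable(addr: &str) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, Duration::from_secs(2)).is_ok(),
        Err(_) => false,
    }
}

fn main() {
    // 6333 is Qdrant's default REST port.
    let addr = "127.0.0.1:6333";
    if !qdrant_reachable(addr) {
        eprintln!("qdrant does not appear to be running at {addr}; start it before launching the server");
        std::process::exit(1);
    }
}
```

A real implementation might instead hit a Qdrant HTTP endpoint to confirm the service (not just the port) is healthy, but a connect check is enough to fail fast with a clear error.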
Hey @juntao |
@juntao I've finished implementing the search-enabled rag-api-server; it's available here. The GitHub Actions don't pass because I didn't compile against clippy (it still compiles fine without warnings). Here is an example of the search results being fetched: And here, chat completion uses the search results as context: My implementation relies on a new trait called Query. New APIs can be introduced by implementing the search function provided by this trait. For demonstration purposes, the use of Tavily has been hardcoded, but this can be changed to be decided at runtime. Users can invoke the search functionality either:
If possible, please also review my PR to add a basic existence check for qdrant here. Although this implementation is quite rudimentary, I hope it's enough to prove my competence for this mentorship!
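The Query trait design described above could look something like the following minimal sketch. The trait name comes from the comment, but the method signature, the mock backend, and the helper function are assumptions for illustration, not the actual implementation.

```rust
/// A search backend. Each provider (Tavily, You.com, ...) implements this
/// trait; the signature here is an assumed, simplified version.
trait Query {
    fn search(&self, query: &str) -> Result<Vec<String>, String>;
}

/// A stub backend returning a canned result, standing in for a real
/// HTTP-based provider such as Tavily.
struct MockSearch;

impl Query for MockSearch {
    fn search(&self, query: &str) -> Result<Vec<String>, String> {
        Ok(vec![format!("result for: {query}")])
    }
}

/// Holding the backend behind a trait object lets the provider be chosen
/// at runtime instead of being hardcoded at compile time.
fn run_search(backend: &dyn Query, q: &str) -> Vec<String> {
    backend.search(q).unwrap_or_default()
}

fn main() {
    let results = run_search(&MockSearch, "WasmEdge");
    println!("{results:?}");
}
```

Because callers only depend on `&dyn Query`, swapping Tavily for another provider becomes a matter of constructing a different backend, e.g. from a CLI flag or environment variable.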
Summary
WasmEdge is a lightweight inference runtime for AI and LLM applications. The LlamaEdge project has developed an OpenAI-compatible API server and a server-side RAG app based on WasmEdge.
In this project, we aim to use the LlamaEdge components to build a new API server that incorporates real-time Internet search results into LLM answers.
Details
We will build a search-enabled API server that is similar to the rag-api-server but with the following unique features:
- Real-time search results are added to the LLM context via the system_prompt. It is the same way as vector search results are added to the context in the rag-api-server.
The project will be written in Rust and compiled to WebAssembly to run inside WasmEdge. Qualified mentees need to demonstrate that they can make changes to the rag-api-server, and then build and deploy the updated API server.
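Injecting search results via the system_prompt can be sketched as a small string-building step, mirroring how the rag-api-server prepends vector-search context. The function name and formatting below are illustrative assumptions, not the project's actual code.

```rust
/// Fold retrieved search snippets into the system prompt so the LLM can
/// use them as context. (Hypothetical helper; formatting is an assumption.)
fn build_system_prompt(base: &str, snippets: &[String]) -> String {
    if snippets.is_empty() {
        // No search results: leave the system prompt unchanged.
        return base.to_string();
    }
    let context = snippets.join("\n- ");
    format!("{base}\n\nUse the following search results as context:\n- {context}")
}

fn main() {
    let snippets = vec!["WasmEdge is a lightweight Wasm runtime.".to_string()];
    let prompt = build_system_prompt("You are a helpful assistant.", &snippets);
    println!("{prompt}");
}
```

The resulting string would then be sent as the system message of an otherwise unmodified OpenAI-compatible chat completion request.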
LFX
Expected outcome: An OpenAI-compatible local LLM API server that uses Google Search for supplemental context
Recommended skills:
Mentor:
Application link: https://mentorship.lfx.linuxfoundation.org/project/0a4e08a1-3404-46fc-b0d0-5117ec4ec119