
LFX Mentorship (Jun-Aug, 2024): Create a search-enabled API server for local LLMs #3372

Open
juntao opened this issue Apr 28, 2024 · 11 comments
Labels
enhancement New feature or request LFX Mentorship Tasks for LFX Mentorship participants

Comments

@juntao
Member

juntao commented Apr 28, 2024

Summary

WasmEdge is a lightweight inference runtime for AI and LLM applications. The LlamaEdge project has developed an OpenAI-compatible API server and a server-side RAG app based on WasmEdge.

In this project, we aim to use the LlamaEdge components to build a new API server that incorporates real-time Internet search results into LLM answers.

Details

We will build a search-enabled API server, similar to the rag-api-server but with the following unique features (a sketch of the flow follows the list):

  • It uses the user's question as the query for a Google search.
  • The top search results are then added to the LLM conversation context as the system_prompt, the same way the rag-api-server adds vector search results to the context.
  • The server then generates and returns its response.
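
A minimal sketch of this flow in Rust (all names here are illustrative helpers, not existing LlamaEdge APIs):

```rust
// Sketch of the search-then-answer flow; `search_google` and
// `generate` are hypothetical helpers, not LlamaEdge APIs.

async fn search_google(query: &str, top_n: usize) -> Vec<String> {
    let _ = (query, top_n);
    unimplemented!("call a search API and return the top_n result snippets")
}

async fn generate(system_prompt: &str, user_message: &str) -> String {
    let _ = (system_prompt, user_message);
    unimplemented!("run chat completion against the local LLM")
}

async fn answer_with_search(question: &str) -> String {
    // 1. Use the user question as the search query.
    let results = search_google(question, 5).await;

    // 2. Add the top results to the conversation context as the
    //    system prompt, mirroring how rag-api-server injects
    //    vector search results.
    let system_prompt = format!(
        "Answer using this context:\n{}",
        results.join("\n---\n")
    );

    // 3. Generate and return the server response.
    generate(&system_prompt, question).await
}
```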

The project will be written in Rust and compiled to WebAssembly to run inside WasmEdge. Qualified mentees need to demonstrate that they can make changes to the rag-api-server, and then build and deploy the updated API server.

LFX

Expected outcome: An OpenAI-compatible local LLM API server that uses Google Search for supplemental context

Recommended skills:

  • Rust language
  • LlamaEdge

Mentor:

Application link: https://mentorship.lfx.linuxfoundation.org/project/0a4e08a1-3404-46fc-b0d0-5117ec4ec119

@juntao juntao added the enhancement New feature or request label Apr 28, 2024
@juntao juntao changed the title feat: Create an LLM agent for Rust code QA feat: Create a search-enabled API server for local LLMs Apr 28, 2024
@juntao juntao added the LFX Mentorship Tasks for LFX Mentorship participants label Apr 28, 2024
@suryyyansh

suryyyansh commented Apr 29, 2024

Hey @juntao, I'm interested in this mentorship. When will the pre-test and application details be announced? Thanks.

@juntao
Member Author

juntao commented Apr 30, 2024

@suryyyansh Thank you so much for your interest. The application period has not officially started. If you would like to get a head start, I would suggest building the rag-api-server yourself and finding something (code or docs) that you can improve upon.
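
For reference, building it looks roughly like this (the repository URL and steps are assumed from the standard LlamaEdge workflow; see the rag-api-server README for the authoritative commands):

```bash
# Assumed steps; check the rag-api-server README for exact commands.
git clone https://github.com/LlamaEdge/rag-api-server.git
cd rag-api-server
cargo build --target wasm32-wasi --release
# The resulting target/wasm32-wasi/release/rag-api-server.wasm
# binary runs inside WasmEdge.
```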

@hydai hydai changed the title feat: Create a search-enabled API server for local LLMs LFX Mentorship (Jun-Aug, 2024): Create a search-enabled API server for local LLMs May 2, 2024
@harsh-ps-2003

harsh-ps-2003 commented May 6, 2024

@juntao I was able to build the rag-api-server, but there are no open issues to solve in the repository, and the code and docs look good to me! Can you suggest some improvements I could make to demonstrate my competency for the LFX Mentorship? Thanks

@angad-singhh

Hey @juntao,
While exploring the project, I was able to set up and run the rag-api-server and its endpoints, run the Web-UI chat application by LlamaEdge, create a custom knowledge base embedding, and set up the Qdrant vector database.

So far I have implemented and tested the endpoints with three models, and was able to run them without any errors.

  • When I first started exploring the WasmEdge repo, the LFX projects, the flows repo, and the LlamaEdge repo, I ran into some trouble, but after setting up multiple sample projects it became comfortable.

I have shared samples of my setups and the outputs I got in the comments below.
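
(For anyone following along: a local Qdrant instance can be started with the official Docker image; the port below is Qdrant's standard REST port, which I believe the rag-api-server defaults assume.)

```bash
# Start a local Qdrant instance; the REST API listens on port 6333.
docker run -p 6333:6333 qdrant/qdrant
```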

@angad-singhh

RAG with a different LLM:

Instead of the default LLM model mentioned in the docs, Llama-3-8B-Instruct-GGUF, I used Phi-3-mini-4k-instruct with All-MiniLM-L6-v2-Embedding-GGUF for embeddings. I got a Qdrant instance running and used the default paris.txt as the knowledge base for the vector database.

  • Retrieving the context from the knowledge base based on the prompt (screenshot)
  • The RAG context is then merged into the system message (screenshot)
  • Finally, the output is generated from the RAG context plus the user prompt (screenshot)

@angad-singhh

RAG with a different LLM and a custom embedded knowledge base

Here I changed the LLM to a smaller model, TinyLlama-1.1B-Chat-v1.0, and created a small custom knowledge base embedding on the topic of Microsoft (microsoft.txt).

Here are the outputs:

  • Custom knowledge base for embedding (screenshot)
  • Retrieving the context from the knowledge base based on the prompt (screenshot)
  • The RAG context is then merged into the system message (screenshot)
  • Finally, the output is generated from the RAG context plus the user prompt (screenshot)

@juntao
Member Author

juntao commented May 20, 2024

> @juntao I was able to build the rag-api-server, but there are no open issues to solve in the repository, and the code and docs look good to me! Can you suggest some improvements I could make to demonstrate my competency for the LFX Mentorship? Thanks

Perhaps you could create a video recording to show how it works? Thanks. @harsh-ps-2003

@suryyyansh

Hey @juntao, I had my PR related to this issue merged, and on my local fork I have implemented functionality to assert that the Qdrant service is active when running the server.

I'm currently working on an interface for working with different search APIs (You API, Tavily, etc.), in a similar vein to the plugin system for wasi-nn.

Is there anything else I should be doing? Please let me know.

@DhruvSinghiitmandi

DhruvSinghiitmandi commented May 20, 2024

Hey @juntao
I am Dhruv Singh, pursuing my bachelor's in Computer Science at IIT Mandi. I have some experience with LLMs (fine-tuning) and with building backends for RAG-based applications (a multimodal RAG chatbot using LLaVA and a Llama2 model fine-tuned on a custom dataset). I'm interested in this project; I have built the code successfully and run it locally. Initially I faced some issues getting the server to run because the Qdrant service was not running (starting it was not mentioned in the README, as I had built the code from an old fork of mine). I have already begun integrating the server with the Tavily API and have made a separate endpoint for retrieval through the web. I'm currently facing some issues in the build process, but I'm actively working on them. Is there anything more I should be doing now?

@suryyyansh

suryyyansh commented May 22, 2024

I've managed to implement a search retrieval endpoint that is API-agnostic: (screenshot)

I will now move on to implementing search as a system prompt in chat.

@suryyyansh

suryyyansh commented May 22, 2024

@juntao I've finished implementing the search-enabled rag-api-server; it's available here. The GitHub Actions checks don't pass because I didn't compile against clippy (it still compiles without warnings with cargo build --target wasm32-wasi).

Here is an example of the search results being fetched: (screenshot)

And here, chat completion uses the search results as context: (screenshot)

My implementation relies on a new trait called Query. New APIs can be introduced by implementing the search function provided by this trait; a sketch of the shape is below. For demonstration purposes, the use of Tavily has been hardcoded, but this could be changed to a runtime decision.
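
An illustrative sketch of what the trait could look like (the exact types and signature in my fork may differ):

```rust
// Illustrative sketch; the actual trait in the fork may differ.
pub struct SearchResult {
    pub title: String,
    pub url: String,
    pub snippet: String,
}

pub trait Query {
    /// Run the query against a search backend and return the top results.
    async fn search(&self, query: &str) -> Result<Vec<SearchResult>, String>;
}

// Each search API gets its own implementation, e.g. a Tavily backend:
pub struct Tavily {
    pub api_key: String,
}

impl Query for Tavily {
    async fn search(&self, query: &str) -> Result<Vec<SearchResult>, String> {
        // An HTTP POST to the Tavily search endpoint would go here.
        let _ = (query, &self.api_key);
        unimplemented!("call the Tavily search API")
    }
}
```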

Users can invoke the search functionality either:

  1. manually, by prepending their message with "[SEARCH]" (example below), or
  2. dynamically, by asking a question with no suitable qdrant embeddings.
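
For the manual trigger, the request is a normal OpenAI-style chat completion with the prefix added; a hypothetical example (the port and model name are assumptions):

```bash
# Hypothetical request; port and model name are assumptions.
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "default",
        "messages": [
          {"role": "user", "content": "[SEARCH] latest WasmEdge release"}
        ]
      }'
```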

If possible, please also review my PR adding a basic existence check for Qdrant here.

Although this implementation is quite rudimentary, I hope it's enough to prove my competence for this mentorship!
