
LFX Mentorship (Jun-Aug, 2024): Create a search-enabled API server for local LLMs #3372

Open
juntao opened this issue Apr 28, 2024 · 11 comments
Labels
enhancement New feature or request LFX Mentorship Tasks for LFX Mentorship participants

Comments

@juntao
Member

juntao commented Apr 28, 2024

Summary

WasmEdge is a lightweight inference runtime for AI and LLM applications. The LlamaEdge project has developed an OpenAI-compatible API server and a server-side RAG app based on WasmEdge.

In this project, we aim to use the LlamaEdge components to build a new API server that incorporates real-time Internet search results into LLM answers.

Details

We will build a search-enabled API server, similar to the rag-api-server but with the following unique features (a sketch of the flow follows the list):

  • It uses the user's question as the query for a Google search.
  • The top search results are then added to the LLM conversation context as the system_prompt, the same way the rag-api-server adds vector search results to the context.
  • The server then generates and returns its response.
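
A minimal sketch of this flow in Rust (all names here are illustrative helpers, not existing LlamaEdge APIs):

```rust
// Sketch of the search-then-answer flow; `search_google` and
// `generate` are hypothetical helpers, not LlamaEdge APIs.

async fn search_google(query: &str, top_n: usize) -> Vec<String> {
    let _ = (query, top_n);
    unimplemented!("call a search API and return the top_n result snippets")
}

async fn generate(system_prompt: &str, user_message: &str) -> String {
    let _ = (system_prompt, user_message);
    unimplemented!("run chat completion against the local LLM")
}

async fn answer_with_search(question: &str) -> String {
    // 1. Use the user question as the search query.
    let results = search_google(question, 5).await;

    // 2. Add the top results to the conversation context as the
    //    system prompt, mirroring how rag-api-server injects
    //    vector search results.
    let system_prompt = format!(
        "Answer using this context:\n{}",
        results.join("\n---\n")
    );

    // 3. Generate and return the server response.
    generate(&system_prompt, question).await
}
```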

The project will be written in Rust and compiled to WebAssembly to run inside WasmEdge. Qualified mentees need to demonstrate that they can make changes to the rag-api-server, and then build and deploy the updated API server.

LFX

Expected outcome: An OpenAI-compatible local LLM API server that uses Google Search for supplemental context

Recommended skills:

  • Rust language
  • LlamaEdge

Mentor:

Application link: https://mentorship.lfx.linuxfoundation.org/project/0a4e08a1-3404-46fc-b0d0-5117ec4ec119

@juntao juntao added the enhancement New feature or request label Apr 28, 2024
@juntao juntao changed the title feat: Create an LLM agent for Rust code QA feat: Create a search-enabled API server for local LLMs Apr 28, 2024
@juntao juntao added the LFX Mentorship Tasks for LFX Mentorship participants label Apr 28, 2024
@suryyyansh

suryyyansh commented Apr 29, 2024

Hey @juntao, I'm interested in this mentorship. When will the pre-test and application details be announced? Thanks.

@juntao
Member Author

juntao commented Apr 30, 2024

@suryyyansh Thank you so much for your interest. The application period has not officially started. If you would like to get a head start, I would suggest building the rag-api-server yourself and finding something (code or docs) that you can improve upon.
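
For reference, building it looks roughly like this (the repository URL and steps are assumed from the standard LlamaEdge workflow; see the rag-api-server README for the authoritative commands):

```bash
# Assumed steps; check the rag-api-server README for exact commands.
git clone https://github.com/LlamaEdge/rag-api-server.git
cd rag-api-server
cargo build --target wasm32-wasi --release
# The resulting target/wasm32-wasi/release/rag-api-server.wasm
# binary runs inside WasmEdge.
```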

@hydai hydai changed the title feat: Create a search-enabled API server for local LLMs LFX Mentorship (Jun-Aug, 2024): Create a search-enabled API server for local LLMs May 2, 2024
@harsh-ps-2003

harsh-ps-2003 commented May 6, 2024

@juntao I was able to build the rag-api-server, but there are no open issues to solve in the repository, and the code and docs look good to me! Can you suggest some improvements I could make to demonstrate my competency for the LFX Mentorship? Thanks

@angad-singhh

Hey @juntao,
While exploring the project, I was able to set up and run the rag-api-server and its endpoints, run the Web-UI chat application by LlamaEdge, create a custom knowledge base embedding, and set up the Qdrant vector database.

So far I have implemented and tested the endpoints with three models, and was able to run them without any errors.

  • When I first started exploring the WasmEdge repo, the LFX projects, the flows repo, and the LlamaEdge repo, I ran into some trouble, but after setting up multiple sample projects it became comfortable.

I have shared samples of my setups and the outputs I got in the comments below.
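
(For anyone following along: a local Qdrant instance can be started with the official Docker image; the port below is Qdrant's standard REST port, which I believe the rag-api-server defaults assume.)

```bash
# Start a local Qdrant instance; the REST API listens on port 6333.
docker run -p 6333:6333 qdrant/qdrant
```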

@angad-singhh

RAG with a different LLM:

Instead of the default LLM model mentioned in the docs, Llama-3-8B-Instruct-GGUF, I used Phi-3-mini-4k-instruct with All-MiniLM-L6-v2-Embedding-GGUF for embeddings. I got a Qdrant instance running and used the default paris.txt as the knowledge base for the vector database.

  • Retrieving the context from the knowledge base based on the prompt (screenshot)
  • The RAG context is then merged into the system message (screenshot)
  • Finally, the output is generated from the RAG context plus the user prompt (screenshot)

@angad-singhh

RAG with a different LLM and a custom embedded knowledge base

Here I changed the LLM to a smaller model, TinyLlama-1.1B-Chat-v1.0, and created a small custom knowledge base embedding on the topic of Microsoft (microsoft.txt).

Here are the outputs:

  • Custom knowledge base for embedding (screenshot)
  • Retrieving the context from the knowledge base based on the prompt (screenshot)
  • The RAG context is then merged into the system message (screenshot)
  • Finally, the output is generated from the RAG context plus the user prompt (screenshot)

@juntao
Member Author

juntao commented May 20, 2024

> @juntao I was able to build the rag-api-server, but there are no open issues to solve in the repository, and the code and docs look good to me! Can you suggest some improvements I could make to demonstrate my competency for the LFX Mentorship? Thanks

Perhaps you could create a video recording to show how it works? Thanks. @harsh-ps-2003

@suryyyansh

Hey @juntao, I had my PR related to this issue merged, and on my local fork I have implemented functionality to assert that the Qdrant service is active when running the server.

I'm currently working on an interface for working with different search APIs (You API, Tavily, etc.), in a similar vein to the plugin system for wasi-nn.

Is there anything else I should be doing? Please let me know.

@DhruvSinghiitmandi

DhruvSinghiitmandi commented May 20, 2024

Hey @juntao
I am Dhruv Singh, pursuing my bachelor's in Computer Science at IIT Mandi. I have some experience with LLMs (fine-tuning) and with building backends for RAG-based applications (a multimodal RAG chatbot using LLaVA and a Llama2 model fine-tuned on a custom dataset). I'm interested in this project; I have built the code successfully and run it locally. Initially I faced some issues getting the server to run because the Qdrant service was not running (starting it was not mentioned in the README, as I had built the code from an old fork of mine). I have already begun integrating the server with the Tavily API and have made a separate endpoint for retrieval through the web. I'm currently facing some issues in the build process, but I'm actively working on them. Is there anything more I should be doing now?

@suryyyansh

suryyyansh commented May 22, 2024

I've managed to implement a search retrieval endpoint that is API-agnostic: (screenshot)

I will now move on to implementing search as a system prompt in chat.

@suryyyansh

suryyyansh commented May 22, 2024

@juntao I've finished implementing the search-enabled rag-api-server; it's available here. The GitHub Actions checks don't pass because I didn't compile against clippy (it still compiles without warnings with cargo build --target wasm32-wasi).

Here is an example of the search results being fetched: (screenshot)

And here, chat completion uses the search results as context: (screenshot)

My implementation relies on a new trait called Query. New APIs can be introduced by implementing the search function provided by this trait; a sketch of the shape is below. For demonstration purposes, the use of Tavily has been hardcoded, but this could be changed to a runtime decision.
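
An illustrative sketch of what the trait could look like (the exact types and signature in my fork may differ):

```rust
// Illustrative sketch; the actual trait in the fork may differ.
pub struct SearchResult {
    pub title: String,
    pub url: String,
    pub snippet: String,
}

pub trait Query {
    /// Run the query against a search backend and return the top results.
    async fn search(&self, query: &str) -> Result<Vec<SearchResult>, String>;
}

// Each search API gets its own implementation, e.g. a Tavily backend:
pub struct Tavily {
    pub api_key: String,
}

impl Query for Tavily {
    async fn search(&self, query: &str) -> Result<Vec<SearchResult>, String> {
        // An HTTP POST to the Tavily search endpoint would go here.
        let _ = (query, &self.api_key);
        unimplemented!("call the Tavily search API")
    }
}
```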

Users can invoke the search functionality either:

  1. manually, by prepending their message with "[SEARCH]" (example below), or
  2. dynamically, by asking a question with no suitable qdrant embeddings.
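
For the manual trigger, the request is a normal OpenAI-style chat completion with the prefix added; a hypothetical example (the port and model name are assumptions):

```bash
# Hypothetical request; port and model name are assumptions.
curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "default",
        "messages": [
          {"role": "user", "content": "[SEARCH] latest WasmEdge release"}
        ]
      }'
```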

If possible, please also review my PR adding a basic existence check for Qdrant here.

Although this implementation is quite rudimentary, I hope it's enough to prove my competence for this mentorship!
