AnythingLLM lets you embed documents and web URLs: it splits them into chunks and stores the chunks in a vector database.
AnythingLLM then uses the vector database to find the chunks with the highest semantic similarity to your query and adds them as context to the prompt that is sent to the LLM running in Jan. You can 'pin' a particular document to paste it into the context in its entirety. How well this pinning works depends on how well the model you use handles large contexts.
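Conceptually, the retrieval step works roughly like the sketch below. This is not AnythingLLM's actual code, only an illustration of the principle; the function names, the top-k value and the prompt template are made up for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    # Higher means the two embeddings are semantically closer.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_embedding, chunk_embeddings, chunks, top_k=4):
    # Score every stored chunk against the query and keep the best matches.
    scored = sorted(
        zip(chunks, chunk_embeddings),
        key=lambda pair: cosine_similarity(query_embedding, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:top_k]]

def build_prompt(question, context_chunks):
    # The retrieved chunks become the context of the prompt sent to the LLM in Jan.
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```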
Tip
Mistral 7B instruct v0.2 can handle contexts up to 32k tokens without having a 'lost-in-the-middle' problem. This makes it a good candidate when you want to run a model locally for a RAG use-case. Another good option is a large context flavour of Llama 3. There are 32k, 64k, and even 262k context versions. Make sure you have enough (V)RAM though!
On top of threads, AnythingLLM adds the concept of workspaces. In each workspace you can embed a set of documents that belong together, so you can keep separate workspaces for asking questions about different topics.
This extends Jan to enable more advanced RAG applications; Jan itself can currently only attach one document to a thread at a time.
Setup
Jan - local API
Make sure a model is installed in Jan and that chatting with it works.
Click <> in the bottom left corner.
You can use the default settings on the left, which will expose the local Jan API at http://127.0.0.1:1337.
On the right pick your model of choice.
Click the big blue "Start Server" button.
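Once the server is running, you can optionally verify that it responds before configuring AnythingLLM. A minimal sketch, assuming the default address and that Jan exposes the usual OpenAI-compatible model-listing endpoint:

```python
import requests

# Default local Jan API address; adjust if you changed the server settings.
JAN_URL = "http://127.0.0.1:1337/v1"

# List the models the server can serve (standard OpenAI-compatible endpoint).
response = requests.get(f"{JAN_URL}/models")
response.raise_for_status()

for model in response.json().get("data", []):
    print(model.get("id"))
```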
AnythingLLM
Go to settings (the spanner icon in the bottom left) > LLM Preference, or do the below during the initial setup wizard when asked to pick an LLM.
Pick 'Generic OpenAI' from the providers. Jan's API is OpenAI compliant.
In the URL field enter http://<IP>:<port>/v1; if you used the defaults, that would be http://127.0.0.1:1337/v1.
Skip the API key.
Then enter the model ID of the model you have started. You can find this by going back to Jan, clicking the ⋮ icon next to "model settings" > show files in explorer > open model.json in a text editor > look for the value of "id". This looks something like llama-3-8b-instruct-32k-q8.
Enter the token context window matching the 'Context Length' set in Jan.
Save the changes in AnythingLLM with the button at the top right and exit the settings.
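If AnythingLLM cannot reach the model, it helps to test the same settings outside of it first. A rough sketch that sends one chat completion straight to Jan's OpenAI-compatible endpoint; the model ID below is the example from above, so substitute the "id" you found in model.json:

```python
import requests

payload = {
    # Replace with the "id" value from your model.json; this is just an example.
    "model": "llama-3-8b-instruct-32k-q8",
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
}

# Same base URL and path that AnythingLLM uses as the Generic OpenAI endpoint.
response = requests.post("http://127.0.0.1:1337/v1/chat/completions", json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```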
Setting up a workspace
Create a new workspace and give it a nice name.
Click the cogwheel (⚙️) to adjust the workspace preferences.
Go to the 'Chat Settings' tab and set your preferred temperature.
For the prompt you can leave it as-is or design your own. What works best depends on the model and your use-case.
Your first RAG
Click the upload icon ↥ next to the cogwheel (⚙️).
Here you can upload files or even fetch a website.
This is the principle. You can do the same with text files, audio, CSVs, spreadsheets, and so on.