-
Ollama is new but a very powerful and simple way to run open-source LLMs on your own Mac with Metal support (they plan to support other OSes next). It's a Go program exposing a simple API to interact with different local LLM models; here is the documentation:

I want to create a simple frontend using the Vercel AI SDK. I've looked a bit into the documentation and I guess I'm going to use https://sdk.vercel.ai/docs/api-reference/use-completion ? But I couldn't find a way to add more parameters to specify the model to use. I would appreciate some guidance, and I'll share what I'm able to achieve, thanks!

EDIT: If nothing exists, I'm happy to put in some of my time to push a PR into the Vercel AI SDK otherwise. Cheers.
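EDIT 2: To make it concrete, this is roughly the client-side shape I'm after (only a sketch; the `body` option and the /api/completion route handler are my assumptions for how the model name could be forwarded, which is exactly the part I couldn't confirm in the docs):

'use client'

import { useCompletion } from 'ai/react'

export default function Completion () {
  // Assumption: extra `body` fields reach the route handler, which would then
  // pass `model` on to the local Ollama server
  const { completion, input, handleInputChange, handleSubmit } = useCompletion({
    api: '/api/completion',
    body: { model: 'llama2' }
  })

  return (
    <form onSubmit={handleSubmit}>
      <input value={input} onChange={handleInputChange} placeholder="Ask something..." />
      <p>{completion}</p>
    </form>
  )
}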
Replies: 7 comments
-
A PR would be accepted. You'll likely need to create a wrapper for the Ollama REST API, like here: https://github.com/vercel/ai/blob/main/packages/core/streams/anthropic-stream.ts
-
I've created a create-next-ollama-app starter by cloning the langchain example.
-
Why not use ollama-node, @brunnolou? I think that might give you something closer to the rest of the examples.
-
Here is a starter kit for the AI SDK & Ollama using ModelFusion (a library that I'm working on) as glue: https://github.com/lgrammel/modelfusion-ollama-nextjs-starter
-
This "worked" for me. It should handle both chat completions and normal completions. Submitted in #935

import {
  AIStream,
  readableFromAsyncIterable,
  type AIStreamCallbacksAndOptions,
  createCallbacksTransformer,
  createStreamDataTransformer
} from 'ai'

// Chat message interfaces
interface ChatMessage {
  role: 'system' | 'user' | 'assistant'
  content: string
  images?: string[]
}

interface ChatRequestParams {
  model: string
  messages: ChatMessage[]
  stream?: boolean
  format?: string
  options?: Record<string, unknown>
  template?: string
}

interface ChatResponse {
  model: string
  created_at: string
  message?: ChatMessage
  done: boolean
  total_duration?: number
  load_duration?: number
  prompt_eval_count?: number
  prompt_eval_duration?: number
  eval_count?: number
  eval_duration?: number
}

interface ChatCompletionChunk {
  model: string
  created_at: string
  message: ChatMessage
  done: boolean
}

export interface CompletionResponse {
  model: string
  created_at: string
  response: string
  done: boolean
  context?: number[]
  total_duration?: number
  load_duration?: number
  prompt_eval_count?: number
  prompt_eval_duration?: number
  eval_count?: number
  eval_duration?: number
}

export interface CompletionChunk {
  model: string
  created_at: string
  response: string
  done: boolean
}

// Union of the possible Ollama stream payloads
type OllamaStreamData = ChatResponse | ChatCompletionChunk | CompletionResponse | CompletionChunk

// Function to send chat requests
export async function sendChatRequest (data: ChatRequestParams): Promise<Response> {
  const url = `https://your_ollama_instance/api/chat`
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: data.model,
      messages: data.messages,
      stream: data?.stream,
      format: data?.format,
      options: data?.options,
      template: data?.template
    })
  })
  return response
}

function parseOllamaStream (): (data: string) => OllamaStreamData | undefined {
  return data => {
    try {
      return JSON.parse(data)
    } catch (error) {
      if (error instanceof SyntaxError) {
        console.warn('Received non-JSON data:', data)
      } else {
        throw error
      }
    }
  }
}

async function * streamable<T> (stream: AsyncIterable<T>) {
  for await (const chunk of stream) {
    yield chunk
  }
}

// A modified version of the streamable function specifically for chat messages
async function * chatStreamable (
  stream: AsyncIterable<ChatResponse>,
) {
  for await (const response of stream) {
    if (response.message) {
      yield response.message
    }
    if (response.done) {
      // Additional final response data can be handled here if necessary
      return
    }
  }
}

export function OllamaStream (
  res: Response | AsyncIterable<OllamaStreamData>,
  cb?: AIStreamCallbacksAndOptions
): ReadableStream {
  if ('body' in res) {
    if (!res.body) {
      throw new Error('The provided Response has no body to stream.')
    }
    const asyncIterable = chunksToAsyncIterator(res.body, parseOllamaStream())
    return readableFromAsyncIterable(asyncIterable)
      .pipeThrough(createCallbacksTransformer(cb))
      .pipeThrough(createStreamDataTransformer(cb?.experimental_streamData))
  } else if (Symbol.asyncIterator in res) {
    return readableFromAsyncIterable(streamable(res))
      .pipeThrough(createCallbacksTransformer(cb))
      .pipeThrough(createStreamDataTransformer(cb?.experimental_streamData))
  } else {
    throw new Error('The provided resource is neither a Response nor an AsyncIterable.')
  }
}

// Helper function to convert a ReadableStream (from the fetch Response) into an
// AsyncIterable of text chunks, splitting Ollama's newline-delimited JSON
async function * chunksToAsyncIterator (
  stream: ReadableStream<Uint8Array>,
  parseFn: (data: string) => OllamaStreamData | undefined
): AsyncIterable<string> {
  let buffer = ''
  const reader = stream.getReader()
  const textDecoder = new TextDecoder()
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      buffer += textDecoder.decode(value, { stream: true })
      let boundary = buffer.indexOf('\n')
      while (boundary !== -1) {
        const dataToParse = buffer.substring(0, boundary)
        buffer = buffer.substring(boundary + 1)
        const parsedData = parseFn(dataToParse)
        if (parsedData && 'message' in parsedData && parsedData.message) {
          yield parsedData.message.content
        } else if (parsedData && 'response' in parsedData) {
          yield parsedData.response
        }
        boundary = buffer.indexOf('\n')
      }
    }
  } finally {
    reader.releaseLock()
  }
}

Then I called it with:

import { StreamingTextResponse } from 'ai'
import { OllamaStream, sendChatRequest } from '@lib/ollama/ollamaStream'

// <...snip...>

const ollamaResponse = await sendChatRequest(data)
const stream = OllamaStream(ollamaResponse)
const response = new StreamingTextResponse(stream)
return response
-
Ollama now has built-in compatibility with the OpenAI Chat Completions API (blog post: https://ollama.com/blog/openai-compatibility). Usage:

// app/api/chat/route.ts
import OpenAI from 'openai'
import { OpenAIStream, StreamingTextResponse } from 'ai'

export const runtime = 'edge'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required but not used
})

export async function POST(req: Request) {
  const { messages } = await req.json()

  const response = await openai.chat.completions.create({
    model: 'llama2',
    stream: true,
    messages,
  })

  const stream = OpenAIStream(response)
  return new StreamingTextResponse(stream)
}
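On the frontend, the stock useChat hook from the AI SDK should then work against this route unchanged. A minimal sketch (the component name and markup are just for illustration):

'use client'

import { useChat } from 'ai/react'

export default function Chat () {
  // useChat POSTs to /api/chat by default, i.e. the route handler above
  const { messages, input, handleInputChange, handleSubmit } = useChat()

  return (
    <form onSubmit={handleSubmit}>
      {messages.map(m => (
        <p key={m.id}>{m.role}: {m.content}</p>
      ))}
      <input value={input} onChange={handleInputChange} />
    </form>
  )
}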
-
With the Vercel AI SDK 3.1, there is a community provider for Ollama that works with the new AI functions: https://github.com/sgomez/ollama-ai-provider
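For example, something along these lines (a minimal sketch assuming a local Ollama server with the llama3 model pulled; check the repo's README for the exact provider API):

import { ollama } from 'ollama-ai-provider'
import { generateText } from 'ai'

// Generate a one-off completion from the local Ollama model
const { text } = await generateText({
  model: ollama('llama3'),
  prompt: 'Why is the sky blue?'
})

console.log(text)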