llama.cpp embedding returns an empty array for long text. The problem seems to occur once the input length exceeds 680 characters.
Steps to reproduce:
docker run -p 10080:80 -v $(pwd):/ggufs/ --rm ghcr.io/ggerganov/llama.cpp:server -m /ggufs/bge-large-zh-v1.5-f32.gguf --embedding -c 8192 --host 0.0.0.0 --port 80 -a bge-large-zh -ngl 100
Note: I can't find a server parameter that adjusts this limit (a repro sketch follows).
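For reference, a minimal, hypothetical repro client for the container above. It assumes the server exposes the plain /embedding endpoint, taking {"content": "..."} and returning {"embedding": [...]}; the port mirrors the docker command, and the thresholds tested are illustrative.

import requests

# Hypothetical repro: grow the input until the returned embedding is empty.
URL = "http://localhost:10080/embedding"  # port mapped by the docker run above

for n in (600, 680, 700, 1000):
    text = "测" * n  # n-character Chinese input
    resp = requests.post(URL, json={"content": text})
    embedding = resp.json().get("embedding", [])
    print(f"chars={n} -> embedding length={len(embedding)}")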
For comparison, the HuggingFaceEmbeddings-based service I use:

from typing import Any, Dict, List, Optional, Union

import fire
import uvicorn
from fastapi import FastAPI, Request
from pydantic import BaseModel
from langchain_community.embeddings import HuggingFaceEmbeddings

app = FastAPI()


class UsageInfo(BaseModel):
    """Usage information."""
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0


class EmbeddingsRequest(BaseModel):
    """Embedding request."""
    model: Optional[str] = None
    input: Union[str, List[Any]]
    user: Optional[str] = None


class EmbeddingsResponse(BaseModel):
    """Embedding response."""
    object: str = 'list'
    data: List[Dict[str, Any]]
    model: str
    usage: UsageInfo


# Module-level handle, assigned in main() before the server starts.
embeddings: Optional[HuggingFaceEmbeddings] = None


@app.post('/v1/embeddings')
async def create_embeddings(request: EmbeddingsRequest, raw_request: Request = None):
    """Creates embeddings for the text."""
    embedding = await embeddings.aembed_query(request.input)
    data = [{'object': 'embedding', 'embedding': embedding, 'index': 0}]
    # Note: this is the embedding dimension, not a real token count.
    token_num = len(embedding)
    return EmbeddingsResponse(
        data=data,
        model=request.model,
        usage=UsageInfo(
            prompt_tokens=token_num,
            total_tokens=token_num,
            completion_tokens=None,
        ),
    ).dict(exclude_none=True)


def main(model_name: str, host="0.0.0.0", port=8966, **kwargs: Dict[str, Any]):
    # Assign the module-level handle so the endpoint can see the model.
    global embeddings
    embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=kwargs)
    uvicorn.run(app, host=host, port=port, workers=1)


if __name__ == "__main__":
    fire.Fire(main)
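fire maps main()'s arguments to command-line flags, so the service can be started and queried roughly like this (hypothetical file name embed_server.py; the model and port match the defaults above):

# Assumed launch:
#   python embed_server.py --model_name=BAAI/bge-large-zh-v1.5
import requests

resp = requests.post(
    "http://localhost:8966/v1/embeddings",  # default port from main()
    json={"model": "bge-large-zh", "input": "some long text " * 200},
)
print(len(resp.json()["data"][0]["embedding"]))  # 1024 dims for bge-large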
The failing requests go to the llama.cpp server's /embedding endpoint.
It works fine if you use HuggingFaceEmbeddings instead (the downside is that it's too bulky); see the sketch below.
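For comparison, a minimal sketch of that direct path (assuming the BAAI/bge-large-zh-v1.5 checkpoint; sentence-transformers silently truncates over-long inputs to the model's max sequence length rather than returning an empty array):

from langchain_community.embeddings import HuggingFaceEmbeddings

# Minimal sketch: embed a text well past the ~680-character failure point.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-large-zh-v1.5")
long_text = "text " * 1000
vector = embeddings.embed_query(long_text)
print(len(vector))  # 1024-dimensional vector for bge-large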