
server `/embeddings` API doesn't handle cases when physical batch size < prompt length #7422

Closed
wsxiaoys opened this issue May 20, 2024 · 1 comment


wsxiaoys (Contributor) commented May 20, 2024

HTTP Request

POST http://localhost:30888/embeddings HTTP/1.1
Content-Type: application/json

{
  "content": "## For more information about docker support in SkyPilot, please refer to the `image_id` section above.envs:MY_BUCKET:skypilot-temp-gcs-testMY_LOCAL_PATH:tmp-workdirMODEL_SIZE:13bfile_mounts:# Uses rsync to sync local files/directories to all nodes of the cluster.## If a relative path is used, it\"s evaluated relative to the location from# which `sky` is called.## If symlinks are present, they are copied as symlinks, and their targets# must also be synced using file_mounts to ensure correctness./remote/dir1/file:/local/dir1/file/remote/dir2:/local/dir2# Create a S3 bucket named sky-dataset, uploads the contents of# /local/path/datasets to the bucket, and marks the bucket as persistent# (it will not be deleted after the completion of this task).# Symlinks and their contents are NOT copied.## Mounts the bucket at /datasets-storage on every node of the cluster./datasets-storage:name:sky-dataset# Name of storage, optional when source is bucket URIsource:/local/path/datasets# Source path, can be local or s3/gcs URL. Optional, do not specify to create an empty bucket.store:s3# Could be either \"s3\", \"gcs\" or \"r2\"; default: None. Optional.persistent:True# Defaults to True; can be set to false to delete bucket after cluster is downed. Optional.mode:MOUNT# Either MOUNT or COPY. Defaults to MOUNT. Optional.# Copies a cloud object store URI to the cluster. Can be private buckets./datasets\n-s3:s3://my-awesome-dataset# Demoing env var usage./checkpoint/${MODEL_SIZE}:~/${MY_LOCAL_PATH}/mydir:name:${MY_BUCKET}# Name of the bucket.mode:MOUNT# Setup script (optional) to execute on every `sky launch`.# This is executed before the \"run\" commands.## The \"|\" separator indicates a multiline string. 
To specify a single command:#   setup: pip install -r requirements.txtsetup:|echo Begin setup.pip install -r requirements.txtecho Setup complete.# Main program (optional, but recommended) to run on every node of the cluster.run:|echo Beginning task.python train.py# Demoing env var usage.echo Env var MODEL_SIZE has value: ${MODEL_SIZE}\n\n"
}

Commands used to start the server:

# Returns an empty embedding / 500 error
llama-server -m /Users/meng/Projects/models/nomic/ggml/model.gguf --port 30888 --ctx-size 4096 --embedding -ngl 9999 -cb -ub 512

# Returns the correct embedding, since the input prompt has a length of 613 tokens
llama-server -m /Users/meng/Projects/models/nomic/ggml/model.gguf --port 30888 --ctx-size 4096 --embedding -ngl 9999 -cb -ub 614
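To make the constraint concrete: for embedding requests the whole prompt has to fit in a single physical batch, so `-ub` must be at least the prompt length in tokens. A minimal Python sketch of how a client could pick a safe `-ub` value before launching the server (the helper name is mine, not part of llama.cpp):

```python
def choose_ubatch(prompt_tokens: int, default_ub: int = 512) -> int:
    """Pick a physical batch size (-ub) large enough for an embedding prompt.

    Assumption (from this issue): the whole prompt must fit in one physical
    batch, so -ub has to be at least the prompt length in tokens; otherwise
    llama-server rejects the request with a 500 error.
    """
    return max(default_ub, prompt_tokens)

# The failing repro above: a 613-token prompt with the default -ub of 512.
print(choose_ubatch(613))  # 613 -- restarting with -ub >= 613 succeeds
print(choose_ubatch(100))  # 512 -- short prompts keep the default
```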

Error output

HTTP/1.1 500 Internal Server Error
Access-Control-Allow-Origin: 
Connection: close
Content-Length: 120
Content-Type: application/json; charset=utf-8
Server: llama.cpp

{
  "error": {
    "code": 500,
    "message": "input is too large to process. increase the physical batch size",
    "type": "server_error"
  }
}

Seems related: #6996
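Until the server splits long embedding prompts itself, one possible client-side workaround is to chunk the input so each piece fits in the physical batch, embed the chunks separately, and pool the resulting vectors. A minimal sketch under those assumptions (the helper is illustrative, not a llama.cpp API; naive chunk boundaries that ignore sentence structure):

```python
def chunk_tokens(tokens: list, max_ubatch: int = 512) -> list:
    """Split a token sequence into chunks that each fit in one physical batch.

    Each chunk can then be sent to /embeddings on its own; the client pools
    (e.g. averages) the per-chunk vectors into one embedding.
    """
    return [tokens[i:i + max_ubatch] for i in range(0, len(tokens), max_ubatch)]

# The 613-token prompt from this issue splits into a full batch plus a tail.
chunks = chunk_tokens(list(range(613)), max_ubatch=512)
print([len(c) for c in chunks])  # [512, 101]
```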

wsxiaoys (Contributor, Author) commented:

Seems to be working as intended (WAI) according to #7389.
