
StarCoder2 AWQ does not work correctly #1899

Open
2 of 4 tasks
johan12345 opened this issue May 15, 2024 · 0 comments

System Info

Latest Docker image (sha-a70b087)
Model: TechxGenus/starcoder2-15b-AWQ.
Options:

MODEL_ID: TechxGenus/starcoder2-15b-AWQ
MAX_INPUT_LENGTH: "3696"
MAX_TOTAL_TOKENS: "4096"
MAX_BATCH_PREFILL_TOKENS: "4096"
CUDA_MEMORY_FRACTION: "0.65"
QUANTIZE: awq
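For reference, these options can be passed as environment variables when launching the container. A minimal sketch, assuming the standard TGI image on GHCR and a single-GPU setup (image tag, port mapping, and GPU flags here are my assumptions, not taken from the report):

```shell
# Sketch only: launch TGI with the AWQ model via environment variables.
# Image tag and port mapping are assumptions; adjust for your deployment.
docker run --gpus all -p 8080:80 \
  -e MODEL_ID=TechxGenus/starcoder2-15b-AWQ \
  -e MAX_INPUT_LENGTH=3696 \
  -e MAX_TOTAL_TOKENS=4096 \
  -e MAX_BATCH_PREFILL_TOKENS=4096 \
  -e CUDA_MEMORY_FRACTION=0.65 \
  -e QUANTIZE=awq \
  ghcr.io/huggingface/text-generation-inference:sha-a70b087
```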

Output of curl 127.0.0.1:8080/info:

{
  "model_id": "TechxGenus/starcoder2-15b-AWQ",
  "model_sha": "8b6696cfe913200c06086da98bdf2ed418b45ca0",
  "model_dtype": "torch.float16",
  "model_device_type": "cuda",
  "model_pipeline_tag": "text-generation",
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_length": 3696,
  "max_total_tokens": 4096,
  "waiting_served_ratio": 0.3,
  "max_batch_total_tokens": 8192,
  "max_waiting_tokens": 20,
  "max_batch_size": null,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "router": "text-generation-router",
  "version": "2.0.2",
  "sha": "a70b087e71b8cf8df88736d934b05d91d883d524",
  "docker_label": "sha-a70b087"
}
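As a quick sanity check, the token limits in the /info response above are internally consistent. A small sketch that parses a subset of that response (inlined here rather than fetched from the server) and computes the generation headroom:

```python
import json

# Subset of the /info response shown above, inlined for illustration
info = json.loads("""
{
  "model_id": "TechxGenus/starcoder2-15b-AWQ",
  "model_dtype": "torch.float16",
  "max_input_length": 3696,
  "max_total_tokens": 4096,
  "max_batch_total_tokens": 8192
}
""")

# Room left for new tokens per request: 4096 - 3696 = 400
headroom = info["max_total_tokens"] - info["max_input_length"]
print(headroom)  # 400

# A single request at max size must fit in the batch budget
assert info["max_total_tokens"] <= info["max_batch_total_tokens"]
```

So a max-length prompt leaves room for at most 400 generated tokens, which the max_new_tokens=100 request below stays well within.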

Hardware: RTX 3090

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Run the Docker image with the TechxGenus/starcoder2-15b-AWQ model (options mentioned above).
  2. Try to generate some text with the model:
     curl 10.42.1.102:8082/generate -X POST -d '{"inputs":"def calculate_pi():","parameters":{"max_new_tokens":100}}' -H 'Content-Type: application/json'
  3. Depending on the prompt, the output received is either empty:
     {"generated_text":""}
     or gibberish, such as:
     {"generated_text":" ( ( ( ( ( ( ( ( ( ("}

The GPTQ variant of the same model works fine (but slow).
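The two failure modes above (empty output, or one token repeated over and over) are easy to flag programmatically when scripting repro attempts. A hypothetical helper sketch, where the function name and repetition threshold are my own and not part of TGI:

```python
def looks_degenerate(text: str, max_ratio: float = 0.5) -> bool:
    """Heuristic: flag empty output or output dominated by a single repeated token."""
    tokens = text.split()
    if not tokens:
        return True  # empty generation, like {"generated_text":""}
    # If one token accounts for most of the output, treat it as gibberish
    most_common = max(tokens.count(t) for t in set(tokens))
    return most_common / len(tokens) > max_ratio

print(looks_degenerate(""))                                    # True
print(looks_degenerate(" ( ( ( ( ( ( ( ( ( ("))                # True
print(looks_degenerate("def calculate_pi(): return 3.14159"))  # False
```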

Expected behavior

The model should generate useful output.

There have already been discussions on a similar issue here:
https://huggingface.co/TechxGenus/starcoder2-7b-AWQ/discussions/1
huggingface/transformers#30225
huggingface/transformers#30074

This seems to be fixed in transformers now, but updating the transformers package in the TGI container to the latest main branch did not resolve the issue. This is probably because TGI uses its own StarCoder2 implementation rather than the one from transformers?
