
StarCoder2 AWQ does not work correctly #1899

Open
2 of 4 tasks
johan12345 opened this issue May 15, 2024 · 0 comments

System Info

Latest Docker image (sha-a70b087)
Model: TechxGenus/starcoder2-15b-AWQ.
Options:

MODEL_ID: TechxGenus/starcoder2-15b-AWQ
MAX_INPUT_LENGTH: "3696"
MAX_TOTAL_TOKENS: "4096"
MAX_BATCH_PREFILL_TOKENS: "4096"
CUDA_MEMORY_FRACTION: "0.65"
QUANTIZE: awq
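For reference, these options can be passed as environment variables when launching the container. A minimal sketch, assuming the standard TGI image on GHCR and a single-GPU setup (image tag, port mapping, and GPU flags here are my assumptions, not taken from the report):

```shell
# Sketch only: launch TGI with the AWQ model via environment variables.
# Image tag and port mapping are assumptions; adjust for your deployment.
docker run --gpus all -p 8080:80 \
  -e MODEL_ID=TechxGenus/starcoder2-15b-AWQ \
  -e MAX_INPUT_LENGTH=3696 \
  -e MAX_TOTAL_TOKENS=4096 \
  -e MAX_BATCH_PREFILL_TOKENS=4096 \
  -e CUDA_MEMORY_FRACTION=0.65 \
  -e QUANTIZE=awq \
  ghcr.io/huggingface/text-generation-inference:sha-a70b087
```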

Output of curl 127.0.0.1:8080/info:

{
  "model_id": "TechxGenus/starcoder2-15b-AWQ",
  "model_sha": "8b6696cfe913200c06086da98bdf2ed418b45ca0",
  "model_dtype": "torch.float16",
  "model_device_type": "cuda",
  "model_pipeline_tag": "text-generation",
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_length": 3696,
  "max_total_tokens": 4096,
  "waiting_served_ratio": 0.3,
  "max_batch_total_tokens": 8192,
  "max_waiting_tokens": 20,
  "max_batch_size": null,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "router": "text-generation-router",
  "version": "2.0.2",
  "sha": "a70b087e71b8cf8df88736d934b05d91d883d524",
  "docker_label": "sha-a70b087"
}
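As a quick sanity check, the token limits in the /info response above are internally consistent. A small sketch that parses a subset of that response (inlined here rather than fetched from the server) and computes the generation headroom:

```python
import json

# Subset of the /info response shown above, inlined for illustration
info = json.loads("""
{
  "model_id": "TechxGenus/starcoder2-15b-AWQ",
  "model_dtype": "torch.float16",
  "max_input_length": 3696,
  "max_total_tokens": 4096,
  "max_batch_total_tokens": 8192
}
""")

# Room left for new tokens per request: 4096 - 3696 = 400
headroom = info["max_total_tokens"] - info["max_input_length"]
print(headroom)  # 400

# A single request at max size must fit in the batch budget
assert info["max_total_tokens"] <= info["max_batch_total_tokens"]
```

So a max-length prompt leaves room for at most 400 generated tokens, which the max_new_tokens=100 request below stays well within.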

Hardware: RTX 3090

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Run the Docker image with the TechxGenus/starcoder2-15b-AWQ model (options mentioned above).
  2. Try to generate some text with the model:
     curl 10.42.1.102:8082/generate -X POST -d '{"inputs":"def calculate_pi():","parameters":{"max_new_tokens":100}}' -H 'Content-Type: application/json'
  3. Depending on the prompt, the output received is either empty:
     {"generated_text":""}
     or gibberish, such as:
     {"generated_text":" ( ( ( ( ( ( ( ( ( ("}

The GPTQ variant of the same model works fine (but slow).
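The two failure modes above (empty output, or one token repeated over and over) are easy to flag programmatically when scripting repro attempts. A hypothetical helper sketch, where the function name and repetition threshold are my own and not part of TGI:

```python
def looks_degenerate(text: str, max_ratio: float = 0.5) -> bool:
    """Heuristic: flag empty output or output dominated by a single repeated token."""
    tokens = text.split()
    if not tokens:
        return True  # empty generation, like {"generated_text":""}
    # If one token accounts for most of the output, treat it as gibberish
    most_common = max(tokens.count(t) for t in set(tokens))
    return most_common / len(tokens) > max_ratio

print(looks_degenerate(""))                                    # True
print(looks_degenerate(" ( ( ( ( ( ( ( ( ( ("))                # True
print(looks_degenerate("def calculate_pi(): return 3.14159"))  # False
```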

Expected behavior

The model should generate useful output.

There have already been discussions on a similar issue here:
https://huggingface.co/TechxGenus/starcoder2-7b-AWQ/discussions/1
huggingface/transformers#30225
huggingface/transformers#30074

This seems to be fixed in transformers now, but updating the transformers package in the TGI container to the latest main branch did not resolve the issue. This is probably because TGI uses its own StarCoder2 implementation rather than the one from transformers?
