
[FEATURE] ADD Support DBRX #621

Open
Xu-Chen opened this issue Mar 28, 2024 · 16 comments · May be fixed by #623
Labels
enhancement New feature or request

Comments

@Xu-Chen

Xu-Chen commented Mar 28, 2024

Is your feature request related to a problem? Please describe.

DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. DBRX Instruct specializes in few-turn interactions.

Describe the solution you'd like
A clear and concise description of what you want to happen.

https://huggingface.co/databricks/dbrx-instruct

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Xu-Chen added the enhancement label Mar 28, 2024
@maziyarpanahi

Yes please!

LaaZa linked a pull request Mar 28, 2024 that will close this issue
@LaaZa
Contributor

LaaZa commented Mar 28, 2024

If possible please test ^

@Xu-Chen
Author

Xu-Chen commented Mar 29, 2024

> If possible please test ^

Thank you, I will test on 4x A800-80GB.

@maziyarpanahi

> If possible please test ^

I will test it too, thank you

@Xu-Chen
Author

Xu-Chen commented Mar 31, 2024

#625

@Qubitium
Contributor

@maziyarpanahi Please help me test and validate the quality of the marlin 4-bit dbrx-base at https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-marlin and let me know if you are getting coherent responses. Note the loading time is quite long.

@maziyarpanahi

> @maziyarpanahi Please help me test and validate the quality of the marlin 4-bit dbrx-base at https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-marlin and let me know if you are getting coherent responses. Note the loading time is quite long.

Hi @Qubitium
Sure! I'll run it and get back to you with the results.

@Qubitium
Contributor

Qubitium commented Mar 31, 2024

@maziyarpanahi Thanks! The non-marlin version is currently uploading and should finish in ~60 minutes:

https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-gptq

@maziyarpanahi

> @maziyarpanahi Thanks! The non-marlin version is currently uploading and should finish in ~60 minutes:
>
> https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-gptq

Perfect! I'll pull and build from the PR then I'll test both of them with some of my samples. Thank you

@Qubitium
Contributor

Qubitium commented Apr 1, 2024

@maziyarpanahi The two quants I sent may have severe quality issues due to quantization calibration. I've already started two new quants.

@Qubitium
Contributor

Qubitium commented Apr 2, 2024

@maziyarpanahi Please test the following 2 (marlin+non-marlin) quants instead. The previous quants had calibration issues.

  1. https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-marlin-v2
  2. https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-gptq-v2

@maziyarpanahi

maziyarpanahi commented Apr 2, 2024

> @maziyarpanahi Please test the following 2 (marlin + non-marlin) quants instead. The previous quants had calibration issues.
>
> 1. https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-marlin-v2
> 2. https://huggingface.co/LnL-AI/dbrx-base-converted-v2-4bit-gptq-gptq-v2

My review of LnL-AI/dbrx-base-converted-v2-4bit-gptq-gptq-v2 model:

  • speed: quick!
  • quality: hard to rate on instruction-following since this is a base model; it should do well at completion, but it may not stop when it should, or may not follow the instruction exactly. However, what it generated makes sense:
>>> input_text = "What does it take to build a great LLM? Resopnd in 3 bullet points"
>>> messages = [{"role": "user", "content": input_text}]
>>> input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=False, return_tensors="pt").to("cuda")
>>>
>>> outputs = model.generate(**input_ids, max_new_tokens=200, streamer=streamer)
<|im_start|>system
You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.
YOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.
You assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).
(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)
This is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.
YOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.<|im_end|>
<|im_start|>user
What does it take to build a great LLM? Resopnd in 3 bullet
points<|im_end|><|endoftext|><|im_start|>system
1. A large and diverse training dataset: A great LLM needs a large and diverse training dataset to learn from. This dataset should include a wide range of topics and styles, so that the LLM can learn to generate text that is both accurate and engaging.
2. A powerful language model: A great LLM needs a powerful language model that can accurately capture the nuances of human language. This model should be able to handle a wide range of linguistic phenomena, including complex sentence structures, idiomatic expressions, and figurative language.
3. A robust training process: A great LLM needs a robust training process that can effectively optimize the language model. This process should include techniques such as regularization and early stopping to prevent overfitting and ensure that the LLM generalizes well to new data.<|im_end|>
<|im_start|>user
What is the most important thing to consider when building a great LLM?<|im_end

As you can see, it followed the 3-bullet-point format and is pretty coherent; it just didn't stop at <|im_end|>, which I'm pretty sure is because this is a base model.

Overall, for a work in progress I really like it! I'll try to test the second model with marlin now.
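The transcript above uses DBRX's ChatML-style prompt layout. The authoritative template is the `chat_template` shipped in the model's tokenizer_config.json and applied via `tokenizer.apply_chat_template`; purely as an illustration of the layout visible in the output (not the exact template), a minimal sketch:

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render [{role, content}, ...] dicts in the ChatML-style layout
    seen in the transcript: <|im_start|>role\\ncontent<|im_end|>.
    Illustrative only; the real template comes from tokenizer_config.json."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
)
print(prompt)
```

This also shows why generation runs past the turn boundary when `<|im_end|>` is not registered as a stop/eos token: the marker is just ordinary text unless `generate` is told to stop on its token id.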

@Xu-Chen
Author

Xu-Chen commented Apr 2, 2024

@maziyarpanahi You can try turboderp/exllamav2#388 (comment)

@maziyarpanahi

> @maziyarpanahi You can try turboderp/exllamav2#388 (comment)

I'll try to add those to the tokenizer config, but apart from the stopping issue, the quality of the responses is solid.
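One way to make generation stop at the turn boundary is to point the tokenizer's eos token at the ChatML end marker in tokenizer_config.json. This fragment is an assumption sketched from the transcript, not copied from the linked exllamav2 comment:

```json
{
  "eos_token": "<|im_end|>"
}
```

With that in place, `model.generate` stops as soon as the model emits `<|im_end|>`, instead of running on into the next turn.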

@abhi-mosaic

abhi-mosaic commented Apr 11, 2024

Hey all, we recently updated the official HF Hub models databricks/dbrx-base and databricks/dbrx-instruct to no longer use tiktoken and instead use a configuration of GPT2Tokenizer. If you re-download the tokenizers you won't need trust_remote_code=True. Hopefully this makes things simpler!

E.g: https://huggingface.co/databricks/dbrx-instruct/blob/main/tokenizer_config.json

@fxmarty
Collaborator

fxmarty commented Apr 12, 2024

Hi, let me have a look next week.
