
LFX Mentorship (Jun-Aug, 2024): Finetune LLM models for Rust coding assistance #3371

Open

juntao opened this issue Apr 28, 2024 · 18 comments

Labels: enhancement (New feature or request), LFX Mentorship (Tasks for LFX Mentorship participants)

@juntao
Member

juntao commented Apr 28, 2024

Summary

WasmEdge is a lightweight inference runtime for AI and LLM applications. We want to build specialized, finetuned models for the WasmEdge community. The models should be supported by WasmEdge, and their applications should benefit the WasmEdge community.

In this project, we will build and compare two finetuned models for Rust coding assistance.

  • A code review model. It aims to be a new backend for the PR review bot we currently use in the community.
  • A QA model. It should be able to answer user questions about the Rust language and provide explanations. Our goal is to provide an alternative to our Learn Rust app.

Details

Objective 1: Code review model

Create a dataset with the following two fields

We are looking for at least 200 Q&A pairs. The total length of each pair should be less than 3,000 words.

Q: a code segment
A: explanation / review of the code

The Q&A pairs could come from Rust documentation such as Rust by Example and The Rust Programming Language.

Assemble the dataset into the llama3 chat template

It is similar to the following. Each entry should be on a single line, with line breaks denoted as \n.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a reviewer of Rust source code.<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ a code segment }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ explanation / review of the code }}<|eot_id|>
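As an illustration, assembling the pairs into one-line template entries could be scripted as follows. This is only a sketch: the function and parameter names are made up for this example, and only the template text above comes from the issue.

```python
# Sketch: fill the llama3 chat template with a (code, review) pair and
# escape real newlines so each dataset entry occupies exactly one line.
TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a reviewer of Rust source code.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{code}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{review}<|eot_id|>"
)

def to_entry(code: str, review: str) -> str:
    # Replace literal newlines with the two-character sequence "\n"
    # so the whole entry sits on a single line, as required above.
    return TEMPLATE.format(code=code, review=review).replace("\n", "\\n")
```

Each resulting string can then be written out as one line of the training file.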

Finetune

We will finetune based on the llama3-8b-instruct model.

You are free to use any finetuning tools. If you are unsure, we recommend llama.cpp's finetune utility, which can run on CPUs; see an example. We will provide the computing resources required for the finetuning.

Objective 2: Code QA model

Create a dataset with the following three fields

We are looking for at least 100 chapter + Q + A rows.

C: A chapter from a Rust book
Q: A question related to the chapter
A: Explanation / answer for the question

You could use ChatGPT to generate these questions and answers based on the chapter content.

Assemble the dataset into the llama3 chat template

It is similar to the following. Each entry should be on a single line, with line breaks denoted as \n.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert of the Rust language. Please answer question based on the context below.\n---------\n{{ book chapter }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ a question related to the chapter }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ explanation / answer for the question }}<|eot_id|>
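By analogy with objective 1, the chapter text goes into the system message and the question/answer fill the user/assistant turns. A minimal sketch (function and parameter names are assumptions for illustration):

```python
# Sketch: fill the llama3 chat template with a (chapter, question, answer)
# row, then escape real newlines so each entry occupies exactly one line.
QA_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are an expert of the Rust language. Please answer question based on "
    "the context below.\n---------\n{chapter}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{answer}<|eot_id|>"
)

def to_qa_entry(chapter: str, question: str, answer: str) -> str:
    filled = QA_TEMPLATE.format(chapter=chapter, question=question, answer=answer)
    return filled.replace("\n", "\\n")  # one line per entry
```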

Finetune

Due to the chapter-long context length used in this dataset, we will finetune based on the 262k long context length llama3-8b-instruct model.

You are free to use any finetuning tools. If you are unsure, we recommend llama.cpp's finetune utility, which can run on CPUs; see an example. We will provide the computing resources required for the finetuning.

Objective 3: Compare the two finetuned models

Start the finetuned models using the LlamaEdge API server, and test them on commonly used scenarios.
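For the comparison, one option is to send the same prompts to both models through the OpenAI-compatible chat endpoint that the LlamaEdge API server exposes and compare the answers side by side. A sketch follows; the base URL, port, and model names here are assumptions and depend on how you start each server.

```python
# Sketch: query a model served by the LlamaEdge API server via its
# OpenAI-compatible chat completions endpoint.
import json
import urllib.request

def build_request(prompt: str, model: str) -> dict:
    # OpenAI-style chat payload: a single user turn.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(base_url: str, prompt: str, model: str) -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Running the same list of test prompts through `ask()` against each server (e.g. one per finetuned model) gives paired outputs that can be compared manually or scored.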

LFX

Expected outcome: Two finetuned models based on Llama3-8b for Rust code review and QA.

Recommended skills:

  • Rust language
  • ChatGPT and Claude
  • LlamaEdge
  • llama.cpp

Mentor:

Application link: https://mentorship.lfx.linuxfoundation.org/project/d52d172e-598d-4817-be97-3338d5b96432

Appendix

@juntao juntao added the enhancement New feature or request label Apr 28, 2024
@juntao juntao changed the title feat: Create an LLM agent for Rust code review feat: Finetune LLM models for Rust coding assistance Apr 28, 2024
@juntao juntao added the LFX Mentorship Tasks for LFX Mentorship participants label Apr 28, 2024
@DhruvSinghiitmandi

DhruvSinghiitmandi commented Apr 28, 2024

Hi @juntao!
I am Dhruv, currently pursuing my undergrad in CS at IIT Mandi.
I am deeply interested in this mentorship project, and I have some experience in creating datasets for finetuning LLMs. Recently I did a project where I created a medical conversation dataset from the DDXPlus dataset using GPT-3.5, and then finetuned MedLLaMA-2 (LLaMA-2 finetuned on the MedQA dataset) on a single RTX 3090 using QLoRA, with further quantization using llama.cpp.

Could you please let me know if there's a pretest or any other steps I should take to participate?

@hydai hydai changed the title feat: Finetune LLM models for Rust coding assistance LFX Mentorship (Jun-Aug, 2024): Finetune LLM models for Rust coding assistance May 2, 2024
@debrupf2946

@juntao
I am Debrup, pursuing engineering at BITS Pilani, and a contributor at Keras. I like to solve NLP problems and have worked extensively with Hugging Face (transformers, PEFT, datasets), LangChain, LlamaIndex (data ingestion and indexing), knowledge graphs (Neo4j), RAG, and Rust.
I was also selected for GSoC for a similar project: building a coding assistant using knowledge graphs that assists with QA and summarization tasks for any GitHub repo.
I find this project interesting and similar, and I want to contribute and take part in the LFX Mentorship program.
gsoc_pr

@debrupf2946

@juntao Is there any community (Discord, Slack, ...) where we contributors can interact? What are the timelines and selection criteria for this project? I am really excited about it.
If possible, could you share a cover letter from the previous term? I wanted to get an idea of its length and content.

@angad-singhh

angad-singhh commented May 14, 2024

Hey @juntao | @hydai
I'm Angad Singh, a final-year grad student. I have gone through all four LFX mentorship projects that WasmEdge is participating in. As there is no pre-test this time, I will try to explore the whole Wasm ecosystem, as well as this project.

I will go through all the resources and links mentioned above and update you both with my progress here to improve my chances of being selected for the LFX program.

@angad-singhh

angad-singhh commented May 16, 2024

Progress

I have started working and exploring the first project objective that is:

A code review model. It aims to be a new backend for the PR review bot we currently use in the community.

What I plan to do:

  1. Start by exploring the current PR review bot used in the community and why we need to improve it.
  2. Then try setting up the current PR review bot on my own GitHub account.
  3. Analyse prior PRs submitted on the WasmEdge repo and understand how the current PR review bot responds in each PR.

What I did:

I was able to complete all the planned steps mentioned above; below are my findings so far about the current PR review bot.

Findings | please verify this part (@juntao @hydai )

  • First, I noticed that WasmEdge currently uses the flow template github-pr-summary, written in Rust and hosted on the WasmEdge runtime on flows.network. There is also a flow template named github-pr-review with slightly different behavior.
  • The current PR bot (github-pr-summary) collects the patches of the commits in a PR and sends them to OpenAI to summarize each commit; it then combines these summaries into a single summary and posts back the combined summary plus the summary of each individual commit.
  • Note that it automatically updates the PR review comment whenever a new commit is made in the PR, appending the detailed summary/review of that commit.

Issues

  1. I was able to set up the PR review bot on my own GitHub account, but there was a small issue with OpenAI API key authentication, for which I have raised a detailed issue here.
  2. While exploring previous PRs in the WasmEdge repo, I noticed that the PR review bot has not produced any output on PRs made after March 2024, as in these PRs: 1, 2, 3, ...

[Screenshot: flows.network setup]

@angad-singhh

angad-singhh commented May 16, 2024

Next

I'm currently exploring the fine-tuning part. I will try to set up the fine-tuned chemistry assistant you provided here, make some changes, and understand how it works.

Also, I will be exploring llama.cpp, its workings and docs, to get a better understanding, and I will update you all with any further progress.

Lastly, I will look for any { good first issues } in code/docs and work on them.

@codemaster1104

codemaster1104 commented May 16, 2024

Hello @juntao @hydai, I am Akshat Shrivastava, a sophomore at IIT BHU (Varanasi). In the past few days I have understood the workflow of finetuning and running LLMs locally. I was wondering whether, for the current task, I could use llama-3-8b-bnb-4bit, as it is a direct modification of Llama3-8B quantized to 4-bit, so I can run everything smoothly on my current hardware.

@angad-singhh

angad-singhh commented May 16, 2024

Hello @juntao @hydai, I am Akshat Shrivastava, a sophomore at IIT BHU (Varanasi). In the past few days I have understood the workflow of finetuning and running LLMs locally. I was wondering whether, for the current task, I could use llama-3-8b-bnb-4bit, as it is a direct modification of Llama3-8B quantized to 4-bit, so I can run everything smoothly on my current hardware.

Hey @codemaster1104,
It has been mentioned in the project issue that we are supposed to use Llama-3-8B-Instruct-GGUF for fine-tuning.

As for the tool, llama.cpp is the preferred one because it can run on CPUs; also, the resources required for fine-tuning will be provided by the organization itself.

I hope this clears up some of your doubts!

@SHAIMA-HAQUE

Hi, @juntao. I am interested in this project.

I went through the example for fine-tuning the model and also used the LlamaEdge API server to run the chat in my browser.
[Screenshots: LlamaEdge chat running in the browser]

I did not face any major issues while going through this. Although I was initially confused about how the LlamaEdge API works, this demo video from KubeCon, https://www.youtube.com/watch?v=KTquzmXVj9o, helped me see concretely what to expect when running it.

Do you suggest anything else I should look into to understand the tasks better?

@angad-singhh

angad-singhh commented May 17, 2024

Progress:

  1. I was able to successfully set up and run the required llama-3-8B-Instruct-GGUF model for this project on my local machine using the LlamaEdge environment.
  2. Next, I went through the LlamaEdge repo, the LlamaEdge API server, and how it works.
  3. I also found an issue in the WasmEdge docs, discussed here in the comments.
  4. Understood the concept of quantization and the different types of quantization processes.

[Screenshots: llama-3-8b-Instruct-GGUF test setup]

@codemaster1104

[Screenshot from 2024-05-18 01-59-41]
So far I have made a small dataset and have successfully converted it to the required prompt template, for example:

['You are a reviewer of Rust source code.\n\n\n### Code:\nprintln!({} days, 31);\n\n### Explanation:\nPrints 31 days using {} as a placeholder for the value (31).<|end_of_text|>', 'You are a reviewer of Rust source code.\n\n\n### Code:\nprintln!({0}, this is {1}. {1}, this is {0}, Alice, Bob);\n\n### Explanation:\nPrints Alice, this is Bob. Bob, this is Alice using positional arguments ({0}, {1}). Order matters!<|end_of_text|>', 'You are a reviewer of Rust source code.\n\n\n### Code:\nprintln!({subject} {verb} {object}, subject=the lazy dog, verb=jumps over, object=quick brown fox);\n\n### Explanation:\nPrints a sentence with named arguments, improving readability.<|end_of_text|>', "You are a reviewer of Rust source code.\n\n\n### Code:\nlet logical: bool = true;\n\n### Explanation:\nDeclares a boolean variable named 'logical' and initializes it with the value 'true'.<|end_of_text|>", "You are a reviewer of Rust source code.\n\n\n### Code:\nlet a_float: f64 = 1.0;\n\n### Explanation:\nDeclares a floating-point variable named 'a_float' and initializes it with the value '1.0'. The type 'f64' represents a 64-bit floating-point number.<|end_of_text|>",

Tokenizer is also working fine:
{'input_ids': tensor([[128000, 2675, 527, 264, 56614, 315, 34889, 2592, 2082,
4286, 14711, 6247, 512, 34755, 0, 2358, 92, 2919,
11, 220, 2148, 629, 14711, 72387, 512, 9171, 82,
220, 2148, 2919, 1701, 4792, 439, 264, 6002, 369,
279, 907, 320, 2148, 570]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

Next, I will work on training the model with the current dataset, and then move on to improving my dataset.

@codemaster1104

codemaster1104 commented May 18, 2024

[Screenshot from 2024-05-18 14-13-37]
I successfully fine-tuned the llama3-8b-Instruct model with my small dataset. The attached image shows the training loss, which is decreasing steadily, a good sign. Next, I will work on creating the final dataset and fine-tuning on it.

@juntao, could you please guide us on the metrics and tools we could use to compare the models? Currently I assume we would manually compare answers to specific questions from both models, but I am sure there is a better way.

@codemaster1104
Copy link

[Images: ll2, mm2]
These are direct comparisons between vanilla llama3-8b-Instruct and my fine-tuned model for code explanation. My model appears to be much more direct and to the point, which was expected, as my training dataset is similarly direct and to the point.

@juntao @hydai, I really need your guidance here: is such direct behaviour desired? In my opinion, a PR review bot should give on-point explanations. If this behaviour is desired, I will start working on my final dataset with similar types of explanations.

@codemaster1104

codemaster1104 commented May 22, 2024

[Image: Frame 1]
I have created my dataset for objective 2 and did some experimentation, and found that if I use the 262k context length as asked in the objective, my fine-tuned model starts hallucinating on even simple prompts, whereas with smaller context lengths the model performs just fine. I have also attached a snippet of my dataset.
[Image: QA_dataset]
Am I going in the wrong direction with my dataset? I don't fully understand the statement "Due to the chapter-long context length used in this dataset" in objective 2. Could more clarification be provided on how the dataset for this objective is supposed to be made? @juntao @hydai

@codemaster1104

[Screenshot from 2024-05-22 17-11-11]
Got LlamaEdge up and running; inference via the chat endpoint for my fine-tuned model is working fine.

@pawaspy

pawaspy commented May 25, 2024

Hey @juntao, I have a keen interest in this project. I have good hands-on experience with large language models and have made several projects similar to this QA project, which use the OpenAI API to answer up-to-date questions. I would be really glad to work on this project and refine my knowledge. Could you tell me how I could interact with the members and other participants?

@codemaster1104

[Image: Frame 2]
These are my results so far for the fine-tuned models of objective 1 and objective 2. Both models are running on the LlamaEdge API server. For the objective 2 model, I improved the dataset by adding the content of an entire chapter in the chapter field; as a result, some of my fields exceeded 50,000 words, and to fine-tune it I had to reduce the context length to 131072 and change other parameters from the recommended values to avoid running out of RAM.

My model for objective 2 is still hallucinating, and it seems to get stuck on the context of previous questions. I would like to fine-tune this model with the recommended parameters first, for which I will need more computing resources. I have also started reading papers on the topic to get a better understanding.

My progress

  • Created datasets for both objectives (they can be viewed on my Hugging Face account, linked in my cover letter)
  • Did initial fine-tuning (the Colab for it can be found in my cover letter)
  • Did some experimentation with parameters and compared the results
  • Improved some instructions for the LlamaEdge API server, and my PR got merged: Link
  • Fine-tuned with different iterations of the dataset to test how it changes performance

@staru09

staru09 commented Jun 1, 2024

Is there any test planned for this project?
