
LFX Mentorship (Jun-Aug, 2024): Finetune LLM models for Rust coding assistance #3371

Open

juntao opened this issue Apr 28, 2024 · 18 comments

Labels: enhancement (New feature or request), LFX Mentorship (Tasks for LFX Mentorship participants)

@juntao
Member

juntao commented Apr 28, 2024

Summary

WasmEdge is a lightweight inference runtime for AI and LLM applications. We want to build specialized, finetuned models for the WasmEdge community. The models should be supported by WasmEdge, and their applications should benefit the WasmEdge community.

In this project, we will build and compare two finetuned models for Rust coding assistance.

  • A code review model. It aims to be a new backend for the PR review bot we currently use in the community.
  • A QA model. It should be able to answer user questions about the Rust language and provide explanations. Our goal is to provide an alternative to our Learn Rust app.

Details

Objective 1: Code review model

Create a dataset with the following two fields

We are looking for at least 200 Q&A pairs. The total length of each pair should be less than 3,000 words.

Q: a code segment
A: explanation / review of the code

The Q&A pairs could come from Rust documentation such as Rust by Example and The Rust Programming Language.

Assemble the dataset into the llama3 chat template

It is similar to the following. Each entry should be on a single line, with line breaks denoted as \n.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a reviewer of Rust source code.<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ a code segment }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ explanation / review of the code }}<|eot_id|>
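As an illustration, assembling the pairs into one-line template entries could be scripted as follows. This is only a sketch: the function and parameter names are made up for this example, and only the template text above comes from the issue.

```python
# Sketch: fill the llama3 chat template with a (code, review) pair and
# escape real newlines so each dataset entry occupies exactly one line.
TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are a reviewer of Rust source code.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{code}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{review}<|eot_id|>"
)

def to_entry(code: str, review: str) -> str:
    # Replace literal newlines with the two-character sequence "\n"
    # so the whole entry sits on a single line, as required above.
    return TEMPLATE.format(code=code, review=review).replace("\n", "\\n")
```

Each resulting string can then be written out as one line of the training file.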

Finetune

We will finetune based on the llama3-8b-instruct model.

You are free to use any finetuning tools. If you are unsure, we recommend llama.cpp's finetune utility, which can run on CPUs; see an example. We will provide the computing resources required for the finetuning.

Objective 2: Code QA model

Create a dataset with the following three fields

We are looking for at least 100 chapter + Q + A rows.

C: A chapter from a Rust book
Q: A question related to the chapter
A: Explanation / answer for the question

You could use ChatGPT to generate these questions and answers based on the chapter content.

Assemble the dataset into the llama3 chat template

It is similar to the following. Each entry should be on a single line, with line breaks denoted as \n.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an expert of the Rust language. Please answer question based on the context below.\n---------\n{{ book chapter }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ a question related to the chapter }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ explanation / answer for the question }}<|eot_id|>
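By analogy with objective 1, the chapter text goes into the system message and the question/answer fill the user/assistant turns. A minimal sketch (function and parameter names are assumptions for illustration):

```python
# Sketch: fill the llama3 chat template with a (chapter, question, answer)
# row, then escape real newlines so each entry occupies exactly one line.
QA_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are an expert of the Rust language. Please answer question based on "
    "the context below.\n---------\n{chapter}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{answer}<|eot_id|>"
)

def to_qa_entry(chapter: str, question: str, answer: str) -> str:
    filled = QA_TEMPLATE.format(chapter=chapter, question=question, answer=answer)
    return filled.replace("\n", "\\n")  # one line per entry
```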

Finetune

Due to the chapter-long context length used in this dataset, we will finetune based on the 262k long context length llama3-8b-instruct model.

You are free to use any finetuning tools. If you are unsure, we recommend llama.cpp's finetune utility, which can run on CPUs; see an example. We will provide the computing resources required for the finetuning.

Objective 3: Compare the two finetuned models

Start the finetuned models using the LlamaEdge API server, and test them on commonly used scenarios.
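For the comparison, one option is to send the same prompts to both models through the OpenAI-compatible chat endpoint that the LlamaEdge API server exposes and compare the answers side by side. A sketch follows; the base URL, port, and model names here are assumptions and depend on how you start each server.

```python
# Sketch: query a model served by the LlamaEdge API server via its
# OpenAI-compatible chat completions endpoint.
import json
import urllib.request

def build_request(prompt: str, model: str) -> dict:
    # OpenAI-style chat payload: a single user turn.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(base_url: str, prompt: str, model: str) -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Running the same list of test prompts through `ask()` against each server (e.g. one per finetuned model) gives paired outputs that can be compared manually or scored.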

LFX

Expected outcome: Two finetuned models based on Llama3-8b for Rust code review and QA.

Recommended skills:

  • Rust language
  • ChatGPT and Claude
  • LlamaEdge
  • llama.cpp

Mentor:

Application link: https://mentorship.lfx.linuxfoundation.org/project/d52d172e-598d-4817-be97-3338d5b96432

Appendix

@juntao juntao added the enhancement New feature or request label Apr 28, 2024
@juntao juntao changed the title feat: Create an LLM agent for Rust code review feat: Finetune LLM models for Rust coding assistance Apr 28, 2024
@juntao juntao added the LFX Mentorship Tasks for LFX Mentorship participants label Apr 28, 2024
@DhruvSinghiitmandi

DhruvSinghiitmandi commented Apr 28, 2024

Hi @juntao!
I am Dhruv, currently pursuing my undergrad in CS at IIT Mandi.
I am deeply interested in this mentorship project, and I have some experience in creating datasets for finetuning LLMs. Recently I did a project where I created a medical conversation dataset from the DDXPlus dataset using GPT-3.5, and then finetuned MedLLaMA-2 (LLaMA-2 finetuned on the MedQA dataset) on a single RTX 3090 using QLoRA, with further quantization using llama.cpp.

Could you please let me know if there's a pretest or any other steps I should take to participate?

@hydai hydai changed the title feat: Finetune LLM models for Rust coding assistance LFX Mentorship (Jun-Aug, 2024): Finetune LLM models for Rust coding assistance May 2, 2024
@debrupf2946

@juntao
I am Debrup, pursuing engineering at BITS Pilani, and a contributor at Keras. I like to solve NLP problems and have worked extensively with Hugging Face (transformers, PEFT, datasets), LangChain, LlamaIndex (data ingestion and indexing), knowledge graphs (Neo4j), RAG, and Rust.
I was also selected for GSoC for a similar project: building a coding assistant using knowledge graphs that assists with QA and summarization tasks for any GitHub repo.
I find this project interesting and similar, and I want to contribute and take part in the LFX Mentorship program.
gsoc_pr

@debrupf2946

@juntao Is there any community (Discord, Slack, ...) where we contributors can interact? What are the timelines and selection criteria for this project? I am really excited about it.
If possible, could you share a cover letter from the previous term? I wanted to get an idea of its length and content.

@angad-singhh

angad-singhh commented May 14, 2024

Hey @juntao | @hydai
I'm Angad Singh, a final-year grad student. I have gone through all four LFX mentorship projects that WasmEdge is participating in. As there is no pre-test this time, I will try to explore the whole Wasm ecosystem, as well as this project.

I will go through all the resources and links mentioned above and update you both with my progress here to improve my chances of being selected for the LFX program.

@angad-singhh

angad-singhh commented May 16, 2024

Progress

I have started working and exploring the first project objective that is:

A code review model. It aims to be a new backend for the PR review bot we currently use in the community.

What I plan to do:

  1. Start by exploring the current PR review bot used in the community and why we need to improve it.
  2. Then try setting up the current PR review bot on my own GitHub account.
  3. Analyse prior PRs submitted on the WasmEdge repo and understand how the current PR review bot responds in each PR.

What I did:

I was able to complete all the planned steps mentioned above; below are my findings so far about the current PR review bot.

Findings | please verify this part (@juntao @hydai )

  • First, I noticed that WasmEdge currently uses the flow template github-pr-summary, written in Rust and hosted on the WasmEdge runtime on flows.network. There is also a flow template named github-pr-review with slightly different behavior.
  • The current PR bot (github-pr-summary) collects the patches of the commits in a PR and sends them to OpenAI to summarize each commit; it then combines these summaries into a single summary and posts back the combined summary plus the summary of each individual commit.
  • Note that it automatically updates the PR review comment whenever a new commit is made in the PR, appending the detailed summary/review of that commit.

Issues

  1. I was able to set up the PR review bot on my own GitHub account, but there was a small issue with OpenAI API key authentication, for which I have raised a detailed issue here.
  2. While exploring previous PRs in the WasmEdge repo, I noticed that the PR review bot has not produced any output on PRs made after March 2024, as in these PRs: 1, 2, 3, ...

[Screenshot: flows.network setup]

@angad-singhh

angad-singhh commented May 16, 2024

Next

I'm currently exploring the fine-tuning part. I will try to set up the fine-tuned chemistry assistant you provided here, make some changes, and understand how it works.

Also, I will be exploring llama.cpp, its workings and docs, to get a better understanding, and I will update you all with any further progress.

Lastly, I will look for any { good first issues } in code/docs and work on them.

@codemaster1104

codemaster1104 commented May 16, 2024

Hello @juntao @hydai, I am Akshat Shrivastava, a sophomore at IIT BHU (Varanasi). In the past few days I have understood the workflow of finetuning and running LLMs locally. I was wondering whether, for the current task, I could use llama-3-8b-bnb-4bit, as it is a direct modification of Llama3-8B quantized to 4-bit, so I can run everything smoothly on my current hardware.

@angad-singhh

angad-singhh commented May 16, 2024

Hello @juntao @hydai, I am Akshat Shrivastava, a sophomore at IIT BHU (Varanasi). In the past few days I have understood the workflow of finetuning and running LLMs locally. I was wondering whether, for the current task, I could use llama-3-8b-bnb-4bit, as it is a direct modification of Llama3-8B quantized to 4-bit, so I can run everything smoothly on my current hardware.

Hey @codemaster1104,
It has been mentioned in the project issue that we are supposed to use Llama-3-8B-Instruct-GGUF for fine-tuning.

As for the tool, llama.cpp is the preferred one because it can run on CPUs; also, the resources required for fine-tuning will be provided by the organization itself.

I hope this clears up some of your doubts!

@SHAIMA-HAQUE

Hi, @juntao. I am interested in this project.

I went through the example for fine-tuning the model and also used the LlamaEdge API server to run the chat in my browser.
[Screenshots: LlamaEdge chat running in the browser]

I did not face any major issues while going through this. Although I was initially confused about how the LlamaEdge API works, this demo video from KubeCon, https://www.youtube.com/watch?v=KTquzmXVj9o, helped me see concretely what to expect when running it.

Do you suggest anything else I should look into to understand the tasks better?

@angad-singhh

angad-singhh commented May 17, 2024

Progress:

  1. I was able to successfully set up and run the required llama-3-8B-Instruct-GGUF model for this project on my local machine using the LlamaEdge environment.
  2. Next, I went through the LlamaEdge repo, the LlamaEdge API server, and how it works.
  3. I also found an issue in the WasmEdge docs, discussed here in the comments.
  4. Understood the concept of quantization and the different types of quantization processes.

[Screenshots: llama-3-8b-Instruct-GGUF test setup]

@codemaster1104

[Screenshot from 2024-05-18 01-59-41]
So far I have made a small dataset and have successfully converted it to the required prompt template, for example:

['You are a reviewer of Rust source code.\n\n\n### Code:\nprintln!({} days, 31);\n\n### Explanation:\nPrints 31 days using {} as a placeholder for the value (31).<|end_of_text|>', 'You are a reviewer of Rust source code.\n\n\n### Code:\nprintln!({0}, this is {1}. {1}, this is {0}, Alice, Bob);\n\n### Explanation:\nPrints Alice, this is Bob. Bob, this is Alice using positional arguments ({0}, {1}). Order matters!<|end_of_text|>', 'You are a reviewer of Rust source code.\n\n\n### Code:\nprintln!({subject} {verb} {object}, subject=the lazy dog, verb=jumps over, object=quick brown fox);\n\n### Explanation:\nPrints a sentence with named arguments, improving readability.<|end_of_text|>', "You are a reviewer of Rust source code.\n\n\n### Code:\nlet logical: bool = true;\n\n### Explanation:\nDeclares a boolean variable named 'logical' and initializes it with the value 'true'.<|end_of_text|>", "You are a reviewer of Rust source code.\n\n\n### Code:\nlet a_float: f64 = 1.0;\n\n### Explanation:\nDeclares a floating-point variable named 'a_float' and initializes it with the value '1.0'. The type 'f64' represents a 64-bit floating-point number.<|end_of_text|>",

Tokenizer is also working fine:
{'input_ids': tensor([[128000, 2675, 527, 264, 56614, 315, 34889, 2592, 2082,
4286, 14711, 6247, 512, 34755, 0, 2358, 92, 2919,
11, 220, 2148, 629, 14711, 72387, 512, 9171, 82,
220, 2148, 2919, 1701, 4792, 439, 264, 6002, 369,
279, 907, 320, 2148, 570]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

Next, I will work on training the model with the current dataset, and then move on to improving my dataset.

@codemaster1104

codemaster1104 commented May 18, 2024

[Screenshot from 2024-05-18 14-13-37]
I successfully fine-tuned the llama3-8b-Instruct model with my small dataset. The attached image shows the training loss, which is decreasing steadily, a good sign. Next, I will work on creating the final dataset and fine-tuning on it.

@juntao, could you please guide us on the metrics and tools we could use to compare the models? Currently I assume we would manually compare answers to specific questions from both models, but I am sure there is a better way.

@codemaster1104
Copy link

[Images: ll2, mm2]
These are direct comparisons between vanilla llama3-8b-Instruct and my fine-tuned model for code explanation. My model appears to be much more direct and to the point, which was expected, as my training dataset is similarly direct and to the point.

@juntao @hydai, I really need your guidance here: is such direct behaviour desired? In my opinion, a PR review bot should give on-point explanations. If this behaviour is desired, I will start working on my final dataset with similar types of explanations.

@codemaster1104

codemaster1104 commented May 22, 2024

[Image: Frame 1]
I have created my dataset for objective 2 and did some experimentation, and found that if I use the 262k context length as asked in the objective, my fine-tuned model starts hallucinating on even simple prompts, whereas with smaller context lengths the model performs just fine. I have also attached a snippet of my dataset.
[Image: QA_dataset]
Am I going in the wrong direction with my dataset? I don't fully understand the statement "Due to the chapter-long context length used in this dataset" in objective 2. Could more clarification be provided on how the dataset for this objective is supposed to be made? @juntao @hydai

@codemaster1104

[Screenshot from 2024-05-22 17-11-11]
Got LlamaEdge up and running; inference via the chat endpoint for my fine-tuned model is working fine.

@pawaspy

pawaspy commented May 25, 2024

Hey @juntao, I have a keen interest in this project. I have good hands-on experience with large language models and have made several projects similar to this QA project, which use the OpenAI API to answer up-to-date questions. I would be really glad to work on this project and refine my knowledge. Could you tell me how I could interact with the members and other participants?

@codemaster1104

[Image: Frame 2]
These are my results so far for the fine-tuned models of objective 1 and objective 2. Both models are running on the LlamaEdge API server. For the objective 2 model, I improved the dataset by adding the content of an entire chapter in the chapter field; as a result, some of my fields exceeded 50,000 words, and to fine-tune it I had to reduce the context length to 131072 and change other parameters from the recommended values to avoid running out of RAM.

My model for objective 2 is still hallucinating, and it seems to get stuck on the context of previous questions. I would like to fine-tune this model with the recommended parameters first, for which I will need more computing resources. I have also started reading papers on the topic to get a better understanding.

My progress

  • Created datasets for both objectives (they can be viewed on my Hugging Face account, linked in my cover letter)
  • Did initial fine-tuning (the Colab for it can be found in my cover letter)
  • Did some experimentation with parameters and compared the results
  • Improved some instructions for the LlamaEdge API server, and my PR got merged: Link
  • Fine-tuned with different iterations of the dataset to test how it changes performance

@staru09

staru09 commented Jun 1, 2024

Is there any test planned for this project?
