Fine-tuning codellama dataset #1

Open · rajivpoddar opened this issue Sep 21, 2023 · 14 comments
@rajivpoddar

Is there a particular dataset format required for finetuning codellama? I have the dataset in the OpenAI-suggested format, which is basically a JSONL file where each entry is an object of the form messages: [{role: 'system', content: ''}, {role: 'user', content: ''}, {role: 'assistant', content: ''}]. Will this format work?
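
For reference, a single record in that JSONL format could be produced like this (a minimal Python sketch; the file name and message contents are made up):

import json

# One line of the OpenAI-style chat JSONL: one JSON object per line,
# each holding a "messages" list of role/content pairs.
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Reverse a string in Python."},
        {"role": "assistant", "content": "s[::-1]"},
    ]
}
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")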

@okuvshynov
Owner

I've only tried it on plain text so far. Do you have an example, a link to a more detailed format description, and/or some open dataset in that format so I can check?

@Naozumi520

Naozumi520 commented Sep 27, 2023

Can I use the Alpaca dataset JSON format?

[
  {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night."
  },
  {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow. These colors are called primary because they cannot be created by mixing other colors and all other colors can be made by combining them in various proportions. In the additive color system, used for light, the primary colors are red, green, and blue (RGB)."
  },
  {
    "instruction": "Describe the structure of an atom.",
    "input": "",
    "output": "An atom is the basic building block of all matter and is made up of three types of particles: protons, neutrons, and electrons. The structure of an atom can be described as a nucleus at the center surrounded by a cloud of electrons.\n\nThe nucleus of an atom is made up of protons and neutrons. Protons are positively charged particles and neutrons are neutral particles with no charge. Both of these particles are located in the nucleus of the atom, which is at the center of the atom and contains most of the atom's mass.\n\nSurrounding the nucleus of the atom is a cloud of electrons. Electrons are negatively charged particles that are in constant motion around the nucleus. The electron cloud is divided into shells or orbitals, and each shell can hold a certain number of electrons. The number of electrons in the outermost shell, called the valence shell, determines the chemical properties of the atom. \n\nIn a neutral atom, the number of protons in the nucleus is equal to the number of electrons in the electron cloud, so the positive and negative charges balance out and the atom has no overall charge. The number of protons, also called the atomic number, determines what element the atom is."
  }
]
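
Since the current code only handles plain text, one way to use Alpaca-style records would be to flatten them first. A minimal sketch, assuming the file name and a prompt template loosely following the Alpaca convention (this is not slowllama's actual loader):

import json

# Flatten Alpaca-style records into plain text for a plaintext
# finetuning pipeline. "alpaca_data.json" and the template below
# are assumptions for illustration.
with open("alpaca_data.json") as f:
    records = json.load(f)

texts = []
for r in records:
    if r["input"]:
        texts.append(f"### Instruction:\n{r['instruction']}\n\n"
                     f"### Input:\n{r['input']}\n\n### Response:\n{r['output']}")
    else:
        texts.append(f"### Instruction:\n{r['instruction']}\n\n"
                     f"### Response:\n{r['output']}")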

@okuvshynov
Owner

@Naozumi520 - let me try that out and update if needed

@okuvshynov
Owner

@Naozumi520 -- needs some work, but should be possible. Will update here once I get it running successfully.

@okuvshynov
Owner

Testing a similar dataset (https://huggingface.co/datasets/databricks/databricks-dolly-15k) in https://github.com/okuvshynov/slowllama/tree/try_dolly

@okuvshynov
Owner

It works in the sense that the loss is going down. Here's the log of fine-tuning llama7b on the first 100 samples from dolly15k:

2023-09-29 20:16:01,559 backprop done, loss after forward pass = 1.4820520877838135
2023-09-29 20:18:45,965 backprop done, loss after forward pass = 1.5263310670852661
2023-09-29 20:21:30,439 backprop done, loss after forward pass = 1.518573522567749
2023-09-29 20:24:15,125 backprop done, loss after forward pass = 1.504337191581726
2023-09-29 20:26:59,292 backprop done, loss after forward pass = 1.4659136533737183
2023-09-29 20:29:43,427 backprop done, loss after forward pass = 1.351263403892517
2023-09-29 20:32:27,682 backprop done, loss after forward pass = 1.4510427713394165
2023-09-29 20:35:11,790 backprop done, loss after forward pass = 1.3999525308609009
2023-09-29 20:37:56,044 backprop done, loss after forward pass = 1.481783390045166
2023-09-29 20:40:40,069 backprop done, loss after forward pass = 1.4426220655441284
2023-09-29 20:43:24,492 backprop done, loss after forward pass = 1.6404242515563965
2023-09-29 20:46:08,789 backprop done, loss after forward pass = 1.3555759191513062
2023-09-29 20:48:53,262 backprop done, loss after forward pass = 1.4560085535049438
2023-09-29 20:51:37,717 backprop done, loss after forward pass = 1.3061898946762085
2023-09-29 20:54:22,130 backprop done, loss after forward pass = 1.2744556665420532
2023-09-29 20:57:06,154 backprop done, loss after forward pass = 1.5918478965759277
2023-09-29 20:59:50,252 backprop done, loss after forward pass = 1.434745192527771
2023-09-29 21:02:34,730 backprop done, loss after forward pass = 1.3543779850006104
2023-09-29 21:05:19,146 backprop done, loss after forward pass = 1.5274088382720947
2023-09-29 21:08:03,576 backprop done, loss after forward pass = 1.5886287689208984
2023-09-29 21:16:07,726 backprop done, loss after forward pass = 1.475629448890686
2023-09-29 21:18:51,732 backprop done, loss after forward pass = 1.0501446723937988
2023-09-29 21:21:35,714 backprop done, loss after forward pass = 1.540245771408081
2023-09-29 21:24:19,404 backprop done, loss after forward pass = 1.3473966121673584
2023-09-29 21:27:03,198 backprop done, loss after forward pass = 1.5435492992401123
2023-09-29 21:29:46,962 backprop done, loss after forward pass = 1.4546515941619873
2023-09-29 21:32:30,513 backprop done, loss after forward pass = 1.342623233795166
2023-09-29 21:35:14,475 backprop done, loss after forward pass = 1.1281847953796387
2023-09-29 21:37:58,219 backprop done, loss after forward pass = 1.432375192642212
2023-09-29 21:40:42,139 backprop done, loss after forward pass = 1.2846266031265259
2023-09-29 21:43:25,957 backprop done, loss after forward pass = 1.2692879438400269
2023-09-29 21:46:09,749 backprop done, loss after forward pass = 1.5995171070098877
2023-09-29 21:48:53,466 backprop done, loss after forward pass = 1.1306545734405518
2023-09-29 21:51:37,299 backprop done, loss after forward pass = 0.7813082933425903
2023-09-29 21:54:20,902 backprop done, loss after forward pass = 1.208915114402771
2023-09-29 21:57:04,558 backprop done, loss after forward pass = 1.2014524936676025
2023-09-29 21:59:48,101 backprop done, loss after forward pass = 1.4039008617401123
2023-09-29 22:02:31,802 backprop done, loss after forward pass = 1.302533507347107
2023-09-29 22:05:15,574 backprop done, loss after forward pass = 1.2679609060287476
2023-09-29 22:07:59,473 backprop done, loss after forward pass = 1.1783111095428467
2023-09-29 22:15:56,178 backprop done, loss after forward pass = 1.2000060081481934
2023-09-29 22:18:39,904 backprop done, loss after forward pass = 1.295318841934204
2023-09-29 22:21:23,685 backprop done, loss after forward pass = 1.1048883199691772
2023-09-29 22:24:07,373 backprop done, loss after forward pass = 1.2765086889266968
2023-09-29 22:26:51,062 backprop done, loss after forward pass = 1.2825970649719238
2023-09-29 22:29:35,141 backprop done, loss after forward pass = 1.2523887157440186
2023-09-29 22:32:18,925 backprop done, loss after forward pass = 1.2723214626312256
2023-09-29 22:35:02,857 backprop done, loss after forward pass = 1.279806137084961
2023-09-29 22:37:46,635 backprop done, loss after forward pass = 0.8528009057044983
2023-09-29 22:40:30,629 backprop done, loss after forward pass = 1.33704674243927
2023-09-29 22:43:14,493 backprop done, loss after forward pass = 1.0138554573059082
2023-09-29 22:45:58,450 backprop done, loss after forward pass = 1.1424448490142822
2023-09-29 22:48:42,447 backprop done, loss after forward pass = 0.7578861713409424
2023-09-29 22:51:26,508 backprop done, loss after forward pass = 1.2063584327697754
2023-09-29 22:54:10,273 backprop done, loss after forward pass = 1.001901388168335
2023-09-29 22:56:54,241 backprop done, loss after forward pass = 1.0977585315704346
2023-09-29 22:59:37,826 backprop done, loss after forward pass = 1.0939297676086426
2023-09-29 23:02:21,662 backprop done, loss after forward pass = 0.9243329167366028
2023-09-29 23:05:05,658 backprop done, loss after forward pass = 0.9887094497680664
2023-09-29 23:07:49,294 backprop done, loss after forward pass = 1.1260327100753784
2023-09-29 23:15:52,142 backprop done, loss after forward pass = 1.0478556156158447
2023-09-29 23:18:36,116 backprop done, loss after forward pass = 0.8907672762870789
2023-09-29 23:21:19,710 backprop done, loss after forward pass = 1.172255516052246
2023-09-29 23:24:03,280 backprop done, loss after forward pass = 0.9995911717414856
2023-09-29 23:26:46,982 backprop done, loss after forward pass = 0.8766883611679077
2023-09-29 23:29:30,923 backprop done, loss after forward pass = 0.7651273012161255
2023-09-29 23:32:14,824 backprop done, loss after forward pass = 0.7353774309158325
2023-09-29 23:34:58,596 backprop done, loss after forward pass = 0.9498685598373413
2023-09-29 23:37:42,186 backprop done, loss after forward pass = 0.8874177932739258
2023-09-29 23:40:26,129 backprop done, loss after forward pass = 1.0564229488372803
2023-09-29 23:43:09,843 backprop done, loss after forward pass = 0.7835224866867065
2023-09-29 23:45:53,778 backprop done, loss after forward pass = 0.4765768051147461
2023-09-29 23:48:37,538 backprop done, loss after forward pass = 0.8063074350357056
2023-09-29 23:51:21,305 backprop done, loss after forward pass = 0.8783722519874573
2023-09-29 23:54:05,004 backprop done, loss after forward pass = 0.6556289196014404
2023-09-29 23:56:48,755 backprop done, loss after forward pass = 0.9239391088485718
2023-09-29 23:59:32,323 backprop done, loss after forward pass = 0.8580363988876343
2023-09-30 00:02:16,070 backprop done, loss after forward pass = 0.8075824975967407

There are still a few important things to improve, but you can already try it by doing something like:

python prepare_model.py   # <-- use the path where you have the model
python finetune_dolly.py  # <-- change the dataset and path as well

@Naozumi520

Wow, thank you!

@Naozumi520

I tried it on my Intel Mac, as my M2 Mac is not at home. However, I got the error "MPS backend out of memory". Do I have to use the M2 Mac?

@okuvshynov
Owner

okuvshynov commented Sep 30, 2023

I don't know much about Intel Macs - I assumed the 'mps' device referred to the GPU first introduced in Apple silicon devices, starting with the Apple M1. Maybe that framework works with older GPUs too, not sure. Could you post your Intel Mac config? Maybe it's possible to run it on its GPU? Running on the CPU will likely be way too slow.
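
You can check whether the MPS backend is usable at all from Python (these calls exist in PyTorch 1.12+):

import torch

# Is this PyTorch build compiled with MPS support, and is the device usable?
print(torch.backends.mps.is_built())
print(torch.backends.mps.is_available())
device = "mps" if torch.backends.mps.is_available() else "cpu"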

@Naozumi520

Yes, MPS works with older GPUs too, including Intel Macs (AMD GPUs). My Mac has an i9 CPU with a 5500M GPU (4 GB VRAM). Apple silicon uses RAM as VRAM, so the MPS backend memory should not be a problem there. However, the Intel Mac I'm using now has only 4 GB of VRAM, and because of the long holiday in Hong Kong right now I couldn't get my M2 Mac. :(

@okuvshynov
Owner

I see. 4 GB is a little too low - it might still be possible to get it working with a short sequence length + tiny batch size (=1). In that case it is probably critical to implement gradient accumulation, though.
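
For illustration, gradient accumulation in a generic PyTorch loop looks roughly like this (the model, data, and step counts are stand-ins, not slowllama's actual code):

import torch
import torch.nn as nn

model = nn.Linear(16, 1)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 8  # effective batch = micro-batch size (1) * accum_steps

optimizer.zero_grad()
for step in range(32):
    x, y = torch.randn(1, 16), torch.randn(1, 1)  # micro-batch of size 1
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so accumulated grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()   # one optimizer step per accum_steps micro-batches
        optimizer.zero_grad()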

@okuvshynov
Owner

It should also be possible to do a 2-level prefetch (HDD -> RAM -> VRAM); with unified memory I did everything in one level.
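
Roughly, the idea looks like this (illustrative only - tiny random "blocks" stand in for per-block weight files, and the real version would overlap the loads with compute):

import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"
paths = [f"block_{i}.pt" for i in range(4)]
for p in paths:  # fake per-block weight files so the sketch is self-contained
    torch.save(torch.randn(8, 8), p)

staged = torch.load(paths[0], map_location="cpu")      # level 1: HDD -> RAM
for i in range(len(paths)):
    weights = staged.to(device)                        # level 2: RAM -> VRAM
    if i + 1 < len(paths):
        staged = torch.load(paths[i + 1], map_location="cpu")  # prefetch next
    # ... run this block's forward/backward with `weights` ...
    del weights  # keep only one block's weights in VRAM at a time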

@okuvshynov
Owner

@Naozumi520 -- after b87cd7c it's possible to do both storage and finetuning in the fp16 datatype, which cuts both compute and RAM requirements considerably.
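
The memory effect is easy to see in isolation (illustrative numbers):

import torch

w32 = torch.randn(4096, 4096)        # fp32: 4 bytes/element -> 64 MiB
w16 = w32.to(torch.float16)          # fp16: 2 bytes/element -> 32 MiB
print(w32.element_size(), w16.element_size())  # 4 2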

@Naozumi520

Wow, thank you!
