Understanding the Instruction Mixture for Large Language Model Fine-tuning

1. Benchmark Performance and Alignment Skills

Benchmarks

LLaMA-2-7B

Data	ARC	Wino-grande	PIQA	MMLU	Race	Hella-Swag	Average	Human @1	Eval @10
None	43.09	69.53	77.97	40.81	39.23	57.20	54.64	13.72	21.34
A	47.78	67.64	78.24	42.19	44.50	61.09	56.91	13.48	17.07
C	46.08	69.46	78.50	40.99	41.05	60.96	56.17	16.22	24.39
P	49.57	71.43	79.00	45.98	43.45	59.44	58.15	4.63	7.93
AC	47.10	66.93	78.13	40.42	44.21	59.70	56.08	17.50	25
AP	48.38	70.01	78.07	43.84	42.87	58.46	56.94	13.84	17.68
CP	47.95	71.27	78.40	44.91	44.40	60.69	57.94	16.77	20.12
ACP	49.66	68.03	77.86	43.52	44.59	58.73	57.07	15.98	23.78

LLaMA-2-13B

Data	ARC	Wino-grande	PIQA	MMLU	Race	Hella-Swag	Average	Human @1	Eval @10
None	48.55	71.90	79.16	52.12	40.67	60.12	58.75	15.43	26.22
A	54.10	71.19	80.03	47.86	47.08	65.58	60.97	15.06	20.73
C	49.66	73.40	80.79	51.50	45.36	63.63	60.72	17.87	24.39
P	54.27	74.19	80.03	50.30	45.55	62.46	61.13	0.30	1.83
AC	51.62	68.75	80.58	48.68	44.40	62.97	59.50	17.07	27.44
AP	54.79	71.74	80.30	51.15	45.17	62.72	60.98	8.29	14.63
CP	55.38	74.59	80.52	51.42	45.55	63.85	61.89	18.23	25
ACP	54.44	71.51	80.03	49.98	47.08	63.14	61.03	20.24	32.93

Alignment Skills

LLaMA-2-7B

Data	Corr.	Fact.	Comm.	Compr.	Compl.	Insight.	Read.	Conc.	Avg.
A	47.6	55.4	58.8	54.8	48.0	50.4	88.0	81.6	60.6
C	48.8	52.0	58.4	52.0	40.2	46.2	83.8	78.4	57.4
P	47.2	40.0	48.8	38.4	29.0	30.4	64.4	68.6	45.8
AC	49.0	54.4	59.6	56.4	48.2	49.8	86.6	85.6	61.2
AP	48.4	51.4	57.6	52.6	45.0	46.0	84.2	80.8	58.2
CP	47.0	49.6	54.2	48.8	36.2	41.8	78.2	77.2	54.2
ACP	50.4	53.0	59.0	53.8	47.2	46.8	85.0	81.8	59.6

LLaMA-2-13B

Data	Corr.	Fact.	Comm.	Compr.	Compl.	Insight.	Read.	Conc.	Avg.
A	53.6	58.8	63.8	60.0	47.6	55.2	89.2	84.0	64.0
C	57.2	58.8	61.0	57.8	43.8	52.4	85.6	82.2	62.4
P	49.4	42.4	51.8	42.0	28.2	32.0	66.8	70.4	47.8
AC	55.6	61.0	66.6	61.2	51.4	54.0	88.4	86.6	65.6
AP	53.0	55.4	60.6	56.2	47.0	48.0	85.0	83.4	61.0
CP	53.0	53.2	57.4	53.4	39.0	45.2	81.2	82.6	58.2
ACP	51.6	55.6	61.8	57.0	47.0	48.6	87.0	83.0	61.4

2. Training

The following is a command using deepspeed with 4 GPUs, training LLaMA-2-7B on Alpaca dataset.

deepspeed --num_gpus=4 train.py \
  --model_name_or_path meta-llama/Llama-2-7b \
  --deepspeed src/deepspeed_z3_config.json \
  --architecture causal \
  --output_dir /ckpts/Llama-2-7b-A \
  --save_strategy no \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --num_p3_data 0 \
  --num_code_data 0 \
  --num_instruction_data 20000 \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 16 \
  --num_train_epochs 2 \
  --gradient_checkpointing False \
  --bf16 \
  --logging_steps 10

3. Checkpoints

LLaMA-2-7B	LLaMA-2-13B
Llama-2-7b-A	Llama-2-13b-A
Llama-2-7b-C	Llama-2-13b-C
Llama-2-7b-P	Llama-2-13b-P
Llama-2-7b-AC	Llama-2-13b-AC
Llama-2-7b-AP	Llama-2-13b-AP
Llama-2-7b-ACP	Llama-2-13b-ACP

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
train.py		train.py
train_lora.py		train_lora.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

scripts

scripts

src

src

.gitignore

.gitignore

README.md

README.md

train.py

train.py

train_lora.py

train_lora.py

Repository files navigation

Understanding the Instruction Mixture for Large Language Model Fine-tuning

1. Benchmark Performance and Alignment Skills

Benchmarks

LLaMA-2-7B

LLaMA-2-13B

Alignment Skills

LLaMA-2-7B

LLaMA-2-13B

2. Training

3. Checkpoints

About

Releases

Packages

Languages

Reason-Wang/InstructLLM

Folders and files

Latest commit

History

Repository files navigation

Understanding the Instruction Mixture for Large Language Model Fine-tuning

1. Benchmark Performance and Alignment Skills

Benchmarks

LLaMA-2-7B

LLaMA-2-13B

Alignment Skills

LLaMA-2-7B

LLaMA-2-13B

2. Training

3. Checkpoints

About

Topics

Resources

Stars

Watchers

Forks

Languages