I was reading the Simple and Scalable Strategies to Continually Pre-train Large Language Models paper (Ibrahim et al. 2024) that came out 2 weeks ago, and interestingly, the linear warmup + cosine decay schedule I am using for the pretraining here (moved to Appendix D since Chapter 5 was exceeding the Manning page limits) is exactly the same schedule that works well for continued pretraining, too!
I.e., the schedule we are using in Appendix D is as follows:
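In code, a minimal sketch of that schedule would look something like the following (the function name `lr_schedule` and the concrete hyperparameter values are just illustrative placeholders, not the exact ones from Appendix D):

```python
import math

def lr_schedule(step, total_steps, warmup_steps, peak_lr, min_lr):
    # Linear warmup: ramp the learning rate from 0 up to peak_lr
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Cosine decay: anneal from peak_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```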
For continued pretraining, you would simply repeat it:
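Again as an illustrative sketch (reusing the hypothetical `lr_schedule` from above; the toy model and step counts are placeholders, not values from the paper or the appendix):

```python
import torch

# Toy stand-in for a model loaded from the pretrained checkpoint
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0)

warmup_steps, total_steps = 20, 100
peak_lr, min_lr = 5e-4, 5e-5

# Both the initial pretraining run and the continued-pretraining run
# on the new dataset cycle through the identical warmup + decay schedule
for phase in ("pretraining", "continued pretraining"):
    for step in range(total_steps):
        lr = lr_schedule(step, total_steps, warmup_steps, peak_lr, min_lr)
        for group in optimizer.param_groups:
            group["lr"] = lr
        # ... forward pass, loss.backward(), optimizer.step(), etc.
```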
I thought that's a nice tidbit to share 😊.
I also uploaded a longer write-up discussing that research paper here: Tips for LLM Pretraining and Evaluating Reward Models