Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.

💻 Welcome to the "Efficiently Serving Large Language Models" course! Instructed by Travis Addair, Co-Founder and CTO at Predibase, this course will deepen your understanding of serving LLM applications efficiently.

Course Summary

In this course, you'll delve into the optimization techniques necessary to efficiently serve Large Language Models (LLMs) to a large number of users. Here's what you can expect to learn and experience:

  1. 🤖 Auto-Regressive Models: Understand how auto-regressive large language models generate text token by token.

  2. 💻 LLM Inference Stack: Implement foundational elements of a modern LLM inference stack, including KV caching, continuous batching, and model quantization (the first sketch after this list shows KV-cached, token-by-token decoding).

  3. 🛠️ LoRA Adapters: Explore the details of how Low Rank Adapters (LoRA) work and how batching techniques allow different LoRA adapters to be served to multiple customers simultaneously (see the LoRA sketches after this list).

  4. 🚀 Hands-On Experience: Get hands-on with Predibase’s LoRAX framework inference server to see optimization techniques in action.
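
To make the first two topics concrete, here is a minimal sketch of greedy, token-by-token decoding that reuses the KV cache, so each step only runs the model on the newest token instead of re-encoding the whole prefix. It assumes PyTorch and Hugging Face `transformers` are installed; the `gpt2` checkpoint, the prompt, and the 20-token limit are illustrative stand-ins, not course defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: greedy, token-by-token decoding with a reused KV cache.
# "gpt2" is a stand-in model; any causal LM from transformers works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Efficient LLM serving means", return_tensors="pt").input_ids
generated = input_ids
past_key_values = None  # the KV cache: keys/values for every token seen so far

with torch.no_grad():
    for _ in range(20):
        # With a cache, only the newest token needs a forward pass;
        # without it, the whole prefix would be re-encoded on every step.
        outputs = model(
            input_ids=generated if past_key_values is None else generated[:, -1:],
            past_key_values=past_key_values,
            use_cache=True,
        )
        past_key_values = outputs.past_key_values
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)

print(tokenizer.decode(generated[0]))
```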
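
The LoRA lessons build on one idea: a frozen base weight `W` is augmented with a trainable low-rank update `B @ A` of rank `r`, so each fine-tune adds only a small number of parameters. Below is a minimal sketch of that idea for a single linear layer in plain PyTorch; the dimensions, rank, and scaling convention are illustrative choices, not LoRAX's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank update (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base model weights stay frozen
        # Low-rank factors: the effective update is scaling * (B @ A), rank << d
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to using the merged weight W + scaling * (B @ A)
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```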
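
Because the base model is shared and only the small `(A, B)` pairs differ per customer, many adapters can be served inside one batch: each request looks up its own adapter and applies it with batched matrix multiplies. Here is a rough sketch of that idea; the shapes, adapter table, and lookup are purely illustrative, not how LoRAX stores adapters.

```python
import torch

d, r, n_adapters = 768, 8, 3
# Stand-ins for trained adapter weights (normally loaded per customer)
A = torch.randn(n_adapters, r, d) * 0.01
B = torch.randn(n_adapters, d, r) * 0.01

x = torch.randn(4, d)                     # hidden states for a batch of 4 requests
adapter_ids = torch.tensor([0, 2, 2, 1])  # which customer's adapter each request uses

# Gather each request's adapter and apply its low-rank update in one batched op;
# the frozen base model's output (omitted here) would be added to this delta.
delta = torch.bmm(
    torch.bmm(x.unsqueeze(1), A[adapter_ids].transpose(1, 2)),  # (4, 1, r)
    B[adapter_ids].transpose(1, 2),                             # (4, r, d)
).squeeze(1)
print(delta.shape)  # torch.Size([4, 768])
```

The hands-on lessons use Predibase’s LoRAX inference server to show how this style of multi-adapter batching works in practice, alongside KV caching, continuous batching, and quantization.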

Key Points

  • 🔎 Learn techniques like KV caching to speed up text generation in Large Language Models (LLMs).
  • 💻 Write code to efficiently serve LLM applications to a large number of users while considering performance trade-offs.
  • 🛠️ Explore the fundamentals of Low Rank Adapters (LoRA) and how Predibase implements them in the LoRAX framework inference server.

About the Instructor

🌟 Travis Addair is the Co-Founder and CTO at Predibase, bringing extensive expertise to guide you through efficiently serving Large Language Models (LLMs).

🔗 To enroll in the course or for further information, visit deeplearning.ai.