v0.1.33

@github-actions github-actions released this 28 Apr 17:51
· 196 commits to main since this release
9164b01

Llama 3

New models:

  • Llama 3: a new model by Meta, and the most capable openly available LLM to date.
  • Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
  • Moondream: a small vision language model designed to run efficiently on edge devices.
  • Llama 3 Gradient 1048K: a Llama 3 fine-tune by Gradient that supports a context window of up to 1M tokens.
  • Dolphin Llama 3: the uncensored Dolphin model, trained by Eric Hartford and based on Llama 3, with a variety of instruction, conversational, and coding skills.
  • Qwen 110B: the first Qwen model over 100B parameters in size, with outstanding performance in evaluations.

What's Changed

  • Fixed issues where the model would not terminate, causing the API to hang.
  • Fixed a series of out-of-memory errors on Apple Silicon Macs.
  • Fixed out-of-memory errors when running Mixtral architecture models.

Experimental concurrency features

New concurrency features are coming soon to Ollama. They are available now as experimental features, controlled by two environment variables:

  • OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously for a single model
  • OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously

For more info, see this guide. To enable these features, set the environment variables when running ollama serve:

OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
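
To see parallel request handling in action, a client needs to send several requests at once. The following is a minimal sketch, not an official client. It assumes the server listens on the default local address http://localhost:11434, that a model tagged llama3 has already been pulled, and that the helper names (server_env, build_request, generate, generate_many) are hypothetical names chosen for this illustration.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def server_env(num_parallel=4, max_loaded_models=4):
    """Copy the current environment and set the two experimental variables."""
    env = dict(os.environ)
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    env["OLLAMA_MAX_LOADED_MODELS"] = str(max_loaded_models)
    return env


def build_request(model, prompt):
    """Build a non-streaming POST request for the generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(model, prompt):
    """Send one prompt and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


def generate_many(model, prompts, workers=4):
    """Send all prompts concurrently; results come back in prompt order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: generate(model, p), prompts))


# Example usage (requires a running server):
#   subprocess.Popen(["ollama", "serve"], env=server_env())
#   answers = generate_many("llama3", ["Why is the sky blue?", "What is 2 + 2?"])
```

Keeping workers at or below the OLLAMA_NUM_PARALLEL value is a reasonable starting point, since the client then sends no more simultaneous requests than the server is configured to handle for a single model.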

New Contributors

Full Changelog: v0.1.32...v0.1.33