# News

  • [2023/09] The newest `llama2-wrapper>=0.1.14` supports llama.cpp's GGUF models.
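
    Since a local models directory may hold both older GGML files and newer GGUF files, a quick header check can tell them apart before handing a path to the backend. This is a standalone sketch, not part of llama2-wrapper; the magic bytes are taken from the GGUF/GGML file formats:

    ```python
    def model_file_format(path):
        """Guess a llama.cpp model file's format from its leading magic bytes.

        GGUF files begin with the ASCII magic "GGUF"; earlier GGML files
        store the uint32 0x67676d6c ("ggml") little-endian, so the first
        four bytes on disk read "lmgg".
        """
        with open(path, "rb") as f:
            magic = f.read(4)
        if magic == b"GGUF":
            return "gguf"
        if magic == b"lmgg":
            return "ggml"
        return "unknown"
    ```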

  • [2023/08] 🔥 For developers, we offer a web server that acts as a drop-in replacement for the OpenAI API.

    • Usage:

      python3 -m llama2_wrapper.server
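
      Once the server is running, any OpenAI-style client can talk to it. Below is a hedged sketch of the request shape a drop-in replacement for the OpenAI API is expected to accept; the model name, field layout, and endpoint in the note are assumptions based on the OpenAI chat completions API, not llama2-wrapper's documented interface:

      ```python
      import json

      def build_chat_request(user_message, model="llama-2-7b-chat", temperature=0.9):
          """Build an OpenAI-style chat completion request body.

          Field names mirror the OpenAI chat completions API, which a
          drop-in replacement server is expected to accept.
          """
          return {
              "model": model,
              "messages": [{"role": "user", "content": user_message}],
              "temperature": temperature,
          }

      payload = json.dumps(build_chat_request("Do you know PyTorch?"))
      ```

      The resulting JSON would then be POSTed to the server's chat completions endpoint (for example `http://localhost:8000/v1/chat/completions`; host, port, and path are assumptions, so check the server's startup output).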
      
  • [2023/08] 🔥 For developers, we released llama2-wrapper, a Llama 2 backend wrapper, on PyPI.

    • Install: pip install llama2-wrapper

    • Usage:

      from llama2_wrapper import LLAMA2_WRAPPER, get_prompt

      llama2_wrapper = LLAMA2_WRAPPER(
          model_path="./models/Llama-2-7B-Chat-GGML/llama-2-7b-chat.ggmlv3.q4_0.bin",
          backend_type="llama.cpp",  # options: llama.cpp, transformers, gptq
      )
      prompt = "Do you know PyTorch?"
      llama2_prompt = get_prompt(prompt)
      answer = llama2_wrapper(llama2_prompt, temperature=0.9)
  • [2023/08] 🔥 We added benchmark.py for users to benchmark llama2 models on their local devices.

    • Check or contribute your device's performance numbers in the full performance doc.
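
    The core of such a benchmark is timing generation and dividing tokens produced by wall-clock time. Here is a minimal, self-contained sketch of that idea; the `fake_generate` stub stands in for a real model call and is illustrative only, not benchmark.py itself:

    ```python
    import time

    def benchmark(generate, prompt, runs=3):
        """Average tokens/second of `generate` over several runs.

        `generate` must return a list of tokens for a prompt; here it is
        a placeholder for a real model's generation call.
        """
        speeds = []
        for _ in range(runs):
            start = time.perf_counter()
            tokens = generate(prompt)
            elapsed = time.perf_counter() - start
            speeds.append(len(tokens) / elapsed)
        return sum(speeds) / len(speeds)

    def fake_generate(prompt):
        # Stub: pretend to spend some time emitting 100 tokens.
        time.sleep(0.01)
        return ["tok"] * 100

    tokens_per_sec = benchmark(fake_generate, "Hello")
    ```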
  • [2023/07] We released llama2-webui, a Gradio web UI to run Llama 2 on GPU or CPU from anywhere (Linux/Windows/Mac).