
Releases: li-plus/chatglm.cpp

v0.3.2

24 Apr 08:20
a46f474
  • Support p-tuning v2 finetuned models for ChatGLM family
  • Fix convert.py for LoRA models & chatglm3-6b-128k
  • Fix RoPE theta config for 32k/128k sequence length
  • Better CUDA CMake script that respects the installed nvcc version
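The RoPE fix above concerns the theta base of the rotary position embedding: long-context fine-tunes such as the 32k/128k variants use an enlarged base so distant positions remain distinguishable. A minimal sketch of the effect, with purely illustrative theta values (the actual bases used by the ChatGLM checkpoints are not stated here):

```python
def rope_inv_freq(head_dim: int, theta_base: float) -> list:
    """Inverse rotary frequencies for even dimensions 0, 2, ..., head_dim - 2."""
    return [theta_base ** (-i / head_dim) for i in range(0, head_dim, 2)]

# Illustrative values only: 10000 is the conventional default RoPE base;
# 500000 stands in for a hypothetical long-context base.
base_freqs = rope_inv_freq(128, 10000.0)
long_freqs = rope_inv_freq(128, 500000.0)

# A larger theta base lowers every non-trivial frequency, stretching the
# rotation period so the embedding can span a longer sequence.
assert all(l <= b for b, l in zip(base_freqs, long_freqs))
```

Loading a 32k/128k model with the short-context base would rotate positions too fast, which is why the config value has to follow the sequence length.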

v0.3.1

20 Jan 16:14
eff7f44
  • Support function calling in the OpenAI API server
  • Faster repetition penalty sampling
  • Support max_new_tokens generation option
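The max_new_tokens option caps only freshly generated tokens, independent of prompt length (unlike a total-length limit). A toy greedy loop sketching that semantics; the function and token ids are hypothetical, not chatglm.cpp's internals:

```python
def generate(prompt_ids, next_token_fn, max_new_tokens, eos_id=2):
    # Hypothetical greedy decoding loop: stop after max_new_tokens new
    # tokens, or earlier if the model emits the end-of-sequence id.
    out = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token_fn(out)
        out.append(tok)
        if tok == eos_id:
            break
    return out[len(prompt_ids):]

# Toy "model" that emits incrementing token ids and never reaches EOS.
new = generate([10, 11, 12], lambda ids: ids[-1] + 1, max_new_tokens=4)
assert new == [13, 14, 15, 16]
```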

v0.3.0

22 Nov 03:08
b071907
  • Full functionality of ChatGLM3 including system prompt, function call and code interpreter
  • Brand new OpenAI-style chat API
  • Add token usage information to the OpenAI API server for compatibility with the LangChain frontend
  • Fix conversion error for chatglm3-6b-32k
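The token usage information follows the shape of the `usage` object in OpenAI-compatible chat responses. A sketch of that structure; in a real server the counts would come from the tokenizer:

```python
def usage_info(prompt_tokens: int, completion_tokens: int) -> dict:
    # OpenAI-compatible "usage" field, as expected by clients such as
    # LangChain when they account for token consumption.
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

u = usage_info(17, 42)
assert u["total_tokens"] == 59
```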

v0.2.10

30 Oct 06:35
972b0de
  • Support ChatGLM3 in conversation mode.
  • Coming soon: a new prompt format for system messages and function calls.

v0.2.9

22 Oct 03:03
02a6963
  • Support InternLM 7B & 20B model architectures

v0.2.8

10 Oct 16:24
f114c58
  • Metal backend support for all models (ChatGLM & ChatGLM2 & Baichuan-7B & Baichuan-13B)
  • Fix GLM generation on CUDA for long context

v0.2.7

28 Sep 13:23
9be06f0
  • Support Baichuan-7B model architecture (works for both Baichuan v1 & v2).
  • Minor bug fixes and enhancements.

v0.2.6

31 Aug 11:50
bbf91da
  • Support Baichuan-13B on CPU & CUDA backends
  • Bug fixes for Windows and Metal

v0.2.5

22 Aug 16:52
1cfac4a
  • Optimize context computation (GEMM) for the Metal backend
  • Support repetition penalty option for generation
  • Update Dockerfile for CPU & CUDA backends with full functionality, hosted on GHCR
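The repetition penalty option is commonly implemented with the CTRL-style rule: positive logits of already-generated tokens are divided by the penalty and negative ones multiplied, making repeats less likely for a penalty above 1. A sketch of that standard formulation; whether chatglm.cpp applies exactly this rule is an assumption:

```python
def apply_repetition_penalty(logits, generated_ids, penalty: float):
    # Penalize tokens that already appeared in the generated sequence.
    # Dividing positive logits (and multiplying negative ones) by the
    # penalty always pushes seen tokens toward lower probability.
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, [0, 1], penalty=2.0)
assert penalized == [1.0, -2.0, 0.5]
```

Token id 2 is untouched because it never appeared in the generated sequence.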

v0.2.4

11 Aug 17:30
4055560
  • Python binding enhancement: models can now be loaded and converted directly from the original Hugging Face checkpoints. Intermediate GGML model files are no longer necessary.
  • Small fix for the CLI demo on Windows.