Releases · li-plus/chatglm.cpp
v0.3.2
- Support P-Tuning v2 fine-tuned models for the ChatGLM family
- Fix convert.py for LoRA models & chatglm3-6b-128k (see the conversion sketch after this list)
- Fix RoPE theta config for 32k/128k sequence lengths
- Better CUDA CMake script that respects the nvcc version
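
As a reference for the LoRA fix above, here is a minimal conversion sketch. The convert.py flags shown (`-i`, `-l`, `-t`, `-o`) and the output filename are assumptions based on the script's usual interface; check `python3 chatglm_cpp/convert.py -h` for the authoritative options.

```python
# Convert a LoRA fine-tuned ChatGLM model to GGML format with convert.py
# (CLI flags are assumptions; <your_lora_path> is a placeholder):
#
#   python3 chatglm_cpp/convert.py -i THUDM/chatglm3-6b -l <your_lora_path> -t q4_0 -o chatglm3-lora-ggml.bin
#
# The merged, quantized model then loads like any other GGML file:
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm3-lora-ggml.bin")
reply = pipeline.chat([chatglm_cpp.ChatMessage(role="user", content="Hello!")])
print(reply.content)
```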
v0.3.1
- Support function calling in the OpenAI API server (see the sketch after this list)
- Faster repetition penalty sampling
- Support the max_new_tokens generation option
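
A minimal sketch of exercising both features through the OpenAI-compatible server follows. The server address, the model name, and the `get_weather` tool are illustrative assumptions; `tools` and `max_tokens` are standard fields of the OpenAI chat completions API, with `max_tokens` playing the same role as the new max_new_tokens option.

```python
# Sketch: calling a locally running chatglm.cpp OpenAI-compatible server
# with the openai>=1.0 Python client (base URL and model name assumed).
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="unused")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function, for illustration only
            "description": "Get the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="chatglm3",  # a single-model local server typically ignores this field
    messages=[{"role": "user", "content": "What's the weather like in Beijing?"}],
    tools=tools,
    max_tokens=256,  # caps generated tokens, mirroring max_new_tokens
)
message = response.choices[0].message
print(message.tool_calls or message.content)
```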
v0.3.0
- Full ChatGLM3 functionality, including system prompts, function calling, and the code interpreter
- Brand-new OpenAI-style chat API (see the sketch after this list)
- Add token usage information to the OpenAI API server for compatibility with the LangChain frontend
- Fix conversion error for chatglm3-6b-32k
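
A minimal sketch of the new message-based chat API in the Python binding; the `ChatMessage` class and `chat()` signature are assumed from the binding's public interface, and the model path is a placeholder.

```python
# Sketch of the OpenAI-style chat API in the Python binding (v0.3.0+).
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm3-ggml.bin")  # placeholder model path
messages = [
    chatglm_cpp.ChatMessage(role="system", content="You are a helpful assistant."),
    chatglm_cpp.ChatMessage(role="user", content="Write hello world in Python."),
]
reply = pipeline.chat(messages)  # returns a ChatMessage(role="assistant", ...)
print(reply.content)
```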
v0.2.10
- Support ChatGLM3 in conversation mode
- Coming soon: a new prompt format for system messages and function calls
v0.2.9
- Support InternLM 7B & 20B model architectures
v0.2.8
- Metal backend support for all models (ChatGLM, ChatGLM2, Baichuan-7B, Baichuan-13B)
- Fix GLM generation on CUDA for long contexts
v0.2.7
- Support the Baichuan-7B model architecture (works for both Baichuan v1 & v2)
- Minor bug fixes and enhancements
v0.2.6
- Support Baichuan-13B on CPU & CUDA backends
- Bug fixes for Windows and Metal
v0.2.5
- Optimize context computation (GEMM) for the Metal backend
- Support a repetition penalty option for generation (see the sketch after this list)
- Update the Dockerfile for CPU & CUDA backends with full functionality; images are hosted on GHCR
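
A minimal sketch of the repetition penalty option via the Python binding; the kwarg name follows the release note, and the v0.2.x list-of-strings chat signature is assumed.

```python
# Sketch: repetition penalty as a generation option (Python binding, v0.2.x).
# In this era of the API, chat() takes the conversation history as a list of
# strings; repetition_penalty > 1.0 penalizes already-generated tokens.
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm2-ggml.bin")  # placeholder model path
output = pipeline.chat(
    ["Tell me a short story."],
    repetition_penalty=1.1,  # 1.0 disables the penalty
)
print(output)
```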
v0.2.4
- Python binding enhancement: support load-and-convert directly from original Hugging Face models, so intermediate GGML model files are no longer necessary (see the sketch after this list)
- Small fix for the CLI demo on Windows
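
A minimal sketch of load-and-convert: passing a Hugging Face model id straight to `Pipeline` is the feature this release describes, while the `dtype` keyword for on-the-fly quantization is an assumed parameter name.

```python
# Sketch: loading and converting a Hugging Face model in one step.
# The model id is resolved and converted on the fly, skipping the intermediate
# GGML file; the dtype kwarg (quantization type) is an assumption.
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("THUDM/chatglm2-6b", dtype="q4_0")
print(pipeline.chat(["Hello!"]))
```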