Skip to content

Releases: ggerganov/llama.cpp

b2918

18 May 02:15
0583484
Compare
Choose a tag to compare
ggml : fix quants nans when all the group weights are very close to z…

…ero (#7313)

b2917

18 May 02:15
ef277de
Compare
Choose a tag to compare
cmake : fix typo in AMDGPU_TARGETS (#7356)

b2916

18 May 00:20
b43272a
Compare
Choose a tag to compare
Unicode codepoint flags for custom regexs (#7245)

* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM

b2915

17 May 17:55
0fc1e82
Compare
Choose a tag to compare
CUDA: faster large batch FA without tensor cores (#7314)

b2914

17 May 15:49
82ca83d
Compare
Choose a tag to compare
ROCm: use native CMake HIP support (#5966)

Supercedes #4024 and #4813.

CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:

1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.

2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controled by
`GPU_TARGETS`.

3. Updated the Nix recipe to account for these new changes.

4. The GPU targets to build against in the Nix recipe is now
consistent with the supported GPU targets in nixpkgs.

5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.

The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, compared to the current
situation where when building for ROCm support, everything must be
compiled with ROCm's clang.

~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.

Signed-off-by: Gavin Zhao <git@gzgz.dev>

b2913

17 May 14:58
f4bd8b3
Compare
Choose a tag to compare
rpc : set SO_REUSEADDR for the server socket (#7320)

ref: #7293

b2910

17 May 12:03
27b0406
Compare
Choose a tag to compare
llama : use n_embd_head_v when reshaping kqv (#7327)

* llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv

* llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors.

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

b2909

17 May 11:05
29c60d8
Compare
Choose a tag to compare
tokenization: add warning for double BOS (#7332)

b2908

17 May 11:00
359cbe3
Compare
Choose a tag to compare
ggml-quants, llama : removed excess checks (#7274)

b2906

17 May 09:52
ee94172
Compare
Choose a tag to compare
server : add support for the RPC backend (#7305)

ref: #7292