Releases · ggerganov/llama.cpp

18 May 02:15

0583484

b2918

ggml : fix quants nans when all the group weights are very close to zero (#7313)
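The failure mode behind this fix can be sketched in a few lines of Python. This illustrates generic symmetric absmax group quantization and is not ggml's code. The function names, the 127-level range (`nmax`) and the threshold `GROUP_MAX_EPS` (including its value) are assumptions. Python also uses 64-bit floats where ggml works in 32-bit, so overflow starts at different magnitudes. If the largest weight in a group is tiny but nonzero, a guard that only checks for exactly zero lets `nmax / amax` overflow to infinity. Any weight that is exactly zero then produces `0 * inf = nan`. Comparing against a small threshold avoids this.

```python
# Hypothetical threshold: treat a group as all-zero when its largest
# magnitude is below this value. ggml's actual constant may differ.
GROUP_MAX_EPS = 1e-15


def scaled_values_naive(xs, nmax=127):
    """Return the pre-rounding values x * iscale, guarding only amax == 0."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return [0.0] * len(xs)
    iscale = nmax / amax  # overflows to inf when amax is tiny but nonzero
    return [x * iscale for x in xs]  # 0.0 * inf evaluates to nan


def quantize_group_guarded(xs, nmax=127):
    """Symmetric absmax quantization that zeroes near-zero groups."""
    amax = max(abs(x) for x in xs)
    if amax < GROUP_MAX_EPS:
        return 0.0, [0] * len(xs)
    iscale = nmax / amax
    q = [max(-nmax, min(nmax, round(x * iscale))) for x in xs]
    return amax / nmax, q  # (scale d, quants); each x is roughly d * q


tiny = [1e-320, 0.0, -5e-321]  # subnormal doubles, "very close to zero"
print(scaled_values_naive(tiny))     # [inf, nan, -inf]
print(quantize_group_guarded(tiny))  # (0.0, [0, 0, 0])
```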

Assets 21

18 May 02:15

github-actions

b2917

ef277de

b2917 Latest

cmake : fix typo in AMDGPU_TARGETS (#7356)

Assets 21

cudart-llama-bin-win-cu11.7.1-x64.zip        293 MB   2024-05-18T02:15:36Z
cudart-llama-bin-win-cu12.2.0-x64.zip        413 MB   2024-05-18T02:15:45Z
llama-b2917-bin-macos-arm64.zip              41.3 MB  2024-05-18T02:15:57Z
llama-b2917-bin-macos-x64.zip                37.9 MB  2024-05-18T02:15:59Z
llama-b2917-bin-ubuntu-x64.zip               45.8 MB  2024-05-18T02:16:00Z
llama-b2917-bin-win-avx-x64.zip              6.61 MB  2024-05-18T02:16:02Z
llama-b2917-bin-win-avx2-x64.zip             6.59 MB  2024-05-18T02:16:03Z
llama-b2917-bin-win-avx512-x64.zip           6.61 MB  2024-05-18T02:16:04Z
llama-b2917-bin-win-clblast-x64.zip          7.79 MB  2024-05-18T02:16:05Z
llama-b2917-bin-win-cuda-cu11.7.1-x64.zip    65 MB    2024-05-18T02:16:06Z
Source code (zip)                                     2024-05-18T00:39:25Z
Source code (tar.gz)                                  2024-05-18T00:39:25Z

18 May 00:20

github-actions

b2916

b43272a

b2916

Unicode codepoint flags for custom regexs (#7245)

* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM
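This commit replaces the `CODEPOINT_TYPE_*` values with codepoint flags and adds NFD normalization. The Python sketch below roughly illustrates both ideas using the standard `unicodedata` module. The flag names, bit values and category mapping are hypothetical and are not llama.cpp's `codepoint_flags`. Per-codepoint bit flags are derived from the Unicode general category, and NFD splits a precomposed character into a base letter plus a combining mark.

```python
import unicodedata
from enum import IntFlag


class CpFlags(IntFlag):
    """Illustrative per-codepoint flags (names and bit values are hypothetical)."""
    UNDEFINED = 0
    NUMBER = 1 << 0
    LETTER = 1 << 1
    SEPARATOR = 1 << 2
    ACCENT_MARK = 1 << 3
    PUNCTUATION = 1 << 4
    SYMBOL = 1 << 5
    CONTROL = 1 << 6
    WHITESPACE = 1 << 7


# Map the major Unicode general category (its first letter) to a flag.
_MAJOR = {
    "N": CpFlags.NUMBER,
    "L": CpFlags.LETTER,
    "Z": CpFlags.SEPARATOR,
    "M": CpFlags.ACCENT_MARK,
    "P": CpFlags.PUNCTUATION,
    "S": CpFlags.SYMBOL,
    "C": CpFlags.CONTROL,
}


def codepoint_flags(ch: str) -> CpFlags:
    cat = unicodedata.category(ch)
    # "Cn" means unassigned; every other category maps by its major class.
    flags = CpFlags.UNDEFINED if cat == "Cn" else _MAJOR[cat[0]]
    if ch.isspace():
        flags |= CpFlags.WHITESPACE
    return flags


# NFD decomposes U+00E9 into "e" followed by COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", "\u00e9")
print([hex(ord(c)) for c in decomposed])  # ['0x65', '0x301']
print([codepoint_flags(c) for c in decomposed])
```

In this sketch, bit flags let one codepoint carry several properties at once. For example, a space is both a separator and whitespace, which a single enum value could not express.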

Assets 21

17 May 17:55

github-actions

b2915

0fc1e82

b2915

CUDA: faster large batch FA without tensor cores (#7314)

Assets 21

17 May 15:49

github-actions

b2914

82ca83d

b2914

ROCm: use native CMake HIP support (#5966)

Supersedes #4024 and #4813.

CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:

1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably, this should not be `hipcc` but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was controlled by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.

2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controlled by
`GPU_TARGETS`.

3. Updated the Nix recipe to account for these new changes.

4. The GPU targets to build against in the Nix recipe are now
consistent with the supported GPU targets in nixpkgs.

5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.
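Points 1 and 2 above translate into a configure invocation along these lines. The Python helper below is hypothetical. It only assembles the environment and argument list for a Linux build, assumes ROCm is installed under `/opt/rocm`, and uses `gfx1030` and `gfx90a` as example architectures. It deliberately leaves out the llama.cpp option that turns the HIP backend on; take that option from the README of the revision you are building. On Windows, per point 1, the old behavior still applies.

```python
import os


def hip_configure_command(rocm_path, archs, build_dir="build"):
    """Build (env, argv) for a CMake configure step using native HIP support.

    Hypothetical helper: HIPCXX names ROCm's clang (not hipcc), and
    CMAKE_HIP_ARCHITECTURES lists the GPU targets. The option that enables
    llama.cpp's HIP backend is intentionally not included.
    """
    env = dict(os.environ)
    env["HIPCXX"] = os.path.join(rocm_path, "llvm", "bin", "clang")
    argv = [
        "cmake", "-S", ".", "-B", build_dir,
        # CMake lists are semicolon-separated.
        "-DCMAKE_HIP_ARCHITECTURES=" + ";".join(archs),
    ]
    return env, argv


env, argv = hip_configure_command("/opt/rocm", ["gfx1030", "gfx90a"])
print(env["HIPCXX"])
print(argv)
# To actually run it: subprocess.run(argv, env=env, check=True)
```

Passing `argv` as a list, without a shell, means the semicolons in the architecture list need no quoting.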

The most important part of this PR is the separation of the
HIP compiler from the C/C++ compiler. This lets users choose
a different C/C++ compiler if desired; currently, when building
with ROCm support, everything must be compiled with ROCm's clang.

~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Following a suggestion from @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.
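One way to read the backward-compatibility rules above is as a small decision function over CMake cache variables. This Python model is a hypothetical sketch of the described behavior, not the PR's actual CMake logic. In particular, applying the `AMDGPU_TARGETS` forwarding in every mode is an assumption.

```python
def resolve_hip_settings(cache, is_windows=False):
    """Model of the transition rules (hypothetical, not the real CMake code).

    `cache` maps CMake cache variable names to string values.
    Returns (mode, hip_architectures, warnings).
    """
    warnings = []
    cxx = cache.get("CMAKE_CXX_COMPILER", "")
    if is_windows:
        mode = "legacy"  # native CMake HIP support is unavailable on Windows
    elif cxx.endswith("hipcc"):
        mode = "legacy"  # old behavior kept, but users are nudged to switch
        warnings.append(
            "CMAKE_CXX_COMPILER is hipcc; consider native CMake HIP support"
        )
    else:
        mode = "native"
    archs = cache.get("CMAKE_HIP_ARCHITECTURES")
    if archs is None and "AMDGPU_TARGETS" in cache:
        archs = cache["AMDGPU_TARGETS"]  # forward the old variable
    return mode, archs, warnings


print(resolve_hip_settings(
    {"CMAKE_CXX_COMPILER": "/opt/rocm/bin/hipcc", "AMDGPU_TARGETS": "gfx1030"}
))
```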

Signed-off-by: Gavin Zhao <git@gzgz.dev>

Assets 21

17 May 14:58

github-actions

b2913

f4bd8b3

b2913

rpc : set SO_REUSEADDR for the server socket (#7320)

ref: #7293
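Setting `SO_REUSEADDR` on a server socket is a standard sockets technique. The Python sketch below shows the general pattern, not the RPC server's own C++ code; the function name and defaults are hypothetical. The option must be set before `bind()`. On POSIX systems it lets a restarted server rebind a port that still has connections from a previous run in `TIME_WAIT`, instead of failing with "Address already in use".

```python
import socket


def make_listening_socket(host="127.0.0.1", port=0, backlog=1):
    """Create a TCP listening socket with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must happen before bind() for the kernel to honor it on this bind.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


srv = make_listening_socket()  # port 0 lets the OS pick a free port
print("listening on", srv.getsockname())
srv.close()
```

On Windows, `SO_REUSEADDR` has looser semantics (it can allow binding a port that another socket is actively using), so portable servers often treat it per platform.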

Assets 21

17 May 12:03

github-actions

b2910

27b0406

b2910

llama : use n_embd_head_v when reshaping kqv (#7327)

* llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv

* llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors.

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
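The reason for this fix is shape arithmetic. In attention, the softmax weights mix the value vectors, so each head's output has the value head size `n_embd_head_v`, not the key head size `n_embd_head_k`. When the two differ, a reshape that uses the key size cannot preserve the element count. The Python sketch below checks this with hypothetical sizes (192 and 128) and a hypothetical `merge_heads` helper. It models only the shapes, not ggml's tensor layout.

```python
from math import prod


def merge_heads(per_head_shape, head_dim):
    """Reshape (head_dim, n_tokens, n_head) -> (head_dim * n_head, n_tokens).

    Raises ValueError if the requested head_dim does not preserve the
    element count, which any reshape requires.
    """
    _, n_tokens, n_head = per_head_shape
    merged = (head_dim * n_head, n_tokens)
    if prod(merged) != prod(per_head_shape):
        raise ValueError(f"cannot reshape {per_head_shape} to {merged}")
    return merged


# Hypothetical sizes where keys and values use different head widths.
n_embd_head_k, n_embd_head_v, n_head, n_tokens = 192, 128, 16, 7

# Each head's attention output is a weighted sum of value vectors.
kqv = (n_embd_head_v, n_tokens, n_head)

print(merge_heads(kqv, n_embd_head_v))  # (2048, 7)
try:
    merge_heads(kqv, n_embd_head_k)
except ValueError as e:
    print("error:", e)
```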

Assets 21

17 May 11:05

github-actions

b2909

29c60d8

b2909

tokenization: add warning for double BOS (#7332)
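A double-BOS check amounts to inspecting the first two token ids. In the Python sketch below, `check_double_bos` and the BOS id `1` are hypothetical, and the exact condition and message in llama.cpp may differ. One common way to end up with two BOS tokens is a prompt template that already includes BOS, combined with a tokenizer that is configured to add another one.

```python
import warnings


def check_double_bos(tokens, bos_id):
    """Warn if a tokenized prompt starts with two BOS tokens.

    Returns True if a warning was emitted, False otherwise.
    """
    if len(tokens) >= 2 and tokens[0] == bos_id and tokens[1] == bos_id:
        warnings.warn(
            "prompt starts with 2 BOS tokens; output quality may suffer"
        )
        return True
    return False


BOS = 1  # hypothetical BOS id; real models define their own
check_double_bos([BOS, BOS, 450, 4996], BOS)  # emits a UserWarning
```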

Assets 21

17 May 11:00

github-actions

b2908

359cbe3

b2908

ggml-quants, llama : removed excess checks (#7274)

Assets 21

17 May 09:52

github-actions

b2906

ee94172

b2906

server : add support for the RPC backend (#7305)

ref: #7292

Assets 21
