Releases · ggerganov/llama.cpp
b3027
sycl : fix assert (#7563)
b3026
llama : support small Granite models (#7481)

* Add optional MLP bias for Granite models

  Add optional MLP bias for ARCH_LLAMA to support Granite models. Partially addresses ggerganov/llama.cpp/issues/7116. Still needs some more changes to properly support Granite.

* llama : honor add_space_prefix from the model configuration

  Propagate the add_space_prefix configuration from the HF model configuration to the gguf file and honor it with the gpt2 tokenizer (see the sketch after this entry).

* llama : add support for small Granite models

  It works only for the small models, 3b and 8b. The convert-hf-to-gguf.py script uses the vocabulary size of the Granite models to detect Granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
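As a rough illustration of the add_space_prefix behaviour described above: a minimal sketch, assuming a hypothetical `tokenize_impl()` backend (byte-level ids, just so it runs); in llama.cpp the flag actually travels HF config -> gguf -> gpt2 tokenizer.

```cpp
// Minimal sketch of "honor add_space_prefix": when the flag stored in the
// model metadata is true, a leading space is added before tokenization.
// tokenize_impl() is a hypothetical stand-in, not a llama.cpp function.
#include <cstdio>
#include <string>
#include <vector>

static std::vector<int> tokenize_impl(const std::string & text) {
    std::vector<int> ids;                  // stand-in: one id per byte
    for (unsigned char c : text) ids.push_back((int) c);
    return ids;
}

static std::vector<int> tokenize(const std::string & text, bool add_space_prefix) {
    std::string input = text;
    if (add_space_prefix && !input.empty() && input.front() != ' ') {
        input.insert(input.begin(), ' ');  // per the gguf metadata flag
    }
    return tokenize_impl(input);
}

int main() {
    std::printf("%zu vs %zu tokens\n",
                tokenize("hello", true).size(),   // 6: space was prepended
                tokenize("hello", false).size()); // 5
}
```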
b3025
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552)
b3024
rpc : resource management rework (#7562)

* rpc : resource management rework
* address review comments
b3023
Add support for DeepseekV2ForCausalLM (#7519)

* common : increase max number of experts to 160
* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by the DeepSeek-V2 MLA (multi-head latent attention) architecture
* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier
* convert-hf : add model conversion support for DeepseekV2ForCausalLM
* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models
* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale the weights of the selected MoE experts) and w_scale (the numerical value of the scaling factor); see the sketch after this entry
* llama : add inference support for LLM_ARCH_DEEPSEEK2

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
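A minimal sketch, in plain C++ rather than ggml graph code, of what the scale_w/w_scale pair means for expert routing: softmax the router logits, select the top-k experts, then optionally multiply the selected weights by the scaling factor. Everything except the scale_w/w_scale names is illustrative, not the llama.cpp internals.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <utility>
#include <vector>

std::vector<std::pair<int, float>> route_experts(
        const std::vector<float> & logits, // one router logit per expert
        int n_expert_used,                 // experts used per token (top-k)
        bool scale_w,                      // scale weights of selected experts?
        float w_scale) {                   // e.g. expert_weights_scale from the header
    // softmax over all experts
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float & p : probs) p /= sum;

    // pick the k most probable experts
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + n_expert_used, idx.end(),
                      [&](int a, int b) { return probs[a] > probs[b]; });

    std::vector<std::pair<int, float>> selected;
    for (int k = 0; k < n_expert_used; ++k) {
        float w = probs[idx[k]];
        if (scale_w) {
            w *= w_scale;  // numerical value of the scaling factor
        }
        selected.emplace_back(idx[k], w);
    }
    return selected;
}

int main() {
    // 4 experts, top-2, with a made-up scaling factor of 16.0
    for (auto & [e, w] : route_experts({0.1f, 2.0f, 1.5f, -0.3f}, 2, true, 16.0f)) {
        std::printf("expert %d -> weight %.4f\n", e, w);
    }
}
```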
b3021
llama : handle unknown utf8 bytes (#7588)
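The title leaves the mechanism implicit; a minimal sketch of one common way to handle unknown bytes, assumed here rather than taken from #7588, is to replace anything that cannot start or complete a valid UTF-8 sequence with U+FFFD instead of failing:

```cpp
#include <cstdio>
#include <string>

static std::string sanitize_utf8(const std::string & in) {
    static const std::string replacement = "\xEF\xBF\xBD";  // U+FFFD
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = in[i];
        const size_t len =
            c < 0x80            ? 1 :  // ASCII
            (c & 0xE0) == 0xC0  ? 2 :  // 2-byte sequence
            (c & 0xF0) == 0xE0  ? 3 :  // 3-byte sequence
            (c & 0xF8) == 0xF0  ? 4 :  // 4-byte sequence
            0;                         // invalid lead byte
        // check that all continuation bytes are present and well-formed
        bool ok = len > 0 && i + len <= in.size();
        for (size_t j = 1; ok && j < len; ++j) {
            ok = (static_cast<unsigned char>(in[i + j]) & 0xC0) == 0x80;
        }
        if (ok) {
            out.append(in, i, len);
            i += len;
        } else {
            out += replacement;  // drop the bad byte, emit U+FFFD
            i += 1;
        }
    }
    return out;
}

int main() {
    std::string s = "ok\x80\xC3(";  // stray continuation byte + truncated sequence
    std::printf("%s\n", sanitize_utf8(s).c_str());
}
```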
b3019
[SYCL] fix ggml_sycl_mul_mat_id() to match the change of api (#7436)

* fix mul_mat_id to match the change of api
* rm comment
* rm unused or duplicated code, rename as review comment
b3018
ggml : generalize GGML_OP_CONCAT (#7563); see the index-math sketch after this entry

* ggml : generalize GGML_OP_CONCAT (WIP) ggml-ci
* tests : add dim != 2 tests
* metal : generalize concat kernel
* tests : naming
* cuda : generalize concat kernel ggml-ci
* sycl : add warning and assert
* ggml : fix op params handling
* metal : bugfix kernel ggml-ci
* ggml : reimplement CPU and Metal
* cuda : add asserts ggml-ci
* ggml : fix ptrs ggml-ci
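To make "generalize" concrete: the op previously only concatenated along one fixed dimension (hence the "dim != 2 tests" bullet), and the rework allows any dimension. A minimal sketch of the index math over plain row-major buffers, not the actual ggml kernels (whose tensor layout differs):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// concat two row-major tensors of the same rank along dimension `dim`;
// all other dimensions must match
std::vector<float> concat(const std::vector<float> & a, const std::vector<float> & b,
                          std::vector<int64_t> shape_a, std::vector<int64_t> shape_b,
                          int dim) {
    assert(shape_a.size() == shape_b.size());
    for (size_t i = 0; i < shape_a.size(); ++i) {
        assert(i == (size_t) dim || shape_a[i] == shape_b[i]);
    }
    // outer = product of dims before `dim`, inner = product of dims after it;
    // each tensor is `outer` contiguous blocks of `shape[dim] * inner` elements
    int64_t outer = 1, inner = 1;
    for (int    i = 0;       i < dim;            ++i) outer *= shape_a[i];
    for (size_t i = dim + 1; i < shape_a.size(); ++i) inner *= shape_a[i];

    const int64_t block_a = shape_a[dim] * inner;
    const int64_t block_b = shape_b[dim] * inner;

    // interleave one block of a with one block of b per outer index
    std::vector<float> out;
    out.reserve(a.size() + b.size());
    for (int64_t o = 0; o < outer; ++o) {
        out.insert(out.end(), a.begin() + o * block_a, a.begin() + (o + 1) * block_a);
        out.insert(out.end(), b.begin() + o * block_b, b.begin() + (o + 1) * block_b);
    }
    return out;
}

int main() {
    // two 2x2 matrices, concat along dim 1 (columns) -> one 2x4 matrix
    std::vector<float> a = {1, 2, 3, 4}, b = {5, 6, 7, 8};
    for (float v : concat(a, b, {2, 2}, {2, 2}, 1)) std::printf("%g ", v);
    std::printf("\n");  // 1 2 5 6 3 4 7 8
}
```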
b3015
llava : update clip.h (#7580)

overriden -> overridden
b3014
update HIP_UMA #7399 (#7414)

* update HIP_UMA #7399

  Add use of hipMemAdviseSetCoarseGrain when LLAMA_HIP_UMA is enabled. Gets ~2x on prompt eval and ~1.5x on token gen with ROCm 6.0 on a Ryzen 7940HX iGPU (780M/gfx1103); see the HIP sketch after this entry.

* simplify code, more consistent style

Co-authored-by: slaren <slarengh@gmail.com>
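For context, a minimal sketch of the HIP calls this entry refers to: allocate managed (UMA) memory and advise the runtime to treat it as coarse-grained, which relaxes CPU/GPU coherence on AMD APUs. The buffer size and scaffolding are illustrative, not llama.cpp's code.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

#define HIP_CHECK(expr)                                                      \
    do {                                                                     \
        hipError_t err = (expr);                                             \
        if (err != hipSuccess) {                                             \
            std::fprintf(stderr, "HIP error: %s\n", hipGetErrorString(err)); \
            return 1;                                                        \
        }                                                                    \
    } while (0)

int main() {
    const size_t size = 64 * 1024 * 1024;  // 64 MiB, made-up size
    void * buf = nullptr;

    // allocate memory visible to both the CPU and the iGPU (UMA)
    HIP_CHECK(hipMallocManaged(&buf, size));

    // advise the runtime to treat the buffer as coarse-grained: no
    // fine-grained CPU/GPU coherence, the faster mode for this workload
    HIP_CHECK(hipMemAdvise(buf, size, hipMemAdviseSetCoarseGrain, 0 /* device */));

    // ... launch kernels that read/write buf here ...

    HIP_CHECK(hipFree(buf));
    return 0;
}
```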