Releases: ggerganov/llama.cpp

b3027 (28 May 21:47, 6bd12ce)

sycl : fix assert (#7563)

b3026 (28 May 21:19, 5442939)

llama : support small Granite models (#7481)

* Add optional MLP bias for Granite models

Add an optional MLP bias for ARCH_LLAMA to support Granite models (see the sketch after this entry).
Partially addresses ggerganov/llama.cpp/issues/7116.
Further changes are still needed to fully support Granite.

* llama: honor add_space_prefix from the model configuration

Propagate the add_space_prefix setting from the HF model
configuration to the GGUF file and honor it in the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* llama: add support for small granite models

This works only for the small models (3B and 8B).

The convert-hf-to-gguf.py script uses the vocabulary size of the
Granite models to detect them and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>
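
For illustration, the optional MLP bias can be pictured as below. This is a minimal C++ sketch in the style of ggml graph building, not the actual llama.cpp code; the helper name and bias parameter names are hypothetical, and the bias tensors are simply null for models that do not use them.

```cpp
#include "ggml.h"

// Hypothetical helper: a SwiGLU FFN block where each projection may carry an
// optional bias. Granite-style models pass real bias tensors; other LLaMA
// variants pass NULL and the adds are skipped.
static struct ggml_tensor * build_ffn_with_optional_bias(
        struct ggml_context * ctx, struct ggml_tensor * cur,
        struct ggml_tensor * up_w,   struct ggml_tensor * up_b,
        struct ggml_tensor * gate_w, struct ggml_tensor * gate_b,
        struct ggml_tensor * down_w, struct ggml_tensor * down_b) {
    struct ggml_tensor * up   = ggml_mul_mat(ctx, up_w, cur);
    if (up_b)   { up   = ggml_add(ctx, up, up_b); }     // optional bias
    struct ggml_tensor * gate = ggml_mul_mat(ctx, gate_w, cur);
    if (gate_b) { gate = ggml_add(ctx, gate, gate_b); } // optional bias
    cur = ggml_mul(ctx, ggml_silu(ctx, gate), up);      // SwiGLU activation
    cur = ggml_mul_mat(ctx, down_w, cur);
    if (down_b) { cur = ggml_add(ctx, cur, down_b); }   // optional bias
    return cur;
}
```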

b3025 (28 May 19:50, 56411a9)

vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552)

b3024 (28 May 18:37, 2b737ca)

rpc : resource management rework (#7562)

* rpc : resource management rework

* address review comments

b3023 (28 May 17:41, ee3dff6)

Add support for DeepseekV2ForCausalLM (#7519)

* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by the DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor); see the sketch after this entry

* llama : add inference support for LLM_ARCH_DEEPSEEK2

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
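
The two new arguments amount to one optional scaling step on the routing weights. A minimal sketch under that reading, using a hypothetical helper name rather than the real llm_build_moe_ffn() body:

```cpp
#include "ggml.h"

// Hypothetical helper: `weights` holds the (already softmaxed) routing
// weights of the selected experts; DeepSeek-V2 additionally multiplies
// them by a model-defined constant.
static struct ggml_tensor * apply_expert_weight_scale(
        struct ggml_context * ctx,
        struct ggml_tensor  * weights, // routing weights of selected experts
        bool                  scale_w, // whether to scale the weights at all
        float                 w_scale  // numerical value of the scaling factor
) {
    if (scale_w) {
        weights = ggml_scale(ctx, weights, w_scale);
    }
    return weights;
}
```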

b3021 (28 May 15:55, 8b99e2a)

llama : handle unknown utf8 bytes (#7588)

b3019 (28 May 12:04, e2b0650)

[SYCL] fix ggml_sycl_mul_mat_id() to match the changed API (#7436)

* fix mul_mat_id to match the changed API

* remove comment

* remove unused or duplicated code, rename per review comments

b3018 (28 May 09:04, 0548a41)

ggml : generalize GGML_OP_CONCAT (#7563)

* ggml : generalize GGML_OP_CONCAT (WIP)

ggml-ci

* tests : add dim != 2 tests

* metal : generalize concat kernel

* tests : naming

* cuda : generalize concat kernel

ggml-ci

* sycl : add warning and assert

* ggml : fix op params handling

* metal : bugfix kernel

ggml-ci

* ggml : reimplement CPU and Metal

* cuda : add asserts

ggml-ci

* ggml : fix ptrs

ggml-ci
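
Concatenation is now supported along an arbitrary dimension rather than only dim 2. A minimal usage sketch, assuming the generalized ggml_concat() takes the concatenation dimension as its last parameter:

```cpp
#include "ggml.h"

// Concatenate two 2-D tensors along dim 1 (rows). All other dimensions
// must match; e.g. [n_embd, n_a] + [n_embd, n_b] -> [n_embd, n_a + n_b].
// Before this change, GGML_OP_CONCAT only handled dim 2.
static struct ggml_tensor * concat_rows(struct ggml_context * ctx,
                                        struct ggml_tensor  * a,
                                        struct ggml_tensor  * b) {
    return ggml_concat(ctx, a, b, /*dim =*/ 1);
}
```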

b3015 (28 May 04:52, 74b239b)

llava : update clip.h (#7580)

overriden -> overridden

b3014 (28 May 01:07, 852aafb)

update HIP_UMA #7399 (#7414)

* update HIP_UMA #7399

Add use of hipMemAdviseSetCoarseGrain when LLAMA_HIP_UMA is enabled.
- gives ~2x on prompt eval and ~1.5x on token gen with ROCm 6.0 on a Ryzen 7940HX iGPU (780M/gfx1103)

* simplify code, more consistent style

---------

Co-authored-by: slaren <slarengh@gmail.com>
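
The advice call itself is plain HIP runtime API. A minimal standalone sketch, not the actual ggml code, assuming a ROCm 6.x toolchain on an APU:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    void * buf  = nullptr;
    size_t size = 16u * 1024 * 1024;

    // Unified (managed) memory, as used when LLAMA_HIP_UMA is enabled.
    if (hipMallocManaged(&buf, size) != hipSuccess) {
        fprintf(stderr, "hipMallocManaged failed\n");
        return 1;
    }
    // Mark the allocation coarse-grained: relaxed coherence lets the iGPU
    // cache it, which is where the reported speedup comes from.
    if (hipMemAdvise(buf, size, hipMemAdviseSetCoarseGrain, 0) != hipSuccess) {
        fprintf(stderr, "hipMemAdvise failed\n");
    }
    hipFree(buf);
    return 0;
}
```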