skywork.cpp | 天工大模型C语言的实现

本项目为个人研究使用，还在完善中
基于 llama.cpp 通过 C/C++ 来实现的大模型运行环境，可以通过 CPU 就可以直接运行天工大模型。适合Mac Apple Silicon的笔记本或手头上没有显卡的同学

使用说明

克隆本仓库

git clone https://github.com/yxq321/skywork.cpp
cd skywork.cpp

安装相关依赖并编译

在 Linux 或 MacOS上，运行make:

make

从Huggingface上下载大模型

安装依赖:

pip3 install huggingface_hub

运行download-skywork.py下载天工大模型，默认存放在 ~/.cache/huggingface/hub/ 目录下

skywork.cpp# python3 download-skywork.py
Downloading (…)bcec297346/README.md: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21.8k/21.8k [00:00<00:00, 110MB/s]
Downloading (…)5%8D%8F%E8%AE%AE.pdf: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 266k/266k [00:00<00:00, 32.3MB/s]
Downloading (…)iguration_skywork.py: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.12k/3.12k [00:00<00:00, 22.2MB/s]
Downloading (…)neration_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 182/182 [00:00<00:00, 1.79MB/s]
Downloading (…)ec297346/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 733/733 [00:00<00:00, 6.77MB/s]
Downloading (…)sc/skywork_logo.jpeg: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 78.9k/78.9k [00:00<00:00, 55.9MB/s]
Downloading chat_demo_1.gif: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.37M/2.37M [00:00<00:00, 205MB/s]
Downloading chat_demo_2.gif: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 108k/108k [00:00<00:00, 46.2MB/s]
Downloading (…)isc/skywork_icon.png: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8.10k/8.10k [00:00<00:00, 48.7MB/s]
Downloading (…)97346/.gitattributes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.46k/5.46k [00:00<00:00, 39.0MB/s]
Downloading chat_demo_3.gif: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 556k/556k [00:00<00:00, 51.1MB/s]
Downloading (…)c/stage1_metrics.png: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 270k/270k [00:00<00:00, 87.1MB/s]
Downloading (…)isc/stage2_ceval.png: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128k/128k [00:00<00:00, 66.0MB/s]
Downloading (…)sc/training_loss.png: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 30.7k/30.7k [00:00<00:00, 52.7MB/s]
...
Downloading (…)okenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 857/857 [00:00<00:00, 5.30MB/s]
Downloading (…)l-00049-of-00053.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510M/510M [00:02<00:00, 235MB/s]
Downloading (…)l-00051-of-00053.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510M/510M [00:02<00:00, 217MB/s]
Downloading (…)l-00046-of-00053.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510M/510M [00:05<00:00, 88.1MB/s]
Downloading (…)l-00050-of-00053.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510M/510M [00:04<00:00, 111MB/s]
Downloading (…)l-00052-of-00053.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 510M/510M [00:04<00:00, 116MB/s]
Downloading (…)l-00053-of-00053.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.21G/1.21G [00:05<00:00, 203MB/s]

转换成GGUF格式

参照下面运行 convert-skywork-hf-to-gguf.py，请替换成本机实际路径:

skywork.cpp# python3 convert-skywork-hf-to-gguf.py  ~/.cache/huggingface/hub/models--Skywork--Skywork-13B-Base/snapshots/2f15ad62f302f9e0015ec941dd4eeabcec297346/ 1 --outfile=skywork-f16.gguf
gguf: Conversion Endianess 0
gguf: loading model 2f15ad62f302f9e0015ec941dd4eeabcec297346
hello print:  SkyworkForCausalLM
gguf: found 53 model parts
num_parts:53

This gguf file is for Little Endian only
gguf: get model metadata
gguf: get tokenizer metadata
gguf: get sentencepiece tokenizer vocab, scores and token types
gguf: Setting special token type bos to 1
gguf: Setting special token type eos to 2
gguf: Setting special token type pad to 0
gguf: get tensor metadata
gguf: loading model part 'pytorch_model-00001-of-00053.bin'
model.layers.0.input_layernorm.weight -> blk.0.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
model.layers.0.post_attention_layernorm.weight -> blk.0.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
model.layers.0.self_attn.q_proj.weight -> blk.0.attn_q.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.0.self_attn.k_proj.weight -> blk.0.attn_k.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.0.self_attn.v_proj.weight -> blk.0.attn_v.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.0.self_attn.o_proj.weight -> blk.0.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
...
model.layers.51.self_attn.k_proj.weight -> blk.51.attn_k.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.51.self_attn.v_proj.weight -> blk.51.attn_v.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.51.self_attn.o_proj.weight -> blk.51.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.51.mlp.gate_proj.weight -> blk.51.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.51.mlp.up_proj.weight -> blk.51.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
model.layers.51.mlp.down_proj.weight -> blk.51.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
gguf: loading model part 'pytorch_model-00053-of-00053.bin'
model.norm.weight -> output_norm.weight, n_dims = 1, torch.bfloat16 --> float32
model.embed_tokens.weight -> token_embd.weight, n_dims = 2, torch.bfloat16 --> float16
lm_head.weight -> output.weight, n_dims = 2, torch.bfloat16 --> float16
gguf: write header
gguf: write metadata
gguf: write tensors
gguf: model successfully exported to 'skywork-f16.gguf'

(可选)量化

如果本机内存不够，可以在上面基础上进一步量化，由16位量化成4位，方法如下:

skywork.cpp# ./quantize skywork-f16.gguf skywork-q4_0.gguf q4_0
main: build = 1498 (e980088)
main: built with cc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 for x86_64-linux-gnu
main: quantizing 'skywork-f16.gguf' to 'skywork-q4_0.gguf' as Q4_0
llama_model_loader: loaded meta data with 18 key-value pairs and 471 tensors from skywork-f16.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:           blk.0.attn_norm.weight f32      [  4608,     1,     1,     1 ]
llama_model_loader: - tensor    1:            blk.0.ffn_norm.weight f32      [  4608,     1,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_q.weight f16      [  4608,  4608,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_k.weight f16      [  4608,  4608,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_v.weight f16      [  4608,  4608,     1,     1 ]
llama_model_loader: - tensor    5:         blk.0.attn_output.weight f16      [  4608,  4608,     1,     1 ]
llama_model_loader: - tensor    6:            blk.0.ffn_gate.weight f16      [  4608, 12288,     1,     1 ]
[ 468/ 471]               blk.51.ffn_down.weight - [12288,  4608,     1,     1], type =    f16, quantizing to q4_0 .. size =   108.00 MB ->    30.38 MB | hist: 0.036 0.015 0.024 0.037 0.055 0.076 0.097 0.115 0.122 0.115 0.097 0.076 0.055 0.037 0.024 0.020
[ 469/ 471]                   output_norm.weight - [ 4608,     1,     1,     1], type =    f32, size =    0.018 MB
[ 470/ 471]                    token_embd.weight - [ 4608, 65519,     1,     1], type =    f16, quantizing to q4_0 .. size =   575.85 MB ->   161.96 MB | hist: 0.036 0.015 0.025 0.038 0.056 0.077 0.097 0.112 0.118 0.112 0.097 0.077 0.056 0.038 0.025 0.021
[ 471/ 471]                        output.weight - [ 4608, 65519,     1,     1], type =    f16, quantizing to q6_K .. size =   575.85 MB ->   236.19 MB | hist:
llama_model_quantize_internal: model size  = 26425.55 MB
llama_model_quantize_internal: quant size  =  7507.74 MB
llama_model_quantize_internal: hist: 0.036 0.016 0.025 0.039 0.056 0.077 0.096 0.112 0.118 0.112 0.096 0.077 0.056 0.039 0.025 0.021

main: quantize time = 43989.90 ms
main:    total time = 43989.90 ms
skywork.cpp# ls -lh *.gguf
-rw-r--r-- 1 root root  26G Nov  8 07:23 skywork-f16.gguf
-rw-r--r-- 1 root root 7.4G Nov  8 08:05 skywork-q4_0.gguf

运行

skywork.cpp# ./main -m skywork-f16.gguf -p "陕西的省会是西安"
Log start
main: build = 1498 (e980088)
main: built with cc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 for x86_64-linux-gnu
main: seed  = 1699428993
llama_model_loader: loaded meta data with 18 key-value pairs and 471 tensors from skywork-f16.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:           blk.0.attn_norm.weight f32      [  4608,     1,     1,     1 ]
llama_model_loader: - tensor    1:            blk.0.ffn_norm.weight f32      [  4608,     1,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_q.weight f16      [  4608,  4608,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_k.weight f16      [  4608,  4608,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_v.weight f16      [  4608,  4608,     1,     1 ]
...
llama_model_loader: - type  f32:  105 tensors
llama_model_loader: - type  f16:  366 tensors
llm_load_vocab: mismatch in special tokens definition ( 1847/65519 vs 259/65519 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = skywork
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 65519
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 4608
llm_load_print_meta: n_head           = 36
llm_load_print_meta: n_head_kv        = 36
llm_load_print_meta: n_layer          = 52
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 12288
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = mostly F16 (guessed)
llm_load_print_meta: model params     = 13.85 B
llm_load_print_meta: model size       = 25.81 GiB (16.00 BPW)
llm_load_print_meta: general.name   = 2f15ad62f302f9e0015ec941dd4eeabcec297346
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.17 MB
llm_load_tensors: mem required  = 26425.72 MB
.................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  468.00 MB
llama_build_graph: non-view tensors processed: 1096/1096
llama_new_context_with_model: compute buffer total size = 143.60 MB

system_info: n_threads = 20 / 40 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS
= 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


陕西的省会是西安，在古代就是著名的长安城，位于现在的关中地区。西安市，也被誉为十三朝古都。西安，秦始皇兵马俑坑，是世界八大奇迹之一。
所以，秦朝时期是我国历史上第一个大一统封建王朝。西北的首都所在。秦朝、唐朝、宋朝等都属于中原文化区。

公元前221年，唐太宗李世民的长安城，唐长安是当时世界上最大的城市，它是唐朝都城长安是当时亚洲首屈一指的商业中心，拥有人口最多最繁华的国际贸易都市，是丝绸之路起点！
西安人吃“十三绝”
唐代长安城的中轴线朱雀大街就是今天的解放路。【西安钟楼、西安城墙、西安交通枢纽、西安钟楼、西安钟楼、西安钟楼，以西安古为背景，这个地方应该有很多历史遗迹、文化古迹和博物馆，是西安最繁华地段，是游客最多的地方
所以在晚上最热闹了，现在已经不是旅游景点，但依然是西安最古老的一条街。
中山路，西边就是以前的街道，东边是城墙，这里非常适合散步，而到回民区！
陕西省博物馆是一定要去看看。西安这座城市的标志性建筑之一。2013年才开放，在历史上有着十分重要的地位，有很多人在那里的地方，是由很多的，而且还有很多的景点都可以参观，
这里有非常多的人文景观，很值得来打卡拍照呢。

TODO

运行效果不太理想。有可能转换模型的时候出的问题，或者提示词配置不对。有时间再研究一下
直接提交到 Llama.cpp 上面让作者支持?

Name		Name	Last commit message	Last commit date
Latest commit History 1,508 Commits
.devops		.devops
.github		.github
ci		ci
cmake		cmake
common		common
docs		docs
examples		examples
gguf-py		gguf-py
grammars		grammars
media		media
models		models
pocs		pocs
prompts		prompts
scripts		scripts
spm-headers		spm-headers
tests		tests
.clang-tidy		.clang-tidy
.dockerignore		.dockerignore
.ecrc		.ecrc
.editorconfig		.editorconfig
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
Makefile		Makefile
Package.swift		Package.swift
README.md		README.md
SHA256SUMS		SHA256SUMS
build.zig		build.zig
codecov.yml		codecov.yml
convert-bloom-hf-to-gguf.py		convert-bloom-hf-to-gguf.py
convert-falcon-hf-to-gguf.py		convert-falcon-hf-to-gguf.py
convert-gptneox-hf-to-gguf.py		convert-gptneox-hf-to-gguf.py
convert-llama-ggml-to-gguf.py		convert-llama-ggml-to-gguf.py
convert-lora-to-ggml.py		convert-lora-to-ggml.py
convert-mpt-hf-to-gguf.py		convert-mpt-hf-to-gguf.py
convert-persimmon-to-gguf.py		convert-persimmon-to-gguf.py
convert-refact-hf-to-gguf.py		convert-refact-hf-to-gguf.py
convert-skywork-hf-to-gguf.py		convert-skywork-hf-to-gguf.py
convert-starcoder-hf-to-gguf.py		convert-starcoder-hf-to-gguf.py
convert.py		convert.py
download-skywork.py		download-skywork.py
flake.lock		flake.lock
flake.nix		flake.nix
ggml-alloc.c		ggml-alloc.c
ggml-alloc.h		ggml-alloc.h
ggml-backend.c		ggml-backend.c
ggml-backend.h		ggml-backend.h
ggml-cuda.cu		ggml-cuda.cu
ggml-cuda.h		ggml-cuda.h
ggml-impl.h		ggml-impl.h
ggml-metal.h		ggml-metal.h
ggml-metal.m		ggml-metal.m
ggml-metal.metal		ggml-metal.metal
ggml-mpi.c		ggml-mpi.c
ggml-mpi.h		ggml-mpi.h
ggml-opencl.cpp		ggml-opencl.cpp
ggml-opencl.h		ggml-opencl.h
ggml-quants.c		ggml-quants.c
ggml-quants.h		ggml-quants.h
ggml.c		ggml.c
ggml.h		ggml.h
llama.cpp		llama.cpp
llama.h		llama.h
mypy.ini		mypy.ini
requirements.txt		requirements.txt
run_with_preset.py		run_with_preset.py
unicode.h		unicode.h

License

yxq321/skywork.cpp

Folders and files

Latest commit

History

Repository files navigation

skywork.cpp | 天工大模型C语言的实现