feat: ggml: support more parameters from llama.cpp #3314
Comments
Is this issue open for contributions? If yes, I would love to look into this.
Yes, this issue is open for contributions. We welcome your input and any code related to this issue.
Here is an outline of integrating some parameters, such as `--parallel` and `--draft`, and parsing them as optional parameters in WasmEdge:

    struct Graph {
      // ...
      uint64_t NParallel = 1;
      uint64_t NDraft = 1;
    };

    Expect<ErrNo> compute(WasiNNEnvironment &Env, uint32_t ContextId) noexcept {
      // ...
      // if --draft and --parallel are set
      ReturnCode = SpeculativeDecoding(GraphRef, CxtRef);
      // else use current implementation
      // ...
    }

    ErrNo SpeculativeDecoding(Graph &GraphRef, Context &CxtRef) noexcept {
      // implementation like https://github.com/ggerganov/llama.cpp/blob/3292733f95d4632a956890a438af5192e7031c12/examples/speculative/speculative.cpp
    }

Detailed code: https://github.com/Fusaaaann/WasmEdge/blob/ae718df452658df555e2b4fe35e8c90e69c5c55f/plugins/wasi_nn/strategies/strategies.cpp#L234

What is WasmEdge's plan for supporting these parameters if the wasi-nn functions become too complex to fit in a single ggml.cpp file as a result?
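One way to read the "if --draft and --parallel are set" branch in the outline above is to record whether each option was supplied at all, instead of relying on a default value of 1. The following is a minimal sketch under that assumption. `DecodeOptions`, `DecodeStrategy`, and `selectStrategy` are hypothetical names chosen for illustration and are not part of WasmEdge or llama.cpp.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical types for illustration only; not WasmEdge's actual Graph.
enum class DecodeStrategy { Default, Speculative };

struct DecodeOptions {
  std::optional<uint64_t> NParallel; // holds a value only if --parallel was given
  std::optional<uint64_t> NDraft;    // holds a value only if --draft was given
};

// Choose the speculative path only when both options were set explicitly;
// otherwise fall back to the existing (default) implementation.
inline DecodeStrategy selectStrategy(const DecodeOptions &Opts) noexcept {
  if (Opts.NParallel.has_value() && Opts.NDraft.has_value()) {
    return DecodeStrategy::Speculative;
  }
  return DecodeStrategy::Default;
}
```

With this shape, `compute()` could switch on the returned strategy and call into a separate source file per strategy, which may help keep a single ggml.cpp from growing as more parameters are added.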
Hi @Fusaaaann
Summary
We currently support some parameters from llama.cpp, such as `n_gpu_layers`, `ctx-size`, and `threads`, and we expect to support even more parameters.
Details
We plan to support additional parameters; refer to `gpt_params_find_arg()` in llama.cpp's `common/common.cpp` for the full set.
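As a rough illustration of how support for more parameters could be added incrementally, the sketch below uses a lookup table that maps an option name to a setter, so each new parameter is one more table entry. Everything here is an assumption made for the example: `LlamaOptions`, `optionTable`, `applyOption`, the key spellings, and the default values are hypothetical and do not reflect the plugin's actual parsing code.

```cpp
#include <cstdint>
#include <exception>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical option holder; field names and defaults are placeholders.
struct LlamaOptions {
  int64_t NGpuLayers = 0;
  uint64_t CtxSize = 512;
  uint64_t Threads = 1;
};

using Setter = std::function<void(LlamaOptions &, const std::string &)>;

// One entry per supported option; supporting a new one means adding a row.
inline const std::unordered_map<std::string, Setter> &optionTable() {
  static const std::unordered_map<std::string, Setter> Table = {
      {"n-gpu-layers",
       [](LlamaOptions &O, const std::string &V) { O.NGpuLayers = std::stoll(V); }},
      {"ctx-size",
       [](LlamaOptions &O, const std::string &V) { O.CtxSize = std::stoull(V); }},
      {"threads",
       [](LlamaOptions &O, const std::string &V) { O.Threads = std::stoull(V); }},
  };
  return Table;
}

// Returns false for unknown keys or values that cannot be converted.
inline bool applyOption(LlamaOptions &Opts, const std::string &Key,
                        const std::string &Value) {
  const auto &Table = optionTable();
  const auto It = Table.find(Key);
  if (It == Table.end()) {
    return false;
  }
  try {
    It->second(Opts, Value);
  } catch (const std::exception &) {
    return false; // std::stoll/std::stoull throw on non-numeric or out-of-range input
  }
  return true;
}
```

Note that `std::stoull` accepts a numeric prefix (for example `"12abc"` parses as 12), so a stricter implementation would also check that the whole string was consumed.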
Appendix
List all options:
--seed
--threads
--threads-batch
--threads-draft
--threads-batch-draft
--prompt
--escape
--prompt-cache
--prompt-cache-all
--prompt-cache-ro
--binary-file
--file
--n-predict
--top-k
--ctx-size
--grp-attn-n
--grp-attn-w
--rope-freq-base
--rope-freq-scale
--rope-scaling
--rope-scale
--yarn-orig-ctx
--yarn-ext-factor
--yarn-attn-factor
--yarn-beta-fast
--yarn-beta-slow
--pooling
--defrag-thold
--samplers
--sampling-seq
--top-p
--min-p
--temp
--tfs
--typical
--repeat-last-n
--repeat-penalty
--frequency-penalty
--presence-penalty
--dynatemp-range
--dynatemp-exp
--mirostat
--mirostat-lr
--mirostat-ent
--cfg-negative-prompt
--cfg-negative-prompt-file
--cfg-scale
--batch-size
--ubatch-size
--keep
--draft
--chunks
--parallel
--sequences
--p-split
--model
--model-draft
--alias
--model-url
--hf-repo
--hf-file
--lora
--lora-scaled
--lora-base
--control-vector
--control-vector-scaled
--control-vector-layer-range
--mmproj
--image
--interactive
--embedding
--interactive-first
--instruct
--chatml
--infill
--dump-kv-cache
--no-kv-offload
--cache-type-k
--cache-type-v
--multiline-input
--simple-io
--cont-batching
--color
--mlock
--gpu-layers
--n-gpu-layers
--gpu-layers-draft
--n-gpu-layers-draft
--main-gpu
--split-mode
--tensor-split
--no-mmap
--numa
--verbose-prompt
--no-display-prompt
--reverse-prompt
--logdir
--lookup-cache-static
--lookup-cache-dynamic
--save-all-logits
--kl-divergence-base
--perplexity
--all-logits
--ppl-stride
--print-token-count
--ppl-output-type
--hellaswag
--hellaswag-tasks
--winogrande
--winogrande-tasks
--multiple-choice
--multiple-choice-tasks
--kl-divergence
--ignore-eos
--no-penalize-nl
--logit-bias
--help
--version
--random-prompt
--in-prefix-bos
--in-prefix
--in-suffix
--grammar
--grammar-file
--override-kv