Option to split during conversion #6942
I've added support for The counterpoint I can see to doing this is that
This is already a good start. Could you add an end-to-end usage example in the summary?
Sure thing (I assume you mean examples of usage and expected outputs). I also plan to rework the implementation by consolidating code into a new
I'll need to implement for Anyway,
Got it - will only implement for
You can modify the gguf package in the
That's what I've been doing so far; will check out the contributing instructions, thanks!
Testing on Mistral 7B Instruct, this branch's
Running tests on my side for all
Will keep track of tests here as I go. Picking one model from each architecture in It also seems like the current
Leaving a note for myself to watch for merge conflicts with #6511. Development on this branch has slowed down as I'm pretty busy.
Noting time to convert baichuan-inc/Baichuan2-7B-Chat:
- New branch:
- New branch, no split:
- master:

Note that these conversions were done writing the outfile over 2.5GbE, so there was considerable time spent just saving the file. Will test more later, but it doesn't seem like the change increases conversion time too significantly.
Merge attempted. Some lines were ambiguous, so @christianazinn should look this over to make sure the intent is still correct.
I'll check in a few hours and fix conflicts.
The new
This PR introduces additional options to `convert.py` that allow users to split a model into shards while converting, rather than having to do it after conversion, including a default small first shard as outlined in #6463.

Other functionality we ought to have includes `--split-max-size` (so far it's just `--split-max-tensors`), displaying estimated shard sizes, dry running, and adding sharding for the other `convert-*-to-*.py` scripts. This will be considered a draft until those are worked out. It also needs considerable testing, but luckily, as this deals with the Python scripts, it can be tested easily.

Usage

(examples are using zephyr-smol_llama-100m-sft-full)
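For orientation before the examples, here is a minimal sketch of the sharding behavior described above. The helper names (`plan_shards`, `shard_filenames`) and their exact behavior are assumptions for illustration, not this PR's actual code; the filename pattern is assumed to mirror the `gguf-split` convention, and the metadata-only first shard corresponds to the default small first shard from #6463.

```python
# Sketch only: hypothetical helpers, not the PR's implementation.

def plan_shards(tensor_names: list[str], max_tensors: int,
                small_first_shard: bool = True) -> list[list[str]]:
    """Partition tensor names into shards of at most max_tensors each.

    With small_first_shard (cf. #6463), shard 1 carries no tensor data,
    only metadata, so tensors begin in shard 2.
    """
    shards: list[list[str]] = [[]] if small_first_shard else []
    for name in tensor_names:
        need_new = (
            not shards
            or len(shards[-1]) >= max_tensors
            or (small_first_shard and shards[-1] is shards[0])  # keep shard 1 empty
        )
        if need_new:
            shards.append([])
        shards[-1].append(name)
    return shards


def shard_filenames(stem: str, num_shards: int) -> list[str]:
    # Assumed to follow the gguf-split naming convention.
    return [f"{stem}-{i:05d}-of-{num_shards:05d}.gguf"
            for i in range(1, num_shards + 1)]
```

With five tensors and `max_tensors=2`, `plan_shards` yields the empty first shard followed by shards of two, two, and one tensors; `shard_filenames("model", 3)` yields names like `model-00001-of-00003.gguf`.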
Example, `--split-max-size`:

```
python3 convert.py --outfile /path/to/outfile.gguf --outtype f16 /path/to/safetensors --split --split-max-size 64M
```

Output: equal to what's printed to stdout from `master`, then:

With `--split-max-size 200M` (or any number greater than the total resultant size), it gives:

Example, `--split-max-tensors` with `--dry-run` and `--large-first-shard`:

```
python3 convert.py --outfile /path/to/outfile.gguf --outtype f16 /path/to/safetensors --split --split-max-tensors 20 --dry-run --large-first-shard
```

Output: equal to what's printed to stdout from `master`, then:

With `--split-max-tensors 64` (or any number greater than the total tensor count), it gives:

References
- `gguf-split`
- add a default option to not include tensors data in first shard #6463
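As a footnote on the `--split-max-size` values used in the examples (`64M`, `200M`): one way such an argument might be parsed, shown as a hedged sketch. The accepted suffixes and the decimal (rather than binary) multipliers here are assumptions for illustration, not taken from this PR.

```python
import re

# Hypothetical parser for values like "64M"; the suffix set and decimal
# multipliers are assumptions, not this PR's actual semantics.
_SUFFIXES = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}

def parse_split_max_size(value: str) -> int:
    """Return the shard size limit in bytes for a string like '64M'."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value)
    if match is None:
        raise ValueError(f"invalid --split-max-size value: {value!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2)]
```

For example, `parse_split_max_size("64M")` returns 64000000 under these assumed decimal units.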