
What does save/load do? #13

Open · saul-jb opened this issue Jun 12, 2023 · 15 comments

@saul-jb commented Jun 12, 2023

Type '/save','/load' to save network state into a binary file.

After I run /save and then load the state back in a new session, the model seems to have forgotten my previous prompts. What exactly do these commands do, and what are they used for?

@kuvaus (Owner) commented Jun 12, 2023

Good point!
Uhh, right now it does nothing... :)

At the moment, in v0.2.4, you can use save_log and load_log, which give you a memory of previous conversations.

The point of /save and /load was to be able to save a snapshot of the whole state of the network to disk in a binary file and then load it back. That gives you an exact copy of the state, but the file will be big, 1 GB or more.

Currently saving seems to work (it does write a binary file to disk), but loading seems buggy. I don't really understand what's wrong there at the moment...
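Roughly, the idea is just to stream the state bytes to and from disk, something like this (an illustrative sketch, not the actual code; save_state, load_state and the state buffer here are made-up names):

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// The backend exposes the evaluation state (context / KV cache) as one flat
// byte buffer; `state` here is a stand-in for that buffer.
bool save_state(const std::vector<uint8_t> &state, const std::string &path) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char *>(state.data()),
              static_cast<std::streamsize>(state.size()));
    return out.good();
}

bool load_state(std::vector<uint8_t> &state, const std::string &path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;
    state.resize(static_cast<std::size_t>(in.tellg()));
    in.seekg(0);
    in.read(reinterpret_cast<char *>(state.data()),
            static_cast<std::streamsize>(state.size()));
    return in.good();
}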

I'll bump the binary Release to v0.2.5 once I get it working.

Maybe I should have put the progress into a different branch to avoid confusion...

Anyway, use save_log and load_log for save/resume for now; it's almost the same thing and uses much less space.

@saul-jb (Author) commented Jun 12, 2023

Thanks for the response. save_log and load_log can only be specified as startup parameters, which reduces their usefulness. It would be nice to have a way to efficiently and quickly load contexts in and out of the model while it is still running (without having to reload it into memory), so it could be used in more situations or by several users at the same time.

Are /load and /save going to be useful for this, or are they effectively doing the same thing as a restart, in which case it is better to just do a full restart to switch between contexts?

Another thing that would be useful for /save and /load is the ability to specify a path, e.g. /save /path/to/my/saves/context.bin.

@kuvaus (Owner) commented Jun 13, 2023

Got save/load working! I think.

Are /load and /save going to be useful for this, or are they effectively doing the same thing as a restart, in which case it is better to just do a full restart to switch between contexts?

Yeah, should be useful. You'll still have to load the 1+Gb. I'm not sure if there is a way to optimize that.

Another thing that would be useful for /save and /load is the ability to specify a path, e.g. /save /path/to/my/saves/context.bin.

Yep. For now the program only makes a saves folder and saves the state there. But I agree, naming the saves would be useful.

@saul-jb (Author) commented Jun 13, 2023

Got save/load working! I think.

Thanks again for your fast work!

Yeah, should be useful. You'll still have to load the 1+Gb. I'm not sure if there is a way to optimize that.

Even if it can't be optimized, surely it is more efficient than reloading the model from startup.

Yep. For now the program only makes a saves folder and saves the state there. But I agree, naming the saves would be useful.

For now the workaround is to move the save file elsewhere after saving and move it back before loading.

I'll leave this issue open until naming saves or specifying paths is implemented unless you want to create a new issue specifically for that to keep track of it there.

@kuvaus (Owner) commented Jun 14, 2023

Added save naming in v0.2.6.

/save NAME and /load NAME will now save/load the state in saves/NAME.bin
Because it's really easy to fill your hard drive with those multi-Gb files, I also added a --no-saves toggle so one can turn the whole thing off.

I also kinda wanted to keep all the saves in one folder so it's easy to just find and delete them all, especially when one happens to make a typo in the save names.
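For reference, the name-to-path mapping is conceptually just this (an illustrative sketch, not the exact code; default_name is a placeholder):

#include <string>

// Sketch: map /save NAME or /load NAME to a file under the saves folder.
// An empty name falls back to the default state file, and names containing
// ".." are rejected so the file cannot escape the saves directory.
std::string save_path(const std::string &name,
                      const std::string &default_name = "model_state") {
    std::string n = name.empty() ? default_name : name;
    if (n.find("..") != std::string::npos) n = default_name;
    return "saves/" + n + ".bin";
}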

For now the workaround is to move the save file elsewhere after saving and move it back before loading.

Yeah this was tedious. Renaming was a good suggestion!

I'll leave this issue open until naming saves or specifying paths is implemented unless you want to create a new issue specifically for that to keep track of it there.

This is good. Let me know if it works or not. :) I just pushed the update as soon as it seemed to work.

@saul-jb (Author) commented Jun 14, 2023

/save NAME and /load NAME will now save/load the state in saves/NAME.bin
Because it's really easy to fill your hard drive with those multi-Gb files, I also added a --no-saves toggle so one can turn the whole thing off.

That is good but there should really be a CLI parameter --save-dir for the save folder location if we can't specify paths for individual saves.

This is good. Let me know if it works or not. :) I just pushed the update as soon as it seemed to work.

Unfortunately it seems to be bugged. Saving works fine, but there is something odd about loading.

./chat -m ggml-gpt4all-j-v1.3-groovy.bin -s 1
> What is a dog?
...
> /save dog

Then kill it, start it again loading the same model, and query it with:

> /load dog
> Provide more information.
I apologize for the confusion earlier. Can you please clarify what you would like me to provide additional information on?

But if I copy the dog.bin and dog.ctx saves to model_state.bin and model_state.ctx and run it:

> /load
> Provide more information.
I apologize for the confusion earlier. Dogs are domesticated animals that belong to the family Canidae. They have been around since ancient times and were used as hunting dogs in many cultures. They are known for their loyalty, affection, and ability to provide companionship.

I have tested it multiple times and it seems to just not have any memory when loading using the named save. This would be easier to debug if we could produce deterministic results (#14).

@saul-jb (Author) commented Jun 14, 2023

Another thing to note is that /save <NAME> and /load <NAME> alter the state of params:

if (input2 != "" && (input2.find("..") == std::string::npos) ) { params.state = input2; }

Therefore calling /load after a /save <NAME> will use <NAME> instead of whatever --state is set to:

> /save test
Model data saved to: saves/test.bin size: 0.94 Gb
> /load     
Model data loaded from: saves/test.bin size: 0.94 Gb
> /save
Model data saved to: saves/test.bin size: 0.94 Gb

Is this the desired behavior?

@saul-jb (Author) commented Jun 15, 2023

Another thing I have noticed: loading model state is fairly quick, but it makes the next query (or queries) much slower, taking around 5x longer. Why does this happen? Is this expected?

It turns out calling /load causes the memory usage to permanently increase by around a GB, and this made it just exceed the RAM of the machine I was testing on, causing a lot of swapping. I suspect this is the cause of the slow running after loading.

Is it possible to combine the loaded state with the internal memory so we don't have increased memory requirements after loading? It's not a big deal either way - I was cutting it real close with the RAM as is.

@kuvaus (Owner) commented Jun 15, 2023

Lots of good comments! Thank you! :)

That is good but there should really be a CLI parameter --save-dir for the save folder location if we can't specify paths for individual saves.

I just have to make sure that it does not cause any security issues first. There need to be some checks so that you can't accidentally overwrite any important binaries...

Another thing to note is that /save <NAME> and /load <NAME> alters the state of params:
Is this the desired behavior?

It was until you pointed it out :) Fixed now so that --state is the default if no NAME is used.
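Roughly, the idea of the fix (a sketch, assuming a params struct like in your snippet; not the exact code) is to resolve the name per command instead of writing it back into params, so a bare /save or /load always falls back to the --state default:

#include <string>

// Sketch: pick the state name for this one command only, without persisting
// the explicit NAME into params, so later bare /save or /load calls still
// use the --state default.
std::string resolve_state_name(const std::string &input2,
                               const std::string &default_state) {
    if (!input2.empty() && input2.find("..") == std::string::npos)
        return input2;         // explicit /save NAME or /load NAME
    return default_state;      // bare command: fall back to --state
}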

I have tested it multiple times and it seems to just not have any memory when loading using the named save. This would be easier to debug if we could produce deterministic results (#14).

I hope this works now with the name changes and --temp.

It turns out calling /load causes the memory usage to permanently increase by around a GB, and this made it just exceed the RAM of the machine I was testing on, causing a lot of swapping. I suspect this is the cause of the slow running after loading.

Memory speed is really the bottleneck with these models. When it goes to swap, it becomes atrociously slow. On mac (and I assume on linux too) you could perhaps use sudo purge to clear your memory before starting the program for the first time.

@kuvaus (Owner) commented Jun 15, 2023

Is it possible to combine the loaded state with the internal memory so we don't have increased memory requirements after loading? It's not a big deal either way - I was cutting it real close with the RAM as is.

This is a really tough question. I honestly don't know! I only have some thoughts but raw memory management is something I don't know much about.

  1. You have the 4 GB model in memory and then you also have to stream the 1 GB state from disk. I do delete the stream buffers at the end of the loading function, but I'm not sure what's going on.
  2. The backend handles the raw pointers and the memory for the model. Maybe there's a way to change those without altering the backend...

But I get what you're saying. Ideally you'd want /load to just replace the state of the model with the loaded file and free all the other memory.

Now, someone smarter than me can say "oh, it's easy, you just do this (solution)". So if someone figures it out, I'm super happy to add that feature. :)

I mean, it's quite likely that I'm just doing something very inefficient here.
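One thing that might help with the extra ~1 GB (just a sketch of the idea, not tested against the backend; load_state_into and dest are illustrative names): read the file in small chunks directly into the destination buffer instead of building a second full-size copy in memory first.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Sketch: stream the saved state straight into an already-allocated buffer
// (ideally the backend's own state memory) in 1 MiB chunks, so there is
// never a second full copy of the ~1 GB state in RAM.
bool load_state_into(uint8_t *dest, std::size_t dest_size,
                     const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    constexpr std::size_t chunk = 1 << 20;      // 1 MiB per read
    std::size_t offset = 0;
    while (offset < dest_size) {
        std::size_t n = std::min(chunk, dest_size - offset);
        in.read(reinterpret_cast<char *>(dest + offset),
                static_cast<std::streamsize>(n));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;                    // short file or read error
        offset += got;
    }
    return offset == dest_size;                 // true only if fully loaded
}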

@kuvaus (Owner) commented Jun 15, 2023

With v0.2.7:

./chat -m ggml-gpt4all-j-v1.3-groovy.bin --temp 0.001

> What is a dog?
 A dog is a domesticated animal that belongs to the family Canidae. They are known for their loyalty, affection, and ability to provide companionship. Dogs have been around since ancient times when they were used as hunting dogs or guard animals. Today, there are many different breeds of dogs with unique characteristics such as size, temperament, and coat color.
> /save dog
Model data saved to: saves/dog.bin size: 0.94 Gb
> quit

Now, with loading:

> /load dog
Model data loaded from: saves/dog.bin size: 0.94 Gb
> Provide more information.
 I would like you to provide me with more details about what you are looking for. Can you please specify your request?

But if you say this instead:

> /load dog
Model data loaded from: saves/dog.bin size: 0.94 Gb
> tell me more
 A dog is a domesticated animal that belongs to the family Canidae. They are known for their loyalty, affection, and ability to provide companionship. Dogs have been around since ancient times when they were used as hunting dogs or guard animals. Today, there are many different breeds of dogs with unique characteristics such as size, temperament, and coat color.

Dogs come in a variety of shapes, sizes, and colors. Some common breeds include the Labrador Retriever, Golden Retriever, Beagle, Boxer, Bulldog, Poodle, and Dachshund. Each breed has its own unique traits such as energy levels, shedding habits, and grooming needs.

Dogs are known for their intelligence and ability to learn new things quickly. They have been trained in a variety of skills including obedience, agility, tracking, and protection. Some dogs even have specialized jobs like police dogs or therapy animals.

Overall, dogs are an important part of our lives

With low temperature you should hopefully get the same results.

@saul-jb (Author) commented Jun 15, 2023

With low temperature you should hopefully get the same results.

LlamaGPTJ-chat (v. 0.2.7)

Yes and no: even using that temperature, I am getting slightly different results. For example, my first query:

A dog is a domesticated animal that belongs to the family Canidae. They are known for their loyalty, affection, and ability to provide companionship. Dogs have been around since ancient times when they were used as hunting dogs or guard animals. Today, there are many different breeds of dogs with unique characteristics such as size, temperament, and physical features.

Perhaps --top_k 1 would be more reliable for determinism?
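(To spell out why: with top_k set to 1 the sampler effectively collapses to an argmax over the logits, so temperature no longer matters and the only remaining variation is floating-point differences between machines. A toy illustration, not the project's actual sampler:)

#include <algorithm>
#include <iterator>
#include <vector>

// Toy sketch: with top_k = 1, sampling reduces to picking the single highest
// logit, which is deterministic for a given logit vector.
int sample_top_k1(const std::vector<float> &logits) {
    return static_cast<int>(std::distance(
        logits.begin(), std::max_element(logits.begin(), logits.end())));
}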

Then, after saving and loading, running your query tell me more gives a similar result, but Tell me more. (capitalization and full stop) gives:

 I would love to know more about you! What's your name? Where are you from originally? Do you have any hobbies or interests that you enjoy doing in your free time?

Now if I give it the prompt Tell me more. directly after the prompt What is a dog? in the same session I get a relevant result about dogs (compared to the one above). So save/load seems to not load the exact state back but it is clearly loading something back.

@kuvaus (Owner) commented Jun 15, 2023

Very interesting!

I hadn't even thought about --top_k 1. Thank you! It's absolutely better to use that.

And yet I still got:

... such as size, temperament, and coat color.

even with --temp 0.001 or --top_k 1 or --top_k 1 --temp 0.001.

I get the same result in all three cases, but it does indeed differ from yours. Maybe there are some machine-specific things going on as well (how floating-point operations are handled on the processor, etc.). I'm running this on a Mac with AVX1 only.

I also got, with --top_k 1:

> /load dog
Model data loaded from: saves/dog.bin size: 0.94 Gb
> Tell me more.
 I would love to know more about you! What's your name? Where are you from originally? Do you have any pets?

vs:

> What is a dog?
 A dog is a domesticated animal that belongs to the family Canidae. They are known for their loyalty, affection, and ability to provide companionship. Dogs have been around since ancient times when they were used as hunting dogs or guard animals. Today, there are many different breeds of dogs with unique characteristics such as size, temperament, and coat color.
> Tell me more.
 I would love to know more about you! What's your name? Where are you from originally? Do you have any pets?

Well... I guess there could be a bug in loading, or in the backend, or the models are just inherently a little bit unpredictable. But the fact that "Tell me more." is so different from "tell me more" is kinda unsettling.

So save/load seems to not load the exact state back but it is clearly loading something back.

You're right. Something is off...

@saul-jb (Author) commented Jun 15, 2023

I get the same result in all three cases, but it does indeed differ from yours. Maybe there are some machine-specific things going on as well (how floating-point operations are handled on the processor, etc.). I'm running this on a Mac with AVX1 only.

Yeah, it must be a slight calculation difference. For reference, I have tested on:

Linux 5.15 x86_64 (Ubuntu) AMD (Supports AVX2)
Linux 5.10 x86_64 (Debian) Intel (Supports AVX2)

And I get the "physical features" variation.

@kuvaus (Owner) commented Jun 15, 2023

Great that you tested on multiple computers!

It's good that you found this bug! :) I'll look into this more at some point. But at the moment I don't have any ideas on how to make loading more accurate.

Meanwhile v0.2.8:

Added --save_dir and --state is now --save_name.

There's now a sanity check that /save and /load can only overwrite/load binaries of the same size as the model state. For example, GPTJ's state is 941686804 bytes.

So you can overwrite a previous save or an empty file, but not an existing binary of different size. That should be enough to prevent accidents.
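The check itself can be quite small; roughly like this (illustrative only, using std::filesystem; not the exact code, and ok_to_overwrite is a made-up name):

#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Sketch: saving may create a new file, overwrite an empty file, or overwrite
// a file whose size exactly matches the model state (941686804 bytes for
// GPTJ); anything else is refused so unrelated binaries cannot be clobbered.
bool ok_to_overwrite(const std::string &path, std::uintmax_t state_size) {
    std::error_code ec;
    if (!fs::exists(path, ec)) return true;     // new file is fine
    auto size = fs::file_size(path, ec);
    if (ec) return false;                       // unreadable: refuse
    return size == 0 || size == state_size;     // empty or same-size only
}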
