
str_concat memory leak #92

Open
dorjeduck opened this issue Apr 24, 2024 · 4 comments

Comments

@dorjeduck

dorjeduck commented Apr 24, 2024

I am also working on a port of llama2 to Mojo. Your port is excellent; I'm just doing it myself for the sake of learning, probably sticking closer to the C source, and will see how it goes. I'm currently struggling with the tokenizer in Andrej's C code, and while taking a look at your code I wondered whether there might be a (probably not problematic) memory leak in the str_concat method: I can't see right now where the memory allocated there is ever freed. I might be completely wrong, just thought I'd drop you a line.
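For context, the pattern being described looks roughly like the following sketch (hypothetical names and signatures written in recent-Mojo style, not the actual llama2.mojo code): the concatenated buffer is heap-allocated and handed back to the caller, and nothing ever frees it.

```mojo
from memory import UnsafePointer, memcpy

fn str_concat(a: UnsafePointer[UInt8], a_len: Int,
              b: UnsafePointer[UInt8], b_len: Int) -> UnsafePointer[UInt8]:
    # Heap-allocate a new buffer and copy both inputs into it.
    var out = UnsafePointer[UInt8].alloc(a_len + b_len)
    memcpy(out, a, a_len)
    memcpy(out + a_len, b, b_len)
    # Ownership of `out` passes to the caller; if the caller never calls
    # out.free(), these bytes are never reclaimed (the leak described above).
    return out
```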

@mikowals
Contributor

Hi Martin. I think you are correct that the token memory is not explicitly freed before the program exits, and the tokens that are being merged are also not freed, which is probably more problematic.

There is already a fix for this in this draft PR, which ultimately puts all the pointers into Strings and Lists of Strings, so all memory should be freed without calling free directly.
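A minimal sketch of that approach, assuming the vocab ends up in a List[String] (illustrative names, not the draft PR's actual code): concatenation yields a new owned String whose buffer is released when the value goes out of scope, so no explicit free() call is needed.

```mojo
from collections import List

fn merge_tokens(vocab: List[String], a: Int, b: Int) -> String:
    # String.__add__ allocates the combined buffer; the result owns it
    # and releases it automatically when it is no longer used.
    return vocab[a] + vocab[b]
```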

@dorjeduck
Author

Sounds very good. I am currently wondering whether I should go for Lists and Strings in my port or handle memory management myself; I need to find out whether it is performance-critical, but I guess it's not. Thanks for the reply, great project.

@dorjeduck
Author

dorjeduck commented Apr 27, 2024

Use a Dict to lookup token indices. This allows removing Quicksort and binary search code.

Did you check the performance of this change? I am playing around with the Tokenizer and noticed that Dict is really pretty slow compared to Python's {} in my tests. I have not yet checked whether a Mojo sort/search implementation following the C code is faster.
There is a discussion about this at modularml/mojo#1747; I just asked on Discord whether there are any updates on that issue.

@mikowals
Contributor

mikowals commented Apr 27, 2024

As I said in the same comment you quoted:

I didn't mess with Llamatune since this is going across Mojo versions, but locally there was no change in tokens/sec. It is probably loading faster and more memory-efficiently since this avoids the vocab sort and no longer reads the entire tokenizer.bin.

I didn't play with prompts much, which is when the Dict lookup would be used. It is outside the transformer and not measured in the tokens/sec benchmark. In previous benchmarks the vocab lookup was only taking 6 ms, so I doubt the difference with Dict will be big enough to warrant deciding based on performance rather than simplifying the code base. But Llamatune reports total runtime as well as tokens/sec, so any problems I have overlooked would be noticed there.

The Llama2 vocab is around 30,000 tokens, though, and the binary search being replaced is O(log N) while a Dict lookup is O(1) on average.
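A minimal sketch of the Dict-based lookup being discussed, under the assumption that the vocab strings are mapped to their indices once at load time (illustrative names, not the PR's actual code); the -1 "not found" convention mirrors str_lookup in llama2.c.

```mojo
from collections import Dict, List

fn build_token_index(vocab: List[String]) -> Dict[String, Int]:
    # Build the hash map once; afterwards no sorted copy of the vocab
    # and no binary search are needed.
    var index = Dict[String, Int]()
    for i in range(len(vocab)):
        index[vocab[i]] = i
    return index

fn str_lookup(token: String, index: Dict[String, Int]) -> Int:
    # Average O(1) hash lookup; -1 means the token is not in the vocab.
    return index.get(token, -1)
```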
