
str_concat memory leak #92

Open
dorjeduck opened this issue Apr 24, 2024 · 4 comments

Comments

@dorjeduck

dorjeduck commented Apr 24, 2024

I am also working on a port of llama2 to Mojo. Your port is excellent; I'm just doing it myself for the sake of learning, probably sticking closer to the C source, and will see how it goes. I'm currently struggling with the tokenizer in Andrej's C code, and while taking a look at your code I wondered whether there might be a (probably not problematic) memory leak in the str_concat method: I can't see right now where the memory allocated there is ever freed. I might be completely wrong, just thought I'd drop you a line.
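For context, the pattern being described looks roughly like the following sketch (hypothetical names and signatures written in recent-Mojo style, not the actual llama2.mojo code): the concatenated buffer is heap-allocated and handed back to the caller, and nothing ever frees it.

```mojo
from memory import UnsafePointer, memcpy

fn str_concat(a: UnsafePointer[UInt8], a_len: Int,
              b: UnsafePointer[UInt8], b_len: Int) -> UnsafePointer[UInt8]:
    # Heap-allocate a new buffer and copy both inputs into it.
    var out = UnsafePointer[UInt8].alloc(a_len + b_len)
    memcpy(out, a, a_len)
    memcpy(out + a_len, b, b_len)
    # Ownership of `out` passes to the caller; if the caller never calls
    # out.free(), these bytes are never reclaimed (the leak described above).
    return out
```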

@mikowals
Contributor

Hi Martin. I think you are correct that the token memory is not explicitly freed before the program exits, and the tokens that are being merged are also not freed, which is probably more problematic.

There is already a fix for this in this draft PR, which ultimately puts all the pointers into Strings and Lists of Strings, so all memory should be freed without calling free directly.
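A minimal sketch of that approach, assuming the vocab ends up in a List[String] (illustrative names, not the draft PR's actual code): concatenation yields a new owned String whose buffer is released when the value goes out of scope, so no explicit free() call is needed.

```mojo
from collections import List

fn merge_tokens(vocab: List[String], a: Int, b: Int) -> String:
    # String.__add__ allocates the combined buffer; the result owns it
    # and releases it automatically when it is no longer used.
    return vocab[a] + vocab[b]
```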

@dorjeduck
Author

Sounds very good. I am currently wondering whether I should go for Lists and Strings in my port or handle memory management myself; I need to find out whether it is performance-critical, but I guess it's not. Thanks for the reply, great project.

@dorjeduck
Author

dorjeduck commented Apr 27, 2024

Use a Dict to lookup token indices. This allows removing Quicksort and binary search code.

Did you check the performance of this change? I am playing around with the Tokenizer and noticed that Dict is really pretty slow compared to Python's {} in my tests. I have not yet checked whether a Mojo sort/search implementation following the C code is faster.
There is a discussion about this at modularml/mojo#1747; I just asked on Discord whether there are any updates on that issue.

@mikowals
Contributor

mikowals commented Apr 27, 2024

As I said in the same comment you quoted:

I didn't mess with Llamatune since this is going across Mojo versions, but locally there was no change in tokens/sec. It is probably loading faster and more memory-efficiently since this avoids the vocab sort and no longer reads the entire tokenizer.bin.

I didn't play with prompts much, which is when the Dict lookup would be used. It is outside the transformer and not measured in the tokens/sec benchmark. In previous benchmarks the vocab lookup was only taking 6 ms, so I doubt the difference with Dict will be big enough to warrant deciding based on performance rather than simplifying the code base. But Llamatune reports total runtime as well as tokens/sec, so any problems I have overlooked would be noticed there.

The Llama2 vocab is around 30,000 tokens, though, and the binary search being replaced is O(log N) while a Dict lookup is O(1) on average.
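A minimal sketch of the Dict-based lookup being discussed, under the assumption that the vocab strings are mapped to their indices once at load time (illustrative names, not the PR's actual code); the -1 "not found" convention mirrors str_lookup in llama2.c.

```mojo
from collections import Dict, List

fn build_token_index(vocab: List[String]) -> Dict[String, Int]:
    # Build the hash map once; afterwards no sorted copy of the vocab
    # and no binary search are needed.
    var index = Dict[String, Int]()
    for i in range(len(vocab)):
        index[vocab[i]] = i
    return index

fn str_lookup(token: String, index: Dict[String, Int]) -> Int:
    # Average O(1) hash lookup; -1 means the token is not in the vocab.
    return index.get(token, -1)
```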
