Pass more tokens #42

Open
Acidbuk opened this issue Apr 11, 2023 · 8 comments
Labels
question Further information is requested

Comments

@Acidbuk

Acidbuk commented Apr 11, 2023

Hi, so, if I can ask, what have you set as the default for tokens? Also, is there a settings file I can tinker with to give it more tokens for context and replies? I don't mind if it makes things a bit slower (still faster than trying to run it on my GPU), but sometimes, when you really get it going with just the right prompt, it writes gold. Then, when it cuts itself off mid-sentence in the middle of a story after running out of tokens, I can't make it remember the context of the previous reply and continue where it left off.

@ItsPi3141
Owner

The context size is already set to the maximum (2048).

@Acidbuk
Author

Acidbuk commented Apr 11, 2023

Is 2048 a hard limit with llama.cpp, or is that a function of the model? I know GPT-3.5 is somewhere around 4,000, but it seems to keep its memory for longer before it goes senile. I'm not sure how they achieve that; I suspect they might be feeding a summarised version of the previous replies back in to keep the bot on track?

@ItsPi3141
Owner

Is 2048 a hard limit with llama.cpp, or is that a function of the model?

Yes, 2048 seems to be the hard limit. llama.cpp does let you set the context higher than 2048, but it warns that performance may be negatively impacted.
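For reference, here's roughly what passing a larger context to llama.cpp looks like when spawning the binary from Node. This is just a sketch, not this app's actual code: the binary and model paths are placeholders, while `-c` (context size), `-n` (tokens to generate), and `-p` (prompt) are the llama.cpp CLI flags.

```js
// Sketch only: spawning the llama.cpp binary with an explicit context size.
// Paths are placeholders; -c, -n, and -p are standard llama.cpp flags.
const { spawn } = require("child_process");

const llama = spawn("./main", [
  "-m", "./models/ggml-model-q4_1.bin", // placeholder model path
  "-c", "2048",                          // context window; values above 2048 trigger a warning
  "-n", "512",                           // cap on tokens generated per reply
  "-p", "Write a short story about a robot learning to paint."
]);

llama.stdout.on("data", (chunk) => process.stdout.write(chunk.toString()));
llama.stderr.on("data", (chunk) => process.stderr.write(chunk.toString()));
```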

@kendevco

Are you planning on implementing context? By "context," I mean compressing previous messages and placing them in the prompt, like GPT-3/4's createChatCompletion. I recently managed to run the Vicuna ggml-vicuna-13b-4bit-rev1.bin model by browsing to its file and loading it. Unfortunately, I encountered a bug in the prompt that generated infinite text output. For instance, when I asked the model to write a song in the style of Marshall Mathers about AI and humans coexisting, it printed 33 verses before I had to stop it; I suspect the text would have gone on indefinitely. Otherwise, my Lenovo Legion 5i Pro ran both ggml-model-q4_1.bin and the Vicuna model satisfactorily, nearly as fast as GPT-4.

The Chatbot-Ui really caught my attention, and I'm fascinated by the idea of combining it with other UIs like Next.js and Electron. How challenging would this be, and is it even possible?

@ItsPi3141
Owner

Are you planning on implementing context? By "context," I mean compressing previous messages and placing them in the prompt, like GPT-3/4's createChatCompletion.

In theory, I could do that. But it would make performance very poor on most computers. OpenAI can do this because they have a bunch of beefy GPUs at their disposal, whereas this runs locally, sometimes on near-potato hardware.
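If someone did want to experiment with it, the idea would look something like this. This is just a sketch, not code from this app: `generate()` is a made-up helper standing in for a call to the local model, and the whole point of the `updateSummary` step is the extra generation pass that would hurt performance on weaker hardware.

```js
// Sketch: keep a running summary of older turns so the prompt stays inside
// the 2048-token context window. generate(prompt) is a hypothetical helper
// that runs the local model and resolves with its output.
function buildPrompt(summary, recentTurns, userMessage) {
  return [
    "Summary of the conversation so far:",
    summary,
    "",
    ...recentTurns.map((t) => `${t.role}: ${t.text}`),
    `User: ${userMessage}`,
    "Assistant:",
  ].join("\n");
}

async function updateSummary(summary, droppedTurns, generate) {
  // This extra generation pass is the performance cost mentioned above.
  const prompt =
    `Current summary:\n${summary}\n\nNew messages:\n` +
    droppedTurns.map((t) => `${t.role}: ${t.text}`).join("\n") +
    "\n\nRewrite the summary to include the new messages, in under 100 words:";
  return generate(prompt);
}
```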

The Chatbot-Ui really caught my attention, and I'm fascinated by the idea of combining it with other UIs like Next.js and Electron. How challenging would this be, and is it even possible?

I don't know how to use Next.js because I hate HTML frameworks (e.g. Bootstrap, Vue, Angular, React). It probably wouldn't be hard to turn it into an Electron app, though. If it runs in the web browser, you could just embed that very same page into an Electron app and that's it.
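A bare-bones wrapper like that is only a few lines of Electron. The URL below is a placeholder for wherever the web UI happens to be served:

```js
// Minimal Electron shell around an existing web UI ("embed that very same page").
const { app, BrowserWindow } = require("electron");

app.whenReady().then(() => {
  const win = new BrowserWindow({ width: 1200, height: 800 });
  // Point this at the chat UI's address (or use win.loadFile for a local build).
  win.loadURL("http://localhost:3000");
});

app.on("window-all-closed", () => app.quit());
```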

@ItsPi3141 ItsPi3141 added the question Further information is requested label Apr 23, 2023
@erkkimon

I haven't tried this yet, but it might help increase the effective prompt size by compressing the prompt.

https://github.com/yasyf/compress-gpt

@ItsPi3141
Owner

I haven't tried this yet, but it might help increase the effective prompt size by compressing the prompt.

https://github.com/yasyf/compress-gpt

I'll take a look at how it works later. If it's not too complicated, I'll try to implement something similar.
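Even without that library, a crude version of the idea is just trimming the history to a token budget before building the prompt. This is an illustration only, not how compress-gpt actually works; `countTokens` is a made-up hook for whatever tokenizer is available:

```js
// Illustration: collapse whitespace and keep only as many of the most recent
// turns as fit within a token budget. Not compress-gpt's actual mechanism.
function fitToBudget(turns, budget, countTokens) {
  const compact = turns.map((t) => ({ ...t, text: t.text.replace(/\s+/g, " ").trim() }));
  const kept = [];
  let used = 0;
  // Walk from newest to oldest so recent context survives the cut.
  for (let i = compact.length - 1; i >= 0; i--) {
    const cost = countTokens(compact[i].text);
    if (used + cost > budget) break;
    kept.unshift(compact[i]);
    used += cost;
  }
  return kept;
}
```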

@erkkimon

I'm not sure how it affects performance, but it might be good to be aware of the possibility. It's implemented as a drop-in replacement, which is quite cool, imho.
