
feat: add stopword checker + iterable generate function #106

Open
Nintorac wants to merge 1 commit into main

Conversation

@Nintorac commented Apr 29, 2023

This PR adds two sets of functionality.

  1. It adds an iterable generate function, useful for streaming use cases where the upstream caller may want to stop execution before the model has finished generating.
  2. It adds a stop_words argument to the pipeline args. This mimics the OpenAI completions endpoint and is useful for halting execution when certain strings are produced, e.g. bob: could be used to yield control when the model outputs bob:, indicating it is the user's turn to speak.

Let me know what you think :)

e.g. usage:

args = PIPELINE_ARGS(temperature=1e-1, top_p=1e-1,
                     stop_words=['bob:'])
instr = """
bob: hello, how are you today?

alice: I'm fine, thanks.

bob: that's good. Write me a function in python to calculate pi please

alice:"""

for i in pipeline.igenerate(instr, args=args):
    print(i, end='', flush=True)
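
For reference, the stop-word handling inside igenerate is conceptually something like this (a minimal sketch of the idea, not the actual code in this PR; sample_next_piece stands in for one sample-and-decode step):

def igenerate_sketch(sample_next_piece, stop_words, max_pieces=256):
    """Yield decoded text piece by piece; stop once any stop word appears."""
    generated = ''
    for _ in range(max_pieces):
        piece = sample_next_piece()   # placeholder: sample + decode one token
        generated += piece
        if any(sw in generated for sw in stop_words):
            break                     # a stop word appeared, hand control back to the caller
        yield piece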

@BlinkDL (Owner) commented May 3, 2023

Nice :) Actually a better method is to "recover" the state when you see Bob: / Alice:, as in #87

I use \n\n for now, because I replace all \n\n in ChatGPT generations by \n, so whenever you see \n\n it must be endoftext
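
Roughly, the convention looks like this (an illustrative sketch, not the repo's exact code; raw_text and generated_text are placeholder strings):

raw_text = "bob: hi\n\nalice: hello"            # placeholder training text
generated_text = "I'm fine, thanks.\n\nbob:"    # placeholder model output

clean = raw_text.replace('\n\n', '\n')          # data side: collapse double newlines
reply = generated_text.split('\n\n', 1)[0]      # generation side: cut at the first \n\n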

@Nintorac (Author) commented May 9, 2023

Hmm, after reviewing that PR I can't really grasp what it is doing; the out state doesn't seem to get used (e.g. here), and I don't think it's solving the same problem. It also looks to rely on global state via load_all_stat, which I'd rather not use since I would like to run this in a (very loosely speaking) production environment.

The target use case here is arbitrary custom stop words; these can be used in LangChain to stop the LLM when it needs to, e.g.

Task: Find a list of cheeses
Action: search google for cheese
Observation:

in such a chain you might put Observation: as a stop word in order to stop there and inject your own observation based on the search results.
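
With the API in this PR that would look something like the following (hypothetical usage, reusing pipeline and PIPELINE_ARGS from the example above):

args = PIPELINE_ARGS(temperature=1e-1, top_p=1e-1,
                     stop_words=['Observation:'])

prompt = "Task: Find a list of cheeses\nAction:"

for piece in pipeline.igenerate(prompt, args=args):
    print(piece, end='', flush=True)
# generation halts as soon as the model emits 'Observation:', so the chain
# can run the search and append the real observation before continuing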

I replace all \n\n in ChatGPT generations by \n

Yes! Are you thinking of including an <end of turn> token or something? Is it even possible to fine-tune in new tokens?

@cahya-wirawan commented

I use \n\n for now, because I replace all \n\n in ChatGPT generations by \n, so whenever you see \n\n it must be endoftext

Sometimes the text generation stops before the text is finished because it produces \n\n. For example, if I ask the bot to write code, it starts with the sentence “here is the python code to write X algorithm” and then “\n\n”. It would actually go on to write the code, but ChatRWKV stops the generation because it sees “\n\n”.

@Nintorac (Author) commented

Oh, actually I did notice something, based on this section of the HF article: do you use str.replace('\n\n', '\n') or re.sub(r'\n+', '\n', text)? I don't think the former actually removes all double newlines. That might explain the behaviour @cahya-wirawan is seeing, since you wouldn't expect the model to emit \n\n mid-text if it had never been trained on it.
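
For example (a quick standalone check, not code from the repo):

import re

text = "here is the python code\n\n\ndef f():\n    pass"

print(repr(text.replace('\n\n', '\n')))
# 'here is the python code\n\ndef f():\n    pass'  -> one '\n\n' survives the run of three
print(repr(re.sub(r'\n+', '\n', text)))
# 'here is the python code\ndef f():\n    pass'    -> every newline run collapses to one '\n'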

Big GZ on the HF release by the way!
