
Use chat models properly (prompt tags already fixed) #480

Open

pieroit opened this issue Oct 8, 2023 · 11 comments
Labels: enhancement (New feature or request), LLM (Related to language model / embedder)

Comments

pieroit (Member) commented Oct 8, 2023

At the moment we insert both the system prompt (aka prompt_prefix) and the conversation history into the prompt, without respecting model-specific prompt tags, treating every model as a completion model.

Let's try to design and implement a solid way to leverage both prompt tags and chat models, as suggested by @AlessandroSpallina.
As a hypothesis, tags could be described in factory classes and used whenever cat._llm or the agent is invoked.

Notes:

  • I don't know how LangChain deals with this (we should research it, because maybe it already solves this)
  • Hugging Face chat templates can also be an inspiration, as their solution is quite elegant (see the sketch after this list)
  • we can tackle the tags issue and the completion-vs-chat issue in two different PRs, as it may get complicated
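
For reference, a minimal sketch of the Hugging Face chat-template API mentioned above (the model name is only an example; any chat model that ships a chat template works):

from transformers import AutoTokenizer

# Example model only: any chat model that ships a chat template applies here.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there!"},
]

# Renders the conversation using the model's own prompt tags.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)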
pieroit added the enhancement and LLM labels on Oct 8, 2023
pieroit (Member, Author) commented Oct 8, 2023

@AlessandroSpallina please comment so I can assign you. Thanks :)

AlessandroSpallina (Contributor) commented

I'm here!

pieroit (Member, Author) commented Oct 15, 2023

It looks to me like LangChain is already doing this; we can probably rely on it by passing the chat history and system prompt as HumanMessage, AIMessage and SystemMessage objects from within cat.llm.

The API I suggest for cat.llm is:

def llm(self, prompt, chat=False, stream=False):
    # here we retrieve `chat_history` from working memory and convert it to LangChain objects
    pass

Not sure about the SystemMessage though?
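
A minimal sketch of that conversion step (assuming a working-memory history of dicts with "who" and "message" keys; the real schema may differ):

from langchain.schema import AIMessage, HumanMessage, SystemMessage

# Hypothetical helper: turn the Cat's working-memory chat history into
# LangChain message objects. The dict keys are assumptions.
def history_to_langchain_messages(chat_history, system_prompt=None):
    messages = []
    if system_prompt:
        messages.append(SystemMessage(content=system_prompt))
    for turn in chat_history:
        if turn["who"] == "Human":
            messages.append(HumanMessage(content=turn["message"]))
        else:
            messages.append(AIMessage(content=turn["message"]))
    return messages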

valentimarco (Collaborator) commented

I want to help with this, but:

  1. There are already LangChain chat models that do some of the work, but the Ollama implementation is bad: they hard-coded the template-crafting section with the llama2 template... (We use the LLM classes, which in the case of Ollama call the LLM via the REST API, so we are safe!)
  2. LangChain implements a message type system (right now there are System, Human and AI messages, plus a ChatMessage to handle custom types) and methods to craft the prompt:
# Zephyr LLM (`llm` is assumed to be an already-configured LangChain model)
from langchain.chains import LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

template = "<|system|>\nYou are a helpful assistant that translates {input_language} to {output_language}</s>\n"
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "<|user|>\n{text}</s>\n"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
final_part_prompt = ChatPromptTemplate.from_template("<|assistant|>\n")
final_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt, final_part_prompt])

chain = LLMChain(llm=llm, prompt=final_prompt)

out = chain.run(input_language="English", output_language="French", text="My family are going to visit me next week.")
  3. Separate the Cat prompt into "layers":
     Right now there isn't a System prompt, a User prompt or even an Agent prompt that we can dynamically attach to the final prompt; in fact we only have the prefix and suffix prompts. If we define this spec, we get even more customization of the prompt, because there is a hook for each type of prompt!
  4. We need to define a spec to be able to parse these model templates: Hugging Face is trying to impose the ChatML format, but many open LLMs like llama or zephyr have different model templates. My possible solutions are: simple "parsing by replacing strings", or using Jinja template strings like Hugging Face!
class PromptTemplateTags:
    """Creates a prompt from an LLM template. The template must be exactly the same as the one provided to the LLM model."""

    template_tags: str
    system_tag: str
    user_tag: str

    def __init__(self, template_tags: str, system_tag: str, user_tag: str):
        self.template_tags = template_tags
        self.system_tag = system_tag
        self.user_tag = user_tag

    def create_prompt(self, system_message: str = "", user_message: str = "") -> str:
        # Swap the placeholder tags for the actual messages.
        return self.template_tags.replace(self.system_tag, system_message).replace(self.user_tag, user_message)


prompt_model = """<|system|>
{{ .System }}
</s>
<|user|>
{{ .Prompt }}
</s>
<|assistant|>"""

prompt = PromptTemplateTags(prompt_model, "{{ .System }}", "{{ .Prompt }}").create_prompt()

print(prompt)

I gathered all of this info in very little time, but I think we can define a good design base!
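
For comparison, a quick sketch of the Jinja alternative mentioned in point 4, in the spirit of Hugging Face chat templates (the template string is illustrative):

from jinja2 import Template

# Illustrative Jinja template using the Zephyr tags shown above.
zephyr_template = Template(
    "<|system|>\n{{ system }}</s>\n"
    "<|user|>\n{{ user }}</s>\n"
    "<|assistant|>\n"
)

prompt = zephyr_template.render(
    system="You are a helpful assistant.",
    user="Translate 'hello' to French.",
)
print(prompt)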

valentimarco (Collaborator) commented

To answer point 3,
I designed this diagram by splitting the prompt into five hookable messages: System, LongTerm, ToolUsage, ShortTerm and ChatMessage.
The idea behind LongTerm and ShortTerm is a message that can be changed on the fly by applying filters or mappers (right now this is done with the before_agent_starts hook).
The same goes for ToolUsage, but I know there is already a filter for allowed tools, so I am not sure about this one.

Why do we need to split the prompt and make it hookable if we already have the prefix and suffix hooks?
The answer is simple:

  1. We modify only the necessary part of the prompt.
  2. We create a distinct separation between the System and Conversation prompts (which we need for the template!).
  3. We define more atomic hooks for the Cat!

(The Prompt Merge block is only for schematic purposes!)

[diagram attachment: Obsidian_aEFvL5Z70R]
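
A hedged sketch of how the five layers could be composed with LangChain message prompt templates (layer names come from the diagram; the hook wiring and placeholder names are hypothetical):

from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

# Hypothetical composition of the five layers before the "Prompt Merge" step;
# each part could be exposed through its own hook.
layered_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("{system}"),                    # System
    SystemMessagePromptTemplate.from_template("Memories:\n{long_term}"),      # LongTerm
    SystemMessagePromptTemplate.from_template("Tools:\n{tool_usage}"),        # ToolUsage
    SystemMessagePromptTemplate.from_template("Recent turns:\n{short_term}"), # ShortTerm
    HumanMessagePromptTemplate.from_template("{chat_message}"),               # ChatMessage
])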

pieroit (Member, Author) commented Nov 6, 2023

@valentimarco thanks for the diagram, it looks reasonable; so does the PromptTemplateTags class.

To be totally honest, I am scared about all this fragmentation we have to deal with.
Here are a few considerations:

  • Even if LangChain does not do it properly, Ollama itself will probably soon handle model-specific tags and offer completion vs chat endpoints. Is it worth doing all of this in the Cat?
  • If an open standard or a de-facto standard stabilizes (like the ChatML you mention), we would have to switch back again. Think how many things changed in 2-3 months... I expect a standard on this soon (and models trained on the standard, and runners with adequate endpoints).
  • The priority for us is passing LangChain the chat history for chat models instead of serializing the conversation manually.

Can we focus on the last point? I mean, we can pass the chat history from working memory directly to LangChain's ChatOpenAI and ChatOllama, as you showed above. I know it's not perfect, but it is the right direction without the risk of overengineering.
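
A minimal sketch of that direction, assuming LangChain's chat-model interface (model name and messages are examples):

from langchain.chat_models import ChatOllama
from langchain.schema import HumanMessage, SystemMessage

# Hand the history straight to a chat model and let the runner
# apply its own model-specific prompt tags.
llm = ChatOllama(model="llama2")
messages = [
    SystemMessage(content="You are the Cheshire Cat AI."),
    HumanMessage(content="Hello!"),
]
print(llm(messages).content)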

Thanks a lot for dedicating the time to this.

valentimarco (Collaborator) commented

Maybe we can resolve this with a temporary plugin based on the PromptTemplateTags class, so people can use local LLMs efficiently...
Also:

  • Ollama handles model-specific tags, but only if you use it with ollama run <model:version> (from my testing).
  • We could open an issue on LangChain to report the discrepancy when using models other than llama2 (but they will probably respond with the second point you described).

I agree with you: in a few months these changes may be reverted, but I don't see any better solution for good customizability than the ones explained earlier.

pieroit changed the title from "Use proper prompt tags and completion vs chat models" to "Use chat models properly (prompt tags already fixed)" on Dec 16, 2023
valentimarco (Collaborator) commented

We saw that most of the runners:

  1. use the OpenAI REST API schema
  2. handle prompt tags for each model

Now we need to use chat models properly by:

  1. creating a list of messages that represents the chat history (maybe using LangChain's chat prompts)
  2. supporting only chat models

pieroit (Member, Author) commented Feb 10, 2024

Also, Ollama now supports the OpenAI pseudo-standard:
https://github.com/ollama/ollama/blob/main/docs/openai.md
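
Per the linked docs, a minimal sketch of what that compatibility looks like (local URL and model name follow the Ollama docs; the api_key value is just a required placeholder):

from openai import OpenAI

# Point the official OpenAI client at a local Ollama server.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)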

valentimarco (Collaborator) commented

Yep, we just need to wait a little longer and then we can use a single class for most of the runners!

pieroit (Member, Author) commented May 5, 2024

Work in progress in PR #783
