14 May 15:26

zhudotexe

be3f743

v1.0.1 Latest

OpenAI: Added support for GPT-4o

09 May 21:28

zhudotexe

v1.0.0

2a0302e

v1.0.0

New Features

Streaming

kani now supports streaming to print tokens from the engine as they are received! Streaming is designed to be a drop-in superset of the chat_round and full_round methods, allowing you to gradually refactor your code without ever leaving it in a broken state.

To request a stream from the engine, use Kani.chat_round_stream() or Kani.full_round_stream(). These methods will return a StreamManager, which you can use in different ways to consume the stream.

The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.

# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final message or BaseCompletion with:

msg = await stream.message()
completion = await stream.completion()

The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.

Tip

For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:

msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")

(note the await that is not present in the above examples). This allows you to refactor your code by changing chat_round to chat_round_stream without other changes.

- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
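To make the behaviour above more concrete, here is a minimal, self-contained sketch of an object that can be either iterated token by token or awaited directly for the final message. `ToyStream` is a hypothetical stand-in written for illustration only; it is not kani's StreamManager and makes no claim about how kani implements it.

```python
import asyncio


class ToyStream:
    """Hypothetical stand-in for a stream manager; NOT kani's StreamManager."""

    def __init__(self, tokens):
        self._source = iter(tokens)  # pretend these arrive from an engine
        self._seen = []
        self._done = False

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        # Yield tokens one at a time, recording them so the final message can be rebuilt.
        for token in self._source:
            self._seen.append(token)
            await asyncio.sleep(0)  # simulate waiting on the engine
            yield token
        self._done = True

    async def message(self):
        # If the caller never iterated (or stopped early), consume the rest first.
        if not self._done:
            async for _ in self:
                pass
        return "".join(self._seen)

    def __await__(self):
        # `await stream` behaves like `await stream.message()`.
        return self.message().__await__()


async def main():
    stream = ToyStream(["African", " or", " European?"])
    async for token in stream:
        print(token, end="")
    print()
    print(await stream.message())          # message after iterating
    print(await ToyStream(["no", " iteration"]))  # awaiting without iterating


if __name__ == "__main__":
    asyncio.run(main())
```

The design choice this illustrates is that the same object supports both styles, so swapping a non-streaming call for a streaming one does not force the calling code to change at the same time.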

Issue: #30

New Models

kani now has bundled support for the following new models:

Hosted

Claude 3 (including function calling)

Open Source

Llama 3 (all sizes)
Command R and Command R+ (including function calling)
Mistral-7B and Mixtral-8x7B
Gemma (all sizes)

Although these models have built-in support, kani supports every chat model available on Hugging Face through transformers or llama.cpp using the new Prompt Pipelines feature (see below)!

Issue: #34

llama.cpp

To use GGUF-quantized versions of models, kani now supports the LlamaCppEngine, which uses the llama-cpp-python library to interface with the llama.cpp library. Any model with a GGUF version is compatible with this engine!

Prompt Pipelines

A prompt pipeline creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.

To build a pipeline, create an instance of PromptPipeline() and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.

Pipelines come with a built-in explain() method to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).

Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:

from kani import PromptPipeline, ChatRole

LLAMA2_PIPELINE = (
    PromptPipeline()

    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)

    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace from the ends of
    # generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")

    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
LLAMA2_PIPELINE.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = LLAMA2_PIPELINE(ai.get_prompt())
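For readers who want to see what those steps amount to, the following is a rough hand-written equivalent in plain Python. It is a sketch under assumptions: `build_llama2_prompt` is a hypothetical helper, messages are simplified to `(role, text)` tuples rather than ChatMessage objects, and the real pipeline may differ in details such as how the formatted messages are joined.

```python
def build_llama2_prompt(messages):
    """Hypothetical helper; messages is a list of (role, text) pairs
    with role in {"system", "user", "assistant"}."""
    # Wrap system text in <<SYS>> tags, then treat it as a user message.
    staged = []
    for role, text in messages:
        if role == "system":
            role, text = "user", f"<<SYS>>\n{text}\n<</SYS>>\n"
        staged.append((role, text))

    # Merge consecutive same-role messages (newline for user, space for assistant).
    merged = []
    for role, text in staged:
        if merged and merged[-1][0] == role:
            sep = "\n" if role == "user" else " "
            merged[-1] = (role, merged[-1][1] + sep + text)
        else:
            merged.append((role, text))

    # Add the instruction tokens; skip the EOS suffix on a trailing assistant message.
    parts = []
    for i, (role, text) in enumerate(merged):
        if role == "user":
            parts.append(f"<s>[INST] {text} [/INST]")
        else:
            suffix = "" if i == len(merged) - 1 else " </s>"
            parts.append(f" {text}{suffix}")
    return "".join(parts)


if __name__ == "__main__":
    history = [
        ("system", "Be brief."),
        ("user", "Hi"),
        ("assistant", "Hello!"),
        ("user", "Bye"),
    ]
    print(build_llama2_prompt(history))
```

Compared with this hand-rolled version, a declarative pipeline keeps each formatting rule as a separate, inspectable step.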

Integration with HuggingEngine and LlamaCppEngine

Previously, to use a model with a different prompt format than the ones bundled with the library, one had to create a subclass of the HuggingEngine to implement the prompting scheme. With the release of Prompt Pipelines, you can now supply a PromptPipeline in addition to the model ID to use the HuggingEngine directly!

For example, the LlamaEngine (huggingface) is now equivalent to the following:

from kani.engines.huggingface import HuggingEngine

engine = HuggingEngine(
    "meta-llama/Llama-2-7b-chat-hf",
    prompt_pipeline=LLAMA2_PIPELINE,  # the pipeline defined in the example above
)

The engine will use the passed pipeline to automatically infer a model's token usage, making it easier than ever to implement new models.

Issue: #32

Improvements

- The OpenAIEngine now uses the official openai-python package. (#31)
  - This means that aiohttp is no longer a direct dependency, and the HTTPClient has been deprecated. For API-based models, we recommend using the httpx library.
- Added arguments to the chat_in_terminal helper to control maximum width, echo user inputs, show function call arguments and results, and other interactive utilities. (#33)
- The HuggingEngine can now automatically determine a model's context length.
- Added a warning message if an @ai_function is missing a docstring. (#37)
- Added WrapperEngine to make writing wrapper extensions easier.

Breaking Changes

All kani models (e.g. ChatMessage) are no longer immutable. This means that you can edit the chat history directly, and token counting will still work correctly.
As the ctransformers library does not appear to be maintained, we have removed the CTransformersEngine and replaced it with the LlamaCppEngine.
The arguments to chat_in_terminal (except the first) are now keyword-only.
The arguments to HuggingEngine (except model_id, max_context_size, and prompt_pipeline) are now keyword-only.
Generation arguments for OpenAI models now take dictionaries rather than kani.engines.openai.models.* models. (If you aren't sure if you're affected by this, you probably aren't.)

Bug Fixes

Fixed an issue with Claude 3 and parallel function calling.

It should be a painless upgrade from kani v0.x to kani v1.0! We tried our best to ensure that we didn't break any existing code. If you encounter any issues, please reach out on our Discord.

22 Apr 15:59

zhudotexe

v1.0.0rc1

36440d0

v1.0.0rc1 Pre-release

Added support for Llama 3
Added WrapperEngine to make writing wrapper extensions easier
Refactored internal Command R prompt building for easier runtime extension
Updated documentation

12 Apr 17:22

zhudotexe

v1.0.0rc0

b1533b4

v1 Release Candidate 0 Pre-release

New Features

Streaming

The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.

# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final message or BaseCompletion with:

msg = await stream.message()
completion = await stream.completion()

Tip

For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:

msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")

(note the await that is not present in the above examples). This allows you to refactor your code by changing chat_round to chat_round_stream without other changes.

- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")

Issue: #30

New Models

kani now has bundled support for the following new models:

Hosted

Claude 3 (including function calling)

Open Source

Command R and Command R+ (including function calling)
Mistral-7B and Mixtral-8x7B
Gemma (all sizes)

Although these models have built-in support, kani supports every chat model available on Hugging Face through transformers or llama.cpp using the new Prompt Pipelines feature (see below)!

Issue: #34

llama.cpp

Prompt Pipelines

A prompt pipeline creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.

Pipelines come with a built-in explain() method to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).

Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:

from kani import PromptPipeline, ChatRole

LLAMA2_PIPELINE = (
    PromptPipeline()

    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)

    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace from the ends of
    # generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")

    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
LLAMA2_PIPELINE.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = LLAMA2_PIPELINE(ai.get_prompt())

Integration with HuggingEngine and LlamaCppEngine

For example, the LlamaEngine (huggingface) is now equivalent to the following:

from kani.engines.huggingface import HuggingEngine

engine = HuggingEngine(
    "meta-llama/Llama-2-7b-chat-hf",
    prompt_pipeline=LLAMA2_PIPELINE,  # the pipeline defined in the example above
)

Issue: #32

Improvements

- The OpenAIEngine now uses the official openai-python package. (#31)
  - This means that aiohttp is no longer a direct dependency, and the HTTPClient has been deprecated. For API-based models, we recommend using the httpx library.
- Added arguments to the chat_in_terminal helper to control maximum width, echo user inputs, show function call arguments and results, and other interactive utilities. (#33)
- The HuggingEngine can now automatically determine a model's context length.
- Added a warning message if an @ai_function is missing a docstring. (#37)

Breaking Changes

All kani models (e.g. ChatMessage) are no longer immutable. This means that you can edit the chat history directly, and token counting will still work correctly.
As the ctransformers library does not appear to be maintained, we have removed the CTransformersEngine and replaced it with the LlamaCppEngine.
The arguments to chat_in_terminal (except the first) are now keyword-only.
The arguments to HuggingEngine (except model_id, max_context_size, and prompt_pipeline) are now keyword-only.
Generation arguments for OpenAI models now take dictionaries rather than kani.engines.openai.models.* models. (If you aren't sure if you're affected by this, you probably aren't.)

It should be a painless upgrade from kani v0.x to kani v1.0! We tried our best to ensure that we didn't break any existing code. If you encounter any issues, please reach out on our Discord.

25 Mar 18:41

zhudotexe

v0.8.0

e48d422

v0.8.0 Pre-release

Most likely the last release before v1.0! This update mostly contains improvements to chat_in_terminal to improve usability in interactive environments like Jupyter Notebook.

Possible Breaking Change

All arguments to chat_in_terminal except the Kani instance must now be keyword arguments; positional arguments are no longer accepted.

For example, chat_in_terminal(ai, 1, "!stop") must now be written chat_in_terminal(ai, rounds=1, stopword="!stop").
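As an illustration of what a keyword-only signature means in practice, the sketch below defines a hypothetical stand-in, `chat_in_terminal_sketch`, using Python's bare `*` marker. It only mirrors the calling convention described above and is not kani's implementation; the defaults shown are assumptions.

```python
def chat_in_terminal_sketch(ai, *, rounds=0, stopword=None):
    """Hypothetical stand-in showing a keyword-only signature; not kani's code."""
    # Everything after the bare `*` can only be passed by keyword.
    return {"ai": ai, "rounds": rounds, "stopword": stopword}


def accepts_positional(*args):
    """Report whether a purely positional call is accepted."""
    try:
        chat_in_terminal_sketch(*args)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(accepts_positional("ai"))              # only the first argument is positional
    print(accepts_positional("ai", 1, "!stop"))  # old positional style is rejected
    print(chat_in_terminal_sketch("ai", rounds=1, stopword="!stop"))
```

If old positional calls remain in your code, Python raises a TypeError at the call site, which makes them straightforward to find.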

Improvements

You may now specify None as the user query in chat_round and full_round. This will request a new ASSISTANT message without adding a USER message to the chat history (e.g. to continue an unfinished generation).

Added the following keyword args to chat_in_terminal to improve usability in interactive environments like Jupyter Notebook:

echo: Whether to echo the user's input to stdout after they send a message (e.g. to save in interactive notebook outputs; default false)
ai_first: Whether the user should send the first message (default) or the model should generate a completion before prompting the user for a message.
width: The maximum width of the printed outputs (default unlimited).
show_function_args: Whether to print the arguments the model is calling functions with for each call (default false).
show_function_returns: Whether to print the results of each function call (default false).
verbose: Equivalent to setting echo, show_function_args, and show_function_returns to True.
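The relationship between verbose and the three flags it implies can be sketched as follows. `resolve_display_flags` is a hypothetical helper written for this example, assuming verbose simply switches the other flags on; it is not part of kani.

```python
def resolve_display_flags(*, echo=False, show_function_args=False,
                          show_function_returns=False, verbose=False):
    """Hypothetical helper: verbose=True turns on all three display flags."""
    return {
        "echo": echo or verbose,
        "show_function_args": show_function_args or verbose,
        "show_function_returns": show_function_returns or verbose,
    }


if __name__ == "__main__":
    print(resolve_display_flags(verbose=True))
    print(resolve_display_flags(echo=True))
```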

05 Feb 21:36

zhudotexe

v0.7.2

5371b84

v0.7.2 Pre-release

OpenAI: Added support for Jan 25 models without specifying max_context_length explicitly
OpenAI: Fixed an issue where the token count for parallel function calls would only consider the first function call

04 Dec 09:28

zhudotexe

v0.7.1

7db107d

v0.7.1 Pre-release

OpenAI: Fixed an issue where a tool call could have an unbound tool call ID when using always_included_messages near the maximum context length

23 Nov 17:52

zhudotexe

v0.7.0

dcb37f0

v0.7.0 Pre-release

New Features

- Added support for the Claude API through the AnthropicEngine
  - Currently, this is only for chat messages - we don't yet have access to the new function calling API. We plan to add Claude function calling to Kani as soon as we get access!
- Renamed ToolCallError to a more general PromptError
  - Technically a minor breaking change, though a search of GitHub shows that no one has used ToolCallError yet

Fixes

Fixed an issue where parallel tool calls could not be validated (thanks @arturoleon!)

Contributors

arturoleon

11 Nov 19:16

zhudotexe

v0.6.2

2a57e63

v0.6.2 Pre-release

Fixed an issue where emoji in the chat history could cause problems when saving or loading the kani state
OpenAI: Fixed an issue where the content field could be omitted in certain requests, causing an API error

08 Nov 20:54

zhudotexe

v0.6.1

f025eb4

v0.6.1 Pre-release

Internal changes to the OpenAIEngine to make extending it easier
No consumer-facing changes


Releases: zhudotexe/kani
