
client/server tool invocations with useChat and streamText #1514

Merged: 16 commits merged into main from lg/ai-stream-tool-calls on May 13, 2024

Conversation

@lgrammel (Collaborator) commented May 7, 2024

Summary

Adds support for client/server tool calls with useChat and streamText. The special focus is on enabling client-side user interactions as tools; a rough sketch of the overall flow follows the feature list below.

  • Add tool-call and tool-result stream parts
  • streamText sends tool calls and tool results in the AI stream
  • assistant messages (UI) can contain tool invocations (= tool calls + tool results, where a tool result supersedes its tool call)
  • useChat adds tool invocations to assistant messages
  • useChat (React) adds experimental_addToolResult helper to send tool results from e.g. user interactions
  • convertToCoreMessages converts UI messages with tool invocations to assistant and tool messages for AI/Core
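
As a rough sketch of how these pieces fit together on the server, a minimal route handler is shown below. The model, tool names, and response helper are illustrative (loosely following the examples in this PR), not the exact example code, and the API may have shifted in later SDK versions.

```ts
// app/api/chat/route.ts: minimal sketch, not the exact example code from this PR
import { openai } from '@ai-sdk/openai';
import { convertToCoreMessages, streamText, tool } from 'ai';
import { z } from 'zod';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-4-turbo'),
    // convert UI messages (which may contain tool invocations) into core messages
    messages: convertToCoreMessages(messages),
    tools: {
      // server-side tool: has an execute function, so both its tool call and its
      // tool result are sent to the client as stream parts
      getWeatherInformation: tool({
        description: 'show the weather in a given city to the user',
        parameters: z.object({ city: z.string() }),
        execute: async ({ city }) => `It is sunny in ${city}.`,
      }),
      // client-side tool: no execute function, so only the tool call is streamed;
      // the client supplies the result (e.g. via experimental_addToolResult)
      askForConfirmation: tool({
        description: 'ask the user for confirmation',
        parameters: z.object({ message: z.string() }),
      }),
    },
  });

  return result.toAIStreamResponse();
}
```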

Examples

  • examples/next-openai/app/use-chat-client-tool: confirmation dialog and tool that requires confirmation
  • examples/next-openai/app/use-chat-tool-result: display in-progress call & result for server side tool
Screen.Recording.2024-05-13.at.15.22.47.mov

Subtasks

  • add tool call stream part
  • add tool result stream part
  • send tool call & tool result stream parts from streamText
    • re-implement callback support
  • add tool calls and tool results to assistant ui message
  • add experimental_addToolResult helper to useChat
    • React
  • add convertToCoreMessages helper
  • update examples & test end-to-end
    • add confirmation example
    • update weather example
    • remove old tool call example
  • changeset

@rajdtta commented May 7, 2024

Thanks so much for working on this.

When you say "parallel client/server tool calls", would it also support either: (1) allowing a server-side tool call to call a client-side tool, or (2) passing the result of a server-side tool to the client?

I've been trying to find a workable implementation for either of the two scenarios above, but the closest thing I've seen is serializing the result of server-side tool calls via JSON and sending it back as an assistant message, which comes with its own set of caveats, like the lack of a follow-up message from the assistant without an additional user message.

@lgrammel (Collaborator, Author) commented May 7, 2024

> When you say "parallel client/server tool calls", would it also support either: (1) allowing a server-side tool call to call a client-side tool, or (2) passing the result of a server-side tool to the client?

It will support both client- and server-side tool calls, and they can be mixed and run in parallel.

Example

You define two tools: one that is executed client-side (i.e., it has no execute method) and one that is executed server-side (i.e., it has an execute method). The client-side code then receives 3 related events:

  • tool call for the client-side tool
  • tool call for the server-side tool
  • tool result for the server-side tool

On the client, you'd execute the client-side tool, do whatever you want with the server-side tool result, and then use onFinish to send both tool results back to the server so that the AI can continue the conversation (this is essential for it to work at all, because OpenAI expects tool results for all tool calls it invokes).

On the next roundtrip, the LLM will know the tool results and can come up with text responses or more tool calls.

Please note that this is a WIP and the design might change.
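
For illustration, here is a rough client-side sketch of that flow. The toolInvocations field and the experimental_addToolResult helper follow this PR; since this is a WIP, the names and shapes may still change.

```tsx
// app/use-chat-client-tool/page.tsx: rough sketch of handling a client-side tool
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, experimental_addToolResult } =
    useChat();

  return (
    <div>
      {messages.map(message => (
        <div key={message.id}>
          <div>{message.content}</div>
          {message.toolInvocations?.map(toolInvocation =>
            'result' in toolInvocation ? (
              // tool result: from a server-side tool or an already-answered client tool
              <pre key={toolInvocation.toolCallId}>
                {JSON.stringify(toolInvocation.result)}
              </pre>
            ) : (
              // pending client-side tool call: answer it from a user interaction
              <button
                key={toolInvocation.toolCallId}
                onClick={() =>
                  experimental_addToolResult({
                    toolCallId: toolInvocation.toolCallId,
                    result: 'Yes, confirmed.',
                  })
                }
              >
                Confirm
              </button>
            ),
          )}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  );
}
```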

@Archimagus (Contributor) commented:

Nice

@lgrammel changed the title from "[WIP] mixed parallel client/server tool call/result support for ai/ui & ai/core" to "[WIP] parallel client/server tool invocations for useChat / ai-core" on May 13, 2024
@lgrammel changed the title from "[WIP] parallel client/server tool invocations for useChat / ai-core" to "client/server tool invocations with useChat and ai/core" on May 13, 2024
@lgrammel changed the title from "client/server tool invocations with useChat and ai/core" to "client/server tool invocations with useChat and streamText" on May 13, 2024
@lgrammel requested a review from shuding on May 13, 2024 14:00
@lgrammel marked this pull request as ready for review on May 13, 2024 14:00
@lgrammel merged commit f617b97 into main on May 13, 2024
6 checks passed
@lgrammel deleted the lg/ai-stream-tool-calls branch on May 13, 2024 17:22
@aychang95 commented:

@lgrammel this is awesome...
We've been extensively using experimental_onToolCall for a while now, but I think this PR with the v3.1.7 release might be enough to migrate to streamText for the tool invocation objects.

@rajdtta commented May 13, 2024

> > When you say "parallel client/server tool calls", would it also support either: (1) allowing a server-side tool call to call a client-side tool, or (2) passing the result of a server-side tool to the client?
>
> It will support both client- and server-side tool calls, and they can be mixed and run in parallel.
>
> Example
>
> You define two tools: one that is executed client-side (i.e., it has no execute method) and one that is executed server-side (i.e., it has an execute method). The client-side code then receives 3 related events:
>
>   • tool call for the client-side tool
>   • tool call for the server-side tool
>   • tool result for the server-side tool
>
> On the client, you'd execute the client-side tool, do whatever you want with the server-side tool result, and then use onFinish to send both tool results back to the server so that the AI can continue the conversation (this is essential for it to work at all, because OpenAI expects tool results for all tool calls it invokes).
>
> On the next roundtrip, the LLM will know the tool results and can come up with text responses or more tool calls.
>
> Please note that this is a WIP and the design might change.

I've been playing around with this branch since it got merged into the main package. Is it intentional that the model doesn't interpret the results of a pure server-side tool call? (i.e. no client or mixed calls, just server-side tools)

The scenario I'm exploring is one where a model queries a vector DB to find similar documents (for Q&A). Before, the results of the server tool call would be "interpreted" by the model to formulate a coherent response, but right now the model sends back an empty content string in its message (but with a populated toolInvocations array containing the call and the raw response).

@lgrammel (Collaborator, Author) commented:

@rajdtta Not sure what you mean. Here is the example of a pure server-side call: https://github.com/vercel/ai/blob/main/examples/next-openai/app/use-chat-tool-result/page.tsx

That being said, feeding pure server-side calls back to the LLM automatically to generate the 2nd assistant response is not implemented yet.

@aychang95 commented:

I'm assuming you could generate the 2nd assistant response with a callback or an automated message workflow. We had a similar system to @rajdtta's, where we got a textual response on tool calls, but I think the toolInvocations array provides better control even if it's an unintentional tradeoff.

However, I'm sure most users will intuitively expect a coherent text response on server tool calls, so it may be good to implement anyway. It makes for a very "magic-like" response for anyone who gets tools/function calls working.

@rajdtta commented May 13, 2024

> That being said, feeding pure server-side calls back to the LLM automatically to generate the 2nd assistant response is not implemented yet.

Ah, this was exactly what I was trying to describe haha.

> However, I'm sure most users will intuitively expect a coherent text response on server tool calls, so it may be good to implement anyway. It makes for a very "magic-like" response for anyone who gets tools/function calls working.

I agree that it might be good to have it built into the library (either as opt-in or opt-out?). I'm not against making my own implementation, but I'd want to make sure that I'm following best practices and not calling the API unnecessarily.


On a side note, has anyone figured out how to persistently save/store tool calls when using streamText? The onFinal callback recommended by the docs returns an empty string whenever a tool call occurs (and of course it will only trigger on the final tool call and not for any intermediaries, which would presumably be caught by onCompletion).

@connorblack commented May 14, 2024

@lgrammel @aychang95 @rajdtta Checking to make sure I understand correctly: experimental_addToolResult is meant to handle both the server-side and client-side implementations for sending results back to the LLM for the 2nd response, effectively replacing the experimental_onToolCall -> return ChatRequest flow? And the way we now handle the 2nd invocation flow is that when the Message passed to onFinish contains toolInvocation results, we call experimental_addToolResult for both client and server tools?

How do we send the tool results back to the LLM in the server-side tool case without direct access to triggerRequest? Or does experimental_addToolResult handle de-duplicating the toolInvocations array, so we can do something like this:

  // handle sending response of server-side tool back to LLM
  
  const { experimental_addToolResult } = useChat({
    ....
    onFinish: async (message) => {
      if (
        message.role === 'assistant' &&
        message.toolInvocations &&
        message.toolInvocations.every(
          (toolInvocation) => 'result' in toolInvocation,
        )
      ) {
        return message.toolInvocations.forEach((toolInvocation) => {
          if ('result' in toolInvocation) {
            experimental_addToolResult({
              result: toolInvocation.result,
              toolCallId: toolInvocation.toolCallId,
            })
          }
        })
      }

      return onFinish(message)
    },
  ...
  })

@rajdtta commented May 14, 2024

> @lgrammel @aychang95 @rajdtta Checking to make sure I understand correctly: experimental_addToolResult is meant to handle both the server-side and client-side implementations for sending results back to the LLM for the 2nd response, effectively replacing the experimental_onToolCall -> return ChatRequest flow? And the way we now handle the 2nd invocation flow is that when the Message passed to onFinish contains toolInvocation results, we call experimental_addToolResult for both client and server tools?
>
> How do we send the tool results back to the LLM in the server-side tool case without direct access to triggerRequest? Or does experimental_addToolResult handle de-duplicating the toolInvocations array, so we can do something like this:
>
>   // handle sending response of server-side tool back to LLM
>
>   const { experimental_addToolResult } = useChat({
>     ....
>     onFinish: async (message) => {
>       if (
>         message.role === 'assistant' &&
>         message.toolInvocations &&
>         message.toolInvocations.every(
>           (toolInvocation) => 'result' in toolInvocation,
>         )
>       ) {
>         return message.toolInvocations.forEach((toolInvocation) => {
>           if ('result' in toolInvocation) {
>             experimental_addToolResult({
>               result: toolInvocation.result,
>               toolCallId: toolInvocation.toolCallId,
>             })
>           }
>         })
>       }
>
>       return onFinish(message)
>     },
>   ...
>   })

From my understanding, experimental_addToolResult is used to pipe results from client tool invocations back to the LLM (so that it's able to interpret the results and use them to decide its next step).

As for having the LLM interpret server-side tool calls, that's something I'm still trying to figure out in my sandbox. Quickly subbing in the snippet you sent above just causes the model to repeatedly call the server tool in a loop, and trying to use append to add a user message makes the model trigger a duplicate tool call (but interestingly, the result of the duplicate/second tool call doesn't get appended to the toolInvocations array despite it executing on the server side).

@connorblack commented:

@lgrammel have you looked at the runTools helper from the openai-node package? I feel like it could be adapted for the server tool functionality:

@webholics commented:

It seems that all tool invocations are now exposed to the client. But what if I don't want this? In my case a tool call is something which happens internally and should not be exposed to the client because it contains private data.

@rajdtta commented May 16, 2024

> It seems that all tool invocations are now exposed to the client. But what if I don't want this? In my case a tool call is something which happens internally and should not be exposed to the client because it contains private data.

@webholics maybe some sort of flag could be implemented in the tool definition that controls whether a tool response/invocation gets sent to the client (or which aspects are sent to the client, e.g. only the tool name, only the tool response, both, etc.).

@webholics commented May 17, 2024

Or some kind of callback on the AI stream to be able to filter out or change messages programmatically.

@lgrammel (Collaborator, Author) commented:

@webholics @rajdtta Filtering tool calls needs to be handled very carefully, because it impacts the messages sent to the provider in the next roundtrip. If tool calls are left out, the LLM result will be affected, potentially leading to weird answers or, in the case of an inconsistent history, errors.

@webholics commented:

> @webholics @rajdtta Filtering tool calls needs to be handled very carefully, because it impacts the messages sent to the provider in the next roundtrip. If tool calls are left out, the LLM result will be affected, potentially leading to weird answers or, in the case of an inconsistent history, errors.

I see. That's why this whole topic is so complex. For pure server-side tools we would need some kind of server-side persistence for those calls and responses, which is of course a bit out of scope for the AI SDK.

It's just very likely now to run into security issues, because the backend exposes a lot of information to the client by default that the developer may not expect.

@lgrammel (Collaborator, Author) commented:

@webholics If you want to filter manually, you could use .fullStream from the streamText result and formatStreamPart to append to any stream you want. It's more complicated, but it gives you full control.
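
For illustration, a rough sketch of such manual filtering is shown below. It assumes the fullStream part type 'text-delta' and the 'text' name for formatStreamPart; check both against the SDK version you are on.

```ts
// app/api/chat/route.ts: sketch that streams only model text to the client,
// keeping tool calls and tool results server-only
import { openai } from '@ai-sdk/openai';
import { convertToCoreMessages, formatStreamPart, streamText, StreamingTextResponse } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-4-turbo'),
    messages: convertToCoreMessages(messages),
    // tools: { ... } (define the private, server-only tools here)
  });

  // Instead of result.toAIStreamResponse(), build the client-facing stream manually
  // from fullStream so that tool calls and tool results never leave the server.
  const filtered = new ReadableStream<string>({
    async start(controller) {
      for await (const part of result.fullStream) {
        if (part.type === 'text-delta') {
          controller.enqueue(formatStreamPart('text', part.textDelta));
        }
        // 'tool-call' and 'tool-result' parts are intentionally dropped here;
        // forward selected ones with formatStreamPart if they should reach the client.
      }
      controller.close();
    },
  });

  return new StreamingTextResponse(filtered.pipeThrough(new TextEncoderStream()));
}
```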

@rajdtta commented May 21, 2024

@lgrammel do you happen to know what the best practices are for the following:

  1. Storing server-side tool calls & results persistently as part of the conversation/message history.

Right now, I'm upserting the latest 2 messages sent to the API route whenever it's called (which, most of the time, includes the tool invocations/results). I'm not sure if this will work for a scenario where a tool error occurs & the model decides to re-call a tool after spitting out some filler message (i.e. "Sorry I was having a hard time doing this, let me try again...")

const LATEST_2_MESSAGES = messages.slice(-2);

for (const message of LATEST_2_MESSAGES) {
  await ChatService.postMessage({
    conversationID,
    message: {
      id: message.id,
      role: message.role,
      message: message.content,
      toolInvocations: message.toolInvocations ?? undefined,
    },
  });
}
  2. Piping the result of a server-side tool call back to the LLM so that it can be interpreted by the model.

My current implementation of this involves looking at the tool invocations for the latest message (if it exists), and checking if a specific flag is present in the tool results that determines whether a tool call should be interpreted by the model (and another flag to check whether it has already been processed or not).

// take the most recent message object (slice(-1) would return an array, not a message)
const LATEST_MESSAGE = messages[messages.length - 1];

if (LATEST_MESSAGE && LATEST_MESSAGE.toolInvocations) {
  LATEST_MESSAGE.toolInvocations.forEach((invocation) => {
    if ("result" in invocation && invocation.result.interpretable) {
      if (!invocation.result.interpreted) {
        experimental_addToolResult({
          toolCallId: invocation.toolCallId,
          result: {
            ...invocation.result,
            interpreted: true,
          },
        });
      }
    }
  });
}
