Add streaming support for runs #5

Open
transitive-bullshit opened this issue Nov 15, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

transitive-bullshit (Owner) commented Nov 15, 2023

This isn't supported in the official OpenAI API yet, but it was mentioned at the OpenAI dev day that it will be coming soon, possibly via websocket and/or webhook support.

See this related issue in the OpenAI developer community.

The toughest part is that the runner is completely decoupled from the HTTP server, as it should be, processing thread runs in an async task queue. The runner is responsible for making the chat completion calls, which are streamable, so we'd have to either:

  • do some plumbing to connect the runner's execution to the result of the createRun or createThreadAndRun operations, and then pipe the chat completion chunks into that stream (see the sketch after this list)
  • or move the run implementation out of the async task queue so that it lives directly within createRun / createThreadAndRun
    • this approach would be quite a bit simpler, but I have a feeling it's the wrong approach long-term, as runs conceptually lend themselves to being decoupled from the HTTP call. Decoupling also makes the most sense from a sandboxing perspective and keeps the HTTP server lightweight, without long-running HTTP responses
  • or move to a websocket and/or webhook approach, which is fine in and of itself, but has the huge downside of being completely different from the SSE streaming the chat completion API has embraced; building apps that would potentially have to support both streaming approaches would make me a really sad panda
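
To make the first option concrete, here's a minimal sketch (not this repo's actual code; publishRunChunk, streamRun, and the event shape are all hypothetical) of bridging the task-queue runner back to the HTTP layer with an in-process channel keyed by run id. A deployment where the runner lives in a separate worker process would need a shared channel such as Redis pub/sub instead:

import { EventEmitter } from 'node:events'
import type { ServerResponse } from 'node:http'

// In-process channel keyed by run id. The runner publishes chunks to it;
// the HTTP handler for createRun subscribes and forwards them as SSE.
const runEvents = new EventEmitter()

// Called from the runner's chat completion loop for every streamed chunk,
// and once with '[DONE]' when the run finishes.
export function publishRunChunk(runId: string, chunk: string) {
  runEvents.emit(runId, chunk)
}

// Attached to the createRun / createThreadAndRun response when the client
// asks for streaming: forward chunks to the client until the run completes.
export function streamRun(runId: string, res: ServerResponse) {
  res.writeHead(200, {
    'content-type': 'text/event-stream',
    'cache-control': 'no-cache',
    connection: 'keep-alive'
  })

  const onChunk = (chunk: string) => {
    if (chunk === '[DONE]') {
      res.write('data: [DONE]\n\n')
      runEvents.off(runId, onChunk)
      res.end()
    } else {
      res.write(`data: ${JSON.stringify({ delta: chunk })}\n\n`)
    }
  }

  runEvents.on(runId, onChunk)
}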
transitive-bullshit added the enhancement (New feature or request) label Nov 15, 2023
dacom-dark-sun commented Jan 13, 2024

Hello! We worked around the streaming problem. Our solution is probably not the best, but it will do as a temporary measure.

What we did: we pass a callback to the runner's chat model and receive chunks, which we use to update the message in the Prisma database as they arrive.

// Accumulates the streamed text for the assistant message created up front.
let messageText = ''

const handleUpdate = async (chunk: string) => {
  messageText += chunk

  // Overwrite the message's content with the text received so far.
  await prisma.message.update({
    where: { id: newMessageId },
    data: {
      content: [
        {
          type: 'text',
          text: {
            value: messageText,
            annotations: []
          }
        }
      ]
    }
  })
}

const chatCompletionParams: Parameters<typeof chatModel.run>[0] = {
  messages: chatMessages,
  model: assistant.model,
  handleUpdate,
  tools: convertAssistantToolsToChatMessageTools(assistant.tools),
  tool_choice: runSteps.length >= config.runs.maxRunSteps ? 'none' : 'auto'
}

Also, we slightly changed the order in which the message record is added to the database: the assistant's message is now created immediately (initially empty) and then updated as chunks arrive.
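
For reference, a rough sketch of that reordering (the field names are guesses based on the update call above, not our exact schema): create an empty assistant message before kicking off the completion, so the UI can render it right away and handleUpdate only has to fill it in.

// Create the assistant's message up front with empty text; the schema and
// required fields here are assumptions for illustration.
const newMessage = await prisma.message.create({
  data: {
    thread_id: threadId,
    role: 'assistant',
    content: [{ type: 'text', text: { value: '', annotations: [] } }]
  }
})

// handleUpdate (above) then updates this record as chunks arrive.
const newMessageId = newMessage.id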

On the frontend service we poll for the updated message once a second and reveal the new text character by character, so the user sees smooth, streaming-like output (as far as the current load allows).
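
On the client that could look roughly like this (the route, response shape, and timings are assumptions, not our actual frontend code): fetch the message once a second and reveal the newly arrived characters gradually so the text appears to stream.

let shownText = ''
let targetText = ''

// Poll the backend for the latest message content (hypothetical route).
async function pollMessage(messageId: string) {
  const res = await fetch(`/api/messages/${messageId}`)
  const message = await res.json()
  targetText = message.content?.[0]?.text?.value ?? ''
}

// Reveal one more character of the fetched text per tick for a smooth effect.
function revealNextChar(el: HTMLElement) {
  if (shownText.length < targetText.length) {
    shownText = targetText.slice(0, shownText.length + 1)
    el.textContent = shownText
  }
}

export function startPolling(messageId: string, el: HTMLElement) {
  setInterval(() => pollMessage(messageId), 1000) // fetch once a second
  setInterval(() => revealNextChar(el), 25)       // animate more frequently
}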

If our solution looks acceptable, we can prepare a pull request; otherwise it can wait for a more optimal approach to emerge.

phact commented Jan 16, 2024

It would be interesting to see the community come up with what this should look like from a purely end-user UX perspective.

To me, the end user should be able to pass stream: true to run creation and get an SSE stream, maybe directly, or maybe from the message once a given status is reached. Thoughts?
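
Purely as an illustration of that UX (none of this exists yet; the route and payload shape are made up), a client might do something like:

async function createRunAndStream(threadId: string, assistantId: string) {
  // Hypothetical: POST the run with stream: true and read back an SSE stream.
  const res = await fetch(`/v1/threads/${threadId}/runs`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ assistant_id: assistantId, stream: true })
  })

  const reader = res.body!.getReader()
  const decoder = new TextDecoder()

  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    // Each SSE "data:" event would carry a message delta or a run status change.
    console.log(decoder.decode(value))
  }
}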
