Is your feature request related to a problem? Please describe.
Currently, when developing applications that rely on large language models (LLMs) with TaskingAI, we face challenges in efficiently managing token consumption and optimizing costs through caching. The inability to integrate proxy solutions, such as Helicone, LangSmith, LangFuse, and Lunary, limits our ability to monitor token usage by project, assistant, and user, as well as to reuse responses to reduce costs with paid models.
Describe the solution you'd like
I would like TaskingAI to implement functionalities that would allow easy configuration of proxies for LLM models. This would include the ability to replace the base URL of the LLM model API and add specific authentication and configuration headers in requests to the models. This functionality would enable the use of market solutions like Helicone, LangSmith, LangFuse, and Lunary to:
- Intercept all requests to LLM models, enabling detailed tracking of token consumption by project, assistant, and user.
- Implement caching for LLM model responses, significantly reducing operational costs with paid models.
- Configure custom rate limits and retry attempts, and optimize latency through advanced request management.
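For reference, the OpenAI Python SDK already exposes both of the settings this request asks for (a replaceable base URL and extra default headers), so a minimal sketch of what a per-model proxy configuration could look like is below. The key and header names are purely illustrative, not an existing TaskingAI schema:

```python
import os

# Hypothetical shape of per-model proxy settings; the field names below are
# illustrative placeholders, not an existing TaskingAI schema.
proxy_config = {
    # Replaces the provider's default API endpoint.
    "base_url": "https://my-llm-proxy.example.com/v1",
    # Extra headers the proxy needs (authentication, project tagging, etc.).
    "default_headers": {
        "X-Proxy-Auth": os.environ.get("PROXY_API_KEY", "<proxy_api_key>"),
        "X-Project-Id": "my-project",
    },
}

# The OpenAI Python SDK accepts both settings directly, so a model configured
# this way could be constructed as:
#   client = openai.Client(api_key="<provider_key>", **proxy_config)
print(proxy_config["base_url"])
```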
Describe alternatives you've considered
Because TaskingAI does not currently allow customization of LLM model configurations, there is no practical workaround. Without the ability to modify the base URL and add the headers needed for authentication and configuration, integrating third-party proxy solutions to manage token consumption and caching is impossible. This lack of flexibility significantly hampers our ability to optimize and manage costs, leaving us without viable alternatives.
Additional context
Below are examples of how the mentioned proxy solutions can currently be configured, demonstrating the simplicity and effectiveness of these integrations:
Example of LangSmith usage:

```shell
docker run -p 8080:8080 -e LANGCHAIN_API_KEY=<your_langsmith_api_key> docker.io/langchain/langsmith-proxy:latest
```

```python
import time

import openai

OPENAI_API_URL = "http://localhost:8080/proxy/openai"
client = openai.Client(api_key="", base_url=OPENAI_API_URL)

start = time.time()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a poem about artificial intelligence."},
    ],
)
print(response)
print(f"Time taken: {time.time() - start}")
```
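Helicone follows the same drop-in pattern. The sketch below uses the base URL and header names from Helicone's public documentation at the time of writing; verify them against the current docs before relying on them:

```python
import os

# Helicone acts as a drop-in proxy in front of the OpenAI API.
HELICONE_BASE_URL = "https://oai.helicone.ai/v1"

helicone_headers = {
    # Authenticates the request with Helicone itself (separate from the
    # provider API key).
    "Helicone-Auth": f"Bearer {os.environ.get('HELICONE_API_KEY', '')}",
    # Opt-in response caching, which is what would cut paid-model costs.
    "Helicone-Cache-Enabled": "true",
}

# With the OpenAI SDK this is just a different client construction:
#   client = openai.Client(api_key=os.environ["OPENAI_API_KEY"],
#                          base_url=HELICONE_BASE_URL,
#                          default_headers=helicone_headers)
print(sorted(helicone_headers))
```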
Integrating with these proxy solutions would not only optimize resource use and costs in our projects but also significantly improve the management and scalability of applications developed with TaskingAI.
Thank you very much for your suggestions! After discussion, we have included your requirements in our development plan and expect to launch it in a few months. Your suggestions are incredibly valuable; thank you once again. :)
Hello, is there any progress on this issue at the moment?