Performant Vox 2 Vox Agents

YOU KNOW THE DREAM

Talk to the computer, and it tells you, or does, something useful.

YOU PROBABLY KNOW THE PROBLEM

Currently, AI Agents & Chat Bots are slow and expensive. They make silly mistakes. They're forgetful. And they work too hard reinventing the wheel.

WHAT MOST PEOPLE PROBABLY DON'T REALIZE

Even the simplest vox in & vox out UX -- especially when coupled with agentic behaviors -- is hard. It's asynchronous, and usually frustratingly slow. It's a new way of interacting with computers, one that requires a global re-thinking of how the different UI control and display modalities interact.

DEEPILY IS WORKING ON A SOLUTION

I'm working on helping Agents remember what problems they've already solved, or if they've solved something semantically synonymous or computationally analogous before.
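
Below is a minimal sketch of that idea, assuming an embedding-plus-similarity lookup over previously solved queries. The SolutionMemory class, the bag-of-words embed() stand-in, and the 0.8 similarity threshold are all illustrative, not the actual Genie-in-the-Box implementation.

```python
# Illustrative sketch only: a tiny long-term memory of solved problems,
# keyed by an embedding of the query, so a semantically similar question
# can reuse a previously generated solution instead of calling an LLM again.
from dataclasses import dataclass, field
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag of lowercased words. A real system would
    # use a sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class SolutionMemory:
    threshold: float = 0.8                       # illustrative cutoff
    entries: list = field(default_factory=list)  # (embedding, query, solution)

    def remember(self, query: str, solution: str) -> None:
        self.entries.append((embed(query), query, solution))

    def recall(self, query: str):
        """Return a stored solution if a semantically similar query exists."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[2]
        return None  # cache miss: fall through to the slow LLM path

memory = SolutionMemory()
memory.remember("what time is it in Tokyo", "solution: print_current_time('Asia/Tokyo')")  # toy stored solution
print(memory.recall("what time is it in Tokyo right now"))  # hits the fast path
```

On a cache hit the agent can replay or adapt the stored solution instead of paying for a fresh LLM round trip, which is the fast path described below.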

THE RESULT

Fast, real-time responses, asynchronous callbacks for big jobs, and more natural, human-like interaction. You will want to talk to your computer!

THE VIEW FROM 30,000 FT

There are two ways to answer a question when using agentic vox 2 vox: the fast way, or the agonizingly slow way.

[Flow chart: the view from 30,000 ft]

The green dotted lines and boxes mark the quickest path through the flow chart (Deepily.ai Agents); the red dotted lines and boxes take anywhere from 100 to 200 times longer to execute (ChatGPT & LangChain).

CURRENT FOCUS

I'm currently working on:

  1. Agentic learning (code refactoring) based on previously solved problems stored in long-term memory
  2. Using query-to-function mapping similar to what ChatGPT is doing (sketched after this list), and
  3. Providing human-in-the-loop feedback when agents go awry
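
Here is a minimal sketch of query-to-function mapping, using naive keyword matching to stand in for the model-driven routing described above. The function registry, the trigger keywords, and the route() helper are hypothetical, not the project's actual API.

```python
# Illustrative sketch only: map a transcribed voice query to a registered
# function, in the spirit of ChatGPT-style function calling. The routing
# here is naive keyword matching; a production agent would let an LLM or
# an embedding model pick the function and fill in its arguments.
from datetime import datetime

def get_time(city: str = "local") -> str:
    return f"The time in {city} is {datetime.now():%H:%M}"

def add_todo(item: str) -> str:
    return f"Added '{item}' to your TODO list"

# Registry of callable tools, each tagged with trigger keywords.
FUNCTIONS = {
    "get_time": (get_time, {"time", "clock"}),
    "add_todo": (add_todo, {"todo", "remind", "task"}),
}

def route(query: str):
    words = set(query.lower().split())
    for name, (fn, keywords) in FUNCTIONS.items():
        if words & keywords:
            return name, fn
    return None, None  # no match: fall back to the general-purpose LLM

name, fn = route("add walk the dog to my todo list")
if fn:
    print(name, "->", fn("walk the dog"))
```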

THE PRESENT REALITY

  1. I can perform basic browsing tasks with Firefox using my voice
  2. I can edit, spellcheck and proofread documents using my voice
  3. I can also interact with PyCharm using my voice

THE (NEAR) FUTURE PLAN: EOY 2023

  1. Interact seamlessly, asynchronously, and in real time with calendaring and TODO list apps using my voice
  2. Do the same with a web research assistant to replace what I'm doing manually with ChatGPT
  3. Have my agents speak to me with any of my favorite character voices in multiple languages
  4. Host my own internal LLM server for privacy and security

THE (FAR) FUTURE DREAM: 2024

  1. Interact with my agents, servers & computers using my voice, and have them do what I want done, when & how I want it done. I'm not asking for much, am I?
  2. Safely and securely, of course
  3. World peace, non X, and all that too

DISCLAIMER

This Genie-in-the-box project is currently an extremely large set of working sketches which I am actively organizing & tidying up so that I can collaborate with others.

So, I'm not there yet, obviously. But I'm working on it and getting closer every day.

Interested?

Begin!

About

Genie in the Box: Distill Whisper STT => Mistral-7B => Phind/Phind-CodeLlama-34B-v2 => GPT 3.5 => Coqui's TTS/OpenAI TTS
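
Read as a pipeline, that description chains speech-to-text, one or more LLM stages, and text-to-speech. Here is a minimal sketch of how such a chain could be wired; every stage below is a stub standing in for the named model, not the project's actual code.

```python
# Illustrative pipeline sketch only: each stage is a stub standing in for
# the real model (Distill Whisper for STT; Mistral-7B, Phind-CodeLlama-34B-v2,
# or GPT-3.5 for reasoning and code; Coqui or OpenAI TTS for speech out).
from typing import Any, Callable, List

def stt(audio: bytes) -> str:
    return "what time is it in Tokyo"   # stand-in for the STT model

def llm(prompt: str) -> str:
    return "It is 09:00 in Tokyo."      # stand-in for the LLM stages

def tts(text: str) -> bytes:
    return text.encode()                # stand-in for the TTS model

def run_pipeline(audio: bytes, stages: List[Callable]) -> Any:
    """Pass the payload through each stage in order: voice in, voice out."""
    payload: Any = audio
    for stage in stages:
        payload = stage(payload)
    return payload

speech_out = run_pipeline(b"<microphone audio>", [stt, llm, tts])
print(speech_out)
```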
