Assess the theoretical and practical implementation of an OAI Teachable Agent #7

Open
2 of 3 tasks
MrXandbadas opened this issue Nov 26, 2023 · 0 comments
Labels
enhancement New feature or request

Owner

MrXandbadas commented Nov 26, 2023

OAI Assistants can be given a knowledge retrieval tool.

In theory, this lets an assistant keep persistent, chat-related memory across different chats/threads.
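
One way to picture the idea, as a rough sketch rather than a settled design: whatever the user teaches the agent gets appended to a local notes file, and that file is what would later be attached to the assistant for retrieval so other threads can see it. `MemoStore`, `remember`, `facts`, and the file name are hypothetical names made up for illustration. The upload step is left out on purpose; it would reuse the `client.files.create` call quoted further down.

```python
from datetime import datetime, timezone
from pathlib import Path


class MemoStore:
    """Hypothetical append-only notes file meant to be uploaded for retrieval."""

    def __init__(self, path):
        self.path = Path(path)

    def remember(self, thread_id, fact):
        # One bullet per taught fact, tagged with the thread it came from.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        line = f"- [{stamp}] (thread {thread_id}) {fact.strip()}\n"
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(line)

    def facts(self):
        # Return the stored bullet lines, oldest first.
        if not self.path.exists():
            return []
        return self.path.read_text(encoding="utf-8").splitlines()
```

Whether re-uploading a growing notes file is effective enough is exactly the open question in the To Do list; this only shows the shape of the data.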

To Do:

  • Investigate how Knowledge Retrieval works at the assistant level
  • Investigate the implementation of Teachable Agents
  • Consider the effectiveness of the implementation

Information:

How it works
The model decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:

  • it either passes the file content in the prompt for short documents, or
  • performs a vector search for longer documents

Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.
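
To make the two techniques concrete, here is a toy sketch of the choice. It is not how the Assistants API is implemented: the word-count cutoff, the chunk size, and the bag-of-words similarity standing in for real embeddings are all assumptions, and every name (`select_context`, `SHORT_DOC_WORDS`) is hypothetical.

```python
import math
import re
from collections import Counter

SHORT_DOC_WORDS = 50  # hypothetical cutoff; the real threshold is not documented here


def _vector(text):
    # Bag-of-words counts stand in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_context(document, query, chunk_words=20, top_k=2):
    words = document.split()
    if len(words) <= SHORT_DOC_WORDS:
        # Short document: pass the whole content in the prompt.
        return "full", [document]
    # Longer document: split into chunks and keep the most similar ones.
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    q = _vector(query)
    ranked = sorted(chunks, key=lambda c: _cosine(_vector(c), q), reverse=True)
    return "vector_search", ranked[:top_k]
```

Calling `select_context(manual_text, "how do I turn it off")` returns the mode that was picked plus the text that would go into the prompt.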

Uploading files for retrieval
Similar to Code Interpreter, files can be passed at the Assistant level or at the Thread level.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a file with an "assistants" purpose
with open("knowledge.pdf", "rb") as fh:
  file = client.files.create(file=fh, purpose="assistants")

# Add the file to the assistant
assistant = client.beta.assistants.create(
  instructions="You are a customer support chatbot. Use your knowledge base to best respond to customer queries.",
  model="gpt-4-1106-preview",
  tools=[{"type": "retrieval"}],
  file_ids=[file.id]
)

Files can also be added to a Message in a Thread. These files are only accessible within this specific thread. After having uploaded a file, you can pass the ID of this File when creating the Message:

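# Create the thread the message belongs to (the original snippet assumes `thread` already exists)
thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="I can not find in the PDF manual how to turn off this device.",
  file_ids=[file.id]
)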

The maximum file size is 512 MB. Retrieval supports a variety of file formats, including .pdf, .md, .docx and many more. The supported file extensions and their corresponding MIME types are listed in the Supported files section below.
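
A small pre-upload check along these lines could catch oversized or unsupported files before calling the API. This is a sketch under assumptions: `upload_problems` is a hypothetical helper, the limit is read as 512 MiB, and the allowlist contains only the three formats named above and would need extending from the Supported files list.

```python
from pathlib import Path

MAX_BYTES = 512 * 1024 * 1024  # assumes "512 MB" means 512 MiB
# Only the formats named in the text above; extend from the Supported files list.
ALLOWED_SUFFIXES = {".pdf", ".md", ".docx"}


def upload_problems(path, size_bytes=None):
    """Return a list of reasons the file would likely be rejected (empty if none)."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    # size_bytes can be passed in directly; otherwise it is read from disk.
    size = p.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

An empty list means the file passes both checks; it says nothing about whether the server will accept the file's contents.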

Deleting files
To remove a file from the assistant, detach it:

file_deletion_status = client.beta.assistants.files.delete(
  assistant_id=assistant.id,
  file_id=file.id
)

Detaching the file from the assistant removes the file from the retrieval index as well.

File citations
When Code Interpreter outputs file paths in a Message, you can convert them to corresponding file downloads using the annotations field. See the Annotations section for an example of how to do this.

{
    "id": "msg_abc123",
    "object": "thread.message",
    "created_at": 1699073585,
    "thread_id": "thread_abc123",
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": {
          "value": "The rows of the CSV file have been shuffled and saved to a new CSV file. You can download the shuffled CSV file from the following link:\n\n[Download Shuffled CSV File](sandbox:/mnt/data/shuffled_file.csv)",
          "annotations": [
            {
              "type": "file_path",
              "text": "sandbox:/mnt/data/shuffled_file.csv",
              "start_index": 167,
              "end_index": 202,
              "file_path": {
                "file_id": "file-abc123"
              }
            }
          ]
        }
      }
    ],
    "file_ids": [
      "file-abc456"
    ],
    ...
}
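
Given a message shaped like the JSON above, swapping the sandbox paths for real download links could look roughly like this. It works on a plain dict and replaces by the annotation's `text` rather than by `start_index`/`end_index`, which is a simplification. `link_file_paths` and the `url_for` callback (with its placeholder URL pattern) are hypothetical and stand in for wherever the application actually serves files from.

```python
def link_file_paths(message, url_for=lambda file_id: f"/files/{file_id}/content"):
    """Replace each file_path annotation's text with a download URL.

    `url_for` is a hypothetical callback mapping a file ID to a URL.
    """
    rendered = []
    for part in message["content"]:
        if part["type"] != "text":
            continue  # skip non-text content parts
        value = part["text"]["value"]
        for ann in part["text"].get("annotations", []):
            if ann["type"] == "file_path":
                target = url_for(ann["file_path"]["file_id"])
                value = value.replace(ann["text"], target)
        rendered.append(value)
    return "\n".join(rendered)
```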

Supported files

For text/ MIME types, the encoding must be one of utf-8, utf-16, or ascii.
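
A quick local check for the encoding rule might look like the sketch below. `detect_accepted_encoding` is a hypothetical helper, and it is only a heuristic: many byte strings that are not really UTF-16 still decode as UTF-16 without error, so a positive result is a hint, not proof.

```python
ACCEPTED = ("ascii", "utf-8", "utf-16")


def detect_accepted_encoding(data):
    """Return the first accepted encoding that decodes `data`, else None."""
    for name in ACCEPTED:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```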

FILE FORMAT   MIME TYPE                                                                    CODE INTERPRETER   RETRIEVAL
.c            text/x-c
.cpp          text/x-c++
.csv          application/csv
.docx         application/vnd.openxmlformats-officedocument.wordprocessingml.document
.html         text/html
.java         text/x-java
.json         application/json
.md           text/markdown
.pdf          application/pdf
.php          text/x-php
.pptx         application/vnd.openxmlformats-officedocument.presentationml.presentation
.py           text/x-python
.py           text/x-script.python
.rb           text/x-ruby
.tex          text/x-tex
.txt          text/plain
.css          text/css
.jpeg         image/jpeg
.jpg          image/jpeg
.js           text/javascript
.gif          image/gif
.png          image/png
.tar          application/x-tar
.ts           application/typescript
.xlsx         application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xml          application/xml or text/xml
.zip          application/zip

@MrXandbadas MrXandbadas added the enhancement New feature or request label Nov 26, 2023
@MrXandbadas MrXandbadas self-assigned this Nov 26, 2023
@MrXandbadas MrXandbadas added this to the Autogen Implementation milestone Nov 26, 2023