feat: Support multi-modal input and multi-modal output in one agent #529
base: master
Your code hasn't passed the CI/CD tests. Please take a look at https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md for guidance. Once you've resolved the formatting issues and the mypy errors, and confirmed that pytest passes, please resubmit your code. Thank you! @Zhoues
Additionally, this PR seems to cover multiple features. It might help to propose a roadmap and then break the work into several PRs for submission and review. That would allow a more thorough review and ensure each feature gets the attention it deserves. Thank you for your consideration!
def image_path_to_base64(image_path):
    with open(image_path, "rb") as image_file:
Suggested change:
-     with open(image_path, "rb") as image_file:
+     with open(image_path, "rb") as image_file:
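The diff only shows the first two lines of `image_path_to_base64`. A minimal sketch of how such a helper is commonly completed with the standard library follows; the body and the type hints are assumptions, not the PR's actual code.

```python
import base64


def image_path_to_base64(image_path: str) -> str:
    # Assumed body: read the raw bytes and return them as a base64 text string.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```

Decoding to `str` keeps the result directly usable in JSON payloads, which is typically where a base64-encoded image ends up.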
def get_dalle_img(model: str, prompt: str, size: str, quality: str, n: int) -> str:
    """Generate an image using OpenAI's DALL-E model.
    Args:
        model (str): The specific DALL-E model to use for image generation, including "dall-e-3" and "dall-e-2". Defaults to "dall-e-3".
Maybe add the DALL-E models as fixed model types in https://github.com/camel-ai/camel/blob/master/camel/types/enums.py
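One way to read this suggestion is sketched below: the two model names from the docstring become members of an enum, so an unsupported name fails early. `DalleModelType` and `parse_dalle_model` are hypothetical names for illustration; they are not taken from `camel/types/enums.py`.

```python
from enum import Enum


class DalleModelType(Enum):
    # Hypothetical enum; the values are the model names listed in the docstring.
    DALL_E_2 = "dall-e-2"
    DALL_E_3 = "dall-e-3"


def parse_dalle_model(name: str) -> DalleModelType:
    # Lookup by value raises ValueError for names that are not enum members.
    return DalleModelType(name)
```

With this shape, `get_dalle_img` could accept the enum instead of a free-form string, which moves the validation to the call site.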
    # use local path
    cache = Cache(".cache/")
    key = (model, prompt, size, quality, n)
Is this key unique, and why do we need to cache at all? IMO each generation can have some randomness.
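To make the concern concrete, here is a small sketch of what caching on that tuple implies. A plain dict stands in for the PR's `Cache(".cache/")`, and `fake_generate` is a hypothetical stand-in for a non-deterministic image API; neither is the PR's actual code.

```python
import itertools
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str, str, int]


def make_cached_generator(generate: Callable[..., str]) -> Callable[..., str]:
    cache: Dict[Key, str] = {}

    def cached(model: str, prompt: str, size: str, quality: str, n: int) -> str:
        # Same tuple as in the diff: one cache entry per argument combination.
        key = (model, prompt, size, quality, n)
        if key not in cache:
            cache[key] = generate(model, prompt, size, quality, n)
        return cache[key]

    return cached


_counter = itertools.count()


def fake_generate(model: str, prompt: str, size: str, quality: str, n: int) -> str:
    # Hypothetical generator: every call yields a different result.
    return f"image-{next(_counter)}"
```

The key is unique per combination of arguments, but two identical requests return the same stored result, so any variation the underlying model would have produced on the second call is hidden by the cache.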
    return None

def get_dalle_img(model: str, prompt: str, size: str, quality: str, n: int) -> str:
Maybe add an image path parameter for the generated image:
- def get_dalle_img(model: str, prompt: str, size: str, quality: str, n: int) -> str:
+ def get_dalle_img(model: str, prompt: str, size: str, quality: str, n: int, image_path: str) -> str:
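A sketch of what the extra `image_path` parameter might be used for is given below. It assumes the generated image is available as base64 text, which the diff does not confirm, and `save_b64_image` is a hypothetical helper name.

```python
import base64
import os


def save_b64_image(b64_data: str, image_path: str) -> str:
    # Create the parent directory when the path includes one.
    parent = os.path.dirname(image_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Decode the base64 text and write the raw bytes to the requested path.
    with open(image_path, "wb") as image_file:
        image_file.write(base64.b64decode(b64_data))
    return image_path
```

Returning the path keeps the `-> str` return type from the suggested signature meaningful: the caller gets back the location of the saved image.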
class ImageCraftPromptTemplateDict(TextPromptDict):
    ASSISTANT_PROMPT = TextPrompt(
        """You are tasked with creating an original image based on the provided descriptive captions. Please use your imagination and artistic capabilities to visualize and draw the images and explain what you are thinking about.""")
nit:
-         """You are tasked with creating an original image based on the provided descriptive captions. Please use your imagination and artistic capabilities to visualize and draw the images and explain what you are thinking about.""")
+         """You are given the task of generating an original image based on the descriptive captions. Please use your creativity and artistic skills to visualize and create an image with your thought process.""")
Description
Mainly implements 4 multi-modal parts:
Motivation and Context
Part of #454
Will also fix #541
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
FunctionCallingVisionConfig is built to adapt the ChatGPTVisionConfig
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!