Huggingface agent #2599
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #2599 +/- ##
===========================================
- Coverage 33.11% 19.10% -14.02%
===========================================
Files 86 88 +2
Lines 9108 9444 +336
Branches 1938 2173 +235
===========================================
- Hits 3016 1804 -1212
- Misses 5837 7524 +1687
+ Partials 255 116 -139

Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry.
@whiskyboy thanks for the PR! I had a couple of design questions and wanted your opinion on them. Autogen has an image generation capability, which allows anyone to add text-to-image capabilities to any LLM.
What do you think about implementing a new custom
For image-to-text, we also have a capability called
@WaelKarkoub Thanks for your comment!
@whiskyboy This is very cool and I appreciate your efforts! Your reasoning fits well with what I think now. Both approaches could be beneficial to the autogen community and could coexist: we can have standalone huggingface conversable agents as well as huggingface image generators, audio generators, etc. I look at Autogen as a lego world where users can mix and match different useful tools (lego pieces), and the tools you've developed are valuable and versatile enough to be applicable across many areas (e.g., agent capabilities). For a concrete example, what do you think about breaking down the text-to-image functionality and implementing it as an

One last question: is the image-to-image capability the same as image editing? If so, I'm considering improving the image generator capability to allow for this.
@WaelKarkoub I'm glad to know we are working towards the same goal!
A versatile lego block that could be utilized by both standalone agents and agent capabilities? I think it's a good idea, as it would enhance function reusability and make the code more readable and maintainable.
Yes, some typical user scenarios include style transfer, image inpainting, etc. For instance, the
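Style transfer and inpainting both map onto image-to-image endpoints. A minimal sketch of how such a call could be routed (the `run_image_edit` helper, the injected client, and the default model name are illustrative assumptions, not part of this PR; the protocol mirrors the shape of an image-to-image call such as huggingface_hub's `InferenceClient.image_to_image`):

```python
from typing import Protocol


class ImageToImageClient(Protocol):
    """Anything exposing an image-to-image call (e.g. a Hub inference client)."""

    def image_to_image(self, image: bytes, prompt: str, model: str) -> bytes: ...


def run_image_edit(
    client: ImageToImageClient,
    image: bytes,
    prompt: str,
    model: str = "timbrooks/instruct-pix2pix",  # assumed default, for illustration
) -> bytes:
    """Route a style-transfer / inpainting request to an image-to-image model."""
    return client.image_to_image(image, prompt=prompt, model=model)


class FakeClient:
    """Stand-in used here so the sketch runs without network access."""

    def image_to_image(self, image: bytes, prompt: str, model: str) -> bytes:
        self.last_call = (prompt, model)
        return b"edited-" + image


client = FakeClient()
result = run_image_edit(client, b"png-bytes", "make it a watercolor painting")
print(result)  # b'edited-png-bytes'
```

Injecting the client keeps the editing logic testable offline and lets the same helper serve either a standalone agent or a capability.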
@WaelKarkoub @BeibinLi would you mind taking a look at this PR? I'll add the documentation and tests once you approve the design.
@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
    name=HuggingFaceCapability.TEXT_TO_IMAGE.name,
    description="Generates images from input text.",
)
What's the idea behind using function registration instead of using the text analyzer agent?
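For readers unfamiliar with the dual-registration pattern in the snippet above: one registration advertises the tool's schema to the LLM, the other lets the executor side actually run it. A toy sketch of that idea (the registry names and decorators here are illustrative stand-ins, not the real autogen API):

```python
# tool name -> description advertised to the LLM
llm_tools: dict[str, str] = {}
# tool name -> callable the user-proxy side can execute
executable_tools: dict[str, object] = {}


def register_for_llm(name: str, description: str):
    """Advertise a tool to the LLM (schema side)."""
    def decorator(func):
        llm_tools[name] = description
        return func
    return decorator


def register_for_execution():
    """Make the tool runnable by the executor side."""
    def decorator(func):
        executable_tools[func.__name__] = func
        return func
    return decorator


@register_for_execution()
@register_for_llm(name="text_to_image", description="Generates images from input text.")
def text_to_image(prompt: str) -> str:
    return f"<image for: {prompt}>"


print(llm_tools)
print(executable_tools["text_to_image"]("a red fox"))  # <image for: a red fox>
```

Compared with routing everything through a text analyzer agent, registration lets the LLM pick the tool via native function calling and keeps execution on the user-proxy side.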
self._assistant = AssistantAgent(
    self.name + "_inner_assistant",
    system_message=system_message,
    llm_config=inner_llm_config,
    is_termination_msg=lambda x: False,
)
We may have to expose these two agents to the public by initializing them in the constructor for a couple of reasons:
- Users can apply the transform-messages capability to limit token count by either truncation or compression.
- It makes explicit to users that we'll be making extra API calls.
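The truncation case mentioned in the first bullet can be sketched as a simple message transform. This is a naive stand-in (whitespace-split "tokens", illustrative function name), not the real transform-messages implementation:

```python
def limit_message_tokens(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose combined (whitespace-split)
    token count fits within max_tokens."""
    kept: list[dict] = []
    budget = max_tokens
    for msg in reversed(messages):  # walk from newest to oldest
        cost = len(msg["content"].split())
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))  # restore chronological order


history = [
    {"role": "user", "content": "please describe this picture in detail"},
    {"role": "assistant", "content": "it shows a cat"},
    {"role": "user", "content": "now draw it"},
]
print(limit_message_tokens(history, max_tokens=8))  # keeps the two newest messages
```

If the inner assistant and user proxy are created in the constructor, a user could attach such a transform to the inner assistant before any extra API calls are made.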
from autogen.agentchat.contrib import img_utils

class HuggingFaceClient:
Is this meant to be a model client?
Line 64 in 19de99e:

class ModelClient(Protocol):
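Because `ModelClient` is a `Protocol`, a HuggingFace-backed client would conform structurally, without inheriting from it. A minimal sketch of that typing idea (the real protocol defines more methods than shown; the client class and its behavior here are illustrative):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ModelClient(Protocol):
    """Minimal stand-in; the real interface defines additional methods."""

    def create(self, params: dict) -> dict: ...


class HuggingFaceBackedClient:
    """Illustrative client that would forward `create` to a Hub model."""

    def __init__(self, model: str):
        self.model = model

    def create(self, params: dict) -> dict:
        # A real implementation would call the Hub here.
        return {"model": self.model, "echo": params.get("prompt", "")}


client = HuggingFaceBackedClient("some/model")
print(isinstance(client, ModelClient))  # True -- satisfies the protocol structurally
print(client.create({"prompt": "hi"}))
```

Conforming to the protocol would let the new client plug into any code that accepts a model client, which is what the review question is probing.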
Why are these changes needed?
Introducing a new agent named `HuggingFaceAgent`, which can connect to models on the HuggingFace Hub to achieve several multimodal capabilities. This agent essentially consists of a pairing between an assistant and a user-proxy agent, both of which are registered with the huggingface-hub model capabilities. Users can seamlessly access this agent to leverage its multimodal capabilities, without needing to manually register toolkits for execution.

Some key changes:

- `HuggingFaceClient` class in `autogen/agentchat/contrib/huggingface_utils.py`: this class simplifies calling HuggingFace models locally or remotely.
- `HuggingFaceAgent` class in `autogen/agentchat/contrib/huggingface_agent.py`: this agent utilizes `HuggingFaceClient` to achieve multimodal capabilities.
- `HuggingFaceImageGenerator` class in `autogen/agentchat/contrib/capabilities/generate_images.py`: this class enables text-based LLMs to generate images using `HuggingFaceClient`.

Related issue number
The second approach mentioned in #2577
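To make the shape of the `HuggingFaceImageGenerator` piece concrete, here is a hedged sketch of a generator object wrapping a client. Everything except the two class names from the PR is an illustrative assumption (the fake client, the default model name, and the exact method signatures):

```python
class FakeHuggingFaceClient:
    """Stands in for the PR's HuggingFaceClient so the sketch runs offline."""

    def text_to_image(self, prompt: str, model: str) -> bytes:
        return f"<{model}:{prompt}>".encode()


class HuggingFaceImageGenerator:
    """Sketch of a text-to-image generator delegating to a client."""

    def __init__(self, client, model: str = "stabilityai/stable-diffusion-2"):
        self.client = client
        self.model = model

    def generate_image(self, prompt: str) -> bytes:
        return self.client.text_to_image(prompt, model=self.model)

    def cache_key(self, prompt: str) -> str:
        # Cache per (model, prompt) pair so switching models misses the cache.
        return f"{self.model}:{prompt}"


gen = HuggingFaceImageGenerator(FakeHuggingFaceClient())
print(gen.generate_image("a lego robot"))
print(gen.cache_key("a lego robot"))
```

This delegation layout is what lets the same `HuggingFaceClient` back both the standalone agent and the image-generation capability, as discussed in the conversation above.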
Checks