Add support for nested structured in GeminiTool #221

Open
willbakst opened this issue May 8, 2024 · 7 comments
Labels
Feature Request New feature or request

Comments

@willbakst
Contributor

Is your feature request related to a problem? Please describe.

"Unfortunately Google's Gemini API cannot handle nested structures "

Where does this restriction originate? Is it simply not yet implemented, or is it inherent to Gemini?

Originally posted by @barapa in https://github.com/Mirascope/mirascope/discussions/219

Describe the solution you'd like
Add support to GeminiTool for properly structuring nested definitions so that they match the OpenAPI 3.0.3 Parameter Object that Gemini supports.

Parameter Object: https://spec.openapis.org/oas/v3.0.3#parameter-object
Schema Object: https://spec.openapis.org/oas/v3.0.3#schema-object
Reference Object: https://spec.openapis.org/oas/v3.0.3#reference-object

@willbakst added the "Feature Request" and "good first issue" labels on May 8, 2024
@willbakst
Contributor Author

@barapa do you have any interest in taking this on?

@barapa
Contributor

barapa commented May 9, 2024

It appears that Gemini doesn't really follow the OpenAPI 3.0.3 spec, but rather a "subset" supported by its FunctionDeclaration proto. The hard part is the Schema proto, which I'm reproducing below.

A few challenges I have encountered so far when converting a Pydantic model's model_json_schema() output to conform to this Schema proto:

  • Remove $defs fields
  • Remove $ref fields
  • Remove anyOf
  • Remove allOf

WIP PR here: #222

class Schema(proto.Message):
    r"""The ``Schema`` object allows the definition of input and output data
    types. These types can be objects, but also primitives and arrays.
    Represents a select subset of an `OpenAPI 3.0 schema
    object <https://spec.openapis.org/oas/v3.0.3#schema>`__.


    .. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields

    Attributes:
        type_ (google.ai.generativelanguage_v1beta.types.Type):
            Required. Data type.
        format_ (str):
            Optional. The format of the data. This is
            used only for primitive datatypes. Supported
            formats:

             for NUMBER type: float, double
             for INTEGER type: int32, int64
        description (str):
            Optional. A brief description of the
            parameter. This could contain examples of use.
            Parameter description may be formatted as
            Markdown.
        nullable (bool):
            Optional. Indicates if the value may be null.
        enum (MutableSequence[str]):
            Optional. Possible values of the element of Type.STRING with
            enum format. For example we can define an Enum Direction as
            : {type:STRING, format:enum, enum:["EAST", "NORTH", "SOUTH",
            "WEST"]}
        items (google.ai.generativelanguage_v1beta.types.Schema):
            Optional. Schema of the elements of
            Type.ARRAY.

            This field is a member of `oneof`_ ``_items``.
        properties (MutableMapping[str, google.ai.generativelanguage_v1beta.types.Schema]):
            Optional. Properties of Type.OBJECT.
        required (MutableSequence[str]):
            Optional. Required properties of Type.OBJECT.
    """

    type_: "Type" = proto.Field(
        proto.ENUM,
        number=1,
        enum="Type",
    )
    format_: str = proto.Field(
        proto.STRING,
        number=2,
    )
    description: str = proto.Field(
        proto.STRING,
        number=3,
    )
    nullable: bool = proto.Field(
        proto.BOOL,
        number=4,
    )
    enum: MutableSequence[str] = proto.RepeatedField(
        proto.STRING,
        number=5,
    )
    items: "Schema" = proto.Field(
        proto.MESSAGE,
        number=6,
        optional=True,
        message="Schema",
    )
    properties: MutableMapping[str, "Schema"] = proto.MapField(
        proto.STRING,
        proto.MESSAGE,
        number=7,
        message="Schema",
    )
    required: MutableSequence[str] = proto.RepeatedField(
        proto.STRING,
        number=8,
    )
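
The `$defs` and `$ref` removal listed above amounts to inlining each referenced definition at the point of use, since the Schema proto quoted here has no field for references. A minimal sketch, assuming the input is a plain dict shaped like Pydantic's model_json_schema() output with local `#/$defs/...` references only. `inline_refs` is a hypothetical helper name, not part of Mirascope or the Gemini SDK:

```python
from typing import Any


def inline_refs(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` with local `$ref`s replaced by their definitions."""
    defs = schema.get("$defs", {})

    def resolve(node: Any, seen: tuple[str, ...]) -> Any:
        if isinstance(node, list):
            return [resolve(item, seen) for item in node]
        if not isinstance(node, dict):
            return node
        if "$ref" in node:
            # Assumption: only local references of the form "#/$defs/Name".
            name = node["$ref"].removeprefix("#/$defs/")
            if name not in defs:
                raise ValueError(f"Unresolvable reference: {node['$ref']}")
            if name in seen:
                raise ValueError(f"Recursive reference is not supported: {name}")
            # Sibling keys (e.g. `description`) override the definition's keys.
            siblings = {k: v for k, v in node.items() if k != "$ref"}
            return resolve({**defs[name], **siblings}, seen + (name,))
        # Drop the `$defs` table itself; everything else is copied recursively.
        return {k: resolve(v, seen) for k, v in node.items() if k != "$defs"}

    return resolve(schema, ())
```

Recursive models are rejected in this sketch because inlining them would never terminate.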

@willbakst
Contributor Author

willbakst commented May 9, 2024

Oh wow super annoying that it doesn't support the spec fully :(

Took a brief look at the PR, looking good! Left some minor comments/questions taking WIP into account :)

@willbakst
Contributor Author

willbakst commented May 9, 2024

Noticed your comment in the PR (#222 (comment))

I'm totally fine with putting this on pause given the difference from the OpenAPI spec if you think it's not worth the time/effort. Otherwise, we'll likely still want to raise a ValueError when we find something that isn't supported. For example, instead of removing anyOf we should raise an error so the user knows it isn't supported, rather than silently changing things.
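
Raising instead of silently rewriting could look roughly like the sketch below. It walks a JSON-schema dict and raises a ValueError that names the first unsupported keyword and where it occurs. The allowed keys are the field names of the Schema proto quoted earlier in this thread; `check_supported` and the path format are hypothetical, not existing Mirascope code:

```python
from typing import Any

# Field names taken from the Schema proto quoted earlier in this thread.
SUPPORTED_KEYS = frozenset(
    {"type", "format", "description", "nullable", "enum", "items", "properties", "required"}
)


def check_supported(schema: dict[str, Any], path: str = "$") -> None:
    """Raise ValueError if `schema` uses a keyword outside SUPPORTED_KEYS."""
    unsupported = sorted(set(schema) - SUPPORTED_KEYS)
    if unsupported:
        raise ValueError(
            f"Unsupported schema keyword(s) at {path}: {', '.join(unsupported)}"
        )
    # Recurse into nested object properties and array items.
    for name, prop in schema.get("properties", {}).items():
        check_supported(prop, f"{path}.properties.{name}")
    if "items" in schema:
        check_supported(schema["items"], f"{path}.items")
```

Pydantic also emits annotation-only keys such as `title`, so a real converter would probably strip those before running a check like this one.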

Thoughts?

@barapa
Contributor

barapa commented May 9, 2024

I don't think there is anything inherently necessary about anyOf and allOf. In both cases, I think you could rewrite them to fit the Schema proto without losing the semantics. anyOf appears to just mean they are all nullable; allOf just means they are all required. However, the conversion isn't trivial.
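
For the nullable case specifically, the rewrite can be sketched as follows. This assumes the `anyOf: [X, {"type": "null"}]` shape that Pydantic produces for optional fields and folds it into `X` plus `nullable: true`, which the Schema proto does have a field for. Any other anyOf is left untouched, since a general union has no counterpart among the proto fields listed above. `collapse_optional` is a hypothetical name:

```python
from typing import Any

NULL_SCHEMA = {"type": "null"}


def collapse_optional(node: Any) -> Any:
    """Rewrite `anyOf: [X, null]` into `X` with `nullable: True`, recursively."""
    if isinstance(node, list):
        return [collapse_optional(item) for item in node]
    if not isinstance(node, dict):
        return node
    node = {key: collapse_optional(value) for key, value in node.items()}
    options = node.get("anyOf")
    if isinstance(options, list) and len(options) == 2:
        non_null = [option for option in options if option != NULL_SCHEMA]
        if len(non_null) == 1:
            # Keep sibling keys such as `description` next to the inner schema.
            rest = {key: value for key, value in node.items() if key != "anyOf"}
            return {**non_null[0], **rest, "nullable": True}
    return node
```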

That said, I suspect that with Gemini it may be more effective to simply prompt the model in JSON mode and provide the schema in the prompt.

They do have a mechanism for converting a function that takes a dataclass as a parameter into their required object. Take a look at https://github.com/google-gemini/generative-ai-python/blob/e09e7f242abcabe1bda28168be58a751ccdc5c03/tests/test_content.py#L393.

But it doesn't work with Pydantic objects.

@barapa
Contributor

barapa commented May 9, 2024

I have done some testing (with the Vertex AI version of the Gemini API) and found that setting it to JSON mode and providing the full JSON schema in the system prompt works consistently.
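
A rough sketch of the prompt side of that approach, assuming the schema is available as a plain dict (for example from Pydantic's model_json_schema()). The wording of the instruction and the `build_system_prompt` helper are illustrative assumptions, not the prompt used in the testing described above. Enabling JSON mode on the request itself is not shown:

```python
import json
from typing import Any


def build_system_prompt(schema: dict[str, Any]) -> str:
    """Embed the full JSON schema in a system prompt for a JSON-mode request."""
    instruction = (
        "Respond with a single JSON object that conforms to the JSON schema below. "
        "Output only the JSON, with no surrounding text."
    )
    # Nested `$defs` and `$ref` entries can stay as they are: the schema is
    # sent as prompt text, so the Schema proto restrictions do not apply.
    return instruction + "\n\n" + json.dumps(schema, indent=2)
```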

@willbakst - do you have any thoughts on how we could create an extractor that doesn't make use of Tools?

@willbakst
Contributor Author

OK, I think we should go down the JSON path then. In this case, we should handle it like we do for other model providers, through a json_mode equivalent, and leave the tool-calling functionality (with the ValueError) the same, except for updating the error message to mention using json_mode when nested structures are used.

For reference, Anthropic doesn't have an official JSON mode, so we do something there similar to what you'll need to do here:

if self.call_params.response_format == "json":
    if system_message:
        system_message += "\n\n"
    system_message += "Response format: JSON."
    messages.append(
        {
            "role": "assistant",
            "content": "Here is the JSON requested with only the fields "
            "defined in the schema you provided:\n{",
        }
    )
    if "tools" in kwargs:
        tools = kwargs.pop("tools")
        messages[-1]["content"] = (
            "For each JSON you output, output ONLY the fields defined by these "
            "schemas. Include a `tool_name` field that EXACTLY MATCHES the "
            "tool name found in the schema matching this tool:"
            "\n{schemas}\n{json_msg}".format(
                schemas="\n\n".join([str(tool) for tool in tools]),
                json_msg=messages[-1]["content"],
            )
        )
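
Note that the prefilled assistant message above ends with `{`, so a model that continues from the prefill returns the JSON body without its opening brace. A minimal sketch of restoring it before parsing, assuming the provider returns only the continuation. `parse_prefilled` is a hypothetical helper, not the code Mirascope uses:

```python
import json
from typing import Any


def parse_prefilled(completion: str, prefill: str = "{") -> Any:
    """Parse a completion that continues from a prefilled opening brace."""
    text = completion.lstrip()
    # A continuation starts with a key or a closing brace, never with "{",
    # so a leading "{" means the model returned the whole object itself.
    if not text.startswith(prefill):
        text = prefill + text
    return json.loads(text)
```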

I'm also noticing that when we switched from XML to the new beta tools with Anthropic, we appear to have broken JSON mode for standard tool use. I will be looking into that separately now (it still works for streaming tools, though).

if chunk.response_format != "json":

In the meantime, a good reference for turning JSON mode output into an extraction is how we handle it for OpenAI:

else:
    # Note: we only handle single tool calls in JSON mode.
    tool_type = self.tool_types[0]
    return [
        tool_type.from_tool_call(
            ChatCompletionMessageToolCall(
                id="id",
                function=Function(
                    name=tool_type.__name__, arguments=self.content
                ),
                type="function",
            )
        )
    ]
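
The same idea without any Tool wrapper can be sketched as below: parse the JSON mode output and construct the target type directly. This uses a standard-library dataclass as a stand-in for the extraction type, treats every field as required, and does not handle nested types. `Book` and `extract_from_json` are hypothetical names chosen for illustration:

```python
import json
from dataclasses import dataclass, fields
from typing import Type, TypeVar

T = TypeVar("T")


@dataclass
class Book:
    """Hypothetical extraction target standing in for a tool type."""

    title: str
    author: str


def extract_from_json(target_type: Type[T], content: str) -> T:
    """Build `target_type` from the raw JSON text returned in JSON mode."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    expected = {field.name for field in fields(target_type)}
    missing = sorted(expected - set(data))
    unknown = sorted(set(data) - expected)
    if missing or unknown:
        raise ValueError(f"missing fields: {missing}; unknown fields: {unknown}")
    return target_type(**data)
```

As in the OpenAI snippet above, this handles a single extraction per response.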

@willbakst changed the title from "[FEATURE REQUEST] Add support for nested structured in GeminiTool" to "Add support for nested structured in GeminiTool" on May 10, 2024
@willbakst removed the "good first issue" label on May 18, 2024
2 participants