Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use orjson to serialize dict including np.array #8041

Merged
merged 22 commits into from Apr 25, 2024
Merged

Conversation

whitphx
Copy link
Member

@whitphx whitphx commented Apr 16, 2024

Description

gr.JSON causes errors when the value includes NumPy arrays.
There are two different cases:

import gradio as gr
import numpy as np


def gendata():
    return np.zeros((10, 10))

demo = gr.Interface(
    fn=gendata,
    inputs=None,
    outputs="json",
)

demo.launch()
import gradio as gr
import numpy as np

with gr.Blocks() as demo:
    inp = gr.JSON(
        value=np.array([1, 2, 3])
    )
    out = gr.JSON()
    btn = gr.Button("Submit")
    btn.click(lambda x: x, inp, out)

demo.launch()

TODO:

  • Add tests -> Done

@whitphx whitphx requested a review from aliabid94 April 16, 2024 18:22
@gradio-pr-bot
Copy link
Contributor

gradio-pr-bot commented Apr 16, 2024

🪼 branch checks and previews

Name Status URL
Spaces ready! Spaces preview
Website ready! Website preview
Storybook ready! Storybook preview
🦄 Changes detected! Details

Install Gradio from this PR

pip install https://gradio-builds.s3.amazonaws.com/4102b4bc0f1260dedcbbe6ec8d95666b7b615551/gradio-4.27.0-py3-none-any.whl

Install Gradio Python Client from this PR

pip install "gradio-client @ git+https://github.com/gradio-app/gradio@4102b4bc0f1260dedcbbe6ec8d95666b7b615551#subdirectory=client/python"

@gradio-pr-bot
Copy link
Contributor

gradio-pr-bot commented Apr 16, 2024

🦄 change detected

This Pull Request includes changes to the following packages.

Package Version
gradio patch
  • Maintainers can select this checkbox to manually select packages to update.

With the following changelog entry.

Use orjson to serialize dict including np.array

Maintainers or the PR author can modify the PR title to modify this entry.

Something isn't right?

  • Maintainers can change the version label to modify the version bump.
  • If the bot has failed to detect any changes, or if this pull request needs to update multiple packages to different versions or requires a more comprehensive changelog entry, maintainers can update the changelog file directly.

@aliabid94
Copy link
Collaborator

Confirmed that this PR does not degrade performance

@whitphx
Copy link
Member Author

whitphx commented Apr 16, 2024

Thank you! Let me add tests and some more fixes.

@whitphx
Copy link
Member Author

whitphx commented Apr 16, 2024

When an error occurs at the /info endpoint on the server side, the HTTP error is caught here and no visual notification is displayed on the frontend.
I think the error notification to the user should be clearer. What do you think, and how can we do it here @pngwn @hannahblair ?

try {
api = await view_api(config);
} catch (e) {
console.error(`Could not get api details: ${e.message}`);
}

As an example, you can reproduce the error by running the code below on the main branch and it feels like the app crashes silently while we can see the error log on the server-side terminal.
The error itself will be fixed in this PR, but the error notification should be considered anyway?

import gradio as gr
import numpy as np

with gr.Blocks() as demo:
    inp = gr.JSON(
        value=np.array([1, 2, 3])
    )
    out = gr.JSON()
    btn = gr.Button("Submit")
    btn.click(lambda x: x, inp, out)

demo.launch()

@whitphx whitphx marked this pull request as ready for review April 16, 2024 21:48
Copy link
Collaborator

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @whitphx and nice that it does not degrade performance. Just left a few comments on the demos

@@ -0,0 +1,21 @@
import { test, expect } from "@gradio/tootils";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need separate tests for Blocks and Interface since an Interface uses the same server as a Blocks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them because each caused a different error actually.

@@ -1,6 +0,0 @@
import gradio as gr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This demo is used in the docs so we can't delete it

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh missed it, sorry.
Revert renaming.

@abidlabs abidlabs added the v: patch A change that requires a patch release label Apr 18, 2024
@freddyaboulton
Copy link
Collaborator

Actually I am doubting why we even need orjson at all?

The docstring of the json component's postprocess says Expects a strfilepath to a file containing valid JSON -- or alistordict that is valid JSON. Why would we expect passing a numpy array to work?

@whitphx
Copy link
Member Author

whitphx commented Apr 19, 2024

Good point.
@aliabid94 and I discussed and said it's a "bug" atm, but it's also an option to leave this as-is as a design, as you wrote.
However, the following code works and it shows the input JSON data properly since the config including a NumPy array is serialized by orjson
so I think this inconsistency is confusing. This can be the answer to your comment (Why would we expect passing a numpy array to work?)

import gradio as gr
import numpy as np


demo = gr.Interface(
    fn=lambda x: x,
    inputs=gr.JSON(value=np.array([1, 2, 3])),
    outputs="json",
)

demo.launch()

CleanShot 2024-04-19 at 16 04 16@2x


If we treat this as a spec following the docstring, not a bug, the changes in this PR will be reverted.
Still, I think the problem of silent crash is a problem anyway.

Copy link
Collaborator

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @whitphx @abidlabs . Let's use orjson for the serialization but I think we should update the docstring of the json component to say values that are orjson serializable are valid

@@ -0,0 +1,21 @@
import { test, expect } from "@gradio/tootils";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json_component_blocks demo does not exist so we should remove this test file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry forgot to rename this.
This should be json_component.spec.ts instead of deleting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems identical to the other demo, let's just delete this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are testing the same thing though. If we wanted to make sure the info route works, i think we can manually access that route with playwright?

@abidlabs
Copy link
Member

We can change this to orjson.loads for faster performance:

return json.loads(value)

@whitphx
Copy link
Member Author

whitphx commented Apr 22, 2024

I think we should update the docstring of the json component to say values that are orjson serializable are valid

I'm not sure if we should.
This orjson serializing is kind of a defensive fallback, rather than gr.JSON's spec.
In this sense, I can agree with your original opinion not to care about NumPy here as it's not gr.JSON's spec. However, the samples above were working with gradio<=4.22.0 but started to fail since 4.23.0 due to #7810 (comment) (sorry I didn't post this) so we can see it as a regression. This is my main reason to think we should use orjson here; because it's a regression, but not because it's a spec.

@abidlabs
Copy link
Member

Good point. I'm leaning towards (2) because I think it is clearer what the postprocessing function for each component is doing. But what do you mean by this:

in this example, gr.Checkbox will have a "regression"

Why would there be a regression in gr.Checkbox if we switch gr.JSON to use orjson?

@whitphx
Copy link
Member Author

whitphx commented Apr 23, 2024

Pushed a commit to take the option (2).

Why would there be a regression in gr.Checkbox if we switch gr.JSON to use orjson?

It was an example, and I wanted to say, we have to take care of this problem about all the components whose postprocess pass through the value without value conversion like the current gr.JSON and gr.Checkbox.
If we make only gr.JSON.postprocess use orjson, this problem is solved on it, but the same error still occurs with the sample code below. A NumPy array as gr.Checkbox's value doesn't make so much sense, but technically this example worked before 4.23.0, so it can be seen as a regression. (while we can see such an error for not sense-making values as a benefit.)

def gendata():
    return np.zeros((10, 10))

demo = gr.Interface(
    fn=gendata,
    inputs=None,
    outputs="checkbox",
)

demo.launch()

Anyway it's ok to patch each postprocess to make sure they never return non-JSON-serializable value no matter what value the user-defined fn returns.

In other words, without the globally-applied fallback like the option (1), we have to take care of the postprocess-s of all components.

@whitphx
Copy link
Member Author

whitphx commented Apr 23, 2024

Added a patch to gr.Checkbox.postprocess to void this error for example.

@whitphx
Copy link
Member Author

whitphx commented Apr 23, 2024

Found #7415 did a similar discussion.

@freddyaboulton freddyaboulton added the t: fix A change that implements a fix label Apr 23, 2024
@freddyaboulton
Copy link
Collaborator

Looks good @whitphx ! Thanks for making the fixes! Now that we do the json-handling in each component, we can use python unit tests as opposed to e2e tests?

Comment on lines 99 to 101
if isinstance(value, Iterable):
# Handles the cases of NumPy arrays for instance, in the same way as lists.
return len(value) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this. It may technically be a breaking change to not include this, but the behavior when a numpy array is passed into a gr.Checkbox is very unclear and I think this addition will just confuse readers of the codebase.

In fact, when I test bool(np.array([])) it returns False with a warning that in future versions of numpy, it will return an error. So the breaking change is really on numpy's side. I think we can just return bool(value)

Copy link
Member Author

@whitphx whitphx Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may technically be a breaking change to not include this,

Exactly. This is what I meant as a "regression", and I added this commit as an example to clarify Checkbox's behavior in such edge cases.
The options are, (1) to cover as many cases as possible like this, or (2) to support only reasonable cases e.g. only bool value in the case of checkbox and raise an error for other values.
This commit is (1) to avoid the regression, and bool(value) as you wrote is (2). I think (2) is also a reasonable option.
Will take it.

Please note that bool(np.array([1,2])) raises a ValueError,
so if we take the option 2 (bool(value)), the app below still crashes without a clear error message. We will admit it as a design.

def gendata():
    return np.zeros((10, 10))

demo = gr.Interface(
    fn=gendata,
    inputs=None,
    outputs="checkbox",
)

demo.launch()

CleanShot 2024-04-24 at 18 14 13@2x

# Use orjson to convert NumPy arrays and datetime objects to JSON.
# This ensures a backward compatibility with the previous behavior.
# See https://github.com/gradio-app/gradio/pull/8041
return orjson.loads(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool

@abidlabs
Copy link
Member

Looks good to me @whitphx. I agree with @freddyaboulton that it would be better if we could replace the e2e tests with python unit tests now for speed and remove the demo/json_component_interface/run.py as it should not be needed anymore

@@ -83,9 +84,18 @@ def postprocess(self, value: dict | list | str | None) -> dict | list | None:
if value is None:
return None
if isinstance(value, str):
return json.loads(value)
return orjson.loads(value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abidlabs @freddyaboulton Apart from this PR,
I found this implementation is different from the doc string which says

Expects a str filepath to a file containing valid JSON

while json.loads() expects a JSON string, not a file path (but json.load() does).

Which is the correct, the impl or the doc?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the implementation @whitphx !

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I updated the docstring about it together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @whitphx

@whitphx
Copy link
Member Author

whitphx commented Apr 24, 2024

Updated.

Copy link
Collaborator

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge @whitphx ! Thanks for all the hard work on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to create this as a separate file. We should refactor test_components.py into separate component files to mirror the component .py files.

Copy link
Member Author

@whitphx whitphx Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I missed test_components.py 😅
Will do the refactoring in a separate PR.
Reminder -> #8125

@@ -38,7 +39,7 @@ def __init__(
):
"""
Parameters:
value: Default value. If callable, the function will be called whenever the app loads to set the initial value of the component.
value: Default value as a valid JSON `str` -- or a `list` or `dict` that can be serialized to a JSON string. If callable, the function will be called whenever the app loads to set the initial value of the component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this as well.

Copy link
Member

@abidlabs abidlabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes @whitphx!

@whitphx whitphx merged commit 937c858 into main Apr 25, 2024
7 checks passed
@whitphx whitphx deleted the json-component-numpy branch April 25, 2024 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t: fix A change that implements a fix v: patch A change that requires a patch release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants