[Python] TypeError: Object of type int64 is not JSON serializable when converting pandas to arrow table #41625
Comments
Hi, thank you for opening an issue @djouallah! I have been able to reproduce it in my dev environment. For next time, it will be much easier to help if you provide a simple reproducible example. The Google Colab notebook you linked has lots (lots!) of code not connected to the issue. I was very reluctant at first to download files and manipulate them, and only did so after taking the time to check the source and all the code. Here is one I created that shows the issue:
>>> import numpy as np
>>> import pandas as pd
>>> import pyarrow as pa
>>> data = {'UNIT': ["DUNIT", "DUNIT", "DUNIT", "DUNIT"],
...         'version': [1, 1, 3, 3]}
>>> df = pd.DataFrame(data)
>>> df.index = df['version']
>>> df.columns.name = np.int64(142564)  # <-- the issue is here: NumPy int64 column index name
>>> df
142564    UNIT  version
version
1        DUNIT        1
1        DUNIT        1
3        DUNIT        3
3        DUNIT        3
>>> pa.Table.from_pandas(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 4559, in pyarrow.lib.Table.from_pandas
    arrays, schema, n_rows = dataframe_to_arrays(
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 635, in dataframe_to_arrays
    pandas_metadata = construct_metadata(
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/alenkafrim/repos/arrow/python/pyarrow/pandas_compat.py", line 257, in construct_metadata
    b'pandas': json.dumps({
               ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
The code works if I remove the column index name:
>>> df.columns.name = None
>>> pa.Table.from_pandas(df)
pyarrow.Table
UNIT: string
version: int64
__index_level_0__: int64
----
UNIT: [["DUNIT","DUNIT","DUNIT","DUNIT"]]
version: [[1,1,3,3]]
__index_level_0__: [[1,1,3,3]]
It would also have worked if a Python int had been used for the column index name instead of a NumPy integer.
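To illustrate the difference between the two integer types outside of pyarrow, here is a minimal sketch using only `json` and NumPy. It assumes, as the traceback above suggests, that the column index name ends up being passed to a plain `json.dumps` call; the dictionary key `"name"` is just an illustrative placeholder, not the actual metadata layout.

```python
import json

import numpy as np

name = np.int64(142564)

# On Python 3, np.int64 is not a subclass of the built-in int,
# so the standard JSON encoder does not know how to serialize it.
error_message = None
try:
    json.dumps({"name": name})
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Converting to a built-in int first makes the value serializable.
as_int = json.dumps({"name": int(name)})
as_item = json.dumps({"name": name.item()})  # .item() also returns a Python int

print(as_int)
print(as_item)
```

Both `int(name)` and `name.item()` produce a built-in `int`, which is why using a Python int for the name avoids the error.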
The conclusion is that the issue can be fixed by renaming the column index name or removing it completely. I do not think checking for numpy types in the column index names and converting them to a python type is something that would fit here. As I do not think this is a bug I will change the label type to |
@AlenkaF thanks a lot
Closing. Feel free to reopen if there are any further questions!
Describe the bug, including details regarding any error messages, version, and platform.
I am getting an error when converting from pandas to Arrow. I added a reproducible example here:
https://colab.research.google.com/drive/1uPOv8qyj5xW4XfrnkLtZtYIaaQfhupjG#scrollTo=7USg9dd-1ivc
Component(s)
Python