Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orange Table-specific HDF5Reader #6791

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

stuart-cls
Copy link

@stuart-cls stuart-cls commented Apr 25, 2024

Issue

Enable saving/loading the Orange Table data structure from the binary HDF5 container.
Based on the implementation used in the dask branch, but with the dask parts removed.

Related: #6356

Description of changes
  • Add HDF5Reader with both read and write_file methods.
  • Add round-trip test for HDF5Reader
  • Add h5py to requirements
  • Use existing .metadata sidecar file to store table.attributes if present.
Includes
  • Code changes
  • Tests
  • Documentation

@stuart-cls stuart-cls changed the title Orange hdf5 Orange Table-specifc HDF5Reader Apr 25, 2024
@stuart-cls stuart-cls changed the title Orange Table-specifc HDF5Reader Orange Table-specific HDF5Reader Apr 25, 2024
return f[name]
return None

assert 'domain' in f
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to put some attribute in the HDF5 to explicitly identify it as coming from Orange (could also specify which version) rather than just sniffing for "domain". This would improve discoverability, which is always an issue for HDF5 files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely. We should have an ID field and probably also a version identifier. Do you know how HDF5 files usually do it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some attributes to the root following the NeXus convention (creator, Orange_version, HDF5_Version, h5py_version) Weird capitalization of "HDF5_Version" is apparently intentional.

@stuart-cls
Copy link
Author

I couldn't find a satisfactory solution to the .attributes problem (for the use case I care about) so I re-used the .metadata sidecar files for now, which is better than nothing.

Copy link

codecov bot commented May 20, 2024

Codecov Report

Attention: Patch coverage is 93.67089% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 88.21%. Comparing base (5ada6c4) to head (63c71b3).
Report is 21 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6791   +/-   ##
=======================================
  Coverage   88.20%   88.21%           
=======================================
  Files         327      327           
  Lines       71223    71301   +78     
=======================================
+ Hits        62825    62900   +75     
- Misses       8398     8401    +3     

@markotoplak
Copy link
Member

Comments from @stuart-cls (from his email, just so that they do not get lost):

  • Table.attributes -> I could try again to properly store this in the HDF5, instead of the .metadata sidecar. I got stuck trying to get the round-trip on a visible image to work :)
  • Compatibility with dask branch: both opening files saved with that branch, and updating the branch to be compatible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants