WIP: Improved async api for arrow1 #393

Closed
wants to merge 2 commits

Conversation

kylebarron (Owner)

Improvements

  • No need for content length (closes #272, "HEAD-ache requests"). In the browser console I verified a fetch with Range: bytes=-8 followed by a second fetch with Range: bytes=-9772 for the full metadata (tested in this notebook); a minimal sketch of the technique follows this list.
  • Merge adjacent byte ranges into a single request (closes #392, "Request batching"). I confirmed that table = await parquetFile.read_row_group(0) made just a single request!
  • New class-based API with AsyncParquetFile (for #215, "Async API refactor"). I think this is cleaner and easier to use, and on the Rust side the only data stored in the class is the file metadata, the Arrow schema, and the reqwest client, so if the user forgets to call .free(), very little memory will leak. A usage sketch follows below.
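
A minimal sketch (plain TypeScript, not this PR's Rust code) of the suffix-Range technique from the first bullet; the fetchParquetMetadataBytes helper is hypothetical and only illustrates why neither Content-Length nor a HEAD request is needed. A Parquet file ends with the Thrift-encoded footer metadata, a 4-byte little-endian length of that metadata, and the 4-byte magic PAR1, so two suffix requests are enough.

```ts
// Sketch only: locate and download the Parquet footer metadata with two
// suffix Range requests; the server's Content-Length is never consulted.
async function fetchParquetMetadataBytes(url: string): Promise<Uint8Array> {
  // First suffix request: the final 8 bytes (4-byte metadata length + "PAR1").
  const tail = await (
    await fetch(url, { headers: { Range: "bytes=-8" } })
  ).arrayBuffer();
  const magic = new TextDecoder().decode(new Uint8Array(tail, 4, 4));
  if (magic !== "PAR1") throw new Error("not a Parquet file");
  const metadataLen = new DataView(tail).getUint32(0, true);

  // Second suffix request (analogous to the bytes=-9772 request observed in
  // the browser console): the footer metadata plus the trailing 8 bytes.
  const footer = await (
    await fetch(url, { headers: { Range: `bytes=-${metadataLen + 8}` } })
  ).arrayBuffer();
  // Drop the trailing length + magic, leaving the Thrift-encoded metadata.
  return new Uint8Array(footer, 0, footer.byteLength - 8);
}
```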

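For reference, a rough sketch of how the class-based API is meant to be used from JS. Only read_row_group and .free() appear in this description; the module path and the fromUrl factory name below are placeholders, not the final surface.

```ts
// Hypothetical usage of the class-based async API; fromUrl is a placeholder
// name, while read_row_group and free come from the description above.
import { AsyncParquetFile } from "parquet-wasm";

async function readFirstRowGroup(url: string) {
  // Opening the file fetches only the footer metadata; on the Rust side the
  // instance holds just the metadata, the Arrow schema, and a reqwest client.
  const parquetFile = await AsyncParquetFile.fromUrl(url);
  try {
    // Adjacent byte ranges for this row group are coalesced into one request.
    return await parquetFile.read_row_group(0);
  } finally {
    // Release the wasm-held state; forgetting this leaks only the small
    // metadata/schema/client allocation, never table data.
    parquetFile.free();
  }
}
```
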
Todo:

  • Use this API automatically under the hood for streaming
  • Concurrent row group fetches? Or should that be left to the user, who can call Promise.all from the JS side (see the sketch after this list)? Implementing concurrency support inside wasm is probably non-trivial.
  • Ability to turn off Arrow's re-chunking; the default of creating 1024-row batches is way too small imo.
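
If concurrency is left to the caller, it could look something like the sketch below, reusing the hypothetical AsyncParquetFile from the usage sketch above: each read_row_group call starts its own request immediately, so Promise.all overlaps them without any extra support in wasm.

```ts
import { AsyncParquetFile } from "parquet-wasm"; // placeholder module path

// Sketch: fetch several row groups concurrently from the JS side.
async function readRowGroups(parquetFile: AsyncParquetFile, indices: number[]) {
  // Each call returns a pending Promise right away; Promise.all awaits them
  // together, so the underlying range requests run concurrently.
  return Promise.all(indices.map((i) => parquetFile.read_row_group(i)));
}
```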

cc @H-Plus-Time

kylebarron (Owner, Author) commented Jan 25, 2024

Superseded by #407

kylebarron closed this Jan 25, 2024