Propagate storage_options and other read parameters when pickling LanceFragment #2280

Open
wjones127 opened this issue May 1, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@wjones127
Contributor

No description provided.

@wjones127 wjones127 added the enhancement New feature or request label May 1, 2024
@wjones127
Contributor Author

I'm actually not sure we should do this. While debugging ray-project/ray#45392, we found that pickling a list of LanceFragment objects is very inefficient: on unpickling, each fragment loads its own independent copy of the LanceDataset from disk, which uses a lot of IO and memory.

Instead, we should probably encourage users to send just the fragment ids and the dataset across processes. We might consider deprecating pickling LanceFragment in favor of this.
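
To make the cost difference concrete, here is a rough sketch using toy stand-ins rather than Lance itself. `ToyDataset`, `ToyFragment`, the helper functions, and the `memory://toy` URI are all hypothetical names invented for illustration, and the sketch assumes the dataset can be represented across processes by a location string. It only models the pattern described above: one dataset open per unpickled fragment versus one open in total when only the ids are sent.

```python
import pickle


# Hypothetical stand-ins, not the Lance API; they only model the cost pattern.
class ToyDataset:
    opens = 0  # how many times any dataset was opened in this process

    def __init__(self, uri):
        ToyDataset.opens += 1  # stands in for the IO and memory of a real open
        self.uri = uri

    def get_fragment(self, fragment_id):
        return ToyFragment(self, fragment_id)


class ToyFragment:
    def __init__(self, dataset, fragment_id):
        self.dataset = dataset
        self.fragment_id = fragment_id


def dump_fragment(fragment):
    # Per-fragment payload: each one carries enough to reopen the dataset.
    return pickle.dumps((fragment.dataset.uri, fragment.fragment_id))


def load_fragment(payload):
    # Each fragment reopens its own dataset copy on the receiving side.
    uri, fragment_id = pickle.loads(payload)
    return ToyDataset(uri).get_fragment(fragment_id)


def dump_ids(dataset, fragments):
    # Single payload: the dataset location once, plus the fragment ids.
    return pickle.dumps((dataset.uri, [f.fragment_id for f in fragments]))


def load_ids(payload):
    # Open the dataset once and look every fragment up in it.
    uri, fragment_ids = pickle.loads(payload)
    dataset = ToyDataset(uri)
    return [dataset.get_fragment(i) for i in fragment_ids]


def count_opens(fn):
    # Run fn and report how many dataset opens it caused.
    before = ToyDataset.opens
    fn()
    return ToyDataset.opens - before


source = ToyDataset("memory://toy")
frags = [source.get_fragment(i) for i in range(100)]

per_fragment = [dump_fragment(f) for f in frags]
opens_naive = count_opens(lambda: [load_fragment(p) for p in per_fragment])

by_ids = dump_ids(source, frags)
opens_by_ids = count_opens(lambda: load_ids(by_ids))

print(opens_naive, opens_by_ids)  # prints "100 1"
```

In this toy model the per-fragment route opens the dataset 100 times on the receiving side, while the ids route opens it once. With real data each open would also carry the IO and memory cost mentioned above.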
