-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataset not found during frequent writes #2338
Labels
bug
Something isn't working
Comments
3 tasks
wjones127
added a commit
that referenced
this issue
May 29, 2024
Fixes #2338 Partially addresses #2318 **For a dataset on local SSD with 8,000 versions, we get 6x faster load time and 3x faster append.** * Added special code path for local filesystem for finding latest manifest. This path skips the `metadata` call for paths that aren't relevant, both fixing #2338 and improving performance on local filesystems overall. * Fixed code path where we were reading the manifest file twice * Changed `CloudObjectReader` and `LocalFileReader` to both cache the file size, so we aren't making multiple calls to get the size of the same object/file. Also allowed passing the size when opening, in case we already have it from a list operation. * Deprecated some more methods for loading a dataset, in favor of using `DatasetBuilder`. Also consolidated the implementations to use `DatasetBuilder`, so we have fewer code paths to worry about and test. ## TODO * [x] Cleanup * [x] Add IO unit test for loading a dataset * [x] Check repro from 2318 --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>
eddyxu
pushed a commit
that referenced
this issue
May 29, 2024
Fixes #2338 Partially addresses #2318 **For a dataset on local SSD with 8,000 versions, we get 6x faster load time and 3x faster append.** * Added special code path for local filesystem for finding latest manifest. This path skips the `metadata` call for paths that aren't relevant, both fixing #2338 and improving performance on local filesystems overall. * Fixed code path where we were reading the manifest file twice * Changed `CloudObjectReader` and `LocalFileReader` to both cache the file size, so we aren't making multiple calls to get the size of the same object/file. Also allowed passing the size when opening, in case we already have it from a list operation. * Deprecated some more methods for loading a dataset, in favor of using `DatasetBuilder`. Also consolidated the implementations to use `DatasetBuilder`, so we have fewer code paths to worry about and test. ## TODO * [x] Cleanup * [x] Add IO unit test for loading a dataset * [x] Check repro from 2318 --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
We got a user report where they got Dataset not found randomly. This is happening on AWS NFS. 1 This had the error:
It seems to be failing on getting metadata for a temporary manifest file.
I think where this is happening is we are using
list_with_delimeter
to get the manifest versions.lance/rust/lance-table/src/io/commit.rs
Lines 89 to 91 in b61850b
The implementation of
LocalFileSystem::list_with_delimeter
callsWalkDir
and then various metadata operations on those file references 2. My suspicion is the originalWalkDir
call sometimes picks up the temporary manifests, but by the time they are calling the metadata operations they have been deleted.There's two possible solutions to this:
head
to check if there are any newer versions. This is what I lean towards, since it will also help with perf bug: Inserting data is O(num versions) #2318. See also Replace _lastest.manifest and change manifest naming scheme #1362.Footnotes
https://discord.com/channels/1030247538198061086/1237660577749274684/1239179004364456039 ↩
https://github.com/apache/arrow-rs/blob/d17b2067d637c563eb315571e719807a6f482784/object_store/src/local.rs#L554-L567 ↩
The text was updated successfully, but these errors were encountered: