Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[s3] recursive list of keys on filer store with SQL #5580

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

kmlebedev
Copy link
Contributor

@kmlebedev kmlebedev commented May 12, 2024

What problem are we solving?

The registers contain a large number of nested keys, which leads to exponential growth of the query list for each subdirectory.

How are we solving the problem?

Iterating over keys by prefix on the SQL side of the database

How is the PR tested?

S3TEST_CONF=s3tests.local.conf tox -- s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_basic

all list v2 tests

S3TEST_CONF=s3tests.local.conf tox -- 
s3tests_boto3/functional/test_s3.py::test_bucket_listv2_many
s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_basic s3tests_boto3/functional/test_s3.py::test_bucket_listv2_encoding_basic s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_prefix s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_prefix_ends_with_delimiter s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_alt s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_prefix_underscore s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_percentage s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_whitespace s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_dot s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_unreadable s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_empty s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_none s3tests_boto3/functional/test_s3.py::test_bucket_listv2_fetchowner_notempty s3tests_boto3/functional/test_s3.py::test_bucket_listv2_fetchowner_defaultempty s3tests_boto3/functional/test_s3.py::test_bucket_listv2_fetchowner_empty s3tests_boto3/functional/test_s3.py::test_bucket_listv2_delimiter_not_exist s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_basic s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_alt s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_empty s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_none s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_not_exist s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_unreadable s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_delimiter_basic s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_delimiter_alt s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_delimiter_prefix_not_exist s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_delimiter_delimiter_not_exist s3tests_boto3/functional/test_s3.py::test_bucket_listv2_prefix_delimiter_prefix_delimiter_not_exist s3tests_boto3/functional/test_s3.py::test_bucket_listv2_maxkeys_one s3tests_boto3/functional/test_s3.py::test_bucket_listv2_maxkeys_zero s3tests_boto3/functional/test_s3.py::test_bucket_listv2_maxkeys_none s3tests_boto3/functional/test_s3.py::test_bucket_listv2_unordered s3tests_boto3/functional/test_s3.py::test_bucket_listv2_continuationtoken_empty s3tests_boto3/functional/test_s3.py::test_bucket_listv2_continuationtoken s3tests_boto3/functional/test_s3.py::test_bucket_listv2_both_continuationtoken_startafter s3tests_boto3/functional/test_s3.py::test_bucket_listv2_startafter_unreadable s3tests_boto3/functional/test_s3.py::test_bucket_listv2_startafter_not_in_list s3tests_boto3/functional/test_s3.py::test_bucket_listv2_startafter_after_list ```
# Checks
- [ ] I have added unit tests if possible.
- [ ] I will add related wiki document changes and link to this PR after merging.

return lastFileName, fmt.Errorf("findDB %s : %v", dirPath, err)
}
glog.V(5).Infof("ListRecursivePrefixedEntries lastFileName %s shortPath %v, prefix %v, sql %s", lastFileName, string(shortPath), prefix, store.GetSqlListRecursive(bucket))
rows, err := db.QueryContext(ctx, store.GetSqlListRecursive(bucket), startFileName, util.HashStringToLong(string(shortPath)), prefix+"%", string(shortPath)+prefix+"%", limit+1)

Check failure

Code scanning / CodeQL

Database query built from user-controlled sources High

This query depends on a
user-provided value
.
This query depends on a
user-provided value
.
@kmlebedev kmlebedev marked this pull request as ready for review May 28, 2024 11:43
@chrislusf
Copy link
Collaborator

  • The change is internal to the filer store backends. The allowListRecursive option can be avoided. With the data layout change, it is basically a different filer store.
  • The PR is too big to review.

@kmlebedev
Copy link
Contributor Author

kmlebedev commented May 28, 2024

  • The change is internal to the filer store backends. The allowListRecursive option can be avoided.

Hi, I wanted to leave the allowListRecursive option for some time, as an experimental one, since the tests do not cover all possible cases and might break something for someone.

  • With the data layout change, it is basically a different filer store.

I don’t understand what changes we are talking about here, but the data structure in the stores does not change, perhaps indexes will need to be added for SQL.

  • The PR is too big to review.

It is large mainly because of the plug functions for each filer. But I will try to reduce it as much as I can by moving the new Ceph S3 tests fixes into a separate PR and adding new github actions tests.

@chrislusf
Copy link
Collaborator

This would need a secondary index on directory+'/'+name, which can be costly for large directories.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants