
Add support for storing the "local" pool on Azure #1074

Status: Open. Wants to merge 11 commits into base: master.
Conversation

refi64 (Contributor) commented May 19, 2022

Requirements

All new code should be covered with tests, and documentation should be updated. CI should pass.

Description of the Change

This adds support for storing the "local" package pool on Azure. It includes:

  • Miscellaneous fixes for tests broken by the Python 3 migration when passing --capture.
  • New functional tests for Azure publishing, based on the S3 ones, as a foundation for these changes.
  • Changes to the Azure endpoint syntax to match the official Azure connection string format.
  • Refactoring of Azure publishing to move code reused by the package pool into a shared file.
  • A tangentially related change to clean up temporary files when mirroring.

I described some of these in further detail in the individual commits.

This does not include changing all references to the "local" package pool, because I'm honestly not sure if that's worthwhile...

Checklist

  • unit-test added (if change is algorithm)
  • functional test added/updated (if change is functional)
  • man page updated (if applicable)
  • bash completion updated (if applicable) (CLI was not touched)
  • documentation updated (I don't believe the changes here would result in changes to the website content)
  • author name in AUTHORS

chuan (Contributor) left a comment

Nice work! Great to see the new functionality to store the package pool in the remote storage. I will take a more detailed pass of the change later today.

@randombenj self-requested a review May 25, 2022 08:24
refi64 (Contributor, Author) commented May 25, 2022

Hmm, the Azure logs seem to have some 500 internal server errors from Azurite; I'll look into it locally.

refi64 (Contributor, Author) commented May 25, 2022

I can't reproduce the CI failure at all locally, so I added a step to upload Azurite's logs if the tests fail, which should hopefully help clear things up a bit...

refi64 (Contributor, Author) commented Jun 2, 2022

Can't use 'tar -xzf' extract archive file: /home/runner/work/_actions/_temp_4c5950e6-f316-4a7c-aa09-7b03239b3a59/3b979214-6f9f-46a7-976a-ed55121cf5e1.tar.gz. return code: 2.

...GitHub Actions issue maybe?

chuan (Contributor) commented Jun 2, 2022

> Can't use 'tar -xzf' extract archive file: /home/runner/work/_actions/_temp_4c5950e6-f316-4a7c-aa09-7b03239b3a59/3b979214-6f9f-46a7-976a-ed55121cf5e1.tar.gz. return code: 2.
>
> ...GitHub Actions issue maybe?

Where did you see this error?

Since only the long test for v1.17 failed in CI run #137, I suspect it is an intermittent issue with Azurite or the runtime environment.

refi64 (Contributor, Author) commented Jun 2, 2022

@chuan apologies for the confusion, that was from an earlier run, but I think @randombenj restarted it since then. There was a bug in the way config files were written that I missed, just pushed up a fix.

chuan (Contributor) commented Jun 7, 2022

@refi64 I figured out that the Azurite 500 errors are due to its use of /tmp. The Azure tests in your change should pass the CI if you pull the change here: #1078.

refi64 (Contributor, Author) commented Jun 8, 2022

@chuan thanks! I just pushed a new version including that commit to see how it goes.

@refi64 force-pushed the cloud-storage branch 2 times, most recently from 405bf8f to f158b3f on June 8, 2022 21:31
chuan (Contributor) commented Jun 9, 2022

The latest change looks good to me, and I'm glad to see all CI jobs passed!

refi64 (Contributor, Author) commented Jun 13, 2022

Just rebased on top of master, which now includes the merged Azurite fixes PR.

randombenj (Member) commented

As this is quite a big change and I am not too familiar with the source yet, this will probably take some time to review ...

randombenj (Member) commented

I am not sure if I understand all the changes yet ...
However, I am wondering a bit why the changes are necessary in the first place.
Couldn't you just mount the Azure blob storage to the local disk and then run aptly as-is against that directory?
That would be much simpler, and aptly wouldn't have to implement all the read/write logic.

chuan (Contributor) commented Jun 23, 2022

> I am not sure if I understand all the changes yet ... However, I am wondering a bit why the changes are necessary in the first place? Couldn't you just mount the azure blob storage to the local disk and then run aptly as is to this directory? This would be much simpler and aptly wouldn't have to implement all the read write logic?

Mounting Azure blob storage to local disk can usually be done via either Azure Blob NFS support or a FUSE driver; both have their own limitations.

The Azure Blob NFS protocol supports only a limited set of blob storage features. For example, it can only be used with a storage account that enables the hierarchical namespace, and NFS cannot be enabled on an existing storage account. More limitations are listed here: https://docs.microsoft.com/en-us/azure/storage/blobs/network-file-system-protocol-known-issues

The FUSE implementation (https://github.com/Azure/azure-storage-fuse) uses the same underlying REST APIs as the implementation here. It targets more generic use cases, has different requirements, and is much more complicated because it needs to support the complete set of filesystem APIs. Having our own storage and publish implementation directly on top of the REST APIs cuts out the middle layer and makes it possible to optimize for our own use cases. For example, the file move operation is made atomic in the Azure publish support, which FUSE does not support.

Similar arguments also apply as to why we need to support publishing to S3 and OpenStack Swift.

refi64 (Contributor, Author) commented Jun 24, 2022 via email

randombenj (Member) left a comment

Thanks for this high-quality patch! I think it also paves the way for future implementations of different storage backends.

One thing I was wondering, however: what is your main use case for implementing these changes?

Comment on lines 107 to 108
if path.startswith('public/'):
    path = path[7:]
Suggested change

if path.startswith('public/'):
    path = path[7:]

path = path.removeprefix("public/")
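As a side note on the suggestion above: `str.removeprefix` (added in Python 3.9) returns a new string rather than modifying the original, so its result has to be assigned back. A minimal illustration (the paths here are made-up examples):

```python
path = "public/pool/main/p/pkg.deb"

# removeprefix returns a new string; the original is left unchanged
stripped = path.removeprefix("public/")
print(stripped)  # pool/main/p/pkg.deb

# a string that does not start with the prefix is returned unchanged
print("other/file".removeprefix("public/"))  # other/file
```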

codecov bot commented Jun 27, 2022

Codecov Report

Attention: Patch coverage is 61.22449%, with 133 lines in your changes missing coverage. Please review.

Project coverage is 52.10%. Comparing base (b3d9055) to head (5307c1a).
Report is 88 commits behind head on master.

❗ Current head 5307c1a differs from pull request most recent head da0c525. Consider uploading reports for the commit da0c525 to get more accurate results

Files Patch % Lines
azure/package_pool.go 64.96% 33 Missing and 15 partials ⚠️
utils/config.go 34.37% 19 Missing and 2 partials ⚠️
azure/azure.go 76.82% 13 Missing and 6 partials ⚠️
context/context.go 0.00% 18 Missing ⚠️
api/mirror.go 0.00% 16 Missing ⚠️
files/public.go 47.61% 8 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1074       +/-   ##
===========================================
- Coverage   66.30%   52.10%   -14.21%     
===========================================
  Files         141       75       -66     
  Lines       15908    11491     -4417     
===========================================
- Hits        10548     5987     -4561     
- Misses       4610     4920      +310     
+ Partials      750      584      -166     


read_path() can read in binary, which the S3 tests don't support (simply
because they don't need it)...but it needs to be able to take the `mode`
argument anyway.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
None of the commands' output is ever treated as binary, so we can just
always decode it as text.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Before, a "partial" URL (either "localhost:port" or an endpoint URL
*without* the account name as the subdomain) would be specified, and the
full one would automatically be inferred. Although this is somewhat
nice, it means that the endpoint string doesn't match the official Azure
syntax:

https://docs.microsoft.com/en-us/azure/storage/common/storage-configure-connection-string

This also raises issues for the creation of functional tests for Azure,
as the code to determine the endpoint string needs to be duplicated
there as well.

Instead, it's just easiest to follow Azure's own standard, and then
sidestep the need for any custom logic in the functional tests.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
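For reference, connection strings in the official format linked above look like the following (the account name and key are placeholders; the second form shows a local Azurite-style endpoint as used by the functional tests):

```
DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net

DefaultEndpointsProtocol=http;AccountName=<account-name>;AccountKey=<account-key>;BlobEndpoint=http://127.0.0.1:10000/<account-name>;
```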
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
The result of `os.Stat` is rather tailored to local package pools, but the method was part of the generic PackagePool interface. This moves it to LocalPackagePool; the common use case of simply finding a file's size is delegated to a new, more generic PackagePool.Size() method.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Several sections of the code *required* a LocalPackagePool, but they
could still perform their operations with a standard PackagePool.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
This adds support for storing packages directly on Azure, with no truly
"local" (on-disk) repo used. The existing Azure PublishedStorage
implementation was refactored to move the shared code to a separate
context struct, which can then be re-used by the new PackagePool. In
addition, the files package's mockChecksumStorage was made public so
that it could be used in the Azure PackagePool tests as well.

Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
@neolynx added the "needs rebase" label Apr 20, 2024
@neolynx self-assigned this Apr 21, 2024
@neolynx added the "PR superseded" label and removed the "needs rebase" label Apr 21, 2024
neolynx (Member) commented Apr 21, 2024

Could you check if #1283 works for you?

Especially since a prefix was introduced in azure/public.go ...
