Create cache for GitHub Actions #1213
I'm currently evaluating running actions on ubicloud and ran into an issue with the actions cache. Our bundler cache is ~650MB, and the download from the default cache location runs at 8-10 MB/s. The cache restore takes ~90s. A clean install takes ~95s, so there is very little benefit to caching at the moment.
I'm almost sold on using Ubicloud over alternatives such as https://github.com/runs-on/runs-on, but I'm missing this important caching feature, and it seems they have implemented it, or at least part of it. I hope Ubicloud implements it 🙏. If you can already share a rough timeline for this feature, that would be helpful.
Thanks for the support with GitHub Actions. I have a slightly different use case concerning caching. We use the bazel remote cache server to cache bazel builds and share them across CI runners. It would be beneficial for us to be able to:
If we know that the GitHub Runners are operating in the German region, we can deploy a VM with the bazel remote cache server in the same region.
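For context, pointing Bazel at such a cache server is a small `.bazelrc` change. A minimal sketch, assuming a placeholder address for the cache VM (not real infrastructure):

```
# .bazelrc sketch: use a remote cache server deployed in the same region
# as the runners. grpc://cache.internal:9092 is a placeholder address.
build --remote_cache=grpc://cache.internal:9092
# Also upload local build results so other runners can reuse them.
build --remote_upload_local_results=true
```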
(creating this issue on @enescakir's behalf)
Summary: We'd like to introduce caching for our GitHub Actions integration. We realize that there are 3 separate problems to solve, with some overlap in engineering effort. We need to decide the order in which we implement these caches.
Docker container registry
When you build a Docker image, you push it to a registry. To use a Docker image, you pull it from a registry. DockerHub is one of the most popular Docker image registries, largely due to its default status for Docker CLI.
For instance, when you run `docker pull ubicloud/ubicloud`, DockerHub is implicitly used. If you want to use a different registry, you need to specify it, like in `docker pull ghcr.io/ubicloud/ubicloud`.

Notable Docker registries include DockerHub (docker.io), GitHub (ghcr.io), AWS (ecr.aws), Azure (azurecr.io), and GCP (gcr.io).
(Side note: Since you need to add the registry prefix to Docker CLI commands, short domains with .io TLDs are popular for registry domains. We might consider getting a short domain in the future.)
These registries are typically located in the US, but our runners are currently based in Europe. We ran into performance issues for two customers and found the slow intercontinental communication to be the root cause. As the image size increases, both push and pull times also increase.
Some customers use the final image as a cache for testing and benchmarking. These images are typically pushed once and pulled multiple times; three customers have told us this fits their use case.
@enescakir created a workflow to compare the pull/push performance of DockerHub and GitHub Container Registry. These tests are not highly isolated and are merely conducted to gain a general understanding. As expected, Ubicloud takes longer.
https://github.com/enescakir/large-image-benchmark/actions/runs/7561091589
https://github.com/enescakir/large-image-benchmark/actions/runs/7561545092
To solve this problem, we'd need to create a registry service. There are several alternatives for constructing such a service. We need to investigate them in more detail.
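One alternative worth evaluating is the open-source Docker registry (the `registry:2` image) run as a pull-through cache near the runners. A rough sketch, with placeholder hostnames and ports rather than real infrastructure:

```shell
# Sketch: run the open-source registry as a pull-through cache for DockerHub.
# The port and mirror hostname below are placeholders.
docker run -d -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  --name registry-mirror registry:2

# Runners would then point at the mirror, e.g. via /etc/docker/daemon.json:
#   { "registry-mirrors": ["http://<mirror-host>:5000"] }
```

Note that a pull-through cache only accelerates pulls; it would not help with the push side, so it covers just part of the problem above.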
Once we decide to build such a service, we'll need to answer several questions:
Docker Build Cache (Layer caching)
Docker employs a layered architecture. Each operation in the Dockerfile generates a new layer atop the previous one. If the input to an earlier operation remains unchanged, Docker reuses the cached layer. Consequently, when you execute `docker build` on your local machine twice, the second run is faster because identical layers are cached.

However, this cache cannot be used directly in CI/CD, because CI/CD providers provision a new runner for each build. To address this, `docker buildx` offers external layer caching. It provides `--cache-to` and `--cache-from` options and supports various backends.

ubicloud/.github/workflows/build.yml, line 66 at 4c29c74
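As an illustration of what the S3 backend would look like in practice, here is a sketch using buildx's (experimental) `type=s3` cache backend against a MinIO-style endpoint. The endpoint URL, bucket, region, and image names are all placeholders:

```shell
# Sketch: external layer caching via buildx's S3 cache backend.
# endpoint_url, bucket, region, and tag are placeholders, not real infrastructure.
docker buildx build \
  --cache-to   "type=s3,region=eu-central-1,bucket=buildx-cache,name=myapp,endpoint_url=https://minio.example.com,mode=max" \
  --cache-from "type=s3,region=eu-central-1,bucket=buildx-cache,name=myapp,endpoint_url=https://minio.example.com" \
  -t myapp:ci .
```

`mode=max` exports layers for all build stages, not just the final one, which tends to matter for multi-stage Dockerfiles.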
Each backend requires a unique implementation. The choice cannot be determined without benchmarking.
The S3 backend is the easiest to test, given our access to MinIO. We have one customer who requested a solution for Docker layer caching.
GitHub Actions Cache
GitHub Actions includes a cache feature that allows file sharing between runners. This can be accomplished using actions/cache. Furthermore, environment setup actions like actions/setup-ruby and actions/setup-go come with a built-in dependency caching feature, which internally utilizes "actions/cache".
The cache has a 7-day expiration and a 10GB limit. https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
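For reference, a typical `actions/cache` step looks like the following; the paths and key pattern are illustrative (here, a bundler cache like the one mentioned earlier in this thread):

```yaml
# Illustrative actions/cache usage for a bundler cache
- uses: actions/cache@v4
  with:
    path: vendor/bundle
    key: ${{ runner.os }}-gems-${{ hashFiles('**/Gemfile.lock') }}
    restore-keys: |
      ${{ runner.os }}-gems-
```

A fork of this action would keep the same interface but redirect the upload/download calls to our storage service.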
To implement this feature, we'd need to fork "actions/cache" and refactor it to use our internal storage service instead of GitHub's.
We'd then need to fork 8 environment setup actions and refactor them to use "ubicloud/cache" instead of the default one. There are inherent maintenance costs: we'd have to track all of these upstream repositories for changes. Two customers have asked for this feature.