cirrus: use faster VM's for integration tests #22698

Luap99 · 2024-05-14T13:44:39Z

Use 4 core VM vompred to the standard 2 cores, integration tests scale
almost linear with extra cores, as such doubling the cores makes the
tests almost twice as fast. This brings the test time down to 15-17 min
in CI.

Does this PR introduce a user-facing change?

None

openshift-ci · 2024-05-14T13:44:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

packit-as-a-service · 2024-05-14T13:48:58Z

Ephemeral COPR build failed. @containers/packit-build please check.

packit-as-a-service · 2024-05-14T14:04:11Z

Cockpit tests failed for commit 79bd109. @martinpitt, @jelly, @mvollmer please check.

packit-as-a-service · 2024-05-14T14:06:52Z

Cockpit tests failed for commit f5f1f74. @martinpitt, @jelly, @mvollmer please check.

Luap99 · 2024-05-14T14:58:44Z

Timings with 4 core 4 GB

type	distro	user	DB	local	remote	container
int	rawhide	root		16:14	15:03
int	rawhide	rootless		16:15
int	fedora-40	root		15:04	16:12	15:21
int	fedora-40	rootless		15:22
int	fedora-39	root	boltdb	15:57	16:29	15:01
int	fedora-39	rootless	boltdb	15:26
int	debian-13	root		16:05	17:07
int	debian-13	rootless		15:02
sys	rawhide	root		39:08	25:51
sys	rawhide	rootless		39:00
sys	fedora-40-aarch64	root		32:06	22:14
sys	fedora-40	root		37:07	25:38
sys	fedora-40	rootless		41:12	25:08
sys	fedora-39	root	boltdb	40:46	27:28
sys	fedora-39	rootless	boltdb	39:48
sys	debian-13	root		36:28	30:08
sys	debian-13	rootless		39:33
bud	fedora-40	root		31:01	30:29
mach	fedora-40-aarch64	rootless		19:17
mach	fedora-40	rootless		15:03

int tests make great use of the extra cpu cores here 15-17 min vs 27-30 min on main right now

Luap99 · 2024-05-14T15:00:38Z

However it doesn't seem to help at all for system tests, which also makes sense as they run in sequence not in parallel so extra cores do not help

Use 4 core VM vompred to the standard 2 cores, integration tests scale almost linear with extra cores, as such doubling the cores makes the tests almost twice as fast. This brings the test time down to 15-17 min in CI. Signed-off-by: Paul Holzinger <pholzing@redhat.com>

Luap99 · 2024-05-16T11:34:09Z

I made the change to only effect the integration tasks as other tests does not seem to benifit from more cores and it would be a waste of money to use more cores there.

I leave it as draft until I get approval from @cevich.

edsantiago · 2024-05-16T12:32:16Z

The approach I've been working on (side branch) for speeding up system tests is:

invoke bats with --jobs 4 --no-parallelize-across-files. This enables parallelization within individual .bats files. However:
also need to, for start, export BATS_NO_PARALLELIZE_WITHIN_FILE=true within setup() for each individual file, then
go file by file, checking and fixing hardcoded image names ("test", "ctr", etc), and removing the NO_PARALLELIZE
in some cases like 001 which is almost entirely parallelizable except for one rmi test, just move the unparallelizable tests to a separate file
probably have to fiddle with basic_setup() and basic_teardown() to be less strict about stray containers/images
and then figure out what other problems come up, and if this is even worth it

I'm focusing on the local-registry thing today and probably next week, because I see that as higher ROI.

rhatdan · 2024-05-16T12:37:21Z

Does this cost us seriously more money? Or even Less? IE if E2E tests complete in X time for Y Dollars/time, is this less or greater then before and how much difference? If it ain't much we should make the change, or at least get Red Hat an estimate of additional cost.

Saving time on tests helps developers a great deal especially when we get near releases.

Luap99 · 2024-05-16T12:43:21Z

The approach I've been working on (side branch) for speeding up system tests is:

1. invoke bats with `--jobs 4 --no-parallelize-across-files`. This enables parallelization within individual `.bats` files. However:

2. also need to, for start, `export BATS_NO_PARALLELIZE_WITHIN_FILE=true` within `setup()` for each individual file, then

3. go file by file, checking and fixing hardcoded image names ("test", "ctr", etc), and removing the `NO_PARALLELIZE`

4. in some cases like `001` which is almost entirely parallelizable except for one `rmi` test, just move the unparallelizable tests to a separate file

5. probably have to fiddle with `basic_setup()` and `basic_teardown()` to be less strict about stray containers/images

6. and then figure out what other problems come up, and if this is even worth it

I'm focusing on the local-registry thing today and probably next week, because I see that as higher ROI.

Yes, agreed.

Note right now I focus only on e2e per the current Jira card (https://issues.redhat.com/browse/RUN-2107)
The system tests are the next step for next week and is assigned to me but I am very happy to collaborate with you on that one (https://issues.redhat.com/browse/RUN-2108)

Luap99 · 2024-05-16T12:47:31Z

Does this cost us seriously more money? Or even Less? IE if E2E tests complete in X time for Y Dollars/time, is this less or greater then before and how much difference? If it ain't much we should make the change, or at least get Red Hat an estimate of additional cost.

Saving time on tests helps developers a great deal especially when we get near releases.

I don't know how the pricing structure works, but assuming double the cores double the price we now only need half the time so it should be not significantly more expensive IMO. But that only goes for integration tests which this is about.

The pricing comment is about tests that do not scale at all see timings comment above. There is absolutely no point in running system tests on higher core count VM's right now as we would pay more for no time gain.

packit-as-a-service · 2024-05-16T12:55:15Z

Ephemeral COPR build failed. @containers/packit-build please check.

edsantiago · 2024-05-16T14:31:37Z

Latest change (47f01e8) LGTM

edsantiago · 2024-05-16T16:19:33Z

type	distro	user	DB	local	remote	container
int	rawhide	root		15:01	15:12
int	rawhide	rootless		15:25
int	fedora-40	root		16:24	15:08	15:35
int	fedora-40	rootless		14:43
int	fedora-39	root	boltdb	16:31	16:02	15:42
int	fedora-39	rootless	boltdb	16:25
int	debian-13	root		16:36	16:04
int	debian-13	rootless		13:43

int test timings are phenomenal.

/lgtm

rhatdan · 2024-05-16T18:31:33Z

@Luap99 Can you remove Draft so this will merge.

cevich · 2024-05-21T14:03:04Z

@Luap99 this LGTM.

I don't know how the pricing structure works, but assuming double the cores double the price

The time developers spend waiting for tests to complete is FAR more expensive than the VMs. Granted this is also hard to measure, since developers aren't likely just sitting on their hands waiting. In either case, there are cost-increases built-in to cloud-spend budgeting. It's incredibly rare that testing is reduced and/or cloud-costs go down magically. That's all to say, I'm not concerned at all about the cost increase by this PR.

openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels May 14, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2024

Luap99 force-pushed the CI-VM branch from f5f1f74 to 79bd109 Compare May 14, 2024 13:46

Luap99 force-pushed the CI-VM branch from 79bd109 to 47f01e8 Compare May 16, 2024 11:31

Luap99 changed the title ~~WIP: gce use faster VM's~~ cirrus: use faster VM's for integration tests May 16, 2024

openshift-ci bot added release-note-none and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels May 16, 2024

openshift-ci bot assigned edsantiago May 16, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 16, 2024

Luap99 marked this pull request as ready for review May 21, 2024 14:10

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 21, 2024

openshift-merge-bot bot merged commit d85d563 into containers:main May 21, 2024
89 of 91 checks passed

Luap99 deleted the CI-VM branch May 21, 2024 14:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cirrus: use faster VM's for integration tests #22698

cirrus: use faster VM's for integration tests #22698

Luap99 commented May 14, 2024 •

edited

openshift-ci bot commented May 14, 2024

packit-as-a-service bot commented May 14, 2024

packit-as-a-service bot commented May 14, 2024

packit-as-a-service bot commented May 14, 2024

Luap99 commented May 14, 2024

Luap99 commented May 14, 2024

Luap99 commented May 16, 2024

edsantiago commented May 16, 2024

rhatdan commented May 16, 2024

Luap99 commented May 16, 2024

Luap99 commented May 16, 2024

packit-as-a-service bot commented May 16, 2024

edsantiago commented May 16, 2024

edsantiago commented May 16, 2024

rhatdan commented May 16, 2024

cevich commented May 21, 2024

cirrus: use faster VM's for integration tests #22698

cirrus: use faster VM's for integration tests #22698

Conversation

Luap99 commented May 14, 2024 • edited

Does this PR introduce a user-facing change?

openshift-ci bot commented May 14, 2024

packit-as-a-service bot commented May 14, 2024

packit-as-a-service bot commented May 14, 2024

packit-as-a-service bot commented May 14, 2024

Luap99 commented May 14, 2024

Luap99 commented May 14, 2024

Luap99 commented May 16, 2024

edsantiago commented May 16, 2024

rhatdan commented May 16, 2024

Luap99 commented May 16, 2024

Luap99 commented May 16, 2024

packit-as-a-service bot commented May 16, 2024

edsantiago commented May 16, 2024

edsantiago commented May 16, 2024

rhatdan commented May 16, 2024

cevich commented May 21, 2024

Luap99 commented May 14, 2024 •

edited