Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 6435: System Test Logging #6445

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

co-jo
Copy link
Member

@co-jo co-jo commented Nov 23, 2021

Change log description

  • Adds log to print the set of active pods for some service after a scale event.
  • Makes waitUntilPodIsRunning wait for both the expected amount of active pods be available, as well as ensuring no pods remain in a terminated state, i.e. are properly removed.
  • Allow one to specify the number of tries to attempt for waitUntilPodIsRunning.

Purpose of the change

Fixes #6435

What the code does

See #6435.

How to verify it

Run system tests.

co-jo and others added 10 commits November 9, 2021 15:45
Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
…d up.

Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
@codecov
Copy link

codecov bot commented Nov 29, 2021

Codecov Report

Merging #6445 (5150c7d) into master (2920577) will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #6445      +/-   ##
============================================
- Coverage     86.26%   86.26%   -0.01%     
+ Complexity    15434    15431       -3     
============================================
  Files          1008     1008              
  Lines         57760    57760              
  Branches       5885     5885              
============================================
- Hits          49828    49826       -2     
  Misses         4870     4870              
- Partials       3062     3064       +2     
Impacted Files Coverage Δ
...hared/security/crypto/StrongPasswordProcessor.java 87.50% <0.00%> (-6.25%) ⬇️
...ain/java/io/pravega/client/stream/impl/Pinger.java 94.82% <0.00%> (-3.45%) ⬇️
...egmentstore/server/containers/MetadataCleaner.java 87.32% <0.00%> (-2.82%) ⬇️
...io/pravega/common/concurrent/DelayedProcessor.java 89.02% <0.00%> (-1.22%) ⬇️
...avega/controller/server/ControllerServiceMain.java 80.76% <0.00%> (-0.97%) ⬇️
...o/pravega/client/stream/impl/ReaderGroupState.java 93.30% <0.00%> (-0.91%) ⬇️
...mentstore/server/host/stat/AutoScaleProcessor.java 86.95% <0.00%> (-0.73%) ⬇️
...tore/server/containers/StreamSegmentContainer.java 90.00% <0.00%> (-0.45%) ⬇️
.../segmentstore/server/writer/SegmentAggregator.java 80.17% <0.00%> (-0.29%) ⬇️
...main/java/io/pravega/storage/hdfs/HDFSStorage.java 73.33% <0.00%> (+0.31%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2920577...5150c7d. Read the comment docs.

Copy link
Contributor

@RaulGracia RaulGracia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. Please, let us know the result of multiple runs of system tests and post here the output of the new log messages to demonstrate that it meets the expected goal.

long runCount = pods.getItems().stream().map(pod -> pod.getStatus()).filter(this::isPodRunning).count();
long totalCount = pods.getItems().size();

if (runCount == totalCount && runCount == expectedPodCount) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be the same as saying totalCount == expectedPodCount?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the totalCount will not always be the same value as the runCount. If we are expecting 3 pods to be running, but only have two running pods and another pod that has been terminated (another one has yet to be spun up, and the terminated pod yet to be cleaned up) we will still have a totalCount of 3, an expectedPodCount of 3 and a runCount of 2.

.stream()
.map(pod -> pod.getMetadata().getName())
.collect(Collectors.toList());
log.info("Set of running pods after scale event: {}", names);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the output of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prints the names of the pods that belong to the current set of running pods for some resource (i.e. Pravega or BookKeeper cluster). I found this to be useful information to have readily available to make parsing through the logs (our fluent-bit ones) easier. It helps to know what the set of active pods were during some given system test.

Signed-off-by: Colin Hryniowski <colin.hryniowski@dell.com>
Copy link
Contributor

@shrids shrids left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment regarding the log line.

// Break out of loop.
retriesRemaining.set(0);
} else if (retriesRemaining.get() == 0) {
log.error("Retries exhausted waiting for pod(s): <{}={}>", labelName, labelValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we print the expected count and the current count here in the same log line ? This will simplify debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scaling System Test Logging Improvements.
3 participants