
streamingccl: mark cutback retention jobs as successful #123934

Merged
merged 1 commit into from May 13, 2024

Conversation


@dt dt commented May 10, 2024

Previously we started creating a stream producer job in the destination cluster when we completed replication cutover, to preserve the history as of that cutover time in case another cluster would subsequently want to start replicating as of that time, e.g. reversing the direction of replication, or in case the promoted cluster would want to revert to the cutover time as part of a demotion back to a standby.

However, this placeholder job is, by design, never actually used by replication -- it exists only to keep the option open for some other replication job to be started -- and thus is never heartbeated or marked as no longer needed due to successful completion of replication, causing it to be marked as FAILED when it expires.

This changes the initial status so that it is created already indicating that replication succeeded. Thus when it expires, it is marked as successful instead of failed, avoiding the spurious 'failures' that one observes in the job system surfaces.

Release note (enterprise change): History Retention jobs created at the completion of cluster replication no longer erroneously indicate they failed when they expire.

Epic: none.

@dt dt requested review from stevendanna and msbutler May 10, 2024 12:33
@dt dt requested a review from a team as a code owner May 10, 2024 12:33

@msbutler msbutler added the backport-24.1.x Flags PRs that need to be backported to 24.1. label May 13, 2024

dt commented May 13, 2024

TFTR!

bors r+

@craig craig bot merged commit c179d3f into cockroachdb:master May 13, 2024
22 checks passed
@dt dt deleted the pcr-cutback-sucess branch May 13, 2024 17:35

dt commented May 13, 2024

blathers backport 24.1


blathers-crl bot commented May 13, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating backport branch refs/heads/blathers/backport-release-24.1-123934: POST https://api.github.com/repos/cockroachdb/cockroach/git/refs: 422 Reference already exists []

Backport to branch 24.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

msbutler added a commit to msbutler/cockroach that referenced this pull request May 14, 2024
As of cockroachdb#123934, the producer job succeeds instead of fails. This patch teaches
some test infra about this.

Fixes cockroachdb#124139
Fixes cockroachdb#124138
Fixes cockroachdb#124151
Fixes cockroachdb#124137

Release note: none
craig bot pushed a commit that referenced this pull request May 14, 2024
124162: streamingccl: deflake a few tests r=msbutler a=msbutler

As of #123934, the producer job succeeds instead of fails. This patch teaches some test infra about this.

Fixes #124139
Fixes #124138
Fixes #124151
Fixes #124137

Release note: none

Co-authored-by: Michael Butler <butler@cockroachlabs.com>