roachtest: schemachange/leasing-benchmark failed [azure; n2 failed to start due to connection refused error] #123947

Open
cockroach-teamcity opened this issue May 10, 2024 · 6 comments
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-testeng TestEng Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented May 10, 2024

roachtest.schemachange/leasing-benchmark failed with artifacts on master @ 16d41751607b92234351c1ab27053c3875a4f2b7:

(test_runner.go:1237).runTest: test timed out (2h0m0s)
test artifacts and logs in: /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for azure clusters

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-38620

@cockroach-teamcity cockroach-teamcity added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels May 10, 2024
@rafiss
Collaborator

rafiss commented May 10, 2024

It appears that n2 never started up, due to connectivity issues in the cluster:

W240510 13:59:49.014602 15 gossip/client.go:121 ⋮ [T1,Vsystem,n2] 48  failed to start gossip client to ‹40.76.187.244:26257›: initial connection heartbeat failed: grpc: ‹connection error: desc = "transport: error while dialing: dial tcp 10.2.0.10:26257: connect: connection refused"› [code 2/Unknown]
E240510 13:59:49.014641 16 2@rpc/peer.go:598 ⋮ [T1,Vsystem,n2,rnode=?,raddr=‹40.76.187.244:26257›,class=system,rpc] 49  failed connection attempt‹ (last connected 0s ago)›: grpc: ‹connection error: desc = "transport: error while dialing: dial tcp 10.2.0.10:26257: connect: connection refused"› [code 2/Unknown]
E240510 13:59:50.010528 188 2@rpc/peer.go:598 ⋮ [T1,Vsystem,n2,rnode=?,raddr=‹40.76.187.244:26257›,class=system,rpc] 50  failed connection attempt‹ (last connected 996ms ago)›: grpc: ‹connection error: desc = "transport: error while dialing: dial tcp 10.2.0.10:26257: connect: connection refused"› [code 2/Unknown]
I240510 13:59:51.877241 273 kv/kvserver/liveness/liveness.go:648 ⋮ [T1,Vsystem,n2,liveness-hb] 51  unable to get liveness record from KV: unable to get liveness: aborted in DistSender: result is ambiguous: context deadline exceeded
I240510 13:59:52.875722 339 gossip/client.go:127 ⋮ [T1,Vsystem,n2] 52  started gossip client to n0 (‹40.76.187.244:26257›)
I240510 13:59:52.890874 143 1@server/server.go:1791 ⋮ [T1,Vsystem,n2] 53  node connected via gossip
I240510 13:59:52.891410 90 kv/kvserver/stores.go:283 ⋮ [T1,Vsystem,n2] 54  wrote 1 node addresses to persistent storage
I240510 13:59:52.891555 339 gossip/client.go:136 ⋮ [T1,Vsystem,n2] 55  closing client to n1 (‹40.76.187.244:26257›): recv msg error: grpc: ‹duplicate connection from node at 10.2.0.10:26257› [code 2/Unknown]
E240510 13:59:53.162512 315 2@rpc/peer.go:577 ⋮ [T1,Vsystem,n2,rnode=?,raddr=‹40.76.187.244:26257›,class=system,rpc] 56  disconnected (was healthy for 1.016s): grpc: ‹initial connection heartbeat failed: grpc: client requested node ID 2 doesn't match server node ID 3 [code 2/Unknown]› [code 2/Unknown]
I240510 13:59:54.878328 273 kv/kvserver/liveness/liveness.go:648 ⋮ [T1,Vsystem,n2,liveness-hb] 57  unable to get liveness record from KV: unable to get liveness: aborted in DistSender: result is ambiguous: context deadline exceeded

I'll move this to TestEng, in case this is something worth investigating in the new Azure infra. Otherwise, feel free to close this as a non-actionable flake.
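
For anyone triaging a similar run, the symptoms in the excerpt above (connection refused, a node ID mismatch, unavailable liveness records, a duplicate gossip connection) can be tallied from a saved node log with a small script. This is a minimal sketch, not part of roachtest or CockroachDB: the symptom names, the `summarize` helper, and the `SAMPLE` list are hypothetical, and the matched substrings are copied from the log lines quoted above.

```python
import re
from collections import Counter

# Hypothetical symptom labels mapped to substrings copied from the log excerpt.
SYMPTOMS = {
    "connection_refused": "connect: connection refused",
    "node_id_mismatch": "doesn't match server node ID",
    "liveness_unavailable": "unable to get liveness record from KV",
    "duplicate_connection": "duplicate connection from node",
}

# Captures the requested and actual node IDs from the heartbeat error.
NODE_ID_MISMATCH = re.compile(
    r"client requested node ID (\d+) doesn't match server node ID (\d+)"
)


def summarize(lines):
    """Count known symptoms and collect (requested, actual) node ID pairs."""
    counts = Counter()
    mismatches = []
    for line in lines:
        for name, needle in SYMPTOMS.items():
            if needle in line:
                counts[name] += 1
        match = NODE_ID_MISMATCH.search(line)
        if match:
            mismatches.append((int(match.group(1)), int(match.group(2))))
    return counts, mismatches


# Message fragments taken from the excerpt above (log prefixes omitted).
SAMPLE = [
    "dial tcp 10.2.0.10:26257: connect: connection refused",
    "unable to get liveness record from KV: unable to get liveness",
    "duplicate connection from node at 10.2.0.10:26257",
    "client requested node ID 2 doesn't match server node ID 3",
]

if __name__ == "__main__":
    counts, mismatches = summarize(SAMPLE)
    for name, count in sorted(counts.items()):
        print(f"{name}: {count}")
    for requested, actual in mismatches:
        print(f"requested node ID {requested}, server reported {actual}")
```

To run it against a real log, pass an open file handle to `summarize` instead of `SAMPLE`; the location of the node log inside the artifacts directory is not shown in this thread, so that path is left to the reader.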

@rafiss rafiss changed the title roachtest: schemachange/leasing-benchmark failed roachtest: schemachange/leasing-benchmark failed [azure; n2 failed to start due to connection refused error] May 10, 2024
@rafiss rafiss added T-testeng TestEng Team and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels May 10, 2024

blathers-crl bot commented May 10, 2024

cc @cockroachdb/test-eng

@blathers-crl blathers-crl bot added this to Triage in Test Engineering May 10, 2024
@cockroach-teamcity
Member Author

roachtest.schemachange/leasing-benchmark failed with artifacts on master @ 4c2e7761acd050aaee565443932b6b0eca55620b:

(test_runner.go:1237).runTest: test timed out (2h0m0s)
test artifacts and logs in: /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.schemachange/leasing-benchmark failed with artifacts on master @ 4cc0bfcc14771331fea57de01e1ea78b07393f3d:

(test_runner.go:1237).runTest: test timed out (2h0m0s)
test artifacts and logs in: /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.schemachange/leasing-benchmark failed with artifacts on master @ 6300c3c3367ad46ac48bf24915cf0d73cae446a0:

(test_runner.go:1243).runTest: test timed out (2h0m0s)
test artifacts and logs in: /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

@cockroach-teamcity
Member Author

roachtest.schemachange/leasing-benchmark failed with artifacts on master @ d146ecff6f687e438706cf63591cafca60cc116d:

(test_runner.go:1253).runTest: test timed out (2h0m0s)
test artifacts and logs in: /artifacts/schemachange/leasing-benchmark/cpu_arch=arm64/run_1

Parameters:

  • ROACHTEST_arch=arm64
  • ROACHTEST_cloud=azure
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_fs=ext4
  • ROACHTEST_localSSD=true
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0

Same failure on other branches

