
nodetool status error 500 event when running on a decommissioned node of GrowShrinkCluster #11917

@yarongilor

Description


Packages

Scylla version: 2025.4.0~dev-20250827.01bb7b629ad9 with build-id 6322d8df63c89dc033a079304733fd6702ed869f

Kernel Version: 6.11.0-1018-azure

Issue description

The GrowShrinkCluster nemesis shrank the cluster by 3 nodes, decommissioning node-6, node-7, and node-8; node-7 is the one of interest here:

< t:2025-08-29 02:32:15,842 f:nemesis.py      l:4276 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: Start shrink cluster by 3 nodes
< t:2025-08-29 02:32:15,861 f:nemesis.py      l:400  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: GrowShrinkCluster-f6488d8e: target node selected by allocator - Node longevity-10gb-3h-master-db-node-a2883744-eastus-6 [None | 10.0.0.10] (rack: RACK0)
< t:2025-08-29 02:32:16,057 f:nemesis.py      l:400  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: GrowShrinkCluster-f6488d8e: target node selected by allocator - Node longevity-10gb-3h-master-db-node-a2883744-eastus-8 [None | 10.0.0.15] (rack: RACK1)
< t:2025-08-29 02:32:16,247 f:nemesis.py      l:400  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: GrowShrinkCluster-f6488d8e: target node selected by allocator - Node longevity-10gb-3h-master-db-node-a2883744-eastus-7 [None | 10.0.0.14] (rack: RACK2)
< t:2025-08-29 02:32:23,685 f:remote_base.py  l:598  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.14>: Running command "echo "/usr/bin/nodetool  decommission " > /tmp/remoter_cmd_f36ac5a4-1889-40d4-9baa-487154f31e8d.sh"...
< t:2025-08-29 02:32:23,696 f:remote_base.py  l:598  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.10>: Running command "echo "/usr/bin/nodetool  decommission " > /tmp/remoter_cmd_f1f87ea7-ddc3-446e-b982-f369e066e152.sh"...
< t:2025-08-29 02:32:23,741 f:base.py         l:147  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.14>: Command "echo "/usr/bin/nodetool  decommission " > /tmp/remoter_cmd_f36ac5a4-1889-40d4-9baa-487154f31e8d.sh" finished with status 0
< t:2025-08-29 02:32:23,884 f:remote_base.py  l:598  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.15>: Running command "echo "/usr/bin/nodetool  decommission " > /tmp/remoter_cmd_acb2ceb0-58c0-4352-bb4e-f20a7bc3bcfe.sh"...
< t:2025-08-29 02:32:24,242 f:base.py         l:147  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.10>: Command "echo "/usr/bin/nodetool  decommission " > /tmp/remoter_cmd_f1f87ea7-ddc3-446e-b982-f369e066e152.sh" finished with status 0
< t:2025-08-29 02:32:24,466 f:base.py         l:147  c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.15>: Command "echo "/usr/bin/nodetool  decommission " > /tmp/remoter_cmd_acb2ceb0-58c0-4352-bb4e-f20a7bc3bcfe.sh" finished with status 0
< t:2025-08-29 02:33:07,641 f:cluster.py      l:2733 c:sdcm.cluster_azure   p:DEBUG > Node longevity-10gb-3h-master-db-node-a2883744-eastus-8 [None | 10.0.0.15] (rack: RACK1): Command '/usr/bin/nodetool  decommission ' duration -> 42.854328472001725 s
< t:2025-08-29 02:33:53,291 f:cluster.py      l:2733 c:sdcm.cluster_azure   p:DEBUG > Node longevity-10gb-3h-master-db-node-a2883744-eastus-7 [None | 10.0.0.14] (rack: RACK2): Command '/usr/bin/nodetool  decommission ' duration -> 89.28596406500037 s
< t:2025-08-29 02:34:05,254 f:cluster.py      l:2733 c:sdcm.cluster_azure   p:DEBUG > Node longevity-10gb-3h-master-db-node-a2883744-eastus-6 [None | 10.0.0.10] (rack: RACK0): Command '/usr/bin/nodetool  decommission ' duration -> 100.68179145699833 s
< t:2025-08-29 02:35:45,574 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-29T02:32:24.156+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-8     !INFO | scylla-cluster-tests[8845]: executing nodetool /usr/bin/nodetool  decommission  on longevity-10gb-3h-master-db-node-a2883744-eastus-8 [10.0.0.15]
< t:2025-08-29 02:35:56,355 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-29T02:32:23.920+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7     !INFO | scylla-cluster-tests[10438]: executing nodetool /usr/bin/nodetool  decommission  on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14]
< t:2025-08-29 02:36:07,080 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-29T02:32:23.953+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-6     !INFO | scylla-cluster-tests[11789]: executing nodetool /usr/bin/nodetool  decommission  on longevity-10gb-3h-master-db-node-a2883744-eastus-6 [10.0.0.10]
< t:2025-08-29 02:41:56,832 f:nemesis.py      l:4288 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: Cluster shrink finished. Current number of data nodes 6

node-7 decommission ended at 02:33:53:

< t:2025-08-29 02:47:28,732 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-29T02:33:53.601+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7     !INFO | scylla-cluster-tests[11372]: nodetool /usr/bin/nodetool  decommission  completed after 89.61s on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14]

The nodetool status error happened on node-7 at 02:34:06, about 13 seconds after its own decommission had completed:

< t:2025-08-29 02:47:28,862 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-29T02:34:06.852+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7     !INFO | scylla-cluster-tests[11394]: executing nodetool /usr/bin/nodetool  status  on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14]
< t:2025-08-29 02:47:28,864 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > 2025-08-29T02:34:07.852+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7     !INFO | scylla-cluster-tests[11401]: nodetool /usr/bin/nodetool  status  failed after 0.88s on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14] - Error: Encountered a bad command exit code!
< t:2025-08-29 02:47:28,879 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > Command: '/usr/bin/nodetool  status '
< t:2025-08-29 02:47:28,879 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > error executing GET request to http://localhost:10000/storage_service/host_id with parameters {}: remote replied with status code 500 Internal Server Error:
< t:2025-08-29 02:47:28,880 f:db_log_reader.py l:123  c:sdcm.db_log_reader   p:DEBUG > std::runtime_error (The gossiper is not ready yet)
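
For context, the error output shows that nodetool here is backed by Scylla's REST API: the failing call is a GET to `http://localhost:10000/storage_service/host_id`, which answers 500 because gossip has already been shut down on the freshly decommissioned node-7. A minimal reproduction sketch of the same probe (not part of the SCT code; it only assumes the REST port and path shown in the log above and Python's `requests` library):

```python
# Sketch: query the same REST endpoint that the failing `nodetool status`
# call used, on the node under test (port 10000 and the path are taken
# from the error output above).
import requests

resp = requests.get("http://localhost:10000/storage_service/host_id", timeout=10)
if resp.ok:
    print(resp.json())  # host-ID mapping as returned by the API
else:
    # On the just-decommissioned node-7 this answers with:
    #   500 Internal Server Error - std::runtime_error (The gossiper is not ready yet)
    print(f"{resp.status_code}: {resp.text}")
```

Presumably the same probe still succeeds on the six nodes that remained in the cluster; the 500 appears because the status command was issued on the node that had just left the ring.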

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with which this issue can be reproduced.

Installation details

Cluster size: 6 nodes (Standard_L8s_v3)

Scylla Nodes used in this run:

- longevity-10gb-3h-master-db-node-a2883744-eastus-9 ( | 10.0.0.16) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-8 ( | 10.0.0.15) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-7 ( | 10.0.0.14) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-6 ( | 10.0.0.10) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-5 ( | 10.0.0.9) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-4 ( | 10.0.0.8) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-3 ( | 10.0.0.7) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-2 ( | 10.0.0.6) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-12 ( | 10.0.0.19) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-11 ( | 10.0.0.18) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-10 ( | 10.0.0.17) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-1 ( | 10.0.0.5) (shards: 7)

OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-2025.4.0-dev-x86_64-2025-08-29T01-58-29 (azure: N/A)

Test: longevity-10gb-3h-azure-test
Test id: a2883744-7196-4b5f-a8dd-ea027b252f5e
Test name: scylla-master/longevity/longevity-10gb-3h-azure-test

Test method: `longevity_test.LongevityTest.test_custom_time`

Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor a2883744-7196-4b5f-a8dd-ea027b252f5e
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs a2883744-7196-4b5f-a8dd-ea027b252f5e

Logs:

- **[longevity-10gb-3h-master-db-node-a2883744-eastus-2](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-2/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-1](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-1/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-3](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-3/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-8](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-8/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-7](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-7/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-6](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-6/download)**
- **[db-cluster-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/db-cluster-a2883744.tar.zst/download)**
- **[schema-logs-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/schema-logs-a2883744.tar.zst/download)**
- **[sct-runner-events-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/sct-runner-events-a2883744.tar.zst/download)**
- **[sct-a2883744.log.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/sct-a2883744.log.tar.zst/download)**
- **[loader-set-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/loader-set-a2883744.tar.zst/download)**
- **[monitor-set-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/monitor-set-a2883744.tar.zst/download)**
- **[parallel-timelines-report-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/parallel-timelines-report-a2883744.tar.zst/download)**
- **[builder-a2883744.log.tar.gz](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/builder-a2883744.log.tar.gz/download)**

Jenkins job URL
Argus
