Packages
Scylla version: 2025.4.0~dev-20250827.01bb7b629ad9 with build-id 6322d8df63c89dc033a079304733fd6702ed869f
Kernel Version: 6.11.0-1018-azure
Issue description
- This issue is a regression.
- It is unknown if this issue is a regression.
A follow-up of scylladb#24670 (comment): "command: nodetool status failed with 500 internal after binary and gossip was disabled".
The GrowShrinkCluster nemesis shrinks its newly added nodes.
node-7 finished its decommission right as the nodetool status command ran.
So this is undefined Scylla behavior on a decommissioned node, and the resulting error should be filtered out; a sketch of such a filter follows.
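A minimal sketch of the kind of filter this could use (illustrative only; `should_ignore_nodetool_failure` and its arguments are hypothetical, not the actual scylla-cluster-tests API): ignore a nodetool failure only when the target node has already been decommissioned and the error matches the 500 / gossiper-not-ready pattern seen in this run.

```python
import re

# Error signature taken from the log below; the matching logic is illustrative only.
GOSSIPER_NOT_READY = re.compile(
    r"500 Internal Server Error|The gossiper is not ready yet"
)


def should_ignore_nodetool_failure(node_decommissioned: bool, stderr: str) -> bool:
    """Return True when a nodetool failure can safely be filtered out.

    A node that has just completed `nodetool decommission` shuts down its
    gossiper, so any nodetool command still routed to it may fail with
    "remote replied with status code 500 Internal Server Error:
    std::runtime_error (The gossiper is not ready yet)". Such a failure
    says nothing about cluster health, so the nemesis can skip reporting it.
    """
    return node_decommissioned and bool(GOSSIPER_NOT_READY.search(stderr))


if __name__ == "__main__":
    stderr = (
        "error executing GET request to "
        "http://localhost:10000/storage_service/host_id with parameters {}: "
        "remote replied with status code 500 Internal Server Error: "
        "std::runtime_error (The gossiper is not ready yet)"
    )
    # The failure on decommissioned node-7 would be filtered; the same error
    # on a live node would still be raised.
    assert should_ignore_nodetool_failure(True, stderr)
    assert not should_ignore_nodetool_failure(False, stderr)
```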
The GrowShrinkCluster nemesis decommissioned node-7:
```
< t:2025-08-29 02:32:15,842 f:nemesis.py l:4276 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Start shrink cluster by 3 nodes
< t:2025-08-29 02:32:15,861 f:nemesis.py l:400 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: GrowShrinkCluster-f6488d8e: target node selected by allocator - Node longevity-10gb-3h-master-db-node-a2883744-eastus-6 [None | 10.0.0.10] (rack: RACK0)
< t:2025-08-29 02:32:16,057 f:nemesis.py l:400 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: GrowShrinkCluster-f6488d8e: target node selected by allocator - Node longevity-10gb-3h-master-db-node-a2883744-eastus-8 [None | 10.0.0.15] (rack: RACK1)
< t:2025-08-29 02:32:16,247 f:nemesis.py l:400 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: GrowShrinkCluster-f6488d8e: target node selected by allocator - Node longevity-10gb-3h-master-db-node-a2883744-eastus-7 [None | 10.0.0.14] (rack: RACK2)
< t:2025-08-29 02:32:23,685 f:remote_base.py l:598 c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.14>: Running command "echo "/usr/bin/nodetool decommission " > /tmp/remoter_cmd_f36ac5a4-1889-40d4-9baa-487154f31e8d.sh"...
< t:2025-08-29 02:32:23,696 f:remote_base.py l:598 c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.10>: Running command "echo "/usr/bin/nodetool decommission " > /tmp/remoter_cmd_f1f87ea7-ddc3-446e-b982-f369e066e152.sh"...
< t:2025-08-29 02:32:23,741 f:base.py l:147 c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.14>: Command "echo "/usr/bin/nodetool decommission " > /tmp/remoter_cmd_f36ac5a4-1889-40d4-9baa-487154f31e8d.sh" finished with status 0
< t:2025-08-29 02:32:23,884 f:remote_base.py l:598 c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.15>: Running command "echo "/usr/bin/nodetool decommission " > /tmp/remoter_cmd_acb2ceb0-58c0-4352-bb4e-f20a7bc3bcfe.sh"...
< t:2025-08-29 02:32:24,242 f:base.py l:147 c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.10>: Command "echo "/usr/bin/nodetool decommission " > /tmp/remoter_cmd_f1f87ea7-ddc3-446e-b982-f369e066e152.sh" finished with status 0
< t:2025-08-29 02:32:24,466 f:base.py l:147 c:RemoteLibSSH2CmdRunner p:DEBUG > <10.0.0.15>: Command "echo "/usr/bin/nodetool decommission " > /tmp/remoter_cmd_acb2ceb0-58c0-4352-bb4e-f20a7bc3bcfe.sh" finished with status 0
< t:2025-08-29 02:33:07,641 f:cluster.py l:2733 c:sdcm.cluster_azure p:DEBUG > Node longevity-10gb-3h-master-db-node-a2883744-eastus-8 [None | 10.0.0.15] (rack: RACK1): Command '/usr/bin/nodetool decommission ' duration -> 42.854328472001725 s
< t:2025-08-29 02:33:53,291 f:cluster.py l:2733 c:sdcm.cluster_azure p:DEBUG > Node longevity-10gb-3h-master-db-node-a2883744-eastus-7 [None | 10.0.0.14] (rack: RACK2): Command '/usr/bin/nodetool decommission ' duration -> 89.28596406500037 s
< t:2025-08-29 02:34:05,254 f:cluster.py l:2733 c:sdcm.cluster_azure p:DEBUG > Node longevity-10gb-3h-master-db-node-a2883744-eastus-6 [None | 10.0.0.10] (rack: RACK0): Command '/usr/bin/nodetool decommission ' duration -> 100.68179145699833 s
< t:2025-08-29 02:35:45,574 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > 2025-08-29T02:32:24.156+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-8 !INFO | scylla-cluster-tests[8845]: executing nodetool /usr/bin/nodetool decommission on longevity-10gb-3h-master-db-node-a2883744-eastus-8 [10.0.0.15]
< t:2025-08-29 02:35:56,355 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > 2025-08-29T02:32:23.920+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7 !INFO | scylla-cluster-tests[10438]: executing nodetool /usr/bin/nodetool decommission on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14]
< t:2025-08-29 02:36:07,080 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > 2025-08-29T02:32:23.953+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-6 !INFO | scylla-cluster-tests[11789]: executing nodetool /usr/bin/nodetool decommission on longevity-10gb-3h-master-db-node-a2883744-eastus-6 [10.0.0.10]
< t:2025-08-29 02:41:56,832 f:nemesis.py l:4288 c:sdcm.nemesis p:INFO > sdcm.nemesis.SisyphusMonkey: Cluster shrink finished. Current number of data nodes 6
```
node-7's decommission ended at 02:33:53:
```
< t:2025-08-29 02:47:28,732 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > 2025-08-29T02:33:53.601+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7 !INFO | scylla-cluster-tests[11372]: nodetool /usr/bin/nodetool decommission completed after 89.61s on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14]
```
The nodetool status error happened at 02:34:06:
```
< t:2025-08-29 02:47:28,862 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > 2025-08-29T02:34:06.852+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7 !INFO | scylla-cluster-tests[11394]: executing nodetool /usr/bin/nodetool status on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14]
< t:2025-08-29 02:47:28,864 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > 2025-08-29T02:34:07.852+00:00 longevity-10gb-3h-master-db-node-a2883744-eastus-7 !INFO | scylla-cluster-tests[11401]: nodetool /usr/bin/nodetool status failed after 0.88s on longevity-10gb-3h-master-db-node-a2883744-eastus-7 [10.0.0.14] - Error: Encountered a bad command exit code!
< t:2025-08-29 02:47:28,879 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > Command: '/usr/bin/nodetool status '
< t:2025-08-29 02:47:28,879 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > error executing GET request to http://localhost:10000/storage_service/host_id with parameters {}: remote replied with status code 500 Internal Server Error:
< t:2025-08-29 02:47:28,880 f:db_log_reader.py l:123 c:sdcm.db_log_reader p:DEBUG > std::runtime_error (The gossiper is not ready yet)
```
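For context, the 500 comes from the Scylla REST API request that `nodetool status` issues under the hood; the URL below is taken verbatim from the log above. A minimal sketch for reproducing the failure directly on the decommissioned node, assuming the REST API is still listening on localhost:10000 and relays the error text in the response body:

```python
import requests  # third-party HTTP client, used here only for illustration

# The same endpoint nodetool status queried, per the log above.
URL = "http://localhost:10000/storage_service/host_id"

resp = requests.get(URL, timeout=10)
if resp.status_code == 500 and "gossiper is not ready" in resp.text:
    # Expected on a node whose gossiper was stopped by decommission;
    # nodetool surfaces this as "Encountered a bad command exit code!".
    print("gossiper already shut down:", resp.text.strip())
else:
    resp.raise_for_status()
    print("host_id:", resp.text)
```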
Impact
Describe the impact this issue causes to the user.
How frequently does it reproduce?
Describe how frequently this issue can be reproduced.
Installation details
Cluster size: 6 nodes (Standard_L8s_v3)
Scylla Nodes used in this run:
- longevity-10gb-3h-master-db-node-a2883744-eastus-9 ( | 10.0.0.16) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-8 ( | 10.0.0.15) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-7 ( | 10.0.0.14) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-6 ( | 10.0.0.10) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-5 ( | 10.0.0.9) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-4 ( | 10.0.0.8) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-3 ( | 10.0.0.7) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-2 ( | 10.0.0.6) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-12 ( | 10.0.0.19) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-11 ( | 10.0.0.18) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-10 ( | 10.0.0.17) (shards: 7)
- longevity-10gb-3h-master-db-node-a2883744-eastus-1 ( | 10.0.0.5) (shards: 7)
OS / Image: /subscriptions/6c268694-47ab-43ab-b306-3c5514bc4112/resourceGroups/scylla-images/providers/Microsoft.Compute/images/scylla-2025.4.0-dev-x86_64-2025-08-29T01-58-29 (azure: N/A)
Test: longevity-10gb-3h-azure-test
Test id: a2883744-7196-4b5f-a8dd-ea027b252f5e
Test name: scylla-master/longevity/longevity-10gb-3h-azure-test
Test method: `longevity_test.LongevityTest.test_custom_time`
Test config file(s):
Logs and commands
- Restore Monitor Stack command:
`$ hydra investigate show-monitor a2883744-7196-4b5f-a8dd-ea027b252f5e`
- Restore monitor on AWS instance using Jenkins job
- Show all stored logs command:
`$ hydra investigate show-logs a2883744-7196-4b5f-a8dd-ea027b252f5e`
Logs:
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-2](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-2/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-1](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-1/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-3](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-3/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-8](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-8/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-7](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-7/download)**
- **[longevity-10gb-3h-master-db-node-a2883744-eastus-6](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/longevity-10gb-3h-master-db-node-a2883744-eastus-6/download)**
- **[db-cluster-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/db-cluster-a2883744.tar.zst/download)**
- **[schema-logs-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/schema-logs-a2883744.tar.zst/download)**
- **[sct-runner-events-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/sct-runner-events-a2883744.tar.zst/download)**
- **[sct-a2883744.log.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/sct-a2883744.log.tar.zst/download)**
- **[loader-set-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/loader-set-a2883744.tar.zst/download)**
- **[monitor-set-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/monitor-set-a2883744.tar.zst/download)**
- **[parallel-timelines-report-a2883744.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/parallel-timelines-report-a2883744.tar.zst/download)**
- **[builder-a2883744.log.tar.gz](https://argus.scylladb.com/api/v1/tests/scylla-cluster-tests/a2883744-7196-4b5f-a8dd-ea027b252f5e/log/builder-a2883744.log.tar.gz/download)**