Swim bugs completed #141

syed-khadeerahmed · 2025-10-07T06:42:09Z

Type of Change

Bug fix
New feature
Breaking change
Documentation update

Description

6 Bugs fixed

Bug Fix:

[Swim]: Some issues from 'dnac_api_task_timeout' need to be checked
[Swim]: The module enables concurrent activations without verifying device status, leading to errors and inaccurate updates
[Swim]: Incorrect DNAC version mapping in the module causes the run of unsupported APIs.
[Swim]: Filtering devices without providing the 'site_name' parameter does not work accurately.
[Swim]: Filtering devices that provide 'site_name' = 'Global' is not functioning correctly
[Swim]: The message returned when activated successfully is not correct

Root Cause (if applicable):

Issue: dnac_api_task_timeout causing incomplete activations
Fix: Added a new configurable parameter api_task_timeout for both activation and distribution processes, as these operations can take longer depending on network conditions. The default value is set to 1800 seconds, but customers can modify it based on their network speed and performance needs.
Issue: Concurrent activations triggered without verifying device status
Fix: Implemented the same api_task_timeout mechanism to ensure proper synchronization during concurrent activations and prevent inaccurate task updates or errors caused by premature timeouts.
Issue: Incorrect DNAC version mapping causing unsupported API execution
Fix: Updated and correctly mapped all relevant APIs to DNAC version 3.1.3.0, ensuring compatibility and stable module behavior with the supported API set.
Issue: Device filtering not functioning accurately when site_name parameter is not provided
Fix: Implemented a default behavior that automatically treats the site as ‘Global’ when no site_name is specified, ensuring all child sites are included in the output.
Issue: Device filtering fails when site_name is explicitly set to ‘Global’
Fix: Applied the same logic as above—when the site_name or site_type is ‘Global’, the module now retrieves and returns all devices from the global and its child sites correctly.
Issue: Incorrect success message upon activation completion
Fix: Corrected the final success message logic to map and display the accurate device IPs associated with the activation result, ensuring clarity in task completion feedback.

Enhancement: N/A
Enhancement Description: N/A
Impact Area: N/A

Testing Done:

Manual testing
Unit tests
Integration tests

Test cases covered: [Mention test case IDs or brief points]

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules
All the sanity checks have been completed and the sanity test cases have been executed

Ansible Best Practices

Tasks are idempotent (can be run multiple times without changing state)
Variables and secrets are handled securely (e.g., using ansible-vault or environment variables)
Playbooks are modular and reusable
Handlers are used for actions that need to run on change

Documentation

All options and parameters are documented clearly.
Examples are provided and tested.
Notes and limitations are clearly stated.

Screenshots (if applicable)

Notes to Reviewers

madhansansel · 2025-10-09T15:35:01Z

plugins/modules/swim_workflow_manager.py

            description: Specify the name of the device
              family such as Switches and Hubs, etc.
            type: str
+          api_task_timeout:


a) Can we say what is the difference b/w global timeout and this value?
b) In document you mentioned as default of 2400 seconds but that is not true. Instead can you put in recommendations?
c) Add the version when do we start to support it..

Check the sample snippet and update if required.. Also get approval from @DNACENSolutions

api_task_timeout: description: | Timeout duration in seconds for image distribution API operations. Controls how long the system waits for image distribution tasks to complete, including image transfer to target devices, network propagation, and distribution validation. Operation phases covered by this timeout: - Image preparation and validation on Catalyst Center - Network transfer to target devices - Installation verification on target devices - Distribution status confirmation Default of 1800 seconds (30 minutes) accounts for: - Large image files (up to several GB) - Multiple target devices in site-based operations - Network latency and bandwidth constraints - Device processing and storage capabilities Recommended timeout values: - Small networks (1-10 devices): 900-1800 seconds - Medium networks (10-50 devices): 1800-3600 seconds - Large networks (50+ devices): 3600-7200 seconds Note: This timeout is independent of the global 'dnac_api_task_timeout' parameter and specifically applies to distribution operations only. type: int default: 1800 version_added: ??????

madhansansel · 2025-10-09T15:35:31Z

plugins/modules/swim_workflow_manager.py

              ACCESS, BORDER ROUTER, DISTRIBUTION, and
              CORE.
            type: str
+          api_task_timeout:


api_task_timeout: description: | Timeout duration in seconds for image activation API operations. Controls how long the system waits for image activation processes to complete before timing out, including device reboot and startup verification. Operation phases covered by this timeout: - Image validation and preparation for activation - Device upgrade mode processing (install/bundle/currentlyExists) - Device reboot and startup sequence (if required) - Post-activation connectivity and status verification - Golden image validation (if applicable) Default of 1800 seconds (30 minutes) accommodates: - Device boot time variations (switches: 5-15 min, routers: 10-20 min) - Image installation and verification processes - Network convergence and connectivity restoration - Multiple devices in concurrent activation scenarios Recommended timeout values by device type: - Access switches: 1200-1800 seconds (20-30 minutes) - Distribution/Core switches: 1800-2700 seconds (30-45 minutes) - Routers and complex devices: 2700-3600 seconds (45-60 minutes) Warning: Setting timeout too low may cause premature failure reporting for successful but slow activation processes. type: int default: 1800 version_added: ??????

I would ike the name api_task_timeout to be changed, it should be named for the activity it is addressing to. i.e. image_activation_timeout and image_distribution_timeout, both should be provided to user so they can individually plan to timeouts correctly. For example distribution on slow. network may take1Hr or more, activation will very based on if the device is a stack and and may vary from 10 mins to 30 minutes.

madhansansel · 2025-10-09T15:40:36Z

plugins/modules/swim_workflow_manager.py

          device_family_name:
            description: Specify the name of the device
              family such as Switches and Hubs, etc.
            type: str


The name "api_task_timeout" may get confused with the existing global dnac_api_task_timeout parameter. Please note that the parameter name should reflect its specific operation scope.

Can we say

distribution_task_timeout: Timeout duration in seconds specifically for SWIM image distribution operations. Controls how long the system waits for distribution tasks to complete.... activation_task_timeout: Timeout duration in seconds specifically for SWIM image activation operations. Controls how long the system waits for activation tasks to complete.....

OR

swim_operation_timeout: Timeout duration in seconds for SWIM operations (distribution/activation).. Overrides the global dnac_api_task_timeout for SWIM-specific tasks...........

OR

image_distribution_timeout. .. image_activation_timeout ...
Check with @DNACENSolutions and update...

type: int
default: 1800

madhansansel · 2025-10-09T15:42:28Z

plugins/modules/swim_workflow_manager.py

-                "Site name not specified; defaulting to 'Global' to fetch all devices under this category",
-                "INFO",
-            )
+            if self.compare_dnac_versions(self.get_ccc_version(), "2.3.5.3") <= 0:


current_version = self.get_ccc_version() if self.compare_dnac_versions(current_version, "2.3.5.3") <= 0: site_name = "Global/.*" self.log( "Catalyst Center version {0} (≤2.3.5.3) detected - using wildcard pattern 'Global/.*' " "to fetch devices from Global site and all child sites via legacy API".format(current_version), "INFO", ) else: site_name = "Global" self.log( "Catalyst Center version {0} (>2.3.5.3) detected - using 'Global' site name " "to fetch devices via enhanced site hierarchy API".format(current_version), "INFO", )

madhansansel · 2025-10-09T15:43:14Z

plugins/modules/swim_workflow_manager.py

                    for item_dict in item["response"]:
                        site_response_list.append(item_dict)
        else:
+            self.log(site_name, "DEBUG")


Please don't print the variable name directly. We instructed this many times in Code review.

madhansansel · 2025-10-09T15:49:22Z

plugins/modules/swim_workflow_manager.py

+            return self
+
+        if not isinstance(response, dict):
+            self.msg = "response is not a dictionary"


self.msg = "Invalid response format from API '{0}' - expected dictionary, got {1}".format( api_name, type(response).__name__ )

madhansansel · 2025-10-09T15:50:12Z

plugins/modules/swim_workflow_manager.py

+            self.status = "exited"
+            return self
+
+        task_info = response.get("response")


first check the task_info, if not then return..

task_info = response.get("response") if not task_info: self.msg = "Missing task information in response from API '{0}'".format(api_name) self.log(self.msg, "ERROR") self.status = "failed" return self if task_info.get("errorcode") is not None: error_detail = task_info.get("detail", "Unknown error occurred") self.msg = "API '{0}' returned error: {1}".format(api_name, error_detail) self.log(self.msg, "ERROR") self.status = "failed" return self

madhansansel · 2025-10-09T15:50:40Z

plugins/modules/swim_workflow_manager.py

+            self.status = "failed"
+            return self
+
+        task_id = task_info.get("taskId")


if task_id is none, then return??

if not task_id: self.msg = "Missing taskId in response from API '{0}'".format(api_name) self.log(self.msg, "ERROR") self.status = "failed" return self

madhansansel · 2025-10-09T15:51:46Z

plugins/modules/swim_workflow_manager.py

+            return self
+
+        task_id = task_info.get("taskId")
+        start_time = time.time()


it is good to determine the timeout for this operation..

# Determine appropriate timeout for this operation if operation_details: operation_type = api_name.lower().replace("trigger_software_image_", "").replace("_", "") max_timeout = self.get_operation_timeout(operation_type, operation_details) else: max_timeout = getattr(self, 'max_timeout', self.params.get("dnac_api_task_timeout", 1800)) poll_interval = self.params.get("dnac_task_poll_interval", 2) self.log( "Monitoring task '{0}' from API '{1}' with timeout {2}s and poll interval {3}s".format( task_id, api_name, max_timeout, poll_interval ), "INFO" ) start_time = time.time() task_status_history = [] while True: elapsed_time = time.time() - start_time

madhansansel · 2025-10-09T16:00:38Z

plugins/modules/swim_workflow_manager.py

+
+        task_id = task_info.get("taskId")
+        start_time = time.time()
+        while True:


Add a sample code to check the task_status.. But you can refine and update if required..

task_status_history = [] while True: elapsed_time = time.time() - start_time # Check for timeout if elapsed_time >= max_timeout: self.msg = ( "Timeout of {0} seconds reached for task '{1}' from API '{2}'. " "Task may still be running on Catalyst Center. " "Consider increasing timeout or checking task status manually." ).format(max_timeout, task_id, api_name) self.log(self.msg, "WARNING") self.status = "failed" break # Get current task status try: task_details = self.get_tasks_by_id(task_id) self.log( "Retrieved task details for ID '{0}': {1}".format(task_id, task_details), "DEBUG" ) except Exception as e: self.msg = "Failed to retrieve task details for ID '{0}': {1}".format(task_id, str(e)) self.log(self.msg, "ERROR") self.status = "failed" break if not task_details: self.log( "Empty task details received for ID '{0}', retrying...".format(task_id), "WARNING" ) time.sleep(poll_interval) continue task_status = task_details.get("status", "UNKNOWN") # Track status changes for debugging if not task_status_history or task_status_history[-1] != task_status: task_status_history.append(task_status) self.log( "Task '{0}' status changed to: {1} (elapsed: {2:.1f}s)".format( task_id, task_status, elapsed_time ), "INFO" ) # Handle terminal states if task_status == "FAILURE": try: failure_details = self.get_task_details_by_id(task_id) failure_reason = failure_details.get("failureReason", "Unknown failure reason") self.msg = "Task '{0}' from API '{1}' failed: {2}".format( task_id, api_name, failure_reason ) self.log(self.msg, "ERROR") except Exception as e: self.msg = "Task '{0}' failed and unable to retrieve failure details: {1}".format( task_id, str(e) ) self.log(self.msg, "ERROR") self.status = "failed" break elif task_status == "SUCCESS": self.result["changed"] = True self.msg = "Task '{0}' from API '{1}' completed successfully in {2:.1f} seconds".format( task_id, api_name, elapsed_time ) self.log(self.msg, "INFO") self.status = "success" break # Handle unknown/unexpected status elif task_status not in ["IN_PROGRESS", "PENDING", "RUNNING"]: self.log( "Unexpected task status '{0}' for task '{1}'. Continuing to monitor...".format( task_status, task_id ), "WARNING" ) # Log periodic progress updates if int(elapsed_time) % 30 == 0 and elapsed_time > 0: # Every 30 seconds progress_info = task_details.get("progress", "No progress information") self.log( "Task '{0}' progress update: {1} (status: {2}, elapsed: {3:.1f}s)".format( task_id, progress_info, task_status, elapsed_time ), "DEBUG" ) time.sleep(poll_interval) # Log final task monitoring summary self.log( "Task monitoring completed for '{0}'. Final status: {1}, Total time: {2:.1f}s, Status history: {3}".format( task_id, self.status, time.time() - start_time, " -> ".join(task_status_history) ), "INFO" )

def get_operation_timeout(self, operation_type, operation_details): """ Get the appropriate timeout value for specific SWIM operations. Args: operation_type (str): Type of operation ('distribution' or 'activation') operation_details (dict): Operation configuration details Returns: int: Timeout value in seconds """ self.log("Determining timeout for {0} operation".format(operation_type), "DEBUG") # Map operation types to their specific timeout parameter names timeout_key_mapping = { "distribution": "distribution_task_timeout", # Recommended name... Check and update the name "activation": "activation_task_timeout" # Recommended name.... Check and update the name } # Get operation-specific timeout if provided timeout_key = timeout_key_mapping.get(operation_type) if timeout_key and timeout_key in operation_details: operation_timeout = operation_details[timeout_key] self.log( "Using operation-specific {0}: {1} seconds for {2}".format( timeout_key, operation_timeout, operation_type ), "INFO" ) return operation_timeout # Fallback to global timeout global_timeout = self.params.get("dnac_api_task_timeout", 1800) # Correct global name self.log( "Using global dnac_api_task_timeout: {0} seconds for {1}".format( global_timeout, operation_type ), "INFO" ) return global_timeout

madhansansel · 2025-10-13T10:27:06Z

plugins/modules/swim_workflow_manager.py

            description: Specify the name of the device
              family such as Switches and Hubs, etc.
            type: str
+          Image_distribution_timeout:


image_distribution_timeout..

not Image_distribution_timeout

6 bugs done

f06ed05

syed-khadeerahmed requested a review from madhansansel as a code owner October 7, 2025 06:42

6 bugs done

a2703a8

madhansansel reviewed Oct 9, 2025

View reviewed changes

SyedKhadeerAhmed added 2 commits October 10, 2025 15:46

param confirmation is pending, testing is pending

999dabe

review completed

1afa78a

madhansansel reviewed Oct 13, 2025

View reviewed changes

addressed review comments

6d272d3

madhansansel approved these changes Oct 13, 2025

View reviewed changes

sanity fixed

c006930

madhansansel merged commit 352e7fa into cisco-en-programmability:main Oct 13, 2025
12 checks passed

Swim bugs completed #141

Swim bugs completed #141

Uh oh!

Conversation

syed-khadeerahmed commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Type of Change

Description

Checklist

Ansible Best Practices

Documentation

Screenshots (if applicable)

Notes to Reviewers

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

syed-khadeerahmed commented Oct 7, 2025 •

edited

Loading