Feat/connection draining #104
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: … The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing …
Summary of Changes
Hello @Eeshu-Yadav, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a robust connection draining and multi-version listener management system. The primary goal is to enable graceful handling of listener updates and shutdowns, preventing abrupt connection terminations. By allowing multiple listener configurations to coexist and transition smoothly, the system can maintain service availability during dynamic configuration changes, ensuring a more resilient and user-friendly experience.
Code Review
This pull request introduces a significant feature for connection draining. The implementation is extensive, touching configuration, listener management, and HTTP connection handling. While the feature is valuable, the current implementation has several critical and high-severity issues that need to be addressed before merging. These include flawed logic in listener updates, incorrect handling of draining listeners, and broken tests. I've also identified several areas with code duplication and opportunities for refactoring to improve maintainability.
@@ -0,0 +1,213 @@
// Copyright 2025 The kmesh Authors
This new file appears to contain critically flawed logic for listener updates and draining.
- In handle_lds_update, listeners.insert will overwrite the existing listener entry, immediately dropping its JoinHandle and aborting its task. This prevents any graceful draining of the old listener. A MultiMap would be needed to manage multiple versions of a listener, similar to what's done in listeners_manager.rs.
- In remove_listener, listeners.remove is called before starting the drain timeout. This means the timeout task will never find the listener to abort it after the drain period.
This implementation seems to contradict the more robust versioning and draining logic being introduced in listeners_manager.rs. If this file is intended for use, it needs a major rework. If it's experimental or dead code, it should be removed to avoid confusion.
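A minimal sketch of the multi-version approach the comment describes, keeping a Vec of versions per listener name and marking old entries as draining instead of overwriting them. Type and function names here are illustrative stand-ins, not the actual orion-lib API:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real ListenerInfo; fields are illustrative.
#[derive(Debug)]
struct ListenerEntry {
    version: u64,
    draining: bool,
}

// Keep every live version of a listener instead of overwriting the map slot,
// mirroring the MultiMap approach used in listeners_manager.rs.
fn handle_lds_update(
    listeners: &mut HashMap<String, Vec<ListenerEntry>>,
    name: &str,
    new_version: u64,
) {
    let versions = listeners.entry(name.to_string()).or_default();
    // Mark existing versions as draining rather than dropping their handles.
    for old in versions.iter_mut() {
        old.draining = true;
    }
    versions.push(ListenerEntry { version: new_version, draining: false });
}

fn main() {
    let mut listeners = HashMap::new();
    handle_lds_update(&mut listeners, "http", 1);
    handle_lds_update(&mut listeners, "http", 2);
    let versions = &listeners["http"];
    assert_eq!(versions.len(), 2); // old version retained, now draining
    assert!(versions[0].draining);
    assert!(!versions[1].draining);
}
```

In this shape, the old version's JoinHandle survives the update and can be aborted later by the drain timeout task, which addresses both bugs at once.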
result.map(|mut response| {
    // Apply drain signaling to response if draining is active
    manager.drain_signaling.apply_http1_drain_signal_sync(&mut response);
This call to apply_http1_drain_signal_sync is both redundant and incorrect. It's redundant because drain signaling is already handled within the to_response implementation, which is called just before this. It's incorrect because apply_http1_drain_signal_sync unconditionally adds the Connection: close header without checking if draining is active, which can lead to prematurely closed connections. This line should be removed.
warn!(
    "Gracefully draining old listener {} version {} - monitored by background task",
    listener_name, version
);
}
In the case of a Gradual drain strategy, the listener task is not being aborted. This is a bug, as the listener will continue to accept new connections while it's supposed to be stopping. You should abort the listener's JoinHandle here, just as you do for the Immediate strategy, to prevent new connections from being accepted.
warn!(
"Gracefully draining old listener {} version {} - monitored by background task",
listener_name, version
);
listener_info.handle.abort();
if let Some(drain_handle) = listener_info.drain_manager_handle.as_ref() {
drain_handle.abort();
}
version_info: None,
};
man.start_listener(l1, l1_info.clone()).unwrap();
man.start_listener(l1, l1_info.clone()).await.unwrap();
The function start_listener is not async, but it's being called with .await. This will cause a compilation error. The same issue exists in test_config_change_drains_old_listener (line 1537) and test_address_conflict_resolution_graceful (line 1826). Please remove the .await from these calls.
man.start_listener(l1, l1_info.clone()).unwrap();
fn should_drain(scenario: DrainScenario, drain_type: ConfigDrainType) -> bool {
    use ConfigDrainType::*;
    use DrainScenario::*;

    match (scenario, drain_type) {
        (_, Default) => true,
        (HealthCheckFail, ModifyOnly) => false,
        (ListenerUpdate, ModifyOnly) => true,
        (HotRestart, ModifyOnly) => true,
    }
match &drain_state.protocol_behavior {
    super::listeners_manager::ProtocolDrainBehavior::Http1 { connection_close: true } => {
        use hyper::header::{HeaderValue, CONNECTION};
        response.headers_mut().insert(CONNECTION, HeaderValue::from_static("close"));
        debug!("Applied 'Connection: close' header for HTTP/1.1 drain signaling");
    },
    super::listeners_manager::ProtocolDrainBehavior::Auto => {
        use hyper::header::{HeaderValue, CONNECTION};
        response.headers_mut().insert(CONNECTION, HeaderValue::from_static("close"));
        debug!("Applied 'Connection: close' header for HTTP/1.1 drain signaling");
    },
    _ => {
        debug!("Skipping Connection: close header for non-HTTP/1.1 protocol");
    },
}
The match arms for ProtocolDrainBehavior::Http1 and ProtocolDrainBehavior::Auto contain identical code. You can combine them using a | pattern to reduce code duplication and improve readability.
match &drain_state.protocol_behavior {
super::listeners_manager::ProtocolDrainBehavior::Http1 { connection_close: true } |
super::listeners_manager::ProtocolDrainBehavior::Auto => {
use hyper::header::{HeaderValue, CONNECTION};
response.headers_mut().insert(CONNECTION, HeaderValue::from_static("close"));
debug!("Applied 'Connection: close' header for HTTP/1.1 drain signaling");
},
_ => {
debug!("Skipping Connection: close header for non-HTTP/1.1 protocol");
},
}
Ok(context)
}

async fn monitor_drain_progress(&self, context: Arc<ListenerDrainContext>, listener_id: String) {
The monitor_drain_progress function takes listener_id: String as a separate argument, but this information is already available within the context: Arc<ListenerDrainContext>. To simplify the function signature and reduce redundancy, consider removing the listener_id parameter and accessing it directly from the context (e.g., context.listener_id.clone()). You will need to update the call site in initiate_listener_drain as well.
async fn monitor_drain_progress(&self, context: Arc<ListenerDrainContext>) {
orion-lib/src/listeners/listener.rs (Outdated)
drain_handler: Option<Arc<DefaultConnectionHandler>>,
) -> Result<()> {
    let shard_id = std::thread::current().id();
    let connection_id = format!("{}:{}:{}", local_address, peer_addr, start_instant.elapsed().as_nanos());
Using start_instant.elapsed().as_nanos() to generate a unique connection_id is not robust. Under high load or on systems with low-resolution timers, this could lead to collisions. Consider using a more reliable method for generating unique IDs, such as an atomic counter or a UUID. Since uuid is already a dependency, you could use uuid::Uuid::new_v4() after adding use uuid;.
let connection_id = format!("{}:{}:{}", local_address, peer_addr, start_instant.elapsed().as_nanos());
let connection_id = format!("{}:{}:{}", local_address, peer_addr, uuid::Uuid::new_v4());
Force-pushed 61acff6 to db23221
}

pub fn stop_listener(&mut self, listener_name: &str) -> Result<()> {
    if let Some(mut listeners) = self.listener_handles.remove(listener_name) {
@dawid-nowak the logic here has been updated as per the suggestion you referred to in #99. Since this PR will be merged after that one, should I improve the logic in that branch instead? The feedback about removing the draining functionality is also correct; it was mistakenly placed here. Kindly let me know; if you want the same logic there, I will modify it in that branch, otherwise this will be corrected once that PR is merged.
Just reviewed a small part, and I don't see it handling a listener with different protocols well (one containing different filters, both envoy.filters.network.http_connection_manager and envoy.filters.network.tcp_proxy), nor do I see the drain timeout being passed from HttpConnectionManager.DrainTimeout.
Tcp { global_timeout: Duration },
Http { global_timeout: Duration, drain_timeout: Duration },
Immediate,
Gradual,
What does Gradual mean? Does Envoy have such a strategy?
The Gradual strategy has been removed from the current implementation. Envoy uses protocol-specific draining strategies (HTTP with drain_timeout, TCP with a global timeout) rather than a generic "Gradual" approach.
}
}

pub fn with_timeouts(global_drain_timeout: Duration, default_http_drain_timeout: Duration) -> Self {
I wonder, if a listener handles both the HTTP and TCP protocols, how do you support them?
This is now supported through the Mixed drain strategy.
pub enum DrainStrategy {
// ... other variants
Mixed {
global_timeout: Duration,
http_drain_timeout: Duration,
tcp_connections: bool,
http_connections: bool,
},
}
}
}

pub async fn start_listener_draining(&self, drain_state: ListenerDrainState) {
I'm not seeing these methods called. Is this supported in a following PR?
The methods are now integrated. The DrainSignalingManager provides:
initiate_listener_drain_from_filter_analysis(), start_listener_draining(), stop_listener_draining()
}

#[derive(Debug, Clone)]
pub struct ListenerDrainState {
I don't quite understand what ListenerDrainState means.
ListenerDrainState represents the current state of a listener's draining process. It tracks:
started_at: When draining began
strategy: What drain strategy to use (HTTP/TCP/Mixed)
protocol_behavior: How to handle different protocols during draining
drain_scenario: What triggered the drain (health check fail, listener update, etc.)
drain_type: The configured drain behavior
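Putting those fields together, a minimal illustrative sketch might look like the following. Field and variant names are reconstructed from this thread and are assumptions, not the exact source:

```rust
use std::time::{Duration, Instant};

// Illustrative mirror of the fields described above.
#[derive(Debug, Clone)]
enum DrainStrategy {
    Http { global_timeout: Duration, drain_timeout: Duration },
    Tcp { global_timeout: Duration },
}

#[derive(Debug, Clone)]
enum DrainScenario {
    HealthCheckFail,
    ListenerUpdate,
    HotRestart,
}

#[derive(Debug, Clone)]
struct ListenerDrainState {
    started_at: Instant,          // when draining began
    strategy: DrainStrategy,      // HTTP/TCP drain strategy
    drain_scenario: DrainScenario, // what triggered the drain
}

impl ListenerDrainState {
    // How long this listener has been draining so far.
    fn elapsed(&self) -> Duration {
        self.started_at.elapsed()
    }
}

fn main() {
    let state = ListenerDrainState {
        started_at: Instant::now(),
        strategy: DrainStrategy::Http {
            global_timeout: Duration::from_secs(600),
            drain_timeout: Duration::from_millis(5000),
        },
        drain_scenario: DrainScenario::ListenerUpdate,
    };
    assert!(state.elapsed() < Duration::from_secs(1));
}
```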
let's get #102 merged and then we can start reviewing this one
Pull Request Overview
This PR implements connection draining functionality for the Orion proxy to handle graceful listener updates and shutdowns. The implementation addresses issue #102 by adding comprehensive connection tracking and draining mechanisms.
- Implements graceful connection draining for listener updates and shutdowns
- Adds comprehensive connection state management with drain timeouts and strategies
- Introduces multi-version listener support with automatic cleanup policies
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description
orion-proxy/src/admin/config_dump.rs | Updates test fixtures to include new drain_type and drain_timeout fields
orion-lib/src/listeners/mod.rs | Adds new drain_signaling and lds_update modules to module exports
orion-lib/src/listeners/listeners_manager.rs | Major refactor implementing connection tracking, drain management, and multi-version listener support
orion-lib/src/listeners/listener.rs | Integrates connection tracking and drain handling into listener processing
orion-lib/src/listeners/lds_update.rs | New module for managing LDS updates with graceful draining
orion-lib/src/listeners/http_connection_manager.rs | Extends HTTP connection manager with drain signaling and connection lifecycle management
orion-lib/src/listeners/drain_signaling.rs | New core module implementing drain strategies and connection state management
orion-lib/Cargo.toml | Adds dependencies for connection tracking (dashmap, multimap)
orion-configuration/src/config/network_filters/http_connection_manager.rs | Adds drain_timeout configuration field
orion-configuration/src/config/listener.rs | Adds drain_type and version_info fields for listener configuration
let drain_rate = drained_count as f64 / elapsed.as_secs_f64();

if drain_rate > 0.0 {
    let estimated_remaining_time = current_connections as f64 / drain_rate;
    return Some(Instant::now() + Duration::from_secs_f64(estimated_remaining_time));
Copilot AI (Sep 23, 2025)
Division by zero vulnerability if elapsed.as_secs_f64() returns 0.0. This can occur when the elapsed time is very small (less than 1 second). Add a check to ensure the elapsed time is non-zero before division.
Suggested change:
let elapsed_secs = elapsed.as_secs_f64();
if elapsed_secs > 0.0 {
    let drain_rate = drained_count as f64 / elapsed_secs;
    if drain_rate > 0.0 {
        let estimated_remaining_time = current_connections as f64 / drain_rate;
        return Some(Instant::now() + Duration::from_secs_f64(estimated_remaining_time));
    }
}
assert!(third_version > 0);
assert!(third_version > 0);
Copilot AI (Sep 23, 2025)
Redundant assertions: the same condition third_version > 0 is checked three times. This appears to be a copy-paste error and should be reduced to a single assertion.
Suggested change:
assert!(third_version > 0);
orion-lib/src/listeners/listener.rs (Outdated)
sync::broadcast::{self},
};
use tracing::{debug, info, warn};
use uuid;
Copilot AI (Sep 23, 2025)
[nitpick] The uuid import should be more specific, e.g., use uuid::Uuid instead of importing the entire crate. This makes dependencies clearer and can improve compile times.
use uuid;
use uuid::Uuid;
orion-lib/src/listeners/listener.rs (Outdated)
drain_handler: Option<Arc<DefaultConnectionHandler>>,
) -> Result<()> {
    let shard_id = std::thread::current().id();
    let connection_id = format!("{}:{}:{}", local_address, peer_addr, uuid::Uuid::new_v4());
Copilot AI (Sep 23, 2025)
[nitpick] Creating a UUID for every connection can be expensive under high load. Consider using a simpler connection ID scheme such as an atomic counter combined with the peer address, which would be more performant while still ensuring uniqueness.
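A sketch of the suggested scheme: a process-wide atomic counter combined with the peer address. The function and static names are illustrative, not the actual orion-lib code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide monotonically increasing sequence; cheaper than a UUID
// per connection while still guaranteeing uniqueness within the process.
static NEXT_CONN_SEQ: AtomicU64 = AtomicU64::new(0);

fn next_connection_id(local: &str, peer: &str) -> String {
    let seq = NEXT_CONN_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("{local}:{peer}:{seq}")
}

fn main() {
    let a = next_connection_id("10.0.0.1:8080", "10.0.0.2:51000");
    let b = next_connection_id("10.0.0.1:8080", "10.0.0.2:51000");
    assert_ne!(a, b); // unique even for the same local/peer pair
}
```

Note this trades the cross-process uniqueness of a UUID for a single relaxed atomic increment, which is usually the right trade for an in-process connection registry.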
let mut to_remove = Vec::new();
for (i, listener_info) in versions.iter().enumerate() {
    if listener_info.is_draining() {
        to_remove.push(i);
    }
}

for &index in to_remove.iter().rev() {
    if let Some(listener_info) = versions.get_mut(index) {
        listener_info.handle.abort();
        info!("LDS: Draining version of listener '{}' forcibly closed after timeout", name);
    }
}
for &index in to_remove.iter().rev() {
    versions.remove(index);
}
Copilot AI (Sep 23, 2025)
[nitpick] Removing elements by index in a Vec is an O(n) operation. Since multiple indices are being removed in reverse order, this could be inefficient. Consider using Vec::retain with a predicate to remove multiple elements in a single pass.
Suggested change:
// Abort and log for draining listeners, then remove them in a single pass
versions.iter_mut()
    .filter(|listener_info| listener_info.is_draining())
    .for_each(|listener_info| {
        listener_info.handle.abort();
        info!("LDS: Draining version of listener '{}' forcibly closed after timeout", name);
    });
versions.retain(|listener_info| !listener_info.is_draining());
}

pub fn remove_route(&self) {
    self.http_filters_per_route.swap(Arc::new(HashMap::new()));
Copilot AI (Sep 23, 2025)
[nitpick] The empty HashMap creation could be extracted to a static or const to avoid repeated allocations. Consider creating a shared empty HashMap instance.
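A minimal sketch of the shared-empty-map idea, assuming a lazily initialized static Arc (std::sync::LazyLock, stable since Rust 1.80). The static name and the String key/value types are placeholders for the real per-route filter types:

```rust
use std::collections::HashMap;
use std::sync::{Arc, LazyLock};

// One shared empty map; cloning it only bumps the Arc refcount instead of
// allocating a fresh HashMap on every remove_route call.
static EMPTY_FILTERS: LazyLock<Arc<HashMap<String, String>>> =
    LazyLock::new(|| Arc::new(HashMap::new()));

fn remove_route_target() -> Arc<HashMap<String, String>> {
    Arc::clone(&EMPTY_FILTERS)
}

fn main() {
    let a = remove_route_target();
    let b = remove_route_target();
    assert!(Arc::ptr_eq(&a, &b)); // same allocation reused
    assert!(a.is_empty());
}
```

A later revision of the PR appears to adopt exactly this pattern (the EMPTY_HASHMAP static quoted further down).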
Now implemented. The code extracts the drain_timeout from the HttpConnectionManager configuration:
MainFilter::Http(http_config) => {
    has_http = true;
    http_drain_timeout = http_config.drain_timeout;
Force-pushed 99c2e88 to 2da8a93
@dawid-nowak kindly review this one
Same here, we need a design proposal FYI https://github.com/kubernetes/enhancements/
I don't understand: for this functionality, why is the link given in the kubernetes repo? In the other PR the link is to the main kmesh repo. Are you referring to the kmesh one or the kubernetes one? Kindly clarify please.
I just gave you a reference; we can add the proposal in orion.
okk
Pull Request Overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
orion-lib/src/listeners/listeners_manager.rs:1 - The swap operation on http_filters_per_route seems disconnected from the remove_route method's purpose. Consider moving this logic to a more appropriate location or adding a comment explaining why it's necessary.
// Copyright 2025 The kmesh Authors
let selected_filterchain =
    Self::select_filterchain(&filter_chains, &downstream_metadata, server_name.as_deref())?;
Copilot AI (Oct 9, 2025)
Duplicate call to select_filterchain - the same operation is performed twice consecutively on lines 469-470 and 472-473. Remove the duplicate call to avoid unnecessary computation.
let selected_filterchain =
    Self::select_filterchain(&filter_chains, &downstream_metadata, server_name.as_deref())?;
//
// 1. evaluate InitHttpContext, if logging is enabled
eval_http_init_context(&request, &trans_handler, downstream_metadata.server_name.as_deref());
eval_http_init_context(&request, &trans_handler, None);
Copilot AI (Oct 9, 2025)
The server_name parameter is hardcoded to None instead of using the available server_name from downstream_metadata. This could result in missing server name information in access logs and tracing contexts.
Force-pushed ea63501 to 99b926d
- Optimize start_listener to avoid second hashmap lookup
- Simplify stop_listener using remove() directly instead of get_mut + drain + remove
- Extract duplicated test code into create_test_listener_config helper function
- Improve code readability and performance
Signed-off-by: Eeshu-Yadav <[email protected]>
… versions
Enhance the existing multiple listener versions approach with configurable cleanup policies and resource management, following Envoy's patterns for graceful listener updates.
Key improvements:
- Add ListenerManagerConfig with configurable cleanup policies
- Implement automatic cleanup of old listener versions
- Support CountBasedOnly, TimeBasedOnly, and Hybrid cleanup strategies
- Prevent resource leaks while maintaining connection continuity
- Add comprehensive test coverage for cleanup behaviors
Benefits:
- Graceful LDS-style updates without connection interruption
- Long-running connections don't block listener deployments
- Configurable behavior for different production environments
- Automatic resource management prevents memory/handle leaks
Configuration options:
- max_versions_per_listener: limit concurrent versions (default: 2)
- cleanup_policy: define cleanup strategy (default: CountBasedOnly(2))
- cleanup_interval: background cleanup frequency (default: 60s)
Signed-off-by: Eeshu-Yadav <[email protected]>
Implement comprehensive improvements:
- Replace HashMap<String, Vec<ListenerInfo>> with MultiMap for cleaner API
- Move default config assignment inside ListenerManager::new() constructor
- Change listener exit logging from warn to info level
- Merge redundant version logging into single informative message
- Fix unnecessary double cloning of listener_name
- Use Vec::drain for efficient batch removal of old versions
- Remove unimplemented TimeBasedOnly cleanup policy to avoid incorrect behavior
- Remove redundant Hybrid cleanup policy, keeping only CountBasedOnly
Signed-off-by: Eeshu-Yadav <[email protected]>
- Add drain_timeout field with 5000ms default (Envoy compliant)
- Integrate with drain signaling and timeout enforcement
- Support protocol-specific draining (HTTP/1.1, HTTP/2, TCP)
- Add comprehensive test coverage for drain behaviors
Signed-off-by: Eeshu-Yadav <[email protected]>
…l support
- Add ListenerProtocolConfig enum to handle HTTP, TCP, and mixed protocol listeners
- Respect HttpConnectionManager.drain_timeout field from configuration
- Support listeners with both http_connection_manager and tcp_proxy filters
- Remove ambiguous 'Gradual' strategy, align with Envoy's draining behavior
- Add initiate_listener_drain_from_filter_analysis() for proper integration
Signed-off-by: Eeshu-Yadav <[email protected]>
Force-pushed 99b926d to 58de48f
Pull Request Overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
orion-lib/src/listeners/listeners_manager.rs:1 - Server name information is being lost: the function is called with None instead of using the available server name from downstream_metadata.server_name.as_deref(), which was used in the previous call.
// Copyright 2025 The kmesh Authors
let selected_filterchain =
    Self::select_filterchain(&filter_chains, &downstream_metadata, server_name.as_deref())?;
Copilot AI (Oct 9, 2025)
Duplicate call to select_filterchain: this function is called twice with identical parameters (lines 572-573 and 575-576). Remove the duplicate call to avoid unnecessary processing.
let selected_filterchain =
    Self::select_filterchain(&filter_chains, &downstream_metadata, server_name.as_deref())?;
}

pub fn remove_route(&self) {
    self.http_filters_per_route.swap(EMPTY_HASHMAP.clone());
Copilot AI (Oct 9, 2025)
[nitpick] Consider using Arc::clone(&EMPTY_HASHMAP) instead of EMPTY_HASHMAP.clone() for clarity, as this is cloning the Arc wrapper rather than the HashMap contents.
self.http_filters_per_route.swap(EMPTY_HASHMAP.clone());
self.http_filters_per_route.swap(Arc::clone(&EMPTY_HASHMAP));
Force-pushed 58de48f to d0b5bd4
Force-pushed d0b5bd4 to 899ab06
Pull Request Overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.
let drain_state = super::drain_signaling::ListenerDrainState {
    started_at: std::time::Instant::now(),
    strategy: DrainStrategy::Gradual,
    protocol_behavior: ProtocolDrainBehavior::Auto,
    drain_scenario: super::drain_signaling::DrainScenario::ListenerUpdate,
    drain_type: orion_configuration::config::listener::DrainType::Default,
};
Copilot AI (Oct 9, 2025)
[nitpick] The DrainState initialization uses hardcoded default values. Consider creating a builder or default constructor for ListenerDrainState to encapsulate these default values and make them configurable based on listener configuration.
Suggested change:
// Use the default constructor for ListenerDrainState to encapsulate default values
let drain_state = super::drain_signaling::ListenerDrainState::default();
draining_listener.handle.abort();
if let Some(drain_handle) = draining_listener.drain_manager_handle {
    drain_handle.abort();
}
} else {
    warn!(
        "Gracefully draining old listener {} version {} - monitored by background task",
        listener_name, draining_listener.version
    );
    draining_listener.handle.abort();
    if let Some(drain_handle) = draining_listener.drain_manager_handle {
        drain_handle.abort();
    }
Copilot AI (Oct 9, 2025)
[nitpick] The abort logic for drain handles is duplicated across multiple locations (lines 1204-1207, 1246-1249, 1291-1294, 1300-1303). Consider extracting this into a helper method like abort_listener_handles() to reduce code duplication.
Suggested change:
- draining_listener.handle.abort();
- if let Some(drain_handle) = draining_listener.drain_manager_handle {
-     drain_handle.abort();
- }
- } else {
-     warn!(
-         "Gracefully draining old listener {} version {} - monitored by background task",
-         listener_name, draining_listener.version
-     );
-     draining_listener.handle.abort();
-     if let Some(drain_handle) = draining_listener.drain_manager_handle {
-         drain_handle.abort();
-     }
+ abort_listener_handles(&draining_listener);
+ } else {
+     warn!(
+         "Gracefully draining old listener {} version {} - monitored by background task",
+         listener_name, draining_listener.version
+     );
+     abort_listener_handles(&draining_listener);
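The extraction the reviewer describes is small. In the sketch below a toy `Handle` with an `abort(&self)` method stands in for `tokio::task::JoinHandle` (whose `abort` likewise takes `&self`); taking the listener by shared reference lets one helper serve all four duplicated sites without moving the `Option` out of the struct:

```rust
use std::cell::Cell;

// Toy stand-in for tokio::task::JoinHandle; its abort() also takes &self.
struct Handle { aborted: Cell<bool> }
impl Handle {
    fn new() -> Self { Handle { aborted: Cell::new(false) } }
    fn abort(&self) { self.aborted.set(true); }
}

// Simplified, hypothetical version of the PR's draining-listener record.
struct DrainingListener {
    handle: Handle,
    drain_manager_handle: Option<Handle>,
}

/// Abort the listener task and, if one exists, its drain-manager task.
fn abort_listener_handles(listener: &DrainingListener) {
    listener.handle.abort();
    if let Some(drain_handle) = &listener.drain_manager_handle {
        drain_handle.abort();
    }
}

fn main() {
    let listener = DrainingListener {
        handle: Handle::new(),
        drain_manager_handle: Some(Handle::new()),
    };
    abort_listener_handles(&listener);
    assert!(listener.handle.aborted.get());
    assert!(listener.drain_manager_handle.as_ref().unwrap().aborted.get());
    println!("both handles aborted");
}
```

Because the helper borrows rather than consumes the listener, the caller can still log `draining_listener.version` afterwards, as the surrounding warn! branches do.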
//
// 1. evaluate InitHttpContext, if logging is enabled
- eval_http_init_context(&request, &trans_handler, downstream_metadata.server_name.as_deref());
+ eval_http_init_context(&request, &trans_handler, None);
Copilot AI · Oct 9, 2025
The server_name parameter is hardcoded to None, but the original code used downstream_metadata.server_name.as_deref(). This change could break TLS SNI functionality and access logging that depends on server name information.
Suggested change:
- eval_http_init_context(&request, &trans_handler, None);
+ eval_http_init_context(&request, &trans_handler, downstream_metadata.server_name.as_deref());
    _connection_id: String,
    _protocol: super::listeners_manager::ConnectionProtocol,
    _peer_addr: std::net::SocketAddr,
) {
Copilot AI · Oct 9, 2025
The DefaultConnectionHandler::register_connection method is a no-op with unused parameters. If this is a placeholder for future implementation, consider adding a TODO comment or unimplemented!() macro to make the intent clear.
Suggested change:
- ) {
+ ) {
+     unimplemented!("DefaultConnectionHandler::register_connection is not yet implemented");
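The trade-off behind this suggestion can be shown with a stripped-down, hypothetical version of the trait: `unimplemented!()` makes the placeholder loud but panics if the method is ever reached, while a documented no-op keeps live call paths safe until the real implementation lands.

```rust
// Hypothetical, simplified version of the PR's connection-handler trait.
trait ConnectionHandler {
    fn register_connection(&self, connection_id: String);
}

struct DefaultConnectionHandler;

impl ConnectionHandler for DefaultConnectionHandler {
    fn register_connection(&self, _connection_id: String) {
        // TODO: track the connection for drain accounting.
        // Using `unimplemented!()` here would panic at runtime, so a
        // documented no-op is the safer placeholder while this handler
        // is wired into live accept paths.
    }
}

fn main() {
    // The no-op placeholder can be called without panicking.
    DefaultConnectionHandler.register_connection("conn-1".to_string());
    println!("no-op placeholder ok");
}
```

If the method is genuinely unreachable today, `unimplemented!()` (or `todo!()`) documents that fact and fails fast; if callers may reach it, the TODO comment is the less disruptive choice.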
static EMPTY_HASHMAP: LazyLock<Arc<HashMap<RouteMatch, Vec<Arc<HttpFilter>>>>> =
    LazyLock::new(|| Arc::new(HashMap::new()));
Copilot AI · Oct 9, 2025
[nitpick] Using a static LazyLock for an empty HashMap may be over-engineered. Consider using a const empty HashMap or simply creating an empty HashMap inline where needed, as the allocation cost is minimal compared to the complexity of a static.
Suggested change:
- static EMPTY_HASHMAP: LazyLock<Arc<HashMap<RouteMatch, Vec<Arc<HttpFilter>>>>> =
-     LazyLock::new(|| Arc::new(HashMap::new()));
+ // Use Arc::new(HashMap::new()) inline instead of a static LazyLock for an empty HashMap.
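Both options are cheap; the sketch below (with `String` keys standing in for `RouteMatch`) contrasts the shared static, whose "construction" is just an `Arc` refcount bump, with the inline form, which allocates a fresh empty map per call. `std::sync::LazyLock` requires Rust 1.80+:

```rust
use std::collections::HashMap;
use std::sync::{Arc, LazyLock};

// Shared empty map: allocated once, every caller clones the same Arc.
static EMPTY_MAP: LazyLock<Arc<HashMap<String, Vec<u32>>>> =
    LazyLock::new(|| Arc::new(HashMap::new()));

// Inline alternative: one small allocation per call, no global state.
fn empty_inline() -> Arc<HashMap<String, Vec<u32>>> {
    Arc::new(HashMap::new())
}

fn main() {
    let shared = Arc::clone(&*EMPTY_MAP);
    let fresh = empty_inline();
    assert!(shared.is_empty() && fresh.is_empty());
    // Clones of the static all point at the same allocation...
    assert!(Arc::ptr_eq(&shared, &*EMPTY_MAP));
    // ...while each inline call produces a distinct one.
    assert!(!Arc::ptr_eq(&fresh, &empty_inline()));
    println!("both approaches yield empty maps");
}
```

The static earns its keep only if the empty map is handed out on a hot path; otherwise the inline form is simpler and avoids global state, which appears to be the reviewer's point.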
#[allow(dead_code)]
fn count_draining_versions(listener_infos: &[ListenerInfo]) -> usize {
    listener_infos.iter().filter(|info| info.is_draining()).count()
}
Copilot AI · Oct 9, 2025
Functions marked with #[allow(dead_code)] should either be used or removed. If these are intended for future use or debugging, consider documenting their purpose or moving them to a separate utility module.
Suggested change (removal):
- #[allow(dead_code)]
- fn count_draining_versions(listener_infos: &[ListenerInfo]) -> usize {
-     listener_infos.iter().filter(|info| info.is_draining()).count()
- }
Add comprehensive proposal for implementing connection draining with multiple listener versions during LDS updates. Signed-off-by: Eeshu-Yadav <[email protected]>
Force-pushed from 899ab06 to f3ec244 (Compare)
@dawid-nowak kindly review this
Fixes: #102