-
-
Notifications
You must be signed in to change notification settings - Fork 59
Improve port parsing patterns to prevent FlowStorm false positives #113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tespace - Add case-insensitive matching to port, Started, and Listening patterns - Use bounded whitespace (\s{1,10}) instead of unbounded (\s+) for predictability - Add optional space after colon in port number pattern (:\s?) - Add word boundary (\b) to port pattern to prevent false matches - Add regression test for FlowStorm output (issue #109) These changes maintain backward compatibility with all existing nREPL output formats while preventing false positives from FlowStorm's 'mem_reporter__4628' output that was matching the old regex. Co-authored-by: gveres <[email protected]>
WalkthroughReplaces broad regex patterns used to extract the nREPL port with a concise set of stricter patterns and adds a test ensuring FlowStorm output does not produce a false-positive port. No control-flow, API, or error-handling changes. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/clojure_mcp/nrepl_launcher.clj (1)
18-21
: Optional: Consider adding word boundary before "port" for consistency.While the current pattern
#"(?i)port\s{1,10}(\d+)\b"
correctly prevents the FlowStorm issue, adding a leading word boundary (\bport
) would provide additional strictness against edge cases like "support 12345" or "transport 12345". However, the current pattern is sufficient for realistic nREPL output.Additionally, note the intentional design choice: pattern 1 uses
(\d+)
(any digits) while patterns 2-4 use(\d{4,5})
(4-5 digits). This allows pattern 1 to capture ports in formats like "port: 8080" before falling back to the colon pattern, relying on the range validation at lines 28-29 to filter invalid ports.If you prefer maximum strictness and consistency, consider this diff:
- (let [patterns [#"(?i)port\s{1,10}(\d+)\b" ; port 12345 + (let [patterns [#"(?i)\bport\s{1,10}(\d{4,5})\b" ; port 12345 #":\s?(\d{4,5})\b" ; :12345 or : 12345 (4-5 digit ports) #"(?i)Started on\s{1,10}(\d{4,5})\b" ; Started on 12345 #"(?i)Listening on\s{1,10}(\d{4,5})\b"] ; Listening on 12345However, verify that all existing test cases (especially "port 123" and "port 999999" at test lines 27-28) still pass with this change, as they rely on the range validation to filter invalid ports.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/clojure_mcp/nrepl_launcher.clj
(1 hunks)test/clojure_mcp/nrepl_launcher_test.clj
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{clj,cljc}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{clj,cljc}
: Use kebab-case for vars/functions; end predicates with?
(e.g.,is-top-level-form?
)
Usetry/catch
with specific exception handling; use atom for tracking errors
Use 2-space indentation and maintain whitespace in edited forms
Align namespaces with directory structure (e.g.,clojure-mcp.repl-tools
)
Files:
test/clojure_mcp/nrepl_launcher_test.clj
src/clojure_mcp/nrepl_launcher.clj
test/**/*.{clj,cljc}
📄 CodeRabbit inference engine (CLAUDE.md)
Use
deftest
with descriptive names;testing
for subsections;is
for assertions in tests
Files:
test/clojure_mcp/nrepl_launcher_test.clj
src/**/*.{clj,cljc}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{clj,cljc}
: Use:require
with ns aliases for imports (e.g.,[clojure.string :as string]
)
Include clear tool:description
for LLM guidance
Validate inputs and provide helpful error messages in MCP tools
Return structured data with both result and error status in MCP tools
Maintain atom-based state for consistent service access in MCP tools
Files:
src/clojure_mcp/nrepl_launcher.clj
🧠 Learnings (1)
📚 Learning: 2025-08-18T00:39:24.837Z
Learnt from: hugoduncan
PR: bhauman/clojure-mcp#86
File: src/clojure_mcp/subprocess.clj:0-0
Timestamp: 2025-08-18T00:39:24.837Z
Learning: In the clojure-mcp project, the user prefers to only parse stdout for nREPL port discovery in the subprocess.clj module, and explicitly does not want to parse stderr for port information.
Applied to files:
src/clojure_mcp/nrepl_launcher.clj
🔇 Additional comments (2)
test/clojure_mcp/nrepl_launcher_test.clj (1)
29-31
: LGTM! Excellent regression test for issue #109.The FlowStorm test case correctly validates that output containing "reporter__4628" (digits embedded in identifiers) returns nil instead of extracting "4628" as a port. The inline comment clearly documents the purpose.
src/clojure_mcp/nrepl_launcher.clj (1)
18-21
: Excellent improvements to prevent FlowStorm false positives.The updated patterns successfully address issue #109 by:
- Replacing unbounded
[^\d]*
with bounded\s{1,10}
to prevent matching "reporter__4628"- Adding word boundaries
\b
to prevent substring matches- Using case-insensitive matching
(?i)
for flexibility- Restricting patterns 2-4 to 4-5 digit ports (valid range: 1024-65535)
The changes correctly prevent matching:
- "flow_storm.jobs$run_jobs$mem_reporter__4628@..." (no "port" followed by whitespace)
- "Started batch job 45000" (requires "Started on", not "Started batch")
- "Listening to 30000 events" (requires "Listening on", not "Listening to")
Per CodeRabbit review suggestion, add \b before 'port' to prevent edge cases like 'support 12345' or 'transport 12345', and restrict to 4-5 digits for consistency with other patterns.
Thank you @bhauman , this is great! Appreciate the quick fix! |
This PR improves upon PR #110 by refining the port parsing regex patterns to be both stricter (preventing false positives) and more flexible (supporting case-insensitive matching).
Changes
(?i)
\s{1,10}
instead of unbounded\s+
or.*?
for predictability:\s?
\b
to port pattern to prevent false matchesFixes #109
The original issue reported that FlowStorm's output:
Was incorrectly matching the old regex pattern and extracting
4628
as the port number.Final Regex Patterns
What These Patterns Don't Match
flow_storm.jobs$run_jobs$mem_reporter__4628@348137e8...
(the original issue)Started batch job 45000
Listening to 30000 events
These patterns are stricter than the original code while maintaining backward compatibility with all existing test cases.
Co-authored-by: gveres [email protected]
Summary by CodeRabbit
Bug Fixes
Tests