Skip to content

Conversation

bhauman
Copy link
Owner

@bhauman bhauman commented Oct 5, 2025

This PR improves upon PR #110 by refining the port parsing regex patterns to be both stricter (preventing false positives) and more flexible (supporting case-insensitive matching).

Changes

  • Add case-insensitive matching to port, Started, and Listening patterns using (?i)
  • Use bounded whitespace \s{1,10} instead of unbounded \s+ or .*? for predictability
  • Add optional space after colon in port number pattern :\s?
  • Add word boundary \b to port pattern to prevent false matches
  • Add regression test for FlowStorm output to ensure it doesn't match

Fixes #109

The original issue reported that FlowStorm's output:

flow_storm.jobs$run_jobs$mem_reporter__4628@348137e8 function scheduled every 1000 millis

Was incorrectly matching the old regex pattern and extracting 4628 as the port number.

Final Regex Patterns

#"(?i)port\s{1,10}(\d+)\b"              ; Matches: "port 12345", "PORT 12345"
#":\s?(\d{4,5})\b"                      ; Matches: ":12345", ": 12345"
#"(?i)Started on\s{1,10}(\d{4,5})\b"    ; Matches: "Started on 12345"
#"(?i)Listening on\s{1,10}(\d{4,5})\b"  ; Matches: "Listening on 12345"

What These Patterns Don't Match

  • flow_storm.jobs$run_jobs$mem_reporter__4628@348137e8... (the original issue)
  • Started batch job 45000
  • Listening to 30000 events

These patterns are stricter than the original code while maintaining backward compatibility with all existing test cases.

Co-authored-by: gveres [email protected]

Summary by CodeRabbit

  • Bug Fixes

    • Streamlined port-detection logic to reduce false positives and improve accuracy across varied startup logs; behavior unchanged for valid outputs.
    • Now explicitly ignores FlowStorm-style messages to avoid reporting incorrect ports.
  • Tests

    • Added a negative test for FlowStorm output and expanded parsing/range checks to improve reliability and coverage.

…tespace

- Add case-insensitive matching to port, Started, and Listening patterns
- Use bounded whitespace (\s{1,10}) instead of unbounded (\s+) for predictability
- Add optional space after colon in port number pattern (:\s?)
- Add word boundary (\b) to port pattern to prevent false matches
- Add regression test for FlowStorm output (issue #109)

These changes maintain backward compatibility with all existing nREPL
output formats while preventing false positives from FlowStorm's
'mem_reporter__4628' output that was matching the old regex.

Co-authored-by: gveres <[email protected]>
Copy link

coderabbitai bot commented Oct 5, 2025

Walkthrough

Replaces broad regex patterns used to extract the nREPL port with a concise set of stricter patterns and adds a test ensuring FlowStorm output does not produce a false-positive port. No control-flow, API, or error-handling changes.

Changes

Cohort / File(s) Summary of changes
nREPL port parsing regexes
src/clojure_mcp/nrepl_launcher.clj
Replaced multiple broad regexes with a focused set: (?i)\bport\s{1,10}(\d{4,5})\b, :\s?(\d{4,5})\b, (?i)Started on\s{1,10}(\d{4,5})\b, (?i)Listening on\s{1,10}(\d{4,5})\b. Trimming, matching order, parsing, and validation logic retained.
Tests for FlowStorm output
test/clojure_mcp/nrepl_launcher_test.clj
Added a test case asserting parse-port-from-output returns nil for FlowStorm job-scheduler-like output (addresses false positive matching documented in issue #109).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

A rabbit peeks where servers start,
No more chasing fragments of a part,
With tidy patterns, hops are true,
The port appears — a clearer view,
FlowStorm calmed, the tunnel through. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title succinctly describes the main change by indicating that the port parsing patterns have been improved specifically to avoid false positives from FlowStorm output, aligning directly with the core purpose of the changeset.
Linked Issues Check ✅ Passed The pull request addresses issue #109 by narrowing and making the regex patterns case-insensitive, adding word boundaries and bounded whitespace, and including a regression test to prevent FlowStorm false positives, thus fully meeting the linked issue’s coding requirements.
Out of Scope Changes Check ✅ Passed All modifications are confined to regex adjustments in nrepl_launcher.clj and the addition of a targeted regression test, with no unrelated or extraneous changes beyond the scope defined by the linked issue.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch improved-port-parsing-patterns

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6ab25e and 90ea4a1.

📒 Files selected for processing (1)
  • src/clojure_mcp/nrepl_launcher.clj (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/clojure_mcp/nrepl_launcher.clj

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/clojure_mcp/nrepl_launcher.clj (1)

18-21: Optional: Consider adding word boundary before "port" for consistency.

While the current pattern #"(?i)port\s{1,10}(\d+)\b" correctly prevents the FlowStorm issue, adding a leading word boundary (\bport) would provide additional strictness against edge cases like "support 12345" or "transport 12345". However, the current pattern is sufficient for realistic nREPL output.

Additionally, note the intentional design choice: pattern 1 uses (\d+) (any digits) while patterns 2-4 use (\d{4,5}) (4-5 digits). This allows pattern 1 to capture ports in formats like "port: 8080" before falling back to the colon pattern, relying on the range validation at lines 28-29 to filter invalid ports.

If you prefer maximum strictness and consistency, consider this diff:

-    (let [patterns [#"(?i)port\s{1,10}(\d+)\b" ; port 12345
+    (let [patterns [#"(?i)\bport\s{1,10}(\d{4,5})\b" ; port 12345
                     #":\s?(\d{4,5})\b" ; :12345 or : 12345 (4-5 digit ports)
                     #"(?i)Started on\s{1,10}(\d{4,5})\b" ; Started on 12345
                     #"(?i)Listening on\s{1,10}(\d{4,5})\b"] ; Listening on 12345

However, verify that all existing test cases (especially "port 123" and "port 999999" at test lines 27-28) still pass with this change, as they rely on the range validation to filter invalid ports.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19c071b and b6ab25e.

📒 Files selected for processing (2)
  • src/clojure_mcp/nrepl_launcher.clj (1 hunks)
  • test/clojure_mcp/nrepl_launcher_test.clj (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{clj,cljc}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{clj,cljc}: Use kebab-case for vars/functions; end predicates with ? (e.g., is-top-level-form?)
Use try/catch with specific exception handling; use atom for tracking errors
Use 2-space indentation and maintain whitespace in edited forms
Align namespaces with directory structure (e.g., clojure-mcp.repl-tools)

Files:

  • test/clojure_mcp/nrepl_launcher_test.clj
  • src/clojure_mcp/nrepl_launcher.clj
test/**/*.{clj,cljc}

📄 CodeRabbit inference engine (CLAUDE.md)

Use deftest with descriptive names; testing for subsections; is for assertions in tests

Files:

  • test/clojure_mcp/nrepl_launcher_test.clj
src/**/*.{clj,cljc}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{clj,cljc}: Use :require with ns aliases for imports (e.g., [clojure.string :as string])
Include clear tool :description for LLM guidance
Validate inputs and provide helpful error messages in MCP tools
Return structured data with both result and error status in MCP tools
Maintain atom-based state for consistent service access in MCP tools

Files:

  • src/clojure_mcp/nrepl_launcher.clj
🧠 Learnings (1)
📚 Learning: 2025-08-18T00:39:24.837Z
Learnt from: hugoduncan
PR: bhauman/clojure-mcp#86
File: src/clojure_mcp/subprocess.clj:0-0
Timestamp: 2025-08-18T00:39:24.837Z
Learning: In the clojure-mcp project, the user prefers to only parse stdout for nREPL port discovery in the subprocess.clj module, and explicitly does not want to parse stderr for port information.

Applied to files:

  • src/clojure_mcp/nrepl_launcher.clj
🔇 Additional comments (2)
test/clojure_mcp/nrepl_launcher_test.clj (1)

29-31: LGTM! Excellent regression test for issue #109.

The FlowStorm test case correctly validates that output containing "reporter__4628" (digits embedded in identifiers) returns nil instead of extracting "4628" as a port. The inline comment clearly documents the purpose.

src/clojure_mcp/nrepl_launcher.clj (1)

18-21: Excellent improvements to prevent FlowStorm false positives.

The updated patterns successfully address issue #109 by:

  • Replacing unbounded [^\d]* with bounded \s{1,10} to prevent matching "reporter__4628"
  • Adding word boundaries \b to prevent substring matches
  • Using case-insensitive matching (?i) for flexibility
  • Restricting patterns 2-4 to 4-5 digit ports (valid range: 1024-65535)

The changes correctly prevent matching:

  • "flow_storm.jobs$run_jobs$mem_reporter__4628@..." (no "port" followed by whitespace)
  • "Started batch job 45000" (requires "Started on", not "Started batch")
  • "Listening to 30000 events" (requires "Listening on", not "Listening to")

Per CodeRabbit review suggestion, add \b before 'port' to prevent
edge cases like 'support 12345' or 'transport 12345', and restrict
to 4-5 digits for consistency with other patterns.
@bhauman bhauman merged commit 9cd4df1 into main Oct 5, 2025
2 checks passed
@gveres
Copy link
Contributor

gveres commented Oct 5, 2025

Thank you @bhauman , this is great! Appreciate the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When clojure-mcp launches a JVM with FlowStorm enabled (via :start-nrepl-cmd), nREPL port parsing breaks
2 participants