
Conversation

nherment
Collaborator

No description provided.

Contributor

coderabbitai bot commented Sep 19, 2025

Walkthrough

Adds StructuredToolResult.llm_data and per-tool token accounting (ToolCallResult.size); integrates hierarchical Prometheus and New Relic metric compression into tool flows; updates context-window limiter to use per-call sizes; and introduces tests, an HTTP traffic generator fixture, and a Kubernetes manifest for end-to-end testing.

Changes

Cohort / File(s) Summary
Tool result model
holmes/core/tools.py
Add public field llm_data: Optional[str] = None to StructuredToolResult.
Result formatting logic
holmes/core/models.py
In format_tool_result_data, when tool_result.llm_data is set and tool_result.data is a JSON string containing both "random_key" and "data", parse and shallow-copy the object, replace the inner "data" value with llm_data, clear llm_data, silently ignore parse errors, then fall through to the existing string/non-string handling (sketched after this table).
Per-tool-call token accounting
holmes/core/tool_calling_llm.py
Construct ToolCallResult, build message representation, count tokens via LLM, set ToolCallResult.size, include token_count in tool-call logs, and return the enriched result. ToolCallResult gains public size.
Context window limiter
holmes/core/tools_utils/tool_context_window_limiter.py
Lazily compute tool_call_result.size when unset, use size for threshold/percentage calculations and error reporting, pass size to Sentry metadata, and clear size after error handling.
Prometheus compression module (new)
holmes/plugins/toolsets/prometheus/data_compression.py
New module with CompressedMetric, Group, PreFilteredMetrics, conversion/grouping/prefiltering, and rendering/summarization helpers (format_labels, format_data, compact_metrics, etc.).
Prometheus toolset integration
holmes/plugins/toolsets/prometheus/prometheus.py
Add config fields compress_range_metrics and compress_range_metrics_minimum_ratio; convert range results to compressed metrics, compute original vs compressed sizes and ratio; set StructuredToolResult.llm_data to compressed payload when beneficial and attach compression_info and raw_data; otherwise keep original data and produce summaries for overly-large responses.
New Relic metrics compression
holmes/plugins/toolsets/newrelic/newrelic.py
Add compact_metrics_data flow, change to_prometheus_records to return PromResponse, attach llm_data when compression is beneficial, and add config fields compress_metrics / compress_metrics_minimum_ratio.
Tests for compression
tests/plugins/toolsets/prometheus/test_data_compression.py, .../raw_prometheus_data.json
New unit tests for label-frequency detection, grouping, formatting, and realistic summarization/ratio assertions exercising the compression helpers.
LLM test fixture & toolset config
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml, .../toolsets.yaml
New test case describing periodic 404s, setup/teardown steps, and toolset fixture pointing Prometheus to http://localhost:9160.
Traffic generator & K8s manifest (fixtures)
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py, .../manifest.yaml
New high-cardinality HTTP traffic generator module and Kubernetes manifest (namespace, secret embedding script, deployments, Service, ServiceMonitor, traffic-simulator) for end-to-end testing.
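
For instance, the format_tool_result_data change summarized above amounts to roughly the following (a sketch based on the wording in the table; the helper name _apply_llm_data is hypothetical and the actual implementation may differ):

 import json

 def _apply_llm_data(tool_result) -> None:
     """Swap the inner "data" value for the LLM-optimized payload, if present."""
     if not tool_result.llm_data or not isinstance(tool_result.data, str):
         return
     try:
         parsed = json.loads(tool_result.data)
         if isinstance(parsed, dict) and "random_key" in parsed and "data" in parsed:
             patched = dict(parsed)                   # shallow copy
             patched["data"] = tool_result.llm_data   # replace inner "data"
             tool_result.data = json.dumps(patched)
             tool_result.llm_data = None
     except Exception:
         pass  # parse errors are swallowed; the original data is used as-is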

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Tool as PrometheusTool
  participant Prom as Prometheus API
  participant DC as DataCompression
  participant FM as Formatter
  Note over Tool,DC: Range query execution with optional compression
  User->>Tool: ExecuteRangeQuery(params)
  Tool->>Prom: HTTP range query
  Prom-->>Tool: Raw JSON result
  Tool->>DC: Convert raw -> CompressedMetric[]
  DC-->>Tool: Compressed metrics + rendered payload
  Tool->>Tool: Compute original_size, compressed_size, ratio
  alt Compression beneficial
    Tool->>FM: Use compressed payload as llm_data
    FM-->>Tool: llm_data string
    Note right of Tool: Set StructuredToolResult.llm_data
  else Not beneficial
    Note right of Tool: Keep original data, attach compression_info
  end
  opt Response too large
    Tool->>FM: create_data_summary_for_large_result(...)
    FM-->>Tool: Summary text
  end
  Tool-->>User: StructuredToolResult { data, llm_data?, compression_info? }
sequenceDiagram
  autonumber
  participant Runner
  participant Tool as Tool Runner
  participant LLM as LLM Client
  participant Log as Logger
  Note over Tool,LLM: Per-tool-call token counting
  Runner->>Tool: Invoke tool
  Tool->>Tool: Build ToolCallResult
  Tool->>LLM: as_tool_call_message -> count_tokens()
  LLM-->>Tool: token_count
  Tool->>Tool: Set ToolCallResult.size = token_count
  Tool->>Log: Log metadata { token_count: size, ... }
  Tool-->>Runner: ToolCallResult(size, ...)
sequenceDiagram
  autonumber
  participant Limiter as ContextWindowLimiter
  participant Res as ToolCallResult
  participant LLM as LLM Client
  Note over Limiter,Res: Size-based window check with lazy init
  Limiter->>Res: Read size
  alt size unset
    Limiter->>LLM: count_tokens(as_tool_call_message)
    LLM-->>Limiter: token_count
    Limiter->>Res: Set size = token_count
  end
  alt size > max_allowed
    Limiter-->>Limiter: Reject / truncate per policy
  else
    Limiter-->>Limiter: Accept
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • moshemorad
  • aantn

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Description Check — ⚠️ Warning: the pull request description is completely missing, so reviewers have no context or summary for changes that span multiple modules and introduce significant new functionality. Resolution: add a description outlining the goals of this change, such as improved LLM data handling via JSON parsing, token usage tracking, and the addition of Prometheus/New Relic compression features.
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Title Check — ❓ Inconclusive: the title "ROB-2161 improved truncation methodology" is too generic for the scope of this pull request, which includes JSON parsing for LLM data, token usage tracking, and new data compression pipelines for the Prometheus and New Relic integrations. Resolution: rename the pull request to summarize the main change succinctly, for example "Enhance LLM response truncation with JSON llm_data parsing, token counting, and compression utilities."
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ROB-2161_improved_truncation_methodology

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

nherment marked this pull request as ready for review on September 22, 2025 at 08:15
Contributor

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (21)
holmes/core/tools.py (1)

78-84: Consider documenting the llm_data field's purpose and usage.

The new llm_data field has been added without documentation. While its integration with the Prometheus compression features is clear from the context, future maintainers would benefit from a clear docstring explaining:

  • When to use llm_data vs data
  • The expected format/content
  • How it relates to token optimization
 class StructuredToolResult(BaseModel):
     schema_version: str = "robusta:v1.0.0"
     status: StructuredToolResultStatus
     error: Optional[str] = None
     return_code: Optional[int] = None
     data: Optional[Any] = None
-    llm_data: Optional[str] = None
+    llm_data: Optional[str] = Field(
+        None, 
+        description="Optimized representation of data for LLM consumption, typically compressed or formatted to reduce token usage"
+    )
     url: Optional[str] = None
holmes/core/tools_utils/tool_context_window_limiter.py (1)

23-27: Consider edge case handling for zero-size tool results.

When tool_call_result.size is 0, the relative percentage calculation will result in undefined behavior (0/0). While unlikely in practice, it's worth adding a guard.

         if tool_call_result.size > max_tokens_allowed:
-            relative_pct = (
-                (tool_call_result.size - max_tokens_allowed) / tool_call_result.size
-            ) * 100
+            if tool_call_result.size == 0:
+                relative_pct = 100.0
+            else:
+                relative_pct = (
+                    (tool_call_result.size - max_tokens_allowed) / tool_call_result.size
+                ) * 100
holmes/core/tool_calling_llm.py (1)

707-707: Verify consistent naming: token_count vs size in metadata.

The metadata uses token_count while the field is named size. Consider using consistent naming to avoid confusion.

             metadata={
                 "status": tool_call_result.result.status,
                 "description": tool_call_result.description,
-                "token_count": tool_call_result.size,
+                "size": tool_call_result.size,  # or rename both to token_count for clarity
             },
holmes/core/models.py (2)

71-74: Consider the cascading effect of the prioritization change.

The change from checking data first to checking llm_data first means that when both fields are populated, the LLM will now receive the compressed/optimized version instead of the raw data. This aligns with the PR's goal of improved truncation methodology, but ensure this doesn't affect debugging or scenarios where raw data visibility is important.

Consider adding logging or metrics to track when llm_data is used over data to monitor the effectiveness of the compression strategy and identify any unexpected behaviors in production.
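
For example (hypothetical log line, not part of the diff):

 if tool_result.llm_data:
     logging.debug(
         "Using llm_data (%d chars) instead of raw data for the LLM message",
         len(tool_result.llm_data),
     )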


69-85: llm_data prioritization is intentional — add a unit test and confirm callers

  • Evidence: holmes/plugins/toolsets/prometheus/prometheus.py constructs StructuredToolResult with both data and llm_data, so the new behavior will return llm_data for that plugin.
  • Tests: tests/test_structured_toolcall_result.py exercises format_tool_result_data but contains no test for the llm_data-present case — add a unit test asserting llm_data is used when present (a minimal version is sketched after this list).
  • Action: add the test and verify callers that expect the raw data (e.g., holmes/core/models.py → as_tool_call_message) still behave as intended or document the prioritization.
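
A minimal version of that test might look like this (field names per StructuredToolResult above; the enum member name is an assumption):

 def test_format_tool_result_data_prefers_llm_data():
     result = StructuredToolResult(
         status=StructuredToolResultStatus.SUCCESS,  # assumed enum member
         data='{"random_key": "abc123", "data": "full raw payload"}',
         llm_data="compressed payload",
     )
     formatted = format_tool_result_data(result)
     assert "compressed payload" in formatted
     assert "full raw payload" not in formatted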
holmes/plugins/toolsets/prometheus/prometheus.py (1)

1418-1421: Consider using a more specific exception type.

The broad exception catch could mask unexpected errors. Consider catching specific exceptions that might occur during compression (e.g., ValidationError from Pydantic, AttributeError for missing fields).

-                    except Exception as e:
+                    except (ValidationError, AttributeError, KeyError) as e:
                        logging.warning(
                            f"Failed to compress data: {e}, the original data will be used"
                        )
tests/plugins/toolsets/prometheus/test_data_compression.py (2)

303-307: Consider removing debug print statements from tests.

Debug print statements should be removed or converted to proper assertions for production test code. Consider using pytest's -s flag when debugging is needed.

-        print("**** EXPECTED:")
-        print(format_compressed_metrics(expected_metrics))
-        print("**** ACTUAL:")
-        print(format_compressed_metrics(compressed_metrics))
         assert expected_metrics == compressed_metrics

396-402: Replace debug prints with proper assertions.

Similar to the previous comment, debug prints should be removed from test code.

-        print("**** EXPECTED:")
-        print(expected_str)
-        print("**** ACTUAL:")
-        print(actual_str)
-
         assert expected_str.strip() in actual_str.strip()
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py (5)

255-257: Use logging.exception for better error context.

When catching exceptions in error handlers, use logging.exception to preserve the stack trace.

         except Exception as e:
-            logger.error(f"Error handling request: {e}")
+            logger.exception("Error handling request")

336-336: Rename unused loop variable.

The loop variable i is not used within the loop body.

-        for i in range(random.randint(3, 8)):  # 3-8 instances per region
+        for _ in range(random.randint(3, 8)):  # 3-8 instances per region

570-572: Rename unused loop variable in endpoint generation.

Similar to the previous comment, rename unused loop variable for clarity.

-            for i in range(10):
+            for _ in range(10):
                 if "users" in base or "products" in base or "orders" in base:
                     self.endpoints.append(f"{base}/{random.randint(1000, 9999)}")

698-702: Use logging.exception in thread error handlers.

For better debugging, use logging.exception to preserve stack traces in thread error handlers.

                 except Exception as e:
-                    logger.error(
-                        f"Error in background traffic generation thread {thread_id}: {e}"
-                    )
+                    logger.exception(
+                        f"Error in background traffic generation thread {thread_id}"
+                    )
                     time.sleep(1)

795-795: Document the binding to all interfaces.

The server binds to 0.0.0.0 which is necessary for containerized environments. Consider adding a comment to clarify this is intentional for the test environment.

     # Start HTTP server
+    # Bind to all interfaces (0.0.0.0) to allow access from other containers
     server_address = ("0.0.0.0", 8080)
holmes/plugins/toolsets/prometheus/data_compression.py (5)

42-43: Define custom exception types for better error handling.

Using generic Exception makes error handling less precise. Consider creating custom exception classes.

+class LabelError(Exception):
+    """Raised when label operations fail."""
+    pass
+
 def format_labels(
     labels: set[tuple[str, Any]], section_name: str, line_prefix: str = ""
 ) -> list[str]:
     lines = []
     if labels:
         sorted_labels = sorted(
             labels, key=lambda t: t[0]
         )  # keep label list stable in the output by sorting them by key
         if len(sorted_labels) <= 10:
             labels_dict = {}
             for k, v in sorted_labels:
                 labels_dict[k] = v
             lines.append(f"{line_prefix}{section_name} {json.dumps(labels_dict)}")
         else:
             lines.append(line_prefix + section_name)
             for k, v in sorted_labels:
                 lines.append(f"{line_prefix}{INDENT_SPACES}- {str(k)}: {str(v)}")
     else:
-        raise Exception("No label")
+        raise LabelError("No label")
     return lines

137-138: Use custom exception for data validation errors.

Similar to the previous comment, use a custom exception type.

+class DataFormatError(Exception):
+    """Raised when data formatting fails."""
+    pass
+
 def format_zero_values_data(
     data: Union[Group, CompressedMetric], line_prefix: str = ""
 ) -> str:
     # ... existing code ...
     else:
-        raise Exception("Data has no metrics and is not a CompressedMetric")
+        raise DataFormatError("Data has no metrics and is not a CompressedMetric")

217-218: Remove unused parameter and fix mutable default.

The logging_prefix parameter is unused and should be removed. Also, mutable default arguments should be avoided.

 def group_metrics(
     metrics_to_process: list[CompressedMetric],
-    globally_common_labels: set[tuple[str, Any]] = set(),
-    logging_prefix: str = "",
+    globally_common_labels: Optional[set[tuple[str, Any]]] = None,
 ) -> list[Union[Group, CompressedMetric]]:
+    if globally_common_labels is None:
+        globally_common_labels = set()
     global idx
     idx = idx + 1

184-185: Consider thread-safe counter implementation.

The global idx counter could cause issues in multi-threaded environments. Consider using a thread-safe implementation or removing it if it's only for debugging.

-idx = 0
+import threading
+_counter_lock = threading.Lock()
+_idx = 0
+
+def _get_next_idx():
+    global _idx
+    with _counter_lock:
+        _idx += 1
+        return _idx

Then in group_metrics:

-    global idx
-    idx = idx + 1
+    current_idx = _get_next_idx()  # Use if needed for debugging

293-297: Clarify the purpose of set_default function.

This function appears to be for JSON serialization but lacks documentation and has a generic name. Consider renaming and adding documentation.

-def set_default(obj):
+def json_set_encoder(obj):
+    """Custom JSON encoder for set objects.
+    
+    Used for JSON serialization of sets in metric data.
+    
+    Args:
+        obj: Object to encode
+        
+    Returns:
+        List representation of set
+        
+    Raises:
+        TypeError: If object is not a set
+    """
     if isinstance(obj, set):
         return list(obj)
     raise TypeError
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml (3)

583-644: Address security configurations for production-like testing.

The deployment lacks security context settings, which static analysis correctly flags. Since this is a test fixture simulating production conditions, consider adding security contexts for more realistic testing.

     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
+        runAsGroup: 3000
+        fsGroup: 2000
       containers:
       - name: http-service
         image: python:3.9-slim
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: false
+          capabilities:
+            drop:
+            - ALL
         command: ["sh", "-c"]

665-681: Consider monitoring interval for high-volume scenario.

The 2-second scrape interval is aggressive for a high-cardinality metrics scenario. This could generate substantial data volume that might impact the test's ability to demonstrate context window issues.

   endpoints:
   - port: metrics
-    interval: 2s
+    interval: 15s
     path: /metrics

685-757: Apply consistent security configurations.

Apply the same security context improvements to the traffic-simulator deployment for consistency with production-like testing conditions.

     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
+        runAsGroup: 3000
+        fsGroup: 2000
       containers:
       - name: traffic-simulator
         image: curlimages/curl:latest
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: true
+          capabilities:
+            drop:
+            - ALL
         command: ["sh", "-c"]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c18d7b and 7f8f31d.

📒 Files selected for processing (11)
  • holmes/core/models.py (1 hunks)
  • holmes/core/tool_calling_llm.py (3 hunks)
  • holmes/core/tools.py (1 hunks)
  • holmes/core/tools_utils/tool_context_window_limiter.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/data_compression.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (3 hunks)
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml (1 hunks)
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml (1 hunks)
  • tests/plugins/toolsets/prometheus/test_data_compression.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (12)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/core/tools.py
  • holmes/core/models.py
  • holmes/core/tools_utils/tool_context_window_limiter.py
  • tests/plugins/toolsets/prometheus/test_data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/core/tool_calling_llm.py
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/core/tools.py
  • holmes/core/models.py
  • holmes/core/tools_utils/tool_context_window_limiter.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/core/tool_calling_llm.py
tests/llm/**/toolsets.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

In evals, all toolset-specific configuration must be under a top-level 'config' field in toolsets.yaml; do not place toolset config directly at the top level

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
{holmes/plugins/toolsets/**/*.yaml,tests/llm/**/toolsets.yaml}

📄 CodeRabbit inference engine (CLAUDE.md)

Valid top-level fields for toolset YAMLs are limited to: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP only)

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
tests/llm/fixtures/**/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

tests/llm/fixtures/**/*.yaml: Each LLM test must use a dedicated Kubernetes namespace 'app-' to prevent conflicts when running in parallel
All pod names in evals must be unique across tests; never reuse pod names

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml
tests/llm/fixtures/**/*.{yaml,md,log,txt}

📄 CodeRabbit inference engine (CLAUDE.md)

Eval artifacts must be realistic and neutral: no obvious/fake logs, no filenames or resource names that hint at the problem, and no messages revealing simulation

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml
tests/llm/**/*.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

Always use Kubernetes Secrets for scripts in evals; do not embed scripts inline in manifests or ConfigMaps

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Test layout should mirror the source structure under tests/

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
  • tests/plugins/toolsets/prometheus/test_data_compression.py
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags

Files:

  • tests/plugins/toolsets/prometheus/test_data_compression.py
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py
tests/llm/**/test_case.yaml

📄 CodeRabbit inference engine (CLAUDE.md)

Do not include toolset configuration in test_case.yaml; if runbooks are specified, entries must point to .md files in the same directory

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
tests/llm/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Always run LLM tests with RUN_LIVE=true to ensure results match real-world behavior

Files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py
🧠 Learnings (2)
📚 Learning: 2025-09-15T07:09:16.052Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-15T07:09:16.052Z
Learning: Applies to {holmes/plugins/toolsets/**/*.yaml,tests/llm/**/toolsets.yaml} : Valid top-level fields for toolset YAMLs are limited to: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP only)

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
📚 Learning: 2025-08-05T00:42:23.792Z
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025-08-05T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.

Applied to files:

  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml
🧬 Code graph analysis (4)
holmes/core/tools_utils/tool_context_window_limiter.py (3)
holmes/core/models.py (1)
  • as_tool_call_message (30-42)
holmes/core/tools.py (1)
  • StructuredToolResultStatus (51-75)
holmes/utils/sentry_helper.py (1)
  • capture_toolcall_contains_too_many_tokens (22-34)
tests/plugins/toolsets/prometheus/test_data_compression.py (1)
holmes/plugins/toolsets/prometheus/data_compression.py (10)
  • CompressedMetric (14-16)
  • Group (19-21)
  • RawMetric (7-11)
  • find_most_common_label (319-357)
  • format_compressed_metrics (152-160)
  • format_data (49-84)
  • group_metrics (215-290)
  • raw_metric_to_compressed_metric (163-173)
  • summarize_metrics (299-316)
  • remove_labels (176-181)
holmes/plugins/toolsets/prometheus/prometheus.py (1)
holmes/plugins/toolsets/prometheus/data_compression.py (4)
  • RawMetric (7-11)
  • raw_metric_to_compressed_metric (163-173)
  • summarize_metrics (299-316)
  • remove_labels (176-181)
holmes/core/tool_calling_llm.py (2)
holmes/core/models.py (2)
  • ToolCallResult (23-66)
  • as_tool_call_message (30-42)
holmes/core/llm.py (2)
  • count_tokens_for_message (52-53)
  • count_tokens_for_message (210-236)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/prometheus/prometheus.py

  • 1418: Do not catch blind exception: Exception (BLE001)

holmes/plugins/toolsets/prometheus/data_compression.py

  • 40, 57: Use explicit conversion flag (RUF010, two occurrences per line)
  • 42, 137: Create your own exception (TRY002); Avoid specifying long messages outside the exception class (TRY003)
  • 65-67: Avoid specifying long messages outside the exception class (TRY003)
  • 217: Do not use mutable data structures for argument defaults — replace with None; initialize within function (B006)
  • 218: Unused function argument: logging_prefix (ARG001)

tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py

  • Standard pseudo-random generators are not suitable for cryptographic purposes (S311): lines 156-158, 175, 179, 247, 250, 287, 303, 309, 318, 320, 324, 325, 328, 331, 336-338, 342, 344, 347, 349, 356, 364-370, 374, 376, 384, 394, 403, 404, 411, 416, 421-423, 463, 468, 469, 474, 484, 488, 490, 492, 536, 540, 572, 606, 676, 683-687, 693, 696, 733-735, 741, 747
  • Do not catch blind exception: Exception (BLE001): lines 255, 698, 753, 775
  • Use logging.exception instead of logging.error (TRY400): lines 256, 699-701, 754, 776
  • Loop control variable i not used within loop body; rename unused i to _i (B007): lines 336, 570, 604, 737
  • Unused method argument (ARG002): lines 358 (endpoint), 387 (endpoint), 407 (user_agent), 433 (method)
  • Probable insecure usage of temporary file or directory: "/tmp" (S108): line 649
  • Possible binding to all interfaces (S104): line 795
🪛 Checkov (3.2.334)
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml

[medium] 583-645: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 583-645: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 685-757: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 685-757: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: llm_evals
🔇 Additional comments (11)
tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/toolsets.yaml (1)

1-5: Verify that toolset configuration follows required structure.

Based on the retrieved learnings, toolset-specific configuration should be under a top-level 'config' field, not directly under the toolset name. The current structure appears correct with prometheus_url properly placed under the config field.

holmes/core/tools_utils/tool_context_window_limiter.py (1)

14-17: LGTM! Lazy initialization of token count is efficient.

The implementation correctly uses lazy initialization to compute the token count only when needed, avoiding unnecessary computation if size is already set.
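
A sketch of that lazy-size flow (assumed shape, treating "unset" as None; the real function also reports to Sentry and clears size after error handling):

 if tool_call_result.size is None:  # lazy init: only count when unset
     tool_call_result.size = llm.count_tokens_for_message(
         tool_call_result.as_tool_call_message()
     )
 if tool_call_result.size > max_tokens_allowed:
     relative_pct = (
         (tool_call_result.size - max_tokens_allowed) / tool_call_result.size
     ) * 100
     ...  # truncate/report, pass size to Sentry, then reset size to None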

holmes/core/tool_calling_llm.py (1)

682-695: Good implementation of token tracking for tool calls.

The code correctly:

  1. Creates the ToolCallResult object first
  2. Generates the message representation
  3. Counts tokens for the message
  4. Stores the token count in the size field

This approach ensures consistent token accounting across the system.
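
In outline (a sketch; method names come from the code graph below, and exact call sites may differ):

 tool_call_result = ToolCallResult(...)             # 1. build the result first
 message = tool_call_result.as_tool_call_message()  # 2. message representation
 tool_call_result.size = llm.count_tokens_for_message(message)  # 3+4. count and store
 # then log with metadata={"status": ..., "token_count": tool_call_result.size}
 return tool_call_result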

tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml (1)

13-18: Test namespace naming convention looks good.

The test correctly uses the app-160 namespace pattern required for LLM test fixtures to prevent conflicts during parallel execution. The setup and teardown commands are properly configured.

holmes/plugins/toolsets/prometheus/prometheus.py (2)

1373-1421: Effective implementation of data compression for range queries.

The compression logic is well-structured with proper fallback handling. The 20% compression ratio threshold is a reasonable heuristic to avoid unnecessary transformations.
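
In outline, the gating described here looks like this (a sketch stitched together from the helper names reviewed in this PR; the exact wiring in prometheus.py may differ):

 metrics = [
     raw_metric_to_compressed_metric(series, remove_labels=set())
     for series in range_result.data.result
 ]
 compressed = summarize_metrics(metrics)  # compact text rendering

 original_size = len(json.dumps(raw_data))
 ratio = (1 - len(compressed) / original_size) * 100 if original_size else 0

 if config.compress_range_metrics and ratio > config.compress_range_metrics_minimum_ratio:
     result.llm_data = compressed  # the LLM sees the compact form
     # result.data keeps the raw payload; compression_info records sizes/ratio
 # else: keep the original data (and summarize separately if still too large)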


1429-1453: Clear handling of oversized data with comprehensive summaries.

The summary generation when data exceeds the size limit is well-implemented with detailed debugging information and actionable suggestions.

tests/plugins/toolsets/prometheus/test_data_compression.py (1)

425-426: Good compression ratio validation.

The test verifies that compression achieves at least 69% reduction (ratio < 0.31), which is a meaningful benchmark for the compression effectiveness.
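
In spirit, the assertion boils down to something like this (a sketch; helper names from the code graph above):

 compressed = format_compressed_metrics(compressed_metrics)
 ratio = len(compressed) / len(json.dumps(raw_prometheus_data))
 assert ratio < 0.31  # i.e. at least 69% smaller than the raw payload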

tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/http_traffic_generator.py (1)

156-158: Random values used for simulation, not security.

The use of random module is acceptable here since this is a test fixture for traffic simulation, not cryptographic purposes. The S311 warning can be safely ignored.

tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml (3)

1-5: LGTM: Correctly follows naming convention for test isolation.

The namespace app-160 follows the required pattern app-<testid> to prevent conflicts during parallel test execution.


14-580: LGTM: Comprehensive traffic generator with high cardinality metrics.

The embedded Python script creates a realistic HTTP traffic generator with extensive Prometheus metrics. The implementation includes proper error handling, realistic response patterns, and generates the high-cardinality data needed for testing LLM context window limitations.

Key strengths:

  • Rich metric labeling across multiple dimensions (region, IP, user agent, service version, etc.)
  • Realistic traffic patterns with varied response times and error rates
  • Proper separation of legitimate vs. problematic traffic (192.168.50.100)
  • Background traffic generation with multiple threads for scale
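
For a sense of the cardinality involved, one labeled counter in the style the fixture describes (metric and label names here are illustrative, not taken from the fixture):

 from prometheus_client import Counter

 http_requests = Counter(
     "http_requests_total",  # illustrative name
     "HTTP requests processed by the simulated service",
     ["method", "endpoint", "status", "region", "client_ip", "user_agent", "service_version"],
 )

 # One 404 from the problematic client; each distinct label combination
 # becomes its own time series, which is what drives the cardinality.
 http_requests.labels(
     "GET", "/api/users/4821", "404",
     "us-east-1", "192.168.50.100", "curl/8.1.2", "1.4.2",
 ).inc()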

708-751: Disable or align duplicate traffic generators.

  • Manifest runs two generators: the shell-based simulator (tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/manifest.yaml — shell loop around lines 708–751) and the Python TrafficGenerator (secret http_traffic_generator.py — class/start at ~382/554–561; container runs python http_traffic_generator.py at ~603–609).
  • Both produce background traffic and periodic 404 bursts; running both concurrently duplicates load and can cause unpredictable test behavior — either disable one, or explicitly align endpoints/IP ranges/rates/timing (or add a config flag to select which generator runs).

Contributor

coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
holmes/plugins/toolsets/prometheus/data_compression.py (1)

314-353: Docstring and behavior diverge in find_most_common_label.

The code returns the most frequent label (count > 1), not labels “that appear in ALL label sets”. Align the docstring or change logic.

-    """
-    Find label keys and values that are most common across all label sets.
-    Returns labels that appear in ALL label sets with the same value.
-...
-        label_sets: List of label dictionaries
-...
-        Dictionary of common labels (key -> most common value)
-    """
+    """
+    Find the (key, value) label pair that is most frequent across the given metrics,
+    excluding any labels in ignore_label_set. Returns the label and its match count;
+    if no label appears more than once, returns (None, 0).
+    """
holmes/plugins/toolsets/newrelic/newrelic.py (1)

174-189: Remove noisy print and fix override condition ('result' vs 'results').

  • Debug print to stdout will spam logs.
  • PromResponse JSON uses data.result, not data.results; override never triggers.
-            prom_data = self.to_new_relic_records(result, params=enriched_params)
-
-            return_result = prom_data.to_json()
-            print(json.dumps(return_result, indent=2))
-            if len(return_result.get("data", {}).get("results", [])):
-                return_result = result  # type: ignore[assignment]
+            prom_data = self.to_new_relic_records(result, params=enriched_params)
+            return_result = prom_data.to_json()
+            # If PromResponse failed to build (edge cases), fall back to raw result
+            if not isinstance(return_result.get("data", {}).get("result"), list):
+                return_result = result  # type: ignore[assignment]

Also applies to: 180-184

🧹 Nitpick comments (13)
holmes/plugins/toolsets/prometheus/data_compression.py (7)

19-38: Raise specific exceptions and fix typo.

  • Avoid bare Exception; raise ValueError with context.
  • Typo: “outpout” → “output”.
  • Prefer explicit conversion flags in f-strings (RUF010).
-        )  # keep label list stable in the outpout by sorting them by key
+        )  # keep label list stable in the output by sorting them by key
@@
-                lines.append(f"{line_prefix}{INDENT_SPACES}- {str(k)}: {str(v)}")
+                lines.append(f"{line_prefix}{INDENT_SPACES}- {k!s}: {v!s}")
@@
-    else:
-        raise Exception("No label")
+    else:
+        raise ValueError("format_labels called with empty labels")

44-56: Use explicit conversion flags in value rendering.

Small polish for consistency with RUF010.

-            lines.append("values:")
+            lines.append("values:")
             for value in data.values:
-                lines.append(f"{INDENT_SPACES}- {str(value[0])}: {str(value[1])}")
+                lines.append(f"{INDENT_SPACES}- {value[0]!s}: {value[1]!s}")

Also applies to: 50-53


82-90: Avoid bare Exception in format_zero_values_data.

Return ValueError with context instead of Exception.

-    else:
-        raise Exception("Data has no metrics and is not a CompressedMetric")
+    else:
+        raise ValueError("format_zero_values_data received neither Group nor CompressedMetric with metrics")

Also applies to: 131-133


98-119: Avoid complex expressions in f-strings for JSON.

Build the object first, then dump. Easier to read and safer.

-            lines.append(
-                f"{line_prefix}{INDENT_SPACES}- {json.dumps({compress_key:compressed_label_values})}"
-            )
+            compressed = {compress_key: compressed_label_values}
+            lines.append(f"{line_prefix}{INDENT_SPACES}- {json.dumps(compressed)}")

171-177: Annotate return type and avoid shadowing param name.

Return type is None; also shadowing remove_labels arg inside body is confusing.

-def remove_labels(metric: CompressedMetric, remove_labels: set[tuple[str, Any]]):
-    labels: set[tuple[str, Any]] = set()
-    for label in metric.labels:
-        if label not in remove_labels:
-            labels.add(label)
-    metric.labels = labels
+def remove_labels(metric: CompressedMetric, remove_labels: set[tuple[str, Any]]) -> None:
+    kept: set[tuple[str, Any]] = set()
+    for label in metric.labels:
+        if label not in remove_labels:
+            kept.add(label)
+    metric.labels = kept

288-292: Type hints and usage for set_default.

Function is unused and lacks type hints for return. Either remove it or type it and use as a json default.

-def set_default(obj):
-    if isinstance(obj, set):
-        return list(obj)
-    raise TypeError
+def set_default(obj: Any) -> Any:
+    if isinstance(obj, set):
+        return list(obj)
+    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

294-312: Threshold heuristic: make 10% comparison robust.

Guard division by zero and use '>=' consistently with intent.

-    if (
-        len(filtered_metrics.metrics_with_only_zero_values)
-        >= len(filtered_metrics.other_metrics) * 0.1
-    ):
+    other_count = len(filtered_metrics.other_metrics)
+    if other_count == 0 or len(filtered_metrics.metrics_with_only_zero_values) >= other_count * 0.1:
holmes/plugins/toolsets/prometheus/prometheus.py (2)

1354-1363: ‘No data’ path should include query/time range details per guidelines.

Enhance error_message to specify what was searched and where.

-                if status == "success" and not result_has_data(data):
+                if status == "success" and not result_has_data(data):
                     status = "Failed"
-                    error_message = (
-                        "The prometheus query returned no result. Is the query correct?"
-                    )
+                    error_message = (
+                        f"No data returned for query='{query}' "
+                        f"range=[{start} → {end}] step={step}. "
+                        f"Verify metric selectors and time window."
+                    )

1428-1428: Catching broad Exception.

Given untrusted payloads, a broad catch is acceptable; consider narrowing to (TypeError, ValueError, ValidationError) and keep exc_info=True.

holmes/plugins/toolsets/newrelic/newrelic.py (4)

246-252: Comment mismatch.

Update the comment to match default (30 → 30%).

-    compress_metrics_minimum_ratio: int = 30  # 20 means 20% size reduction
+    compress_metrics_minimum_ratio: int = 30  # 30 means 30% size reduction

69-73: Typo in parameter description.

“breif” → “brief”.

-                    description="A breif 6 word human understandable description of the query you are running.",
+                    description="A brief 6‑word human‑understandable description of the query you are running.",

136-145: Include query in error paths per guidelines.

When NR credentials missing, the ValueError lacks context. Append qtype/query or params as applicable.


197-240: Log formatting may leak large records.

logging.exception with the entire records may be huge; log only sizes or keys.

-        except Exception:
-            logging.exception(f"Failed to reformat newrelic logs {records}")
+        except Exception:
+            logging.exception("Failed to reformat newrelic logs (count=%d, keys=%s)",
+                              len(records), list(records[0].keys()) if records and isinstance(records[0], dict) else [])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f8f31d and 8247dab.

📒 Files selected for processing (6)
  • holmes/core/models.py (1 hunks)
  • holmes/plugins/toolsets/newrelic/newrelic.py (6 hunks)
  • holmes/plugins/toolsets/prometheus/data_compression.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (4 hunks)
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml (1 hunks)
  • tests/plugins/toolsets/prometheus/test_data_compression.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • holmes/core/models.py
  • tests/plugins/toolsets/prometheus/test_data_compression.py
  • tests/llm/fixtures/test_ask_holmes/160_http_404_monitoring/test_case.yaml
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
🧬 Code graph analysis (3)
holmes/plugins/toolsets/newrelic/newrelic.py (3)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • raw_metric_to_compressed_metric (158-168)
  • summarize_metrics (294-311)
  • remove_labels (171-176)
holmes/plugins/toolsets/prometheus/model.py (2)
  • PromResponse (19-255)
  • to_json (105-110)
holmes/core/tools.py (3)
  • StructuredToolResult (78-103)
  • StructuredToolResultStatus (51-75)
  • Toolset (521-736)
holmes/plugins/toolsets/prometheus/data_compression.py (1)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • raw_metric_to_compressed_metric (158-168)
  • summarize_metrics (294-311)
  • remove_labels (171-176)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/newrelic/newrelic.py

  • 113: Do not catch blind exception: Exception (BLE001)

holmes/plugins/toolsets/prometheus/data_compression.py

  • 35, 52: Use explicit conversion flag (RUF010, two occurrences per line)
  • 37, 132: Create your own exception (TRY002); Avoid specifying long messages outside the exception class (TRY003)
  • 60-62: Avoid specifying long messages outside the exception class (TRY003)
  • 212: Do not use mutable data structures for argument defaults — replace with None; initialize within function (B006)
  • 213: Unused function argument: logging_prefix (ARG001)

holmes/plugins/toolsets/prometheus/prometheus.py

  • 1428: Do not catch blind exception: Exception (BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (1)
holmes/plugins/toolsets/prometheus/prometheus.py (1)

103-105: Config flags for compression look good.

Defaults and semantics are reasonable; ensure docs/jinja template mention these knobs.

Contributor

coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8247dab and e1107a2.

📒 Files selected for processing (2)
  • holmes/core/models.py (1 hunks)
  • holmes/plugins/toolsets/newrelic/newrelic.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/core/models.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/core/models.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/newrelic/newrelic.py (3)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • raw_metric_to_compressed_metric (158-168)
  • summarize_metrics (294-311)
  • remove_labels (171-176)
holmes/plugins/toolsets/prometheus/model.py (2)
  • PromResponse (19-255)
  • to_json (105-110)
holmes/core/tools.py (2)
  • StructuredToolResult (78-103)
  • StructuredToolResultStatus (51-75)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/newrelic/newrelic.py

  • 113: Do not catch blind exception: Exception (BLE001)

holmes/core/models.py

  • 87-88: try-except-pass detected, consider logging the exception (S110)
  • 87: Do not catch blind exception: Exception (BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (6)
holmes/plugins/toolsets/newrelic/newrelic.py (6)

13-16: Imports look good.

Only needed symbols are imported and placed at top-level.


118-135: Return type change to PromResponse is sound.

Wrapper delegates to PromResponse.from_newrelic_records and preserves params; good for richer error handling and future extensions.


249-251: Reasonable defaults for config.

Defaults align with Prometheus path and are explicit.


257-259: Toolset fields mirror config; good.

Public attributes added with sensible defaults.


83-117: Compression gating and size ratio calc are incorrect; fix gating, measure sizes correctly, and avoid empty llm_data.

  • Don’t gate on self._toolset.config; it can be None and is unrelated.
  • Don’t json.dumps a string to measure size; use the string length directly.
  • Use compact JSON for original size to avoid skew.
  • Avoid returning an empty string as llm_data.
  • Replace blind except with logging.exception (ruff BLE001).

Apply:

-        try:
-            if self._toolset.config and self._toolset.compress_metrics:
+        try:
+            if self._toolset.compress_metrics:
                 metrics = [
                     raw_metric_to_compressed_metric(metric, remove_labels=set())
                     for metric in response.data.result
                 ]
 
                 compressed_data = summarize_metrics(metrics)
-                original_size = len(json.dumps(response.to_json()))
-                compressed_size = len(json.dumps(compressed_data))
+                original_size = len(
+                    json.dumps(response.to_json(), separators=(",", ":"), ensure_ascii=False)
+                )
+                compressed_size = (
+                    len(compressed_data)
+                    if isinstance(compressed_data, str)
+                    else len(json.dumps(compressed_data, separators=(",", ":"), ensure_ascii=False))
+                )
                 compression_ratio = (
                     (1 - compressed_size / original_size) * 100
                     if original_size > 0
                     else 0
                 )
 
-                if compression_ratio > self._toolset.compress_metrics_minimum_ratio:
+                if compressed_size > 0 and compression_ratio > self._toolset.compress_metrics_minimum_ratio:
                     # below this amount it's likely not worth mutating the response
                     llm_data = compressed_data
                     logging.info(
                         f"Compressed Newrelic metrics: {original_size:,} → {compressed_size:,} chars "
                         f"({compression_ratio:.1f}% reduction)"
                     )
                 else:
                     logging.info(
                         f"Compressed Newrelic metrics: {original_size:,} → {compressed_size:,} chars "
                         f"({compression_ratio:.1f}% reduction). Original data will be used instead."
                     )
-        except Exception:
-            logging.warning("Failed to compress newrelic data", exc_info=True)
+        except Exception:
+            logging.exception("Failed to compress New Relic metrics data")
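For intuition, the ratio formula above on example sizes (numbers are illustrative):

original_size = 120_000
compressed_size = 18_000
compression_ratio = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0
print(f"{compression_ratio:.1f}% reduction")  # 85.0% — comfortably above a 30% minimum ratio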

289-292: Don’t coalesce with or; preserve explicit False/0 from config.

Assign directly so that user-provided False/0 are respected.

-            self.compress_metrics = nr_config.compress_metrics or True
-            self.compress_metrics_minimum_ratio = (
-                nr_config.compress_metrics_minimum_ratio or 30
-            )
+            self.compress_metrics = nr_config.compress_metrics
+            self.compress_metrics_minimum_ratio = nr_config.compress_metrics_minimum_ratio

Optional (nearby, not in this hunk): also make the EU flag's default explicit instead of or False — behavior is equivalent for a bool/None field, but the intent is clearer:

# Current (line 288):
# self.is_eu_datacenter = nr_config.is_eu_datacenter or False
# Recommended:
self.is_eu_datacenter = (
    bool(nr_config.is_eu_datacenter) if nr_config.is_eu_datacenter is not None else False
)
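
The pitfall in one line — or treats every falsy value as missing (a quick illustration):

print(False or True)  # True — an explicit compress_metrics=False is silently overridden
print(0 or 30)        # 30 — an explicit 0% threshold becomes the default
print(None or 30)     # 30 — only this case should fall back to the default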

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (9)
holmes/plugins/toolsets/newrelic/newrelic.py (2)

113-114: Avoid blind except (Ruff BLE001).
Catch specific exceptions or re‑raise after logging.

Suggested tweak (already included above):

-except Exception:
+except (TypeError, ValueError):
     logging.warning("Failed to compress newrelic data", exc_info=True)

249-251: Constrain ratio via pydantic for safer config (0–100).
Prevents misconfiguration.

Apply:

-class NewrelicConfig(BaseModel):
+class NewrelicConfig(BaseModel):
@@
-    compress_metrics_minimum_ratio: int = 30  # 20 means 20% size reduction
+    compress_metrics_minimum_ratio: int = 30  # percent

And update imports (outside this hunk):

-from pydantic import BaseModel
+from pydantic import BaseModel, Field

Then refine the field:

-    compress_metrics_minimum_ratio: int = 30  # percent
+    compress_metrics_minimum_ratio: int = Field(30, ge=0, le=100)  # percent
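
With the constraint in place, misconfiguration fails fast at load time. A minimal sketch, trimmed to the one field:

from pydantic import BaseModel, Field, ValidationError


class NewrelicConfig(BaseModel):
    compress_metrics_minimum_ratio: int = Field(30, ge=0, le=100)  # percent


try:
    NewrelicConfig(compress_metrics_minimum_ratio=150)
except ValidationError as exc:
    print(exc)  # rejected at config-load time instead of silently skewing the gating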
holmes/plugins/toolsets/prometheus/data_compression.py (7)

19-38: Make format_labels robust (empty section names, no-labels) + fix typos and RUF010.

Prevents stray spaces/blank header lines when section_name == "", avoids raising on empty labels, fixes “outpout” typo, and uses f-string conversion flags.

Apply:

 def format_labels(
     labels: set[tuple[str, Any]], section_name: str, line_prefix: str = ""
 ) -> list[str]:
     lines = []
     if labels:
-        sorted_labels = sorted(
-            labels, key=lambda t: t[0]
-        )  # keep label list stable in the outpout by sorting them by key
+        sorted_labels = sorted(
+            labels, key=lambda t: t[0]
+        )  # keep label list stable in the output by sorting them by key
         if len(sorted_labels) <= 10:
-            labels_dict = {}
-            for k, v in sorted_labels:
-                labels_dict[k] = v
-            lines.append(f"{line_prefix}{section_name} {json.dumps(labels_dict)}")
+            labels_dict = {k: v for k, v in sorted_labels}
+            if section_name:
+                lines.append(f"{line_prefix}{section_name} {json.dumps(labels_dict)}")
+            else:
+                lines.append(f"{line_prefix}{json.dumps(labels_dict)}")
         else:
-            lines.append(line_prefix + section_name)
+            if section_name:
+                lines.append(line_prefix + section_name)
             for k, v in sorted_labels:
-                lines.append(f"{line_prefix}{INDENT_SPACES}- {str(k)}: {str(v)}")
+                lines.append(f"{line_prefix}{INDENT_SPACES}- {k!s}: {v!s}")
     else:
-        raise Exception("No label")
+        if section_name:
+            lines.append(f"{line_prefix}{section_name} NO_LABELS")
     return lines

44-57: Use explicit f-string conversion flags (RUF010).

Avoid str() inside f-strings.

Apply:

-            for value in data.values:
-                lines.append(f"{INDENT_SPACES}- {str(value[0])}: {str(value[1])}")
+            for value in data.values:
+                lines.append(f"{INDENT_SPACES}- {value[0]!s}: {value[1]!s}")

131-133: Prefer a custom, typed exception over bare Exception.

This path represents a formatting contract violation; be explicit.

Apply:

-    else:
-        raise Exception("Data has no metrics and is not a CompressedMetric")
+    else:
+        raise DataCompressionFormatError(
+            "Data has no metrics and is not a CompressedMetric"
+        )

And declare the exception near the top (see next comment).


6-8: Add a domain-specific exception type.

Improves signal and satisfies TRY002/TRY003 guidance elsewhere.

Apply:

 from holmes.plugins.toolsets.prometheus.model import PromSeries
 
 
+class DataCompressionFormatError(ValueError):
+    """Raised when compressed/grouped metric structures are malformed for formatting."""
+

171-177: Add return type; keep function pure in signature.

Mypy: annotate -> None.

Apply:

-def remove_labels(metric: CompressedMetric, remove_labels: set[tuple[str, Any]]):
+def remove_labels(metric: CompressedMetric, remove_labels: set[tuple[str, Any]]) -> None:

313-322: Fix incorrect docstring for find_most_common_label.

Docstring mentions different params/return.

Apply:

-    """
-    Find label keys and values that are most common across all label sets.
-    Returns labels that appear in ALL label sets with the same value.
-
-    Args:
-        label_sets: List of label dictionaries
-
-    Returns:
-        Dictionary of common labels (key -> most common value)
-    """
+    """
+    Find the (key, value) label pair most common across provided metrics,
+    ignoring pairs in `ignore_label_set`.
+
+    Returns:
+        (most_common_label, count) where `most_common_label` is a (key, value)
+        tuple or None when no label appears more than once, and `count` is the
+        number of metrics containing that pair.
+    """

284-288: Type‑annotate or remove set_default (holmes/plugins/toolsets/prometheus/data_compression.py:284)

Add a clear JSON-serializer signature and error message, or delete the unused helper (only its definition was found).

-def set_default(obj):
-    if isinstance(obj, set):
-        return list(obj)
-    raise TypeError
+def set_default(obj: Any) -> Any:
+    """JSON serializer hook: convert sets to lists."""
+    if isinstance(obj, set):
+        return list(obj)
+    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
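
For reference, how the hook plugs into json.dumps (a minimal sketch using the set_default above; label values are illustrative):

import json

payload = {"common_labels": {("namespace", "default"), ("pod", "api-0")}}
print(json.dumps(payload, default=set_default))
# -> {"common_labels": [["namespace", "default"], ["pod", "api-0"]]}
# (set iteration order is not guaranteed, so the list order may vary)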
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1107a2 and b537550.

📒 Files selected for processing (2)
  • holmes/plugins/toolsets/newrelic/newrelic.py (6 hunks)
  • holmes/plugins/toolsets/prometheus/data_compression.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
🧬 Code graph analysis (2)
holmes/plugins/toolsets/newrelic/newrelic.py (2)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • raw_metric_to_compressed_metric (158-168)
  • summarize_metrics (290-307)
  • remove_labels (171-176)
holmes/plugins/toolsets/prometheus/model.py (2)
  • PromResponse (19-255)
  • to_json (105-110)
holmes/plugins/toolsets/prometheus/data_compression.py (1)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/newrelic/newrelic.py

113-113: Do not catch blind exception: Exception

(BLE001)

holmes/plugins/toolsets/prometheus/data_compression.py

35-35: Use explicit conversion flag

Replace with conversion flag

(RUF010)


35-35: Use explicit conversion flag

Replace with conversion flag

(RUF010)


37-37: Create your own exception

(TRY002)


37-37: Avoid specifying long messages outside the exception class

(TRY003)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


60-62: Avoid specifying long messages outside the exception class

(TRY003)


132-132: Create your own exception

(TRY002)


132-132: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
  • GitHub Check: llm_evals
🔇 Additional comments (7)
holmes/plugins/toolsets/newrelic/newrelic.py (6)

13-16: LGTM: Prometheus compression utilities imported correctly.
The import placement and symbols look correct and typed usage downstream matches.


125-135: Return type modernization to PromResponse is sound.
Signature clarity improves callers; the simple pass-through return is fine.


257-259: Defaults mirrored in Toolset: OK.
Consistent with NewrelicConfig; keeps behavior predictable.


289-292: Nice: removed or coalescing; explicit values now respected.
This preserves False/0 semantics from config.


83-117: Fix compression gating and size ratio calculation (don’t JSON-dump a string; drop config check).
Compression is disabled if self._toolset.config is falsy, and the ratio is skewed by json.dumps on an already‑string compressed_data. Measure bytes, not pretty JSON size, and only set llm_data when non‑empty.

Apply:

 def compress_metrics_data(self, response: PromResponse) -> Optional[str]:
     llm_data: Optional[str] = None
     try:
-        if self._toolset.config and self._toolset.compress_metrics:
+        if self._toolset.compress_metrics:
             metrics = [
                 raw_metric_to_compressed_metric(metric, remove_labels=set())
                 for metric in response.data.result
             ]
 
             compressed_data = summarize_metrics(metrics)
-            original_size = len(json.dumps(response.to_json()))
-            compressed_size = len(json.dumps(compressed_data))
+            original_json = json.dumps(
+                response.to_json(), separators=(",", ":"), ensure_ascii=False
+            )
+            original_size = len(original_json.encode("utf-8"))
+            compressed_size = len(compressed_data.encode("utf-8"))
             compression_ratio = (
                 (1 - compressed_size / original_size) * 100
                 if original_size > 0
                 else 0
             )
 
-            if compression_ratio > self._toolset.compress_metrics_minimum_ratio:
+            if (
+                compression_ratio >= self._toolset.compress_metrics_minimum_ratio
+                and compressed_data.strip()
+            ):
                 # below this amount it's likely not worth mutating the response
                 llm_data = compressed_data
                 logging.info(
                     f"Compressed Newrelic metrics: {original_size:,} → {compressed_size:,} chars "
                     f"({compression_ratio:.1f}% reduction)"
                 )
             else:
                 logging.info(
                     f"Compressed Newrelic metrics: {original_size:,} → {compressed_size:,} chars "
                     f"({compression_ratio:.1f}% reduction). Original data will be used instead."
                 )
-    except Exception:
+    except (TypeError, ValueError):
         logging.warning("Failed to compress newrelic data", exc_info=True)
 
     return llm_data
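
Why bytes rather than characters matters once label values contain non-ASCII (a quick illustration):

s = "температура"                 # a non-ASCII label value
print(len(s))                     # 11 characters
print(len(s.encode("utf-8")))     # 22 bytes — UTF-8 needs 2 bytes per Cyrillic letter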

180-183: Fix fallback: check 'data.result' (not 'results') and only fall back when empty.
Current logic replaces valid Prom‑style JSON with raw NR records and uses the wrong key. This breaks consumers.

Apply:

-            return_result = prom_data.to_json()
-            if len(return_result.get("data", {}).get("results", [])):
-                return_result = result  # type: ignore[assignment]
+            return_result = prom_data.to_json()
+            # Fallback to raw NR records only if the Prom-like response has no series
+            if not return_result.get("data", {}).get("result"):
+                return_result = result  # type: ignore[assignment]
#!/bin/bash
# Verify no other incorrect "results" lookups remain.
rg -nP --type=py -C2 '\.get\("results"\)' holmes || true
holmes/plugins/toolsets/prometheus/data_compression.py (1)

207-213: Thanks for fixing the mutable default and removing the unused global.

globally_common_labels now uses Optional[...] = None with local initialization; previous leakage risk is gone.

Please also confirm no call sites relied on passing a mutable default by omission (should be no behavior change). Quick check:

#!/bin/bash
rg -nP 'group_metrics\s*\(' -g '!**/dist/**' -g '!**/.venv/**'

Also applies to: 216-222

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
holmes/plugins/toolsets/newrelic/newrelic.py (1)

249-259: Tidy up defaults/comments for compression ratio.

  • Comment says “20 means 20%” but default is 30; clarify comment to match.

Apply this diff:

-    compress_metrics_minimum_ratio: int = 30  # 20 means 20% size reduction
+    compress_metrics_minimum_ratio: int = 30  # percentage; e.g., 30 means 30% size reduction
@@
-    compress_metrics_minimum_ratio: int = 30
+    compress_metrics_minimum_ratio: int = 30  # percentage threshold
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b537550 and af0d9ec.

📒 Files selected for processing (1)
  • holmes/plugins/toolsets/newrelic/newrelic.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/newrelic/newrelic.py (3)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • raw_metric_to_compressed_metric (158-168)
  • summarize_metrics (290-307)
  • remove_labels (171-176)
holmes/plugins/toolsets/prometheus/model.py (2)
  • PromResponse (19-255)
  • to_json (105-110)
holmes/core/tools.py (2)
  • StructuredToolResult (78-103)
  • StructuredToolResultStatus (51-75)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/newrelic/newrelic.py

113-113: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (5)
holmes/plugins/toolsets/newrelic/newrelic.py (5)

13-16: LGTM: correct import of Prometheus compression helpers.
Consistent with the Prometheus path and placed at top per guidelines.


118-135: API return type change looks good.
Returning PromResponse centralizes formatting and aligns with Prometheus path.


83-117: Fix compression gating, size calc, and logging; avoid blind catch.

  • Don’t gate on self._toolset.config; rely solely on self._toolset.compress_metrics.
  • Compare actual payload sizes: minified JSON bytes for original vs UTF‑8 bytes of summarized text. Avoid json.dumps on an already‑string.
  • Skip empty summaries; use inclusive threshold (>=); use “New Relic” in logs.
  • Keep fail‑open but annotate the broad except to satisfy Ruff BLE001.

Apply this diff:

-    def compress_metrics_data(self, response: PromResponse) -> Optional[str]:
-        llm_data: Optional[str] = None
-        try:
-            if self._toolset.config and self._toolset.compress_metrics:
-                metrics = [
-                    raw_metric_to_compressed_metric(metric, remove_labels=set())
-                    for metric in response.data.result
-                ]
-
-                compressed_data = summarize_metrics(metrics)
-                original_size = len(json.dumps(response.to_json()))
-                compressed_size = len(json.dumps(compressed_data))
-                compression_ratio = (
-                    (1 - compressed_size / original_size) * 100
-                    if original_size > 0
-                    else 0
-                )
-
-                if compression_ratio > self._toolset.compress_metrics_minimum_ratio:
-                    # below this amount it's likely not worth mutating the response
-                    llm_data = compressed_data
-                    logging.info(
-                        f"Compressed Newrelic metrics: {original_size:,} → {compressed_size:,} chars "
-                        f"({compression_ratio:.1f}% reduction)"
-                    )
-                else:
-                    logging.info(
-                        f"Compressed Newrelic metrics: {original_size:,} → {compressed_size:,} chars "
-                        f"({compression_ratio:.1f}% reduction). Original data will be used instead."
-                    )
-        except Exception:
-            logging.warning("Failed to compress newrelic data", exc_info=True)
-
-        return llm_data
+    def compress_metrics_data(self, response: PromResponse) -> Optional[str]:
+        if not self._toolset.compress_metrics or not response.data.result:
+            return None
+        llm_data: Optional[str] = None
+        try:
+            metrics = [
+                raw_metric_to_compressed_metric(metric, remove_labels=set())
+                for metric in response.data.result
+            ]
+            compressed_data = summarize_metrics(metrics)
+            if not compressed_data.strip():
+                return None
+            original_size = len(
+                json.dumps(
+                    response.to_json(), separators=(",", ":"), ensure_ascii=False
+                ).encode("utf-8")
+            )
+            compressed_size = len(compressed_data.encode("utf-8"))
+            compression_ratio = (1 - compressed_size / original_size) * 100 if original_size > 0 else 0.0
+            if compression_ratio >= self._toolset.compress_metrics_minimum_ratio:
+                llm_data = compressed_data
+                logging.info(
+                    f"Compressed New Relic metrics: {original_size:,} → {compressed_size:,} bytes "
+                    f"({compression_ratio:.1f}% reduction)"
+                )
+            else:
+                logging.info(
+                    f"Compressed New Relic metrics: {original_size:,} → {compressed_size:,} bytes "
+                    f"({compression_ratio:.1f}% reduction). Keeping original data."
+                )
+        except Exception:  # noqa: BLE001 - non-critical path; fail open
+            logging.warning("Failed to compress New Relic data", exc_info=True)
+        return llm_data

180-187: Bug: fallback checks 'results' instead of 'result'; also propagate error status.

  • Prom-style payload uses data.result, not data.results.
  • Map StructuredToolResult.status from prom_data.status; don’t mark errors as SUCCESS.

Apply this diff:

-            return_result = prom_data.to_json()
-            if len(return_result.get("data", {}).get("results", [])):
-                return_result = result  # type: ignore[assignment]
-            return StructuredToolResult(
-                status=StructuredToolResultStatus.SUCCESS,
-                data=json.dumps(return_result, indent=2),
-                llm_data=self.compress_metrics_data(prom_data),
-                params=params,
-            )
+            return_result = prom_data.to_json()
+            # Fallback to raw NR records only if the Prom-like response has no series
+            if not return_result.get("data", {}).get("result"):
+                return_result = result  # type: ignore[assignment]
+            return StructuredToolResult(
+                status=(
+                    StructuredToolResultStatus.SUCCESS
+                    if prom_data.status == "success"
+                    else StructuredToolResultStatus.ERROR
+                ),
+                data=json.dumps(return_result, indent=2),
+                llm_data=self.compress_metrics_data(prom_data) if prom_data.status == "success" else None,
+                params=params,
+            )

288-292: Don’t coalesce config with or; preserve explicit False/0.
Assign directly (Pydantic defaults handle None) or coerce with bool().

-            self.is_eu_datacenter = nr_config.is_eu_datacenter or False
-            self.compress_metrics = nr_config.compress_metrics
-            self.compress_metrics_minimum_ratio = (
-                nr_config.compress_metrics_minimum_ratio
-            )
+            self.is_eu_datacenter = bool(nr_config.is_eu_datacenter)
+            self.compress_metrics = nr_config.compress_metrics
+            self.compress_metrics_minimum_ratio = nr_config.compress_metrics_minimum_ratio

Contributor

@aantn aantn left a comment

Leaving a partial review below.

In addition to the comments, please clean up data_compression.py. Some of the comments in the code are flat-out wrong. E.g., at first glance I saw # Find labels that appear in ALL sets (100% frequency) and assumed it was true, which led to a series of PR comments in other places because the code looked very strange under that assumption. Then I looked closer, saw the comment was incorrect, and deleted all of those PR comments. To keep things effective, please clean it up and I'll re-review.

@nherment nherment requested a review from aantn September 23, 2025 06:59
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
holmes/plugins/toolsets/newrelic/newrelic.py (1)

110-112: Specify exception type for better error handling.

The bare Exception catch is too broad and the static analysis warning is valid.

Apply this diff:

-        except Exception:
+        except (ValueError, TypeError):
             logging.warning("Failed to compress newrelic data", exc_info=True)

This targets the exceptions JSON serialization and the compression helpers can actually raise — json.dumps raises TypeError for unserializable objects and ValueError for circular references (the stdlib has no json.JSONEncodeError) — while maintaining the same logging behavior.

holmes/plugins/toolsets/prometheus/data_compression.py (2)

288-306: Clarify zero-value filtering condition.

The condition >= len(...) * 0.1 requires the zero-value metrics to number at least 10% of the non-zero metrics, but that intent isn't immediately clear from the code.

Apply this diff to clarify the intent:

 def compact_metrics(metrics_to_process: list[CompressedMetric]) -> str:
     summarized_text = ""
     filtered_metrics = pre_filter_metrics(metrics=metrics_to_process)
+    
+    # Include zero-value summary if they comprise at least 10% of the other metrics
     if (
         len(filtered_metrics.metrics_with_only_zero_values)
-        >= len(filtered_metrics.other_metrics) * 0.1
+        >= max(1, len(filtered_metrics.other_metrics) * 0.1)  # At least 1 metric, or 10% of others
     ):
         zero_metrics = group_metrics(filtered_metrics.metrics_with_only_zero_values)
         summarized_text += format_zero_values_metrics(zero_metrics)

This makes the threshold behavior clearer and ensures at least one zero-value metric is needed.
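
A quick check of the proposed gate (counts are hypothetical):

others_count = 5
zeros_count = 1
print(zeros_count >= max(1, others_count * 0.1))  # True — a single all-zero series qualifies
# Edge case max() guards against: with zero metrics on both sides, the original
# gate (0 >= 0) would pass and emit an empty zero-values section.
print(0 >= max(1, 0 * 0.1))                       # False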


309-347: Improve function documentation accuracy.

The docstring and comments don't accurately describe what the function returns.

Apply this diff:

 def find_most_common_label(
     metrics: list[CompressedMetric], ignore_label_set: Set[tuple[str, Any]]
 ) -> tuple[Optional[tuple[str, Any]], int]:
     """
-    Find label keys and values that are most common across all label sets.
-    Returns labels that appear in ALL label sets with the same value.
+    Find the most frequently occurring (key, value) label pair across metrics.
+    Returns the label pair that appears most frequently (but at least twice).

     Args:
-        label_sets: List of label dictionaries
+        metrics: List of CompressedMetric objects
+        ignore_label_set: Set of (key, value) pairs to skip

     Returns:
-        Dictionary of common labels (key -> most common value)
+        Tuple of (most_common_label_pair, occurrence_count)
     """

The function finds the most frequently occurring label pair (that appears at least twice), not labels that appear in ALL metrics.
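
Hypothetical usage consistent with the corrected docstring (the CompressedMetric constructor arguments are assumed from the surrounding diffs):

m1 = CompressedMetric(labels={("pod", "api-0"), ("ns", "prod")}, values=[])
m2 = CompressedMetric(labels={("pod", "api-1"), ("ns", "prod")}, values=[])
m3 = CompressedMetric(labels={("pod", "api-2"), ("ns", "dev")}, values=[])

label, count = find_most_common_label([m1, m2, m3], ignore_label_set=set())
assert label == ("ns", "prod") and count == 2  # most frequent pair, appearing at least twice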

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af0d9ec and 9efedb4.

📒 Files selected for processing (4)
  • holmes/plugins/toolsets/newrelic/newrelic.py (6 hunks)
  • holmes/plugins/toolsets/prometheus/data_compression.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (4 hunks)
  • tests/plugins/toolsets/prometheus/test_data_compression.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/plugins/toolsets/prometheus/test_data_compression.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
🧬 Code graph analysis (3)
holmes/plugins/toolsets/newrelic/newrelic.py (3)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • simplify_prometheus_metric_object (157-167)
  • compact_metrics (289-306)
  • remove_labels (170-175)
holmes/plugins/toolsets/prometheus/model.py (2)
  • PromResponse (19-255)
  • to_json (105-110)
holmes/core/tools.py (2)
  • StructuredToolResult (78-103)
  • StructuredToolResultStatus (51-75)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
holmes/plugins/toolsets/prometheus/data_compression.py (3)
  • simplify_prometheus_metric_object (157-167)
  • compact_metrics (289-306)
  • remove_labels (170-175)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
holmes/plugins/toolsets/prometheus/data_compression.py (1)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/newrelic/newrelic.py

110-110: Do not catch blind exception: Exception

(BLE001)

holmes/plugins/toolsets/prometheus/prometheus.py

1428-1428: Do not catch blind exception: Exception

(BLE001)

holmes/plugins/toolsets/prometheus/data_compression.py

38-38: Use explicit conversion flag

Replace with conversion flag

(RUF010)


38-38: Use explicit conversion flag

Replace with conversion flag

(RUF010)


40-40: Create your own exception

(TRY002)


40-40: Avoid specifying long messages outside the exception class

(TRY003)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


60-62: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Create your own exception

(TRY002)


133-133: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
🔇 Additional comments (16)
holmes/plugins/toolsets/newrelic/newrelic.py (6)

13-16: Good implementation aligning with the broader compression strategy.

The imports bring in the compression utilities developed for Prometheus data processing, indicating consistent toolset strategy across monitoring systems.


115-131: LGTM on method signature change.

The updated return type PromResponse instead of Dict improves type safety and aligns with the structured response model used throughout the codebase.


175-183: Good integration of compression pipeline.

The integration properly calls to_prometheus_records to get a PromResponse, then uses compact_metrics_data to generate the compressed llm_data for LLM consumption while keeping the full data available via prom_data.to_json().


244-245: LGTM on configuration additions.

The configuration fields follow established patterns and provide reasonable defaults for compression behavior.

Also applies to: 252-253


284-285: Good configuration assignment.

Direct assignment from config avoids the issues with boolean coalescing using or that would turn explicit False values into defaults.


83-113: Fix size calculation.

The compression size calculation is incorrect: the compacted payload is already a string, so wrapping it in json.dumps inflates the measured size.

Apply this diff:

-            compacted_data = compact_metrics(metrics)
-            original_size = len(json.dumps(response.to_json()))
-            compacted_size = len(json.dumps(compacted_data))
+            compacted_data = compact_metrics(metrics)
+            original_size = len(json.dumps(response.to_json(), separators=(",", ":"), ensure_ascii=False))
+            compacted_size = len(compacted_data)
             compaction_ratio = (
                 (1 - compacted_size / original_size) * 100 if original_size > 0 else 0
             )

             if compaction_ratio > self._toolset.compact_metrics_minimum_ratio:
                 # below this amount it's likely not worth mutating the response
                 llm_data = compacted_data
                 logging.debug(
                     f"Compressed Newrelic metrics: {original_size:,} → {compacted_size:,} chars "
                     f"({compaction_ratio:.1f}% reduction)"
                 )
             else:
                 logging.debug(
                     f"Compressed Newrelic metrics: {original_size:,} → {compacted_size:,} chars "
                     f"({compaction_ratio:.1f}% reduction). Original data will be used instead."
                 )

Issues:

  • compacted_data is already a string, so len(json.dumps(compacted_data)) adds JSON encoding overhead which skews the compression ratio (demonstrated in the sketch below)
  • JSON serialization should use compact format for accurate size measurement
  • Logging level was already correct (debug), so no change needed there
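
To see the inflation the first bullet describes (a minimal sketch):

import json

compacted = "line one\nline two"    # compact_metrics already returns plain text
print(len(compacted))               # 17 characters
print(len(json.dumps(compacted)))   # 20 — quotes and the escaped newline inflate the measurement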
holmes/plugins/toolsets/prometheus/prometheus.py (6)

24-27: Good import alignment with compression strategy.

The imports bring in the same compression utilities used in the NewRelic toolset, showing consistent approach across monitoring integrations.


103-104: LGTM on configuration additions.

The configuration fields are well-documented and provide reasonable defaults for compression behavior. The 20% threshold for compress_range_metrics_minimum_ratio is sensible.


1376-1377: Good initialization of compression pipeline.

The llm_data variable is properly initialized to track compressed payloads for LLM consumption.


1470-1470: Good integration of llm_data field.

The llm_data field is properly passed to StructuredToolResult, enabling the compressed payload to be available for LLM consumption while maintaining the full structured response.


1382-1431: Fix compression size calculation and verify large payload handling.

The compression logic has the same size calculation issue seen in the NewRelic toolset, and there may be issues with large payload handling.

Apply this diff:

                     try:
                         if self.toolset.config.compress_range_metrics:
                             metrics_list_dict = result_data.get("result")
+                            if not isinstance(metrics_list_dict, list):
+                                metrics_list_dict = []
                             raw_metrics = [
                                 PromSeries(**metric) for metric in metrics_list_dict
                             ]
                             metrics = [
                                 simplify_prometheus_metric_object(
                                     metric, remove_labels=set()
                                 )
                                 for metric in raw_metrics
                             ]

                             compressed_data = compact_metrics(metrics)
                             response_data["raw_data"] = result_data
-                            original_size = len(json.dumps(result_data))
-                            compressed_size = len(json.dumps(compressed_data))
+                            original_size = len(json.dumps(result_data, separators=(",", ":"), ensure_ascii=False))
+                            compressed_size = len(compressed_data)
                             compression_ratio = (
                                 (1 - compressed_size / original_size) * 100
                                 if original_size > 0
                                 else 0
                             )

                             if (
                                 compression_ratio
                                 > self.toolset.config.compress_range_metrics_minimum_ratio
                             ):
                                 # below this amount it's likely not worth mutating the response
                                 llm_data = compressed_data
-                                logging.info(
+                                logging.debug(
                                     f"Compressed Prometheus data: {original_size:,} → {compressed_size:,} chars "
                                     f"({compression_ratio:.1f}% reduction)"
                                 )
                             else:
-                                logging.info(
+                                logging.debug(
                                     f"Compressed Prometheus data: {original_size:,} → {compressed_size:,} chars "
                                     f"({compression_ratio:.1f}% reduction). Original data will be used instead."
                                 )

Issues:

  • compressed_data is a string, so len(json.dumps(compressed_data)) adds unnecessary JSON encoding overhead
  • Need to guard against non-list metrics_list_dict
  • Logging should be debug level to avoid CLI spam

1433-1463: Unify size-gating and summary source for Prometheus query responses

  • holmes/plugins/toolsets/prometheus/prometheus.py — lines 1145–1146 and 1433–1436: two places compute data_size_chars. The second block measures using llm_data (if present) but still builds the summary from response_data["data"] (original data); the first block never considers llm_data at all.
  • Decide whether the size gate should use the final payload to be sent (llm_data if present) or always the original data; then make the measurement and summary use the same source (or document the intentional mismatch) and update the two sites accordingly.
holmes/plugins/toolsets/prometheus/data_compression.py (4)

22-41: Good label formatting with defensive handling.

The function handles both compact (≤10 labels) and expanded formatting appropriately. The exception on line 40 for empty labels provides a clear failure mode.


206-211: Good fix of mutable default argument.

The function signature properly uses Optional[set[...]] = None instead of a mutable default, preventing state leakage between calls.


14-17: Add Pydantic forward reference resolution.

Self-referential models require explicit forward reference resolution to avoid runtime validation errors.

Apply this diff after the Group class definition:

 class Group(BaseModel):
     common_labels: set[tuple[str, Any]]
     metrics: Sequence[Union["Group", CompressedMetric]]

+
+# Ensure forward refs are resolved (Pydantic v2 and fallback to v1).
+try:  # pydantic v2
+    Group.model_rebuild()
+except AttributeError:  # pydantic v1
+    Group.update_forward_refs()

In V2, model_rebuild() replaced update_forward_refs() from V1, and you may want to call model_rebuild() proactively when dealing with recursive models.
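
For context, a minimal self-referential model showing the proactive rebuild (model names here are illustrative, not from the PR):

from typing import List, Union

from pydantic import BaseModel


class Leaf(BaseModel):
    name: str


class Node(BaseModel):
    children: List[Union["Node", Leaf]] = []  # "Node" is a forward reference here


Node.model_rebuild()  # Pydantic v2; v1 would call Node.update_forward_refs()

tree = Node(children=[Node(children=[Leaf(name="leaf-0")])])
print(tree.model_dump())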


82-90: Guard against empty labels in formatting.

format_labels raises when given empty labels, so a guard is needed wherever grouping may strip all labels. No diff is required here: the current implementation already handles this correctly by only calling format_labels when data.labels is truthy, and that existing guard is essential — keep it.

 def format_zero_values_data(
     data: Union[Group, CompressedMetric], line_prefix: str = ""
 ) -> str:
     lines = []
     if isinstance(data, CompressedMetric):
         if data.labels:
             lines.extend(format_labels(labels=data.labels, section_name=""))
         txt = "\n".join([line_prefix + line for line in lines])
         return txt

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (6)
holmes/plugins/toolsets/prometheus/data_compression.py (6)

37-39: Use explicit f-string conversion flags (RUF010) and simplify label dict build.

Switch from str() calls to !s and compact the dict creation.

-        if len(sorted_labels) <= 10:
-            labels_dict = {}
-            for k, v in sorted_labels:
-                labels_dict[k] = v
-            lines.append(f"{line_prefix}{section_name} {json.dumps(labels_dict)}")
+        if len(sorted_labels) <= 10:
+            labels_dict = {k: v for k, v in sorted_labels}
+            lines.append(f"{line_prefix}{section_name} {json.dumps(labels_dict)}")
         else:
             lines.append(line_prefix + section_name)
             for k, v in sorted_labels:
-                lines.append(f"{line_prefix}{INDENT_SPACES}- {str(k)}: {str(v)}")
+                lines.append(f"{line_prefix}{INDENT_SPACES}- {k!s}: {v!s}")

50-53: Use explicit f-string conversion flags (RUF010).

-            for value in data.values:
-                lines.append(f"{INDENT_SPACES}- {str(value[0])}: {str(value[1])}")
+            for value in data.values:
+                lines.append(f"{INDENT_SPACES}- {value[0]!s}: {value[1]!s}")

39-41: Avoid bare Exception; raise a module-specific error.

Keeps errors meaningful and satisfies linters (TRY002/TRY003).

-    else:
-        raise Exception("No label")
+    else:
+        raise NoLabelsError("No labels provided")

29-29: Typo: “outpout” → “output”.

-        )  # keep label list stable in the outpout by sorting them by key
+        )  # keep label list stable in the output by sorting them by key

316-318: Typo: “occurence” → “occurrence”.

-        The most common label and its occurence count across all metrics: ((key, value), count)
+        The most common label and its occurrence count across all metrics: ((key, value), count)

191-199: Zero detection is brittle for "0.0"/"0e0" strings; consider numeric parsing.

Current check misses zero-like strings (e.g., "0.0"). Recommend Decimal-based check to avoid float issues.

Example:

from decimal import Decimal, InvalidOperation

for value in metric.values:
    v = value[1]
    try:
        is_zero = (Decimal(str(v)) == 0)
    except InvalidOperation:
        is_zero = (v in ("0", 0))
    if not is_zero:
        other_metrics.append(metric)
        metric_has_non_zero_value = True
        break
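
Why Decimal catches the zero-like forms that string comparison misses (a quick illustration):

from decimal import Decimal

print("0.0" == "0")          # False — string comparison misses zero-like forms
print(Decimal("0.0") == 0)   # True
print(Decimal("0e0") == 0)   # True
print(Decimal("0.1") == 0)   # False — exact arithmetic, no float rounding involved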
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9efedb4 and b0223ad.

📒 Files selected for processing (1)
  • holmes/plugins/toolsets/prometheus/data_compression.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/prometheus/data_compression.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/prometheus/data_compression.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/prometheus/data_compression.py
🧬 Code graph analysis (1)
holmes/plugins/toolsets/prometheus/data_compression.py (1)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/prometheus/data_compression.py

38-38: Use explicit conversion flag

Replace with conversion flag

(RUF010)


38-38: Use explicit conversion flag

Replace with conversion flag

(RUF010)


40-40: Create your own exception

(TRY002)


40-40: Avoid specifying long messages outside the exception class

(TRY003)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


60-62: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Create your own exception

(TRY002)


133-133: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
🔇 Additional comments (3)
holmes/plugins/toolsets/prometheus/data_compression.py (3)

86-90: Good guard against empty labels in zero-values path.

Prevents the previous crash when grouping strips all labels.


257-275: In-place mutation of input metrics; verify callers expect this side effect.

remove_labels mutates metric.labels within the provided list. If inputs are reused elsewhere, this can surprise callers.

Would you prefer to:

  • keep in-place mutation (document clearly), or
  • work on deep copies to avoid side effects? (a minimal copy-based variant is sketched below)
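
If the copy route is preferred, a minimal sketch (the helper name is hypothetical; CompressedMetric's fields are assumed from the diffs above):

import copy
from typing import Any


def remove_labels_pure(
    metric: CompressedMetric, labels: set[tuple[str, Any]]
) -> CompressedMetric:
    clone = copy.deepcopy(metric)   # leave the caller's metric untouched
    clone.labels -= labels
    return clone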

14-17: Resolve Pydantic forward refs at import-time to prevent runtime validation errors.

Add a v2/v1-compatible rebuild right after the Group class.

 class Group(BaseModel):
     common_labels: set[tuple[str, Any]]
     metrics: Sequence[Union["Group", CompressedMetric]]
 
+# Ensure forward refs are resolved (Pydantic v2 with v1 fallback)
+try:
+    Group.model_rebuild()  # pydantic v2
+except AttributeError:
+    Group.update_forward_refs()  # pydantic v1

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
holmes/plugins/toolsets/prometheus/data_compression.py (1)

22-41: Avoid generic Exception; minor formatting nits

  • Raise ValueError (or custom) instead of bare Exception.
  • Use explicit f-string conversions (!s) to appease linters.
-        )  # keep label list stable in the outpout by sorting them by key
+        )  # keep label list stable in the output by sorting by key
@@
-            for k, v in sorted_labels:
-                lines.append(f"{line_prefix}{INDENT_SPACES}- {str(k)}: {str(v)}")
+            for k, v in sorted_labels:
+                lines.append(f"{line_prefix}{INDENT_SPACES}- {k!s}: {v!s}")
     else:
-        raise Exception("No label")
+        raise ValueError("labels is empty")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0223ad and d03a0a1.

📒 Files selected for processing (4)
  • holmes/plugins/toolsets/newrelic/newrelic.py (6 hunks)
  • holmes/plugins/toolsets/prometheus/data_compression.py (1 hunks)
  • holmes/plugins/toolsets/prometheus/prometheus.py (4 hunks)
  • tests/plugins/toolsets/prometheus/test_data_compression.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/plugins/toolsets/prometheus/test_data_compression.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets must be organized under holmes/plugins/toolsets/ as either {name}.yaml files or directories

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/plugins/toolsets/newrelic/newrelic.py
  • holmes/plugins/toolsets/prometheus/data_compression.py
  • holmes/plugins/toolsets/prometheus/prometheus.py
🧬 Code graph analysis (3)
holmes/plugins/toolsets/newrelic/newrelic.py (3)
holmes/plugins/toolsets/prometheus/data_compression.py (2)
  • simplify_prometheus_metric_object (155-165)
  • compact_metrics (283-297)
holmes/plugins/toolsets/prometheus/model.py (2)
  • PromResponse (19-255)
  • to_json (105-110)
holmes/core/tools.py (2)
  • StructuredToolResult (78-103)
  • StructuredToolResultStatus (51-75)
holmes/plugins/toolsets/prometheus/data_compression.py (1)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
holmes/plugins/toolsets/prometheus/prometheus.py (2)
holmes/plugins/toolsets/prometheus/data_compression.py (2)
  • simplify_prometheus_metric_object (155-165)
  • compact_metrics (283-297)
holmes/plugins/toolsets/prometheus/model.py (1)
  • PromSeries (9-11)
🪛 Ruff (0.13.1)
holmes/plugins/toolsets/newrelic/newrelic.py

110-110: Do not catch blind exception: Exception

(BLE001)

holmes/plugins/toolsets/prometheus/data_compression.py

38-38: Use explicit conversion flag

Replace with conversion flag

(RUF010)


38-38: Use explicit conversion flag

Replace with conversion flag

(RUF010)


40-40: Create your own exception

(TRY002)


40-40: Avoid specifying long messages outside the exception class

(TRY003)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


52-52: Use explicit conversion flag

Replace with conversion flag

(RUF010)


60-62: Avoid specifying long messages outside the exception class

(TRY003)

holmes/plugins/toolsets/prometheus/prometheus.py

1428-1428: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
🔇 Additional comments (11)
holmes/plugins/toolsets/newrelic/newrelic.py (4)

13-16: LGTM on imports

Imports of the Prometheus compression helpers are correct and placed at top.


175-182: Fallback when no Prom-style series; compute llm_data only when useful

Avoid returning empty Prom payloads; fall back to NR raw records when no series. Also compute llm_data only when we actually have Prom-style data.

-            prom_data = self.to_prometheus_records(result, params=enriched_params)
-
-            return_result = prom_data.to_json()
-            return StructuredToolResult(
+            prom_data = self.to_prometheus_records(result, params=enriched_params)
+            return_result = prom_data.to_json()
+            llm_data = None
+            if return_result.get("data", {}).get("result"):
+                llm_data = self.compact_metrics_data(prom_data)
+            else:
+                # Fallback to raw NR records if no Prom-style series
+                return_result = result  # type: ignore[assignment]
+            return StructuredToolResult(
                 status=StructuredToolResultStatus.SUCCESS,
                 data=json.dumps(return_result, indent=2),
-                llm_data=self.compact_metrics_data(prom_data),
+                llm_data=llm_data,
                 params=params,
             )

83-114: Fix compaction gating, size calculation, and broad exception

  • Respect toolset flag (skip when disabled).
  • Measure compressed size as len(compacted_data) (it’s already a str).
  • Use compact JSON for original size to avoid whitespace skew.
  • Don’t treat empty compacted_data as a 100% reduction.
  • Avoid blind except.

Apply:

 def compact_metrics_data(self, response: PromResponse) -> Optional[str]:
     llm_data: Optional[str] = None
     try:
+        if not self._toolset.compact_metrics:
+            return None
         metrics = [
             simplify_prometheus_metric_object(metric, labels_to_remove=set())
             for metric in response.data.result
         ]
 
         compacted_data = compact_metrics(metrics)
-        original_size = len(json.dumps(response.to_json()))
-        compacted_size = len(json.dumps(compacted_data))
+        original_size = len(json.dumps(response.to_json(), separators=(",", ":"), ensure_ascii=False))
+        compacted_size = len(compacted_data)
         compaction_ratio = (
             (1 - compacted_size / original_size) * 100 if original_size > 0 else 0
         )
 
-        if compaction_ratio > self._toolset.compact_metrics_minimum_ratio:
+        if compacted_data and compaction_ratio > self._toolset.compact_metrics_minimum_ratio:
             # below this amount it's likely not worth mutating the response
             llm_data = compacted_data
             logging.debug(
                 f"Compressed Newrelic metrics: {original_size:,} → {compacted_size:,} chars "
                 f"({compaction_ratio:.1f}% reduction)"
             )
         else:
             logging.debug(
                 f"Compressed Newrelic metrics: {original_size:,} → {compacted_size:,} chars "
                 f"({compaction_ratio:.1f}% reduction). Original data will be used instead."
             )
-    except Exception:
+    except (KeyError, TypeError, ValueError):
         logging.warning("Failed to compress newrelic data", exc_info=True)
 
     return llm_data

283: Don’t coalesce with “or”; make the default explicit

The compact_metrics assignments on lines 284-285 already use direct assignment, so no change is needed there. For line 283, make the EU flag explicit:

-            self.is_eu_datacenter = nr_config.is_eu_datacenter or False
+            self.is_eu_datacenter = bool(nr_config.is_eu_datacenter)
holmes/plugins/toolsets/prometheus/prometheus.py (3)

24-29: LGTM on imports

Compression utilities are imported once at module top and typed imports are correct.


103-105: Sane defaults for compression config

New config flags (compress_range_metrics, compress_range_metrics_minimum_ratio) look good and typed.


1376-1471: Fix compression sizing, guard inputs, use debug logs, and gate size on original payload; drop oversized data

  • Guard result_data["result"] type before constructing PromSeries.
  • Measure compressed as len(compressed_data) (string), and original with compact JSON.
  • Use logging.debug instead of info to reduce noise.
  • Gate size limit using the original response_data["data"] size, not llm_data.
  • When over limit, add summary and remove large “data” (and “raw_data”).
                 llm_data: Optional[str] = None
 
                 if self.toolset.config.tool_calls_return_data:
                     result_data = data.get("data", {})
                     response_data["data"] = result_data
 
                     try:
                         if self.toolset.config.compress_range_metrics:
-                            metrics_list_dict = result_data.get("result")
-                            raw_metrics = [
-                                PromSeries(**metric) for metric in metrics_list_dict
-                            ]
+                            metrics_list_dict = result_data.get("result")
+                            if isinstance(metrics_list_dict, list):
+                                raw_metrics = [PromSeries(**metric) for metric in metrics_list_dict]
+                            else:
+                                raw_metrics = []
                             metrics = [
                                 simplify_prometheus_metric_object(
                                     metric, labels_to_remove=set()
                                 )
                                 for metric in raw_metrics
                             ]
 
                             compressed_data = compact_metrics(metrics)
                             response_data["raw_data"] = result_data
-                            original_size = len(json.dumps(result_data))
-                            compressed_size = len(json.dumps(compressed_data))
+                            original_size = len(json.dumps(result_data, separators=(",", ":"), ensure_ascii=False))
+                            compressed_size = len(compressed_data)
                             compression_ratio = (
                                 (1 - compressed_size / original_size) * 100
                                 if original_size > 0
                                 else 0
                             )
 
                             if (
                                 compression_ratio
                                 > self.toolset.config.compress_range_metrics_minimum_ratio
                             ):
                                 # below this amount it's likely not worth mutating the response
                                 llm_data = compressed_data
-                                logging.info(
+                                logging.debug(
                                     f"Compressed Prometheus data: {original_size:,} → {compressed_size:,} chars "
                                     f"({compression_ratio:.1f}% reduction)"
                                 )
                             else:
-                                logging.info(
+                                logging.debug(
                                     f"Compressed Prometheus data: {original_size:,} → {compressed_size:,} chars "
                                     f"({compression_ratio:.1f}% reduction). Original data will be used instead."
                                 )
 
                             response_data["compression_info"] = {
                                 "original_size_chars": original_size,
                                 "compressed_size_chars": compressed_size,
                                 "compression_ratio_percent": round(
                                     compression_ratio, 1
                                 ),
                             }
-                    except Exception as e:
+                    except (KeyError, TypeError, ValueError) as e:
                         logging.warning(
                             f"Failed to compress data: {e}, the original data will be used"
                         )
 
-                    data_str_preview = json.dumps(result_data)
-                    if llm_data:
-                        data_str_preview = llm_data
-                    data_size_chars = len(data_str_preview)
+                    data_size_chars = len(json.dumps(response_data.get("data", {}), separators=(",", ":"), ensure_ascii=False))
 
-                    # Provide summary if data is still too large after compression
-                    # In this case the summary is based on the original data and not the compressed data
+                    # If still too large after optional compression, return summary instead of full data
                     if (
                         self.toolset.config.query_response_size_limit
                         and data_size_chars
                         > self.toolset.config.query_response_size_limit
                     ):
                         response_data["data_summary"] = (
                             create_data_summary_for_large_result(
                                 response_data["data"],
                                 query,
                                 data_size_chars,
                                 is_range_query=True,
                             )
                         )
-                        # Remove the large data and keep only summary
+                        # Remove large payloads to keep the response small
+                        if "data" in response_data:
+                            del response_data["data"]
+                        if "raw_data" in response_data:
+                            del response_data["raw_data"]
                         logging.info(
                             f"Prometheus range query returned large dataset: "
                             f"{response_data['data_summary'].get('series_count', 0)} series, "
                             f"{data_size_chars:,} characters (limit: {self.toolset.config.query_response_size_limit:,}). "
                             f"Returning summary instead of full data."
                         )
                         # Also add character info to the summary for debugging
                         response_data["data_summary"]["_debug_info"] = (
                             f"Data size: {data_size_chars:,} chars exceeded limit of {self.toolset.config.query_response_size_limit:,} chars"
                         )
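
Condensed, the over-limit branch measures the original data and, if it exceeds the limit, keeps only the summary. A sketch under assumed names (trim_if_oversized and the summarize callable are illustrative):

 import json
 import logging
 from typing import Any, Callable, Dict

 def trim_if_oversized(
     response: Dict[str, Any],
     limit: int,
     summarize: Callable[[Any], Dict[str, Any]],
 ) -> Dict[str, Any]:
     size = len(json.dumps(response.get("data", {}), separators=(",", ":"), ensure_ascii=False))
     if limit and size > limit:
         response["data_summary"] = summarize(response.get("data"))
         # Drop both the original and the raw copy so the response stays small.
         response.pop("data", None)
         response.pop("raw_data", None)
         logging.info("Returning summary instead of %s chars of data", f"{size:,}")
     return response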
holmes/plugins/toolsets/prometheus/data_compression.py (4)

14-17: Resolve Pydantic forward refs for Group

Ensure forward refs are built at import time to avoid runtime validation issues.

 class Group(BaseModel):
     common_labels: set[tuple[str, Any]]
     metrics: Sequence[Union["Group", CompressedMetric]]
 
+try:  # Pydantic v2
+    Group.model_rebuild()
+except AttributeError:  # Pydantic v1
+    Group.update_forward_refs()
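
A self-contained demonstration of the forward-reference problem and the rebuild call, using Pydantic v2 and toy models rather than the module's real classes:

 from typing import Sequence, Union
 from pydantic import BaseModel

 class Leaf(BaseModel):
     value: int

 class Node(BaseModel):
     children: Sequence[Union["Node", Leaf]]

 # Resolve the "Node" string annotation; without an explicit rebuild,
 # validation can fail if Pydantic could not resolve it automatically.
 Node.model_rebuild()

 tree = Node(children=[Node(children=[Leaf(value=1)]), Leaf(value=2)])
 print(type(tree.children[1]).__name__)  # Leaf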

44-79: Type-guard Group branch; f-string conversion

Use isinstance to satisfy mypy and avoid attribute access on Union.

-    elif data.metrics:
+    elif isinstance(data, Group) and data.metrics:
         if not data.common_labels:
             raise ValueError(
                 f"Group has no labels: labels={data.common_labels} metrics={len(data.metrics)}"
             )
@@
-            for value in data.values:
-                lines.append(f"{INDENT_SPACES}- {str(value[0])}: {str(value[1])}")
+            for value in data.values:
+                lines.append(f"{INDENT_SPACES}- {value[0]!s}: {value[1]!s}")
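
A minimal example of the narrowing pattern mypy expects when walking a Union (toy types, not the module's own):

 from typing import List, Union

 class Metric:
     def __init__(self, name: str) -> None:
         self.name = name

 class Group:
     def __init__(self, metrics: List["Item"]) -> None:
         self.metrics = metrics

 Item = Union[Group, Metric]

 def count_metrics(item: Item) -> int:
     # isinstance narrows the Union, so attribute access type-checks cleanly.
     if isinstance(item, Group):
         return sum(count_metrics(child) for child in item.metrics)
     return 1

 print(count_metrics(Group([Metric("up"), Group([Metric("rate")])])))  # 2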

168-176: LGTM on remove_labels signature and behavior

Typed signature, no mutable defaults, and clear semantics.


283-297: LGTM on compact_metrics pipeline

Good pre-filtering, grouping, and rendering flow.
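
The payoff of the grouping step is factoring out labels shared by every series so they are emitted once. A standalone sketch of that idea (not the module's implementation):

 from typing import Dict, List, Set, Tuple

 def shared_labels(series: List[Dict[str, str]]) -> Set[Tuple[str, str]]:
     # A (name, value) pair is "common" only if every series carries it.
     if not series:
         return set()
     common = set(series[0].items())
     for labels in series[1:]:
         common &= set(labels.items())
     return common

 series = [
     {"job": "api", "pod": "api-1", "status": "404"},
     {"job": "api", "pod": "api-2", "status": "404"},
 ]
 print(shared_labels(series))  # {('job', 'api'), ('status', '404')} in some order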

@nherment nherment requested a review from moshemorad September 25, 2025 07:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
holmes/core/tools_utils/tool_context_window_limiter.py (1)

14-17: Use an explicit is None check for size

Relying on truthiness treats 0 the same as “unset”, so we’d keep recomputing even if a tool legitimately produced a zero-token payload. Let’s gate the recalculation strictly on None.

-        if not tool_call_result.size:
+        if tool_call_result.size is None:
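
The difference is easy to demonstrate: under a truthiness check, a legitimately computed size of 0 looks unset and gets recomputed on every call (standalone sketch with illustrative names):

 from typing import List, Optional

 def ensure_size(size: Optional[int], recomputations: List[int]) -> int:
     if size is None:  # "if not size" would also match a cached size of 0
         recomputations.append(1)
         size = 0      # pretend the freshly counted token total is zero
     return size

 calls: List[int] = []
 ensure_size(0, calls)     # cached zero: the None check skips recomputation
 ensure_size(None, calls)  # truly unset: recompute once
 assert len(calls) == 1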
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95027b8 and 303bf6d.

📒 Files selected for processing (1)
  • holmes/core/tools_utils/tool_context_window_limiter.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always place Python imports at the top of the file, not inside functions or methods
All Python code must include type hints (mypy enforced)

Files:

  • holmes/core/tools_utils/tool_context_window_limiter.py
holmes/{core,plugins}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

All tools must return detailed error messages from underlying APIs, including executed command/query, time ranges/parameters, and full API error response; 'no data' responses must specify what was searched and where

Files:

  • holmes/core/tools_utils/tool_context_window_limiter.py
🧬 Code graph analysis (1)
holmes/core/tools_utils/tool_context_window_limiter.py (3)
holmes/core/models.py (1)
  • as_tool_call_message (30-42)
holmes/core/tools.py (1)
  • StructuredToolResultStatus (51-75)
holmes/utils/sentry_helper.py (1)
  • capture_toolcall_contains_too_many_tokens (22-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks

Copy link
Contributor

Results of HolmesGPT evals

  • ask_holmes: 33/36 test cases were successful, 1 regression, 1 setup failure
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 17_oom_kill
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 33_cpu_metrics_discovery
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools ⚠️
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0]
ask 93_calling_datadog[1]
ask 93_calling_datadog[2]

Legend

  • ✅ the test was successful
  • ➖ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • 🚫 the test was throttled by API rate limits/overload
  • ❌ the test failed and should be fixed before merging the PR
