-
Notifications
You must be signed in to change notification settings - Fork 138
Implement anonymous usage metrics collection with opt-out mechanism #2449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: lujunsan <[email protected]>
|
@claude review this PR |
|
Claude finished @lujunsan's task —— View job Code Review for PR #2449I've completed a thorough review of the anonymous usage metrics implementation. Overall, this is a well-designed feature with good attention to privacy, error handling, and integration points. Below are my findings organized by category. SummaryThis PR adds anonymous usage metrics collection that tracks tool calls to help ToolHive development. The implementation is solid with a few areas that could benefit from improvements around error handling, testing, and edge cases. Key strengths:
Areas for improvement:
Critical Issues1. Potential Goroutine Leak on Rapid Create/DestroyLocation: The func (c *Collector) Start() {
c.started.Store(true)
go c.flushLoop()
}Issue: If Recommendation: Add a check to prevent multiple goroutines: func (c *Collector) Start() {
if c.started.Swap(true) {
return // Already started
}
go c.flushLoop()
}2. Midnight Boundary Reset Loses Recent CountsLocation: The midnight boundary logic flushes the previous day's count but includes up to 15 minutes of calls from the current day (comment acknowledges this). While the comment says this is acceptable, there's a larger issue: // Reset for the new day
c.currentDate = currentDate
c.counter.Store(0)
return nilIssue: If the flush to the API fails (network error, API down), the count is permanently lost because we reset to 0 and return nil. The error is only logged as a warning. Recommendation: Consider one of these approaches:
3. Anonymous ID Race ConditionLocation: anonymousID, err := updates.TryGetAnonymousID()
if err != nil {
return fmt.Errorf("failed to get anonymous ID: %w", err)
}
// Skip sending if anonymous ID is not initialized yet
if anonymousID == "" {
logger.Debugf("Skipping metrics send - anonymous ID not yet initialized")
return nil
}Issue: There's a race condition between when ToolHive starts and when the update checker creates the anonymous ID. If metrics try to flush before the update checker runs, metrics are silently dropped with just a debug log. Recommendation:
Major Issues4. Insufficient Test Coverage for Critical PathsMissing test cases:
Recommendation: Add tests for these scenarios: func TestCollector_MidnightBoundaryTransition(t *testing.T) {
// Test count flush and reset across midnight UTC
}
func TestCollector_ConcurrentIncrements(t *testing.T) {
// Test thread-safety with multiple goroutines
}
func TestCollector_FlushFailureDoesNotBlock(t *testing.T) {
// Test that flush failures don't break the collector
}5. No Shutdown Timeout HandlingLocation: func (c *Collector) Shutdown(ctx context.Context) {
// ...
select {
case <-c.doneCh:
logger.Debugf("Collector stopped cleanly")
case <-ctx.Done():
logger.Warnf("Collector shutdown timed out: %v", ctx.Err())
}
}Issue: If shutdown times out, the goroutine continues running. The middleware's Recommendation: Consider adding a way to force-stop the goroutine or document that zombie goroutines are acceptable for this non-critical feature. Minor Issues6. Inconsistent Error Handling in FlushLocation: if err := c.client.SendMetrics(c.instanceID, record); err != nil {
logger.Warnf("Failed to send metrics: %v", err)
// Don't return error - we'll retry on next flush
return nil
}This returns Recommendation: Clarify the behavior:
7. Fixed 15-Minute Flush IntervalLocation: const flushInterval = 15 * time.MinuteIssue: This is hardcoded and not configurable. For testing or high-frequency scenarios, you might want to adjust this. Recommendation: Consider making this configurable via environment variable (with 15min default) for testing purposes. 8. UserAgent BuildType Fallback LogicLocation: buildType := versions.BuildType
if buildType == "" {
buildType = "local_build"
}Issue: The Recommendation: Consider using a more distinct value like 9. Middleware Factory Always SucceedsLocation: collector, err := NewCollector()
if err != nil {
logger.Warnf("Failed to initialize usage metrics: %v", err)
// Continue - metrics are non-critical, register no-op middleware
mw := &Middleware{}
runner.AddMiddleware(config.Type, mw)
return nil
}Issue: If collector creation fails, a no-op middleware is registered. This is fine for fault tolerance, but the error is only logged as a warning. If there's a configuration issue, it might go unnoticed. Recommendation: Consider logging at ERROR level or making it more visible that metrics are disabled due to an error. Code Quality & Style10. Magic Numbers in Timestamp FormatLocation: previousDayTimestamp := c.currentDate + "T23:59:00Z"Issue: The hardcoded Recommendation: Use a constant or helper function: const endOfDayTime = "T23:59:00Z"
previousDayTimestamp := c.currentDate + endOfDayTime11. Duplicate Code in Middleware ConfigurationLocation: The same logic for checking if metrics should be enabled exists in two places:
Recommendation: Consolidate this logic into a single place to avoid inconsistencies. Documentation12. Documentation is Excellent ✅The documentation updates are comprehensive and well-structured:
One suggestion: Add a FAQ section addressing common questions:
Security & Privacy13. Privacy Implementation is Strong ✅The privacy-first design is commendable:
No security concerns identified. Testing14. Test Coverage is Good But IncompleteWhat's tested well:
What's missing:
Recommendation: Add the missing test cases mentioned in issue #4. Integration Points15. Integration is Clean ✅The middleware integration follows ToolHive's patterns well:
No issues identified. Performance16. Performance Characteristics are Good ✅
No performance concerns identified. Recommendations SummaryMust fix before merge:
Should fix before merge: Nice to have: ConclusionThis is a solid implementation of anonymous usage metrics with strong privacy protections. The main concerns are around edge cases (midnight transitions, goroutine management, error handling) and test coverage. With the critical issues addressed, this feature will be a valuable addition to ToolHive. Overall assessment: Approve with requested changes for critical issues #1-3 and recommended improvements for issues #4-5. |
|
|
||
| **Note**: This middleware is currently implemented but not registered in the supported middleware factories (`pkg/runner/middleware.go:15`). It can be used directly via the proxy command but is not available through the standard middleware configuration system. | ||
|
|
||
| ### 7. Authorization Middleware |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened to this section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the duplication in the docs, as it's also at point 3 line 136. Should've have added a comment here tho (or a separate PR rather), thanks for checking!
| // Returns empty string if the file doesn't exist or doesn't contain an instance ID. | ||
| // Use this for optional features like metrics that shouldn't trigger ID generation. | ||
| // TODO this should probably be extracted into its own package to handle instance ID generation. | ||
| func TryGetAnonymousID() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we cache this ID in memory for the duration of the process? this way, we'd avoid reading the file every 15 mins.
|
|
||
| // getCurrentDateUTC returns the current date in YYYY-MM-DD format (UTC) | ||
| func getCurrentDateUTC() string { | ||
| return time.Now().UTC().Format("2006-01-02") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use a different format than RFC3339?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This date is only used to track the midnight flush, saves the day and checks on every flush if it is the next day; it was a bit easier to parse this way (no need to extract the date later, and just checks dates aren't equal), but I'm happy to change it if needed. This date is not sent or used outside of this process.
pkg/usagemetrics/collector.go
Outdated
| logger.Debugf("Flushing %d tool calls at %s", count, timestamp) | ||
|
|
||
| if err := c.client.SendMetrics(c.instanceID, record); err != nil { | ||
| logger.Warnf("Failed to send metrics: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we tone down this warning? users often get confused by things like this. So, either we have a very descriptive warning about what happened and why this is not a problem, or we change the log level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this probably should be debug.
|
|
||
| **Middleware** is a composable layer in the request processing chain. Each middleware can inspect, modify, or reject requests. | ||
|
|
||
| **Eight middleware types:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: We probably should get rid of this count. It is apparent from the numbered list that there are nine items.
pkg/api/v1/workload_service.go
Outdated
| transportType = serverMetadata.GetTransport() | ||
| } | ||
|
|
||
| // Load application config for global settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can inject the config into the parent struct instead of loading it in each function call.
Signed-off-by: lujunsan <[email protected]>
Signed-off-by: lujunsan <[email protected]>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2449 +/- ##
==========================================
+ Coverage 55.46% 55.50% +0.03%
==========================================
Files 284 288 +4
Lines 26705 26901 +196
==========================================
+ Hits 14812 14931 +119
- Misses 10534 10602 +68
- Partials 1359 1368 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Overview
Implements anonymous usage metrics (tool calls) collection for ToolHive. Users can opt out via persistent config or environment variable.
Implementation
Core Metrics Package (
pkg/usagemetrics)Middleware Integration
usagemetricsmiddleware typetools/callrequests from parsed MCP dataOpt-Out Mechanism
Users can opt out using either persistent config or environment variable:
Metrics are enabled by default and automatically disabled in CI environments.
Integration Points
All workload execution paths read from Application Config:
Documentation
Privacy