Skip to content

Conversation

@rawlingsj
Copy link
Member

@rawlingsj rawlingsj commented Nov 6, 2025

Summary

This PR reworks the rate limiting implementation to prevent GitHub's secondary rate limits, which are based on request velocity (requests/second) rather than total quota. The implementation is split into commits
for easier review:

Commit 1: Refactor existing rate limiter (no behavior change)

"Add proactive GitHub API rate limiting with shared implementation" (fb6a29d)

  • Extract existing SDK rate limiter into shared pkg/httpratelimit package
  • Integrate into githubreconciler (new usage)
  • Refactor SDK to use shared package (100% backward compatible)
  • Removes 170+ lines of duplicate code
  • NO BEHAVIOR CHANGES - exact same reactive rate limiting logic

Commit 2: Add velocity-based rate limiting (new functionality)

"Add velocity-based rate limiting to prevent GitHub secondary rate limits" (32a00ad)

  • Add DefaultMaxRequestsPerSecond = 15.0 constant
  • Add BurstMultiplier = 2 constant
  • Update NewTransport() to accept maxRequestsPerSec parameter
  • Change from unlimited (rate.Inf) to actual RPS-based rate limiting
  • NEW: Prevents secondary rate limit errors instead of just recovering from them

The Problem

Our previous approach (PR #1195) attempted to proactively throttle based on X-RateLimit-Remaining headers. However, testing with a rate limit reproducer revealed that secondary rate limits don't provide these
warning headers
:

Status=200, Limit=5000, Remaining=3925, Used=1075, Reset=1762428229, RetryAfter=
Status=403, Limit=, Remaining=, Used=, Reset=, RetryAfter=60  ⬅️ No warning!

Key insight: Secondary limits are triggered by request velocity (req/sec), not total quota. GitHub doesn't provide predictive headers - requests simply fail with 403 + Retry-After when hit.

How It Works

Before:

Unlimited requests → Hit rate limit (403/429) → Pause → Retry

After:

Every request passes through rate limiter (15 RPS default) → Prevents bursts → Never hit secondary limit
                                                           → Still handles 403/429 reactively

Example: With 15 RPS limit and 2x burst, we can handle occasional bursts of 30 requests but maintain an average of 15 req/sec, staying well below GitHub's undocumented secondary limit threshold.

Backward Compatibility

  • SDK wrapper updated, maintains compatibility
  • All existing callers work without changes (sensible defaults)
  • BREAKING CHANGE: Callers of NewTransport must update to use options:
Change NewTransport signature from positional parameters to variadic
options pattern for better flexibility and future extensibility.

Before:
  NewTransport(base, time.Minute)  // Old 2-param version

After:
  NewTransport(base)  // Uses defaults (15 RPS, 1 min retry)
  NewTransport(base, WithDefaultRetryAfter(time.Minute))
  NewTransport(base, WithMaxRequestsPerSecond(20))
  NewTransport(base,
    WithDefaultRetryAfter(time.Minute),
    WithMaxRequestsPerSecond(20))

Benefits:
- Explicit configuration via named options
- Future-proof for adding new options without more breaking changes
- Sensible defaults for common cases

Testing

Validated with rate limit reproducer tool that can trigger secondary limits on demand.

Future Work

  • Add Prometheus metrics for rate limiter behavior (tracking, throttling events)

This change addresses GitHub API rate limit issues by adding proactive
throttling to githubreconciler while refactoring existing SDK code into
a shared package. NO BEHAVIOR CHANGES to existing rate limiter - this is
a pure refactor to enable code reuse.

The rate limiter implementation in modules/github-bots/sdk has been in
production use via WithSecondaryRateLimitWaiter() and is battle-tested.
This commit extracts that proven implementation into a shared package
and applies it to githubreconciler.

Changes:
- Create new pkg/httpratelimit package with shared rate limiter
  - Copied line-by-line from modules/github-bots/sdk/ratelimit.go
  - Same logic for monitoring GitHub rate limit headers
  - Same behavior for pausing/retrying requests
  - Enhanced comments and log messages only

- Integrate rate limiter into pkg/githubreconciler (NEW)
  - Transport chain: OAuth2 -> Rate Limiting -> Metrics
  - Proactively throttles requests to prevent 403/429 errors
  - Works transparently without code changes

- Refactor modules/github-bots/sdk to use shared implementation
  - Maintains 100% backward compatibility
  - All existing tests pass without modification
  - Reduces code duplication (170+ lines removed)
  - Add deprecation notices pointing to new shared package

The SDK rate limiter is already used in production and tested. This
refactor enables githubreconciler to benefit from the same proven
implementation while maintaining a single source of truth for rate
limiting logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
This commit implements proactive rate limiting based on request velocity
to prevent GitHub's secondary rate limits, which don't provide warning
headers before triggering.

Changes:
- Add DefaultMaxRequestsPerSecond (15 RPS) constant
- Update NewTransport() to accept maxRequestsPerSecond parameter
- Update githubreconciler to use configurable rate limiting
- Update all callers and tests to pass rate limit parameter

Secondary rate limits are triggered by request velocity (req/sec), not
total quota. Unlike primary rate limits, they provide no predictive
headers - requests simply fail with 403 + Retry-After when hit.

The fix enforces a conservative 15 RPS default with burst capacity,
preventing burst patterns that trigger secondary limits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: James Rawlings <[email protected]>
@rawlingsj rawlingsj force-pushed the fix-secondary-rate-limits-v2 branch from ae578df to 6484f2e Compare November 6, 2025 12:11
@rawlingsj rawlingsj requested review from mattmoor and tcnghia November 6, 2025 12:14
Change NewTransport signature from positional parameters to variadic
options pattern for better flexibility and future extensibility.

BREAKING CHANGE: Callers of NewTransport must update to use options:

Before:
  NewTransport(base, time.Minute)  // Old 2-param version

After:
  NewTransport(base)  // Uses defaults (15 RPS, 1 min retry)
  NewTransport(base, WithDefaultRetryAfter(time.Minute))
  NewTransport(base, WithMaxRequestsPerSecond(20))
  NewTransport(base,
    WithDefaultRetryAfter(time.Minute),
    WithMaxRequestsPerSecond(20))

Benefits:
- Explicit configuration via named options
- Future-proof for adding new options without more breaking changes
- Sensible defaults for common cases

All internal callers updated. All tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant