
Conversation

Patrick-Ehimen

@Patrick-Ehimen Patrick-Ehimen commented Aug 28, 2025

User description

This pull request introduces a comprehensive set of resilience features to the Lighthouse SDK, significantly
improving its reliability and robustness in production environments. The key additions are automatic retries
with exponential backoff, configurable rate limiting, and request timeouts.

Summary of Changes

  • Automatic Retries: The SDK now automatically retries failed API requests caused by network errors or
    specific server-side issues (e.g., 5xx status codes, 429 rate limiting). This is handled by a new withRetry
    utility that uses an exponential backoff strategy with jitter to prevent overwhelming the API (see the
    sketch after this list).
  • Rate Limiting: A token bucket rate limiter has been implemented to control the request frequency, helping
    to avoid hitting API rate limits.
  • Request Timeouts: All API requests now include a timeout mechanism to prevent them from hanging
    indefinitely.
  • New Utilities:
    • resilientFetch: A drop-in replacement for fetch that incorporates all the new resilience features.
    • resilientUpload: A specialized version for file uploads that supports progress tracking while also
      being resilient.
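
For illustration, here is a minimal sketch of the retry-with-backoff-and-jitter pattern described above. The option names (maxRetries, baseDelay, retryCondition) and the exact jitter formula are assumptions made for this example, not the SDK's confirmed API:

interface RetryOptions {
  maxRetries?: number
  baseDelay?: number // milliseconds
  retryCondition?: (error: unknown) => boolean
}

async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries = 3, baseDelay = 500, retryCondition = () => true }: RetryOptions = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      if (attempt >= maxRetries || !retryCondition(error)) throw error
      // Exponential backoff with full jitter: random delay in [0, baseDelay * 2^attempt]
      const delay = Math.random() * baseDelay * 2 ** attempt
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}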

This is a draft PR, and I will await review before finalizing testing.

PR closes #134


PR Type

Enhancement


Description

  • Add comprehensive resilience features with retry, rate limiting, and timeouts

  • Implement exponential backoff retry mechanism with configurable conditions

  • Create token bucket rate limiter for API request throttling

  • Replace standard fetch with resilient alternatives across SDK


Diagram Walkthrough

flowchart LR
  A["Standard fetch"] --> B["resilientFetch"]
  B --> C["Rate Limiter"]
  B --> D["Retry Logic"]
  B --> E["Timeout Control"]
  F["File Upload"] --> G["resilientUpload"]
  G --> H["Progress Tracking"]
  G --> C
  G --> D
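
As the diagram suggests, resilientFetch is meant to be called in place of fetch. A hypothetical usage sketch, assuming it keeps the standard fetch signature and accepts an extra options object (the option names timeout and maxRetries are illustrative, not the SDK's confirmed API):

import { resilientFetch } from './utils/resilientFetch' // assumed path, based on the file walkthrough below

const response = await resilientFetch(
  'https://example.com/resource', // placeholder URL
  { method: 'GET' },
  { timeout: 10_000, maxRetries: 3 } // abort after 10s; retry transient failures up to 3 times
)
const data = await response.json()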

File Walkthrough

Relevant files

Enhancement (9 files)

  • getAllKeys.ts: Replace fetch with resilientFetch for IPNS operations (+7/-4)
  • index.ts: Add resilient fetch with custom retry conditions (+19/-9)
  • browser.ts: Replace fetchWithTimeout with resilientUpload for file uploads (+10/-18)
  • index.ts: Export resilience utilities and configuration presets (+23/-0)
  • rateLimiter.ts: Implement token bucket rate limiter with burst capacity (+46/-0)
  • resilientFetch.ts: Create resilient fetch wrapper with retry and timeout (+74/-0)
  • resilientUpload.ts: Implement resilient upload with progress tracking support (+113/-0)
  • retry.ts: Add exponential backoff retry mechanism with jitter (+64/-0)
  • util.ts: Export new resilience utilities from utils module (+5/-0)

Configuration changes (1 file)

  • apiConfig.ts: Create configuration presets for different API operation types (+39/-0)


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

134 - Fully compliant

Compliant requirements:

• Implement exponential backoff retry logic with configurable options
• Implement rate limiting using token bucket approach
• Implement smart retry conditions for specific error types
• Don't retry on 4xx client errors (except 429)

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Potential Race Condition

The refillTokens method modifies shared state but isn't synchronized. Multiple concurrent requests could lead to token calculation errors if they call waitForToken simultaneously.

async waitForToken(): Promise<void> {
  this.refillTokens()

  if (this.tokens >= 1) {
    this.tokens -= 1
    return
  }

  // Calculate wait time for next token
  const waitTime = (1 / this.refillRate) * 1000
  await new Promise(resolve => setTimeout(resolve, waitTime))

  // Recursively wait if still no tokens available
  return this.waitForToken()
}
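
Because JavaScript runs the synchronous check-and-decrement above without interruption, the main risk is callers interleaving across the setTimeout await rather than a true data race; one way to make acquisition strictly ordered is to chain callers through a shared promise. A minimal sketch, assuming the same tokens, refillRate, and refillTokens members as the snippet above:

private acquireChain: Promise<void> = Promise.resolve()

async waitForToken(): Promise<void> {
  // Append this caller to the chain so acquisitions never interleave.
  const acquired = this.acquireChain.then(() => this.acquireToken())
  // Keep the chain usable even if this acquisition rejects.
  this.acquireChain = acquired.catch(() => {})
  return acquired
}

private async acquireToken(): Promise<void> {
  while (true) {
    this.refillTokens()
    if (this.tokens >= 1) {
      this.tokens -= 1
      return
    }
    const waitTime = (1 / this.refillRate) * 1000
    await new Promise((resolve) => setTimeout(resolve, waitTime))
  }
}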
Dynamic Import

The dynamic import of the retry module could cause unexpected behavior if the module fails to load at runtime. Consider using static imports for critical functionality.

const { withRetry } = await import('./retry')
return withRetry(operation, retryOptions)
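
A minimal sketch of the static-import alternative (module path taken from the snippet above):

// Resolved at build time, so a missing module fails fast rather than on the first retried request.
import { withRetry } from './retry'

return withRetry(operation, retryOptions)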
Error Handling

The retry logic assumes all errors have status or code properties. This might cause unexpected behavior when handling errors that don't match the expected structure.

const defaultRetryCondition = (error: RetryableError): boolean => {
  // Network errors
  if (error.code && ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'ECONNREFUSED'].includes(error.code)) {
    return true
  }

  // HTTP status codes that should be retried
  if (error.status) {
    return error.status === 429 || // Too Many Requests
           error.status === 502 || // Bad Gateway
           error.status === 503 || // Service Unavailable
           error.status === 504    // Gateway Timeout
  }

  return false
}
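
One defensive option, sketched here as an illustration rather than a proposed patch, is to narrow unknown errors before reading status or code (RetryableError is assumed to be the type used by the retry module above):

// Type guard: only treat the error as retryable if it has the expected shape.
const hasRetryMetadata = (error: unknown): error is RetryableError =>
  typeof error === 'object' &&
  error !== null &&
  ('status' in error || 'code' in error)

const safeRetryCondition = (error: unknown): boolean =>
  hasRetryMetadata(error) && defaultRetryCondition(error)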


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Standardize error handling

The error object created on failed responses doesn't match the FetchError type
used elsewhere in the codebase. Use FetchError instead of a generic Error to
maintain consistency with the retry system.

src/Lighthouse/utils/resilientUpload.ts [51-77]

 xhr.onload = () => {
   const headers = new Headers()
   xhr
     .getAllResponseHeaders()
     .trim()
     .split(/[\r\n]+/)
     .forEach((line) => {
       const parts = line.split(': ')
       const header = parts.shift()
       const value = parts.join(': ')
       if (header) headers.set(header, value)
     })
 
   const response = new Response(xhr.response, {
     status: xhr.status,
     statusText: xhr.statusText,
     headers: headers,
   })
 
   if (!response.ok) {
-    const error = new Error(`Request failed with status code ${xhr.status}`)
-    ;(error as any).status = xhr.status
+    const error = new FetchError(
+      `Request failed with status code ${xhr.status}`,
+      xhr.status
+    )
     reject(error)
   } else {
     resolve(response)
   }
 }
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies an inconsistency in error handling and proposes using the custom FetchError type, which is crucial for the new retry mechanism to work correctly.

Impact: Medium
Fix error handling

The error handling assumes error always has a status property, but if a network
or other non-HTTP error occurs, error.status will be undefined. Add a check to
ensure the error object has the expected structure.

src/Lighthouse/podsi/index.ts [60-65]

 } catch (error: any) {
-  if (error.status === 400) {
+  if (error && error.status === 400) {
     throw new Error("Proof Doesn't exist yet")
   }
-  throw new Error(error.message)
+  throw new Error(error?.message || 'Unknown error occurred')
 }
Suggestion importance[1-10]: 7


Why: The suggestion correctly points out that the error object may not have a status or message property, and the proposed change makes the error handling more robust against unexpected error types.

Impact: Medium
Prevent stack overflow risk

The recursive implementation of waitForToken() could cause a stack overflow in high-traffic scenarios.
Replace the recursion with a loop to avoid this when many tokens are requested in succession.

src/Lighthouse/utils/rateLimiter.ts [17-31]

 async waitForToken(): Promise<void> {
-  this.refillTokens()
-  
-  if (this.tokens >= 1) {
-    this.tokens -= 1
-    return
+  while (true) {
+    this.refillTokens()
+    
+    if (this.tokens >= 1) {
+      this.tokens -= 1
+      return
+    }
+    
+    // Calculate wait time for next token
+    const waitTime = (1 / this.refillRate) * 1000
+    await new Promise(resolve => setTimeout(resolve, waitTime))
   }
-  
-  // Calculate wait time for next token
-  const waitTime = (1 / this.refillRate) * 1000
-  await new Promise(resolve => setTimeout(resolve, waitTime))
-  
-  // Recursively wait if still no tokens available
-  return this.waitForToken()
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that async recursion is not optimized and could lead to stack overflow, proposing a more robust iterative solution with a while loop.

Impact: Low

Linked issue: Add Rate Limiting and Retry Logic for API Calls