
Conversation

Patrick-Ehimen

@Patrick-Ehimen Patrick-Ehimen commented Aug 28, 2025

User description

This pull request introduces a comprehensive set of resilience features to the Lighthouse SDK, significantly
improving its reliability and robustness in production environments. The key additions are automatic retries
with exponential backoff, configurable rate limiting, and request timeouts.

Summary of Changes

  • Automatic Retries: The SDK now automatically retries failed API requests caused by network errors or
    specific server-side issues (e.g., 5xx status codes, 429 rate limiting). This is handled by a new withRetry
    utility that uses an exponential backoff strategy with jitter to prevent overwhelming the API (see the
    sketch after this list).
  • Rate Limiting: A token bucket rate limiter has been implemented to control the request frequency, helping
    to avoid hitting API rate limits.
  • Request Timeouts: All API requests now include a timeout mechanism to prevent them from hanging
    indefinitely.
  • New Utilities:
    • resilientFetch: A drop-in replacement for fetch that incorporates all the new resilience features.
    • resilientUpload: A specialized version for file uploads that supports progress tracking while also
      being resilient.
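
For illustration, here is a minimal sketch of the retry-with-backoff-and-jitter pattern described above. The option names (maxRetries, baseDelay, retryCondition) and the exact jitter formula are assumptions made for this example, not the SDK's confirmed API:

interface RetryOptions {
  maxRetries?: number
  baseDelay?: number // milliseconds
  retryCondition?: (error: unknown) => boolean
}

async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries = 3, baseDelay = 500, retryCondition = () => true }: RetryOptions = {}
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      if (attempt >= maxRetries || !retryCondition(error)) throw error
      // Exponential backoff with full jitter: random delay in [0, baseDelay * 2^attempt]
      const delay = Math.random() * baseDelay * 2 ** attempt
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}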

This is a draft PR, and I will await review before finalizing testing.

PR closes #134


PR Type

Enhancement


Description

  • Add comprehensive resilience features with retry, rate limiting, and timeouts

  • Implement exponential backoff retry mechanism with configurable conditions

  • Create token bucket rate limiter for API request throttling

  • Replace standard fetch with resilient alternatives across SDK


Diagram Walkthrough

flowchart LR
  A["Standard fetch"] --> B["resilientFetch"]
  B --> C["Rate Limiter"]
  B --> D["Retry Logic"]
  B --> E["Timeout Control"]
  F["File Upload"] --> G["resilientUpload"]
  G --> H["Progress Tracking"]
  G --> C
  G --> D
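
As the diagram suggests, resilientFetch is meant to be called in place of fetch. A hypothetical usage sketch, assuming it keeps the standard fetch signature and accepts an extra options object (the option names timeout and maxRetries are illustrative, not the SDK's confirmed API):

import { resilientFetch } from './utils/resilientFetch' // assumed path, based on the file walkthrough below

const response = await resilientFetch(
  'https://example.com/resource', // placeholder URL
  { method: 'GET' },
  { timeout: 10_000, maxRetries: 3 } // abort after 10s; retry transient failures up to 3 times
)
const data = await response.json()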

File Walkthrough

Relevant files

Enhancement (9 files)

  • getAllKeys.ts: Replace fetch with resilientFetch for IPNS operations (+7/-4)
  • index.ts: Add resilient fetch with custom retry conditions (+19/-9)
  • browser.ts: Replace fetchWithTimeout with resilientUpload for file uploads (+10/-18)
  • index.ts: Export resilience utilities and configuration presets (+23/-0)
  • rateLimiter.ts: Implement token bucket rate limiter with burst capacity (+46/-0)
  • resilientFetch.ts: Create resilient fetch wrapper with retry and timeout (+74/-0)
  • resilientUpload.ts: Implement resilient upload with progress tracking support (+113/-0)
  • retry.ts: Add exponential backoff retry mechanism with jitter (+64/-0)
  • util.ts: Export new resilience utilities from utils module (+5/-0)

Configuration changes (1 file)

  • apiConfig.ts: Create configuration presets for different API operation types (+39/-0)


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

134 - Fully compliant

Compliant requirements:

• Implement exponential backoff retry logic with configurable options
• Implement rate limiting using token bucket approach
• Implement smart retry conditions for specific error types
• Don't retry on 4xx client errors (except 429)

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Potential Race Condition

The refillTokens method modifies shared state but isn't synchronized. Multiple concurrent requests could lead to token calculation errors if they call waitForToken simultaneously.

async waitForToken(): Promise<void> {
  this.refillTokens()

  if (this.tokens >= 1) {
    this.tokens -= 1
    return
  }

  // Calculate wait time for next token
  const waitTime = (1 / this.refillRate) * 1000
  await new Promise(resolve => setTimeout(resolve, waitTime))

  // Recursively wait if still no tokens available
  return this.waitForToken()
}
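
Because JavaScript runs the synchronous check-and-decrement above without interruption, the main risk is callers interleaving across the setTimeout await rather than a true data race; one way to make acquisition strictly ordered is to chain callers through a shared promise. A minimal sketch, assuming the same tokens, refillRate, and refillTokens members as the snippet above:

private acquireChain: Promise<void> = Promise.resolve()

async waitForToken(): Promise<void> {
  // Append this caller to the chain so acquisitions never interleave.
  const acquired = this.acquireChain.then(() => this.acquireToken())
  // Keep the chain usable even if this acquisition rejects.
  this.acquireChain = acquired.catch(() => {})
  return acquired
}

private async acquireToken(): Promise<void> {
  while (true) {
    this.refillTokens()
    if (this.tokens >= 1) {
      this.tokens -= 1
      return
    }
    const waitTime = (1 / this.refillRate) * 1000
    await new Promise((resolve) => setTimeout(resolve, waitTime))
  }
}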
Dynamic Import

The dynamic import of the retry module could cause unexpected behavior if the module fails to load at runtime. Consider using static imports for critical functionality.

const { withRetry } = await import('./retry')
return withRetry(operation, retryOptions)
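
A minimal sketch of the static-import alternative (module path taken from the snippet above):

// Resolved at build time, so a missing module fails fast rather than on the first retried request.
import { withRetry } from './retry'

return withRetry(operation, retryOptions)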
Error Handling

The retry logic assumes all errors have status or code properties. This might cause unexpected behavior when handling errors that don't match the expected structure.

const defaultRetryCondition = (error: RetryableError): boolean => {
  // Network errors
  if (error.code && ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND', 'ECONNREFUSED'].includes(error.code)) {
    return true
  }

  // HTTP status codes that should be retried
  if (error.status) {
    return error.status === 429 || // Too Many Requests
           error.status === 502 || // Bad Gateway
           error.status === 503 || // Service Unavailable
           error.status === 504    // Gateway Timeout
  }

  return false
}
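
One defensive option, sketched here as an illustration rather than a proposed patch, is to narrow unknown errors before reading status or code (RetryableError is assumed to be the type used by the retry module above):

// Type guard: only treat the error as retryable if it has the expected shape.
const hasRetryMetadata = (error: unknown): error is RetryableError =>
  typeof error === 'object' &&
  error !== null &&
  ('status' in error || 'code' in error)

const safeRetryCondition = (error: unknown): boolean =>
  hasRetryMetadata(error) && defaultRetryCondition(error)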


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Standardize error handling

The error object created on failed responses doesn't match the FetchError type
used elsewhere in the codebase. Use FetchError instead of a generic Error to
maintain consistency with the retry system.

src/Lighthouse/utils/resilientUpload.ts [51-77]

 xhr.onload = () => {
   const headers = new Headers()
   xhr
     .getAllResponseHeaders()
     .trim()
     .split(/[\r\n]+/)
     .forEach((line) => {
       const parts = line.split(': ')
       const header = parts.shift()
       const value = parts.join(': ')
       if (header) headers.set(header, value)
     })
 
   const response = new Response(xhr.response, {
     status: xhr.status,
     statusText: xhr.statusText,
     headers: headers,
   })
 
   if (!response.ok) {
-    const error = new Error(`Request failed with status code ${xhr.status}`)
-    ;(error as any).status = xhr.status
+    const error = new FetchError(
+      `Request failed with status code ${xhr.status}`,
+      xhr.status
+    )
     reject(error)
   } else {
     resolve(response)
   }
 }
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies an inconsistency in error handling and proposes using the custom FetchError type, which is crucial for the new retry mechanism to work correctly.

Impact: Medium
Fix error handling

The error handling assumes error always has a status property, but if a network
or other non-HTTP error occurs, error.status will be undefined. Add a check to
ensure the error object has the expected structure.

src/Lighthouse/podsi/index.ts [60-65]

 } catch (error: any) {
-  if (error.status === 400) {
+  if (error && error.status === 400) {
     throw new Error("Proof Doesn't exist yet")
   }
-  throw new Error(error.message)
+  throw new Error(error?.message || 'Unknown error occurred')
 }
Suggestion importance[1-10]: 7


Why: The suggestion correctly points out that the error object may not have a status or message property, and the proposed change makes the error handling more robust against unexpected error types.

Impact: Medium
Prevent stack overflow risk

The recursive implementation of waitForToken() could cause a stack overflow in high-traffic scenarios.
Replace the recursion with a loop to avoid this when many tokens are requested in succession.

src/Lighthouse/utils/rateLimiter.ts [17-31]

 async waitForToken(): Promise<void> {
-  this.refillTokens()
-  
-  if (this.tokens >= 1) {
-    this.tokens -= 1
-    return
+  while (true) {
+    this.refillTokens()
+    
+    if (this.tokens >= 1) {
+      this.tokens -= 1
+      return
+    }
+    
+    // Calculate wait time for next token
+    const waitTime = (1 / this.refillRate) * 1000
+    await new Promise(resolve => setTimeout(resolve, waitTime))
   }
-  
-  // Calculate wait time for next token
-  const waitTime = (1 / this.refillRate) * 1000
-  await new Promise(resolve => setTimeout(resolve, waitTime))
-  
-  // Recursively wait if still no tokens available
-  return this.waitForToken()
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that async recursion is not optimized and could lead to stack overflow, proposing a more robust iterative solution with a while loop.

Impact: Low

Linked issue: Add Rate Limiting and Retry Logic for API Calls