@jamsea commented Sep 5, 2025

🚀 Parallel LLM Racing Implementation

This PR implements a parallel LLM racing system in 45-llm-hedge.py: two OpenAI LLM instances race to produce a response, and only the winner's frames are passed through to the TTS service.

✨ Features Added

🏁 LLMRaceProcessor

  • Custom frame processor that manages racing between two LLMs
  • Shared state coordination using class variables (_winning_llm_name, _response_started)
  • First response wins: Only the first LLM to generate LLMTextFrame is allowed through
  • Frame dropping: All subsequent frames from the losing LLM are discarded
  • Per-instance identification: Each processor knows which LLM it represents ("LLM1", "LLM2")
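
Below is a minimal sketch of the processor's core logic, assuming Pipecat's FrameProcessor/FrameDirection API and the class-variable names listed above; the actual implementation in 45-llm-hedge.py may differ in detail:

```python
from pipecat.frames.frames import Frame, LLMTextFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class LLMRaceProcessor(FrameProcessor):
    # Class-level state shared by both instances so the race is coordinated.
    _winning_llm_name = None
    _response_started = False

    def __init__(self, llm_name: str):
        super().__init__()
        self._llm_name = llm_name  # "LLM1" or "LLM2"

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        # Let the base class handle system frames such as StartFrame.
        await super().process_frame(frame, direction)

        if isinstance(frame, LLMTextFrame):
            if not LLMRaceProcessor._response_started:
                # First LLM to emit text wins the race.
                LLMRaceProcessor._response_started = True
                LLMRaceProcessor._winning_llm_name = self._llm_name
            if LLMRaceProcessor._winning_llm_name != self._llm_name:
                # Drop frames from the losing LLM.
                return

        await self.push_frame(frame, direction)
```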

🔄 ParallelPipeline Architecture

```python
parallel_llms = ParallelPipeline(
    [llm1, race_processor1],  # Branch 1: OpenAI LLM → Race Processor
    [llm2, race_processor2],  # Branch 2: OpenAI LLM → Race Processor
)
```
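
For reference, the inputs to that ParallelPipeline could be constructed roughly as follows (a sketch only; the OpenAILLMService import path and model choice are assumptions, not taken from this PR):

```python
import os

from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.services.openai.llm import OpenAILLMService

# Two independent OpenAI LLM services, each paired with its own race processor.
llm1 = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm2 = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")

race_processor1 = LLMRaceProcessor("LLM1")
race_processor2 = LLMRaceProcessor("LLM2")
```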

📊 Filtered Debug Logging

  • Built-in Pipecat DebugLogObserver with frame type filtering
  • Only logs LLM frames going to TTS using FrameEndpoint.DESTINATION
  • Clean, focused logging without noise from other pipeline components
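
The observer setup would look something like this (a sketch, assuming DebugLogObserver and FrameEndpoint live in pipecat.observers.loggers.debug_log_observer and that observers are passed to the PipelineTask; exact wiring may vary by Pipecat version):

```python
from pipecat.frames.frames import LLMTextFrame
from pipecat.observers.loggers.debug_log_observer import DebugLogObserver, FrameEndpoint
from pipecat.pipeline.task import PipelineParams, PipelineTask

# Log LLM text frames only when they arrive at their destination (the TTS service).
debug_observer = DebugLogObserver(frame_types={LLMTextFrame: FrameEndpoint.DESTINATION})

task = PipelineTask(
    pipeline,
    params=PipelineParams(allow_interruptions=True),
    observers=[debug_observer],
)
```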

🔗 Pipeline Flow

```
transport.input() → stt → context_aggregator.user() → ParallelPipeline → tts → transport.output()
                                                          ↓
                                                    [llm1 → race1]
                                                    [llm2 → race2]
```
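
In code, the top-level assembly mirrors the flow above; a sketch (variable names are illustrative, matching the diagram rather than quoted from the file):

```python
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline([
    transport.input(),          # WebRTC audio in
    stt,                        # Deepgram STT
    context_aggregator.user(),  # shared context feeding both LLM branches
    parallel_llms,              # ParallelPipeline: [llm1, race1] / [llm2, race2]
    tts,                        # only the winner's frames reach TTS
    transport.output(),         # audio out
])
```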

🎯 Key Implementation Details

  1. Shared Context: Both LLMs process frames from the same context_aggregator.user() to ensure consistency
  2. Race Logic: Uses shared class variables to coordinate state between processor instances
  3. Frame Lifecycle: Proper super().process_frame() call to handle system frames like StartFrame
  4. Performance: The fastest LLM response wins; slower responses are dropped to minimize latency

🧪 Testing

  • ✅ Pipeline architecture properly created with two LLM branches
  • ✅ Component linking correctly established
  • ✅ Client connection via WebRTC transport
  • ✅ Audio processing with VAD and Deepgram STT
  • ✅ Race state management between processor instances

📝 Usage

The system automatically races two OpenAI LLM instances on every user input:

  • First LLM to respond wins the race
  • Losing LLM's frames are dropped
  • Logs show race results with 🏆 winner, ✅ continuation, and ❌ dropped frames

🔧 Technical Notes

  • Uses Pipecat's built-in ParallelPipeline for proper frame distribution
  • Custom LLMRaceProcessor handles coordination between competing LLMs
  • Maintains backward compatibility with existing pipeline structure
  • Follows Pipecat frame processing patterns and lifecycle management


codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
See 86 files with indirect coverage changes.


@jamsea self-assigned this Sep 5, 2025
@jamsea requested a review from markbackman September 5, 2025 09:16