Redis connection failures cause indefinite blocking in queue mode with no timeout or recovery mechanism #5126

@utsav-bx

Describe the bug

When the Redis connection is lost or becomes unstable (ECONNRESET errors), Flowise enters a state where:

  • The UI appears functional and responsive
  • Message requests are accepted but never receive a response
  • The system eventually returns 504 Gateway Timeout
  • No automatic recovery mechanism exists
  • The entire system becomes unusable until a manual restart

Root Cause: buildChatflow.ts:986 contains await job.waitUntilFinished(queueEvents) with no timeout, causing indefinite blocking when Redis queue jobs cannot be processed.

Architecture Impact: Redis becomes a single point of failure that can render the entire system non-functional, even though core AI processing could work fine without queuing.
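
Independent of any timeout support in the queue library, the unbounded await described under Root Cause could also be capped with a generic wrapper. A minimal sketch, assuming a hypothetical helper named withTimeout and a hypothetical TimeoutError class (neither exists in Flowise or BullMQ); the caller would turn the rejection into an error response rather than letting the request run into a 504:

```typescript
// Hypothetical helper, not part of Flowise or BullMQ.
class TimeoutError extends Error {
    constructor(ms: number) {
        super(`Operation timed out after ${ms} ms`)
        this.name = 'TimeoutError'
    }
}

// Rejects with TimeoutError if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined
    const deadline = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new TimeoutError(ms)), ms)
    })
    try {
        // Whichever promise settles first decides the outcome.
        return await Promise.race([work, deadline])
    } finally {
        // Clear the timer so it does not keep the process alive.
        clearTimeout(timer)
    }
}
```

Usage would look roughly like `await withTimeout(job.waitUntilFinished(queueEvents), timeoutMs)`. Note that this only stops waiting; it does not cancel the underlying job.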

To Reproduce

  1. Deploy Flowise in queue mode (MODE=queue) with a Redis backend (e.g., Google Cloud Memorystore)
  2. Configure an unstable TLS setup (e.g., NODE_TLS_REJECT_UNAUTHORIZED=0) or network conditions that cause Redis ECONNRESET errors
  3. Send a message through the UI
  4. Observe Redis connection drops/resets in logs
  5. Message request hangs indefinitely
  6. All subsequent message requests also hang
  7. System requires restart to recover

Expected behavior

Resilient behavior should include:

  • Job processing timeout configuration (e.g., JOB_TIMEOUT environment variable)
  • Graceful fallback to direct processing when Redis is unavailable
  • Circuit breaker pattern for Redis connection failures
  • Automatic retry logic with exponential backoff
  • Health checks that detect Redis connectivity issues
  • Error responses instead of indefinite hanging
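
The circuit breaker and direct-processing fallback from the list above could look roughly like this. It is a sketch under stated assumptions, not Flowise code: CircuitBreaker, its thresholds, and the primary/fallback callbacks are all hypothetical names, with primary standing in for the queue path and fallback for running the flow in-process:

```typescript
// Hypothetical circuit breaker; names and thresholds are illustrative only.
type BreakerState = 'closed' | 'open' | 'half-open'

class CircuitBreaker {
    private failures = 0
    private openedAt = 0
    private state: BreakerState = 'closed'

    constructor(
        private readonly failureThreshold: number,
        private readonly resetAfterMs: number,
        private readonly now: () => number = () => Date.now()
    ) {}

    getState(): BreakerState {
        // After the cool-down, let trial calls through again.
        if (this.state === 'open' && this.now() - this.openedAt >= this.resetAfterMs) {
            this.state = 'half-open'
        }
        return this.state
    }

    // Runs `primary` unless the breaker is open; uses `fallback` when open or on failure.
    async run<T>(primary: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
        if (this.getState() === 'open') return fallback()
        try {
            const result = await primary()
            // A success closes the breaker and resets the failure count.
            this.failures = 0
            this.state = 'closed'
            return result
        } catch {
            this.failures += 1
            // A failed trial call, or too many failures in a row, opens the breaker.
            if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
                this.state = 'open'
                this.openedAt = this.now()
            }
            return fallback()
        }
    }
}
```

While the breaker is open, requests skip Redis entirely instead of each waiting for its own timeout. Whether falling back to direct processing is safe would depend on whether a job that was already enqueued can still be picked up by a worker later.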

Screenshots

No response

Flow

No response

Use Method

Docker

Flowise Version

3.0.4

Operating System

Linux

Browser

None

Additional context

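Code Location: /packages/server/src/utils/buildChatflow.ts:986
// Currently: no timeout specified, so the await can block forever
const result = await job.waitUntilFinished(queueEvents)

// Should be: configurable timeout. BullMQ takes the ttl (in ms) as a plain
// number in the second argument, not as an options object.
// JOB_TIMEOUT is the proposed environment variable; 5 min default.
const result = await job.waitUntilFinished(queueEvents, parseInt(process.env.JOB_TIMEOUT || '300000', 10))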
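
One caveat with reading the value straight from the environment: parseInt returns NaN for a non-numeric JOB_TIMEOUT such as 'abc', so the value may be worth validating before it is used as a timeout. A small sketch, where parseTimeoutMs is a hypothetical helper and JOB_TIMEOUT is only the variable name proposed in this issue:

```typescript
// Hypothetical helper: turn a raw env value into a positive timeout in milliseconds.
function parseTimeoutMs(raw: string | undefined, fallbackMs: number): number {
    const parsed = parseInt(raw ?? '', 10)
    // parseInt yields NaN for empty or non-numeric input; also reject zero and negatives.
    return isFinite(parsed) && parsed > 0 ? parsed : fallbackMs
}

// Intended call site (assumption): parseTimeoutMs(process.env.JOB_TIMEOUT, 300000)
```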

Missing Resilience Patterns:

  • No timeout on job.waitUntilFinished()
  • No fallback when Redis is unavailable
  • No circuit breaker for Redis failures
  • Health checks don't verify Redis connectivity
  • No job processing timeout configuration via environment variables
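
For the health check item, one possible shape is a probe that pings Redis with a deadline, so that a hung connection is reported as down instead of hanging the health endpoint as well. This is a sketch only: checkRedisHealth and HealthStatus are hypothetical names, and ping is whatever the Redis client offers (with ioredis that could be `() => client.ping()`):

```typescript
// Hypothetical health probe; `ping` stands in for the Redis client's ping call.
type HealthStatus = { redis: 'up' | 'down'; latencyMs?: number; error?: string }

async function checkRedisHealth(ping: () => Promise<unknown>, deadlineMs: number): Promise<HealthStatus> {
    const startedAt = Date.now()
    let timer: ReturnType<typeof setTimeout> | undefined
    const deadline = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`no reply within ${deadlineMs} ms`)), deadlineMs)
    })
    try {
        // A reply before the deadline counts as healthy.
        await Promise.race([ping(), deadline])
        return { redis: 'up', latencyMs: Date.now() - startedAt }
    } catch (err) {
        // Connection errors and missed deadlines both report as down.
        return { redis: 'down', error: err instanceof Error ? err.message : String(err) }
    } finally {
        clearTimeout(timer)
    }
}
```

A health endpoint could return a non-200 status whenever redis is 'down', which would let an orchestrator or load balancer notice the problem without a manual check.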

BullMQ Support: BullMQ v5.45.2 supports a timeout through the optional ttl argument of job.waitUntilFinished(), but Flowise doesn't expose it as configuration.

Labels: bug