Description
Describe the bug
When Redis connection is lost or becomes unstable (ECONNRESET errors), Flowise enters a state where:
- UI appears functional and responsive
- Message requests are accepted but never receive a response
- System eventually returns 504 Gateway Timeout
- No automatic recovery mechanism exists
- Entire system becomes unusable until manual restart
Root Cause: buildChatflow.ts:986 contains await job.waitUntilFinished(queueEvents) with no timeout, causing indefinite blocking when Redis queue jobs cannot be processed.
Architecture Impact: Redis becomes a single point of failure that can render the entire system non-functional, even though core AI processing could work fine without queuing.
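To illustrate the kind of guard that is missing, here is a minimal sketch of a generic promise timeout. `withTimeout` is a hypothetical helper, not something that exists in Flowise or BullMQ; it only shows how an await that may never settle can be bounded so the caller gets an error instead of hanging.

```typescript
// Hypothetical helper (not part of Flowise or BullMQ): reject if `work`
// has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
        work.then(
            (value) => {
                clearTimeout(timer) // settled in time, cancel the deadline
                resolve(value)
            },
            (err) => {
                clearTimeout(timer)
                reject(err)
            }
        )
    })
}
```

Wrapping the blocking call, for example `withTimeout(job.waitUntilFinished(queueEvents), 300000)`, would turn an indefinite hang into a rejected promise that the request handler can translate into an error response. Note that this only stops the wait; it does not cancel the underlying job.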
To Reproduce
- Deploy Flowise in queue mode (MODE=queue) with Redis backend (e.g., Google Cloud Memorystore)
- Configure with unstable TLS setup (e.g., NODE_TLS_REJECT_UNAUTHORIZED=0) or network conditions causing Redis ECONNRESET
- Send a message through the UI
- Observe Redis connection drops/resets in logs
- Message request hangs indefinitely
- All subsequent message requests also hang
- System requires restart to recover
Expected behavior
Resilient behavior should include:
- Job processing timeout configuration (e.g., JOB_TIMEOUT environment variable)
- Graceful fallback to direct processing when Redis is unavailable
- Circuit breaker pattern for Redis connection failures
- Automatic retry logic with exponential backoff
- Health checks that detect Redis connectivity issues
- Error responses instead of indefinite hanging
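The circuit breaker and fallback items above could look roughly like the following. This is a sketch under assumptions, not a proposal for the actual implementation: `CircuitBreaker`, `runWithFallback`, `viaQueue` and `direct` are all hypothetical names, and the thresholds are arbitrary.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

// Hypothetical circuit breaker: opens after repeated failures, then allows
// probe attempts again once a cooldown has passed.
class CircuitBreaker {
    private failures = 0
    private openedAt = 0
    private state: BreakerState = 'closed'
    private readonly failureThreshold: number
    private readonly cooldownMs: number
    private readonly now: () => number

    constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
        this.failureThreshold = failureThreshold
        this.cooldownMs = cooldownMs
        this.now = now
    }

    // True when the protected dependency (e.g. the Redis queue) may be tried.
    canAttempt(): boolean {
        if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
            this.state = 'half-open' // cooldown elapsed, allow probing again
        }
        return this.state !== 'open'
    }

    recordSuccess(): void {
        this.failures = 0
        this.state = 'closed'
    }

    recordFailure(): void {
        this.failures += 1
        // A failed probe, or too many failures in a row, (re)opens the breaker.
        if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
            this.state = 'open'
            this.openedAt = this.now()
        }
    }

    getState(): BreakerState {
        return this.state
    }
}

// Try the queue path unless the breaker is open; otherwise process directly.
async function runWithFallback<T>(
    breaker: CircuitBreaker,
    viaQueue: () => Promise<T>,
    direct: () => Promise<T>
): Promise<T> {
    if (!breaker.canAttempt()) return direct()
    try {
        const result = await viaQueue()
        breaker.recordSuccess()
        return result
    } catch {
        breaker.recordFailure()
        return direct()
    }
}
```

One caveat with this design: if the queue path fails after the job was already enqueued, falling back to direct processing may run the same work twice, so a real implementation would need to decide whether that is acceptable.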
Screenshots
No response
Flow
No response
Use Method
Docker
Flowise Version
3.0.4
Operating System
Linux
Browser
None
Additional context
Code Location: /packages/server/src/utils/buildChatflow.ts:986
// Currently: no timeout specified
const result = await job.waitUntilFinished(queueEvents)
// Should be: configurable timeout (BullMQ takes the TTL in milliseconds as the second argument)
const jobTimeoutMs = parseInt(process.env.JOB_TIMEOUT || '300000', 10) // 5 min default
const result = await job.waitUntilFinished(queueEvents, jobTimeoutMs)
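Because `parseInt` returns `NaN` for values such as `JOB_TIMEOUT=abc`, the environment variable probably deserves some validation before it is used as a timeout. A minimal sketch, assuming the suggested `JOB_TIMEOUT` variable and 5 minute default; `resolveJobTimeout` is a hypothetical helper name:

```typescript
// Default of 5 minutes, matching the value suggested in this issue.
const DEFAULT_JOB_TIMEOUT_MS = 5 * 60 * 1000

// Hypothetical helper: read JOB_TIMEOUT (milliseconds) from an env-like
// object and fall back to the default when it is missing or invalid.
function resolveJobTimeout(env: Record<string, string | undefined>): number {
    const raw = env.JOB_TIMEOUT
    if (raw === undefined || raw.trim() === '') return DEFAULT_JOB_TIMEOUT_MS
    const parsed = Number(raw)
    // Accept only positive whole numbers of milliseconds.
    return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_JOB_TIMEOUT_MS
}
```

In the server this would be called as `resolveJobTimeout(process.env)` and the result passed as the timeout for the wait.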
Missing Resilience Patterns:
- No timeout on job.waitUntilFinished()
- No fallback when Redis unavailable
- No circuit breaker for Redis failures
- Health checks don't verify Redis connectivity
- No job processing timeout configuration via environment variables
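For the health check item, a rough sketch of a probe that reports Redis connectivity could look like this. `checkRedisHealth` and `HealthReport` are hypothetical; the `ping` argument is assumed to be any function that performs a round trip to Redis (for example a client's PING command) and returns a promise.

```typescript
interface HealthReport {
    status: 'ok' | 'degraded'
    redis: 'up' | 'down'
    detail?: string
}

// Hypothetical probe: report Redis as down if `ping` fails or does not
// answer within `timeoutMs` milliseconds. Never rejects.
function checkRedisHealth(ping: () => Promise<unknown>, timeoutMs: number): Promise<HealthReport> {
    return new Promise<HealthReport>((resolve) => {
        const timer = setTimeout(
            () => resolve({ status: 'degraded', redis: 'down', detail: `no reply within ${timeoutMs} ms` }),
            timeoutMs
        )
        // Promise.resolve().then(ping) also captures synchronous throws from ping.
        Promise.resolve()
            .then(ping)
            .then(
                () => {
                    clearTimeout(timer)
                    resolve({ status: 'ok', redis: 'up' })
                },
                (err) => {
                    clearTimeout(timer)
                    resolve({ status: 'degraded', redis: 'down', detail: String(err) })
                }
            )
    })
}
```

A health endpoint could return this report so that orchestrators or load balancers notice a broken Redis connection instead of seeing a healthy-looking UI.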
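BullMQ Support: BullMQ v5.45.2 already supports a timeout here (job.waitUntilFinished() accepts an optional TTL in milliseconds as its second argument), but Flowise neither passes one nor exposes it as configuration.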