Author: Meet Jain
This project implements a scalable backend for the TaskPilot-AI agent using FastAPI, real-time WebSocket streaming, VNC integration, and persistent session management. A minimal HTML/JS frontend is provided for demonstration.
- The system is fully functional but requires a valid Anthropic API key
- All code is production-ready and follows best practices
TaskPilot-AI is an AI-powered computer automation system that lets Claude (Anthropic's AI model) control a virtual desktop environment through natural language commands. It provides real-time interaction, persistent session management, and visual feedback through VNC integration.
- Natural Language Processing: Users can describe tasks in plain English
- Desktop Automation: Claude can perform complex computer tasks automatically
- Tool Integration: Access to file system, web browsing, application control, and more
- Real-time Execution: Immediate feedback and progress updates
- Persistent Sessions: Save and resume conversations across browser sessions
- Session History: Complete audit trail of all interactions and actions
- Multi-session Support: Handle multiple concurrent user sessions
- Database Persistence: SQLite database for reliable data storage
- WebSocket Streaming: Live updates of agent progress and tool outputs
- Bidirectional Communication: Real-time chat between user and AI
- Progress Indicators: Visual feedback for ongoing operations
- Error Handling: Graceful error reporting and recovery
- VNC Server: Virtual desktop environment for AI control
- noVNC Client: Web-based VNC viewer accessible via browser
- Live Desktop View: Real-time visualization of AI actions
- Cross-platform Access: Works on any device with a web browser
- Responsive Design: Works on desktop, tablet, and mobile devices
- Real-time Chat: Live messaging with AI agent
- Session Management: Easy creation and switching between sessions
- Visual Feedback: Status indicators and progress updates
- FastAPI: Modern, fast web framework for building APIs with Python
- SQLAlchemy: SQL toolkit and Object-Relational Mapping (ORM) library
- WebSockets: Real-time bidirectional communication
- Pydantic: Data validation using Python type annotations
- Uvicorn: Lightning-fast ASGI server implementation
- HTML5: Semantic markup for structure
- CSS3: Modern styling with Flexbox and responsive design
- JavaScript (ES6+): Vanilla JS for interactivity and WebSocket communication
- Google Fonts: Inter font family for modern typography
- Docker: Containerization for consistent deployment
- Docker Compose: Multi-container orchestration
- Nginx: Web server for frontend static files
- SQLite: Lightweight database for session storage
- Anthropic Claude API: Advanced AI model for task understanding and execution
- Computer Use Tools: Specialized tools for desktop automation
- Background Task Processing: Asynchronous task execution
- Ubuntu Desktop LXDE: Lightweight desktop environment
- TigerVNC: High-performance VNC server
- noVNC: HTML5 VNC client for web browsers
- Websockify: WebSocket to TCP proxy for VNC
- Operating System: macOS, Linux, or Windows with Docker support
- Docker: Version 20.10 or higher
- Docker Compose: Version 2.0 or higher
- Memory: Minimum 4GB RAM (8GB recommended)
- Storage: At least 2GB free disk space
- Network: Internet connection for AI API access
- Anthropic API Key: Valid API key for Claude access
- API Access: Active subscription to Anthropic's Claude API
sequenceDiagram
participant User
participant Frontend
participant Backend
participant Agent
participant Tools
participant VNC
User->>Frontend: Submit task "Search weather in Dubai"
Frontend->>Backend: POST /sessions/{id}/messages
Backend->>Backend: Store message in database
Backend->>Frontend: 202 Accepted (immediate response)
Backend->>Agent: Start background task
Agent->>Tools: Execute browser_open
Backend->>Frontend: WebSocket: {"type": "tool_output", "tool_id": "browser_open"}
Tools->>VNC: Open Firefox (visible in VNC)
Agent->>Tools: Execute web_search
Backend->>Frontend: WebSocket: {"type": "tool_output", "tool_id": "web_search"}
Tools->>VNC: Navigate to Google (visible in VNC)
Agent->>Backend: Complete task
Backend->>Frontend: WebSocket: {"type": "agent_output", "data": "Task completed"}
Frontend->>User: Display completion message
Frontend->>User: Prompt for new task
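The "202 Accepted, then work in the background" step of the sequence above can be sketched with the standard library alone. This is an illustrative sketch, not the project's FastAPI code: `post_message`, `run_agent`, the in-memory `MESSAGES` store, and the `asyncio.Queue` standing in for the WebSocket are all hypothetical names chosen for the example.

```python
import asyncio
from http import HTTPStatus

# Hypothetical in-memory stand-ins; the real project uses a database and WebSockets.
MESSAGES: dict[str, list[str]] = {}
BACKGROUND_TASKS: set[asyncio.Task] = set()


async def run_agent(session_id: str, text: str, updates: asyncio.Queue) -> None:
    """Placeholder agent: emits one tool update and one final answer."""
    await updates.put(
        {"type": "tool_output", "tool_id": "browser_open", "result": "Opening Firefox browser..."}
    )
    await asyncio.sleep(0)  # yield control, as a real tool call would
    await updates.put({"type": "agent_output", "data": "Task completed"})


async def post_message(session_id: str, text: str, updates: asyncio.Queue) -> int:
    """Store the message, schedule the agent, and return without waiting for it."""
    MESSAGES.setdefault(session_id, []).append(text)
    task = asyncio.create_task(run_agent(session_id, text, updates))
    BACKGROUND_TASKS.add(task)  # keep a reference so the task is not garbage collected
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return HTTPStatus.ACCEPTED.value


async def demo() -> tuple[int, list[dict]]:
    """Submit one task and collect the two updates the placeholder agent emits."""
    updates: asyncio.Queue = asyncio.Queue()
    status = await post_message("s1", "Search weather in Dubai", updates)
    received = [await updates.get(), await updates.get()]
    return status, received
```

Calling `asyncio.run(demo())` returns the status code first and the streamed updates second, mirroring how the frontend gets its HTTP response before any agent output arrives.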
WebSocket Message Types:
// Agent thinking
{"type": "agent_output", "data": "I'll help you search for the weather in Dubai. Let me open a web browser and search for this information."}
// Tool execution start
{"type": "tool_output", "tool_id": "browser_open", "result": "Opening Firefox browser..."}
// Tool execution progress
{"type": "tool_output", "tool_id": "web_search", "result": "Navigating to Google search..."}
// Tool execution complete
{"type": "tool_output", "tool_id": "web_search", "result": "Search completed successfully"}
// Agent response
{"type": "agent_output", "data": "I found the current weather in Dubai. The temperature is 25°C with sunny conditions."}
// Task completion
{"type": "task_complete", "status": "success", "message": "Task completed successfully"}
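A small set of helpers can build and validate the three message shapes listed above. This is a minimal sketch that assumes the field names shown in the examples are the complete schema; the function names and the `KNOWN_TYPES` set are hypothetical and not taken from the project.

```python
import json
from typing import Any

# Message types taken from the examples above.
KNOWN_TYPES = {"agent_output", "tool_output", "task_complete"}


def agent_output(text: str) -> str:
    """Serialize an agent text message."""
    return json.dumps({"type": "agent_output", "data": text})


def tool_output(tool_id: str, result: str) -> str:
    """Serialize a tool progress or result message."""
    return json.dumps({"type": "tool_output", "tool_id": tool_id, "result": result})


def task_complete(status: str, message: str) -> str:
    """Serialize the final task status message."""
    return json.dumps({"type": "task_complete", "status": status, "message": message})


def parse_message(raw: str) -> dict[str, Any]:
    """Decode one frame and reject anything without a known "type"."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError(f"unsupported message: {raw!r}")
    msg_type = msg.get("type")
    if not isinstance(msg_type, str) or msg_type not in KNOWN_TYPES:
        raise ValueError(f"unsupported message: {raw!r}")
    return msg
```

Validating the `type` field on receipt lets a client ignore or report frames it does not understand instead of failing later in rendering code.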
stateDiagram-v2
[*] --> Idle
Idle --> Loading: Submit Task
Loading --> Streaming: Task Started
Streaming --> Processing: Tool Execution
Processing --> Streaming: Tool Complete
Streaming --> Complete: Task Finished
Complete --> Idle: New Task Prompt
Complete --> Loading: Submit New Task
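The state diagram above maps directly onto a transition table. The sketch below encodes exactly the edges in the diagram; the names `UiState`, `TRANSITIONS`, and `next_state` are hypothetical, and the actual frontend (written in JavaScript) may track state differently.

```python
from enum import Enum


class UiState(Enum):
    IDLE = "Idle"
    LOADING = "Loading"
    STREAMING = "Streaming"
    PROCESSING = "Processing"
    COMPLETE = "Complete"


# (current state, event) -> next state, copied from the diagram's edges.
TRANSITIONS = {
    (UiState.IDLE, "Submit Task"): UiState.LOADING,
    (UiState.LOADING, "Task Started"): UiState.STREAMING,
    (UiState.STREAMING, "Tool Execution"): UiState.PROCESSING,
    (UiState.PROCESSING, "Tool Complete"): UiState.STREAMING,
    (UiState.STREAMING, "Task Finished"): UiState.COMPLETE,
    (UiState.COMPLETE, "New Task Prompt"): UiState.IDLE,
    (UiState.COMPLETE, "Submit New Task"): UiState.LOADING,
}


def next_state(state: UiState, event: str) -> UiState:
    """Return the state reached from `state` on `event`, or raise if the edge is absent."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None
```

Rejecting events that have no edge makes illegal sequences, such as a tool result arriving while the UI is idle, visible during development.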
graph TB
subgraph "Frontend Layer"
A[HTML/JS Frontend]
B[WebSocket Client]
C[HTTP Client]
end
subgraph "Backend Layer"
D[FastAPI Server]
E[WebSocket Manager]
F[Session Manager]
G[Agent Runner]
end
subgraph "Data Layer"
H[SQLite Database]
I[Session Storage]
J[Message History]
end
subgraph "AI Layer"
K[Anthropic Claude API]
L[Computer Use Tools]
M[Task Execution]
end
subgraph "Virtual Desktop"
N[Ubuntu Desktop]
O[TigerVNC Server]
P[noVNC Client]
end
A --> D
B --> E
C --> F
D --> H
G --> K
G --> L
L --> M
M --> N
O --> P
P --> A
- User Interaction: User sends message via frontend
- API Processing: FastAPI receives and stores message
- Agent Execution: Background task starts Claude agent
- Tool Execution: Agent uses computer tools to perform tasks
- Real-time Updates: WebSocket streams progress to frontend
- Visual Feedback: VNC shows desktop changes in real-time
sequenceDiagram
participant Client
participant API
participant Database
participant Agent
participant Tools
participant WebSocket
participant VNC
Client->>API: POST /sessions/{id}/messages
API->>Database: Store message
API->>Client: 202 Accepted
API->>Agent: Start background task
Agent->>Tools: Execute tool
Tools->>VNC: Perform action (visible)
Agent->>WebSocket: Send progress update
WebSocket->>Client: Real-time update
Agent->>Tools: Execute next tool
Tools->>VNC: Perform action (visible)
Agent->>WebSocket: Send progress update
WebSocket->>Client: Real-time update
Agent->>Database: Store final response
Agent->>WebSocket: Send completion
WebSocket->>Client: Task complete
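The two database steps in this flow (store the incoming message, then store the final response) could look roughly like the following. The project itself uses SQLAlchemy; this sketch swaps in the standard-library `sqlite3` module to stay self-contained, and the `messages` table, its columns, and the helper names are assumptions rather than the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per message, grouped by session.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> int:
    """Insert one message and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
    return cur.lastrowid


def history(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for a session in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()
```

Passing a file path to `open_db` instead of the default `:memory:` keeps the history across restarts, which is what lets a session be resumed later.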
graph LR
A[Client] -->|Connect| B[WebSocket Server]
B -->|Accept| A
A -->|Send Message| B
B -->|Process| C[Agent Runner]
C -->|Tool Execution| D[Computer Tools]
D -->|Action| E[VNC Desktop]
C -->|Progress| B
B -->|Stream| A
C -->|Complete| B
B -->|Final Update| A
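The streaming half of this flow needs something that remembers which clients are attached to which session. Below is a minimal sketch of such a manager, with each connected client modelled as an `asyncio.Queue` instead of a real WebSocket; the `ConnectionManager` class and its method names are hypothetical and not taken from the project's code.

```python
import asyncio
from collections import defaultdict


class ConnectionManager:
    """Tracks subscribers per session; each subscriber is modelled as a queue."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def connect(self, session_id: str) -> asyncio.Queue:
        """Register a new client for a session and return its inbox."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[session_id].add(queue)
        return queue

    def disconnect(self, session_id: str, queue: asyncio.Queue) -> None:
        """Remove a client; drop the session entry once nobody is listening."""
        self._subscribers[session_id].discard(queue)
        if not self._subscribers[session_id]:
            del self._subscribers[session_id]

    async def broadcast(self, session_id: str, message: dict) -> int:
        """Deliver a message to every client of one session; return how many received it."""
        targets = list(self._subscribers.get(session_id, ()))
        for queue in targets:
            await queue.put(message)
        return len(targets)
```

Keying subscribers by session ID keeps concurrent sessions isolated: an update produced for one session never reaches a client attached to another.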
- Anthropic: For providing the Claude API and Computer Use tools
- FastAPI: For the excellent web framework
- Docker: For containerization technology
- noVNC: For the web-based VNC client