
TaskPilot-AI

Author: Meet Jain

This project implements a scalable backend for the TaskPilot-AI agent using FastAPI, real-time WebSocket streaming, VNC integration, and persistent session management. A minimal HTML/JS frontend is provided for demonstration.

  • The system is fully functional but requires a valid Anthropic API key
  • All code is production-ready and follows best practices

Project Overview

This is an AI-powered computer automation system that lets Claude (Anthropic's AI model) control a virtual desktop environment through natural-language commands. The system provides real-time interaction, persistent session management, and visual feedback through VNC integration.


Core Features

1. AI-Powered Computer Control

  • Natural Language Processing: Users can describe tasks in plain English
  • Desktop Automation: Claude can perform complex computer tasks automatically
  • Tool Integration: Access to file system, web browsing, application control, and more
  • Real-time Execution: Immediate feedback and progress updates

2. Session Management System

  • Persistent Sessions: Save and resume conversations across browser sessions
  • Session History: Complete audit trail of all interactions and actions
  • Multi-session Support: Handle multiple concurrent user sessions
  • Database Persistence: SQLite database for reliable data storage
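The README lists SQLite (through SQLAlchemy) as the session store but does not show a schema. Below is a minimal sketch of how sessions and their message history could be persisted. It uses Python's standard-library `sqlite3` module in place of SQLAlchemy so it runs without extra packages. The `sessions`/`messages` tables, their columns, and the helper functions are hypothetical, not TaskPilot-AI's actual models.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema: one row per session, one row per stored message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    role       TEXT NOT NULL,   -- e.g. 'user', 'assistant', 'tool'
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES with this on
    conn.executescript(SCHEMA)
    return conn


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def create_session(conn: sqlite3.Connection) -> str:
    session_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sessions (id, created_at) VALUES (?, ?)", (session_id, _now()))
    return session_id


def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content, created_at) VALUES (?, ?, ?, ?)",
            (session_id, role, content, _now()),
        )


def history(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs in insertion order: the session's audit trail."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()
```

With this design, resuming a session from another browser means looking up the stored session ID and reading `history()`. Passing a file path instead of `":memory:"` keeps the data across restarts.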

3. Real-time Communication

  • WebSocket Streaming: Live updates of agent progress and tool outputs
  • Bidirectional Communication: Real-time chat between user and AI
  • Progress Indicators: Visual feedback for ongoing operations
  • Error Handling: Graceful error reporting and recovery

4. Visual Desktop Integration

  • VNC Server: Virtual desktop environment for AI control
  • noVNC Client: Web-based VNC viewer accessible via browser
  • Live Desktop View: Real-time visualization of AI actions
  • Cross-platform Access: Works on any device with a web browser

5. Modern Web Interface

  • Responsive Design: Works on desktop, tablet, and mobile devices
  • Real-time Chat: Live messaging with AI agent
  • Session Management: Easy creation and switching between sessions
  • Visual Feedback: Status indicators and progress updates

Technology Stack

Backend Technologies

  • FastAPI: Modern, fast web framework for building APIs with Python
  • SQLAlchemy: SQL toolkit and Object-Relational Mapping (ORM) library
  • WebSockets: Real-time bidirectional communication
  • Pydantic: Data validation using Python type annotations
  • Uvicorn: Lightning-fast ASGI server implementation

Frontend Technologies

  • HTML5: Semantic markup for structure
  • CSS3: Modern styling with Flexbox and responsive design
  • JavaScript (ES6+): Vanilla JS for interactivity and WebSocket communication
  • Google Fonts: Inter font family for modern typography

Infrastructure & Deployment

  • Docker: Containerization for consistent deployment
  • Docker Compose: Multi-container orchestration
  • Nginx: Web server for frontend static files
  • SQLite: Lightweight database for session storage

AI & Automation

  • Anthropic Claude API: Advanced AI model for task understanding and execution
  • Computer Use Tools: Specialized tools for desktop automation
  • Background Task Processing: Asynchronous task execution

Virtual Desktop

  • Ubuntu Desktop LXDE: Lightweight desktop environment
  • TigerVNC: High-performance VNC server
  • noVNC: HTML5 VNC client for web browsers
  • Websockify: WebSocket-to-TCP proxy that lets the browser-based noVNC client reach the VNC server
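These three pieces connect as follows. The browser loads noVNC, noVNC opens a WebSocket to websockify, and websockify forwards that traffic to TigerVNC's TCP port. The sketch below builds a noVNC viewer URL. The host and port 6080 (noVNC's usual proxy default) are assumptions, since this README does not list TaskPilot-AI's ports, and `novnc_url` is a hypothetical helper. `autoconnect`, `resize`, and `view_only` are query parameters read by noVNC's `vnc.html` page.

```python
from urllib.parse import urlencode


def novnc_url(host: str = "localhost", port: int = 6080, view_only: bool = False) -> str:
    """Build a noVNC viewer URL (host and port are assumed, not documented values)."""
    params = {
        "autoconnect": "true",  # connect as soon as the page loads
        "resize": "scale",      # scale the remote desktop to fit the browser window
        "view_only": "true" if view_only else "false",
    }
    return f"http://{host}:{port}/vnc.html?{urlencode(params)}"
```

Setting `view_only=True` lets someone watch the agent work without sending stray mouse or keyboard input to the desktop it is controlling.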

Prerequisites

System Requirements

  • Operating System: macOS, Linux, or Windows with Docker support
  • Docker: Version 20.10 or higher
  • Docker Compose: Version 2.0 or higher
  • Memory: Minimum 4GB RAM (8GB recommended)
  • Storage: At least 2GB free disk space
  • Network: Internet connection for AI API access

API Requirements

  • Anthropic API Key: Valid API key for Claude access
  • API Access: Active subscription to Anthropic's Claude API
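The following is a small preflight sketch that checks the prerequisites above before starting the stack. It assumes two things:

- The key is supplied through the `ANTHROPIC_API_KEY` environment variable. This is the variable the official Anthropic SDKs read by default, but how TaskPilot-AI itself loads the key is not stated here.
- `docker` is on the `PATH`.

The function name `preflight` is hypothetical.

```python
import os
import shutil
from typing import Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):  # assumed variable name, see above
        problems.append("ANTHROPIC_API_KEY is not set")
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

This does not verify the Docker or Compose versions listed above. `docker --version` and `docker compose version` report those.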

Streamlit-like UI Behavior Simulation

Real-time Progress Streaming

1. Task Submission Flow

sequenceDiagram
    participant User
    participant Frontend
    participant Backend
    participant Agent
    participant Tools
    participant VNC

    User->>Frontend: Submit task "Search weather in Dubai"
    Frontend->>Backend: POST /sessions/{id}/messages
    Backend->>Backend: Store message in database
    Backend->>Frontend: 202 Accepted (immediate response)
    
    Backend->>Agent: Start background task
    Agent->>Tools: Execute browser_open
    Backend->>Frontend: WebSocket: {"type": "tool_output", "tool": "browser_open"}
    Tools->>VNC: Open Firefox (visible in VNC)
    
    Agent->>Tools: Execute web_search
    Backend->>Frontend: WebSocket: {"type": "tool_output", "tool": "web_search"}
    Tools->>VNC: Navigate to Google (visible in VNC)
    
    Agent->>Backend: Complete task
    Backend->>Frontend: WebSocket: {"type": "agent_output", "data": "Task completed"}
    Frontend->>User: Display completion message
    Frontend->>User: Prompt for new task
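The key property of the submission flow above is that the backend answers `202 Accepted` before the agent starts. The sketch below models that ordering with plain `asyncio` so it runs without FastAPI. In FastAPI, the same effect is typically achieved with a route declared with `status_code=202` that schedules work via `BackgroundTasks.add_task`. The functions `submit_message` and `run_agent` and the in-memory `db` and `events` objects are hypothetical stand-ins for the database, agent, and WebSocket channel.

```python
import asyncio


async def run_agent(session_id: str, task: str, events: asyncio.Queue) -> None:
    """Hypothetical agent: streams a progress event, then signals completion."""
    await events.put({"type": "agent_output", "data": f"Working on: {task}"})
    await asyncio.sleep(0)  # real tool calls (browser_open, web_search, ...) would run here
    await events.put({"type": "task_complete", "status": "success"})


async def submit_message(db: dict, session_id: str, text: str,
                         events: asyncio.Queue, background: set) -> tuple[int, dict]:
    """Model of POST /sessions/{id}/messages: store, schedule, respond 202."""
    db.setdefault(session_id, []).append(text)                       # 1. persist the message
    task = asyncio.create_task(run_agent(session_id, text, events))  # 2. schedule; not started yet
    background.add(task)                                             # keep a strong reference
    task.add_done_callback(background.discard)
    return 202, {"session_id": session_id, "status": "accepted"}     # 3. respond immediately


async def main() -> tuple[int, int, dict, list[dict]]:
    db: dict = {}
    events: asyncio.Queue = asyncio.Queue()
    background: set = set()
    status, _body = await submit_message(db, "s1", "Search weather in Dubai", events, background)
    queued_at_response = events.qsize()  # nothing yet: the agent has not run
    received = []
    while True:
        event = await events.get()
        received.append(event)
        if event["type"] == "task_complete":
            break
    return status, queued_at_response, db, received


status, queued_at_response, db, received = asyncio.run(main())
```

The set of background tasks follows the `asyncio` documentation's advice to keep references to tasks created with `create_task`, so they are not garbage-collected mid-run.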

2. Real-time Progress Monitoring

WebSocket Message Types:

// Agent thinking
{"type": "agent_output", "data": "I'll help you search for the weather in Dubai. Let me open a web browser and search for this information."}

// Tool execution start
{"type": "tool_output", "tool_id": "browser_open", "result": "Opening Firefox browser..."}

// Tool execution progress
{"type": "tool_output", "tool_id": "web_search", "result": "Navigating to Google search..."}

// Tool execution complete
{"type": "tool_output", "tool_id": "web_search", "result": "Search completed successfully"}

// Agent response
{"type": "agent_output", "data": "I found the current weather in Dubai. The temperature is 25°C with sunny conditions."}

// Task completion
{"type": "task_complete", "status": "success", "message": "Task completed successfully"}
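Each example above is one JSON object per WebSocket frame; the `//` lines are annotations, not part of the payload. The sketch below shows how a Python client could dispatch on the `type` field, using the field names from the examples (`data`, `tool_id`, `result`, `status`, `message`). The `describe` function and its output format are hypothetical.

```python
import json


def describe(raw: str) -> str:
    """Turn one WebSocket frame (a JSON string) into a line of UI text."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "agent_output":
        return f"Agent: {msg['data']}"
    if kind == "tool_output":
        return f"[{msg['tool_id']}] {msg['result']}"
    if kind == "task_complete":
        return f"Done ({msg['status']}): {msg['message']}"
    # Tolerate message types added later instead of crashing the UI.
    return f"Unknown message type: {kind!r}"
```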

3. UI State Management

stateDiagram-v2
    [*] --> Idle
    Idle --> Loading: Submit Task
    Loading --> Streaming: Task Started
    Streaming --> Processing: Tool Execution
    Processing --> Streaming: Tool Complete
    Streaming --> Complete: Task Finished
    Complete --> Idle: New Task Prompt
    Complete --> Loading: Submit New Task
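The state diagram above can be encoded as an explicit transition table. In this sketch, the states mirror the diagram, while the snake_case event names are hypothetical renderings of its arrow labels.

```python
from enum import Enum


class UIState(Enum):
    IDLE = "Idle"
    LOADING = "Loading"
    STREAMING = "Streaming"
    PROCESSING = "Processing"
    COMPLETE = "Complete"


# (current state, event) -> next state, taken from the diagram above.
TRANSITIONS = {
    (UIState.IDLE, "submit_task"): UIState.LOADING,
    (UIState.LOADING, "task_started"): UIState.STREAMING,
    (UIState.STREAMING, "tool_execution"): UIState.PROCESSING,
    (UIState.PROCESSING, "tool_complete"): UIState.STREAMING,
    (UIState.STREAMING, "task_finished"): UIState.COMPLETE,
    (UIState.COMPLETE, "new_task_prompt"): UIState.IDLE,
    (UIState.COMPLETE, "submit_new_task"): UIState.LOADING,
}


def step(state: UIState, event: str) -> UIState:
    """Return the next state, or raise if the diagram has no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.value}") from None
```

Rejecting undefined transitions surfaces out-of-order WebSocket messages, rather than silently leaving the UI in an inconsistent state.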

Architecture Details

System Architecture

graph TB
    subgraph "Frontend Layer"
        A[HTML/JS Frontend]
        B[WebSocket Client]
        C[HTTP Client]
    end
    
    subgraph "Backend Layer"
        D[FastAPI Server]
        E[WebSocket Manager]
        F[Session Manager]
        G[Agent Runner]
    end
    
    subgraph "Data Layer"
        H[SQLite Database]
        I[Session Storage]
        J[Message History]
    end
    
    subgraph "AI Layer"
        K[Anthropic Claude API]
        L[Computer Use Tools]
        M[Task Execution]
    end
    
    subgraph "Virtual Desktop"
        N[Ubuntu Desktop]
        O[TigerVNC Server]
        P[noVNC Client]
    end
    
    A --> D
    B --> E
    C --> F
    D --> H
    G --> K
    G --> L
    L --> M
    M --> N
    O --> P
    P --> A
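In the diagram, the Agent Runner (G) drives the Computer Use Tools (L). The sketch below shows that loop under three assumptions:

- A fixed `plan` of `(tool_id, args)` pairs stands in for the tool calls Claude would request one at a time. In the real system, each result goes back to the model, which picks the next call.
- The `TOOLS` registry holds fake tools instead of actions on the VNC desktop.
- `emit` is whatever forwards events to the WebSocket manager.

All of these names are hypothetical. The event fields match the message examples earlier in this README.

```python
from typing import Callable

# Hypothetical tool registry; real computer-use tools would act on the VNC desktop.
TOOLS: dict[str, Callable[[dict], str]] = {
    "browser_open": lambda args: "Opening Firefox browser...",
    "web_search": lambda args: f"Searched for {args['query']!r}",
}


def run_agent(plan: list[tuple[str, dict]], emit: Callable[[dict], None]) -> None:
    """Execute tool calls in order, emitting one event per step."""
    for tool_id, args in plan:
        tool = TOOLS.get(tool_id)
        if tool is None:
            # Stop early and report failure instead of guessing.
            emit({"type": "task_complete", "status": "error",
                  "message": f"Unknown tool: {tool_id}"})
            return
        emit({"type": "tool_output", "tool_id": tool_id, "result": tool(args)})
    emit({"type": "task_complete", "status": "success",
          "message": "Task completed successfully"})
```

Passing `emit` as a callback keeps the runner independent of the transport. A test can collect events into a list, while the server forwards them to connected WebSockets.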

Data Flow

  1. User Interaction: User sends message via frontend
  2. API Processing: FastAPI receives and stores message
  3. Agent Execution: Background task starts Claude agent
  4. Tool Execution: Agent uses computer tools to perform tasks
  5. Real-time Updates: WebSocket streams progress to frontend
  6. Visual Feedback: VNC shows desktop changes in real-time

Message Flow Architecture

sequenceDiagram
    participant Client
    participant API
    participant Database
    participant Agent
    participant Tools
    participant WebSocket
    participant VNC

    Client->>API: POST /sessions/{id}/messages
    API->>Database: Store message
    API->>Client: 202 Accepted
    
    API->>Agent: Start background task
    Agent->>Tools: Execute tool
    Tools->>VNC: Perform action (visible)
    Agent->>WebSocket: Send progress update
    WebSocket->>Client: Real-time update
    
    Agent->>Tools: Execute next tool
    Tools->>VNC: Perform action (visible)
    Agent->>WebSocket: Send progress update
    WebSocket->>Client: Real-time update
    
    Agent->>Database: Store final response
    Agent->>WebSocket: Send completion
    WebSocket->>Client: Task complete
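The "Send progress update" steps above require something that routes each event to the clients watching that session. Below is a minimal sketch of such a connection manager. Each `asyncio.Queue` stands in for one WebSocket; a FastAPI version would presumably hold `WebSocket` objects and call `await websocket.send_json(message)` instead. The class and method names are hypothetical.

```python
import asyncio
from collections import defaultdict


class ConnectionManager:
    """Hypothetical per-session fan-out; each queue stands in for one WebSocket."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def connect(self, session_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[session_id].add(queue)
        return queue

    def disconnect(self, session_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[session_id].discard(queue)

    async def broadcast(self, session_id: str, message: dict) -> None:
        # Copy the set so a disconnect during sending cannot break iteration.
        for queue in list(self._subscribers[session_id]):
            await queue.put(message)
```

Keying subscribers by session ID keeps concurrent sessions (the multi-session support listed earlier) from seeing each other's progress. It also lets several browser tabs follow the same session.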

WebSocket Communication Flow

graph LR
    A[Client] -->|Connect| B[WebSocket Server]
    B -->|Accept| A
    A -->|Send Message| B
    B -->|Process| C[Agent Runner]
    C -->|Tool Execution| D[Computer Tools]
    D -->|Action| E[VNC Desktop]
    C -->|Progress| B
    B -->|Stream| A
    C -->|Complete| B
    B -->|Final Update| A
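On the client side of this flow, the stream ends with a final update. The sketch below collects messages until a `task_complete` arrives. It also gives up if the server goes silent for too long. An `asyncio.Queue` stands in for the WebSocket, and `consume`, `idle_timeout`, and the `client_timeout` marker are hypothetical.

```python
import asyncio


async def consume(queue: asyncio.Queue, idle_timeout: float = 30.0) -> list[dict]:
    """Collect streamed messages until task_complete or idle_timeout seconds of silence."""
    transcript = []
    while True:
        try:
            message = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            transcript.append({"type": "client_timeout"})  # hypothetical local marker
            break
        transcript.append(message)
        if message.get("type") == "task_complete":
            break
    return transcript
```

The timeout resets on every message, so it limits idle time between messages, not the task's total duration. Long tasks that keep streaming progress are never cut off.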

Acknowledgments

  • Anthropic: For providing the Claude API and Computer Use tools
  • FastAPI: For the excellent web framework
  • Docker: For containerization technology
  • noVNC: For web-based VNC client

Contact

Connect with me through the following platforms:

LinkedIn Twitter

Social Media and Platforms

Discord Instagram Stack Overflow Medium Hashnode

About

AI that pilots your computer tasks
