A multimodal AI assistant that:
- Accepts image input
- Performs OCR to extract text
- Uses GPT to summarize or interact with content
- Can generate updated/modified images with prompts
- Offers a FastAPI-based web server and CLI
- OCR with Tesseract
- OpenAI GPT API integration
- Automatic summary and Q&A generation
- Image generation using OpenAI or DALL·E API
- Web frontend ready (React/Next.js compatible)
Python, FastAPI, Tesseract, OpenAI, Typer, PIL
pip install -r requirements.txt
python main.py --image path/to/image.png