Skip to content

A multimodal AI assistant that accepts image input, extracts text via OCR, summarizes it with GPT, and can generate new images from prompts. Includes a FastAPI backend, Typer CLI, and OpenAI integration for seamless multimodal interaction.

Notifications You must be signed in to change notification settings

RafaTachinardi/vision_gpt_assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

Vision-GPT Assistant 👁️🧠

A multimodal AI assistant that:

  • Accepts image input
  • Performs OCR to extract text
  • Uses GPT to summarize or interact with content
  • Can generate updated/modified images with prompts
  • Offers a FastAPI-based web server and CLI

Features

  • OCR with Tesseract
  • OpenAI GPT API integration
  • Automatic summary and Q&A generation
  • Image generation using OpenAI or DALL·E API
  • Web frontend ready (React/Next.js compatible)

Tech Stack

Python, FastAPI, Tesseract, OpenAI, Typer, PIL

Run

pip install -r requirements.txt
python main.py --image path/to/image.png

About

A multimodal AI assistant that accepts image input, extracts text via OCR, summarizes it with GPT, and can generate new images from prompts. Includes a FastAPI backend, Typer CLI, and OpenAI integration for seamless multimodal interaction.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages