AI Content Processing Tool

A Python application that extracts text from a wide range of file types and converts text to speech using AI services, available as both a command-line tool and a FastAPI web service. Each file type is automatically routed to the most appropriate AI service for text extraction, and the text-to-speech engine produces professional-quality audio for podcast creation.

Features

Text Extraction

  • Multi-format Support: Extract text from PDFs, documents, spreadsheets, videos, and audio files
  • AI-Powered: Uses OpenAI GPT for documents and Google Gemini for multimedia
  • Smart Routing: Automatically selects the best AI service for each file type
  • YouTube Support: Extract transcripts and audio from YouTube videos
  • Fast Transcripts: Near-instant transcript extraction using a video's existing captions
  • Batch Processing: Handle multiple files or URLs at once

Text-to-Speech (NEW)

  • Multiple Voices: 6 different AI voices (alloy, echo, fable, onyx, nova, shimmer)
  • Podcast Mode: Multi-voice support perfect for podcast creation with different speakers
  • High Quality: Two quality levels (tts-1 and tts-1-hd) with multiple audio formats
  • Speed Control: Adjustable speech speed from 0.25x to 4.0x
  • Smart Chunking: Automatically handles long text with natural break points
  • Speaker Management: Organize audio by speaker names for easy podcast assembly

Translation Service (NEW)

  • 47+ Languages: Support for major world languages including English, Spanish, French, German, Chinese, Japanese, and more
  • Dual AI Support: Choose between OpenAI GPT and Google Gemini for translation
  • Auto-Detection: Automatically detect source language when unknown
  • Batch Translation: Translate multiple texts efficiently in a single request
  • Language Detection: Identify the language of any text with confidence scores
  • High Quality: Professional-grade translations preserving meaning and tone

General

  • Dual Interface: Command-line tool and REST API server
  • Parallel Processing: Process multiple files simultaneously for faster results
  • Comprehensive Output: Detailed results with processing time, file info, and extraction statistics
  • RESTful API: Complete FastAPI server with auto-generated documentation

Supported File Types

OpenAI GPT (Documents & Text)

  • PDF files (.pdf)
  • Text files (.txt)
  • Word documents (.doc, .docx)
  • Excel spreadsheets (.xls, .xlsx)

Google Gemini (Multimedia)

  • Video files (.mp4, .avi, .mov, .mkv)
  • Audio files (.mp3, .wav, .m4a, .webm, .ogg)

YouTube Processor

  • YouTube videos and audio content
  • Supports various YouTube URL formats
  • Two methods available:
    • Fast Transcript Extraction: Uses existing captions/subtitles (3-5 seconds)
    • Audio Processing: Downloads and transcribes audio using AI (30-60 seconds)

Text-to-Speech Capabilities

Available Voices

  • alloy: A balanced voice, suitable for most content
  • echo: A warm, friendly voice
  • fable: A storytelling voice with character
  • onyx: A deep, authoritative voice
  • nova: A bright, energetic voice
  • shimmer: A soft, gentle voice

Audio Formats

  • MP3: Good balance of quality and size (recommended)
  • FLAC: Highest quality, larger files
  • Opus: Good for web streaming
  • AAC: Good for mobile apps

Quality Models

  • tts-1: Fast processing, good quality
  • tts-1-hd: High-definition quality, slightly slower

Perfect for Podcasts

  • Multi-speaker support: Use different voices for different speakers
  • Speaker names: Organize audio files by speaker
  • Natural conversations: Create realistic dialogues
  • Professional quality: Broadcast-ready audio output
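Speaker-based organization can be sketched as a simple naming scheme. This is illustrative; the service's actual filenames may differ:

```python
def segment_filename(index: int, speaker_name: str, audio_format: str = "mp3") -> str:
    """Build an ordered, speaker-tagged filename for one podcast segment."""
    safe_name = "".join(c for c in speaker_name if c.isalnum() or c in "-_")
    return f"{index:03d}_{safe_name}.{audio_format}"

segments = [("Host", "Welcome!"), ("Guest", "Thanks for having me!")]
files = [segment_filename(i, name) for i, (name, _) in enumerate(segments, start=1)]
# files: ["001_Host.mp3", "002_Guest.mp3"]
```

The zero-padded index keeps segments in recording order when sorted, which makes final podcast assembly straightforward.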

PDF Processing Features

Resume-Only Mode for Large Files

For PDF files larger than 30MB, the system automatically switches to resume-only mode (a summary-style analysis) for efficient processing:

  • Smart Sampling: Extracts text from up to 40 strategically selected pages (beginning, middle, end)
  • AI-Generated Analysis: Creates a comprehensive 1200-2000 word analysis using OpenAI
  • Fast Processing: Significantly faster than full document analysis
  • Clear Indication: Output clearly shows this is an analysis from a large file
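The "strategically selected pages" idea can be sketched as below. This is an assumed selection strategy for illustration; the tool's actual page choices may differ:

```python
def sample_pages(total_pages: int, max_pages: int = 40) -> list[int]:
    """Pick up to max_pages page numbers from the beginning, middle, and end."""
    if total_pages <= max_pages:
        return list(range(1, total_pages + 1))
    per_region = max_pages // 3
    start = list(range(1, per_region + 1))
    mid_first = (total_pages - per_region) // 2 + 1
    middle = list(range(mid_first, mid_first + per_region))
    # The end region absorbs the rounding remainder so exactly max_pages are chosen.
    end_count = max_pages - 2 * per_region
    end = list(range(total_pages - end_count + 1, total_pages + 1))
    return sorted(set(start + middle + end))
```

For the 250-page example below, this yields 40 pages: the first 13, 13 from the middle, and the last 14.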

Configuration Options

# Normal usage (automatic resume for >30MB files)
python main.py large_document.pdf

# Force full processing (overrides resume-only mode)
PROCESS_FULL_DOCUMENT=true python main.py large_document.pdf

# Disable resume-only mode entirely
PDF_RESUME_ONLY_LARGE=false python main.py large_document.pdf

Example Output

=== DOCUMENT RESUME ===
Document: large_manual.pdf
Total pages: 250
File size: 45.2MB
Processing mode: Resume-only (large file)
Pages sampled: 40

[AI-generated comprehensive analysis follows...]

📝 ANALYSIS NOTE: This comprehensive analysis was generated from 40 strategically
selected pages due to the large file size (>30MB). The sampling provides broad
coverage but may not include every detail. For complete analysis of all 250 pages,
use full document processing mode.

Quick Start

🚀 FastAPI Web Service (Recommended)

  1. Setup and Start API Server:

    # Install dependencies
    pip install -r requirements.txt
    
    # Setup API keys (copy env_example.txt to .env and add your keys)
    cp env_example.txt .env
    
    # Start server
    python api_server.py
  2. Test the API:

    # Health check
    curl http://localhost:8000/health
    
    # Extract YouTube transcript (fast)
    curl -X POST http://localhost:8000/youtube-transcript \
      -H "Content-Type: application/json" \
      -d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'
    
    # Upload and extract from file
    curl -X POST -F "file=@document.pdf" http://localhost:8000/extract
    
    # Convert text to speech (single voice)
    curl -X POST http://localhost:8000/text-to-speech \
      -H "Content-Type: application/json" \
      -d '{"text": "Hello! This is a test of our text-to-speech API.", "voice": "alloy"}'
    
    # Create podcast with multiple voices
    curl -X POST http://localhost:8000/text-to-speech-podcast \
      -H "Content-Type: application/json" \
      -d '{
        "segments": [
          {"text": "Welcome to our podcast!", "voice": "nova", "speaker_name": "Host"},
          {"text": "Thanks for having me!", "voice": "onyx", "speaker_name": "Guest"}
        ]
      }'
  3. View Interactive Documentation:

    Open http://localhost:8000/docs (Swagger UI) or http://localhost:8000/redoc in your browser (FastAPI's default documentation routes).

🖥️ Command Line Interface

  1. Setup:

    # Install dependencies
    pip install -r requirements.txt
    
    # Setup environment variables
    cp env_example.txt .env
    # Edit .env file with your API keys
  2. Basic Usage:

    # Extract from single file
    python main.py document.pdf
    
    # Extract from multiple files with parallel processing
    python main.py file1.pdf file2.docx video.mp4 --parallel
    
    # Extract from YouTube video
    python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    
    # Process directory
    python main.py documents/ --parallel --max-workers 8
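The effect of --parallel can be approximated with concurrent.futures. This is a simplified sketch, where `extract` is a stand-in for the real per-file processing (which calls an AI service):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract(path: str) -> dict:
    """Stand-in for per-file extraction; the real tool calls an AI service here."""
    return {"file": path, "text": f"extracted from {path}"}

def process_parallel(paths: list[str], max_workers: int = 8) -> list[dict]:
    """Process files concurrently; one failure doesn't abort the batch."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract, p): p for p in paths}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as exc:
                results.append({"file": futures[future], "error": str(exc)})
    return results
```

Threads work well here because the per-file work is dominated by network calls to the AI services, not CPU.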

New YouTube Transcript Feature

Fast Transcript Extraction

The new /youtube-transcript endpoint provides ultra-fast transcript extraction:

  • Speed: 3-5 seconds vs 30-60 seconds for audio processing
  • Method: Uses YouTube's existing captions/subtitles
  • Quality: Original caption quality (manual or auto-generated)
  • No Downloads: No audio file downloading required

Usage Examples

Basic transcript extraction:

curl -X POST http://localhost:8000/youtube-transcript \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=VIDEO_ID"}'

With language preferences:

curl -X POST http://localhost:8000/youtube-transcript \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=VIDEO_ID", "language": "en", "manual_only": true}'

Response Format

{
  "success": true,
  "extracted_text": "=== YOUTUBE TRANSCRIPT ===\nTitle: Video Title\nChannel: Channel Name\n[00:18] First subtitle text\n[00:22] Second subtitle text\n...",
  "text_length": 2896,
  "processing_time": 4.01,
  "file_info": {
    "name": "Video Title",
    "duration": 213,
    "video_id": "VIDEO_ID",
    "transcript_language": "en"
  }
}

Installation & Dependencies

Required Dependencies

pip install -r requirements.txt

Key packages:

  • openai>=1.3.0 - OpenAI API client
  • google-generativeai>=0.3.0 - Google Gemini API
  • fastapi>=0.104.0 - Web API framework
  • youtube-transcript-api==1.2.1 - YouTube transcript extraction (NEW)
  • yt-dlp>=2024.1.1 - YouTube video processing

System Dependencies

For YouTube video processing:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (using chocolatey)
choco install ffmpeg

API Keys Setup

Create .env file:

OPENAI_API_KEY=your_openai_api_key_here
GOOGLE_API_KEY=your_gemini_api_key_here

Troubleshooting

Python Version Issues

If you encounter ModuleNotFoundError for youtube_transcript_api:

  1. Check Python version:

    python3 --version
    which python3
  2. Use explicit Python path if needed:

    # Find Python installation
    ls /opt/homebrew/opt/python@*/bin/python*
    
    # Use explicit path
    /opt/homebrew/opt/python@3.13/bin/python3.13 api_server.py

Common Warnings (Safe to Ignore)

  • PyDub not available - No module named 'pyaudioop' - Audio processing works via other methods
  • FastAPI deprecation warnings - Functionality works normally
  • SSL/OpenSSL warnings - System-level warnings that don't affect functionality

YouTube Processing Issues

  1. No transcript available: Not all videos have captions
    • Try the audio-based /extract-youtube endpoint as fallback
  2. Rate limiting: YouTube may block requests
    • Use cookie authentication (see setup instructions)

Performance Comparison

Method                     | Speed   | Requirements            | Best For
---------------------------|---------|-------------------------|----------------------------------
/youtube-transcript        | 3-5s    | Existing captions       | Quick extraction
/extract-youtube           | 30-60s  | Audio download + AI     | When no captions exist
Small PDF processing       | 2-10s   | File upload             | PDFs under 30MB
Large PDF resume-only      | 25-45s  | File upload             | PDFs over 30MB (40-page analysis)
Large PDF full processing  | 60-300s | File upload + Full mode | Complete analysis of large PDFs
Video/Audio AI             | 30-120s | File upload + AI        | Media files

Command Line Examples

# Single file processing
python main.py document.pdf
python main.py presentation.pptx
python main.py audio.mp3 audio.webm audio.ogg

# YouTube content
python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
python main.py "https://youtu.be/9bZkp7q19f0"

# Batch processing with parallel execution
python main.py file1.pdf file2.docx video.mp4 --parallel --max-workers 4

# Directory processing
python main.py documents/ --parallel

# Custom output file
python main.py document.pdf --output results.txt

# Verbose output
python main.py document.pdf --verbose

API Usage Examples

Python Client

import requests

# Extract YouTube transcript (fast method)
response = requests.post(
    "http://localhost:8000/youtube-transcript",
    json={"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
)
result = response.json()
print(f"Extracted {result['text_length']} characters in {result['processing_time']:.1f}s")

# Upload file for processing
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    response = requests.post("http://localhost:8000/extract", files=files)
    result = response.json()
    print(result['extracted_text'][:200])

cURL Examples

# Health check
curl http://localhost:8000/health

# Extract from URL
curl -X POST http://localhost:8000/extract-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/document.pdf"}'

# Batch processing
curl -X POST http://localhost:8000/extract-batch-url \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/doc1.pdf", "https://example.com/doc2.txt"]}'

# Text-to-Speech (single voice)
curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{"text": "Welcome to our AI-powered content processing service!", "voice": "nova"}'

# Text-to-Speech (podcast mode with multiple voices)
curl -X POST http://localhost:8000/text-to-speech-podcast \
  -H "Content-Type: application/json" \
  -d '{
    "segments": [
      {"text": "Good morning, listeners!", "voice": "nova", "speaker_name": "Host"},
      {"text": "Thanks for having me on the show.", "voice": "onyx", "speaker_name": "Guest"}
    ]
  }'

# Translation Service (single text)
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, how are you?", "source_language": "en", "target_language": "es"}'

# Translation Service (auto-detect language)
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "¿Cómo está el clima?", "source_language": "auto-detect", "target_language": "en"}'

# Language Detection
curl -X POST http://localhost:8000/detect-language \
  -H "Content-Type: application/json" \
  -d '{"text": "Bonjour, comment allez-vous?"}'

# Batch Translation
curl -X POST http://localhost:8000/translate-batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Hello", "Goodbye", "Thank you"], "source_language": "en", "target_language": "fr"}'

# Get available TTS voices
curl http://localhost:8000/tts-voices

Architecture

  • Modular Design: Separate processors for different file types
  • Smart Routing: Automatic processor selection based on file type
  • Parallel Processing: Concurrent file processing for better performance
  • Error Handling: Graceful handling of individual file failures
  • Cleanup: Automatic temporary file cleanup
  • API Integration: RESTful API with comprehensive documentation
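The smart-routing rule above can be sketched as an extension map. The mapping is assumed from the supported-file-types list earlier in this README, and the processor names are placeholders for the classes in src/file_processors/:

```python
from pathlib import Path

# Assumed routing table, based on the supported-file-types list above.
ROUTES = {
    ".pdf": "openai", ".txt": "openai", ".doc": "openai", ".docx": "openai",
    ".xls": "openai", ".xlsx": "openai",
    ".mp4": "gemini", ".avi": "gemini", ".mov": "gemini", ".mkv": "gemini",
    ".mp3": "gemini", ".wav": "gemini", ".m4a": "gemini", ".webm": "gemini",
    ".ogg": "gemini",
}

def route(path_or_url: str) -> str:
    """Pick a processor: YouTube URLs first, then by file extension."""
    if "youtube.com" in path_or_url or "youtu.be" in path_or_url:
        return "youtube"
    processor = ROUTES.get(Path(path_or_url).suffix.lower())
    if processor is None:
        raise ValueError(f"Unsupported file type: {path_or_url}")
    return processor
```

Keeping the routing in one table makes adding a new format a one-line change plus a processor module.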

Development

Running Tests

# Test the API
python api_client_examples.py

# Test YouTube functionality
python youtube_client_examples.py

# Test URL processing
python url_client_examples.py

# Test translation service
python translation_examples.py

# Test translation processor
python test_translation_service.py

File Structure

ai-content-process/
├── main.py                 # CLI interface
├── api_server.py           # FastAPI server
├── requirements.txt        # Dependencies
├── src/
│   ├── config.py          # Configuration
│   ├── text_extractor.py  # Main orchestrator
│   └── file_processors/   # Processor modules
│       ├── openai_processor.py
│       ├── gemini_processor.py
│       ├── youtube_processor.py
│       ├── youtube_transcript_processor.py
│       ├── tts_processor.py           # NEW - Text-to-Speech
│       └── translation_processor.py   # NEW - Translation Service
└── examples/              # Usage examples

Additional Documentation

Text-to-Speech

For detailed TTS usage examples and advanced features, see:

  • examples/TTS_CURL_EXAMPLES.md - Comprehensive curl examples for all TTS endpoints
  • Voice samples and use cases - Perfect for podcast creation, audiobooks, and voice-overs
  • API endpoint reference - Complete parameter documentation and response formats

Translation Service

For comprehensive translation service documentation and examples, see:

Other Resources

License

MIT License - see LICENSE file for details.
