A modern, AI-powered audio transcription and analysis application built with Streamlit, Azure Speech Services, and Azure OpenAI. Perfect for transcribing meetings, interviews, lectures, and other audio content with intelligent summarization and action item extraction.
- 🎯 High-Quality Audio Transcription - Powered by Azure Speech Services with support for multiple languages
- 🤖 AI-Powered Analysis - Intelligent summarization and action item extraction using Azure OpenAI
- ⏱️ Flexible Duration Control - Transcribe full audio or select specific time ranges
- 🔒 Enterprise Security - Built for Azure Government and Commercial clouds with managed identity support
- 📊 Real-time Statistics - Processing time, word count, and confidence metrics
- 💾 Multiple Export Formats - Download as TXT, JSON, or comprehensive analysis reports
- 🎨 Modern Web Interface - Clean, responsive UI built with Streamlit
- 🚀 Easy Deployment - Ready for Azure App Service with Docker support
- Python 3.11 or higher
- Azure subscription with Speech Services and OpenAI resources
- FFmpeg (for audio processing)
- Clone the repository
- Install dependencies: `pip install -r app/requirements.txt`
- Set up environment variables. Create a `.env` file in the `.azure/captainslog/` directory:

      AZURE_SPEECH_KEY=your_speech_service_key
      AZURE_SPEECH_REGION=your_speech_region
      AZURE_SPEECH_ENDPOINT=your_speech_endpoint
      AZURE_OPENAI_ENDPOINT=your_openai_endpoint
      AZURE_OPENAI_KEY=your_openai_key
      AZURE_OPENAI_MODEL_NAME=gpt-4
      AZURE_OPENAI_API_VERSION=2024-02-01

- Run the application: `cd app`, then `streamlit run app.py`
- Open your browser to `http://localhost:8501`
- Build the Docker image: `docker build -t captains-log .`
- Run the container: `docker run -p 8501:8501 --env-file .env captains-log`
- Azure CLI installed and configured
- Azure Developer CLI (azd) installed
- Initialize the project: `azd init`
- Deploy to Azure: `azd up`

This will:
- Create necessary Azure resources (App Service, Speech Services, OpenAI)
- Deploy the application
- Configure environment variables
- Set up managed identity authentication
- Create Azure resources:
  - Speech Services resource
  - OpenAI resource
  - App Service or Container App
  - Key Vault (recommended for secrets)
- Configure environment variables: set the required environment variables in your Azure App Service configuration.
- Deploy the application: use the included Bicep templates or deploy directly via Azure CLI.
- WAV - Recommended for best quality
- MP3 - Most common format
- M4A - Apple audio format
- OGG - Open source format
- FLAC - Lossless compression
- MP4 - Video files with audio
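
The format list above can be turned into a quick pre-upload check. The following is a minimal sketch, not the application's own validation code: the function name `is_supported_audio` is hypothetical, and it only inspects the file extension, not the actual file contents.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".flac", ".mp4"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is in the supported list.

    Hypothetical helper: the comparison is case-insensitive and looks
    only at the extension, so a mislabeled file would still pass.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("meeting.WAV", "lecture.flac", "notes.txt"):
        print(name, is_supported_audio(name))
```

A file that passes this check can still fail to decode, so converting to WAV remains the safest fallback.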
| Variable | Description | Required |
|---|---|---|
| `AZURE_SPEECH_KEY` | Azure Speech Services API key | Yes |
| `AZURE_SPEECH_REGION` | Azure region (e.g., `eastus`) | Yes |
| `AZURE_SPEECH_ENDPOINT` | Speech service endpoint | Yes |
| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint | Yes |
| `AZURE_OPENAI_KEY` | Azure OpenAI API key | Yes |
| `AZURE_OPENAI_MODEL_NAME` | Model name (e.g., `gpt-4`) | Yes |
| `AZURE_OPENAI_API_VERSION` | API version | Yes |
The application automatically detects and configures for Azure Government clouds:
- Uses `*.speech.azure.us` endpoints
- Supports government-specific authentication
- Maintains compliance requirements
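
How the application performs this detection is not described here. One plausible approach, shown purely as an assumption, is to inspect the hostname of the configured Speech endpoint for the `speech.azure.us` suffix mentioned above. The function name and the example hostnames are illustrative.

```python
from urllib.parse import urlparse

# Suffix taken from the "*.speech.azure.us" pattern mentioned above.
GOVERNMENT_SPEECH_SUFFIX = ".speech.azure.us"


def is_government_endpoint(endpoint: str) -> bool:
    """Guess whether a Speech endpoint URL points at Azure Government.

    Hypothetical helper: expects a full URL including the scheme.
    Without a scheme, urlparse finds no hostname and this returns False.
    """
    host = urlparse(endpoint).hostname or ""
    return host.endswith(GOVERNMENT_SPEECH_SUFFIX)


if __name__ == "__main__":
    # Illustrative hostnames only.
    print(is_government_endpoint("https://example.stt.speech.azure.us"))
    print(is_government_endpoint("https://example.stt.speech.microsoft.com"))
```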
- Managed Identity - Secure authentication without storing credentials
- Environment Variables - Sensitive data stored securely
- Azure Key Vault - Integration ready for enterprise secrets management
- HTTPS Only - Secure communication in production
- No Data Persistence - Audio files are processed in memory only
- "No speech detected"
  - Ensure audio contains clear speech
  - Check audio format compatibility
- Authentication errors
  - Verify Azure Speech service key and region
  - Check OpenAI endpoint and API key
  - Ensure managed identity is properly configured
- Format issues
  - Try converting audio to WAV format
  - Check if FFmpeg is properly installed
  - Ensure file size is within limits
- Long processing times
  - Large files take more time to process
  - Consider using duration limits for testing
  - Check Azure service quotas
Enable debug logging by setting: `STREAMLIT_LOGGER_LEVEL=DEBUG`

We welcome contributions!
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Microsoft Azure - For providing excellent Speech and OpenAI services
- Streamlit - For the amazing web framework
- PyDub - For audio processing capabilities
- Open Source Community - For the various libraries and tools used