A Go (Golang) backend service for transcribing audio files using FFmpeg for processing and the Groq API for transcription.
- Upload and transcribe audio files (MP3, WAV, FLAC, M4A, etc.)
- Audio preprocessing with FFmpeg for optimal transcription quality
- Audio chunking for large files to improve transcription accuracy
- Parallel processing of audio chunks for faster results
- RESTful API for easy integration with frontend applications
- Go (Golang)
- Gin framework for HTTP routing and middleware
- FFmpeg for audio processing
- Groq API for transcription (using the Whisper model)
- UUID generation for temporary file management
- Go 1.16+
- FFmpeg installed on the system
- FFprobe installed on the system (comes with FFmpeg)
- Groq API key
- Clone the repository:

      git clone <repository-url>
      cd <repository-directory>

- Install dependencies:

      go mod download

- Add your Groq API key to the transcribeAudio function in main.go:

      // Replace the empty string with your API key
      apiKey := "your_groq_api_key_here"

- Start the server:

      go run main.go

The server will run on port 8080 by default.
Endpoint: POST /api/transcribe
Request:
- Content-Type: multipart/form-data
- Body:
  - file: Audio file (MP3, WAV, FLAC, M4A, etc.)
Response:
{
  "transcription": "This is the transcribed text from the audio file..."
}
Error Response:
{
  "error": "Error message describing what went wrong"
}
- File Upload: The API accepts audio file uploads via a multipart form.
- Preprocessing: The uploaded audio is preprocessed using FFmpeg:
- Converted to 16kHz sample rate
- Reduced to mono channel
- Converted to FLAC format for optimal transcription
- Chunking: Large audio files are split into manageable chunks (2 minutes each with a 1-second overlap)
- Parallel Processing: Multiple chunks are transcribed simultaneously (limited to 5 concurrent operations)
- Transcription: Each chunk is sent to the Groq API for transcription using the distil-whisper-large-v3-en model
- Combination: Results are combined and returned as a complete transcription
- Main Function: Sets up the Gin router with CORS configuration and defines the API routes
- transcribeAudio: The main handler function that orchestrates the audio processing and transcription
- preprocessAudioFile: Processes audio files to prepare them for transcription
- getAudioChunkData: Analyzes audio files to determine chunking parameters
- chunkifyAudioFile: Splits large audio files into smaller chunks
- createAudioChunkFile: Creates individual audio chunk files
- transcribeChunk: Sends audio chunks to the Groq API for transcription
- deleteFiles: Cleans up temporary files
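As an illustration of what a function like preprocessAudioFile might hand to FFmpeg, the sketch below assembles arguments for a 16 kHz, mono, FLAC conversion. The helper preprocessArgs and the file names are hypothetical, and the real function may use different or additional flags.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// preprocessArgs returns FFmpeg arguments that resample to 16 kHz, downmix
// to mono, and encode as FLAC.
func preprocessArgs(inputPath, outputPath string) []string {
	return []string{
		"-y",            // overwrite the output file if it exists
		"-i", inputPath, // input file
		"-ar", "16000", // audio sample rate in Hz
		"-ac", "1", // one audio channel (mono)
		"-c:a", "flac", // FLAC audio codec
		outputPath,
	}
}

func main() {
	cmd := exec.Command("ffmpeg", preprocessArgs("input.mp3", "output.flac")...)
	// Printing rather than running keeps the sketch usable without FFmpeg;
	// call cmd.Run() or cmd.CombinedOutput() to execute it.
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

Passing arguments as a slice, rather than building a shell string, avoids quoting problems with uploaded file names.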
To add additional functionality:
- Add new route handlers in main.go
- Create helper functions for new features
- Update the error handling as needed
The API is configured to accept requests from:
- http://localhost:5173 (default Vue.js development server)
If you need to allow additional origins, update the CORS configuration in the main function.
The API handles errors at each stage of a request:
- Validation errors for missing files or bad requests
- Audio processing errors from FFmpeg operations
- Transcription errors from the Groq API
- Cleanup of temporary files even in error cases
- Concurrency Control: The API limits the number of concurrent transcription operations to 5 to prevent overloading the system or hitting API rate limits.
- CPU Utilization: Audio chunk processing uses the available CPU cores (with a default of 4 if GOMAXPROCS is not set)
- Temporary File Management: All temporary files are properly cleaned up after processing.