Audio Transcription API

A Go (Golang) backend service for transcribing audio files using FFmpeg for processing and the Groq API for transcription.

Features

Upload and transcribe audio files (MP3, WAV, FLAC, M4A, etc.)
Audio preprocessing with FFmpeg for optimal transcription quality
Audio chunking for large files to improve transcription accuracy
Parallel processing of audio chunks for faster results
RESTful API for easy integration with frontend applications

Tech Stack

Go (Golang)
Gin framework for HTTP routing and middleware
FFmpeg for audio processing
Groq API for transcription (using the Whisper model)
UUID generation for temporary file management

Requirements

Go 1.16+
FFmpeg installed on the system
FFprobe installed on the system (comes with FFmpeg)
Groq API key

Installation

Clone the repository:

git clone <repository-url>
cd <repository-directory>

Install dependencies:
```
go mod download
```

Add your Groq API key to the transcribeAudio function in main.go:

// Replace the placeholder string below with your Groq API key
apiKey := "your_groq_api_key_here"

Running the Server

Start the server:

go run main.go

The server will run on port 8080 by default.

API Endpoints

Transcribe Audio

Endpoint: POST /api/transcribe

Request:

Content-Type: multipart/form-data
Body:
- file: Audio file (MP3, WAV, FLAC, M4A, etc.)

Response:

{
  "transcription": "This is the transcribed text from the audio file..."
}

Error Response:

{
  "error": "Error message describing what went wrong"
}
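
A client call to this endpoint might look like the sketch below. It assumes the server is reachable at http://localhost:8080 (the default port mentioned above) and uses the documented `file` form field. The helper names `buildUploadRequest` and `parseResponse`, the `transcribeResponse` type, and the path `sample.mp3` are hypothetical and not part of the project.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

// transcribeResponse mirrors the two JSON shapes documented above.
type transcribeResponse struct {
	Transcription string `json:"transcription"`
	Error         string `json:"error"`
}

// buildUploadRequest (hypothetical helper) wraps audio bytes in a
// multipart/form-data body under the documented "file" field.
func buildUploadRequest(url, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Close writes the closing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

// parseResponse returns the transcription, or the server's error message
// as a Go error when the "error" field is present.
func parseResponse(body []byte) (string, error) {
	var r transcribeResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Error != "" {
		return "", errors.New(r.Error)
	}
	return r.Transcription, nil
}

func main() {
	audio, err := os.ReadFile("sample.mp3") // hypothetical local file
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	req, err := buildUploadRequest("http://localhost:8080/api/transcribe", "sample.mp3", audio)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request:", err)
		return
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read response:", err)
		return
	}
	text, err := parseResponse(raw)
	if err != nil {
		fmt.Println("transcription failed:", err)
		return
	}
	fmt.Println(text)
}
```

Reading the whole file into memory keeps the sketch short; for very large uploads a streaming body (for example via `io.Pipe`) may be preferable.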

Implementation Details

Audio Processing Pipeline

File Upload: The API accepts audio file uploads via a multipart form.
Preprocessing: The uploaded audio is preprocessed using FFmpeg:
- Converted to 16kHz sample rate
- Reduced to mono channel
- Converted to FLAC format for optimal transcription
Chunking: Large audio files are split into manageable chunks (2 minutes each with a 1-second overlap)
Parallel Processing: Multiple chunks are transcribed simultaneously (limited to 5 concurrent operations)
Transcription: Each chunk is sent to the Groq API for transcription using the distil-whisper-large-v3-en model
Combination: Results are combined and returned as a complete transcription
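
The preprocessing and chunking steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in main.go: `preprocessArgs`, `chunkRange`, and `planChunks` are hypothetical names, the FFmpeg flags shown (`-ar`, `-ac`, `-c:a`) are standard options that produce the described output but may differ from the flags the project actually passes, and the overlap is assumed to mean that each chunk starts 1 second before the previous chunk ends.

```go
package main

import (
	"fmt"
	"time"
)

// preprocessArgs (hypothetical) returns FFmpeg arguments that produce
// 16 kHz, mono, FLAC output as described in the pipeline.
func preprocessArgs(in, out string) []string {
	return []string{
		"-y",     // overwrite the output file if it already exists
		"-i", in, // input file
		"-ar", "16000", // resample to 16 kHz
		"-ac", "1", // downmix to a single (mono) channel
		"-c:a", "flac", // encode the audio stream as FLAC
		out,
	}
}

// chunkRange is the time span of one chunk within the source audio.
type chunkRange struct {
	Start, End time.Duration
}

// planChunks (hypothetical) splits total into chunks of chunkLen where each
// chunk after the first begins overlap before the previous chunk ends.
// The final chunk is shortened to end exactly at total.
func planChunks(total, chunkLen, overlap time.Duration) []chunkRange {
	if total <= 0 || chunkLen <= overlap {
		return nil
	}
	var chunks []chunkRange
	step := chunkLen - overlap
	for start := time.Duration(0); start < total; start += step {
		end := start + chunkLen
		if end >= total {
			chunks = append(chunks, chunkRange{start, total})
			break
		}
		chunks = append(chunks, chunkRange{start, end})
	}
	return chunks
}

func main() {
	fmt.Println("ffmpeg", preprocessArgs("input.mp3", "clean.flac"))
	for i, c := range planChunks(5*time.Minute, 2*time.Minute, time.Second) {
		fmt.Printf("chunk %d: %v to %v\n", i, c.Start, c.End)
	}
}
```

With these assumptions, a 5-minute file yields three chunks: 0:00 to 2:00, 1:59 to 3:59, and 3:58 to 5:00. The argument slice could be passed to `exec.Command("ffmpeg", args...)` from the standard `os/exec` package.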

Code Structure

Main Function: Sets up the Gin router with CORS configuration and defines the API routes
transcribeAudio: The main handler function that orchestrates the audio processing and transcription
preprocessAudioFile: Processes audio files to prepare them for transcription
getAudioChunkData: Analyzes audio files to determine chunking parameters
chunkifyAudioFile: Splits large audio files into smaller chunks
createAudioChunkFile: Creates individual audio chunk files
transcribeChunk: Sends audio chunks to the Groq API for transcription
deleteFiles: Cleans up temporary files

Extending the API

To add additional functionality:

Add new route handlers in main.go
Create helper functions for new features
Update the error handling as needed

CORS Configuration

The API is configured to accept requests from:

http://localhost:5173 (the default Vite development server port, as used by Vue.js projects)

If you need to allow additional origins, update the CORS configuration in the main function.

Error Handling

The API handles the following error cases:

Validation errors for missing files or bad requests
Audio processing errors from FFmpeg operations
Transcription errors from the Groq API
Cleanup of temporary files even in error cases
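
One common way to guarantee cleanup on every return path in Go is a deferred call. The sketch below illustrates that pattern only; the `deleteFiles` signature shown is an assumption (the source names the function but not its parameters), and `process` and `leftoverAfterFailure` are hypothetical helpers that simulate a failure partway through.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// deleteFiles is a stand-in for the helper of the same name in main.go;
// its real signature may differ. Files that are already gone are ignored.
func deleteFiles(paths []string) {
	for _, p := range paths {
		if err := os.Remove(p); err != nil && !errors.Is(err, os.ErrNotExist) {
			fmt.Fprintln(os.Stderr, "cleanup:", err)
		}
	}
}

// process (hypothetical) creates three temporary chunk files and then
// optionally fails, to show that cleanup still runs.
func process(dir string, fail bool) error {
	var temps []string
	// The closure reads temps when process returns, so it sees every path
	// appended below, whichever return statement is taken.
	defer func() { deleteFiles(temps) }()

	for i := 0; i < 3; i++ {
		p := filepath.Join(dir, fmt.Sprintf("chunk-%d.flac", i))
		if err := os.WriteFile(p, []byte("x"), 0o600); err != nil {
			return err
		}
		temps = append(temps, p)
	}
	if fail {
		return errors.New("simulated transcription failure")
	}
	return nil
}

// leftoverAfterFailure runs process with a forced failure in a scratch
// directory and reports how many files were left behind, together with
// the error that process returned.
func leftoverAfterFailure() (int, error) {
	dir, err := os.MkdirTemp("", "transcribe-demo-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	procErr := process(dir, true)
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	return len(entries), procErr
}

func main() {
	n, err := leftoverAfterFailure()
	fmt.Printf("error: %v, files left behind: %d\n", err, n)
}
```

Wrapping `deleteFiles(temps)` in a closure matters here: `defer deleteFiles(temps)` would capture the slice while it is still empty.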

Performance Considerations

Concurrency Control: The API limits the number of concurrent transcription operations to 5 to prevent overloading the system or hitting API rate limits.
CPU Utilization: Audio chunk processing uses the available CPU cores (with a default of 4 if GOMAXPROCS is not set)
Temporary File Management: Temporary files are deleted after processing, including when an error occurs.
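
The concurrency limit described above can be implemented with a buffered channel used as a semaphore. The following is a minimal sketch of that general technique, not the project's code: `transcribeAll` is a hypothetical name, and the `fn` callback stands in for the real Groq API call.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// transcribeAll (hypothetical) runs fn over every chunk with at most limit
// calls in flight at once and returns the results in chunk order.
func transcribeAll(chunks []string, limit int, fn func(string) (string, error)) ([]string, error) {
	if limit < 1 {
		limit = 1 // an unbuffered semaphore would block forever
	}
	results := make([]string, len(chunks))
	errs := make([]error, len(chunks))
	sem := make(chan struct{}, limit)

	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit workers are already running
		go func(i int, c string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot for the next chunk
			// Each goroutine writes only to its own index, so no lock is needed.
			results[i], errs[i] = fn(c)
		}(i, c)
	}
	wg.Wait()

	// Report the first error in chunk order, if any.
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	chunks := []string{"chunk-0", "chunk-1", "chunk-2"}
	// fakeTranscribe stands in for the real API call.
	fakeTranscribe := func(c string) (string, error) {
		return strings.ToUpper(c), nil
	}
	texts, err := transcribeAll(chunks, 5, fakeTranscribe)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(texts, " "))
}
```

Storing results by index keeps the combined transcription in the original chunk order even though chunks may finish out of order.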

License

MIT License
