Python package providing wrappers around ivrit.ai's capabilities.
pip install ivrit
The ivrit package provides audio transcription functionality using multiple engines.
import ivrit
# Transcribe a local audio file
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
result = model.transcribe(path="audio.mp3")
# With custom device
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2", device="cpu")
result = model.transcribe(path="audio.mp3")
print(result["text"])
# Transcribe audio from a URL
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
result = model.transcribe(url="https://example.com/audio.mp3")
print(result["text"])
# Get results as a stream (generator)
model = ivrit.load_model(engine="faster-whisper", model="base")
for segment in model.transcribe(path="audio.mp3", stream=True, verbose=True):
    print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")
# Or use the model directly
model = ivrit.FasterWhisperModel(model="base")
for segment in model.transcribe(path="audio.mp3", stream=True):
    print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")
# Access word-level timing
for segment in model.transcribe(path="audio.mp3", stream=True):
    print(f"Segment: {segment.text}")
    for word in segment.extra_data.get('words', []):
        print(f" {word['start']:.2f}s - {word['end']:.2f}s: '{word['word']}'")
Load a transcription model for the specified engine and model.
- engine (str): Transcription engine to use. Options: "faster-whisper", "stable-ts"
- model (str): Model name for the selected engine
- device (str, optional): Device to use for inference. Default: "auto". Options: "auto", "cpu", "cuda", "cuda:0", etc.
- model_path (str, optional): Custom path to the model (for faster-whisper)
Returns: TranscriptionModel object that can be used for transcription
Raises:
- ValueError: If the engine is not supported
- ImportError: If required dependencies are not installed
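Because loading can fail with either of the two errors above, callers may want to turn them into readable messages rather than let them propagate. The following is a minimal sketch under stated assumptions: `try_load_model` and `fake_loader` are hypothetical names that are not part of ivrit, and the stand-in loader only imitates the documented behaviour so the example runs without the package installed.

```python
from typing import Any, Callable, Optional, Tuple


def try_load_model(
    loader: Callable[..., Any], **kwargs: Any
) -> Tuple[Optional[Any], Optional[str]]:
    """Call a load_model-style function and map its documented errors to messages.

    `loader` is assumed to behave like ivrit.load_model: ValueError for an
    unsupported engine, ImportError when required dependencies are missing.
    """
    try:
        return loader(**kwargs), None
    except ValueError as exc:
        return None, f"unsupported engine: {exc}"
    except ImportError as exc:
        return None, f"missing dependency: {exc}"


# Hypothetical stand-in loader, used only to exercise the helper.
def fake_loader(engine: str, model: str, device: str = "auto") -> dict:
    if engine != "faster-whisper":
        raise ValueError(engine)
    return {"engine": engine, "model": model, "device": device}


model, error = try_load_model(fake_loader, engine="faster-whisper", model="base")
bad_model, bad_error = try_load_model(fake_loader, engine="unknown", model="base")
```

With the real package installed, the same helper would presumably be called as `try_load_model(ivrit.load_model, engine="faster-whisper", model="base")`.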
The ivrit package uses an object-oriented design with a base TranscriptionModel class and specific implementations for each transcription engine.
- TranscriptionModel: Abstract base class for all transcription models
- FasterWhisperModel: Implementation for the Faster Whisper engine
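The base-class-plus-implementations layout described above can be illustrated with a small, self-contained sketch. This is not the package's source: `BaseModelSketch`, `EchoModelSketch`, and `load_model_sketch` are hypothetical names, and the toy "echo" engine exists only to show how a factory might map an engine name to a concrete class.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseModelSketch(ABC):
    """Illustrative stand-in for an abstract transcription model."""

    def __init__(self, model: str, device: str = "auto") -> None:
        self.model = model
        self.device = device

    @abstractmethod
    def transcribe(self, path: Optional[str] = None, url: Optional[str] = None) -> dict:
        """Return a result dict with at least a "text" key."""


class EchoModelSketch(BaseModelSketch):
    """Toy engine: 'transcribes' by reporting which source it was given."""

    def transcribe(self, path: Optional[str] = None, url: Optional[str] = None) -> dict:
        source = path or url
        if source is None:
            raise ValueError("either path or url is required")
        return {"text": f"transcript of {source}"}


def load_model_sketch(engine: str, model: str, device: str = "auto") -> BaseModelSketch:
    """Factory that maps an engine name to a concrete class."""
    engines = {"echo": EchoModelSketch}
    if engine not in engines:
        raise ValueError(f"unsupported engine: {engine}")
    return engines[engine](model=model, device=device)
```

One consequence of this design is that calling code depends only on the `transcribe` interface, so adding another engine means adding one subclass and one entry in the factory's mapping.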
# Step 1: Load the model
model = ivrit.load_model(engine="faster-whisper", model="base")
# Step 2: Transcribe audio
result = model.transcribe(path="audio.mp3")
# Create model directly
model = ivrit.FasterWhisperModel(model="base")
# Use the model
result = model.transcribe(path="audio.mp3")
For multiple transcriptions, load the model once and reuse it:
# Load model once
model = ivrit.load_model(engine="faster-whisper", model="base")
# Use for multiple transcriptions
result1 = model.transcribe(path="audio1.mp3")
result2 = model.transcribe(path="audio2.mp3")
result3 = model.transcribe(path="audio3.mp3")
pip install ivrit
pip install ivrit[faster-whisper]
Fast and accurate speech recognition using the Faster Whisper model.
Model Class: FasterWhisperModel
Available Models: base, large, small, medium, large-v2, large-v3
Features:
- Word-level timing information
- Language detection with confidence scores
- Support for custom devices (CPU, CUDA, etc.)
- Support for custom model paths
- Streaming transcription
Dependencies: faster-whisper>=1.1.1
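The streaming and word-level timing features listed above could be combined to produce subtitle-style output as segments arrive. The sketch below assumes segments expose `start`, `end`, `text`, and an `extra_data` dict with a `words` list, as in the usage examples earlier in this README. `FakeSegment`, `format_timestamp`, `to_srt`, and `word_durations` are hypothetical helpers, not part of ivrit.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class FakeSegment:
    """Hypothetical stand-in mirroring the segment fields used in this README."""
    start: float
    end: float
    text: str
    extra_data: dict = field(default_factory=dict)


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SubRip style)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable) -> Iterator[str]:
    """Yield one SubRip cue per segment, lazily, as segments are produced."""
    for index, seg in enumerate(segments, start=1):
        yield (
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )


def word_durations(segment) -> List[Tuple[str, float]]:
    """Return (word, duration in seconds) pairs from a segment's word timings."""
    return [
        (w["word"], round(w["end"] - w["start"], 3))
        for w in segment.extra_data.get("words", [])
    ]
```

Because `to_srt` is a generator, it could be fed the stream directly, for example `for cue in to_srt(model.transcribe(path="audio.mp3", stream=True)): print(cue)`, assuming the streamed segments carry the fields described above.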
Stable and reliable transcription using Stable-TS models.
Status: Not yet implemented
git clone <repository-url>
cd ivrit
pip install -e ".[dev]"
pytest
black .
isort .
MIT License - see LICENSE file for details.