Whisper with vLLM

This repository holds a basic example of how to use the Whisper ASR model with the vLLM engine.

Check the step_by_step.ipynb notebook for a more detailed explanation.

The information was gathered from various sources across the Internet, but it should provide a good enough base for understanding how to use vLLM with Whisper and achieve quite impressive performance. On an RTX 2080 Mobile (8 GB), it takes only a couple of seconds to process a 2-minute audio clip, which is a huge boost compared to the native Transformers implementation.
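
For orientation, here is a minimal sketch of the pattern, modeled on the official vLLM encoder-decoder example rather than copied from this repo's main.py; the wav file name, the exact decoder prompt tokens, and the use of librosa for loading are illustrative assumptions.

```python
# Minimal sketch (not the repo's exact main.py): transcribe one wav clip
# with Whisper through vLLM's offline LLM API. Assumes a recent vLLM
# release with Whisper support and librosa for audio loading; the file
# name below is just a placeholder.
import librosa
from vllm import LLM, SamplingParams

# Whisper expects 16 kHz mono audio.
audio, sample_rate = librosa.load("samples/example.wav", sr=16000)

llm = LLM(
    model="openai/whisper-large-v3-turbo",
    gpu_memory_utilization=0.55,
    dtype="float16",
    max_num_seqs=4,                     # how many sequences are decoded in parallel
    limit_mm_per_prompt={"audio": 1},   # one audio clip per prompt
)

prompts = [
    {
        # Whisper decoder prompt: English transcription, no timestamps.
        "prompt": "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>",
        "multi_modal_data": {"audio": (audio, sample_rate)},
    }
]

# Greedy decoding; 440 keeps the output under Whisper's 448-token decoder limit.
outputs = llm.generate(prompts, SamplingParams(temperature=0, max_tokens=440))
print(outputs[0].outputs[0].text)
```

The speedup over looping with a plain Transformers pipeline comes from vLLM's internal batching: you pass one prompt dict per clip and let the engine schedule them together.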

How to run the example

  1. Create a virtual environment with `python3 -m venv .venv` and activate it with `source .venv/bin/activate`
  2. Install the requirements: `pip install -r requirements.txt`
  3. Place your wav files into the `samples` folder, or pass a different folder via the `--sample_path` argument
  4. Run `python3 main.py`

Arguments

  • `--model` - Model name; default: `openai/whisper-large-v3-turbo`
  • `--gpu_memory_utilization` - Fraction of GPU memory vLLM is allowed to use; default: `0.55`
  • `--dtype` - Data type for model weights and activations; default: `float16`
  • `--max_num_seqs` - Maximum number of sequences processed in parallel (batch size); default: `4`
  • `--sample_path` - Path to the directory with audio samples in wav format; default: `samples`
  • `--language` - Language code for transcription; default: `en`
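
A hedged sketch of how these flags plausibly feed the engine is shown below; the real wiring lives in main.py, so the argument parsing and variable names here are assumptions.

```python
# Hedged sketch of how the CLI flags likely map onto the vLLM engine;
# see main.py for the repo's actual argument handling.
import argparse
from vllm import LLM

parser = argparse.ArgumentParser()
parser.add_argument("--model", default="openai/whisper-large-v3-turbo")
parser.add_argument("--gpu_memory_utilization", type=float, default=0.55)
parser.add_argument("--dtype", default="float16")
parser.add_argument("--max_num_seqs", type=int, default=4)
parser.add_argument("--sample_path", default="samples")
parser.add_argument("--language", default="en")
args = parser.parse_args()

# The first four flags go straight to the LLM constructor; --language is
# presumably inserted into the Whisper decoder prompt (e.g. <|en|>) and
# --sample_path is scanned for *.wav files to transcribe.
llm = LLM(
    model=args.model,
    gpu_memory_utilization=args.gpu_memory_utilization,
    dtype=args.dtype,
    max_num_seqs=args.max_num_seqs,
    limit_mm_per_prompt={"audio": 1},
)
```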

Sources

  1. Official vLLM encoder-decoder example (Link)
  2. Hugging Face Whisper vLLM endpoint example (Link)
