Whisper with vLLM

This repository holds a basic example of how to use the Whisper ASR model with the vLLM engine.

Check the step_by_step.ipynb notebook for a more detailed explanation.

The information was gathered from various sources across the Internet, but it should provide a good enough base for understanding how to use vLLM with Whisper and achieve quite impressive performance. On an RTX 2080 Mobile (8 GB), it takes only a couple of seconds to process a 2-minute audio clip, which is a huge boost compared to the native Transformers implementation.
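
For orientation, here is a minimal sketch of the pattern, modeled on the official vLLM encoder-decoder example rather than copied from this repo's main.py; the wav file name, the exact decoder prompt tokens, and the use of librosa for loading are illustrative assumptions.

```python
# Minimal sketch (not the repo's exact main.py): transcribe one wav clip
# with Whisper through vLLM's offline LLM API. Assumes a recent vLLM
# release with Whisper support and librosa for audio loading; the file
# name below is just a placeholder.
import librosa
from vllm import LLM, SamplingParams

# Whisper expects 16 kHz mono audio.
audio, sample_rate = librosa.load("samples/example.wav", sr=16000)

llm = LLM(
    model="openai/whisper-large-v3-turbo",
    gpu_memory_utilization=0.55,
    dtype="float16",
    max_num_seqs=4,                     # how many sequences are decoded in parallel
    limit_mm_per_prompt={"audio": 1},   # one audio clip per prompt
)

prompts = [
    {
        # Whisper decoder prompt: English transcription, no timestamps.
        "prompt": "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>",
        "multi_modal_data": {"audio": (audio, sample_rate)},
    }
]

# Greedy decoding; 440 keeps the output under Whisper's 448-token decoder limit.
outputs = llm.generate(prompts, SamplingParams(temperature=0, max_tokens=440))
print(outputs[0].outputs[0].text)
```

The speedup over looping with a plain Transformers pipeline comes from vLLM's internal batching: you pass one prompt dict per clip and let the engine schedule them together.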

How to run the example

  1. Create a virtual environment with `python3 -m venv .venv` and activate it with `source .venv/bin/activate`
  2. Install the requirements: `pip install -r requirements.txt`
  3. Place your wav files into the `samples` folder, or pass a different folder via the `--sample_path` argument
  4. Run `python3 main.py`

Arguments

  • `--model` - Model name; default: `openai/whisper-large-v3-turbo`
  • `--gpu_memory_utilization` - Fraction of GPU memory vLLM is allowed to use; default: `0.55`
  • `--dtype` - Data type for model weights and activations; default: `float16`
  • `--max_num_seqs` - Maximum number of sequences processed in parallel (batch size); default: `4`
  • `--sample_path` - Path to the directory with audio samples in wav format; default: `samples`
  • `--language` - Language code for transcription; default: `en`
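
A hedged sketch of how these flags plausibly feed the engine is shown below; the real wiring lives in main.py, so the argument parsing and variable names here are assumptions.

```python
# Hedged sketch of how the CLI flags likely map onto the vLLM engine;
# see main.py for the repo's actual argument handling.
import argparse
from vllm import LLM

parser = argparse.ArgumentParser()
parser.add_argument("--model", default="openai/whisper-large-v3-turbo")
parser.add_argument("--gpu_memory_utilization", type=float, default=0.55)
parser.add_argument("--dtype", default="float16")
parser.add_argument("--max_num_seqs", type=int, default=4)
parser.add_argument("--sample_path", default="samples")
parser.add_argument("--language", default="en")
args = parser.parse_args()

# The first four flags go straight to the LLM constructor; --language is
# presumably inserted into the Whisper decoder prompt (e.g. <|en|>) and
# --sample_path is scanned for *.wav files to transcribe.
llm = LLM(
    model=args.model,
    gpu_memory_utilization=args.gpu_memory_utilization,
    dtype=args.dtype,
    max_num_seqs=args.max_num_seqs,
    limit_mm_per_prompt={"audio": 1},
)
```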

Sources

  1. Official vLLM encoder-decoder example (Link)
  2. Hugging Face Whisper vLLM endpoint example (Link)
