This repository holds a basic example of how to use the Whisper ASR model with the vLLM engine. Check the `step_by_step.ipynb` notebook for a more detailed explanation.

The information was scraped from all across the Internet, but it should provide a good enough base for understanding how to use vLLM and Whisper together to achieve quite impressive performance. On an RTX 2080 Mobile (8 GB), processing a 2-minute audio file takes literally a couple of seconds, which is a huge boost compared to the native Transformers implementation.
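The end-to-end flow is small enough to sketch. Below is a hedged outline assuming vLLM's offline-inference API for Whisper; the `LLM(...)` constructor arguments mirror the CLI defaults documented in this README, while `build_whisper_prompt`, `transcribe`, and the prompt layout are illustrative assumptions based on vLLM's published Whisper examples, not code from this repository — check `main.py` for the authoritative version.

```python
# Sketch of the offline-inference flow the repository wraps (assumptions
# noted inline; see main.py for the real implementation).
try:
    from vllm import LLM, SamplingParams  # needs a CUDA-capable GPU setup
except ImportError:  # let the sketch be read without vLLM installed
    LLM = SamplingParams = None


def build_whisper_prompt(waveform, sample_rate):
    """Pack one audio clip into the multimodal prompt dict that vLLM's
    Whisper integration consumes (layout assumed from vLLM examples)."""
    return {
        "prompt": "<|startoftranscript|>",
        "multi_modal_data": {"audio": (waveform, sample_rate)},
    }


def transcribe(waveforms, sample_rate=16000):
    """Batch-transcribe raw waveforms using the defaults this repo documents."""
    llm = LLM(
        model="openai/whisper-large-v3-turbo",
        gpu_memory_utilization=0.55,
        dtype="float16",
        max_num_seqs=4,  # batch size: the main lever behind the speedup
    )
    params = SamplingParams(temperature=0, max_tokens=440)  # budget assumed
    prompts = [build_whisper_prompt(w, sample_rate) for w in waveforms]
    return [out.outputs[0].text for out in llm.generate(prompts, params)]
```

The batching (`max_num_seqs`) is what lets vLLM keep the GPU saturated across several audio clips at once, rather than decoding them one by one as a naive Transformers loop would.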
- Create a virtual environment and activate it:
  ```shell
  python3 -m venv .venv
  source .venv/bin/activate
  ```
- Install the requirements:
  ```shell
  pip install -r requirements.txt
  ```
- Place your WAV files into the `samples` folder, or do not forget to pass their folder as an argument.
- Run:
  ```shell
  python3 main.py
  ```
Available arguments:

- `--model`: model name, default `openai/whisper-large-v3-turbo`
- `--gpu_memory_utilization`: GPU memory utilization, default `0.55`
- `--dtype`: data type, default `float16`
- `--max_num_seqs`: maximum number of sequences (batch size), default `4`
- `--sample_path`: path to the directory with audio samples in `wav` format, default `samples`
- `--language`: language code, default `en`
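Before running, it can help to confirm that the samples are in a format Whisper handles well: the model consumes 16 kHz mono audio internally, and depending on how the script loads files, other formats may need resampling first. Here is a minimal standard-library sketch; the `wav_info` helper and the scan of the default `samples` folder are illustrative, not part of the repository:

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, channels, duration


folder = Path("samples")  # the documented --sample_path default
if folder.is_dir():
    for p in sorted(folder.glob("*.wav")):
        rate, channels, duration = wav_info(p)
        hint = "" if (rate, channels) == (16000, 1) else "  <- consider resampling to 16 kHz mono"
        print(f"{p.name}: {rate} Hz, {channels} ch, {duration:.1f} s{hint}")
```

Anything flagged can be converted beforehand, e.g. with `ffmpeg -i in.wav -ar 16000 -ac 1 out.wav`, and then passed to `main.py`.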