Transcribe audio and add subtitles to videos using Whisper in ComfyUI. Supports multiple languages, prompt guidance, and multiple Whisper models.
If you like my projects and wish to see updates and new features, please consider supporting me. It helps a lot!
Install via ComfyUI Manager
Load this workflow into ComfyUI
Models are auto-downloaded to /ComfyUI/models/stt/whisper
'tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large', 'large-v3-turbo', 'turbo'
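As a rough guide to choosing from the list above: in Whisper, the names ending in `.en` are English-only checkpoints, and the rest are multilingual. The sketch below copies the names from the list. The helper functions `is_english_only` and `models_for_language` are hypothetical, written only for illustration, and are not part of this node pack.

```python
# Model names copied verbatim from the list above.
WHISPER_MODELS = [
    "tiny.en", "tiny", "base.en", "base", "small.en", "small",
    "medium.en", "medium", "large-v1", "large-v2", "large-v3",
    "large", "large-v3-turbo", "turbo",
]


def is_english_only(name: str) -> bool:
    """Return True for the English-only '.en' checkpoints (hypothetical helper)."""
    if name not in WHISPER_MODELS:
        raise ValueError(f"unknown Whisper model: {name!r}")
    return name.endswith(".en")


def models_for_language(language: str) -> list[str]:
    """List models usable for a language code (hypothetical helper).

    Every model can handle 'en'; other languages need a multilingual model.
    """
    return [
        m for m in WHISPER_MODELS
        if language == "en" or not m.endswith(".en")
    ]
```

For example, `models_for_language("de")` leaves out `tiny.en`, `base.en`, `small.en`, and `medium.en`.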
Transcribe audio and get timestamps for each segment and word.
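Segment timestamps are enough to build a standard subtitle file. The following is a minimal sketch, assuming each segment is a dict with `start` and `end` in seconds and a `text` string. That shape is an assumption made for illustration, and the node's actual output format may differ. `srt_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as SRT text.

    Assumed input shape (not confirmed by this README):
    [{"start": float, "end": float, "text": str}, ...]
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

The same approach would work for word-level timestamps by passing one entry per word instead of one per segment.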
Add subtitles to the video frames. You can specify the font family, font color, and x/y position.
Add subtitles in a word-cloud style on blank frames.
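Placing subtitles on frames comes down to mapping each frame index to the text that is active at that moment. Here is a minimal sketch of that lookup, assuming segments are dicts with `start` and `end` in seconds plus `text`, and that the frame rate is known. Both the data shape and the function name `subtitle_for_frame` are assumptions for illustration, not this node's actual implementation.

```python
def subtitle_for_frame(segments, frame_index: int, fps: float):
    """Return the subtitle text shown on a given frame, or None.

    Assumed segment shape: {"start": float, "end": float, "text": str}.
    A segment covers the half-open interval [start, end).
    """
    t = frame_index / fps  # time of this frame in seconds
    for seg in segments:
        if seg["start"] <= t < seg["end"]:
            return seg["text"].strip()
    return None
```

The returned text could then be drawn at the chosen x/y position with whichever imaging library is in use.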
- Merge #22 by @francislabountyjr: model patcher, support for more Whisper models, and support for the ComfyUI model directory
- Merge #18 by @qy8502 for Prompt Guidance support
- Support YRDZST Semibold Font
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)