@@ -4,6 +4,12 @@
 #include "audio_utils.hpp"
 #include "openvino/genai/whisper_pipeline.hpp"
 
+auto get_config_for_cache() {
+    ov::AnyMap config;
+    config.insert({ov::cache_dir("whisper_cache")});
+    return config;
+}
+
 int main(int argc, char* argv[]) try {
     if (argc < 3 || argc > 4) {
         throw std::runtime_error(std::string{"Usage: "} + argv[0] + " <MODEL_DIR> \"<WAV_FILE_PATH>\" <DEVICE>");
@@ -13,7 +19,14 @@ int main(int argc, char* argv[]) try {
     std::string wav_file_path = argv[2];
     std::string device = (argc == 4) ? argv[3] : "CPU"; // Default to CPU if no device is provided
 
-    ov::genai::WhisperPipeline pipeline(models_path, device);
+    ov::AnyMap ov_config;
+    if (device == "NPU" || device.find("GPU") != std::string::npos) { // need to handle cases like "GPU", "GPU.0" and "GPU.1"
+        // Cache compiled models on disk for GPU and NPU to save time on the
+        // next run. It's not beneficial for CPU.
Collaborator:

Why is it not beneficial for CPU?

@luke-lin-vmc (Contributor, Author), Sep 23, 2025:

  1. This comment is simply copied from the reference sample code.
  2. AFAIK, the CPU plugin's "compile" step is mostly graph rewrites and primitive selection. It typically takes milliseconds to a few hundred milliseconds, not the seconds to minutes seen on GPU/NPU.
  3. Most importantly, enabling model caching on CPU causes the Whisper pipeline to crash. This looks like a bug that needs further investigation, so model caching is currently enabled only on GPU and NPU to avoid the issue.
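
For context, a minimal timing sketch of the effect described above, using the same pipeline constructor and cache property as this PR. It assumes openvino_genai is installed and an exported Whisper model sits in "whisper-base" (the model path and device are placeholders, not part of the PR). The first run of the binary compiles the model and fills the cache; later runs should construct noticeably faster on GPU/NPU:

#include <chrono>
#include <iostream>

#include "openvino/genai/whisper_pipeline.hpp"

int main() {
    // Same cache property the PR sets: compiled blobs land in "whisper_cache".
    ov::AnyMap ov_config;
    ov_config.insert({ov::cache_dir("whisper_cache")});

    // Time pipeline construction; compare the first run (cold cache)
    // against later runs (warm cache) of this binary.
    const auto start = std::chrono::steady_clock::now();
    ov::genai::WhisperPipeline pipeline("whisper-base", "GPU", ov_config);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::cout << "Pipeline construction took " << elapsed.count() << " s\n";
}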

+        ov_config = get_config_for_cache();
+    }
+
+    ov::genai::WhisperPipeline pipeline(models_path, device, ov_config);
Comment on lines +22 to +29

Copilot AI, Oct 16, 2025:

The condition enables caching for GPU variants but misses NPU variants like "NPU.0", restricting caching contrary to the stated intent. Adjust it to also detect NPU substrings: if (device.find("GPU") != std::string::npos || device.find("NPU") != std::string::npos) { ... }.
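
A self-contained sketch of that suggestion; the predicate name wants_model_cache is hypothetical, introduced only to exercise the condition against suffixed device names:

#include <iostream>
#include <string>
#include <vector>

// Substring matching covers bare and suffixed device names alike
// ("GPU", "GPU.0", "GPU.1", "NPU", "NPU.0").
bool wants_model_cache(const std::string& device) {
    return device.find("GPU") != std::string::npos ||
           device.find("NPU") != std::string::npos;
}

int main() {
    const std::vector<std::string> devices = {"CPU", "GPU", "GPU.1", "NPU", "NPU.0"};
    for (const auto& device : devices) {
        std::cout << device << ": caching " << (wants_model_cache(device) ? "on" : "off") << "\n";
    }
}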

     ov::genai::WhisperGenerationConfig config = pipeline.get_generation_config();
     // 'task' and 'language' parameters are supported for multilingual models only
@@ -11,6 +11,10 @@ def read_wav(filepath):
     raw_speech, samplerate = librosa.load(filepath, sr=16000)
     return raw_speech.tolist()
 
+def get_config_for_cache():
+    config_cache = dict()
+    config_cache["CACHE_DIR"] = "whisper_cache"
+    return config_cache

 def main():
     parser = argparse.ArgumentParser()
@@ -19,7 +23,13 @@ def main():
     parser.add_argument("device", nargs="?", default="CPU", help="Device to run the model on (default: CPU)")
     args = parser.parse_args()
 
-    pipe = openvino_genai.WhisperPipeline(args.model_dir, args.device)
+    ov_config = dict()
+    if args.device == "NPU" or "GPU" in args.device:  # need to handle cases like "GPU", "GPU.0" and "GPU.1"
+        # Cache compiled models on disk for GPU and NPU to save time on the
+        # next run. It's not beneficial for CPU.
+        ov_config = get_config_for_cache()
+
Comment on lines +26 to +31

Copilot AI, Oct 16, 2025:

The condition handles GPU variants (e.g. "GPU.0") but will skip NPU variants such as "NPU.0", limiting caching despite the PR's goal of enabling it for NPU. Update the condition to also match suffixed NPU forms, e.g. if 'GPU' in args.device or args.device.startswith('NPU'):. Alternatively, use substring checks for both: if 'GPU' in args.device or 'NPU' in args.device:.
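
A quick, self-contained sketch of the substring variant (wants_model_cache is again a hypothetical name, not part of the PR):

# Substring checks cover bare and suffixed device names
# ("GPU", "GPU.0", "NPU", "NPU.0"); plain CPU stays uncached.
def wants_model_cache(device: str) -> bool:
    return "GPU" in device or "NPU" in device

for device in ("CPU", "GPU", "GPU.1", "NPU", "NPU.0"):
    print(device, "->", "caching on" if wants_model_cache(device) else "caching off")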

+    pipe = openvino_genai.WhisperPipeline(args.model_dir, args.device, **ov_config)

     config = pipe.get_generation_config()
     # 'task' and 'language' parameters are supported for multilingual models only