Enable model caching for Whisper pipeline on GPU and NPU #2759
Whisper sample code to enable model caching on GPU and NPU
This is a follow-up to #2751.
Sample code references:
https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/visual_language_chat/encrypted_model_vlm.py#L87
https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/cpp/text_generation/encrypted_model_causal_lm.cpp#L52
OPTIMIZE_SIZE and encryption are not included. The main performance concern for Whisper is pipeline speed. Since Whisper models are much smaller than LLMs, size optimization offers only marginal savings while potentially adding latency; model encryption can likewise introduce additional latency.
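
A minimal sketch of the caching pattern, assuming `WhisperPipeline` forwards plugin properties (such as `CACHE_DIR`) to the underlying device the same way the referenced LLM/VLM samples do. The model path, cache directory, and audio file below are illustrative placeholders, not names from this PR:

```python
import librosa
import openvino_genai

device = "GPU"  # or "NPU"

# CACHE_DIR enables the OpenVINO model cache: compiled blobs are written on
# the first run and reloaded on subsequent runs, shortening pipeline start-up.
pipe = openvino_genai.WhisperPipeline(
    "whisper-base",             # placeholder path to an exported Whisper model
    device,
    CACHE_DIR="whisper_cache",  # placeholder cache directory
)

raw_speech, _ = librosa.load("sample.wav", sr=16000)  # Whisper expects 16 kHz audio
print(pipe.generate(raw_speech))
```

On the second run with the same cache directory, the plugin can skip recompilation and load the cached blobs directly, which is where the GPU/NPU start-up savings come from.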