@RyanMetcalfeInt8
This PR adds the changes needed to support the whisper decoder via the NPUW stateful flow.

The whisper decoder behaves slightly differently from LLM decoders on NPU: its prefill logits are not already sliced. ORT GenAI, through which this pipeline is supported, assumes that they are.

Ref ticket: CVS-176474

Copilot AI left a comment

Pull Request Overview

This PR enables explicit slicing of prefill logits in the OVEP stateful flow when NPUW_SLICE_OUT is disabled, specifically to support whisper decoder behavior where prefill logits are not pre-sliced by NPU.

Key changes:

  • Added logic to detect when NPU logit slicing is required based on NPUW_SLICE_OUT property
  • Implemented GetTensor override in StatefulOVInferRequest to slice logits tensor when needed
  • Made GetTensor virtual in base OVInferRequest class to enable override
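
The key changes above can be illustrated with a minimal, self-contained sketch. It does not use the OpenVINO API: `Tensor`, `InferRequest`, `StatefulInferRequest`, `SliceLastPosition`, and the output name `"logits"` are hypothetical stand-ins, and it assumes that "slicing" means keeping only the last sequence position of a `[batch, seq_len, vocab]` tensor. The PR's actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor: row-major floats, shape [batch, seq_len, vocab].
struct Tensor {
    std::vector<std::size_t> shape;
    std::vector<float> data;
};

// Assumed slicing rule: keep only the last sequence position,
// i.e. [batch, seq_len, vocab] -> [batch, 1, vocab].
Tensor SliceLastPosition(const Tensor& logits) {
    assert(logits.shape.size() == 3 && logits.shape[1] > 0);
    const std::size_t batch = logits.shape[0];
    const std::size_t seq = logits.shape[1];
    const std::size_t vocab = logits.shape[2];
    assert(logits.data.size() == batch * seq * vocab);

    Tensor out{{batch, 1, vocab}, {}};
    out.data.reserve(batch * vocab);
    for (std::size_t b = 0; b < batch; ++b) {
        // Start of the last position's row for batch element b.
        const float* row = logits.data.data() + (b * seq + (seq - 1)) * vocab;
        out.data.insert(out.data.end(), row, row + vocab);
    }
    return out;
}

// Hypothetical base request: GetTensor is virtual so a subclass can post-process outputs.
class InferRequest {
public:
    virtual ~InferRequest() = default;
    virtual Tensor GetTensor(const std::string& name) { return outputs_.at(name); }
    void SetOutput(const std::string& name, Tensor t) { outputs_[name] = std::move(t); }

private:
    std::map<std::string, Tensor> outputs_;
};

// Hypothetical stateful request: slices logits itself when the device does not.
class StatefulInferRequest : public InferRequest {
public:
    // device_slices_output models a property such as NPUW_SLICE_OUT being enabled.
    explicit StatefulInferRequest(bool device_slices_output)
        : needs_slice_(!device_slices_output) {}

    Tensor GetTensor(const std::string& name) override {
        Tensor t = InferRequest::GetTensor(name);
        // Only multi-position logits (prefill) need slicing; others pass through unchanged.
        if (needs_slice_ && name == "logits" && t.shape.size() == 3 && t.shape[1] > 1) {
            return SliceLastPosition(t);
        }
        return t;
    }

private:
    bool needs_slice_;
};
```

Because `GetTensor` is virtual, a caller that holds only a reference to the base `InferRequest` still receives the sliced logits, so downstream code that assumes pre-sliced prefill output would not need to change.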

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ov_interface.h | Added virtual GetTensor method and NPU logits slice detection members to support stateful request overrides |
| ov_interface.cc | Implemented NPUW_SLICE_OUT property checking and logits slicing logic in GetTensor override |
