@harrywhoo
Contributor

Phi-4 was causing runtime errors in TVM because its partial_rotary_factor of 0.75 produced a mismatch between the expected and actual ext_factors dimensions. This PR changes the ext_factors buffer size from head_dim//2 to rotary_dim//2 so that it aligns with the RoPE implementation, which resolves TVMError: Assert fail for fused_rope_longrope_scaling_ext_factors_handle. Other models were tested to confirm that the change does not break existing support.
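
For readers who want to see the size mismatch concretely, here is a minimal sketch of the arithmetic in plain Python. It is not the TVM kernel code: the helper names `rotary_dim` and `ext_factors_len` are hypothetical, `head_dim = 128` is an illustrative value rather than one taken from this PR, and computing `rotary_dim` as `int(head_dim * partial_rotary_factor)` is an assumption about how the rotary width is derived.

```python
def rotary_dim(head_dim: int, partial_rotary_factor: float = 1.0) -> int:
    """Number of head dimensions that RoPE actually rotates (assumed formula)."""
    return int(head_dim * partial_rotary_factor)


def ext_factors_len(head_dim: int, partial_rotary_factor: float = 1.0) -> int:
    """Length of the ext_factors buffer after the fix: one factor per rotated pair."""
    return rotary_dim(head_dim, partial_rotary_factor) // 2


head_dim = 128                # illustrative value, not taken from the PR
partial_rotary_factor = 0.75  # value reported for Phi-4 in the description

old_len = head_dim // 2                                      # size before the fix
new_len = ext_factors_len(head_dim, partial_rotary_factor)   # size after the fix

print(f"old buffer length: {old_len}, new buffer length: {new_len}")
```

With a factor of 1.0 the two sizes coincide, which is consistent with models that use full rotary embeddings being unaffected by the change.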

tqchen and others added 6 commits June 17, 2025 16:28
MLC local ci setup. Also CI for Windows and macOS building,
which may take 90-100 mins.

Co-authored-by: Siyuan Feng <[email protected]>
- Revert "[CMake][MSVC] Disable permissive mode for MSVC builds (#16343)"
- Skip MSC tests
- Disable NNPack and TFLite
- Tweak CMAKE_CUDA_ARCHITECTURES
This PR updates the NVSHMEM-based NDArray allocation, which
was missed in the recent FFI refactor and thus fails to compile
when building tvm with NVSHMEM.
…tor models

- Change ext_factors buffer size from head_dim//2 to rotary_dim//2
- Fixes runtime error for models like Phi-4 with partial_rotary_factor < 1.0
- Resolves TVMError: Assert fail for fused_rope_longrope_scaling_ext_factors_handle

4 participants