[Eagle3] Add Qwen2 as verifier for Eagle3 speculation #98
Summary
This PR adds Qwen2 as a verifier for Eagle3 speculative decoding.
Changes
Add the SupportsEagle3 interface to Qwen2ForCausalLM
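As a rough illustration of what adopting such an interface involves, the sketch below uses a toy stand-in rather than the real vLLM classes. The protocol name `SupportsEagle3Sketch`, the toy classes, the two method names, and the layer-selection rule `(2, n // 2, n - 3)` are assumptions modeled on how other vLLM verifiers expose auxiliary hidden states. They are not copied from this PR's diff.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsEagle3Sketch(Protocol):
    """Hypothetical stand-in for the SupportsEagle3 interface."""

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None: ...

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]: ...


class ToyDecoder:
    """Toy decoder stack; only tracks layer count and the chosen aux layers."""

    def __init__(self, num_layers: int) -> None:
        self.layers = list(range(num_layers))
        self.aux_hidden_state_layers: tuple[int, ...] = ()


class ToyQwen2ForCausalLM:
    """Toy verifier model (not the real Qwen2ForCausalLM)."""

    def __init__(self, num_layers: int = 28) -> None:
        # 28 is an illustrative layer count, not read from a model config.
        self.model = ToyDecoder(num_layers)

    def set_aux_hidden_state_layers(self, layers: tuple[int, ...]) -> None:
        # Record which decoder layers should expose hidden states
        # to the Eagle3 speculator.
        self.model.aux_hidden_state_layers = layers

    def get_eagle3_aux_hidden_state_layers(self) -> tuple[int, ...]:
        # Assumed default: one early, one middle, and one late layer.
        n = len(self.model.layers)
        return (2, n // 2, n - 3)


verifier = ToyQwen2ForCausalLM()
default_layers = verifier.get_eagle3_aux_hidden_state_layers()
verifier.set_aux_hidden_state_layers(default_layers)
```

In this sketch the verifier only reports and records layer indices. Collecting the hidden states in the forward pass is left out.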
Testing
Test Configuration:
Verifier: Qwen/Qwen2-7B-Instruct
Speculator: nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized
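A configuration like this might be expressed as engine arguments along the following lines. This is a sketch only: the helper `build_speculative_config` is hypothetical, the value of `num_speculative_tokens` is an assumption (the PR does not state one), and the exact invocation used for testing is not shown in the PR.

```python
# Model identifiers taken from the test configuration above.
VERIFIER_MODEL = "Qwen/Qwen2-7B-Instruct"
SPECULATOR_MODEL = (
    "nm-testing/SpeculatorLlama3-1-8B-Eagle3-converted-0717-quantized"
)


def build_speculative_config(num_speculative_tokens: int = 3) -> dict:
    """Hypothetical helper: assemble an Eagle3 speculative-decoding config.

    The default of 3 speculative tokens is an assumed value.
    """
    return {
        "method": "eagle3",
        "model": SPECULATOR_MODEL,
        "num_speculative_tokens": num_speculative_tokens,
    }


# Keyword arguments that would be handed to the engine, for example
# vllm.LLM(**engine_kwargs); that call is assumed here and not executed.
engine_kwargs = {
    "model": VERIFIER_MODEL,
    "speculative_config": build_speculative_config(),
}
```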
Results:
Known Limitation
Qwen2 + vLLM v1 Engine: Currently fails during KV cache configuration (NotImplementedError in kv_cache_utils.py:1118). This is a separate vLLM v1 engine issue unrelated to the Eagle3 implementation. The Eagle3 integration itself works correctly, as evidenced by successful model loading, compilation, and architecture recognition.
For comparison, Qwen3 + Eagle3 works fully in the v1 engine, indicating this is a model-specific v1 engine limitation rather than an Eagle3 interface issue.
Files Modified
vllm/model_executor/models/qwen2.py