speculative decoding support #25

natehofmann · 2025-03-27T22:03:41Z

Builds off of liangjason87:liangjason/spec_decoding, but with a rough Rust layer. Haven't tried it yet; things are still all over the place. Keeping to the simple, non-streaming, non-LoRA case.

mmoskal · 2025-03-28T17:21:53Z

I think the spec-decoding logic should sit in async_exec.rs and not be visible outside much.

- disable logits_processor on the draft model for now
- spin up two threads for responses from trtllm (you already do that)
- in ReqData, keep the current draft and real req ids
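A minimal sketch of the id bookkeeping suggested above, assuming the draft and real (target) executors hand out request ids independently, so the same id can be live in both at once. `ReqTable`, `Source`, `set_inflight`, `take_response`, and the `u64` id types are hypothetical names for illustration; only `ReqData` comes from the discussion, and its real fields may differ.

```rust
use std::collections::HashMap;

/// Hypothetical id type; the real executor's request id type may differ.
type ReqId = u64;

/// Which executor a request was sent to / a response came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Draft,
    Target,
}

/// Per-user-request state: the currently in-flight draft and target request ids.
#[derive(Debug, Default)]
struct ReqData {
    draft_req_id: Option<ReqId>,
    target_req_id: Option<ReqId>,
}

/// Routes executor responses back to the user request that owns them.
#[derive(Default)]
struct ReqTable {
    reqs: HashMap<u64, ReqData>,    // client id -> per-request state
    by_draft: HashMap<ReqId, u64>,  // draft req id -> client id
    by_target: HashMap<ReqId, u64>, // target req id -> client id
}

impl ReqTable {
    /// Records that `req_id` was just enqueued in the given executor.
    fn set_inflight(&mut self, client_id: u64, source: Source, req_id: ReqId) {
        let data = self.reqs.entry(client_id).or_default();
        match source {
            Source::Draft => {
                data.draft_req_id = Some(req_id);
                self.by_draft.insert(req_id, client_id);
            }
            Source::Target => {
                data.target_req_id = Some(req_id);
                self.by_target.insert(req_id, client_id);
            }
        }
    }

    /// Called by a responder thread: finds the owning client and clears the id.
    fn take_response(&mut self, source: Source, req_id: ReqId) -> Option<u64> {
        let client_id = match source {
            Source::Draft => self.by_draft.remove(&req_id)?,
            Source::Target => self.by_target.remove(&req_id)?,
        };
        let data = self.reqs.get_mut(&client_id)?;
        match source {
            Source::Draft => data.draft_req_id = None,
            Source::Target => data.target_req_id = None,
        }
        Some(client_id)
    }
}
```

Keeping one lookup map per executor means a draft response can never be mistaken for a target response that happens to carry the same id.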

When a request from the user is added (assume n_draft_tokens=5):

1. enqueue the request in the draft executor, with max_tokens=5
2. when the draft responder thread gets the response (all 5 tokens), validate the tokens against the constraint (may ignore for now), and enqueue a new request in the real executor (max_tokens=1, though maybe that's implicit)
3. when the real responder thread gets the response (I guess it's going to be up to 6 tokens), send StepResults back to the user and enqueue a new request in the draft executor (if the request is not done)

This way you don't have to mess with completions.rs much.
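The draft/real handoff described above can be sketched as a plain synchronous loop. This is only an illustration under stated assumptions: it uses greedy verification, a hypothetical `Model` trait, and a toy model instead of the trtllm executors, and it checks draft tokens one by one where a real target engine would verify them in a single request. None of the names below (`Model`, `Toy`, `draft_step`, `target_step`, `generate`) are from the PR.

```rust
/// Hypothetical token type.
type Token = u32;

/// Stand-in for an executor; the real ones are asynchronous.
trait Model {
    /// Greedy next token given all tokens so far.
    fn next_token(&self, ctx: &[Token]) -> Token;
}

/// Toy model (assumption): counts upward, wrapping to 0 before reaching `limit`.
struct Toy {
    limit: Token,
}

impl Model for Toy {
    fn next_token(&self, ctx: &[Token]) -> Token {
        let last = ctx.last().copied().unwrap_or(0);
        if last + 1 >= self.limit { 0 } else { last + 1 }
    }
}

/// Draft phase: propose exactly `n_draft` tokens (max_tokens = n_draft).
fn draft_step<M: Model>(draft: &M, ctx: &[Token], n_draft: usize) -> Vec<Token> {
    let mut ctx = ctx.to_vec();
    let start = ctx.len();
    for _ in 0..n_draft {
        let t = draft.next_token(&ctx);
        ctx.push(t);
    }
    ctx.split_off(start)
}

/// Target phase: keep the target's tokens while they agree with the draft.
/// Stops at the first disagreement (keeping the target's token), or adds one
/// extra token if everything matched, so it returns 1..=proposed.len()+1 tokens.
fn target_step<M: Model>(target: &M, ctx: &[Token], proposed: &[Token]) -> Vec<Token> {
    let mut ctx = ctx.to_vec();
    let start = ctx.len();
    for &p in proposed {
        let t = target.next_token(&ctx);
        ctx.push(t);
        if t != p {
            return ctx.split_off(start);
        }
    }
    let bonus = target.next_token(&ctx);
    ctx.push(bonus);
    ctx.split_off(start)
}

/// Alternates draft and target steps until `eos` or `max_tokens` is reached.
fn generate<D: Model, T: Model>(
    draft: &D,
    target: &T,
    prompt: &[Token],
    n_draft: usize,
    max_tokens: usize,
    eos: Token,
) -> Vec<Token> {
    let mut ctx = prompt.to_vec();
    let mut out = Vec::new();
    let mut done = false;
    while !done && out.len() < max_tokens {
        let proposed = draft_step(draft, &ctx, n_draft);
        // In the real flow this is where results would go back to the user.
        for t in target_step(target, &ctx, &proposed) {
            ctx.push(t);
            out.push(t);
            if t == eos || out.len() == max_tokens {
                done = true;
                break;
            }
        }
    }
    out
}
```

With n_draft_tokens=5, each target step yields between 1 and 6 tokens, which matches the "up to 6 tokens" estimate above. Under greedy verification the output is the same as running the target model alone; only the number of target steps changes.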

natehofmann and others added 12 commits March 25, 2025 21:57

init

9fc8900

more thoughts

82eb609

dubm async setup

73957cf

add structs and logic to pass draft params in target request

51c8e58

rough loop

3ab4ca0

Merge remote-tracking branch 'jason-llgtrt/liangjason/spec_decoding' into nathof/spec-decode-support

ed0ecec

small changes

ba9e10b

n_draft_tokens from cli

b1c8ff3

rought draft rust changes to be able to pass draft params and pull generation logits

81efe3d

draft engine start upscript, account for less tokens

94426a9

delete comment

a57245c

Merge branch 'liangjason/spec_decode_rust_objs' of github.com:natehofmann/llgtrt into nathof/spec-decode-support

4a4ea71

liangjason87 and others added 17 commits March 28, 2025 16:43

account for data types when grabbing bytes

a816bf6

some fixes

297c6f8

Merge branch 'nathof/spec-decode-support' of github.com:natehofmann/llgtrt into nathof/spec-decode-support

aa40ee6

come more compile fixes

461ee6e

more build fixes

6d9217d

req_info var loop handling

6136f10

edge case handling

78604b0

rm extraneous letter

24cb241

address compile errors

5081ac9

mising attribute variable

f12622c

fix var

5584bd4

wrong attribute

ce4bdc2

Merge branch 'nathof/spec-decode-support' of https://github.com/natehofmann/llgtrt into nathof/spec-decode-support

b07274f

true should be false

d61448f

add null check for generation logits

5789acf

check for any logits from trtllm

5803234

syntax errors

3136178

liangjason87 and others added 23 commits April 3, 2025 21:14

make draft nonstreaming and check dims before using logits

72e212f

enable kv cache reuse for draft exec config, account for other finish conditions in spec decode loop

687037e

wrong name

fcaf43c

uncommented start up code, type

854d5ae

fix stop reason

700a512

logging

d36b543

completion token fix

d4c9f0b

redo completion token count

8e3484b

mut some

2e1aeae

wrong some

5700d0b

as ref mut

59b097b

working stop cond

2506775

acc rate support

a6d2c86

rechange default

d4a4153

Merge branch 'main' of https://github.com/guidance-ai/llgtrt into nathof/spec-decode-support

2dde001

main repo merge, up trtllm to .18

005767a

up submodules

a49321c

Revert "acc rate support"

3c2e005

This reverts commit a6d2c86.

Revert "up submodules"

639a7de

This reverts commit a49321c.

up submodule versions

fb8d001

disable logits processor for target

1f5118d

acc rate support

b69990f

working state

4dc658a

natehofmann force-pushed the nathof/spec-decode-support branch from 265cdd4 to 4dc658a on April 11, 2025 00:08

liangjason87 and others added 6 commits April 11, 2025 03:10

Add logits logging, acceptance logs, runtime config support

8bfe391

logging, preserve working state

f1be245

must be mut

752a7fc

Add logs, check mpi for draft, change startup script to work with draft engine

485273e

working state

a95315a

merge from main

2f1c950
