This repo defines an interface for integrating structured output (constrained decoding) engines into LLM inference systems. It specifies the layout of an object that is provided by a structured output engine and then passed to the inference system.
The core of the interface is a C struct cbison_factory that contains
(in addition to version and magic numbers, etc.) the following function pointers:
- validate_grammar, taking type and text of a grammar, and returning a boolean and diagnostics
- new_matcher, also taking type and text of a grammar, and returning a pointer to a matcher object
Methods on matcher objects are also defined as function pointers in cbison_factory and include:
- state accessors: get_error, is_accepting, is_stopped
- compute_mask, which returns a bitmask corresponding to allowed tokens in the current state of the matcher
- consume_tokens, advancing the state of the matcher
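
The way an inference system might drive these methods can be sketched as a decoding loop. This is a minimal pure-Python illustration, not the cbison API: `ToyMatcher`, `constrained_decode`, and the representation of the mask as a Python integer (bit `i` set means token `i` is allowed) are all assumptions made for the example.

```python
class ToyMatcher:
    """Hypothetical matcher that accepts exactly one token sequence."""

    def __init__(self, expected):
        self.expected = list(expected)  # token IDs the toy grammar forces
        self.pos = 0                    # number of tokens consumed so far
        self.error = None

    def get_error(self):
        return self.error

    def is_accepting(self):
        return self.pos == len(self.expected)

    def is_stopped(self):
        # Stopped once the sequence is complete or an error occurred.
        return self.error is not None or self.is_accepting()

    def compute_mask(self):
        # Assumption: bit i is set when token i is allowed in this state.
        if self.is_stopped():
            return 0
        return 1 << self.expected[self.pos]

    def consume_tokens(self, tokens):
        for tok in tokens:
            if not (self.compute_mask() >> tok) & 1:
                self.error = f"token {tok} not allowed at position {self.pos}"
                return False
            self.pos += 1
        return True


def constrained_decode(matcher, logits_per_step):
    """Pick the highest-scoring allowed token at each step."""
    output = []
    for logits in logits_per_step:
        if matcher.is_stopped():
            break
        mask = matcher.compute_mask()
        allowed = [t for t in range(len(logits)) if (mask >> t) & 1]
        best = max(allowed, key=lambda t: logits[t])
        matcher.consume_tokens([best])
        output.append(best)
    return output


matcher = ToyMatcher(expected=[2, 0, 3])
# Without the mask, token 1 would win at every step.
logits = [[0.1, 0.9, 0.3, 0.2]] * 3
result = constrained_decode(matcher, logits)
print(result)
```

The mask is applied before token selection, so the sampler never proposes a token the matcher would reject, and `consume_tokens` only has to advance the state.
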
The following matcher methods are optional:
- validate_tokens, checking if (one or more) tokens would be accepted in sequence
- compute_ff_tokens, returning any fast-forward tokens forced by the matcher
- rollback, which is the inverse of consume_tokens
- reset, which resets the matcher to the initial state
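
To make the relationship between these optional methods concrete, here is a small pure-Python sketch. Everything in it is hypothetical: `ToyMatcher` and `try_rollback` are illustration names, `validate_tokens` is assumed to return the number of leading tokens that would be accepted, and an absent optional method is modeled as a missing attribute.

```python
class ToyMatcher:
    """Hypothetical matcher over one fixed token sequence."""

    def __init__(self, expected):
        self.expected = list(expected)
        self.pos = 0  # number of tokens consumed so far

    def validate_tokens(self, tokens):
        # Count how many leading tokens would be accepted in sequence.
        # The matcher state is left unchanged.
        n = 0
        for tok in tokens:
            idx = self.pos + n
            if idx >= len(self.expected) or self.expected[idx] != tok:
                break
            n += 1
        return n

    def consume_tokens(self, tokens):
        tokens = list(tokens)
        if self.validate_tokens(tokens) != len(tokens):
            return False
        self.pos += len(tokens)
        return True

    def compute_ff_tokens(self):
        # In this toy grammar every remaining token is forced.
        return self.expected[self.pos:]

    def rollback(self, num_tokens):
        # Inverse of consume_tokens: undo the last num_tokens tokens.
        if num_tokens > self.pos:
            raise ValueError("cannot roll back past the initial state")
        self.pos -= num_tokens

    def reset(self):
        self.pos = 0


def try_rollback(matcher, num_tokens):
    # Optional methods may be missing, so check before calling.
    fn = getattr(matcher, "rollback", None)
    if fn is None:
        return False
    fn(num_tokens)
    return True


m = ToyMatcher([5, 6, 7])
m.consume_tokens([5, 6])
ff = m.compute_ff_tokens()           # tokens forced after [5, 6]
accepted = m.validate_tokens([7, 8])  # only the first token fits
rolled = try_rollback(m, 2)           # back to the initial state
```

A caller that finds `rollback` unavailable would need a fallback, for example creating a fresh matcher and replaying the tokens it wants to keep.
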
Additionally, the factory has an optional method compute_masks which
returns token bitmasks for several matchers in parallel.
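
As a rough illustration of what a batched mask call provides, the sketch below computes masks for several matchers through a thread pool. The names `ToyMatcher` and the Python-level `compute_masks` helper are assumptions for this example, as is the integer mask representation; a native engine would do the parallel work itself, and pure-Python threads do not give real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyMatcher:
    """Hypothetical matcher with a fixed set of allowed token IDs."""

    def __init__(self, allowed):
        self.allowed = set(allowed)

    def compute_mask(self):
        # Assumption: bit i is set when token i is allowed.
        mask = 0
        for tok in self.allowed:
            mask |= 1 << tok
        return mask


def compute_masks(matchers, max_workers=4):
    # One mask per matcher, returned in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda m: m.compute_mask(), matchers))


batch = [ToyMatcher([0, 2]), ToyMatcher([1]), ToyMatcher([])]
masks = compute_masks(batch)
```

Keeping the output order aligned with the input order lets the inference system apply each mask to the matching sequence in its batch.
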
The C++ cbison::Factory class wraps an existing cbison_factory and provides a C++ interface.
The Python class cbison.CbisonFactory uses ctypes to wrap the C interface.
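
The general technique of wrapping a struct of function pointers with ctypes can be sketched as follows. This is not the actual cbison layout or the `cbison.CbisonFactory` implementation: the field order, the `0x1234` magic value, the callback signature (an `int` result with no diagnostics), and the names `ToyFactory` and `ToyFactoryWrapper` are all invented for the example. The callback is implemented in Python so that the sketch runs without a native library.

```python
import ctypes

# Hypothetical signature: int validate(const char *type, const char *text)
VALIDATE_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_char_p, ctypes.c_char_p)

TOY_MAGIC = 0x1234  # made-up value, used only to sanity-check the struct


class ToyFactory(ctypes.Structure):
    # Invented layout: header fields followed by one function pointer.
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("version", ctypes.c_uint32),
        ("validate_grammar", VALIDATE_FN),
    ]


def _validate(grammar_type, grammar_text):
    # Toy rule: only non-empty "regex" grammars are considered valid.
    return int(grammar_type == b"regex" and len(grammar_text) > 0)


# Keep a reference to the callback object so it is not garbage collected.
_validate_cb = VALIDATE_FN(_validate)
factory = ToyFactory(magic=TOY_MAGIC, version=1, validate_grammar=_validate_cb)


class ToyFactoryWrapper:
    """Python-friendly facade over the struct of function pointers."""

    def __init__(self, struct):
        if struct.magic != TOY_MAGIC:
            raise ValueError("unexpected magic number")
        self._struct = struct

    def validate_grammar(self, grammar_type, grammar_text):
        # Encode str arguments to bytes for the C-style call.
        result = self._struct.validate_grammar(
            grammar_type.encode("utf-8"), grammar_text.encode("utf-8")
        )
        return bool(result)


wrapper = ToyFactoryWrapper(factory)
ok = wrapper.validate_grammar("regex", "[a-z]+")
```

Checking the magic number before calling any function pointer is a cheap guard against being handed a pointer to the wrong kind of object.
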
A grammar engine constructs a cbison_factory given a cbison_tokenizer,
which defines the following methods:
- get_token, which given a numeric token ID returns the corresponding sequence of bytes
- is_special_token, which given a token ID returns true if the token is special (e.g., <|endoftext|>, <think>, etc.)
- tokenize_bytes, which takes a sequence of bytes and returns a list of token IDs (this is required to correctly compute "fast-forward" tokens based on "fast-forward" bytes)
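
The role of these three methods, and in particular why `tokenize_bytes` is needed to turn fast-forward bytes into fast-forward tokens, can be sketched with a toy tokenizer. `ToyTokenizer`, its tiny vocabulary, and the greedy longest-match strategy are assumptions for illustration; real tokenizers (BPE and others) use their own algorithms.

```python
class ToyTokenizer:
    """Hypothetical tokenizer with a tiny byte-level vocabulary."""

    def __init__(self, vocab, special_ids):
        self.vocab = list(vocab)            # token ID -> bytes
        self.special_ids = set(special_ids)

    def get_token(self, token_id):
        return self.vocab[token_id]

    def is_special_token(self, token_id):
        return token_id in self.special_ids

    def tokenize_bytes(self, data):
        # Greedy longest match over non-special tokens (toy strategy).
        out = []
        i = 0
        while i < len(data):
            best_id, best_len = None, 0
            for tid, tok in enumerate(self.vocab):
                if tid in self.special_ids:
                    continue
                if len(tok) > best_len and data.startswith(tok, i):
                    best_id, best_len = tid, len(tok)
            if best_id is None:
                raise ValueError(f"cannot tokenize byte at offset {i}")
            out.append(best_id)
            i += best_len
        return out


tok = ToyTokenizer(
    vocab=[b"<|endoftext|>", b"{", b'"', b"name", b'":', b"n", b"a", b"m", b"e"],
    special_ids=[0],
)

# Suppose the grammar forces these bytes next (fast-forward bytes).
ff_bytes = b'{"name":'
ff_tokens = tok.tokenize_bytes(ff_bytes)
```

The forced bytes could also be covered by the single-letter tokens, so the grammar engine relies on the tokenizer to say which token sequence the model would actually expect for them.
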
The C++ cbison::Tokenizer class wraps an existing cbison_tokenizer and provides a C++ interface.
The Python class cbison.CbisonTokenizer uses ctypes to wrap the C interface.
Separately, cbison::CppTokenizer makes it easier to implement a cbison_tokenizer in C++.