@stickies-v

In the current experimental phase of bitcoinkernel, we don't offer any versioning, interface guarantees, or backwards compatibility. In practice, this means that any software with a dependency on bitcoinkernel cannot use semver for its dependency management, but needs to specify (and ideally/usually bundle) the exact version it was built for.

Initially, I was looking at adding a `kernel_Version* kernel_get_version()` function to expose version and commit information, allowing downstream to provide users and build systems with better feedback when they're using a wrong bitcoinkernel version. Because this information is mostly useful at install or compile time, rather than runtime, I pivoted towards the approach in this PR, adding the commit sha to bitcoinkernel.pc.

This PR makes changes such as stickies-v/py-bitcoinkernel@772f562 very straightforward to implement.

Note: I'm not very experienced with CMake. This is the cleanest, most minimal diff I could come up with to achieve this goal, but I'm not confident that there aren't better alternatives.

TheCharlatan and others added 23 commits January 17, 2025 13:34
As a first step, implement the equivalent of what was implemented in the
now deprecated libbitcoinconsensus header. Also add a test binary to
exercise the header and library.

Unlike the deprecated libbitcoinconsensus the kernel library can now use
the hardware-accelerated sha256 implementations thanks to its
statically-initialized context. The functions kept around for
backwards-compatibility in the libbitcoinconsensus header are not ported
over. As a new header, it should not be burdened by previous
implementations. Also add a new error code for handling invalid flag
combinations, which would otherwise cause a crash.

The macros used in the new C header were adapted from the libsecp256k1
header.

To make use of the C header from C++ code, a C++ header is also
introduced for wrapping the C header. This makes it safer and easier to
use from C++ code.

Exposing logging in the kernel library allows users to follow what is
going on when using it. Users of the C header can use
`kernel_logging_connection_create(...)` to pass a callback function to
Bitcoin Core's internal logger. Additionally the level and severity can
be globally configured.

By default, the logger buffers messages until
`kernel_logging_connection_create(...)` is called. If the user does not
want any logging messages, it is recommended that
`kernel_disable_logging()` is called, which permanently disables the
logging and any buffering of messages.
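
The buffering behaviour described above can be sketched with a self-contained mock. Everything prefixed `demo_` is hypothetical and is not part of the kernel header. The sketch also assumes that buffered messages are replayed to the callback once it is connected, which the commit message implies but does not state.

```c
#include <stdio.h>

#define DEMO_MAX_BUFFERED 8
#define DEMO_MAX_LEN 64

/* Hypothetical callback type: receives the user's pointer and one message. */
typedef void (*demo_log_callback)(void* user_data, const char* message);

typedef struct {
    char buffered[DEMO_MAX_BUFFERED][DEMO_MAX_LEN];
    int buffered_count;
    int disabled;
    demo_log_callback callback;
    void* user_data;
} demo_logger;

/* Forward to the callback if one is connected, otherwise buffer the message. */
void demo_logger_log(demo_logger* logger, const char* message)
{
    if (logger->disabled) return;
    if (logger->callback) {
        logger->callback(logger->user_data, message);
        return;
    }
    if (logger->buffered_count < DEMO_MAX_BUFFERED) {
        snprintf(logger->buffered[logger->buffered_count], DEMO_MAX_LEN, "%s", message);
        logger->buffered_count++;
    }
}

/* Connect a callback and replay (assumption) everything buffered so far. */
void demo_logger_connect(demo_logger* logger, demo_log_callback callback, void* user_data)
{
    logger->callback = callback;
    logger->user_data = user_data;
    for (int i = 0; i < logger->buffered_count; i++) {
        callback(user_data, logger->buffered[i]);
    }
    logger->buffered_count = 0;
}

/* Disable logging and drop anything that was buffered. */
void demo_logger_disable(demo_logger* logger)
{
    logger->disabled = 1;
    logger->buffered_count = 0;
}

/* Example callback: counts messages through the user_data pointer. */
void demo_count_messages(void* user_data, const char* message)
{
    (void)message;
    (*(int*)user_data)++;
}
```

A user who never wants log output would call the disable function first, so that no memory is spent on buffering.
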
The context introduced here holds the objects that will be required for
running validation tasks, such as the chosen chain parameters, callbacks
for validation events, and an interrupt utility. These will be used in a
few commits, once the chainstate manager is introduced.

This commit also introduces conventions for defining option objects. A
common pattern throughout the C header will be:
```
options = object_option_create();
object = object_create(options);
```
This allows for more consistent usage of a "builder pattern" for
objects where options can be configured independently from
instantiation.
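
As a rough illustration of this convention, the sketch below builds a hypothetical `demo_widget` from a separately configured options object. None of these names exist in the kernel header. Copying the option values at creation time, so that the options may be destroyed first, is an assumption made for the example.

```c
#include <stdlib.h>

/* Hypothetical options object, configured before the real object exists. */
typedef struct {
    int worker_threads;
} demo_widget_options;

/* Hypothetical object that is instantiated from the options. */
typedef struct {
    int worker_threads;
} demo_widget;

demo_widget_options* demo_widget_options_create(void)
{
    demo_widget_options* options = calloc(1, sizeof *options);
    if (options) options->worker_threads = 1; /* default value */
    return options;
}

void demo_widget_options_set_worker_threads(demo_widget_options* options, int worker_threads)
{
    options->worker_threads = worker_threads;
}

void demo_widget_options_destroy(demo_widget_options* options)
{
    free(options);
}

/* Creation copies the option values (assumption for this sketch). */
demo_widget* demo_widget_create(const demo_widget_options* options)
{
    demo_widget* widget = calloc(1, sizeof *widget);
    if (widget) widget->worker_threads = options->worker_threads;
    return widget;
}

void demo_widget_destroy(demo_widget* widget)
{
    free(widget);
}
```

Keeping configuration in its own object means new settings can be added as extra `*_set_*` functions without changing the signature of the create function.
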
As a first option, add the chainparams. For now these can only be
instantiated with default values. In future they may be expanded to take
their own options for regtest and signet configurations.

This commit also introduces a unique pattern for setting the option
values when calling the `*_set(...)` function.

The notifications are used for notifying on connected blocks and on
warning and fatal error conditions.

The user of the C header may define callbacks that get passed to the
internal notification object in the
`kernel_NotificationInterfaceCallbacks` struct. Each of the callbacks
takes a `user_data` argument that gets populated from the `user_data`
value in the struct. It can be used to recreate the structure containing
the callbacks on the user's side, or to give the callbacks additional
contextual information.
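
The `user_data` mechanism can be sketched as follows. The struct layout and all `demo_` names are hypothetical and only mirror the idea described above; the actual fields of `kernel_NotificationInterfaceCallbacks` are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical callback struct: one opaque pointer plus function pointers. */
typedef struct {
    void* user_data;
    void (*block_tip)(void* user_data, int height);
    void (*warning)(void* user_data, const char* message);
} demo_notification_callbacks;

/* State owned by the user and reached through user_data. */
typedef struct {
    int last_height;
    int warnings;
} demo_app_state;

void demo_on_block_tip(void* user_data, int height)
{
    demo_app_state* state = user_data;
    state->last_height = height;
}

void demo_on_warning(void* user_data, const char* message)
{
    demo_app_state* state = user_data;
    (void)message;
    state->warnings++;
}

/* Library side: every call passes the stored user_data back to the user. */
void demo_notify_block_tip(const demo_notification_callbacks* callbacks, int height)
{
    if (callbacks->block_tip) callbacks->block_tip(callbacks->user_data, height);
}

void demo_notify_warning(const demo_notification_callbacks* callbacks, const char* message)
{
    if (callbacks->warning) callbacks->warning(callbacks->user_data, message);
}
```

Because the library only stores and forwards the pointer, the user decides what it refers to and must keep it alive for as long as callbacks may fire.
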
This is the main driver class for anything validation related, so expose
it here.

Creating the chainstate manager and block manager options will currently
also trigger the creation of their respectively configured directories.

The chainstate manager and block manager options were not consolidated
into a single object, since the kernel might eventually introduce a
block manager object for the purposes of being a light-weight block
store reader.

The chainstate manager will associate with the context with which it was
created for the duration of its lifetime. It is only valid if that
context remains in memory too.

The tests now also create dedicated temporary directories. This is
similar to the behaviour in the existing unit test framework.

Re-use the same pattern used for the context options. This allows users
to set the number of threads used in the validation thread pool.

The `kernel_chainstate_manager_load_chainstate(...)` function is the
final step required to prepare the chainstate manager for future tasks.
Its main responsibility is loading the coins and block tree indexes.

Though its `context` argument is not strictly required, this was added to
ensure that the context remains in memory for this operation. This
pattern of a "dummy" context will be re-used for functions introduced in
later commits.

The chainstate load options will be populated over the next few commits.

The added function allows the user to process and validate a given block
with the chainstate manager. The `*_process_block(...)` function does some
preliminary checks on the block before passing it to
`ProcessNewBlock(...)`. These are similar to the checks in the
`submitblock()` rpc.

Richer processing of the block validation result will be made available
in the following commits through the validation interface.

The commit also adds a utility for serializing a `CBlock`
(`kernel_block_create()`) that may then be passed to the library for
processing.

The tests exercise the function for both mainnet and regtest. The
commit also adds the data of 206 regtest blocks (some blocks also
contain transactions).

Adds options for wiping the chainstate and block tree indexes to the
chainstate load options. In combination and once the
`*_import_blocks(...)` function is added in a later commit, this
triggers a reindex. For now, it just wipes the existing data.

This allows a user to run the kernel without creating on-disk files for
the block tree and chainstate indexes. This is potentially useful in
scenarios where the user needs to do some ephemeral validation
operations.

One specific use case is when linearizing the blocks on disk. The block
files store blocks out of order, so a program may utilize the library
and its header to read the blocks with one chainstate manager, and then
write them back in order, and without orphans, with another chainstate
manager. To save disk resources and if the indexes are not required once
done, it may be beneficial to keep the indexes in memory for the
chainstate manager that writes the blocks back again.

The `kernel_import_blocks` function is used either to trigger a reindex,
if the indexes were previously wiped through the chainstate load
options, or to import the block data of a single block file.

The behaviour of the import can be verified through the test logs.

Calling interrupt can halt long-running functions associated with
objects that were created through the passed-in context.

This adds the infrastructure required to process validation events. For
now the external validation interface only has support for the
`BlockChecked` callback, but support for the other internal validation
interface methods can be added in the future.

The validation interface follows an architecture for defining its
callbacks and ownership that is similar to the notifications.

The task runner is created internally with a context, which itself
internally creates a unique ValidationSignals object. When the user
creates a new chainstate manager the validation signals are internally
passed to the chainstate manager through the context.

The callbacks block any further validation execution when they are
called. It is up to the user to either multiplex them, or use them
otherwise in a multithreaded mechanism to make processing the validation
events non-blocking.
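
One way to keep such a callback short is to copy the event into a queue and return immediately, leaving the real work to a consumer. The sketch below shows only the queueing part. All `demo_` names and the event fields are hypothetical, and it deliberately omits synchronization: a real multithreaded consumer would need a mutex or a lock-free queue around these operations.

```c
#include <stddef.h>

#define DEMO_QUEUE_CAPACITY 16

/* Hypothetical copy of the data a callback wants to keep. */
typedef struct {
    int block_height;
    int valid;
} demo_block_checked_event;

/* Fixed-size ring buffer (not thread-safe; locking is omitted on purpose). */
typedef struct {
    demo_block_checked_event events[DEMO_QUEUE_CAPACITY];
    size_t head;
    size_t count;
} demo_event_queue;

/* Returns 1 on success, 0 if the queue is full. */
int demo_queue_push(demo_event_queue* queue, demo_block_checked_event event)
{
    if (queue->count == DEMO_QUEUE_CAPACITY) return 0;
    size_t tail = (queue->head + queue->count) % DEMO_QUEUE_CAPACITY;
    queue->events[tail] = event;
    queue->count++;
    return 1;
}

/* Returns 1 and fills *out if an event was available, 0 otherwise. */
int demo_queue_pop(demo_event_queue* queue, demo_block_checked_event* out)
{
    if (queue->count == 0) return 0;
    *out = queue->events[queue->head];
    queue->head = (queue->head + 1) % DEMO_QUEUE_CAPACITY;
    queue->count--;
    return 1;
}

/* Callback body: copy the event and return, so validation is not held up. */
void demo_on_block_checked(void* user_data, int block_height, int valid)
{
    demo_event_queue* queue = user_data;
    demo_block_checked_event event = {block_height, valid};
    (void)demo_queue_push(queue, event); /* a full queue drops the event */
}
```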

A validation interface can register for validation events with a
context. Internally the passed-in validation interface is registered with
the validation signals of a context.

The BlockChecked callback introduces a separate type for a non-owned
block. Since a library-internal object owns this data, the user needs to
be explicitly prevented from deleting it. In a later commit a utility
will be added to copy its data.

These allow for the interpretation of the data in a `BlockChecked`
validation interface callback. This is useful to get richer information
in case a block failed to validate.

This adds functions for copying serialized block data into a user-owned
variable-sized byte array.

Use it in the tests for verifying the implementation of the validation
interface's `BlockChecked` method.

This adds functions for reading a block from disk with a retrieved block
index entry. External services that wish to build their own index, or
analyze blocks can use this to retrieve block data.

The block index can now be traversed from the tip backwards. This is
guaranteed to work, since the chainstate maintains an internal block
tree index in memory and every block (besides the genesis) has an
ancestor.

The user can use this function to iterate through all blocks in the
chain (starting from the tip). Once the block index entry for the
genesis block is reached a nullptr is returned if the user attempts to
get the previous entry.
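
The traversal described above amounts to following "previous" pointers until a null pointer is returned. A minimal sketch with a hypothetical in-memory `demo_block_index` type, not the kernel's own block index type or accessor functions, could look like this:

```c
#include <stddef.h>

/* Hypothetical block index entry: each entry points at its ancestor. */
typedef struct demo_block_index {
    int height;
    const struct demo_block_index* prev; /* NULL for the genesis block */
} demo_block_index;

/* Returns the previous entry, or NULL when called on the genesis entry. */
const demo_block_index* demo_get_previous(const demo_block_index* entry)
{
    return entry->prev;
}

/* Walks from the tip back to genesis and counts the visited entries. */
int demo_count_chain(const demo_block_index* tip)
{
    int count = 0;
    for (const demo_block_index* entry = tip; entry != NULL; entry = demo_get_previous(entry)) {
        count++;
    }
    return count;
}
```
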
This adds functions for reading the undo data from disk with a retrieved
block index entry. The undo data of a block contains all the spent
script pubkeys of all the transactions in a block.

In normal operations undo data is used during re-orgs. This data might
also be useful for building external indexes, or to scan for silent
payment transactions.

Internally the block undo data contains a vector of transaction undo
data which contains a vector of the spent outputs. For this reason, the
`kernel_get_block_undo_size(...)` function is added to the header for
retrieving the size of the transaction undo data vector, as well as the
`kernel_get_transaction_undo_size(...)` function for retrieving the size
of each spent outputs vector contained within each transaction undo data
entry. With these two sizes the user can iterate through the undo data
by accessing the transaction outputs by their indices with
`kernel_get_undo_output_by_index`. If an invalid index is passed in, the
`kernel_ERROR_OUT_OF_BOUNDS` error is returned again.

The returned `kernel_TransactionOutput` is entirely owned by the user
and may be destroyed with the `kernel_transaction_output_destroy(...)`
convenience function.
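
The two-level iteration can be sketched with mock types. The `demo_` structs, functions and error codes below are hypothetical stand-ins and do not reproduce the signatures of the kernel functions named above. The sketch returns outputs by copying them into a caller-provided struct, which is a simplification of the user-owned object described in the commit message.

```c
#include <stddef.h>

typedef enum { DEMO_OK = 0, DEMO_ERROR_OUT_OF_BOUNDS } demo_error;

typedef struct { long long value; } demo_output;

/* Undo data of one transaction: the outputs it spent. */
typedef struct { const demo_output* spent; size_t spent_count; } demo_tx_undo;

/* Undo data of one block: one entry per transaction. */
typedef struct { const demo_tx_undo* txs; size_t tx_count; } demo_block_undo;

size_t demo_block_undo_size(const demo_block_undo* undo)
{
    return undo->tx_count;
}

/* Returns 0 for an invalid transaction index. */
size_t demo_tx_undo_size(const demo_block_undo* undo, size_t tx_index)
{
    return tx_index < undo->tx_count ? undo->txs[tx_index].spent_count : 0;
}

/* Copies the requested output into *out, or reports an out-of-bounds index. */
demo_error demo_undo_output_by_index(const demo_block_undo* undo, size_t tx_index,
                                     size_t output_index, demo_output* out)
{
    if (tx_index >= undo->tx_count) return DEMO_ERROR_OUT_OF_BOUNDS;
    if (output_index >= undo->txs[tx_index].spent_count) return DEMO_ERROR_OUT_OF_BOUNDS;
    *out = undo->txs[tx_index].spent[output_index];
    return DEMO_OK;
}

/* Uses the two sizes to visit every spent output and sum the values. */
long long demo_sum_spent(const demo_block_undo* undo)
{
    long long total = 0;
    for (size_t tx = 0; tx < demo_block_undo_size(undo); tx++) {
        for (size_t i = 0; i < demo_tx_undo_size(undo, tx); i++) {
            demo_output output;
            if (demo_undo_output_by_index(undo, tx, i, &output) == DEMO_OK) {
                total += output.value;
            }
        }
    }
    return total;
}
```
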
Adds further functions useful for traversing the block index and
retrieving block information.

This includes getting the block height and hash.

This is useful for a host block processing feature where having an
identifier for the block is needed. Without this, external users need to
serialize the block and calculate the hash externally, which is less
efficient.

This showcases a re-implementation of bitcoin-chainstate only using the
kernel C++ API header.

This allows future commits to use the get_git_info function
to access git commit info.

As long as the bitcoinkernel shared library does not offer any interface
guarantees, versioning, or backwards compatibility, exposing the
build git commit (if it exists) allows the user's build system to
do some kind of dependency management.
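
To illustrate what such a downstream check might look like, the sketch below extracts a variable from the text of a `.pc` file and compares it with the commit a binding was built against. The variable name `commit`, the `demo_` functions and the placeholder hashes are assumptions for illustration; the name actually written to bitcoinkernel.pc is not stated here. In practice a build system would more likely query the value with `pkg-config --variable=<name> bitcoinkernel` than parse the file itself.

```c
#include <stddef.h>
#include <string.h>

/* Finds a "name=value" line in pkg-config file text and copies the value
 * into out. Returns 1 on success, 0 if not found or out is too small. */
int demo_pc_variable(const char* pc_text, const char* name, char* out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char* line = pc_text;
    while (*line != '\0') {
        const char* end = strchr(line, '\n');
        size_t line_len = end ? (size_t)(end - line) : strlen(line);
        if (line_len > name_len && strncmp(line, name, name_len) == 0 && line[name_len] == '=') {
            size_t value_len = line_len - name_len - 1;
            if (value_len >= out_size) return 0;
            memcpy(out, line + name_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        if (!end) break;
        line = end + 1;
    }
    return 0;
}

/* Returns 1 only if the (hypothetical) "commit" variable equals the commit
 * the caller was built against. */
int demo_commit_matches(const char* pc_text, const char* expected_commit)
{
    char found[128];
    if (!demo_pc_variable(pc_text, "commit", found, sizeof found)) return 0;
    return strcmp(found, expected_commit) == 0;
}
```

A mismatch, or a missing variable when the library was not built from a git checkout, would then let the downstream build fail early with a clear message instead of breaking at runtime.
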
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 4 times, most recently from 39c2c5a to abffb6d Compare August 12, 2025 15:06
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 8 times, most recently from 3a34e00 to bce88ae Compare August 20, 2025 21:19
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 4 times, most recently from 89f9518 to 1857296 Compare September 5, 2025 21:13
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 2 times, most recently from e450549 to 0fc068b Compare September 15, 2025 15:11
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 2 times, most recently from 2ac9d60 to 21b0503 Compare September 24, 2025 09:56
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 2 times, most recently from f3ca1d6 to 64d1449 Compare October 4, 2025 16:03
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 4 times, most recently from 2a80cd9 to 81cec73 Compare October 9, 2025 16:37
@TheCharlatan TheCharlatan force-pushed the kernelApi branch 3 times, most recently from edf99b8 to 20be96e Compare October 22, 2025 11:30