Spike: direct tx @ executor adapter #2536
base: main
Conversation
* main:
  - fix: docs deployment (#2528)
  - chore: merge docs repo into this repo (#2526)
  - optimize: implement head index for BatchQueue.Next() to avoid O(n) slice re-slicing (#2519)
  - build(deps): Bump github.com/rs/zerolog from 1.33.0 to 1.34.0 (#2525)
  - chore: migrate from ipfs/go-log to zerolog across codebase (#2516)
  - fix: proto inclusion (#2515)
  - build(deps): Bump github.com/docker/docker from 28.2.2+incompatible to 28.3.3+incompatible in /test/docker-e2e in the go_modules group across 1 directory (#2503)
  - refactor: replace Split in loops with more efficient SplitSeq (#2502)
  - feat: expose ev-node P2P and RPC ports in docker compose (#2513)
  - feat: create grpc execution environemnt (#2490)
Summary of Changes
Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request represents a significant "spike" to explore and implement the concept of direct transactions within our system. My primary goal is to establish a robust pipeline for transactions submitted directly to the Data Availability (DA) layer, ensuring their eventual inclusion even if they bypass the standard mempool. This involves introducing new components for processing, queuing, and adapting the execution layer to handle these transactions, alongside mechanisms for forced inclusion and a fallback mode to enhance network resilience.
Highlights

- Introduction of Direct Transactions: I've introduced the concept of "direct transactions" (DirectTX), which are transactions submitted directly to the Data Availability (DA) layer, bypassing the traditional mempool. This allows for guaranteed inclusion under certain conditions.
- New Direct Transaction Processing Pipeline: A new `pkg/directtx` package has been created, housing core components such as the `DirectTxReaper` (to poll DA for transactions), `Extractor` (to deduplicate and process them), `Sink` (to manage a persistent queue), and `ExecutionAdapter` (to integrate direct transactions with the existing executor); a rough sketch of how these could fit together follows this list.
- Integration into Node Architecture: The `block.Manager` and `node.FullNode` components have been updated to integrate this new direct transaction processing. This includes handling direct transactions from DA blobs and adapting the execution flow.
- Forced Inclusion Mechanism: I've implemented logic for 'forced inclusion' of direct transactions, controlled by new configurable parameters (`min_da_delay` and `max_inclusion_delay`). This ensures that direct transactions are included within a specified time window, even if the sequencer is slow or unresponsive.
- Fallback Mode for Sequencer Failures: A 'fallback mode' has been introduced to handle scenarios where a sequencer fails to include a direct transaction within its designated window. This mechanism aims to ensure chain liveness and data integrity even in the face of sequencer misbehavior or downtime.
- New End-to-End Testing: Comprehensive end-to-end tests have been added to validate the forced inclusion of direct transactions, including a test case for the full node's behavior in fallback mode.
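To make the pipeline concrete, here is a rough Go sketch of how these components could relate. The interfaces and method signatures below are assumptions inferred from this summary, not the PR's actual definitions:

```go
package directtx // sketch only; the PR's real interfaces may differ

import (
	"context"
	"time"
)

// DirectTxReaper polls the DA layer for blobs in the direct-tx namespace.
type DirectTxReaper interface {
	Poll(ctx context.Context, daHeight uint64) ([][]byte, error)
}

// Extractor deduplicates and validates raw blobs into direct txs.
type Extractor interface {
	Extract(ctx context.Context, blobs [][]byte) ([][]byte, error)
}

// Sink persists pending direct txs until they are included in a block.
type Sink interface {
	Push(ctx context.Context, txs [][]byte, daHeight uint64, seenAt time.Time) error
	Pending(ctx context.Context, maxBytes uint64) ([][]byte, error)
}
```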
Code Review
This pull request introduces a mechanism for handling direct transactions submitted to the DA layer, which is a significant feature. It includes a new `directtx` package with components like a `Reaper`, `Extractor`, `Sink`, and an `ExecutionAdapter` to integrate with the existing block production flow. A fallback mode is also introduced for when the sequencer fails to include a direct transaction. My review focuses on the correctness and robustness of this new functionality. I've identified some critical issues in error handling and the fallback mechanism's design that could lead to network instability. Additionally, there are several areas where hardcoded values should be made configurable to improve flexibility and maintainability.
pkg/directtx/exec_adapter.go (Outdated)
```go
func (d *ExecutionAdapter) ExecuteTxs(ctx context.Context, txs [][]byte, blockHeight uint64, timestamp time.Time, prevStateRoot []byte) (updatedStateRoot []byte, maxBytes uint64, err error) {
	if d.directTXProvider.MarkTXIncluded(ctx, txs, blockHeight, timestamp) != nil {
		return nil, 0, err
	}
	if !IsInFallbackMode(ctx) && // bypass check in fallback mode
		d.directTXProvider.HasMissedDirectTX(ctx, blockHeight, timestamp) != nil {
		return nil, 0, err
	}
	return d.nested.ExecuteTxs(ctx, txs, blockHeight, timestamp, prevStateRoot)
}
```
There is a critical bug in the error handling within `ExecuteTxs`. When `d.directTXProvider.MarkTXIncluded` or `d.directTXProvider.HasMissedDirectTX` returns an error, the function returns `(nil, 0, err)`. Because `err` is a named return value that is `nil` at that point in the execution, the error is effectively swallowed and not propagated up the call stack. This can mask critical issues during transaction execution.
Suggested change:

```go
func (d *ExecutionAdapter) ExecuteTxs(ctx context.Context, txs [][]byte, blockHeight uint64, timestamp time.Time, prevStateRoot []byte) (updatedStateRoot []byte, maxBytes uint64, err error) {
	if err := d.directTXProvider.MarkTXIncluded(ctx, txs, blockHeight, timestamp); err != nil {
		return nil, 0, err
	}
	if !IsInFallbackMode(ctx) { // bypass check in fallback mode
		if err := d.directTXProvider.HasMissedDirectTX(ctx, blockHeight, timestamp); err != nil {
			return nil, 0, err
		}
	}
	return d.nested.ExecuteTxs(ctx, txs, blockHeight, timestamp, prevStateRoot)
}
```
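As a standalone illustration of the pitfall (not code from this PR), here is a minimal runnable Go program in which the named return value silently swallows the callee's error:

```go
package main

import (
	"errors"
	"fmt"
)

func failing() error { return errors.New("boom") }

// buggy mirrors the pattern under review: the result of failing() is only
// compared against nil, never assigned, so the named return err stays nil.
func buggy() (result int, err error) {
	if failing() != nil {
		return 0, err // err is still nil here: the error is swallowed
	}
	return 42, nil
}

func main() {
	_, err := buggy()
	fmt.Println(err) // prints <nil>, masking the failure
}
```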
```go
func (f *FallbackBlockProducer) CreateFallbackBlock(ctx context.Context) ([][]byte, error) {
	//todo: we need deterministic header and data
```
The `CreateFallbackBlock` function is intended to produce deterministic blocks when the sequencer is down. However, the `//todo` comment correctly identifies that the current implementation is not deterministic regarding the block header and data. If different full nodes enter fallback mode, they will likely produce different blocks, leading to a network fork. This is a critical flaw in the fallback mechanism that must be addressed to ensure chain consistency during a sequencer outage.
```go
if errors.Is(err, directtx.ErrDirectTXWindowMissed) {
	// the sequencer has missed to include a direct TX. Either by censoring or downtime
	m.fallbackMode = true
	return types.State{}, err
}
```
The `fallbackMode` flag is set to `true` when `ErrDirectTXWindowMissed` occurs, but there appears to be no mechanism to reset it to `false`. This means that once a node enters fallback mode, it could remain in that mode indefinitely, which could have serious implications for the node's behavior and performance. A strategy for exiting fallback mode should be implemented, whether it's automatic after a certain condition is met or requires manual intervention.
```go
var directTXSeq coresequencer.DirectTxSequencer
if nodeConfig.Node.Aggregator {
	// Aggregator nodes use the full DirectTxSequencer that can sequence direct transactions
	directTXSeq = single.NewDirectTxSequencer(sequencer, logger, datastore, 100, nodeConfig.ForcedInclusion) // todo (Alex): what is a good max value
```
```go
const ( // copied from da/jsonclient/internal
	defaultGovMaxSquareSize = 64
	defaultMaxBytes         = defaultGovMaxSquareSize * defaultGovMaxSquareSize * share.ContinuationSparseShareContentSize
)
```
The constants `defaultGovMaxSquareSize` and `defaultMaxBytes` are copied from another package. This can lead to inconsistencies if the original values in the source package change. To improve maintainability, these constants should be imported from a shared package, or the source package should be updated to export them.
```
@@ -118,7 +134,10 @@ func newFullNode(
		return nil, err
	}

	reaper := block.NewReaper(
	const maxBlockBytes = 1024 * 1024 // todo (Alex): what is a good default? make configurable
```
```go
txs, err := d.directTXProvider.GetPendingDirectTXs(ctx, d.daHeightSource.GetDAIncludedHeight(), d.maxBlockBytes)
if err != nil {
	return nil, err
}
return append(txs, otherTx...), nil
```
In `GetTxs`, direct transactions are fetched and appended to mempool transactions. This could lead to a situation where a large number of direct transactions fill up the entire block, effectively starving transactions from the mempool. The `// todo` comment also raises this concern. Consider implementing a strategy to reserve a portion of the block for mempool transactions, to ensure fairness and prevent one channel from dominating block space.
```
@@ -0,0 +1,19 @@
# direct TX

- What format can we expect for the direct TX on the DA? some wrapper type would be useful to unpack and distinct from random bytes
```
This is an open design question. We could either set up some sort of frontend like the Optimism portal (https://docs.optimism.io/app-developers/tutorials/transactions/send-tx-from-eth#trigger-the-transaction) that allows users to submit direct TXs, or bake this functionality into any full node, which could publish a set of txs to the namespace in some format deserializable by the sequencer.
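To make the open question concrete, here is one possible wrapper envelope for direct-tx blobs posted to the DA namespace; this is purely illustrative, not a proposed wire format:

```go
package directtx // illustrative placement

import "encoding/binary"

// DirectTxEnvelope is an invented wrapper for direct-tx blobs on DA. A magic
// prefix plus a version byte lets nodes cheaply skip unrelated blobs in the
// namespace before attempting a full decode.
type DirectTxEnvelope struct {
	Version uint8
	Txs     [][]byte // raw transactions, opaque until execution
}

var directTxMagic = []byte{0xD1, 0x4E, 0xC7, 0x00} // hypothetical marker

// Marshal encodes magic | version | (len | tx)... as length-prefixed frames.
func (e *DirectTxEnvelope) Marshal() []byte {
	buf := append([]byte{}, directTxMagic...)
	buf = append(buf, e.Version)
	for _, tx := range e.Txs {
		var l [4]byte
		binary.BigEndian.PutUint32(l[:], uint32(len(tx)))
		buf = append(buf, l[:]...)
		buf = append(buf, tx...)
	}
	return buf
}
```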
> - Should we always include direct TX although they may be duplicates to mempool TX? spike: yes, first one wins.
Yes, first one wins. Even if there's a duplicate, the duplicate would just fail automatically due to the usual tx replay protections in state machines, like nonces.
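For illustration, a minimal first-one-wins deduplication over tx hashes, roughly what the `Extractor` might do (hypothetical sketch):

```go
package directtx // illustrative placement

import "crypto/sha256"

// dedupFilter is a hypothetical first-one-wins filter: a tx seen earlier
// (from DA or the mempool) is dropped on every later sighting.
type dedupFilter struct {
	seen map[[32]byte]struct{}
}

func newDedupFilter() *dedupFilter {
	return &dedupFilter{seen: make(map[[32]byte]struct{})}
}

// Admit reports whether tx is new; the first submission wins.
func (f *dedupFilter) Admit(tx []byte) bool {
	h := sha256.Sum256(tx)
	if _, ok := f.seen[h]; ok {
		return false
	}
	f.seen[h] = struct{}{}
	return true
}
```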
> - Should we fill the block space with directTX if possible or reserve space for mempool TX
IIUC, this case is when there are a lot of direct txs and mempool txs at the same time, so there needs to be a decision on what to do here. We can limit the number of direct txs in each evolve block to some fixed number (which can be made dynamic later) that follows an inclusion window, so that the sequencer's mempool txs are still the ones included in an evolve block the fastest.
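A sketch of that fixed per-block cap; the limit of 8 is an arbitrary illustration:

```go
// Hypothetical per-block cap on direct txs: at most maxDirectPerBlock are
// drained per block, oldest first, so mempool txs keep most of the space.
const maxDirectPerBlock = 8 // illustrative value; could be made dynamic later

func takeDirectTxs(pending [][]byte) (included, deferred [][]byte) {
	if len(pending) <= maxDirectPerBlock {
		return pending, nil
	}
	return pending[:maxDirectPerBlock], pending[maxDirectPerBlock:]
}
```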
> - What should we do when a sequencer was not able to add a direct-TX within the time window?
A sequencer following the rules of the network should be able to add a direct tx within the time window. If there are too many direct txs to include in a time window, then everyone can see that, and those extra direct txs would just be ignored. The responsibility of telling a user whether their direct tx will be included, or has been included, can be passed on to the frontends.
> - Do we need to trace the TX source chanel? There is currently no way to find out if a sequencer adds a direct TX "to early". Can be a duplicate from mempool as well.
No, I don't think it's necessary to trace the source channel; maybe it's useful for the sequencer itself though.
The sequencer has preferred execution rights in an evolve block, so it can include whatever transactions it wants, really, since only valid txs will actually be executed. If the sequencer adds a direct tx X before it's even included in the DA layer, then every full node can see that X was included already, so that's perfectly valid. I do not see how this can be harmful.
> - What is a good max size for a sequencer/fallback block? Should this be configurable or genesis?
What do you mean by a good max size? The max size of an evolve block is equal to the max blob size acceptable by the DA layer (minus any fixed serialization overhead). I think this is already set to the DA max blob size in Evolve's current version, and it is only relevant to the sequencer. Maybe making it configurable is useful in the future, but I don't see how it's related to direct txs. Can you please elaborate?
> - Do we need to track the sequencer DA activity? When it goes down with no direct TX on DA there are no state changes. When it goes down and direct TX missed their time window, fullnodes switch to fallback mode anyway.
Yes. In order to decide when to enter fallback mode, full nodes need to assess the liveness of the sequencer, which depends on whether it is posting evolve blocks on the DA layer.
Gossiping evolve blocks over P2P shouldn't be considered the way to check liveness, since P2P is vulnerable to split-network attacks like eclipse attacks, and opting to listen over P2P is purely optional for a full node. So we should assume the case of a full node only listening to DA for evolve blocks.
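A minimal sketch of such a DA-based liveness check, assuming the node records the DA height of the last sequencer block it saw (all names hypothetical):

```go
// Hypothetical DA-based liveness tracker: the sequencer is considered live
// as long as one of its blocks appeared on DA within the last
// maxSilentDAHeights DA blocks.
type livenessTracker struct {
	lastSequencerDAHeight uint64
	maxSilentDAHeights    uint64 // assumed threshold, e.g. derived from max_inclusion_delay
}

func (t *livenessTracker) RecordSequencerBlock(daHeight uint64) {
	if daHeight > t.lastSequencerDAHeight {
		t.lastSequencerDAHeight = daHeight
	}
}

func (t *livenessTracker) SequencerLive(currentDAHeight uint64) bool {
	return currentDAHeight-t.lastSequencerDAHeight <= t.maxSilentDAHeights
}
```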
> - How do we restore the chain from recovery mode?
Assuming by recovery mode you mean fallback mode: there doesn't necessarily need to be a socially coordinated fork. Say the original sequencer was just experiencing downtime and the rest of the network is in fallback mode. When the original sequencer comes back online, it first needs to sync to the rest of the network (which is in fallback mode) to get to the tip of the chain, and it can then signal that it's back online by posting something to the DA layer. The full nodes can see this signal and exit fallback mode, and then wait for the original fallback-triggering period for the sequencer to start posting evolve blocks to the DA layer.
The alternative, if the original sequencer is forever down for some reason, like its signing keys being lost, is introducing a socially coordinated fork like you suggested.
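That handshake could be expressed as a tiny state machine; the signal format and all names below are invented for illustration:

```go
// Hypothetical fallback-recovery state machine. The sequencer posts a
// "back online" marker blob to DA after syncing; full nodes exit fallback
// on seeing it and give the sequencer one full window to resume blocks.
type nodeMode int

const (
	modeNormal nodeMode = iota
	modeFallback
	modeAwaitingResume // saw the back-online signal, sequencer has one window to produce
)

func nextMode(mode nodeMode, sawBackOnlineSignal, sawSequencerBlock, windowExpired bool) nodeMode {
	switch mode {
	case modeFallback:
		if sawBackOnlineSignal {
			return modeAwaitingResume
		}
	case modeAwaitingResume:
		if sawSequencerBlock {
			return modeNormal
		}
		if windowExpired {
			return modeFallback // sequencer failed to resume in time
		}
	}
	return mode
}
```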
🚧 Draft - WIP
Fallback mode behaviour for both components is not yet included.