Skip to content

Conversation

mohamedawnallah
Copy link
Contributor

@mohamedawnallah mohamedawnallah commented Oct 8, 2025

Change Description

The original regression caused by this changeset 16a8b623b.

Closes #10180.
Closes #9081.

Steps to Test

Steps for reviewers to follow to test the change.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

Copy link

Summary of Changes

Hello @mohamedawnallah, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR optimizes neutrino rescans by integrating the wallet's birthday into the rescan logic, significantly speeding up blockchain synchronization for newly created wallets. It modifies neutrinonotify and chainview components to accept and utilize a walletBirthday parameter, which is then passed as a neutrino.StartTime option to limit the rescan range. Associated test files are updated to reflect these changes.

Highlights

  • Intent: This pull request aims to optimize the neutrino rescan process by leveraging the wallet's creation date (birthday). By passing the wallet birthday to the rescan, the system can avoid scanning blocks that predate the wallet's existence, thereby fixing slow blockchain sync issues, especially when a wallet is created before the neutrino client has completed its header synchronization. This addresses the problem described in issue [bug]: wallet creation slows sync if run before neutrino headers complete #10180.
  • Changes: The core change involves propagating the wallet's birthday (time.Time) through various components that initiate neutrino rescans. This includes modifications to the neutrinonotify package for chain notifications and the chainview package for filtered chain views. The neutrino.StartTime rescan option is now utilized to limit the scan range based on the wallet's birthday. Test files have also been updated to accommodate the new walletBirthday parameter, typically setting it to the Unix epoch for full historical scanning in tests.
  • Reviewer Activity: No specific reviewer activity has been recorded or provided in the context for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an optimization for neutrino rescan by using the wallet's birthday. The changes are well-contained and the logic is sound. I've identified a few areas for improvement, mainly related to code formatting and documentation, to align with the repository's style guide.

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 0f26e5b to dae9086 Compare October 8, 2025 17:12
@saubyk saubyk added this to the v0.20.0 milestone Oct 8, 2025
@saubyk saubyk added this to lnd v0.20 Oct 8, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Oct 8, 2025
@Roasbeef
Copy link
Member

Roasbeef commented Oct 8, 2025

Can we get a brief recap in the OP re: what happened, was it a regression, why this fixes it?

Another follow up: if the regression was introduced in btcwallet, is that a better way to fix it?

How can we develop a regression test for this issue?

@saubyk saubyk moved this from In progress to In review in lnd v0.20 Oct 9, 2025
@saubyk saubyk moved this from In review to In progress in lnd v0.20 Oct 9, 2025
@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Oct 9, 2025

Hi @shocknet-justin, Would it be possible to try this potential patch if that resolves your issue (#10180)? I can try to write a reproducible test (admittedly not the easiest thing to do) but having +1/-1 that this changeset works/not works can be helpful

@hsjoberg
Copy link
Contributor

hsjoberg commented Oct 9, 2025

Thank you for looking into this one @mohamedawnallah. I believe I'm experiencing the same issue as @shocknet-justin.

I tried running your patches, but it seems like there's still weird behavior on first run.

2025-10-10 01:47:08.475 [INF] BTCN blockmanager.go:1163: Successfully got filter headers for 918 checkpoints
2025-10-10 01:47:08.475 [INF] BTCN blockmanager.go:643: Fully caught up with cfheaders at height 918000, waiting at tip for new blocks
2025-10-10 01:47:08.476 [INF] BTCN blockmanager.go:733: Attempting to fetch set of un-checkpointed filters at height=918354, hash=00000000000000000001ac0a0f9ad634af778a391b799b80ff644c851d378aa1
2025-10-10 01:47:18.483 [INF] BTCN headerlogger.go:67: Verified 96000 filter headers in the last 12.72s (height 918001, 2025-10-07 10:32:03 +0400 +04)
2025-10-10 01:47:18.483 [INF] BTCN manager.go:327: Canceling block subscription: id=2
2025-10-10 01:47:18.572 [INF] NTFN neutrino.go:693: New block: height=14001, sha=00000000b44efc0402c4841e13c29e2843776abd8ca1a728b4d12a092966e87a
2025-10-10 01:47:18.661 [INF] NTFN neutrino.go:693: New block: height=14002, sha=000000001e4c88a360df5e1f5b4f0b91676dc552acb2343f71068d4b464801f7
2025-10-10 01:47:18.989 [INF] NTFN neutrino.go:693: New block: height=14003, sha=0000000095f738799810fbf31566faac20087ffb00598a35a02ce2166e387117
2025-10-10 01:47:19.321 [INF] NTFN neutrino.go:693: New block: height=14004, sha=0000000080863585423586350af531c858597cb5b9e59f0ea3ff6d63678f62e6
[Continues like this...]

The issue can be seen with:

./lnd-debug --lnddir="./lnddir" --noseedbackup --nolisten --bitcoin.active --bitcoin.mainnet --bitcoin.node=neutrino --feeurl="https://nodes.lightning.computer/fees/v1/btc-fee-estimates.json" --routing.assumechanvalid --tlsdisableautofill --neutrino.addpeer=asia.blixtwallet.com --neutrino.addpeer=europe.blixtwallet.com --neutrino.addpeer=btcd.lnolymp.us --neutrino.addpeer=btcd-mainnet.lightning.computer

Let me know if I can help you or perhaps test something.

@shocknet-justin
Copy link

I've got some time sensitive materials I need to finish today but will test in the next day or two. It looks like Hampus is still reproducing it however, the --noseedbackup flag I believe should be triggering the wallet create as quickly as my deployment scripts would.

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from dae9086 to 16e24f1 Compare October 12, 2025 01:27
@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Oct 12, 2025

Benchmarking Report

With the changeset in this PR

We bring LND v0.20.0 latest release candidate to normal initial sync times that was previously taking days to complete under ideal environmental conditions now takes just over a minute.

lnd_execution_time
Execution Time (seconds)
lnd_peak_memory
Peak Memory (MB)
lnd_cpu_usage
CPU Usage (%)
Benchmarks in Raw Format
dev@dev:~/lnd$ sudo ./compare_lnd_versions.sh
==========================================
LND Version Comparison Benchmark
==========================================
Comparing 4 versions:
  - v0.18.3-beta
  - v0.18.4-beta
  - v0.18.5-beta
  - v0.20.0-beta.rc1-after

Each version will run in isolated namespaces
for fair comparison.

>>> Running v0.18.3-beta (1/4)...
==========================================
Benchmarking LND v0.18.3-beta
Using temporary storage: /tmp/lnd-bench-v0.18.3-beta-tQ9SWS
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 851327)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 851327 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.18.3-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 348.76 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 437.65 MB | CPU: 223.00%
[Sample 2] Block: 150000 | synced: false | Memory: 119.67 MB | CPU: 190.00%
[Sample 3] Block: 243000 | synced: false | Memory: 131.54 MB | CPU: 158.00%
[Sample 4] Block: 337000 | synced: false | Memory: 150.61 MB | CPU: 171.00%
[Sample 5] Block: 412000 | synced: false | Memory: 160.77 MB | CPU: 151.00%
[Sample 6] Block: 464000 | synced: false | Memory: 174.42 MB | CPU: 143.00%
[Sample 7] Block: 536000 | synced: false | Memory: 193.35 MB | CPU: 157.00%
[Sample 8] Block: 635000 | synced: false | Memory: 207.39 MB | CPU: 167.00%
[Sample 9] Block: 700000 | synced: false | Memory: 217.50 MB | CPU: 153.00%
[Sample 10] Block: 764000 | synced: false | Memory: 242.05 MB | CPU: 154.00%
[Sample 11] Block: 822000 | synced: false | Memory: 253.53 MB | CPU: 152.00%
[Sample 12] Block: 914000 | synced: false | Memory: 278.78 MB | CPU: 144.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.18.3-beta
========================================
Execution Time: 70.434696628 seconds
Peak Memory Usage: 437.65 MB
Average Memory Usage: 231.76 MB
Average CPU Usage: 157.76%
Samples Collected: 14

Results saved to: benchmark_v0.18.3-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.3-beta-tQ9SWS

Waiting 10 seconds before next benchmark...

>>> Running v0.18.4-beta (2/4)...
==========================================
Benchmarking LND v0.18.4-beta
Using temporary storage: /tmp/lnd-bench-v0.18.4-beta-2Z0bCV
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 852086)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 852086 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.18.4-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 363.35 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 454.38 MB | CPU: 240.00%
[Sample 2] Block: 197000 | synced: false | Memory: 118.01 MB | CPU: 183.00%
[Sample 3] Block: 275000 | synced: false | Memory: 132.96 MB | CPU: 158.00%
[Sample 4] Block: 351000 | synced: false | Memory: 150.52 MB | CPU: 153.00%
[Sample 5] Block: 432000 | synced: false | Memory: 163.67 MB | CPU: 158.00%
[Sample 6] Block: 508000 | synced: false | Memory: 178.56 MB | CPU: 164.00%
[Sample 7] Block: 583000 | synced: false | Memory: 190.03 MB | CPU: 156.00%
[Sample 8] Block: 635000 | synced: false | Memory: 209.26 MB | CPU: 148.00%
[Sample 9] Block: 705000 | synced: false | Memory: 226.96 MB | CPU: 151.00%
[Sample 10] Block: 791000 | synced: false | Memory: 248.11 MB | CPU: 161.00%
[Sample 11] Block: 816000 | synced: false | Memory: 255.58 MB | CPU: 132.00%
[Sample 12] Block: 918000 | synced: false | Memory: 251.60 MB | CPU: 132.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.18.4-beta
========================================
Execution Time: 70.358434022 seconds
Peak Memory Usage: 454.38 MB
Average Memory Usage: 234.59 MB
Average CPU Usage: 157.61%
Samples Collected: 14

Results saved to: benchmark_v0.18.4-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.4-beta-2Z0bCV

Waiting 10 seconds before next benchmark...

>>> Running v0.18.5-beta (3/4)...
==========================================
Benchmarking LND v0.18.5-beta
Using temporary storage: /tmp/lnd-bench-v0.18.5-beta-XHFt2i
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 852853)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 852853 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.18.5-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 352.79 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 475.08 MB | CPU: 244.00%
[Sample 2] Block: 207000 | synced: false | Memory: 116.33 MB | CPU: 180.00%
[Sample 3] Block: 256000 | synced: false | Memory: 138.87 MB | CPU: 144.00%
[Sample 4] Block: 347000 | synced: false | Memory: 156.42 MB | CPU: 163.00%
[Sample 5] Block: 434000 | synced: false | Memory: 166.09 MB | CPU: 163.00%
[Sample 6] Block: 514000 | synced: false | Memory: 179.01 MB | CPU: 161.00%
[Sample 7] Block: 571000 | synced: false | Memory: 190.89 MB | CPU: 150.00%
[Sample 8] Block: 635000 | synced: false | Memory: 208.35 MB | CPU: 143.00%
[Sample 9] Block: 721000 | synced: false | Memory: 221.60 MB | CPU: 163.00%
[Sample 10] Block: 795000 | synced: false | Memory: 237.88 MB | CPU: 162.00%
[Sample 11] Block: 854000 | synced: false | Memory: 262.40 MB | CPU: 156.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.18.5-beta
========================================
Execution Time: 65.119188935 seconds
Peak Memory Usage: 475.08 MB
Average Memory Usage: 230.24 MB
Average CPU Usage: 162.00%
Samples Collected: 13

Results saved to: benchmark_v0.18.5-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.5-beta-XHFt2i

Waiting 10 seconds before next benchmark...

>>> Running v0.20.0-beta.rc1-after (4/4)...
==========================================
Benchmarking LND v0.20.0-beta.rc1-after
Using temporary storage: /tmp/lnd-bench-v0.20.0-beta.rc1-after-LeHuC4
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 854003)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 854003 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.20.0-beta.rc1-after --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 356.35 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 489.78 MB | CPU: 249.00%
[Sample 2] Block: 207000 | synced: false | Memory: 144.97 MB | CPU: 222.00%
[Sample 3] Block: 278000 | synced: false | Memory: 172.72 MB | CPU: 203.00%
[Sample 4] Block: 355000 | synced: false | Memory: 207.43 MB | CPU: 221.00%
[Sample 5] Block: 436000 | synced: false | Memory: 242.13 MB | CPU: 224.00%
[Sample 6] Block: 510000 | synced: false | Memory: 269.44 MB | CPU: 216.00%
[Sample 7] Block: 577000 | synced: false | Memory: 303.05 MB | CPU: 222.00%
[Sample 8] Block: 657000 | synced: false | Memory: 324.72 MB | CPU: 228.00%
[Sample 9] Block: 721000 | synced: false | Memory: 354.95 MB | CPU: 214.00%
[Sample 10] Block: 771000 | synced: false | Memory: 378.89 MB | CPU: 208.00%
[Sample 11] Block: 795000 | synced: false | Memory: 405.87 MB | CPU: 183.00%
[Sample 12] Block: 872000 | synced: false | Memory: 392.42 MB | CPU: 227.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.20.0-beta.rc1-after
========================================
Execution Time: 70.416534861 seconds
Peak Memory Usage: 489.78 MB
Average Memory Usage: 308.40 MB
Average CPU Usage: 208.15%
Samples Collected: 14

Results saved to: benchmark_v0.20.0-beta.rc1-after.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.20.0-beta.rc1-after-LeHuC4

==========================================
COMPARISON RESULTS
==========================================

Metric                    | v0.18.3-beta      | v0.18.4-beta      | v0.18.5-beta      | v0.20.0-beta.rc1-after
------------------------- | ----------------- | ----------------- | ----------------- | -----------------
Execution Time (s)        | 70.434696628      | 70.358434022      | 65.119188935      | 70.416534861
Peak Memory (MB)          | 437.65            | 454.38            | 475.08            | 489.78
Average Memory (MB)       | 231.76            | 234.59            | 230.24            | 308.4
Average CPU (%%)          | 157.76            | 157.61            | 162               | 208.15

Summary:
--------
✓ Fastest: v0.18.5-beta (65.119188935s)
✓ Lowest Peak Memory: v0.18.3-beta (437.65 MB)
✓ Lowest Average CPU: v0.18.4-beta (157.61%)

Detailed logs available in:
  lnd_v0.18.3-beta.log
  lnd_v0.18.4-beta.log
  lnd_v0.18.5-beta.log
  lnd_v0.20.0-beta.rc1-after.log
Conclusions

Execution time for v0.20.0-beta.rc1 with the changeset in this PR remains competitive at 70.42 seconds, matching the non-regression performance of v0.18.3-beta and v0.18.4-beta (70.43s and 70.36s respectively) while being only 8.1% slower than v0.18.5-beta (65.12s). Most importantly, this represents a dramatic recovery from the multi-day sync times that plagued v0.19.0-beta, v0.19.1-beta, v0.19.2-beta, v0.19.3-beta, and v0.20.0 current release candidates - what previously took days now completes in just over a minute, making LND v0.20.0 fully operational again.

Constraints

  • Each LND Benchmark runs its own isolated linux namespace with all same upper bound for cgroup resources for fair and representative results

Reproducibility

  • Choosing the target release branches to benchmark against, build LND in debug mode using make build, and then tweak compare_lnd_versions.sh script to match generated artifacts names. Then you would be able to run it and get those benchmarking results. Seeing the benchmarks in raw format section above for perhaps a more verbose helpful output

Resources

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Oct 12, 2025

cc @hsjoberg, @shocknet-justin Pushed a new changeset and it will solve the wallet sync issues you experienced and this one as well #9081

@mohamedawnallah mohamedawnallah changed the title neutrinonotify+chainview: optimize neutrino rescan with wallet birthday rpcserver: avoid false negatives in dispatcher during the initial sync Oct 12, 2025
@shocknet-justin
Copy link

@mohamedawnallah I think this partially solved the issue, in that lnd reports syncd_to_chain=true after headers complete, which should unblock our deployments, but what I understand to be the rescan is still taking place after headers sync:

image

@mohamedawnallah
Copy link
Contributor Author

@mohamedawnallah I think this partially solved the issue, in that lnd reports syncd_to_chain=true after headers complete, which should unblock our deployments, but what I understand to be the rescan is still taking place after headers sync:

image

It took me two hours to identify the underlying root cause. The changeset I submitted was addressing the symptom rather than the core issue. I'll submit a new changeset tomorrow that fully resolves the root cause. Thanks for being responsive during the resolution process!

rpcserver.go Outdated
// Sync state transitions as dispatcher catches up:
// Gap >10 blocks: synced (ignore dispatcher lag during catchup)
// Gap 1-10 blocks: not synced (strict: must be at exact tip)
// Gap 0 blocks: synced (dispatcher caught up)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blockbeatDispatcher is receiving blocks and dispatching them to blockbeat subscribers, I don't think it's safe to assume we've synced. It's likely that one of the subsystems is still busy processing the block, causing the blockbeat to stall. We should instead identify what's causing the stall fix properly fix it, other than giving user the false impression that lnd is synced.

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Oct 13, 2025

Okay today's experiments haven't come to a meaningful conclusion fix. I am 100% confident that is the main changeset that introduced this regression is this one: 16a8b623b. I guess it is more of race condition/not up-to-date height value after the chain notifier moved from its place to there.

Building ANY LND before that commit will NOT TRIGGER any premature wallet rescanning.
Building ANY LND during/after that commit will TRIGGER the premature wallet rescanning.

Will try tomorrow with different experiments :)

@mohamedawnallah mohamedawnallah changed the title rpcserver: avoid false negatives in dispatcher during the initial sync rpcserver: resolve root cause of premature wallet rescanning Oct 14, 2025
@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 16e24f1 to a93c2d3 Compare October 15, 2025 20:26
@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Oct 15, 2025

Alrighty, I have submitted a new changeset and It will solve the premature rescanning issue. Something you will notice for now when trying this is that the LND RPC server will not start till the headers will be fully synced to chain that means issuing any RPC calls/commands against LND won't be responsive while headers syncing. That can be noticed on first run, on subsequent runs is negligble unless there is a big timing difference between those runs.

I will put later today a root cause analysis report that will help answering the following genuine questions:

  1. What triggers this premature rescanning?
  2. Why when I restart LND that premature rescanning not happening?
  3. Is really syncing headers before starting RPC server necessary? if not what needs to happen to allow this (if any input)? If it is indeed necessary how can the gap between neutrino p2p headers sync and starting LND RPC server be significantly reduced (on the first LND run)?
  4. How using other chain backends e.g btcd, bitcoind would affect the gap between initial sync and LND RPC server startup?
  5. What are the LND releases impacted by this?

For now you can experiment with this changeset if that solves your wallet sync issues

cc @shocknet-justin, @hsjoberg

@shocknet-justin
Copy link

Our workaround is already to no not create until its done, but that's not ideal because its slowing the on-boarding flow relative to prior versions, though its not the end of the world either since its usually only by a minute or two

The lack of rpc at all is a bigger problem since we have to parse logs to know how things are progressing, otherwise we're polling into the void... and both options can be flakey

An rpc for the state that is available immediately would solve the latter problem

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Oct 15, 2025

I understand this is not the ideal solution. We are iterating on what the ideal solution could look like. Our process has been:

  1. First, we verified whether this is an existing issue for current users
  2. Then, we identified and analyzed whether this is a regression, and if so, what the root cause might be
  3. Now, we are evaluating intentional changes to resolve that root cause

Why does the current changeset move headers syncing before server start?

Let me explain by comparing the old behavior (non-regression) with the new behavior.

Old Behavior (Non-Regression)

We can use commit 1dfb5a0 as a representative example of the LND version where premature rescanning was not happening.

In that version, the sequence was:

  1. RPC server starts first
  2. Full syncing happens as part of the daemon server
  3. After headers are fully synced, the server calls Chain Notifier, which starts the Neutrino rescanning process
  4. Rescanning begins from the current height (which is confident to be recent since we just synced)

References:

  • lnd/lnd.go

    Lines 694 to 751 in 1dfb5a0

    type syncResult struct {
    synced bool
    bestBlockTime int64
    err error
    }
    var syncedResChan = make(chan syncResult, 1)
    for {
    // We check if the wallet is synced in a separate goroutine as
    // the call is blocking, and we want to be able to interrupt it
    // if the daemon is shutting down.
    go func() {
    synced, bestBlockTime, err := activeChainControl.Wallet.
    IsSynced()
    syncedResChan <- syncResult{synced, bestBlockTime, err}
    }()
    select {
    case <-interceptor.ShutdownChannel():
    return nil
    case res := <-syncedResChan:
    if res.err != nil {
    return mkErr("unable to determine if wallet "+
    "is synced", res.err)
    }
    ltndLog.DebugS(ctx, "Syncing to block chain",
    "best_block_time", time.Unix(res.bestBlockTime, 0),
    "is_synced", res.synced)
    if res.synced {
    break
    }
    // If we're not yet synced, we'll wait for a second
    // before checking again.
    select {
    case <-interceptor.ShutdownChannel():
    return nil
    case <-time.After(time.Second):
    continue
    }
    }
    break
    }
    _, bestHeight, err = activeChainControl.ChainIO.GetBestBlock()
    if err != nil {
    return mkErr("unable to determine chain tip", err)
    }
    ltndLog.InfoS(ctx, "Chain backend is fully synced!",
    "end_height", bestHeight)
  • lnd/server.go

    Lines 2129 to 2134 in 1dfb5a0

    cleanup = cleanup.add(s.cc.ChainNotifier.Stop)
    if err := s.cc.ChainNotifier.Start(); err != nil {
    startErr = err
    return
    }

New Behavior (Regression)

With commit 16a8b62, the order of operations is FLIPPED:

  1. Chain Notifier (which triggers Neutrino rescanning) is called first in newServer, before starting the RPC server
  2. Wallet syncing happens after the RPC server starts

This means the height that Chain Notifier accesses comes from BuildChainControl, NOT from the last synced height. That's why we're seeing rescanning start from ~100,000 headers.

References:

  • lnd/lnd.go

    Lines 496 to 502 in 6ade31d

    activeChainControl, cleanUp, err := implCfg.BuildChainControl(
    partialChainControl, walletConfig,
    )
    if err != nil {
    return mkErr("error loading chain control", err)
    }
  • lnd/server.go

    Lines 736 to 747 in 6ade31d

    // Start the low-level services once they are initialized.
    //
    // TODO(yy): break the server startup into four steps,
    // 1. init the low-level services.
    // 2. start the low-level services.
    // 3. init the high-level services.
    // 4. start the high-level services.
    if err := s.startLowLevelServices(); err != nil {
    return nil, err
    }
    currentHash, currentHeight, err := s.cc.ChainIO.GetBestBlock()

Why Not Return to the Old Order?

A natural question: Why not sync first, then call Chain Notifier rescanning in a separate goroutine so it has access to recent height (i.e., move it out of newServer like the earlier variant)? That way, premature rescanning would be avoided and the RPC server wouldn't be blocked.

That's a good question. I don't really know the answer, and I may need @yyforyongyu's input on this. If it turns out that s.cc.ChainNotifier.Start() must be called before the server starts (like seen down below as currently in the codebase), then perhaps there is no other way than syncing headers first (meaning the RPC server wouldn't start yet), then starting the server so Neutrino rescanning gets the recent synced height. Otherwise, this premature rescanning will continue to happen.

Reference:

  • lnd/server.go

    Lines 2128 to 2150 in 6ade31d

    // startLowLevelServices starts the low-level services of the server. These
    // services must be started successfully before running the main server. The
    // services are,
    // 1. the chain notifier.
    //
    // TODO(yy): identify and add more low-level services here.
    func (s *server) startLowLevelServices() error {
    var startErr error
    cleanup := cleaner{}
    cleanup = cleanup.add(s.cc.ChainNotifier.Stop)
    if err := s.cc.ChainNotifier.Start(); err != nil {
    startErr = err
    }
    if startErr != nil {
    cleanup.run()
    }
    return startErr
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

6 participants