Automated prefix state sharing optimization #110

augustepoiroux · 2025-06-27T17:02:55Z

Goal: This PR aims at reducing redundant computation automatically.
In essence, this PR revives the repl++ idea by @kim-em, and adds the trie idea to share state computation across all commands in constant time instead of only with the previous command. It mirrors VS Code experience (orange bar on the left), but it takes into account the whole editing history, and it can share computation across files and commands (given that they share a common prefix).

Feature: For a given command, this PR finds the longest string prefix shared among all previously submitted commands and reuses the corresponding incremental state to share the computation.
It additionally automatically caches and shares header processing (which is not automatically handled by the incremental states above). For instance, if multiple commands use "import Mathlib" as a header, it will be only loaded once. This also fixes #77 where empty headers are reprocessed for each command, introducing large unexpected overhead.
This introduces a new way of using the REPL: instead of manually building a chain of REPL environments, one can now send full file content while adding/editing content, and the REPL will automatically find the best environment state shared with previous commands.

Overhead: The longest prefix matching is done using a trie. It has a theoretical complexity linear in the number of characters of the input, and independent of the number of previous commands.
In my testing, I didn't observe any noticeable overhead introduced by this new code and #109, in terms of both computation time and memory usage. But this should be tested more thoroughly in real-world use-cases.

Example: An example might be more explanatory. Here is the content file1.lean:

import Mathlib

theorem mathd_numbertheory_188 : Nat.gcd 180 168 = 12 := by norm_num

def fib : Nat → Nat
  | 0 => 0
  | 1 => 1
  | n + 2 => fib (n + 1) + fib n
#eval fib 35

theorem foo : n = n := by
  rfl

and here is the content of file2.lean:

import Mathlib

theorem mathd_numbertheory_188 : Nat.gcd 180 168 = 12 := by norm_num

def fib : Nat → Nat
  | 0 => 0
  | 1 => 1
  | n + 2 => fib (n + 1) + fib n
#eval fib 35

theorem foo2 : n = n + 0 := by
  rfl

When sending {"path": "file1.lean"} and then {"path": "file2.lean"} to the REPL in this PR, the following now happens:

file1.lean is processed as usual. In this example, it takes a few seconds to run.
When processing file2.lean, everything up to #eval fib 35 is detected as the longest prefix match and computation is skipped by reusing header state + incremental state, and only theorem foo2 has to be fully processed. So running this after file1.lean is now almost instant. This particular example can be found in test test/Mathlib/test/incrementality.in.

Limits:

Header caching is based on exact string matching on the full header, meaning that if you do import Lean\nimport Mathlib in a command and then import Mathlib in a second command, computation will not be shared and Mathlib will be loaded twice.

Note: This PR depends on #109

augustepoiroux changed the title ~~Automated prefix computation optimization~~ Automated prefix computation sharing optimization Jun 28, 2025

augustepoiroux changed the title ~~Automated prefix computation sharing optimization~~ Automated prefix state sharing optimization Jun 28, 2025

augustepoiroux force-pushed the auto-prefix-optimization branch 3 times, most recently from b382af6 to 481094b Compare July 4, 2025 10:22

augustepoiroux force-pushed the auto-prefix-optimization branch 2 times, most recently from 04a2642 to fd0236e Compare July 8, 2025 16:51

augustepoiroux mentioned this pull request Jul 9, 2025

Declaration information extraction #119

Draft

augustepoiroux force-pushed the auto-prefix-optimization branch from fd0236e to 6b421e8 Compare July 24, 2025 07:41

augustepoiroux force-pushed the auto-prefix-optimization branch 2 times, most recently from 053bd46 to cdfaccc Compare August 15, 2025 11:43

augustepoiroux added 17 commits September 15, 2025 18:24

Improve LocalContext and LocalInstance extraction for sorries

74f0ed4

Add test

e3a85b6

First attempt at doing incremental processing of commands

511813e

Implement incrmental collection of sorries

e312daa

Fix test output

27f4704

Update tests

e446c32

Improving fine-grained environment

a62aa57

Revert constant replay

1de74b6

Fix v4.22.0-rc2

112c71e

Fix test v4.24.0-rc1

8de3d74

Init commit

638f425

Implement trie-based prefix matching

73c2b84

Add test example

79860c9

Fix sensitivity to comments and whitespaces in the header

0722aaa

Fix trie stack overflow issue

1885101

Make incrementality optional

f3aaebf

Update incrementality test

2fa4a97

augustepoiroux force-pushed the auto-prefix-optimization branch from d4d965c to 2fa4a97 Compare September 15, 2025 16:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Automated prefix state sharing optimization #110

Automated prefix state sharing optimization #110

Uh oh!

augustepoiroux commented Jun 27, 2025 •

edited

Loading

Uh oh!

Uh oh!

Automated prefix state sharing optimization #110

Are you sure you want to change the base?

Automated prefix state sharing optimization #110

Uh oh!

Conversation

augustepoiroux commented Jun 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

augustepoiroux commented Jun 27, 2025 •

edited

Loading