Skip to content

Conversation

augustepoiroux
Copy link
Contributor

@augustepoiroux augustepoiroux commented Jun 27, 2025

Goal: This PR aims at reducing redundant computation automatically.
In essence, this PR revives the repl++ idea by @kim-em, and adds the trie idea to share state computation across all commands in constant time instead of only with the previous command. It mirrors VS Code experience (orange bar on the left), but it takes into account the whole editing history, and it can share computation across files and commands (given that they share a common prefix).

Feature: For a given command, this PR finds the longest string prefix shared among all previously submitted commands and reuses the corresponding incremental state to share the computation.
It additionally automatically caches and shares header processing (which is not automatically handled by the incremental states above). For instance, if multiple commands use "import Mathlib" as a header, it will be only loaded once. This also fixes #77 where empty headers are reprocessed for each command, introducing large unexpected overhead.
This introduces a new way of using the REPL: instead of manually building a chain of REPL environments, one can now send full file content while adding/editing content, and the REPL will automatically find the best environment state shared with previous commands.

Overhead: The longest prefix matching is done using a trie. It has a theoretical complexity linear in the number of characters of the input, and independent of the number of previous commands.
In my testing, I didn't observe any noticeable overhead introduced by this new code and #109, in terms of both computation time and memory usage. But this should be tested more thoroughly in real-world use-cases.

Example: An example might be more explanatory. Here is the content file1.lean:

import Mathlib

theorem mathd_numbertheory_188 : Nat.gcd 180 168 = 12 := by norm_num

def fib : Nat → Nat
  | 0 => 0
  | 1 => 1
  | n + 2 => fib (n + 1) + fib n
#eval fib 35

theorem foo : n = n := by
  rfl

and here is the content of file2.lean:

import Mathlib

theorem mathd_numbertheory_188 : Nat.gcd 180 168 = 12 := by norm_num

def fib : Nat → Nat
  | 0 => 0
  | 1 => 1
  | n + 2 => fib (n + 1) + fib n
#eval fib 35

theorem foo2 : n = n + 0 := by
  rfl

When sending {"path": "file1.lean"} and then {"path": "file2.lean"} to the REPL in this PR, the following now happens:

  • file1.lean is processed as usual. In this example, it takes a few seconds to run.
  • When processing file2.lean, everything up to #eval fib 35 is detected as the longest prefix match and computation is skipped by reusing header state + incremental state, and only theorem foo2 has to be fully processed. So running this after file1.lean is now almost instant. This particular example can be found in test test/Mathlib/test/incrementality.in.

Limits:

  • Header caching is based on exact string matching on the full header, meaning that if you do import Lean\nimport Mathlib in a command and then import Mathlib in a second command, computation will not be shared and Mathlib will be loaded twice.

Note: This PR depends on #109

@augustepoiroux augustepoiroux changed the title Automated prefix computation optimization Automated prefix computation sharing optimization Jun 28, 2025
@augustepoiroux augustepoiroux changed the title Automated prefix computation sharing optimization Automated prefix state sharing optimization Jun 28, 2025
@augustepoiroux augustepoiroux force-pushed the auto-prefix-optimization branch 3 times, most recently from b382af6 to 481094b Compare July 4, 2025 10:22
@augustepoiroux augustepoiroux force-pushed the auto-prefix-optimization branch 2 times, most recently from 04a2642 to fd0236e Compare July 8, 2025 16:51
@augustepoiroux augustepoiroux force-pushed the auto-prefix-optimization branch from fd0236e to 6b421e8 Compare July 24, 2025 07:41
@augustepoiroux augustepoiroux force-pushed the auto-prefix-optimization branch 2 times, most recently from 053bd46 to cdfaccc Compare August 15, 2025 11:43
@augustepoiroux augustepoiroux force-pushed the auto-prefix-optimization branch from d4d965c to 2fa4a97 Compare September 15, 2025 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LeanRepl terminates trying to run large amounts of commands on a single LeanRepl instance
1 participant