Add batch processing to LeanRepl #93
base: master
Conversation
See README.md for more details about all of the available config options for batch commands.
Amazing feature, thanks for sharing it!
The Python code is just for benchmarking; I can get rid of it before we merge (and generate a corresponding …
One useful option could also be something along the lines of "skip processing remaining elements if an earlier example got verified okay". This can be useful for verifying multiple candidate proofs of the same problem produced by whole-proof completion LLMs (e.g. 32, 256, or several thousand roll-outs). Sometimes we are already satisfied once a single proof is found, so to save time all further proofs of that problem can be skipped.
I agree this can be a useful feature.
I think to do this efficiently you would need to poll each pending task at regular intervals to see if it has yielded a successful result, kind of like I do for the …
This sounds like a good practical approach. I wonder if there is a more efficient way of doing this, though.
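The polling idea above can be sketched client-side in Python. This is only an illustrative sketch, not part of the REPL API: `check_proof` and `PROOFS` are hypothetical stand-ins, and the real implementation would send each proof to a REPL worker instead.

```python
# Sketch of "stop once one proof verifies": poll pending tasks and
# cancel the rest as soon as any task reports success.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def check_proof(proof: str) -> bool:
    # Hypothetical stand-in for sending one proof to the REPL;
    # here a proof "verifies" iff it contains "rfl".
    return "rfl" in proof

PROOFS = ["by simp", "by rfl", "by omega"]  # illustrative roll-outs

def first_success(proofs):
    with ThreadPoolExecutor() as pool:
        pending = {pool.submit(check_proof, p): p for p in proofs}
        while pending:
            # Block until at least one task finishes, then inspect it.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                proof = pending.pop(fut)
                if fut.result():
                    # Cancel tasks that have not started yet and return early.
                    for f in pending:
                        f.cancel()
                    return proof
        return None

print(first_success(PROOFS))  # → by rfl
```

Note that `Future.cancel` only prevents tasks that have not yet started; tasks already running inside a worker would still need the per-command timeout this PR adds to be stopped.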
It could also be good to process the batch in micro-batches (when configured), so that CPU utilization can be maximized. When I ran the REPL, I typically got about 10% utilization of a single CPU core, so bringing that up would be useful. Another question is multi-core processing. Then maybe proofs should be annotated with a problem name, so that multiple micro-batches from different problems can be processed simultaneously.
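For concreteness, the micro-batching suggestion might look like the following sketch. All names and sizes here are illustrative assumptions, not anything defined by this PR; `run_cmd` stands in for dispatching one command to a REPL worker.

```python
# Sketch of micro-batching: split a large batch into fixed-size chunks
# and run the chunks across a worker pool to keep CPU cores busy.
from concurrent.futures import ThreadPoolExecutor

def run_cmd(cmd: str) -> str:
    # Hypothetical stand-in for one REPL command round-trip.
    return cmd.upper()

def process(batch, micro_batch_size=2, workers=4):
    def run_micro(micro):
        # Commands within one micro-batch run sequentially on one worker.
        return [run_cmd(c) for c in micro]

    micros = [batch[i:i + micro_batch_size]
              for i in range(0, len(batch), micro_batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves micro-batch order, so results stay aligned
        # with the input batch.
        for chunk in pool.map(run_micro, micros):
            results.extend(chunk)
    return results

print(process(["a", "b", "c", "d", "e"]))
# → ['A', 'B', 'C', 'D', 'E']
```

Annotating each command with a problem name, as suggested above, would simply mean grouping `batch` by that name before chunking.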
If you look at the README file in this PR, I think @FrederickPu already implemented that through the …
When running in parallel mode on a VM with 300+ cores I max out at about a 7x speedup, so I'm not sure core-level parallelism is being achieved.
Should we do thread level parallelism within each bucket? |
```
{ "cmds": ["theorem womp : 2 + 2 = 4 := by rfl", "theorem womp1 : 2 + 4 = 6 := by rfl"] }
```
All the same options from `Command` can be used and will be applied to each command in the `cmds` array. Additionally, you can specify the parallelism mode using `mode`.
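A request using `mode` might look like the following. The value `"parallel"` is an assumption inferred from the discussion below (which mentions "parallel mode"); the exact accepted values are defined in the PR's README.

```
{ "cmds": ["theorem t1 : 1 + 1 = 2 := by rfl", "theorem t2 : 2 + 2 = 4 := by rfl"],
  "mode": "parallel" }
```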
parallel has 1 r
Able to process multiple commands at once using …
Also added multithreading parallelism using the `Task` monad, as well as an option to garbage collect command snapshots (this overlaps with #83). Batch commands have timeouts to prevent one bad proof from stalling the batch, which overlaps with #92.
Timeouts are in milliseconds and can be set using an option …
Right now, there is no command snapshotting for batch commands.