Skip to content

Conversation

nitely
Copy link
Owner

@nitely nitely commented Apr 11, 2024

In theory this speeds up regex containing hundreds of alternations (see #138). How? the first state contains a state per alternation; the optimization reduces the number of alternations (at the very least the outer one), so the first state will be capped to [a-zA-Z-0-9] (+ symbols) states in the case of ASCII, so 1K initial states will get reduced to ~50. For find all this matters a lot because it tries to match the initial states to every input character.

The description is in this gist https://gist.github.com/nitely/745c8cabdf06ba2d37f8cf5cda3aea5f

This is just a PoC, though. I'm just optimizing the simplest case, since it should speed up #138

it's also a (broken) WIP that cannot even be tested.

@nitely
Copy link
Owner Author

nitely commented May 23, 2024

We can go further than just literals, consider (^abc|^acb) -> ^a(bc|cb). The same can be applied to suffixes, consider (car|bar) -> (b|c)ar, this would be useful for the literals optimization to kick in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant