-
-
Notifications
You must be signed in to change notification settings - Fork 60
[Java] Improve cucumber expression creation performance #202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
5481450
73a51e1
531d380
a53b0e5
f264cf2
20cc322
fb2e664
7f5c171
b6f51b4
68999a0
99dd95d
7aef82c
6c1f3ec
afc42c8
b2febb7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -24,7 +24,27 @@ | |||||||||||||
|
||||||||||||||
@API(status = API.Status.STABLE) | ||||||||||||||
public final class CucumberExpression implements Expression { | ||||||||||||||
private static final Pattern ESCAPE_PATTERN = Pattern.compile("[\\\\^\\[({$.|?*+})\\]]"); | ||||||||||||||
/** | ||||||||||||||
* List of characters to be escaped. | ||||||||||||||
* The last char is '}' with index 125, so we need only 126 characters. | ||||||||||||||
*/ | ||||||||||||||
private static final boolean[] CHAR_TO_ESCAPE = new boolean[126]; | ||||||||||||||
static { | ||||||||||||||
CHAR_TO_ESCAPE['^'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['$'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['['] = true; | ||||||||||||||
CHAR_TO_ESCAPE[']'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['('] = true; | ||||||||||||||
CHAR_TO_ESCAPE[')'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['{'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['}'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['.'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['|'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['?'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['*'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['+'] = true; | ||||||||||||||
CHAR_TO_ESCAPE['\\'] = true; | ||||||||||||||
} | ||||||||||||||
private final List<ParameterType<?>> parameterTypes = new ArrayList<>(); | ||||||||||||||
private final String source; | ||||||||||||||
private final TreeRegexp treeRegexp; | ||||||||||||||
|
@@ -61,9 +81,18 @@ private String rewriteToRegex(Node node) { | |||||||||||||
} | ||||||||||||||
|
||||||||||||||
private static String escapeRegex(String text) { | ||||||||||||||
return ESCAPE_PATTERN.matcher(text).replaceAll("\\\\$0"); | ||||||||||||||
int length = text.length(); | ||||||||||||||
StringBuilder sb = new StringBuilder(length * 2); // prevent resizes | ||||||||||||||
int maxChar = CHAR_TO_ESCAPE.length; | ||||||||||||||
for (int i = 0; i < length; i++) { | ||||||||||||||
|
Variant | Description |
---|---|
escapeRegex0 | Pattern (cucumber 7.11.0) |
escapeRegex1 | append(char) with 1 array allocation and 1 array copy |
escapeRegex2 | append(String) with 1 array allocation and 1 array copy |
escapeRegex3 | append(char) with 1 array allocation and 0..1 array copy |
escapeRegex4 | append(String) with 0..1 array allocation and 0..1 array copy |
I benchmarked these variants with JMH for varying escaping frequencies:
The variant escapeRegex4
outperforms the other variants for all escaping frequencies. Depending on escaping frequency, this variant is 1.2 to 3 times faster the cucumber 7.11.0 implementation. The code is a bit more complex, but still understandable, so I think we can use this variant:
public static String escapeRegex4(String text) {
int length = text.length();
StringBuilder sb = null;
int blocStart=0;
int maxChar = CHAR_TO_ESCAPE.length;
for (int i = 0; i < length; i++) {
char currentChar = text.charAt(i);
if (currentChar < maxChar && CHAR_TO_ESCAPE[currentChar]) {
if (sb == null) {
sb = new StringBuilder(length * 2);
}
if (i > blocStart) {
// flush previous block
sb.append(text, blocStart, i);
}
sb.append('\\');
sb.append(currentChar);
blocStart=i+1;
}
}
if (sb != null) {
if (length > blocStart) {
// flush remaining characters
sb.append(text, blocStart, length);
}
return sb.toString();
}
return text;
}
In real conditions, on my project with about ~400 test scenarios runs in about 8.5 seconds. The InteliJ profiler says escapeRegex0
takes 3.37% of the total CPU time (=280 ms), while escapeRegex4
takes 0.97% of the total (=80 ms). That's a 200 ms improvement (2.3%): not a lot on one build (I was hoping for more), but worth the effort when considering the number of builds on all cucumber users.
For completeness, the escapeRegex3
variant is:
public static String escapeRegex3(String text) {
int length = text.length();
StringBuilder sb = new StringBuilder(length * 2); // prevent resizes
int maxChar = CHAR_TO_ESCAPE.length;
boolean escaped = false;
for (int i = 0; i < length; i++) {
char currentChar = text.charAt(i);
if (currentChar < maxChar && CHAR_TO_ESCAPE[currentChar]) {
sb.append('\\');
escaped = true;
}
sb.append(currentChar);
}
return escaped ? sb.toString() : text;
}
Uh oh!
There was an error while loading. Please reload this page.