Regular expressions
Regular expressions can be used to recognize strings that match a certain pattern. PawnPlus exposes the regex features of C++ through a simple yet useful set of functions that support matching (finding a pattern in a string), extraction (identifying submatches of a pattern), and replacement (substituting a matched portion of a string with a new substring).
All regex functions accept a regex_options bit field. The regex_options enum combines the C++ types syntax_option_type and match_flag_type, so the options can change both the grammar and the way patterns are matched.
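The two kinds of options behave differently in C++, which is what regex_options wraps according to the paragraph above. The sketch below uses plain std::regex, not PawnPlus: syntax options (grammar, icase) are fixed when the regex object is constructed, while match flags are passed to each individual search. The helper names search_with and search_not_bol are hypothetical.

```cpp
#include <regex>
#include <string>

// Syntax options (std::regex_constants::syntax_option_type) are fixed when the
// std::regex object is built: they select the grammar and flags such as icase.
bool search_with(const std::string& text, const std::string& pattern,
                 std::regex_constants::syntax_option_type syntax) {
    return std::regex_search(text, std::regex(pattern, syntax));
}

// Match flags (std::regex_constants::match_flag_type) are passed to each call.
// match_not_bol: the start of `text` is not treated as the start of a line,
// so '^' cannot match there.
bool search_not_bol(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern),
                             std::regex_constants::match_not_bol);
}
```

For example, the pattern `a+` is a quantified `a` under `std::regex::ECMAScript`, but under `std::regex::basic` (POSIX basic grammar) the `+` is an ordinary character, so it only matches the literal text `a+`.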
regex_cached is an additional option added by PawnPlus that can make matching more efficient. Normally, every function call has to construct the regex object from the pattern; with regex_cached, the object is stored in a map and retrieved later when the same pattern and options are used again.
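The caching idea can be sketched in C++ as a map from the pattern and its options to an already-compiled std::regex. This is an assumption about the general technique, not PawnPlus's actual implementation; the RegexCache class is hypothetical. Only syntax options are part of the key here, because match flags do not affect how the regex object is constructed.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <utility>

// Hypothetical cache of compiled regex objects keyed by (pattern, syntax options).
class RegexCache {
public:
    const std::regex& get(const std::string& pattern,
                          std::regex_constants::syntax_option_type options = std::regex::ECMAScript) {
        auto key = std::make_pair(pattern, static_cast<unsigned>(options));
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // First use of this pattern/options pair: compile once and store it.
            it = cache_.emplace(key, std::regex(pattern, options)).first;
        }
        return it->second;  // std::map never moves its elements, so the reference stays valid
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::pair<std::string, unsigned>, std::regex> cache_;
};
```

Repeated calls with the same pattern and options return the same object, so the pattern is compiled only once; the trade-off is that every distinct pattern stays in memory for as long as the cache lives.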
The default grammar is a modified ECMAScript, which can be changed via the options.
Matching is the simplest application of regular expressions. The function str_match can be used to test if a string contains the provided pattern.
```pawn
assert str_match(@("apple banana orange"), \"\b(apple|banana)\b");
```

The function does not match the whole string against the pattern; it only looks for the first occurrence. Use ^ and $ to anchor the pattern at the beginning and at the end of the string. The pos parameter specifies the offset at which matching starts, and can also be used to obtain the offset at which the match ended.
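A rough C++ analogue, assuming PawnPlus builds on std::regex as stated at the top of this page: std::regex_search finds the first occurrence anywhere in the text, which is how str_match is described, whereas std::regex_match requires the whole string to match. The helpers contains, matches_whole and search_from are hypothetical; search_from only imitates an in/out pos parameter and assumes pos is a character offset.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// First occurrence anywhere in the text (like str_match).
bool contains(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Succeeds only if the entire text matches the pattern.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Searches from offset `pos`; on success, moves `pos` to the end of the match.
bool search_from(const std::string& text, const std::string& pattern, std::size_t& pos) {
    if (pos > text.size()) return false;
    const std::regex re(pattern);
    std::smatch m;
    auto begin = text.cbegin() + static_cast<std::ptrdiff_t>(pos);
    // match_prev_avail: the character before `begin` exists, so ^ and \b can see it.
    auto flags = pos > 0 ? std::regex_constants::match_prev_avail
                         : std::regex_constants::match_default;
    if (!std::regex_search(begin, text.cend(), m, re, flags)) return false;
    pos += static_cast<std::size_t>(m.position(0) + m.length(0));
    return true;
}
```

Calling search_from repeatedly with the same pos walks through every occurrence. A pattern that can match an empty string would leave pos unchanged, so a real loop would need to step past such matches.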
str_extract looks for a pattern in a string and returns a new list holding all the groups captured from the string, or List:0 if there is no match. This function can also be used to iterate over all occurrences of the pattern:
```pawn
new pos = 0;
new String:str = @("apple banana orange");
new List:l;
// pos receives the end offset of each match, so the next call continues from there
while((l = str_extract(str, \"\b[[:alpha:]]+\b", .pos=pos)))
{
    print_s(list_get_str_s(l, 0)); // element 0 holds the whole match
    list_delete(l);
}
//prints all words in the string
```

All occurrences of a pattern in a string can be replaced with another string, a list, or a function.
str_replace can be used to specify a single replacement string. If you use capturing groups, you can refer to them via $1, $2 (or \1, \2) etc. in the replacement string.
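For comparison, std::regex_replace in C++ replaces every match by default and expands $1, $2, ... in the format string. This is only an analogy: whether PawnPlus delegates to regex_replace is an assumption, and replace_all and replace_first are hypothetical names. The \1 form mentioned above is not covered by this sketch.

```cpp
#include <regex>
#include <string>

// Replaces every match; "$1" in `format` expands to capture group 1.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& format) {
    return std::regex_replace(text, std::regex(pattern), format);
}

// format_first_only limits the substitution to the first match.
std::string replace_first(const std::string& text, const std::string& pattern,
                          const std::string& format) {
    return std::regex_replace(text, std::regex(pattern), format,
                              std::regex_constants::format_first_only);
}
```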
```pawn
assert str_replace(@("apple banana orange"), \"\b([[:alpha:]]+)\b", "word($1)") == @("word(apple) word(banana) word(orange)");
```

Sometimes it is useful to replace several things at once while choosing the replacement according to which part of the pattern matched. For this, use str_replace_list:
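```pawn
new List:l = list_new_args_str("1", "0");
print_s(str_replace_list(@("apple"), "(..)|(.)", l)); //110
list_delete(l);
```

For each match, the function finds the first range of consecutive groups that participated in the match and uses the index of the first group in that range to select the replacement from the list (group 1 selects index 0, group 2 selects index 1, and so on).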
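The selection rule can be sketched in C++ with std::sregex_iterator. This is a hypothetical reconstruction of the documented behaviour, not PawnPlus's code: replace_by_first_group is an illustrative name, and the replacement strings are inserted literally, without expanding group references.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// For each match, the first participating capture group g selects
// replacements[g - 1]. A match in which no group participated is removed.
std::string replace_by_first_group(const std::string& text, const std::string& pattern,
                                   const std::vector<std::string>& replacements) {
    const std::regex re(pattern);
    std::string out;
    auto last = text.cbegin();  // end of the previous match
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // keep the text between matches
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) {
                if (g - 1 < replacements.size()) out += replacements[g - 1];
                break;
            }
        }
        last = m[0].second;
    }
    out.append(last, text.cend());  // keep the text after the last match
    return out;
}
```

With the pattern `(..)|(.)`, "ap" and "pl" match the first group and "e" matches the second, which reproduces the `110` output of the Pawn example.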
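Each replacement string can reference the matched groups. Here, $1 refers to the first group of the selected range rather than to group 1 of the whole pattern: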
```pawn
new List:l = list_new_args_str("($1)", "[$1]");
print_s(str_replace_list(@("(abc)[def]"), \"\[(.*?)\]|\((.*?)\)", l)); //[abc](def)
list_delete(l);
```

However, if a single alternative contains more than one group, the function cannot determine this from the pattern, so you have to put an empty string in the list in place of the extra group:
```pawn
new List:l = list_new_args_str("($2:$1)", "", "[$2:$1]");
print_s(str_replace_list(@("(a:b)[c:d]"), \"\[(.*?):(.*?)\]|\((.*?):(.*?)\)", l)); //[b:a](d:c)
list_delete(l);
```

When the second alternative (\((.*?):(.*?)\)) matches, the two groups before it are unmatched, so the replacement index is 2. The list therefore has to be padded with "" at index 1; that entry is never selected on its own.
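The selection and renumbering rules implied by the two examples above can be sketched in C++. This is a hypothetical reconstruction based only on the documented outputs, not PawnPlus's implementation: replace_list_sketch and expand_relative are illustrative names, and only single-digit references $1 to $9 are handled.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Expands "$1".."$9" relative to group `first`: "$1" is group `first`,
// "$2" is group first + 1, and so on. Other characters are copied unchanged.
std::string expand_relative(const std::string& fmt, const std::smatch& m, std::size_t first) {
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '$' && i + 1 < fmt.size() && fmt[i + 1] >= '1' && fmt[i + 1] <= '9') {
            std::size_t g = first + static_cast<std::size_t>(fmt[i + 1] - '1');
            if (g < m.size()) out += m[g].str();  // unmatched groups expand to ""
            ++i;  // skip the digit
        } else {
            out += fmt[i];
        }
    }
    return out;
}

// Picks replacements[g - 1] for the first participating group g, then expands it.
std::string replace_list_sketch(const std::string& text, const std::string& pattern,
                                const std::vector<std::string>& replacements) {
    const std::regex re(pattern);
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) {
                if (g - 1 < replacements.size())
                    out += expand_relative(replacements[g - 1], m, g);
                break;
            }
        }
        last = m[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```

Because group 2 can only participate together with group 1, the "" entry at index 1 is never reached, which matches the padding rule described above.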
The most powerful type of replacement uses a function to generate the replacement string. str_replace_func accepts the name of a public function that is called for every occurrence of the pattern in the string and receives the values of all groups:
```pawn
forward String:regex_func(gr1[], gr1_size);

// Swaps the first two characters of each match (gr1 holds the whole match)
public String:regex_func(gr1[], gr1_size)
{
    new tmp = gr1[1];
    gr1[1] = gr1[0], gr1[0] = tmp;
    return str_new(gr1);
}

main()
{
    print_s(str_replace_func(@("abcd"), "..", pawn_nameof(regex_func))); //badc
}
```

The function receives every group of the pattern, each followed by its length; the first group is the whole match. It must return a valid dynamic string. More complex operations can be performed on the match, such as converting it to uppercase:
```pawn
forward String:regex_to_upper(gr1[], gr1_size);

// Uppercases the first character of the match and appends the rest unchanged
public String:regex_to_upper(gr1[], gr1_size)
{
    return str_to_upper(str_new_arr(gr1, 1)) + str_new(gr1[1]);
}

main()
{
    print_s(str_replace_func(@("apple banana orange"), "[[:alpha:]]+", pawn_nameof(regex_to_upper))); //Apple Banana Orange
}
```
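The callback mechanism can be sketched in C++ by iterating over the matches with std::sregex_iterator and calling a function for each std::smatch. replace_with, swap_pair and capitalize are hypothetical names mirroring regex_func and regex_to_upper. PawnPlus passes each group as an array followed by its length, which this sketch does not model; here the callback receives the whole std::smatch, where m[0] is the full match.

```cpp
#include <cctype>
#include <functional>
#include <regex>
#include <string>
#include <utility>

// Calls `fn` for every match and splices its return value into the output.
std::string replace_with(const std::string& text, const std::string& pattern,
                         const std::function<std::string(const std::smatch&)>& fn) {
    const std::regex re(pattern);
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        out.append(last, (*it)[0].first);  // text before this match
        out += fn(*it);                    // replacement produced by the callback
        last = (*it)[0].second;
    }
    out.append(last, text.cend());
    return out;
}

// Like regex_func: swaps the first two characters of the match.
std::string swap_pair(const std::smatch& m) {
    std::string s = m.str(0);
    if (s.size() >= 2) std::swap(s[0], s[1]);
    return s;
}

// Like regex_to_upper: uppercases the first character of the match.
std::string capitalize(const std::smatch& m) {
    std::string s = m.str(0);
    if (!s.empty()) s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    return s;
}
```

Usage mirrors the Pawn examples: `replace_with("abcd", "..", swap_pair)` yields `badc`, and `replace_with("apple banana orange", "[[:alpha:]]+", capitalize)` yields `Apple Banana Orange`.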