Implemented Initial Query Parser for Search Interfaces #2114
Conversation
Summary by CodeRabbit

Walkthrough

Adds a new pyparsing-based query parser module that converts GitHub-like search expressions into structured condition dictionaries, adds pyparsing to Poetry dependencies, and introduces unit tests covering parsing, schema validation, operators, dates, booleans, strict mode, and case sensitivity.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (4 passed, 1 warning)

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
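As context for the walkthrough's "GitHub-like search expressions into structured condition dictionaries" description, here is a hedged sketch of how such a `field:value` grammar can be expressed with pyparsing. This is a hypothetical illustration, not the PR's actual grammar; the names (`condition`, `condition_dict`) are made up here.

```python
# Hypothetical sketch of a field:value grammar in pyparsing; the PR's
# actual module is richer (typed fields, dates, strict mode, errors).
import pyparsing as pp

field = pp.Word(pp.alphas, pp.alphanums + "_")("field")
op = pp.Optional(pp.one_of("<= >= < > ="), default="=")("op")
value = (pp.QuotedString('"') | pp.Word(pp.printables, exclude_chars='"'))("value")
condition = field + pp.Suppress(":") + op + value

parsed = condition.parse_string("stars:>100", parse_all=True)
condition_dict = {"field": parsed["field"], "op": parsed["op"], "value": parsed["value"]}

quoted_parsed = condition.parse_string('author:"John Doe"', parse_all=True)
# QuotedString strips the surrounding quotes from the matched value.
```

Named results (`("field")`, `("op")`, `("value")`) make the conversion to a condition dictionary a plain attribute lookup, which is the general shape the walkthrough describes.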
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/tests/apps/common/search/query_parser_test.py (1)
Lines 159-188: Final set of incorrectly nested test methods.

`test_boundary_conditions_and_edge_cases`, `test_long_field_names`, and `test_numeric_values` are all incorrectly nested. Extract and fix all nested test methods:
```diff
-    def test_boundary_conditions_and_edge_cases(self):
-        @pytest.mark.parametrize(
-            ("query", "expected"),
-            [
-                ("author:", {"field": "query", "type": "string", "string": "author:"}),
-                (":123", {"field": "query", "type": "string", "string": ":123"}),
-                ('"field: true"', {"field": "query", "type": "string", "boolean": True}),
-                ("2341", {"field": "query", "type": "string", "string": "2341"}),
-            ],
-        )
-        def test_boundary_conditions_and_edge_cases(self, query, expected):
-            result = self.parser.parse(query)
-            assert result == [expected]
+    @pytest.mark.parametrize(
+        ("query", "expected_string"),
+        [
+            ("author:", "author:"),
+            (":123", ":123"),
+            ('"field: true"', "field: true"),
+            ("2341", "2341"),
+        ],
+    )
+    def test_boundary_conditions(self, query, expected_string):
+        result = self.parser.parse(query)
+        assert len(result) == 1
+        assert result[0]["field"] == "query"
+        assert result[0]["type"] == "string"
+        assert result[0]["string"] == expected_string

-    def test_long_field_names():
-        long_field_parser = QueryParser(allowed_fields={"verylongfieldname": "string"})
-        long_value = "a" * 1000
-        result = long_field_parser.parse(f'verylongfieldname:"{long_value}"')
-        assert isinstance(result, list)
-        assert result[0]["string"] == long_value
+    def test_long_field_names_and_values(self):
+        long_field_parser = QueryParser(allowed_fields={"verylongfieldname": "string"})
+        long_value = "a" * 1000
+        result = long_field_parser.parse(f'verylongfieldname:"{long_value}"')
+        assert isinstance(result, list)
+        assert result[0]["string"] == long_value

-    @pytest.mark.parametrize("val", ["0", "-1", "999999", "3.14159"])
-    def test_numeric_values(self, val):
-        result = self.parser.parse(f"stars:{val}")
-        assert len(result) == 1
-        assert result[0]["type"] == "number"
-        assert result[0]["field"] == "stars"
-        assert result[0]["number"] == str(int(float(val)))
-        assert result[0]["op"] == "="
+    @pytest.mark.parametrize("val", ["0", "-1", "999999", "3.14159"])
+    def test_numeric_values(self, val):
+        result = self.parser.parse(f"stars:{val}")
+        assert len(result) == 1
+        assert result[0]["type"] == "number"
+        assert result[0]["field"] == "stars"
+        assert result[0]["number"] == str(int(float(val)))
+        assert result[0]["op"] == "="
```
🧹 Nitpick comments (2)
backend/apps/common/search/query_parser.py (1)
Lines 443-445: Redundant empty-value checks after tokenization.

The empty-string checks on lines 441-442 and 444-445 seem redundant, since `_split_tokens` should not produce empty tokens and `_parse_token` should handle the parsing appropriately. Consider removing the redundant checks or documenting why they're necessary:

```diff
 for token in tokens:
-    if not token or token.strip() == "":
-        continue
     field, raw_value = self._parse_token(token)
-    if raw_value is None or raw_value.strip() == "":
+    if not raw_value:
         continue
```

backend/tests/apps/common/search/query_parser_test.py (1)
Lines 31-32: Test expectation message doesn't match the actual error.

The test expects "Expected end of text" but exercises an invalid field name containing a dash. Consider a more descriptive assertion:

```diff
- with pytest.raises(Exception, match="Expected end of text"):
+ with pytest.raises(QueryParserError, match="must be alphanumeric"):
      QueryParser(allowed_fields={"field-with-dash": "string"})
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)

- `backend/poetry.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (3)

- `backend/apps/common/search/query_parser.py` (1 hunks)
- `backend/pyproject.toml` (1 hunks)
- `backend/tests/apps/common/search/query_parser_test.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/apps/common/search/query_parser_test.py (1)
- backend/apps/common/search/query_parser.py (3)
  - `QueryParser` (76-469)
  - `QueryParserError` (28-73)
  - `parse` (422-469)
🔇 Additional comments (1)
backend/pyproject.toml (1)

Line 50: LGTM! Dependency addition aligns with PR objectives.

The addition of `pyparsing = "^3.2.3"` is appropriate for implementing the query parser module. This is a well-established parsing library that aligns with the suggestions from issue #2038.
Actionable comments posted: 5
🧹 Nitpick comments (5)
backend/apps/common/search/query_parser.py (3)
Lines 166-176: Field-name validation relies on the truthiness of ParseResults; wrap ParseException for consistent error types.

`parseString(..., parseAll=True)` raises `ParseException` on invalid names, so the current `if not ...` check is unreachable. Wrap both the default_field and field validations so they raise `FIELD_NAME_ERROR` consistently.

Apply this diff:

```diff
- if not QueryParser._FIELD_NAME.parseString(default_field, parseAll=True):
-     QueryParser._raise_invalid_field_name(default_field)
+ try:
+     QueryParser._FIELD_NAME.parseString(default_field, parseAll=True)
+ except ParseException:
+     QueryParser._raise_invalid_field_name(default_field)
@@
- if not QueryParser._FIELD_NAME.parseString(field, parseAll=True):
-     QueryParser._raise_invalid_field_name(field)
+ try:
+     QueryParser._FIELD_NAME.parseString(field, parseAll=True)
+ except ParseException:
+     QueryParser._raise_invalid_field_name(field)
```

Also applies to: 170-176
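The wrap-the-exception pattern suggested here can be shown as a standalone sketch. The `QueryParserError` stand-in and the `validate_field_name` helper below are hypothetical, not the module's code; they only illustrate that `parse_string` signals failure by raising, never by returning a falsy result.

```python
import pyparsing as pp


class QueryParserError(ValueError):
    """Hypothetical stand-in for the module's structured error type."""


_FIELD_NAME = pp.Word(pp.srange("[a-z0-9_]"))


def validate_field_name(name: str) -> None:
    # parse_string raises ParseException on invalid input rather than
    # returning a falsy result, so the failure path must catch it.
    try:
        _FIELD_NAME.parse_string(name, parse_all=True)
    except pp.ParseException as exc:
        raise QueryParserError(f"Invalid field name: {name!r}") from exc


validate_field_name("stars")  # passes silently
```

With `parse_all=True`, a name like `"field-with-dash"` matches only the `field` prefix, so the leftover `-with-dash` triggers the `ParseException` that gets re-raised as the domain error.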
Lines 53-63: Default `field` on QueryParserError forces a "Field: ..." line into every message.

The non-empty default means `__str__` always appends a field line even when no field was supplied. Prefer an empty-string default so the field line only appears when a field is provided.

Apply this diff:

```diff
-        field: str = "<field>",
+        field: str = "",
@@
         if self.field:
             parts.append(f"Field: '{self.field}'")
```

Also applies to: 31-37
Lines 191-213: Tokenization via Regex doesn't handle escaped quotes inside quoted strings.

Example: `project:"OWASP \"ZAP\" Guide"` will split incorrectly. Consider replacing `_split_tokens` with a pyparsing-based tokenizer that reuses `_QUOTED_VALUE` and `_TOKEN_PATTERN`, or, at minimum, extend the regex to account for escaped quotes.

I can provide a consolidated grammar (`OneOrMore(Group(_FIELD_NAME + ':' + _FIELD_VALUE) | _QUOTED_VALUE | Word(…))`) to replace `_split_tokens` if you'd like.

backend/tests/apps/common/search/query_parser_test.py (2)
Lines 165-167: Incorrect expected shape for quoted free text.

`'"field: true"'` becomes a free-text token; the parser emits `{"type": "string", "field": "query", "string": "field: true"}`, not a boolean. Adjust the expectation.

Apply this diff:

```diff
- ('"field: true"', {"field": "query", "type": "string", "boolean": True}),
+ ('"field: true"', {"field": "query", "type": "string", "string": "field: true"}),
```
Lines 27-33: Avoid asserting on pyparsing's internal error text.

"Expected end of text" is parser-specific and brittle. Prefer asserting the raised type and/or your error_type.

Example:

```diff
- with pytest.raises(Exception, match="Expected end of text"):
+ with pytest.raises(QueryParserError) as ei:
      QueryParser(allowed_fields={"field-with-dash": "string"})
+ assert ei.value.error_type in {"FIELD_NAME_ERROR", "CONFIGURATION_ERROR"}
```
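On the escaped-quote tokenization nitpick above: pyparsing's `QuotedString` accepts an escape character, so a tokenizer built from it splits such input correctly. A minimal hedged sketch follows; it is not the module's `_split_tokens`, just an illustration of the building block.

```python
# Hedged sketch of an escape-aware tokenizer; not the module's
# _split_tokens, just an illustration of pyparsing's QuotedString.
import pyparsing as pp

# Keep the raw quoted text (quotes and escapes intact) so a later
# parsing stage sees exactly what the user typed.
quoted = pp.QuotedString('"', esc_char="\\", unquote_results=False)
bare = pp.Word(pp.printables, exclude_chars='"')
token = pp.Combine(pp.OneOrMore(quoted | bare))

query = r'project:"OWASP \"ZAP\" Guide" stars:>10'
tokens = pp.OneOrMore(token).parse_string(query, parse_all=True).as_list()
# A plain regex/whitespace split would cut the first token at the
# escaped quotes; here it survives as one token.
```

`Combine` glues the adjacent `project:` prefix and the quoted value into a single token while still letting whitespace separate consecutive tokens.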
🔇 Additional comments (3)
backend/pyproject.toml (1)
Line 50: CI environment mismatch and import failure: manual verification needed.

The quick import test on your runner reported Python 3.11.2 and failed to find the `pyparsing` module, so we can't confirm compatibility with Python 3.13 yet. Please verify in your CI:

- That the test matrix actually installs and uses Python 3.13 (e.g., via a `python3.13` job).
- That `pyparsing==3.2.3` installs without requiring extra wheels (run `pip install pyparsing==3.2.3`).
- That a simple import check passes under Python 3.13:

```bash
#!/bin/bash
set -e
python3.13 --version
python3.13 - <<'PY'
import sys
print("Python:", sys.version)
import pyparsing as pp
print("pyparsing:", pp.__version__)
PY
```

backend/apps/common/search/query_parser.py (1)
Lines 451-456: Field-name normalization is always case-insensitive; confirm intended behavior.

Even with `case_sensitive=True`, field names are lowercased before lookup, making field matching case-insensitive by design. If the PR intent was "case sensitivity for values only," this is fine; if field names should honor case in strict mode, we'll need a change.

Would you like field-name case sensitivity to be enforced when `case_sensitive=True` and/or `strict=True`? I can adjust the lookup accordingly.

backend/tests/apps/common/search/query_parser_test.py (1)
Lines 180-188: Numeric default-operator assertion matches intended behavior.

This test verifies that `stars:<val>` defaults to the "=" operator and that the numeric value is normalized to an integer representation, which aligns with the intended parser behavior.

Please confirm locally, once the parser fix is merged, that this test passes in your environment by running:

```shell
python3 -m pytest backend/tests/apps/common/search/query_parser_test.py::TestQueryParser::test_numeric_values -q
```
The bots have some suggestions to address -- please take a look.
Actionable comments posted: 0
♻️ Duplicate comments (4)
backend/apps/common/search/query_parser.py (4)
Lines 99-111: Fix grammatical error in docstring.

Line 109 contains a typo that was previously flagged in review comments; apply the previously suggested wording fix ("Only supports implicit AND logic between conditions").
Lines 262-283: Update docstring to match actual behavior.

The docstring on line 270 still contains the previously flagged inconsistency about the return value when the operator is None. Update it to match the actual behavior:

```diff
- Tuple of (operator, clean_value), if operator is none then ("=", clean_value)
+ Tuple of (operator, clean_value), defaulting to ("=", clean_value) if no operator
```
Lines 308-333: Consider returning actual boolean values instead of strings.

Based on previous review comments, this method returns string representations ("True"/"False") instead of actual boolean values, which could be confusing for API consumers. Consider returning actual booleans:

```diff
 def _parse_boolean_value(field: str, value: str) -> bool:
-    """Parse boolean value from string and return its string representation.
+    """Parse boolean value from string.

     Returns:
-        String representation of boolean value ("True" or "False")
+        Boolean value
     """
     if value.lower() in QueryParser._BOOLEAN_TRUE_VALUES:
-        return "True"
+        return True
     if value.lower() in QueryParser._BOOLEAN_FALSE_VALUES:
-        return "False"
+        return False
```
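A quick standalone illustration (not the module's code) of why string booleans are easy to misuse downstream: any non-empty string is truthy in Python, so a value of `"False"` still passes an `if` check, and it serializes to JSON as a string rather than a boolean.

```python
import json

# Shape of a condition under the current string-boolean behavior.
condition = {"type": "boolean", "field": "archived", "boolean": "False"}
assert bool(condition["boolean"]) is True  # "False" is a non-empty string: truthy

# A real boolean behaves as consumers expect and serializes as JSON false.
typed = {"type": "boolean", "field": "archived", "boolean": False}
assert bool(typed["boolean"]) is False
assert json.dumps(typed["boolean"]) == "false"
assert json.dumps(condition["boolean"]) == '"False"'
```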
Lines 450-489: Potential issue with case-sensitive handling affecting field values.

The current implementation converts the entire query to lowercase when `case_sensitive` is False, which affects both field names and field values. This was flagged in previous reviews as potentially undesirable for string searches where case preservation might be important. Consider only normalizing field names while preserving field values:

```diff
 conditions = []
 query = query.strip()
-if not self.case_sensitive:
-    query = query.lower()
 tokens = self._split_tokens(query)

 for token in tokens:
     field, raw_value = self._parse_token(token)
     if field is None:
         conditions.append(self._create_text_search_condition(raw_value))
         continue
-    field = field.lower()  # field name normalization
+    # Normalize field name based on case sensitivity setting
+    if not self.case_sensitive:
+        field = field.lower()
```
🧹 Nitpick comments (1)
backend/apps/common/search/query_parser.py (1)
Lines 387-420: Update to handle the boolean return-type change.

If `_parse_boolean_value` is changed to return actual booleans, this method will need to be updated accordingly:

```diff
- condition_dict["boolean"] = self._parse_boolean_value(field, raw_value)
+ condition_dict["boolean"] = str(self._parse_boolean_value(field, raw_value))
```
📒 Files selected for processing (2)
backend/apps/common/search/query_parser.py
(1 hunks)backend/tests/apps/common/search/query_parser_test.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/tests/apps/common/search/query_parser_test.py
🔇 Additional comments (17)
backend/apps/common/search/query_parser.py (17)
Lines 1-2: LGTM! The module docstring clearly describes the purpose of this query parser module.

Lines 5-17: LGTM! All necessary pyparsing imports are included, and the import organization is clean and logical.

Lines 20-27: LGTM! The FieldType enum properly extends both str and Enum, which provides string compatibility while maintaining type safety. The supported field types (STRING, NUMBER, DATE, BOOLEAN) align with the PR objectives.

Lines 29-49: LGTM! This utility function provides a clean way to check enum membership without relying on private attributes. The exception handling approach is appropriate and follows best practices.

Lines 51-97: LGTM! The QueryParserError class provides comprehensive error context with structured information for API responses. The implementation includes proper error categorization and contextual information that will be helpful for debugging and user feedback.

Lines 112-124: LGTM! The pyparsing grammar definitions are well-structured and comprehensive. The use of class-level constants provides good organization and reusability. The boolean value sets cover common representations appropriately.

Lines 125-134: LGTM! The field name validation logic is sound, combining pyparsing validation with a lowercase check. This ensures field names follow the expected alphanumeric + underscore pattern and maintains consistency with the lowercase requirement.

Lines 135-169: LGTM! The constructor provides good initialization with comprehensive error handling. The parameter validation and exception wrapping ensure clear error messages for configuration issues.

Lines 170-179: LGTM! The static helper methods for raising specific error types provide clean separation of concerns and consistent error message formatting.

Lines 180-210: LGTM! The field validation logic correctly uses the `is_in_enum` helper function, which addresses the previous review concern about using private enum attributes. The validation is thorough and includes both field names and field types.

Lines 211-244: LGTM! The tokenization logic properly handles quoted strings and various token types using regular expressions. The pyparsing integration provides robust parsing with appropriate error handling.

Lines 245-261: LGTM! The token parsing method correctly extracts field:value pairs and gracefully handles non-matching tokens by returning None for the field name.

Lines 284-307: LGTM! The string parsing implementation correctly handles both quoted and unquoted strings as mentioned in the past review comments. The approach preserves quotes for quoted strings while validating unquoted values against the restricted character set.

Lines 334-361: LGTM! The numeric value parsing correctly handles comparison operators and converts string values to integers. The error handling provides appropriate context for debugging.

Lines 362-386: LGTM! The date parsing implementation validates the YYYY-MM-DD format and handles comparison operators appropriately. The error handling catches various exception types that could occur during parsing.

Lines 421-432: LGTM! The text search condition creation is straightforward and properly maps unmatched tokens to the default field.

Lines 433-449: LGTM! The unknown field handling respects the strict mode setting, raising errors only when strict validation is enabled.
Actionable comments posted: 0
🧹 Nitpick comments (7)
backend/tests/apps/common/search/query_parser_test.py (7)
Lines 26-28: Reduce coupling to private internals (`_BOOLEAN_*` and `_COMPARISON_OPERATORS`).

Tests should not depend on private attributes; that makes them brittle if internals are renamed or refactored. Hardcode the public contract here instead.

Apply this diff:

```diff
- self.boolean_values = self.parser._BOOLEAN_TRUE_VALUES | self.parser._BOOLEAN_FALSE_VALUES
- self.true_boolean_values = self.parser._BOOLEAN_TRUE_VALUES
- self.comparison_operators = self.parser._COMPARISON_OPERATORS
+ # Avoid coupling to private attributes; freeze the supported sets locally.
+ self.boolean_values = {"true", "1", "yes", "on", "false", "0", "no", "off"}
+ self.true_boolean_values = {"true", "1", "yes", "on"}
+ self.comparison_operators = ["<", "<=", ">", ">=", "="]
```
Lines 106-117: Rename for clarity and assert the error type in strict mode.

These are "unknown field" cases, not malformed tokens. Also assert `UNKNOWN_FIELD_ERROR` in strict mode to lock in the API contract.

Apply this diff:

```diff
- malformed_queries = ['not_field:"operation"', "not_field:operation"]
+ unknown_field_queries = ['not_field:"operation"', "not_field:operation"]

- @pytest.mark.parametrize("query", malformed_queries)
- def test_malformed_queries(self, query):
+ @pytest.mark.parametrize("query", unknown_field_queries)
+ def test_unknown_field_non_strict_drops_token(self, query):
      result = self.parser.parse(query)
      assert not result

- @pytest.mark.parametrize("query", malformed_queries)
- def test_invalid_queries(self, query):
-     with pytest.raises(QueryParserError):
-         self.strict_parser.parse(query)
+ @pytest.mark.parametrize("query", unknown_field_queries)
+ def test_unknown_field_strict_raises(self, query):
+     with pytest.raises(QueryParserError) as ei:
+         self.strict_parser.parse(query)
+     assert ei.value.error_type == "UNKNOWN_FIELD_ERROR"
```
Lines 118-124: Nit: also assert the default operator for valid dates.

This locks in the implicit "=" default for dates in the public contract.

Apply this diff:

```diff
 for date_str in valid_dates:
     result = self.parser.parse(f"created:{date_str}")
     assert len(result) == 1
     assert result[0]["date"] == date_str
+    assert result[0]["op"] == "="
```
Lines 125-129: Strengthen strict-mode expectations for invalid dates.

Also assert the specific error type to avoid regressions in the error taxonomy.

Apply this diff:

```diff
 for inv_date_str in invalid_dates:
-    with pytest.raises(QueryParserError, match="Invalid date value"):
-        self.strict_parser.parse(f"created:{inv_date_str}")
+    with pytest.raises(QueryParserError, match="Invalid date value") as ei:
+        self.strict_parser.parse(f"created:{inv_date_str}")
+    assert ei.value.error_type == "DATE_VALUE_ERROR"
```
Lines 134-137: Avoid duplicate coverage; add a more meaningful init-failure case.

This duplicates the invalid-type check on line 31. Consider replacing it with invalid default_field validation.

Apply this diff:

```diff
- def test_invalid_initialization(self):
-     with pytest.raises(QueryParserError):
-         QueryParser(allowed_fields={"field": "invalid_type"})
+ def test_invalid_default_field_name(self):
+     # Default field must be lowercase alphanumeric/underscore
+     with pytest.raises(QueryParserError) as ei:
+         QueryParser(allowed_fields={"test": "string"}, default_field="Default")
+     assert ei.value.error_type == "FIELD_NAME_ERROR"
```
Lines 159-165: Add a couple more assertions for completeness.

Assert the parsed field and type as well to guard against accidental regressions.

Apply this diff:

```diff
 result = long_field_parser.parse(f'verylongfieldname:"{long_value}"')
 assert isinstance(result, list)
 assert result[0]["string"] == '"' + long_value + '"'
+assert result[0]["field"] == "verylongfieldname"
+assert result[0]["type"] == "string"
```
Lines 1-174: Optional: add a few targeted tests to harden edge cases.

If helpful, I can open a follow-up patch with these additions:

```python
# Add alongside existing tests
def test_date_comparison_operators(self):
    for op in ["<", "<=", ">", ">=", "="]:
        result = self.parser.parse(f"created:{op}2024-01-01")
        assert len(result) == 1
        assert result[0]["field"] == "created"
        assert result[0]["type"] == "date"
        assert result[0]["op"] == op
        assert result[0]["date"] == "2024-01-01"

def test_invalid_number_value_non_strict_is_dropped(self):
    result = self.parser.parse("stars:abc")
    assert result == []

def test_invalid_number_value_strict_raises(self):
    with pytest.raises(QueryParserError) as ei:
        self.strict_parser.parse("stars:abc")
    assert ei.value.error_type == "NUMBER_VALUE_ERROR"

def test_invalid_boolean_value_non_strict_is_dropped(self):
    result = self.parser.parse("archived:maybe")
    assert result == []

def test_invalid_boolean_value_strict_raises(self):
    with pytest.raises(QueryParserError) as ei:
        self.strict_parser.parse("archived:maybe")
    assert ei.value.error_type == "BOOLEAN_VALUE_ERROR"

def test_escaped_quotes_in_quoted_string(self):
    result = self.parser.parse(r'author:"John \"JD\" Doe"')
    assert len(result) == 1
    assert result[0]["field"] == "author"
    assert result[0]["type"] == "string"
    # Parser preserves quotes and escape sequences for strings
    assert result[0]["string"] == r'"john \"jd\" doe"'
```
📒 Files selected for processing (2)
backend/apps/common/search/query_parser.py
(1 hunks)backend/tests/apps/common/search/query_parser_test.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/apps/common/search/query_parser.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/apps/common/search/query_parser_test.py (1)
- backend/apps/common/search/query_parser.py (3)
  - `QueryParser` (98-525)
  - `QueryParserError` (50-95)
  - `parse` (158-197)
🔇 Additional comments (8)
backend/tests/apps/common/search/query_parser_test.py (8)
Lines 39-55: Parametrized basic string parsing looks solid. Good coverage of quoted-value normalization under case-sensitive vs. insensitive parsers. Assertions target the canonical lowercase field and expected string casing correctly.

Lines 56-61: Verify intended semantics for "case-sensitive field matching." The current implementation lowercases field names during tokenization regardless of the `case_sensitive` flag, so fields are effectively case-insensitive in all modes. The PR objectives mention "optional case-sensitive field matching." If field-case sensitivity is a requirement, tests should assert failures for mismatched case under `case_sensitive=True` (and the parser likely needs an update). Otherwise, consider clarifying in docs/tests that case sensitivity applies only to values/free text, not field keys.

Lines 62-70: Numeric comparison operator coverage is correct and concise. The loop-based assertions exercise all supported operators and validate normalization to an integer string.

Lines 71-79: Boolean normalization checks are accurate. Asserting "True"/"False" based on the canonical true/false sets matches the parser's behavior.

Lines 80-105: Complex/mixed queries test reads well and matches the spec. Nice validation of quoted strings, numeric op/value, booleans, and default-field mapping for free text.

Lines 138-147: Custom default field behavior is validated correctly. Confirms that unknown field tokens are dropped in non-strict mode and free text maps to the configured default field.

Lines 148-158: Boundary cases are sensible and assertions match the spec. Treating empty or non key:value tokens (including quoted colon content) as free text is correct.

Lines 166-174: Numeric value normalization and default operator checks look good. Covers negative, large, and float inputs with truncation semantics and implicit "=".
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/tests/apps/common/search/query_parser_test.py (2)
Lines 155-163: Test boundary conditions comprehensively.

The test covers various malformed inputs but could be more comprehensive. Consider adding edge cases like empty strings, whitespace-only strings, and special characters.

```diff
 @pytest.mark.parametrize(
     ("query"),
-    ["author:", ":123", '"field: true"', "2341"],
+    ["author:", ":123", '"field: true"', "2341", "", " ", "field:"],
 )
 def test_boundary_conditions(self, query):
     result = self.parser.parse(query)
+    if not query.strip():
+        assert not result
+        return
     assert len(result) == 1
     assert result[0]["field"] == "query"
     assert result[0]["type"] == "string"
     assert result[0]["string"] == query
```
Lines 174-181: Confirm the float-to-int truncation logic matches the test.

Verified that in `backend/apps/common/search/query_parser.py`, `_parse_number_value` uses `int(float(clean_value))`, which truncates any decimal portion (e.g. `"3.14159"` → `3`). The test in `backend/tests/apps/common/search/query_parser_test.py` (lines 174-181) asserts against `str(int(float(val)))`, so it accurately reflects the current implementation.

- If only integer semantics are intended, no change is needed.
- If full-precision floats or a different rounding behavior should be supported, please update:
  - `_parse_number_value` in `backend/apps/common/search/query_parser.py`
  - The corresponding assertions in `backend/tests/apps/common/search/query_parser_test.py`
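The truncation semantics discussed above are plain Python behavior: `int(float(x))` drops the fractional part toward zero, which differs from rounding. A quick check:

```python
# int(float(...)) truncates toward zero; round() rounds to nearest.
assert int(float("3.14159")) == 3
assert int(float("-3.9")) == -3      # truncation, not floor
assert round(float("-3.9")) == -4    # rounding would give a different result
assert str(int(float("999999"))) == "999999"
```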
📒 Files selected for processing (1)
backend/tests/apps/common/search/query_parser_test.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/tests/apps/common/search/query_parser_test.py (1)
- backend/apps/common/search/query_parser.py (3)
  - `QueryParser` (98-525)
  - `QueryParserError` (50-95)
  - `parse` (158-197)
🪛 Ruff (0.12.2)
backend/tests/apps/common/search/query_parser_test.py
124-124: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
🔇 Additional comments (5)
backend/tests/apps/common/search/query_parser_test.py (5)
Lines 1-182: Address previous review feedback about nested test functions. Based on the past review comments, there were originally nested test functions inside other test methods that wouldn't be discovered by pytest. This version appears to have addressed those issues by flattening the test structure, which is excellent.

Lines 6-29: Well-structured test class with comprehensive setup. The test class is well-organized, with clear setup of multiple parser configurations (standard, case-sensitive, strict, case-sensitive-strict) and test data. This provides good coverage of different parser behaviors.

Lines 71-78: Robust boolean value testing. The test correctly validates boolean parsing across various representations and properly checks the normalized output format ("True"/"False" strings).

Lines 102-104: Comprehensive complex query testing. The parametrized test cases effectively validate mixed-content scenarios with different field types, quoted strings, and free-text tokens in a single query.

Lines 144-152: Ignore the incorrect test-logic concern. The test correctly reflects the parser's behavior: any token prefixed by an unknown field (here, "query:test") is dropped (via `_handle_unknown_field`) and only true free-text tokens (`free_text`) are mapped to `default_field`. The implementation adds the `default_field` only for tokens without a colon, so the test's expectation is accurate. Likely an incorrect or invalid review comment.
Great work @Adarsh0427
I have suggestions and some questions about the implementation:
```python
        value: String value to parse

    Returns:
        String representation of boolean value ("True" or "False")
```
I see returning str() form of True/False is by design. Could you add more details about this, maybe a comment.
Changed to this: `Returns: String ("True" or "False") based on the boolean input value.`
Is this statement fine?
Well, I expected something that explains the decision to return strings instead of booleans. Basically, why do you return "True" instead of True
here?
```python
def _parse_boolean_value(value: str) -> str:
    """Parse boolean string to standardized string format.

    Converts boolean-like strings to "True" or "False" for consistent
    storage in dictionaries without needing type conversion.

    Args:
        value: Boolean string to parse

    Returns:
        "True" or "False" as string

    Raises:
        QueryParserError: If value is not a valid boolean
    """
```
```
'author:"John Doe" stars:>100 archived:false some "free text"'
...
{"type": "string", "field": "author", "string": '"john doe"'},
{"type": "number", "field": "stars", "op": ">", "number": "100"},
{"type": "boolean", "field": "archived", "boolean": "False"},
{"type": "string", "field": "query", "string": "some"},
{"type": "string", "field": "query", "string": '"free text"'},
```
I think it should return naturally expected type/value pairs, e.g. 100 instead of "100" for number, True/False instead of "True"/"False" for boolean; string stays as is.
Any suggestions why it might be not a good idea?
Do I need to change the code to store type/value pairs? The return type of query_parser would be List[Dict[str, str | int | bool]].
I think it could be unified with type/field/op/value instead of type/field/op/[string/boolean/number].
I got your point, so the structure of result should be:
```
{
    "type": "str",
    "field": "str",
    "op": "str",
    "value": "[str | bool | int]"
}
```
Yes, value may be one of str, bool, int, consistent with type.
I have updated the parser behavior to return value with its actual data type (int, bool, str) and also set an overflow check for integers below 0 or above 1e32.

Defined a dataclass for QueryCondition:
```python
@dataclass
class QueryCondition:
    """Dataclass representing a parsed query condition."""

    type: str
    field: str
    value: str | int | bool
    op: str | None = None

    def to_dict(self) -> dict:
        """Convert the QueryCondition to a dictionary representation."""
        data = asdict(self)
        if data.get("op") is None:
            data.pop("op")
        return data
```
Actionable comments posted: 2
♻️ Duplicate comments (1)
backend/apps/common/search/query_parser.py (1)
171-176
: Fix case-sensitivity: preserve values; only normalize field names when requested.

Lowercasing the entire query (Line 174) and always lowercasing parsed field names (Line 385) breaks case-sensitive value searches and also makes case_sensitive=True ineffective for fields. Normalize only field names when case_sensitive=False, and keep values intact.

```diff
@@
-        query = query.strip()
-        if not self.case_sensitive:
-            query = query.lower()
-        tokens = self._split_tokens(query)
+        query = query.strip()
+        tokens = self._split_tokens(query)
@@
-            field, raw_value = self._parse_token(token)
+            field, raw_value = self._parse_token(token)
             if field is None:
                 conditions.append(self._create_text_search_condition(raw_value))
                 continue
-
-            if field not in self.field_schema:
-                self._handle_unknown_field(field)
-                continue
-
-            field_type = self.field_schema[field]
-            condition = self.create_condition(field, field_type, raw_value)
+            # Normalize field name depending on case sensitivity
+            normalized_field = field if self.case_sensitive else field.lower()
+            if normalized_field not in self.field_schema:
+                self._handle_unknown_field(normalized_field)
+                continue
+            field_type = self.field_schema[normalized_field]
+            condition = self.create_condition(normalized_field, field_type, raw_value)
             if condition:
                 conditions.append(condition)
@@
-        field, value = token.split(":", 1)
-        field = field.strip().lower()
+        field, value = token.split(":", 1)
+        field = field.strip()
```

Also applies to: 384-387
🧹 Nitpick comments (4)
backend/apps/common/search/query_parser.py (4)
116-119
: Avoid passing a set to pyparsing.oneOf; use a deterministic literal list.

Sets are unordered; prefer a stable sequence or a whitespace-delimited string.

```diff
-    _COMPARISON_OPERATORS = {">", "<", ">=", "<=", "="}
-    _COMPARISON_OPERATOR = Optional(oneOf(_COMPARISON_OPERATORS), default="=")
+    _COMPARISON_OPERATORS = (">=", "<=", ">", "<", "=")
+    _COMPARISON_OPERATOR = Optional(oneOf(">= <= > < ="), default="=")
```
503-508
: Do not silently truncate numeric values; reject non-integers instead.
int(float(clean_value)) turns "10.9" into 10. If only integers are supported, fail fast; if decimals are allowed, return them explicitly without truncation.

```diff
-        operator, clean_value = QueryParser._parse_comparison_pattern(value)
-        numeric_value = int(float(clean_value))
-        return operator, str(numeric_value)
+        operator, clean_value = QueryParser._parse_comparison_pattern(value)
+        # Accept only integers to avoid silent truncation of floats
+        numeric_value = int(clean_value)
+        return operator, str(numeric_value)
```

Please confirm desired behavior for inputs like score:>=10.5 or size:1e3; adjust parsing accordingly if decimals/scientific notation must be supported.
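To make the truncation concern concrete, here is the behavior in plain Python, detached from the parser (nothing project-specific is used):

```python
# int(float(...)) accepts decimals and exponents, silently truncating the former:
print(int(float("10.9")))  # 10, the fractional part is dropped without warning
print(int(float("1e3")))   # 1000, scientific notation is accepted

# int(...) alone rejects both forms with ValueError, which fails fast instead:
for raw in ("10.9", "1e3"):
    try:
        int(raw)
    except ValueError:
        print(f"rejected: {raw}")
```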
101-109
: Clarify docstring phrasing about case sensitivity.

Make it explicit that only field-name matching is normalized and values are preserved.

```diff
-    - Handle case sensitivity/insensitivity values
+    - Configurable case sensitivity for field names; values are preserved
```
216-237
: Output shape diverges from PR objective ("value" field vs type-specific keys).

You emit type-specific keys ("string", "number", "date", "boolean") instead of a generic "value" alongside "type". If downstream expects "value", this is a contract mismatch.

Options:

- Keep the current shape but document it clearly in the API contract and tests.
- Or emit "value" with the canonical string representation and keep "type" for disambiguation. We can provide a small adapter if you prefer not to change the parser's internals.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
backend/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
backend/apps/common/search/query_parser.py (1 hunks)
backend/tests/apps/common/search/query_parser_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/tests/apps/common/search/query_parser_test.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T14:42:54.652Z
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
Applied to files:
backend/apps/common/search/query_parser.py
🧬 Code graph analysis (1)
backend/apps/common/search/query_parser.py (1)
backend/apps/ai/agent/tools/rag/rag_tool.py (1)
query (34-63)
🔇 Additional comments (3)
backend/apps/common/search/query_parser.py (3)
420-437
: LGTM: default operator handling is correct and consistent.
Optional(..., default="=") plus normalization in _parse_comparison_pattern achieves the intended default "=". Good cleanup.
119-120
: LGTM: YYYYMMDD support and normalization. Supporting both YYYY-MM-DD and YYYYMMDD with _normalize_date() meets the requirement and keeps output consistent.

Also applies to: 404-417
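For context, the normalization step praised here can be sketched without the project code; the helper name below is hypothetical and only illustrates the YYYYMMDD-to-YYYY-MM-DD mapping, not the actual _normalize_date implementation:

```python
def normalize_date(raw: str) -> str:
    """Normalize a YYYYMMDD or YYYY-MM-DD string to YYYY-MM-DD (illustrative sketch)."""
    digits = raw.replace("-", "")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"unsupported date format: {raw!r}")
    return f"{digits[:4]}-{digits[4:6]}-{digits[6:]}"


print(normalize_date("20241201"))    # both input shapes normalize to the
print(normalize_date("2024-12-01"))  # same "2024-12-01" output
```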
221-237
: LGTM: clean use of structural pattern matching. The match on FieldType is concise and readable.
Updated the field name regex pattern and modified the return logic in the field name validation method.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/apps/common/search/query_parser.py (1)
343-347
: Support escaped quotes inside quoted values.

Current regex breaks on \" (e.g., project:"OWASP \"ZAP\""). Allow escapes.

```diff
-        reg_components = {
-            "key_value": r'\S+:"[^"]*"',
-            "quoted_string": r'"[^"]*"',
-            "unquoted_word": r"\S+",
-        }
+        reg_components = {
+            "key_value": r'\S+:"([^"\\]|\\.)*"',
+            "quoted_string": r'"([^"\\]|\\.)*"',
+            "unquoted_word": r"\S+",
+        }
```
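The escape-aware alternation suggested above can be sanity-checked on its own; the regex is taken from the suggestion, while the sample query is made up:

```python
import re

# key:"..." pairs first, then bare quoted strings, then plain words;
# ([^"\\]|\\.)* lets \" appear inside a quoted segment without ending it.
token_re = re.compile(
    r'\S+:"([^"\\]|\\.)*"'
    r'|"([^"\\]|\\.)*"'
    r"|\S+"
)

query = r'project:"OWASP \"ZAP\"" stars:>100 "free text"'
tokens = [m.group(0) for m in token_re.finditer(query)]
print(tokens)  # the escaped quotes stay inside the first token
```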
🧹 Nitpick comments (3)
backend/apps/common/search/query_parser.py (3)
14-14
: Remove unused import (ruff F401).
alphas is imported but never used.

```diff
-from pyparsing import (
+from pyparsing import (
     Group,
     Optional,
     ParseException,
     QuotedString,
     Regex,
     Word,
     ZeroOrMore,
     alphanums,
-    alphas,
     oneOf,
 )
```
101-109
: Docstring tweak for clarity on case sensitivity.

Clarify that only field names are normalized in case-insensitive mode; values are preserved.

```diff
-    - Handle case sensitivity/insensitivity values
+    - Case sensitivity: normalize field names only when disabled; preserve values
```
116-119
: Make operator grammar deterministic; remove redundant set.

Passing a set to oneOf is unnecessary and order-dependent; define choices inline and drop _COMPARISON_OPERATORS.

```diff
-    _COMPARISON_OPERATORS = {">", "<", ">=", "<=", "="}
-    _COMPARISON_OPERATOR = Optional(oneOf(_COMPARISON_OPERATORS), default="=")
+    _COMPARISON_OPERATOR = Optional(oneOf(">= <= > < ="), default="=")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/apps/common/search/query_parser.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T14:42:54.652Z
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
Applied to files:
backend/apps/common/search/query_parser.py
🧬 Code graph analysis (1)
backend/apps/common/search/query_parser.py (1)
backend/apps/ai/agent/tools/rag/rag_tool.py (1)
query (34-63)
🪛 Ruff (0.12.2)
backend/apps/common/search/query_parser.py
14-14: pyparsing.alphas
imported but unused
Remove unused import: pyparsing.alphas
(F401)
🔇 Additional comments (6)
backend/apps/common/search/query_parser.py (6)
119-119
: Nice: YYYYMMDD support. Adding the compact date format with normalization is aligned with review feedback.
237-243
: Graceful non-strict parsing is correct. Returning None on value errors and skipping tokens in non-strict mode matches the PR objective.
257-273
: Unknown-field handling matches strict mode semantics.
445-465
: String parsing behavior is consistent with quoted/unquoted rules. Preserving quotes for quoted values and validating unquoted tokens via _UNQUOTED_VALUE is clear and robust.
467-487
: Boolean parsing returns strings by design; the docstring clarifies this. The implementation and documentation are consistent.
490-509
: Reject non-integer values or fully support floats in numeric parsing
The current implementation uses int(float(clean_value)), silently truncating decimals (e.g. "1.9" → 1). Either enforce integers only or parse floats end-to-end. Which do you prefer? For integer-only fields, for example, you can add:

```diff
+import re
@@ -506,3 +506,9 @@
         operator, clean_value = QueryParser._parse_comparison_pattern(value)
-        numeric_value = int(float(clean_value))
-        return operator, str(numeric_value)
+        if not re.fullmatch(r"[+-]?\d+", clean_value):
+            raise QueryParserError(
+                message="Invalid numeric value: integers only",
+                error_type="NUMBER_VALUE_ERROR",
+            )
+        return operator, clean_value
```
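In isolation, the re.fullmatch guard proposed above filters exactly the integer-looking inputs (plain re; the sample values are made up):

```python
import re

INT_RE = re.compile(r"[+-]?\d+")

samples = ["100", "-42", "10.9", "1e3", "+7"]
accepted = [s for s in samples if INT_RE.fullmatch(s)]
print(accepted)  # decimals and exponents are rejected, signed integers pass
```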
Actionable comments posted: 2
♻️ Duplicate comments (1)
backend/apps/common/search/query_parser.py (1)
341-355
: Support escaped quotes in tokenization (e.g., project:"OWASP \"ZAP\"").

Current patterns break on escaped quotes. Accept standard backslash escapes in quoted segments.

```diff
-        reg_components = {
-            "key_value": r'\S+:"[^"]*"',
-            "quoted_string": r'"[^"]*"',
-            "unquoted_word": r"\S+",
-        }
+        reg_components = {
+            # Support escaped quotes inside quotes: \"
+            "key_value": r'\S+:"([^"\\]|\\.)*"',
+            "quoted_string": r'"([^"\\]|\\.)*"',
+            "unquoted_word": r"\S+",
+        }
```
🧹 Nitpick comments (1)
backend/apps/common/search/query_parser.py (1)
283-286
: Tiny cleanup: reuse _remove_quotes in empty-string check.

Minor simplification; avoids repeated strip chains.

```diff
-        return token.strip().strip('"').strip() == ""
+        return QueryParser._remove_quotes(token).strip() == ""
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/apps/common/search/query_parser.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T14:42:54.652Z
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
Applied to files:
backend/apps/common/search/query_parser.py
🔇 Additional comments (4)
backend/apps/common/search/query_parser.py (4)
115-117
: Good: default operator is "=" via grammar. Using Optional(oneOf(...), default="=") simplifies downstream logic and fixes "missing '='" cases.
117-117
: Good: YYYYMMDD support added. Accepting both YYYY-MM-DD and YYYYMMDD meets the reviewer's request, and normalizing via _normalize_date keeps output consistent.
235-241
: Non-strict mode resilience is correct. Gracefully skipping invalid tokens (returning None) unless strict=True aligns with the PR's robustness goal.
294-298
: Config validation: message clarity and schema normalization look good. Clear error typing and enum validation via has_enum_value avoid private enum internals; default_field normalization is sensible.
Also applies to: 320-329
hey @Adarsh0427! Could you resolve conflicts on this one? thanks!
Some more comments:
- please revisit _parse_boolean_value return value type; also address/resolve coderabbit's comments (the ready-for-review code should have only human-authored comments or bot comments that you couldn't resolve yourself)
```
        value: String value to parse

    Returns:
        String representation of boolean value ("True" or "False")
```
Well, I expected something that explains the decision to return strings instead of booleans. Basically, why do you return "True" instead of True
here?
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/apps/common/search/query_parser.py (2)
100-108
: Fix case-sensitivity behavior to match docs; stop lowercasing the entire query (mutates values).

The doc says "normalize field names only, preserve original value casing," but parse() lowercases the whole query. This alters string/date values and contradicts the stated behavior.

Apply:

```diff
@@
-        query = query.strip()
-        if not self.case_sensitive:
-            query = query.lower()
+        query = query.strip()
```

Also applies to: 170-174
342-356
: Tokenizer breaks on escaped quotes inside quoted values (e.g., project:"OWASP \"ZAP\"").

Update regex to accept escaped quotes so quoted segments aren't split incorrectly.

```diff
-        reg_components = {
-            "key_value": r'\S+:"[^"]*"',
-            "quoted_string": r'"[^"]*"',
-            "unquoted_word": r"\S+",
-        }
+        reg_components = {
+            # Support escaped quotes within quotes
+            "key_value": r'\S+:"([^"\\]|\\.)*"',
+            "quoted_string": r'"([^"\\]|\\.)*"',
+            "unquoted_word": r"\S+",
+        }
```
🧹 Nitpick comments (4)
backend/apps/common/search/query_parser.py (4)
528-539
: Optionally validate calendar-correct dates (reject 2025-13-40).

The current regex checks format only. Add a lightweight datetime.strptime validation after normalization.

```diff
@@
-        result = QueryParser._DATE_PATTERN.parseString(clean_value, parseAll=True)
-        result[0] = QueryParser._normalize_date(result[0])
-        return operator, result[0]
+        result = QueryParser._DATE_PATTERN.parseString(clean_value, parseAll=True)
+        normalized = QueryParser._normalize_date(result[0])
+        # Calendar validation
+        from datetime import datetime
+        datetime.strptime(normalized, "%Y-%m-%d")
+        return operator, normalized
```
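Why format-only validation is insufficient can be shown with the standard library alone (nothing here touches the parser; the helper name is made up):

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check(text: str) -> tuple[bool, bool]:
    """Return (format_ok, calendar_ok) for a YYYY-MM-DD candidate."""
    format_ok = bool(DATE_RE.fullmatch(text))
    try:
        datetime.strptime(text, "%Y-%m-%d")
        calendar_ok = True
    except ValueError:
        calendar_ok = False
    return format_ok, calendar_ok


print(check("2024-12-01"))  # (True, True): well-formed and a real date
print(check("2025-13-40"))  # (True, False): passes the regex, impossible on a calendar
```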
328-329
: Clarify default_field type override semantics.

You always coerce default_field to STRING, potentially overriding a schema-provided non-string type. Either enforce via an explicit config error or document the override.

```diff
-        validated_fields[default_field] = FieldType.STRING
+        # Ensure default_field is STRING
+        if default_field in validated_fields and validated_fields[default_field] is not FieldType.STRING:
+            raise QueryParserError(
+                message="default_field must be of type 'string'",
+                error_type="CONFIGURATION_ERROR",
+                field=default_field,
+            )
+        validated_fields[default_field] = FieldType.STRING
```

Would you like me to update tests to cover this case?
284-287
: Minor: simplify empty-token check.

Small readability tweak; behavior unchanged.

```diff
-        return token.strip().strip('"').strip() == ""
+        return token.strip('"').strip() == ""
```
209-214
: Docstring: mention None return in non-strict mode.
create_condition() can return None when value parsing fails and strict=False; reflect this in the docstring.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/apps/common/search/query_parser.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T14:42:54.652Z
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
Applied to files:
backend/apps/common/search/query_parser.py
🔇 Additional comments (1)
backend/apps/common/search/query_parser.py (1)
112-116
: Good call on allowing unbalanced quotes for robustness. Including `"` in _UNQUOTED_VALUE together with the fallback path enables graceful handling of unclosed quotes (per retrieved learnings). LGTM.
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/apps/common/search/query_parser.py (1)
171-174
: Preserve value casing; stop lowercasing the entire query (contradicts doc).

Lowercasing mutates values when case_sensitive=False; the doc says "normalize field names only, preserve original value casing." Field names are already normalized in _parse_token.

```diff
-        if not self.case_sensitive:
-            query = query.lower()
+        # Preserve original value casing; field names are normalized during token parsing.
```
🧹 Nitpick comments (5)
backend/apps/common/search/query_parser.py (5)
112-116
: Avoid accepting stray quotes in unquoted string values; split string-vs-comparison grammars.

UNQUOTED_VALUE includes `"`, which allows tokens like language:foo" to pass string parsing. Keep `"` only for comparison parsing (to support unclosed quotes), and use a stricter UNQUOTED_STRING for strings.

```diff
@@
-    QUOTED_VALUE = QuotedString(quoteChar='"', escChar="\\", unquoteResults=False)
-    UNQUOTED_VALUE = Word(alphanums + '+-.<>=/_"')
-    FIELD_VALUE = QUOTED_VALUE | UNQUOTED_VALUE
+    QUOTED_VALUE = QuotedString(quoteChar='"', escChar="\\", unquoteResults=False)
+    # For comparisons (numbers/dates), allow " to support robustness with unclosed quotes.
+    UNQUOTED_VALUE = Word(alphanums + '+-.<>=/_"')
+    # For plain string fields, do not allow stray quotes in unquoted values.
+    UNQUOTED_STRING = Word(alphanums + "+-.<>=/_")
+    FIELD_VALUE = QUOTED_VALUE | UNQUOTED_STRING
@@
-    try:
-        result = QueryParser.FIELD_VALUE.parseString(value, parseAll=True)
-        return result[0]
+    try:
+        result = QueryParser.FIELD_VALUE.parseString(value, parseAll=True)
+        return result[0]
```

Also applies to: 450-466
349-364
: Precompile the token parser to avoid rebuilding the regex on every call.

Minor perf win on the hot path.

```diff
@@ class QueryParser:
+    # Precompiled token parser
+    TOKEN_REGEX = r'\S+:"([^"\\]|\\.)*"|"([^"\\]|\\.)*"|\S+'
+    TOKEN_PARSER = ZeroOrMore(Regex(TOKEN_REGEX))
@@
-        token_regex = "|".join(reg)
-        parser = ZeroOrMore(Regex(token_regex))
-        try:
-            result = parser.parseString(query, parseAll=True)
+        try:
+            result = QueryParser.TOKEN_PARSER.parseString(query, parseAll=True)
             return result.asList()
```
512-516
: Remove redundant quote-stripping in number/date parsers.

_parse_comparison_pattern already removes quotes.

```diff
@@
-        value = QueryParser._remove_quotes(value)
-        operator, clean_value = QueryParser._parse_comparison_pattern(value)
+        operator, clean_value = QueryParser._parse_comparison_pattern(value)
@@
-        value = QueryParser._remove_quotes(value)
-        operator, clean_value = QueryParser._parse_comparison_pattern(value)
+        operator, clean_value = QueryParser._parse_comparison_pattern(value)
```

Also applies to: 539-544
37-44
: Doc nit: "Raises: None" is misleading.

This function never raises; simplify the doc.

```diff
-    Raises:
-        None: This function catches ValueError internally and returns False.
+    Note:
+        Returns False if the value is not a valid enum member; does not raise.
```
284-287
: Tiny simplification.

Same behavior with less work.

```diff
-        return token.strip().strip('"').strip() == ""
+        return token.strip(' "').strip() == ""
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
backend/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
backend/apps/common/search/query_parser.py (1 hunks)
backend/tests/apps/common/search/query_parser_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/tests/apps/common/search/query_parser_test.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T14:42:54.652Z
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
Applied to files:
backend/apps/common/search/query_parser.py
🔇 Additional comments (1)
backend/apps/common/search/query_parser.py (1)
221-236
: No changes required; project targets Python 3.13, so match-case is supported. CI’s setup-python and pyproject.toml both specify Python 3.13.
Understood, I will take care next time.
Actionable comments posted: 0
♻️ Duplicate comments (2)
backend/apps/common/search/query_parser.py (2)
506-531
: Numeric parsing behavior acknowledged. Converting floats/exponents to ints (e.g., 1.9 -> 1, 1e3 -> 1000) is intentional per discussion; OverflowError handling is correct.
179-182
: Preserve value casing; don't lowercase the entire query.

Lowercasing mutates values (e.g., quoted phrases, project names) and contradicts the stated behavior. Normalize field names in _parse_token (already done) and keep values untouched.

```diff
-        if not self.case_sensitive:
-            query = query.lower()
+        # Preserve value casing; field names are normalized in _parse_token
```
🧹 Nitpick comments (6)
backend/apps/common/search/query_parser.py (6)
113-116
: Align docstrings with intended case-sensitivity behavior.

Clarify that field names are always lowercased; values are preserved regardless of the flag (the flag is for downstream matching).

```diff
@@
-    - Case sensitivity: normalize field names only, preserve original value casing when
-      case sensitive is True
+    - Case sensitivity: field names are always normalized to lowercase; values are preserved.
@@
-        case_sensitive: Controls value-case behavior for downstream search,
-            field names are always normalized to lowercase.
+        case_sensitive: Downstream matching hint for values; field names are always
+            normalized to lowercase by this parser.
```

Also applies to: 144-146
258-263
: Keep quotes for free-text tokens to retain phrase semantics.

Dropping quotes loses the signal for phrase searches. Preserve the token as-is.

```diff
-            value=QueryParser._remove_quotes(token),
+            value=token,
```
534-559
: Validate calendar dates, not just format.

Regex accepts impossible dates (e.g., 2025-99-99). In strict mode, raise; otherwise skip.

Add import:

```diff
+from datetime import date as _date
```

Validate after normalization:

```diff
@@
-        result[0] = QueryParser._normalize_date(result[0])
-        return operator, result[0]
+        result[0] = QueryParser._normalize_date(result[0])
+        try:
+            _date.fromisoformat(result[0])
+        except ValueError as e:
+            raise QueryParserError(
+                message=f"Invalid calendar date: {e!s}", error_type="DATE_VALUE_ERROR"
+            ) from e
+        return operator, result[0]
```

Also applies to: 3-5
351-376
: Avoid rebuilding the token parser per call.

Move the token regex/parser to class-level constants to reduce per-parse allocations.

Add class-level constants:

```diff
@@
     DATE_PATTERN = Regex(r"\d{4}-\d{2}-\d{2}") | Regex(r"\d{8}")  # YYYY-MM-DD or YYYYMMDD format
+    TOKEN_REGEX = r'\S+:"([^"\\]|\\.)*"|"([^"\\]|\\.)*"|\S+'
+    TOKEN_PARSER = ZeroOrMore(Regex(TOKEN_REGEX))
```

Use it:

```diff
-        token_regex = "|".join(reg)
-        parser = ZeroOrMore(Regex(token_regex))
+        parser = QueryParser.TOKEN_PARSER
```

Also applies to: 118-126
: Defensive guard for empty tokens.Unlikely, but safe to avoid IndexError on
token[0]
.token = token.strip() + if not token: + return None, token if token[0] == '"': return None, token
248-264
: Minor: unify string parsing path.Consider reusing
FIELD_VALUE.parseString(...)[0]
here for consistency with_parse_string_value
(keeps quotes and escaping consistently). If you prefer the simplervalue=token
, ignore.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
backend/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
backend/apps/common/search/query_parser.py (1 hunks)
backend/tests/apps/common/search/query_parser_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/tests/apps/common/search/query_parser_test.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
📚 Learning: 2025-08-22T14:42:54.652Z
Learnt from: Adarsh0427
PR: OWASP/Nest#2114
File: backend/tests/apps/common/search/query_parser_test.py:120-121
Timestamp: 2025-08-22T14:42:54.652Z
Learning: The query parser in backend/apps/common/search/query_parser.py can gracefully handle unclosed quotes in field values. For example, `created:"2024-12-01` (missing closing quote) is parsed correctly to produce `{'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}`. Test data with unclosed quotes like `'"2023-12-31'` is intentional to verify this robust parsing behavior.
Applied to files:
backend/apps/common/search/query_parser.py
🧬 Code graph analysis (1)
backend/apps/common/search/query_parser.py (1)
backend/apps/ai/agent/tools/rag/rag_tool.py (1)
- `query` (34-63)
🔇 Additional comments (2)
backend/apps/common/search/query_parser.py (2)
361-366: Escaped quotes in quoted segments — good catch.

The updated regex handles `\"` correctly; this fixes broken tokenization for inputs like `project:"OWASP \"ZAP\""`.
118-126: UNQUOTED_VALUE includes `"` by design — acceptable given unclosed-quote handling.

Matches the retrieved learning to accept tokens like `created:"2024-12-01` robustly.
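This tolerance is easy to demonstrate with a plain-`re` sketch of the same token alternation (standalone illustration; the module itself wraps the pattern in pyparsing):

```python
import re

# Same alternation shape as the reviewed tokenizer:
# key:"quoted value" | "quoted value" | bare token
TOKEN = re.compile(r'\S+:"(?:[^"\\]|\\.)*"|"(?:[^"\\]|\\.)*"|\S+')

def split_tokens(query: str) -> list[str]:
    return [m.group(0) for m in TOKEN.finditer(query)]

# The unclosed quote in created:"2024-12-01 falls back to the bare-token branch
print(split_tokens('project:"OWASP ZAP" created:"2024-12-01 stars:>10'))
# → ['project:"OWASP ZAP"', 'created:"2024-12-01', 'stars:>10']
```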
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/apps/common/search/query_parser.py (1)
179-182: Stop lowercasing the entire query (breaks value casing).

Lowercasing the whole query mutates values (e.g., project titles, authors) in case-insensitive mode. Field normalization is already handled in `_parse_token`. Preserve original value casing.

```diff
-        query = query.strip()
-        if not self.case_sensitive:
-            query = query.lower()
+        query = query.strip()
```
🧹 Nitpick comments (5)
backend/apps/common/search/query_parser.py (5)
113-116: Clarify case-sensitivity doc: preserve values; lowercase fields only.

Current wording implies preserving values only when `case_sensitive=True`. Parsing should always preserve values; only field names are normalized.

```diff
-    - Case sensitivity: normalize field names only, preserve original value casing when
-      case sensitive is True
+    - Case sensitivity: normalize field names only; always preserve original value casing.
+      The `case_sensitive` flag affects downstream matching, not parsing here.
```
3-5: Validate calendar dates (e.g., reject 2025-02-30).

Add a real date check after normalization to prevent impossible dates slipping through.

```diff
-from dataclasses import asdict, dataclass
-from enum import Enum
+from dataclasses import asdict, dataclass
+from enum import Enum
+from datetime import date
```

```diff
-        result = QueryParser.DATE_PATTERN.parseString(clean_value, parseAll=True)
-        result[0] = QueryParser._normalize_date(result[0])
-        return operator, result[0]
+        result = QueryParser.DATE_PATTERN.parseString(clean_value, parseAll=True)
+        normalized = QueryParser._normalize_date(result[0])
+        # Calendar validation
+        try:
+            date.fromisoformat(normalized)
+        except ValueError as ve:
+            raise QueryParserError(
+                message=f"Invalid calendar date: {normalized}",
+                error_type="DATE_VALUE_ERROR",
+            ) from ve
+        return operator, normalized
```

Also applies to: 553-555
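The `date.fromisoformat` guard suggested above can be exercised in isolation (minimal sketch, detached from the parser's error types):

```python
from datetime import date

def is_real_calendar_date(value: str) -> bool:
    # Pattern checks alone accept 2025-02-30; date.fromisoformat rejects it
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

print(is_real_calendar_date("2024-12-01"))  # → True
print(is_real_calendar_date("2025-02-30"))  # → False (February has no 30th)
```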
513-516: Document numeric coercion and improve range error.

Make truncation semantics explicit and avoid raising/catching a bare `OverflowError` intentionally.

```diff
-    Returns:
-        Tuple of (operator, numeric_value) where numeric_value is integer
-        and "=" is the default operator.
+    Returns:
+        Tuple of (operator, numeric_value) where numeric_value is an integer.
+        Floats and scientific notation are accepted and truncated via int(float(...)).
+        "=" is the default operator.
```

```diff
-        if (numeric_value > 2**64 or numeric_value < 0):
-            raise OverflowError
+        if numeric_value > 2**64 or numeric_value < 0:
+            raise ValueError("numeric value out of supported range [0, 2**64]")
```

```diff
-    except (ValueError, OverflowError, ParseException) as e:
+    except (ValueError, OverflowError, ParseException) as e:
         raise QueryParserError(
             message=f"Invalid numeric value: {e!s}", error_type="NUMBER_VALUE_ERROR"
         ) from e
```

Also applies to: 525-527, 529-533
258-263
: Confirm intent: quoted strings kept for field values, but stripped for free-text.Field values preserve quotes (e.g., '"C# C++"'), while free-text tokens drop quotes. Is this asymmetry intentional for API consumers?
I can align both to return unquoted values consistently (and add a flag indicating original quoting) if desired.
293-296: Minor simplification for emptiness check.

Equivalent and a bit cleaner.

```diff
-        return token.strip().strip('"').strip() == ""
+        return token.strip().strip('"') == ""
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `backend/apps/common/search/query_parser.py` (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/apps/common/search/query_parser.py
526-526: Abstract `raise` to an inner function (TRY301)
🔇 Additional comments (2)
backend/apps/common/search/query_parser.py (2)
398-406: Field-name normalization looks good.

Always lowercasing field names at parse time is consistent with the schema and avoids surprises. Values are left intact (once the whole-query lowercasing is removed).
361-379: Tokenization handles escaped quotes correctly.

Good use of patterns allowing `\"` inside quotes for both key-value and standalone quoted tokens.
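For reference, the escaped-quote behavior with a plain-`re` stand-in for the same pattern (sketch only; the actual grammar lives in pyparsing):

```python
import re

# key:"quoted, \" allowed" | "quoted" | bare token
TOKEN = re.compile(r'\S+:"(?:[^"\\]|\\.)*"|"(?:[^"\\]|\\.)*"|\S+')

query = r'project:"OWASP \"ZAP\"" language:python'
tokens = [m.group(0) for m in TOKEN.finditer(query)]
# Two tokens; the escaped quotes stay inside the first one
print(len(tokens))  # → 2
print(tokens[1])    # → language:python
```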
You've made good progress with this PR and are very close to merging it. However, you still skip our mandatory steps to ensure code quality and tests. Please run `make check-test` locally. Also look into these when you get a chance:
Hopefully, this will serve as a strong foundation to build upon.
Thank you @Adarsh0427 👍
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/apps/common/search/query_parser.py (1)
172-175: Do not lowercase the entire query (breaks values in case-insensitive mode)

Lowercasing the whole query mutates quoted/unquoted values (e.g., project names). Normalize field names only; keep values intact.

```diff
-        query = query.strip()
-        if not self.case_sensitive:
-            query = query.lower()
+        query = query.strip()
```
🧹 Nitpick comments (4)
backend/apps/common/search/query_parser.py (4)
117-120: Clarify case-sensitivity statement to match intended behavior

Recommend wording that does not imply value lowercasing.

```diff
-    - Case sensitivity: normalize field names only, preserve original value casing when
-      case sensitive is True
+    - Case sensitivity: normalize field names only; preserve original value casing
```
348-390: Tokenizer creation on every call; consider precompiling at class level

Building the Regex/ZeroOrMore parser each time adds overhead. Precompile once as a class attribute.

```diff
 class QueryParser:
@@
+    # Precompiled tokenizer
+    TOKENIZER = ZeroOrMore(
+        Regex('|'.join([
+            r'\S+:"([^"\\]|\\.)*"',
+            r'"([^"\\]|\\.)*"',
+            r'\S+',
+        ]))
+    )
+
     def _split_tokens(query: str) -> list[str]:
@@
-        parser = ZeroOrMore(
-            Regex(
-                "|".join(
-                    [
-                        regex_components["key_value"],
-                        regex_components["quoted_string"],
-                        regex_components["unquoted_word"],
-                    ]
-                )
-            )
-        )
         try:
-            result = parser.parseString(query, parseAll=True)
+            result = QueryParser.TOKENIZER.parseString(query, parseAll=True)
```
472-491: String parsing is strict on unclosed quotes for field values — confirm intent

Free-text tokens tolerate unclosed quotes, but field string values require closed quotes. If you want parity with dates' "unclosed quote tolerance," consider accepting unclosed quoted strings here too.
122-129: Grammar definitions look solid; minor import tidy-up

QUOTED_VALUE with escaping and default "=" via Optional/oneOf are correct. You can collapse the two pyparsing import blocks for cleanliness.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`backend/poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
- `backend/apps/common/search/query_parser.py` (1 hunks)
- `backend/pyproject.toml` (1 hunks)
- `backend/tests/apps/common/search/query_parser_test.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/tests/apps/common/search/query_parser_test.py
- backend/pyproject.toml
🧰 Additional context used
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run frontend e2e tests
- GitHub Check: Run frontend unit tests
- GitHub Check: Run backend tests
🔇 Additional comments (2)
backend/apps/common/search/query_parser.py (2)
409-417: Field-name normalization looks good

Lowercasing field identifiers at parse time aligns with the "fields always lowercase" rule.
493-516: Boolean parsing returns actual booleans — LGTM

Consistent with type semantics and downstream usage.
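A standalone sketch of that boolean normalization over the literal set the PR documents (`true/false`, `yes/no`, `1/0`, `on/off`); helper and set names are illustrative, not the module's:

```python
TRUE_LITERALS = {"true", "yes", "1", "on"}
FALSE_LITERALS = {"false", "no", "0", "off"}

def parse_bool_value(value: str) -> bool:
    # Normalize quoting and case, then map to a real bool; reject anything else
    normalized = value.strip().strip('"').lower()
    if normalized in TRUE_LITERALS:
        return True
    if normalized in FALSE_LITERALS:
        return False
    raise ValueError(f"Invalid boolean value: {value}")

print(parse_bool_value("Yes"))    # → True
print(parse_bool_value('"off"'))  # → False
```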
```python
def _parse_number_value(value: str) -> tuple[str, int]:
    """Parse numeric value with optional comparison operator.

    Args:
        value: String value to parse

    Returns:
        Tuple of (operator, numeric_value) where numeric_value is integer,
        clamped to [0, 2^32] range and "=" is the default operator.

    Raises:
        QueryParserError: If value is not a valid number or int overflow occurs

    """
    try:
        operator, clean_value = QueryParser._parse_comparison_pattern(
            QueryParser._remove_quotes(value)
        )

        numeric_value = int(float(clean_value))
        if numeric_value > 2**32 or numeric_value < 0:
            operator = ">" if numeric_value < 0 else operator
            numeric_value = max(0, min(numeric_value, 2**32))
    except ValueError as e:
        raise QueryParserError(
            message=f"Invalid numeric value: {e!s}", error_type="NUMBER_VALUE_ERROR"
        ) from e
    except OverflowError as e:
        raise QueryParserError(
            message=f"Numeric value overflow: {e!s}", error_type="NUMBER_VALUE_ERROR"
        ) from e

    return operator, numeric_value
```
🛠️ Refactor suggestion
Do not change the comparison operator when clamping numeric values
Flipping the operator for negatives (e.g., stars:<-5 => stars:>0) changes intent and can over-broaden results. Clamp the value but keep the operator unchanged.
```diff
-        numeric_value = int(float(clean_value))
-        if numeric_value > 2**32 or numeric_value < 0:
-            operator = ">" if numeric_value < 0 else operator
-            numeric_value = max(0, min(numeric_value, 2**32))
+        numeric_value = int(float(clean_value))
+        # Clamp to allowed range without altering the operator semantics
+        numeric_value = max(0, min(numeric_value, 2**32))
```
If you prefer, in non-strict mode you could skip unsatisfiable constraints (e.g., "<0") instead of clamping; in strict mode, raise.
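In isolation, the suggested semantics — clamp the value, leave the operator untouched — look like this (hypothetical helper; the real code first routes through `_parse_comparison_pattern`):

```python
MAX_VALUE = 2**32

def clamp_number(operator: str, raw: str) -> tuple[str, int]:
    # Truncate floats/scientific notation, then clamp; never rewrite the operator
    value = int(float(raw))
    return operator, max(0, min(value, MAX_VALUE))

print(clamp_number("<", "-5"))    # → ('<', 0) — operator preserved
print(clamp_number(">", "1e20"))  # → ('>', 4294967296)
```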
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Suggested change:

```python
def _parse_number_value(value: str) -> tuple[str, int]:
    """Parse numeric value with optional comparison operator.

    Args:
        value: String value to parse

    Returns:
        Tuple of (operator, numeric_value) where numeric_value is integer,
        clamped to [0, 2^32] range and "=" is the default operator.

    Raises:
        QueryParserError: If value is not a valid number or int overflow occurs

    """
    try:
        operator, clean_value = QueryParser._parse_comparison_pattern(
            QueryParser._remove_quotes(value)
        )

        numeric_value = int(float(clean_value))
        # Clamp to allowed range without altering the operator semantics
        numeric_value = max(0, min(numeric_value, 2**32))
    except ValueError as e:
        raise QueryParserError(
            message=f"Invalid numeric value: {e!s}", error_type="NUMBER_VALUE_ERROR"
        ) from e
    except OverflowError as e:
        raise QueryParserError(
            message=f"Numeric value overflow: {e!s}", error_type="NUMBER_VALUE_ERROR"
        ) from e

    return operator, numeric_value
```
🤖 Prompt for AI Agents
In backend/apps/common/search/query_parser.py around lines 518 to 551, the code
currently flips the comparison operator when clamping out-of-range numeric
values (e.g., negative values become ">" with 0), which alters query semantics;
instead, clamp numeric_value to the [0, 2**32] range but do NOT modify operator
— preserve the original operator returned by _parse_comparison_pattern; if you
want optional behavior, implement separate handling for strict vs non-strict
modes (in strict mode raise a QueryParserError for unsatisfiable constraints
like "<0", in non-strict mode either skip such filters or leave them clamped)
but the immediate fix is: remove the operator reassignment and only adjust
numeric_value to max(0, min(numeric_value, 2**32)).
* Implement Initial Query Parser for Search Interfaces [#2038]
* fixed suggestions addressed bots
* fixed edge cases for token parsing and enhaced test cases
* fixed suggestions addressed by bot in query_parser_test.py
* addressed the suggestions and fixed them
* Refine field name regex and validation logic: Updated the field name regex pattern and modified the return logic in the field name validation method.
* fixed suggestions addressed by bot
* Fix Docs case-sensitivity behavior explanation
* Resolved pnpm run format issue in some frontend files
* Restored frontend files format to original
* Update code
* Resolved suggestions
* format check
* updated parser to return value with type and enhanced test cases
* Implement overflow check for numeric value parsing
* fixed pre-commit checks and handled try/except more effectively
* Update code

---------

Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Kate Golovanova <[email protected]>
Proposed change
Resolves #2038
Features Implemented
Core Features
- `>`, `<`, `>=`, `<=`, `=` for numeric and date fields.
- Quoted strings: support for multi-word values: `project:"OWASP ZAP"` (all ASCII characters are supported inside quotes).
- Unquoted string: support for single-word values: `language:python`.
- Nested quoted string: support for nested quoted values: `project:"OWASP "ZAP""`.
- Boolean values: `true`, `false`, `yes`, `no`, `1`, `0`, `on`, `off`.
- Date format: YYYY-MM-DD / YYYYMMDD with comparison operators.
- Number: integer values within the range [0, 2^32].

Query Output Structure:
Query Examples
Example 1: Multi-field Query
Example 2: Numeric Comparison
Example 3: Quoted String
Example 4: Date Range Query
Example 5: Free Text Search (`default_field = "query"`)

Checklist
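As a rough end-to-end illustration of the parser's output shape, a simplified stdlib-only sketch (the actual module uses pyparsing; only the date-condition dict below is confirmed by the review excerpts, the other keys are illustrative):

```python
import re

OP_RE = re.compile(r"^(>=|<=|>|<|=)?(.*)$", re.S)

def parse_token(token: str) -> dict:
    """Turn one token into a condition dict (simplified sketch)."""
    if ":" in token and not token.startswith('"'):
        field, _, raw = token.partition(":")
        op, rest = OP_RE.match(raw.strip('"')).groups()
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", rest):
            return {"field": field.lower(), "type": "date", "op": op or "=", "date": rest}
        if re.fullmatch(r"\d+", rest):
            return {"field": field.lower(), "type": "number", "op": op or "=", "number": int(rest)}
        return {"field": field.lower(), "type": "string", "op": "=", "value": rest}
    # Free text falls back to the default field
    return {"field": "query", "type": "string", "op": "=", "value": token.strip('"')}

print(parse_token('created:"2024-12-01"'))
# → {'field': 'created', 'type': 'date', 'op': '=', 'date': '2024-12-01'}
print(parse_token("stars:>100"))
# → {'field': 'stars', 'type': 'number', 'op': '>', 'number': 100}
```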
)Checklist