@srielau srielau commented Aug 20, 2025

What changes were proposed in this pull request?

We propose to greatly expand the availability of named parameter markers to virtually all places where literals are allowed.
The way to do this is by adding a pre-processing parser whose sole purpose is to find named parameter markers and replace them with literals, without profoundly impacting the grammar or the analyzer.
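The idea can be illustrated with a deliberately simplified, regex-based sketch. This is not the actual SubstituteParamsParser (which is ANTLR-based and must skip markers that appear inside string literals and comments); the object and method names here are hypothetical:

```scala
object ParamMarkerSketch {
  // Toy stand-in for the pre-processing parser: find :name markers
  // and splice SQL literals into the text before the main parse.
  private val Marker = ":([a-zA-Z_][a-zA-Z0-9_]*)".r

  // Render a parameter value as a SQL literal (strings get quoted
  // and embedded single quotes doubled).
  def toSqlLiteral(v: Any): String = v match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Replace every known marker with its literal; unknown markers
  // are left untouched so the main parser can report them.
  def substitute(sql: String, params: Map[String, Any]): String =
    Marker.replaceAllIn(sql, m =>
      params.get(m.group(1))
        .map(v => scala.util.matching.Regex.quoteReplacement(toSqlLiteral(v)))
        .getOrElse(scala.util.matching.Regex.quoteReplacement(m.matched)))
}
```

Under this sketch, `substitute("VALUES (:parm)", Map("parm" -> "hello"))` yields `VALUES ('hello')`, which is the shape of the substitution the PR performs before the grammar ever sees the statement.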

Why are the changes needed?

Users have shown a desire to parameterize not only expressions in queries, but also DDL scripts and utility commands.
Many such uses, such as COMMENT and TBLPROPERTIES, are not processed by Analyzer rules.

Does this PR introduce any user-facing change?

Yes, it expands the usage of named parameter markers.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Yes, it was co-authored by Cursor.

This commit moves the core parameter substitution parser implementation
from the parmsubstitution branch to the preparser branch:

- SubstituteParamsParser: Main parser for substituting parameter markers in SQL
- SubstituteParmsAstBuilder: AST builder for extracting parameter locations

The usage code (the BindParameters rule) remains in the parmsubstitution branch
as requested; only the core parser implementation is moved.
This commit implements parameter substitution as a pre-processing step
in the SparkSqlParser, allowing parameter markers to be substituted
before the main SQL parser executes.

Key changes:
1. Added ParameterContext classes and ThreadLocal management for
   passing parameter values to the parser
2. Modified SparkSqlParser to perform parameter substitution before
   variable substitution and main parsing
3. Updated SparkSession sql methods to use new parameter substitution
   approach via ThreadLocal context

The flow is now:
1. SparkSession sets parameter context in ThreadLocal
2. SparkSqlParser detects and substitutes parameter markers
3. Variable substitution happens on the result
4. Main ANTLR parsing proceeds with fully substituted SQL
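The ThreadLocal hand-off in steps 1–2 above can be sketched as follows. The names mirror the commit message (ThreadLocalParameterContext, NamedParameterContext), but the types and signatures here are illustrative assumptions, not Spark's actual API:

```scala
// Hypothetical sketch of the ThreadLocal parameter hand-off:
// SparkSession sets the context, the parser reads it, and the
// context is always cleared afterwards.
object ThreadLocalParameterContextSketch {
  sealed trait ParameterContext
  case class NamedParameterContext(args: Map[String, Any]) extends ParameterContext
  case class PositionalParameterContext(args: Seq[Any]) extends ParameterContext

  private val ctx = new ThreadLocal[Option[ParameterContext]] {
    override def initialValue(): Option[ParameterContext] = None
  }

  // Run `body` (e.g. a parse) with the context set; remove it in
  // `finally` so a failed parse cannot leak parameters to the next
  // statement on this thread.
  def withContext[T](c: ParameterContext)(body: => T): T = {
    ctx.set(Some(c))
    try body finally ctx.remove()
  }

  // What the parser would call to decide whether substitution is needed.
  def get: Option[ParameterContext] = ctx.get()
}
```

Scoping the context with `withContext` rather than separate set/clear calls is one way to guarantee step 4 never runs against a stale context from an earlier `sql()` call.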

This eliminates the need for ParameterizedQuery LogicalPlan nodes
and moves parameter handling to the parsing phase where it belongs.
- Remove unused Literal import
- Fix trailing whitespace issues
Added fully qualified class names for:
- SubstituteParamsParser
- ThreadLocalParameterContext
- ParameterContext
- SubstitutionRule
- NamedParameterContext
- PositionalParameterContext

This should resolve the compilation issues preventing parameter substitution from working.
This will help us understand why parameter substitution is not working:
- Added debug logging in parse() method to show parameter context
- Added debug logging in substituteParametersIfNeeded() to trace execution
- Will help identify if context is being set or if substitution is failing
Removed trailing whitespace from lines in the parameter substitution integration code.
Future commits will ensure all files are free of trailing whitespace.
- Broke long import statement across multiple lines (imports should be < 100 chars)
- Split long ThreadLocalParameterContext.get() call across lines
- All lines now comply with 100-character limit

This establishes the coding style guideline:
- Maximum line length: 100 characters
- Break long imports and method calls appropriately
- Use proper indentation for continuation lines
Created CODING_STYLE_GUIDELINES.md with rules for:
- Maximum line length: 100 characters
- No trailing whitespace
- Proper import statement formatting
- Method definition formatting
- Pattern matching formatting
- Pre-commit checks and tools
- Examples of common fixes

This establishes consistent standards for all future code changes and provides
clear guidelines for maintaining code quality.
The parameter substitution integration has been verified to work correctly:
- Named parameters are properly detected and substituted
- Parameter values are correctly converted to SQL literals
- Integration flow: SparkSession -> ThreadLocal -> SparkSqlParser -> SubstituteParamsParser

Example working usage:
spark.sql("create or replace view v(c1) AS VALUES (:parm)", Map("parm" -> "hello"))
Successfully substitutes :parm with 'hello' before main parsing.
@srielau srielau changed the title Preparser [WIP] Preparser Aug 20, 2025
@github-actions github-actions bot added the SQL label Aug 20, 2025
@srielau srielau force-pushed the preparser branch 2 times, most recently from 816a26f to e415f4f Compare August 20, 2025 19:05
@@ -0,0 +1,127 @@
# Coding Style Guidelines
Contributor

Is this for llm?

Contributor Author

lol, yes, I'll remove it

| namedParameterMarkerVal
;

namedParameterMarkerVal
Contributor

This seems useless.

Contributor Author

Actually, it is not. Without pushing it down, we would need visits for stringLitWithoutMarker (either all or none).

if (ctx.RECURSIVE() == null) {
operationNotAllowed("Cannot specify MAX RECURSION LEVEL when the CTE is not marked as " +
"RECURSIVE", ctx)
}
Some(nCtx.INTEGER_VALUE().getText().toInt)
Some(nCtx.integerValue().getText().toInt)
Contributor

In the normal parser, shall we fail in integerValue if a parameter marker is found? It should already have been replaced by the pre-parser.
