Skip to content

Conversation

@dylwil3
Copy link
Collaborator

@dylwil3 dylwil3 commented Oct 24, 2025

When formatting clause headers for clauses that are not their own node, like an else clause or finally clause, we begin searching for the keyword at the end of the previous statement. However, if the previous statement ended in a semicolon this caused a panic because we only expected trivia between the end of the last statement and the keyword.

This PR adjusts the starting point of our search for the keyword to begin after the optional semicolon in these cases.

Closes #21065

@dylwil3 dylwil3 requested a review from MichaReiser as a code owner October 24, 2025 17:34
@dylwil3 dylwil3 added bug Something isn't working formatter Related to the formatter labels Oct 24, 2025
Comment on lines 437 to 447
/// Finds the range of `keyword` starting the search at `start_position`. Expects only comments and `(` between
/// Finds the range of `keyword` starting the search at `start_position`. Expects only trivia between
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't related to this PR but this doc-comment looked incorrect to me - there was nothing in the code that allowed an open parentheses, as far as I could tell.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 24, 2025

ruff-ecosystem results

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

}

fn after_optional_semicolon(end_of_statement: TextSize, source: &str) -> TextSize {
let mut tokenizer = SimpleTokenizer::starts_at(end_of_statement, source);
Copy link
Member

@MichaReiser MichaReiser Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to instead add an argument to find_keyword to force every caller to think about whether the body can contain a semicolon or not that may need to be skipped. It also avoids the need to lex the token at end_of_statement twice in the common case, just so that we can skip the semi.

The way I'd do it is to add a new enum SkipSemicolon which also gives us the opportunity to document when a semicolon needs to be skipped.

A maybe simpler API would be to change the start_position of find_keywrod to an enum StartOffset that has two variants

  • ClauseStart(TextSize)
  • LastStatement(TextSize)

The advantage of this API is that it doesn't introduce more arguments and it also allows find_keyword to just do the right thing.

This might even allow us to remove some of the unwrap by adding a StartOffset::end_of_last_statement(&suite) and StartOffset::clause_start(impl Ranged)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I added StartOffset::clause_start(impl Ranged) but not StartOffset::end_of_last_statement(&suite) because it looks like each of the "last statement" examples required some custom logic depending on the statement type. I removed after_optional_semicolon and replaced it with inlined logic that does not re-lex the keyword in the common case.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this. I've a suggestion on how we can make this more error resilient against a future me forgeting to call after_optional_semicolon

@dylwil3 dylwil3 merged commit 116611b into astral-sh:main Oct 27, 2025
37 checks passed
dcreager added a commit that referenced this pull request Oct 27, 2025
* origin/main:
  Respect `--output-format` with `--watch` (#21097)
  [`pyflakes`] Revert to stable behavior if imports for module lie in alternate branches for `F401` (#20878)
  Fix finding keyword range for clause header after statement ending with semicolon (#21067)
  [ty] Fix bug where ty would think all types had an `__mro__` attribute (#20995)
  Restore `indent.py` (#21094)
  [`flake8-django`] Apply `DJ001` to annotated fields (#20907)
  Clearer error message when `line-length` goes beyond threshold (#21072)
  Update upload and download artifacts github actions (#21083)
  Update dependency mdformat-mkdocs to v4.4.2 (#21088)
  Update cargo-bins/cargo-binstall action to v1.15.9 (#21086)
  Update Rust crate clap to v4.5.50 (#21090)
  Update Rust crate get-size2 to v0.7.1 (#21091)
  Update Rust crate bstr to v1.12.1 (#21089)
  Add missing docstring sections to the numpy list (#20931)
  [`pydoclint`] Fix false positive on explicit exception re-raising (`DOC501`, `DOC502`) (#21011)
  [ty] Use constructor parameter types as type context (#21054)
dcreager added a commit that referenced this pull request Oct 28, 2025
…l-constraint-sets

* dcreager/refactor-constraint-mdtests: (60 commits)
  add static_asserts
  move all reveal diagnostics to separate line
  add more gradual tests
  better comment
  restructure a bit
  better names/comments
  simplify before implication
  two typevars!
  mdformat
  rename mdtest
  move is_subtype_of_given into ConstraintSet
  move where we grab these
  add ConstraintSet.{always,never}
  move range_constraint into ConstraintSet class
  [ty] Rename `inner` query for better debugging experience (#21106)
  [ty] Add new "constraint implication" typing relation (#21010)
  [semantic error tests]: refactor semantic error tests to separate files (#20926)
  Respect `--output-format` with `--watch` (#21097)
  [`pyflakes`] Revert to stable behavior if imports for module lie in alternate branches for `F401` (#20878)
  Fix finding keyword range for clause header after statement ending with semicolon (#21067)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working formatter Related to the formatter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Formatter panic when range formatting certain clause headers with semicolons

2 participants