Expose some methods of ruff_python_parser::Lexer
#21074
Conversation
Can you tell me more about your use case?
The lexer is very much tied to our parser and not really intended to be public API.
```diff
 /// This means that the input source should be the complete source code and not the
 /// sliced version.
-pub(crate) fn new(source: &'src str, mode: Mode, start_offset: TextSize) -> Self {
+pub fn new(source: &'src str, mode: Mode, start_offset: TextSize) -> Self {
```
I'd be okay with having a function next to `lex_tokens` that also takes an offset, but I'd rather keep the constructor `pub(crate)`.
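That could look something like the sketch below. This is only an illustration: `lex_tokens_at` is an invented name, and the `Mode`/`Lexer` types and the `new` signature are taken from the diff above, not verified against the crate.

```rust
use ruff_text_size::TextSize;

/// Sketch of the suggested companion function: a public wrapper that keeps
/// the constructor itself crate-private. Assumes `Mode` and `Lexer` are in
/// scope inside the crate, as in the diff above.
pub fn lex_tokens_at(source: &str, mode: Mode, start_offset: TextSize) -> Lexer<'_> {
    Lexer::new(source, mode, start_offset)
}
```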
np. But it does feel a bit redundant, as it would have the same signature and would call `Lexer::new` under the hood, so we'd end up with two identical methods where one is `pub` and the other is `pub(crate)`.
I'm not even sure what to call it 😅
After your explanation here: #21074 (comment), there's no need to have this one public. Will revert.
ofc :) I'm trying to implement a Rust iterator that behaves like CPython's internal, undocumented `_tokenize.TokenizerIter`. On each step it pulls more input from a `readline` callable and yields the next token.

Because the input is consumed lazily, I need to keep track of my current position (offset) in the source. I don't want to re-tokenize everything from the beginning every time a new line is read. For example, imagine the first line is `def foo():`. After processing it, the next line read in is `pass`, so my accumulated source is now:

```python
def foo():
    pass
```

I should be able to get the next token starting from where the lexer previously stopped.

Here's a short Python snippet that illustrates the behavior:

```python
import io
import _tokenize

buf = io.StringIO(
"""
def func():
    pass
-)( ERROR $&-
for i in range(1):
    pass
""")

try:
    for tup in _tokenize.TokenizerIter(buf.readline, extra_tokens=False):
        # (token numeric value, token value, (char_offset_start, line_start), (char_offset_end, line_end), current_line)
        # Token numeric val from: https://github.com/python/cpython/blob/ebf955df7a89ed0c7968f79faec1de49f61ed7cb/Lib/token.py#L7-L79
        print(tup)
except BaseException as err:
    print(f"{err=}")

print(f"{buf.read()=}")  # Remaining buffer that wasn't touched.
```
Thanks for the explanation. I don't think a fresh `Lexer` created with a `start_offset` gets you there: the offset only changes where tokens are reported; it doesn't restore the internal state the lexer had built up. So what you really want is a way to update the underlying source of an existing lexer so it can continue where it left off.
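A hypothetical sketch of that idea, with stand-in types (none of these names exist in ruff; it only illustrates what "update the underlying source" would mean for a stateful lexer):

```rust
// Hypothetical stand-in only. The point is that the lexer keeps its internal
// state (offset, indent stack, nesting) while the not-yet-lexed tail of the
// source grows.
struct Lexer<'src> {
    source: &'src str,
    offset: usize,
    indent_stack: Vec<usize>,
}

impl<'src> Lexer<'src> {
    /// Swap in a longer buffer that begins with the old source, keeping the
    /// current offset and state so lexing can resume mid-stream.
    fn update_source(&mut self, longer: &'src str) {
        debug_assert!(longer.starts_with(self.source));
        self.source = longer;
    }
}
```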
Oh, good to know. So, if I understand it correctly, constructing a new lexer at an offset can never fully work, because the internal state is lost either way. And for this PR, I'd need to make the following adjustments:

- Add a counterpart that also takes an offset next to the existing entry point (`ruff/crates/ruff_python_parser/src/lib.rs`, line 74 in 64ab79e)
- And make this constructor `pub(crate)` again (`ruff/crates/ruff_python_parser/src/lexer.rs`, line 134 in 64ab79e)
It's not clear to me why you need the `current_*` methods over just calling `next_token`? Adjusting the cursor location has the same problem as creating a new lexer: it doesn't account for the internal state.
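(For readers: a minimal stand-in for the pull-style shape being discussed, where `next_token` advances and `current_*` inspects the last token without advancing. The real `Lexer` is internal to `ruff_python_parser` and not reproduced in this thread.)

```rust
// Stand-in lexer over string tokens; only the API shape matters here.
struct Demo<'a> {
    tokens: &'a [&'a str],
    pos: usize,
}

impl<'a> Demo<'a> {
    /// Advance to the next token and return it.
    fn next_token(&mut self) -> Option<&'a str> {
        let tok = self.tokens.get(self.pos).copied();
        self.pos += 1;
        tok
    }

    /// Inspect the most recently returned token without advancing.
    fn current_token(&self) -> Option<&'a str> {
        self.tokens.get(self.pos.checked_sub(1)?).copied()
    }
}
```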
@MichaReiser After your explanation of:

> Adjusting the cursor location has the same problem as creating a new lexer: it doesn't account for the internal state.

it seems like this PR wouldn't solve my problem anyway. tysm for the replies and explanations!
While trying to lex incomplete source code that comes from a lazy iterator or a `BufRead`, I had trouble starting the lexer anywhere other than the beginning of the source. Unfortunately, the existing API (`ruff/crates/ruff_python_parser/src/lexer.rs`, lines 1823 to 1824 in 64ab79e) doesn't let you do that.
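Concretely, the setup looks something like this sketch (all names hypothetical; the lexing call is elided because resuming it is exactly what the current API doesn't support):

```rust
use std::io::BufRead;

// Lines arrive lazily; after each one we'd like to resume lexing from the
// previous stopping offset instead of re-tokenizing the whole buffer.
fn lex_lazily<R: BufRead>(reader: R) -> std::io::Result<()> {
    let mut accumulated = String::new();
    for line in reader.lines() {
        accumulated.push_str(&line?);
        accumulated.push('\n');
        // Desired: continue lexing `accumulated` from the previous offset,
        // carrying over the lexer's internal state.
    }
    Ok(())
}
```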
Feel free to close this PR if this is an unwanted change