I want to feed some Big5-UAO encoded data. Since there is no encoding
parameter (or something like that), I tried using ByteStream:
stream = ByteStream(screen)
stream.select_other_charset("@")
stream.feed(bytes_object)
However, after checking the source code, it seems that this setup is equivalent to:
stream = Stream(screen)
stream.feed(bytes_object.decode("latin-1"))
This approach doesn't work because the bytes of a Big5-UAO encoded string may contain control characters such as \x9d, so match_text fails to match the entire string:
Lines 132 to 135 in 676610b
_special = set([ctrl.ESC, ctrl.CSI_C1, ctrl.NUL, ctrl.DEL, ctrl.OSC_C1])
_special.update(basic)
_text_pattern = re.compile(
    "[^" + "".join(map(re.escape, _special)) + "]+")
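A minimal sketch of the failure, using only the standard library. The constant values are assumed to be the standard C0/C1 code points, the `basic` set is left out for brevity, and the payload is a hypothetical two-byte sequence whose lead byte is 0x9d, not a verified Big5-UAO character:

```python
import re

# Assumed values for the control constants (standard C0/C1 code points).
ESC, CSI_C1, NUL, DEL, OSC_C1 = "\x1b", "\x9b", "\x00", "\x7f", "\x9d"

# Same construction as the quoted pattern, minus the `basic` controls.
special = {ESC, CSI_C1, NUL, DEL, OSC_C1}
text_pattern = re.compile("[^" + "".join(map(re.escape, special)) + "]+")

# Hypothetical payload: "A", a double-byte sequence starting with 0x9d, "B".
raw = b"A\x9d\x40B"

# latin-1 maps every byte to the code point with the same number,
# so byte 0x9d becomes U+009D, which the pattern treats as special.
text = raw.decode("latin-1")

matched = text_pattern.match(text).group(0)
print(repr(matched))  # the match stops right before "\x9d"
```

The text run ends at the byte that collides with a C1 control, so the double-byte character is split before it can be decoded.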
Here is a list I generated of Unicode characters whose Big5-UAO encoding contains control characters:
https://gist.github.com/eight04/3de731b7300a6b5036e082f801e2e3e9
How about decoding the bytes into a Unicode string with Big5-UAO before passing it to stream.feed?
We can't. In our use case, we need a special feature called "雙色字" (two-color characters). It colors a double-width character with two different colors. For example:
- Encode "我" into the bytes b'\xa7\xda'
- Insert ANSI escape codes at byte offsets 0 and 1:
b'\x1b[1;31m\xa7\x1b[32m\xda'
- This is what it looks like:
As a result, we can't decode the bytes before the escape code is parsed.
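The two-color construction above can be sketched as follows. Python's built-in "big5" codec is used here as a stand-in, on the assumption that Big5-UAO encodes "我" the same way as plain Big5:

```python
# Stand-in codec: the stdlib has "big5" but no Big5-UAO codec.
encoded = "我".encode("big5")
lead, trail = encoded[:1], encoded[1:]

# Put one SGR sequence before each half of the double-byte character.
colored = b"\x1b[1;31m" + lead + b"\x1b[32m" + trail

# Decoding first is not an option: the escape sequence sits between
# the lead byte and the trail byte, so the stream is not valid Big5.
try:
    colored.decode("big5")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print(colored, decodable)
```

The escape sequences have to be parsed out of the byte stream first. Only then can the remaining lead and trail bytes be joined and decoded.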
Maybe we could add a flag to disable C1 controls in the Stream.feed parser?
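A rough sketch of what such a flag could mean for the text pattern alone. `build_text_pattern` and `use_c1` are hypothetical names, not part of the library, and the constant values are assumed to be the standard C0/C1 code points. A real change would also need to cover the places where the parser dispatches on C1 controls:

```python
import re

# Assumed values for the control constants (standard C0/C1 code points).
ESC, CSI_C1, NUL, DEL, OSC_C1 = "\x1b", "\x9b", "\x00", "\x7f", "\x9d"


def build_text_pattern(use_c1=True):
    """Hypothetical helper: build the plain-text pattern, optionally
    leaving the C1 controls out of the set of special characters."""
    special = {ESC, NUL, DEL}
    if use_c1:
        special |= {CSI_C1, OSC_C1}
    return re.compile("[^" + "".join(map(re.escape, special)) + "]+")


# Hypothetical payload with 0x9d as the lead byte of a double-byte sequence.
text = b"A\x9d\x40B".decode("latin-1")

with_c1 = build_text_pattern(use_c1=True).match(text).group(0)
without_c1 = build_text_pattern(use_c1=False).match(text).group(0)

print(repr(with_c1), repr(without_c1))
```

With the C1 controls excluded, the whole run matches as text, so the original bytes survive and can be decoded after the escape sequences are handled.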