How do you specify a different encoding? #118

@eight04

I want to feed some Big5-UAO-encoded data to pyte. Since there is no encoding parameter (or anything similar), I tried using ByteStream:

stream = ByteStream(screen)
stream.select_other_charset("@")
stream.feed(bytes_object)

However, after checking the source code, it seems that this setup is equivalent to:

stream = Stream(screen)
stream.feed(bytes_object.decode("latin-1"))

This doesn't work because a Big5-UAO-encoded string may contain bytes that look like control characters, such as \x9d, so match_text fails to match the entire string:

pyte/pyte/streams.py

Lines 132 to 135 in 676610b

_special = set([ctrl.ESC, ctrl.CSI_C1, ctrl.NUL, ctrl.DEL, ctrl.OSC_C1])
_special.update(basic)
_text_pattern = re.compile(
"[^" + "".join(map(re.escape, _special)) + "]+")

Here is a list I generated of Unicode characters whose Big5-UAO encoding contains control characters:
https://gist.github.com/eight04/3de731b7300a6b5036e082f801e2e3e9
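
To make the failure concrete, here is a minimal sketch that rebuilds the quoted pattern outside of pyte and runs it on latin-1-decoded bytes. The control-character values are copied by hand, the `BASIC` string is an assumed stand-in for the keys of `basic`, and the byte pair `\x8e\x9d` is a hypothetical two-byte character chosen only because its second byte is 0x9d.

```python
import re

# Control characters, copied by hand so the snippet runs without pyte.
ESC, NUL, DEL = "\x1b", "\x00", "\x7f"
CSI_C1, OSC_C1 = "\x9b", "\x9d"
# Assumption: stand-in for the keys of `basic`
# (BEL, BS, HT, LF, VT, FF, CR, SO, SI).
BASIC = "\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"

special = {ESC, CSI_C1, NUL, DEL, OSC_C1} | set(BASIC)
text_pattern = re.compile("[^" + "".join(map(re.escape, special)) + "]+")

# Hypothetical double-byte character whose trail byte is 0x9d,
# surrounded by plain ASCII text.
raw = b"ab\x8e\x9dcd"
decoded = raw.decode("latin-1")

# The match stops right before U+009D, which is treated as OSC_C1.
matched = text_pattern.match(decoded).group(0)
print(repr(matched))
```

The match covers only `'ab\x8e'`, so the second byte of the character is handed to the control-sequence logic instead of being drawn as text.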

Why not decode the bytes from Big5-UAO into a Unicode string before passing them to stream.feed?

We can't. Our use case needs a special feature called "雙色字" (dual-color characters), which renders the two halves of a double-width character in different colors. For example:

  • Encode "我" into the bytes b'\xa7\xda'
  • Insert an ANSI escape code before byte 0 and before byte 1: b'\x1b[1;31m\xa7\x1b[32m\xda'
  • This is what it looks like: https://i.imgur.com/j0hwhZM.png

As a result, we can't decode the bytes before the escape codes are parsed.
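
The ordering constraint can be sketched as follows: strip the escape codes at the byte level, remember which code was active for each byte, and decode only afterwards. This is an illustration, not pyte code. `split_colored_bytes` is a hypothetical helper, it recognizes only SGR sequences of the form `ESC [ ... m`, it records the most recent parameter string rather than the accumulated attribute state, and it uses Python's built-in `big5` codec as a stand-in because Big5-UAO is not in the standard library.

```python
import re

# Matches SGR sequences only, e.g. b"\x1b[1;31m". Big5 trail bytes start
# at 0x40, so ESC (0x1b) should not occur inside a character.
SGR = re.compile(rb"\x1b\[([0-9;]*)m")


def split_colored_bytes(data):
    """Return (byte_value, last_sgr_params) pairs for every text byte."""
    pairs = []
    current = b""
    pos = 0
    for m in SGR.finditer(data):
        # Bytes before this escape code keep the previous parameters.
        pairs.extend((b, current) for b in data[pos:m.start()])
        current = m.group(1)
        pos = m.end()
    pairs.extend((b, current) for b in data[pos:])
    return pairs


encoded = "我".encode("big5")  # b'\xa7\xda'
data = b"\x1b[1;31m" + encoded[:1] + b"\x1b[32m" + encoded[1:]

pairs = split_colored_bytes(data)
text = bytes(b for b, _ in pairs).decode("big5")
colors = [params for _, params in pairs]
print(text, colors)
```

Each half of the character ends up with its own parameter string, and the two bytes are decoded together only after the escape codes have been removed.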


Maybe we could add a flag to disable C1 controls in the Stream.feed parser?
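
As a rough sketch of what such a flag could look like for the text pattern alone: `build_text_pattern` and its `use_c1` parameter are hypothetical names, not part of pyte, and the control-character values are copied by hand. A real change would presumably also have to stop the parser from dispatching on CSI_C1 and OSC_C1.

```python
import re

# ESC, NUL, DEL plus the C0 controls BEL..SI.
C0_SPECIAL = "\x1b\x00\x7f\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
# CSI_C1 and OSC_C1, the two C1 controls in the quoted pattern.
C1_SPECIAL = "\x9b\x9d"


def build_text_pattern(use_c1=True):
    """Hypothetical helper: build the plain-text pattern, optionally
    leaving the C1 controls out of the excluded set."""
    special = C0_SPECIAL + (C1_SPECIAL if use_c1 else "")
    return re.compile("[^" + "".join(map(re.escape, special)) + "]+")


# Hypothetical double-byte character with 0x9d as its trail byte.
decoded = b"ab\x8e\x9dcd".decode("latin-1")

with_c1 = build_text_pattern(use_c1=True).match(decoded).group(0)
without_c1 = build_text_pattern(use_c1=False).match(decoded).group(0)
print(repr(with_c1), repr(without_c1))
```

With the flag off, the whole string is matched as text, so the 0x9d byte would reach the screen and could be re-encoded to latin-1 and decoded as Big5-UAO later.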