How do you specify a different encoding?

I want to feed some Big5-UAO encoded data. Since there is no `encoding` parameter (or something like that), I tried using `ByteStream`:
```py
stream = ByteStream(screen)
stream.select_other_charset("@")
stream.feed(bytes_object)
```
However, after checking the source code, it seems that this setup is equivalent to:
```py
stream = Stream(screen)
stream.feed(bytes_object.decode("latin-1"))
```
This method doesn't work because the bytes of a Big5-UAO encoded string may include values that the parser treats as control characters, such as `\x9d`, so `match_text` fails to match the entire string:
https://github.com/selectel/pyte/blob/676610b43954b644c05823371df6daf87caafdad/pyte/streams.py#L132-L135
Here is a list I generated of Unicode characters whose Big5-UAO encoding contains control characters:
https://gist.github.com/eight04/3de731b7300a6b5036e082f801e2e3e9
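
To make the problem concrete, here is a minimal sketch of a check for the offending bytes. It assumes that pyte's `ctrl.CSI_C1` is `"\x9b"` and `ctrl.OSC_C1` is `"\x9d"`; the helper name `find_c1_bytes` and the sample bytes are hypothetical, and no real Big5-UAO codec is used because Python's standard library does not ship one.

```python
# Assumption: pyte's CSI_C1 is "\x9b" and OSC_C1 is "\x9d".
C1_SPECIAL = {0x9B, 0x9D}


def find_c1_bytes(data: bytes) -> list:
    """Return the offsets of bytes that would be read as C1 controls."""
    return [i for i, b in enumerate(data) if b in C1_SPECIAL]


# Hypothetical two-byte sequence whose lead byte is 0x9d.
sample = b"\x9d\x40"

# latin-1 maps every byte to the code point with the same value,
# so byte 0x9d becomes U+009D, which the text pattern excludes.
decoded = sample.decode("latin-1")
offsets = find_c1_bytes(sample)
```

Any byte string for which `find_c1_bytes` returns a non-empty list would be split at those offsets instead of being matched as plain text.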

#### How about decoding the bytes into a Unicode string with Big5-UAO before passing it to `stream.feed`?

We can't. Our use case needs a special feature called "雙色字" (two-color characters), which colors the two halves of a double-width character with different colors. For example:

* Encode `"我"` into bytes `b'\xa7\xda'`
* Insert an ANSI escape code at byte positions 0 and 1: `b'\x1b[1;31m\xa7\x1b[32m\xda'`
* This is what it looks like: ![https://i.imgur.com/j0hwhZM.png](https://i.imgur.com/j0hwhZM.png)

As a result, we can't decode the bytes before the escape code is parsed.
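
The example above can be reproduced with a short sketch. It uses Python's built-in `big5` codec as a stand-in for Big5-UAO (an assumption; `"我"` encodes to the same two bytes in both), and the helpers `strip_sgr` and `can_decode` are hypothetical names introduced only for illustration.

```python
import re

# Matches SGR (color) sequences such as ESC [ 1 ; 31 m.
SGR = re.compile(rb"\x1b\[[0-9;]*m")


def strip_sgr(data: bytes) -> bytes:
    """Remove SGR escape sequences from a byte string."""
    return SGR.sub(b"", data)


def can_decode(data: bytes, encoding: str = "big5") -> bool:
    """Report whether the bytes decode cleanly with the given codec."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The two bytes of "我" with a color code in front of each byte.
colored = b"\x1b[1;31m\xa7\x1b[32m\xda"

decodable_as_is = can_decode(colored)           # lead byte 0xa7 is followed by ESC
recovered = strip_sgr(colored).decode("big5")   # the bytes rejoin once escapes are gone
```

The character only becomes decodable after the escape sequences have been parsed out, which is why the decoding step has to happen after the parser has seen the raw bytes.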

-------

Maybe we could add a flag to disable C1 controls in the `Stream.feed` parser?


	_special = set([ctrl.ESC, ctrl.CSI_C1, ctrl.NUL, ctrl.DEL, ctrl.OSC_C1])
	_special.update(basic)
	_text_pattern = re.compile(
	"[^" + "".join(map(re.escape, _special)) + "]+")

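A rough sketch of what such a flag could look like, written as a standalone function so it runs without pyte. The `use_c1` parameter and `build_text_pattern` are hypothetical and not part of pyte; the control character values are assumptions that mirror what I believe `pyte.control` defines.

```python
import re

# Assumed values of the pyte control constants.
ESC, NUL, DEL = "\x1b", "\x00", "\x7f"
CSI_C1, OSC_C1 = "\x9b", "\x9d"
# Assumed keys of `basic`: BEL, BS, HT, LF, VT, FF, CR, SO, SI.
BASIC = "\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"


def build_text_pattern(use_c1: bool = True):
    """Build the plain-text pattern, optionally ignoring C1 controls."""
    special = {ESC, NUL, DEL} | set(BASIC)
    if use_c1:
        special |= {CSI_C1, OSC_C1}
    return re.compile("[^" + "".join(map(re.escape, special)) + "]+")


# Hypothetical latin-1 decoded input that starts with byte 0x9d.
text = b"\x9d\x40\xa7\xda".decode("latin-1")

with_c1 = build_text_pattern(use_c1=True).match(text)      # stops at "\x9d"
without_c1 = build_text_pattern(use_c1=False).match(text)  # matches all four characters
```

With C1 controls disabled, the whole run is matched as text, so the screen would receive the four latin-1 characters and the caller could re-encode and decode them afterwards.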