remove 'Content-Encoding' header when returning responses #14

BurnzZ · 2022-05-02T05:37:36Z

Fixes #11. Built on top of #10.

There are many possible approaches to fix this, but this seems to be the simplest one. The only downside is that the response instance returned to Scrapy doesn't carry the complete set of headers, since Content-Encoding is removed. That should be okay, though, since we preserve it in response.zyte_api_response.headers anyway.
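A minimal sketch of the idea, not the actual code in this PR: drop Content-Encoding from the headers handed to Scrapy while keeping an untouched copy. The function name `strip_content_encoding` is hypothetical, and headers are modelled as a plain dict for illustration only.

```python
from typing import Dict, Tuple


def strip_content_encoding(headers: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers_for_scrapy, original_headers).

    Hypothetical helper: header names are compared case-insensitively,
    and the input dict is left unmodified.
    """
    original = dict(headers)  # untouched copy, e.g. for zyte_api_response.headers
    cleaned = {
        name: value
        for name, value in headers.items()
        if name.lower() != "content-encoding"
    }
    return cleaned, original


api_headers = {"Content-Type": "text/html", "Content-Encoding": "gzip"}
for_scrapy, preserved = strip_content_encoding(api_headers)
```

With this split, Scrapy sees no Content-Encoding and so has no reason to attempt decompression, while the preserved copy still reports what the website sent.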

kmike · 2022-05-02T19:16:19Z

The fix makes sense to me, though an argument could be made that it's actually an issue with Zyte API: httpResponseBody is not a raw response; it's returned after the content encoding has already been decoded. So, our options:

1. Keep API as-is, remove header here. Pro: simple; I think that's a correct way to tell Scrapy not to decode. Cons: every client may need to do the same, if they support automatic handling of this header. Not sure if that's a big issue.
2. Keep API as-is, remove header here, document the API behavior properly in docs.zyte.com.
3. Remove the header in API. Pro: headers would match the content. Cons: headers won't be the same as what the website returns.
4. Don't decode responses in API, e.g. don't do gunzip. This is the most "pure" way, as it preserves the most information, but it could lead to some confusion about how to work with httpResponseBody, and to increased complexity for clients.

From these options, I like (2) the most. @akshayphilar, any preference from you?
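To illustrate why the mismatch matters, here is a small stdlib-only sketch of a client that trusts Content-Encoding. The `naive_client_decode` function is a hypothetical stand-in for any client that decodes automatically; it is not Scrapy's actual middleware code. The body is assumed to be already decoded, as described above.

```python
import gzip
from typing import Dict

# Assumption: the body has already been decoded upstream, but the
# headers still claim it is gzip-compressed.
decoded_body = b"<html><body>already decoded</body></html>"
stale_headers = {"Content-Encoding": "gzip"}


def naive_client_decode(body: bytes, headers: Dict[str, str]) -> bytes:
    """Hypothetical client: gunzip whenever the header says gzip."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


try:
    naive_client_decode(decoded_body, stale_headers)
    failed = False
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    failed = True

# Without the header, the body passes through untouched.
passthrough = naive_client_decode(decoded_body, {})
```

Options 1 to 3 all avoid the failing branch by making sure the client never sees a header that contradicts the body; option 4 avoids it by keeping the body compressed so the header stays true.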

akshayphilar · 2022-05-02T20:14:51Z

@kmike agree, the second approach makes sense.

remove 'Content-Encoding' header when returning responses

fb0b412

BurnzZ requested a review from kmike May 2, 2022 05:37

kmike merged commit 052d0d6 into zyte-api-response May 11, 2022

BurnzZ deleted the fix-decompression-error branch May 16, 2022 03:31

BurnzZ mentioned this pull request May 17, 2022

Scrapy errors due to decompression attempt based on Content-Encoding #11

Closed
