Skip to content

Conversation

@BurnzZ
Copy link
Member

@BurnzZ BurnzZ commented May 2, 2022

Fixes #11. Built on top of #10.


Lots of approach to fix it but it would seem that this is the simplest one. The only downside is that the response instance returned to Scrapy doesn't have the accurate representation of the complete headers since Content-Encoding is removed. However, that should be okay since we're preserving it anyways in response.zyte_api_response.headers.

@BurnzZ BurnzZ requested a review from kmike May 2, 2022 05:37
@kmike
Copy link
Member

kmike commented May 2, 2022

The fix makes sense to me, though argument could be made that it's actually an issue with Zyte API: httpResponseBody is not a raw response, it's returned after content-encoding applied. So, our options:

  1. Keep API as-is, remove header here. Pro: simple; I think that's a correct way to tell Scrapy not to decode. Cons: every client may need to do the same, if they support automatic handling of this header. Not sure if that's a big issue.
  2. Keep API as-is, remove header here, documnent the API behavior propely in docs.zyte.com.
  3. Remove the header in API. Pro: headers would match the content. Cons: headers won't be the same as website returns.
  4. Don't decode responses in API, e.g. don't do gunzip. This is the most "pure" way, as it preserves the most information, but it could lead to some confusion on how to work with httpResponseBody, and on increased complexity for clients.

from these options, I like (2) the most. @akshayphilar any preference from you?

@akshayphilar
Copy link

@kmike agree, the second approach makes sense.

@kmike kmike merged commit 052d0d6 into zyte-api-response May 11, 2022
@BurnzZ BurnzZ deleted the fix-decompression-error branch May 16, 2022 03:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants