Add MaxDocumentLength and custom UserAgent support #13

nightbloos · 2020-07-15T16:00:44Z

Some URLs point to very large pages or files, so we want to be able to abort reading the response body for them.

- added support for og:type
- fixed incorrect handling of relative paths
- made the Content-Length check in HEAD requests "silent"

…o 2 different functions `GetDocument` and `ParseDocument` (#2)

Because we re-create the link from only the scheme, host, and path, there is a risk of losing other data from the original link. Previously, `/some/path.png?param=value` was transformed into `http://mydomain.com/some/path.png`. This should now be fixed, and the output should be `http://mydomain.com/some/path.png?param=value`.
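One way to resolve a relative link without dropping its query string is `net/url`'s `ResolveReference`. This is only an illustrative sketch; the helper name `resolveLink` and the base URL `http://mydomain.com/page` are assumptions, and the PR may implement the fix differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink resolves ref against base. Unlike rebuilding the URL from
// scheme, host, and path alone, this keeps the query string and fragment.
func resolveLink(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	out, err := resolveLink("http://mydomain.com/page", "/some/path.png?param=value")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints "http://mydomain.com/some/path.png?param=value"
}
```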

For some URLs, we could not retrieve the `og:type` data. YouTube links were one example: YouTube keeps its metadata in the body rather than in the head, as most other services do. Previously, the token-processing loop stopped as soon as we had the title, description, and ogImage and had passed the head, so any optional meta tags after the head were never processed. Now we can control how many tokens are processed before the loop breaks (it also breaks early once the required optional fields have been found).
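The stopping rule described above can be sketched roughly as follows. This is a simplified illustration, not the library's code: `metaToken`, `preview`, and `collect` are hypothetical names, and the tokens are pre-parsed stand-ins rather than output from a real HTML tokenizer.

```go
package main

import "fmt"

// metaToken is a hypothetical, pre-parsed stand-in for one HTML meta tag.
type metaToken struct {
	Property string
	Content  string
}

// preview holds the fields the scraper is looking for.
type preview struct {
	Title, Description, Image, Type string
}

// complete reports whether the required fields and the optional og:type
// have all been found.
func (p preview) complete() bool {
	return p.Title != "" && p.Description != "" && p.Image != "" && p.Type != ""
}

// collect walks the tokens in order and stops either when maxTokens tokens
// have been processed or as soon as every field has been found.
func collect(tokens []metaToken, maxTokens int) preview {
	var p preview
	for i, t := range tokens {
		if i >= maxTokens {
			break // token budget exhausted
		}
		switch t.Property {
		case "og:title":
			p.Title = t.Content
		case "og:description":
			p.Description = t.Content
		case "og:image":
			p.Image = t.Content
		case "og:type":
			p.Type = t.Content
		}
		if p.complete() {
			break // nothing left to look for
		}
	}
	return p
}

func main() {
	tokens := []metaToken{
		{"og:title", "Some video"},
		{"og:description", "A description"},
		{"og:image", "http://mydomain.com/thumb.png"},
		{"og:type", "video.other"}, // imagine this one sits in the body
	}
	fmt.Println(collect(tokens, 3).Type == "") // prints "true": budget too small
	fmt.Println(collect(tokens, 10).Type)      // prints "video.other"
}
```

A larger token budget trades extra parsing work for a better chance of finding metadata that appears late in the document.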

Add MaxDocumentLength and custom UserAgent support

dbd38f5

nightbloos force-pushed the master branch from a5112e7 to dbd38f5 Compare July 15, 2020 16:41

Alexandr Filioglo added 5 commits August 7, 2020 13:18

Fixes and improvements (#1)

c5a417a

- added support for og:type
- fixed incorrect handling of relative paths
- made the Content-Length check in HEAD requests "silent"

Added support of response headers & added builder. Split Scrape int…

5044043

…o 2 different functions `GetDocument` and `ParseDocument` (#2)

Added Scrape service interface

afaa757
