feat: added 3 Bright Data web scraping tools #4700

Idanvilenski · 2025-06-22T10:24:45Z

Added 3 web scraping tools powered by Bright Data:
- Structured Data tool - contains 40+ datasets, auto-selected according to the website in the URL
- Web Unlocker tool - unlocks any website by bypassing blocking
- Search Engine tool - uses Bright Data to search Google, Bing, or Yandex

You can find the tools in the Tools section, under the "LangChain" column.

For best results, use them as tools connected to the agent in a chat-flow or agent-flow.

Thanks

- Add BrightDataWebScraper: Web scraping with markdown/HTML output
- Add BrightDataSearchEngine: Multi-engine search (Google, Bing, Yandex)
- Add BrightDataStructuredData: 40+ dataset auto-detection and extraction

All tools include:
- Comprehensive error handling
- Configurable timeouts and zones
- FlowiseAI integration patterns
- Debug logging for troubleshooting

The update is containing the components by Bright Data

- Fixed YouTube video/comments dataset ID conflict
- Updated regex patterns for Zara, Yahoo Finance, X/Twitter, Booking.com
- Enhanced tool descriptions to include all 40+ supported platforms
- Improved pattern detection order for better matching
- Added comprehensive platform support documentation

…ata/brightdata-Flowise-component into feature/brightdata-tools

Copilot

Pull Request Overview

Adds two new Bright Data–powered tools and a credential definition to support web scraping and search functionality.

Introduces BrightDataWebScraperTool for page scraping with Bright Data Web Unlocker.
Implements BrightDataSearchEngineTool for paginated search results from Google, Bing, and Yandex.
Defines BrightDataApiCredential for managing Bright Data API tokens.

Reviewed Changes

Copilot reviewed 3 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts | Web scraper tool implementation and node registration |
| packages/components/nodes/tools/BrightData/BrightDataSearchEngine/BrightDataSearchEngine.ts | Search engine tool with pagination and error handling |
| packages/components/credentials/BrightData.credential.ts | Bright Data API credential definition |

Comments suppressed due to low confidence (1)

packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts:125

[nitpick] The class name contains an underscore; rename to 'BrightDataWebScraperTools' to follow PascalCase naming conventions and maintain consistency.

class BrightDataWebScraper_Tools implements INode {

packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts

packages/components/nodes/tools/BrightData/BrightDataSearchEngine/BrightDataSearchEngine.ts

…ove unnecessary try/catch

HenryHengZJ · 2025-06-26T14:05:37Z

Thanks! Can you remove the redundant shared folder and allow edits from maintainers?

…ata/brightdata-Flowise-component into feature/brightdata-tools

Idanvilenski · 2025-06-29T08:28:29Z

Hi @HenryHengZJ ,

I've removed the redundant shared folder as requested. Regarding "allow edits for maintainer": this option is not available for PRs from organization forks (brightdata) due to GitHub's policy.

GitHub only allows this feature for forks owned by personal accounts. If you need edits, I'm happy to implement any changes you suggest through the normal review process, or I can resubmit from a personal account (but the tool will still have to be under BrightData).

Thanks!

0xi4o · 2025-07-04T09:35:59Z

Hey @Idanvilenski. I tried using the Bright Data tools in my chatflow and I can't get them to work. I keep getting 400 errors during tool calls. Let me know if I'm doing something wrong or if I should follow certain steps (so we can document them). I'm using the API key from a free Bright Data account.

Idanvilenski · 2025-07-06T12:37:13Z

Hi @0xi4o, thanks for checking out the component.

I'm sorry to see that you are having problems with it. Since I can see that the agent is calling the tools correctly, I think this is an API issue.

Please make sure your API key has "Admin permissions" on the Bright Data website (like in the picture), and let me know if that's not the case.

Also, I noticed that the agent is trying to use the search_engine function (which is for SERP searches on Google, Bing, and Yandex) to perform the web_unlocker / structured_data actions (extracting data from a specific website). We will look into that on our end.

Please look at the permissions issue and let me know if that was the problem.

Thanks,
Idan

0xi4o · 2025-07-09T09:40:28Z

Hey @Idanvilenski. Unfortunately, I'm still running into the same issues. I used an API key with admin permissions.

I did test out the tools individually, and got different errors for each one:

Search Engine:

Structured Data:

Web Scraper:

Idanvilenski · 2025-07-09T13:47:23Z

Hey @0xi4o, I am sorry about the slow process.

Regarding the Search Engine tool:
I just tried it successfully. I suspect it was one of 2 problems:

- Not pressing save before running the flow. I get the same results as you if I don't save the flow before running.
- Missing tool description. In the "Additional Parameters" section, add a description for the tool (like: "use this tool to perform search on any search engine - the result will be a list of URLs") to help the agent know how to call the tool. It works without it, but it is good practice.

This was the result I got for the same prompt:

Regarding the Structured Data tool:
You entered the URL "www.example.com". Note that you need to use a real URL, because we use regex to parse the URL and select the relevant dataset for that request. You can try "https://www.walmart.com/ip/Apple-MacBook-Air-13-3-inch-Laptop-Space-Gray-M1-Chip-Built-for-Apple-Intelligence-8GB-RAM-256GB-storage/609040889?classType=VARIANT&athbdg=L1800" instead.
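
To illustrate why a placeholder like "www.example.com" cannot be routed, here is a minimal sketch of regex-based dataset detection. It is not the code from this PR: the `DATASET_PATTERNS` table, the `detectDataset` function, and the dataset names are hypothetical, and the patterns are simplified assumptions rather than the ones the component ships with.

```typescript
// Hypothetical pattern table. The names are placeholders, not real Bright Data dataset IDs.
interface DatasetPattern {
    name: string
    pattern: RegExp
}

const DATASET_PATTERNS: DatasetPattern[] = [
    // Order matters: the first matching pattern wins, so specific patterns go first.
    { name: 'amazon_product', pattern: /^https?:\/\/(www\.)?amazon\.[a-z.]+\/(.*\/)?dp\/[A-Z0-9]{10}/i },
    { name: 'walmart_product', pattern: /^https?:\/\/(www\.)?walmart\.com\/ip\//i }
]

// Returns the name of the first dataset whose pattern matches the URL,
// or null when no pattern matches (e.g. a placeholder or unsupported site).
function detectDataset(url: string): string | null {
    for (const { name, pattern } of DATASET_PATTERNS) {
        if (pattern.test(url)) return name
    }
    return null
}

console.log(detectDataset('https://www.walmart.com/ip/Some-Product/609040889'))
console.log(detectDataset('www.example.com'))
```

In a scheme like this, a URL that matches no pattern has no dataset to query, so the tool can only return an error. Logging which pattern matched (or that none did) makes detection problems easier to spot.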

Regarding the Web Scraper tool:
I apologize for that. It was a problem we had for a few hours, and it's fixed now.

Here is an example of a more comprehensive use of search + structured data extraction. Note that sometimes it does not work, or the correct answer arrives after an error message, because the agent receives the tool's response after answering in the chat. When this happened to me, the agent gave the correct answer after a couple of seconds, without an additional prompt.
https://github.com/user-attachments/assets/d572bc1b-98e6-4378-b24d-a9b8f5f0a06f

Let me know if everything works!
Thanks,
Idan

0xi4o · 2025-07-11T09:51:26Z

@Idanvilenski

I made sure to save and added "use this tool to perform search on any search engine - the result will be a list of URLs" as the tool description in additional parameters. I'm still getting the same result.

For the structured data tool, Walmart links work fine but not Amazon links.

I added some logs, and it seems the site detection for Amazon is not working correctly.

I'm still getting the same error for web scraper:

Idanvilenski · 2025-07-14T09:10:15Z

Regarding the structured data -
There was a problem only with the Amazon product data. It is fixed now (thank you for pointing this out), so please pull the changes and try again.
I used this system prompt when testing the structured data. It will improve your results:
"
You are a helpful AI assistant.
Your input is a URL
You will insert this URL into your tools and output the response
Important - your response must ALWAYS contain ALL the details you receive from the tool
"

Regarding the search engine -
Note that the search "PlayStation 5 Pro site:amazon" yields no search results:

Searching for "PlayStation 5 Pro" will work better. I also recommend changing the system prompt to display links:
"
You are a helpful AI assistant.
Your input is a search phrase - you will input it into your tool
As your response you will display the full data that was extracted - INCLUDING LINKS
"
The result is:

If the problem persists after changing the system prompt and the prompt, please send me the logs for the Search Engine request so I can understand the problem. On our end we tried it multiple times and did not run into it.

Regarding the web scraper -
Please use the same system prompt as detailed for the structured data.
I tried the same URL and received a correct answer:

You can send me the logs so I can understand the problem.

Also, you will get better results if you change the temperature of your LLM of choice to 0.1 instead of 0.9, since the responses and tool handling will be more accurate.

Thank you,
Idan

Idanvilenski · 2025-07-15T13:21:15Z

Hi @0xi4o,

Did you have a chance to try the tool package following the last comment?

You are welcome to send a video of your usage so I can examine it further, because we tried multiple times here and didn't run into any problem (you can send it to my email at [email protected] if it is over the size limit).

We are also making a demo that will be released once the package is approved.

@HenryHengZJ I saw that you were also involved in the review - please let me know how we can speed up the process.

Thanks,
Idan

Idanvilenski · 2025-07-23T11:04:01Z

Hi @HenryHengZJ @0xi4o @jimjimovich @matthias,

I would appreciate it if you could address these comments. On our end, we will start to promote the integration once it is launched.

Thanks,
Idan

toi500 · 2025-07-23T23:39:12Z

@Idanvilenski for what it's worth, I just checked this PR, and none of the tools work for me either. All tools are using the default parameters and an admin API key.

Idanvilenski · 2025-08-14T15:50:24Z

@HenryHengZJ @0xi4o @toi500 @jimjimovich, thank you for the thorough checks.

I was not sure what the problem was, so I started from scratch and created 1 component that accepts either a query or a URL and retrieves the SERP page / scraped content (using Bright Data's API). All the user needs to do is paste their API key and scraping zone. This package is simpler, so I am sure it will be easier to check as well.

Thanks,
Idan
Please check the new PR #5075
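
As a rough illustration of the "query or URL" idea described in this comment, the sketch below classifies a raw input string before deciding which kind of request to make. It is an assumption about how such a dispatch could look, not the code from PR #5075: the `ToolInput` type and the `classifyInput` function are hypothetical names.

```typescript
// Hypothetical input type: either a page to scrape or a phrase to search for.
type ToolInput = { kind: 'url'; url: string } | { kind: 'query'; query: string }

// Treats the input as a URL only when it is a single token starting with http:// or https://.
// Everything else (including bare hosts like "www.example.com") is treated as a search phrase.
function classifyInput(raw: string): ToolInput {
    const text = raw.trim()
    if (/^https?:\/\/\S+$/i.test(text)) {
        return { kind: 'url', url: text }
    }
    return { kind: 'query', query: text }
}

console.log(classifyInput('https://www.walmart.com/ip/Some-Product/609040889').kind)
console.log(classifyInput('PlayStation 5 Pro').kind)
```

Making this decision in code rather than leaving it to the agent is one way to avoid the earlier situation in this thread, where the agent called the search tool for a scraping task.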

Idanvilenski added 6 commits June 19, 2025 14:02

Merge branch 'main' into feature/brightdata-tools

8f0e649

The update is containing the components by Bright Data

Merge branch 'main' into feature/brightdata-tools

7b0073f

Merge branch 'feature/brightdata-tools' of https://github.com/brightd…

091823b

…ata/brightdata-Flowise-component into feature/brightdata-tools

HenryHengZJ requested a review from Copilot June 24, 2025 10:15

Copilot AI reviewed Jun 24, 2025

Idanvilenski added 3 commits June 24, 2025 13:17

Fix linting issues: remove console statements, fix regex escapes, rem…

8c6c985

…ove unnecessary try/catch

Merge branch 'main' into feature/brightdata-tools

5e9e6e6

Merge branch 'main' into feature/brightdata-tools

32386e0

Idanvilenski added 3 commits June 29, 2025 10:50

Remove redundant shared folder as requested by maintainer

b53f125

Merge branch 'feature/brightdata-tools' of https://github.com/brightd…

f53d769

…ata/brightdata-Flowise-component into feature/brightdata-tools

Merge branch 'main' into feature/brightdata-tools

88c6d64

Merge branch 'main' into feature/brightdata-tools

7a7bb74

Fixed Amazon Data Set issue

80d9ae5

Merge branch 'main' into feature/brightdata-tools

75897f9

Idanvilenski mentioned this pull request Aug 14, 2025

feat: Add BrightData Unlocker tool component #5075

Open

HenryHengZJ closed this Aug 18, 2025

4 participants
