@Idanvilenski

Added three web scraping tools powered by Bright Data:
Structured Data tool - contains 40+ datasets, auto-selected according to the website in the URL
Web Unlocker tool - unlocks any website by bypassing blocking
Search Engine tool - uses Bright Data to search Google, Bing, or Yandex

You can find the tools in the Tools section (under the "LangChain" column).
image (2)

For best results, use them as tools connected to the agent in a chat-flow or agent-flow.
image (1)

Thanks

- Add BrightDataWebScraper: Web scraping with markdown/HTML output
- Add BrightDataSearchEngine: Multi-engine search (Google, Bing, Yandex)
- Add BrightDataStructuredData: 40+ dataset auto-detection and extraction

All tools include:
- Comprehensive error handling
- Configurable timeouts and zones
- FlowiseAI integration patterns
- Debug logging for troubleshooting
This update contains the Bright Data components
- Fixed YouTube video/comments dataset ID conflict
- Updated regex patterns for Zara, Yahoo Finance, X/Twitter, Booking.com
- Enhanced tool descriptions to include all 40+ supported platforms
- Improved pattern detection order for better matching
- Added comprehensive platform support documentation
@HenryHengZJ HenryHengZJ requested a review from Copilot June 24, 2025 10:15
Contributor

Copilot AI left a comment

Pull Request Overview

Adds two new Bright Data–powered tools and a credential definition to support web scraping and search functionality.

  • Introduces BrightDataWebScraperTool for page scraping with Bright Data Web Unlocker.
  • Implements BrightDataSearchEngineTool for paginated search results from Google, Bing, and Yandex.
  • Defines BrightDataApiCredential for managing Bright Data API tokens.

Reviewed Changes

Copilot reviewed 3 out of 10 changed files in this pull request and generated 4 comments.

File Description
packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts Web scraper tool implementation and node registration
packages/components/nodes/tools/BrightData/BrightDataSearchEngine/BrightDataSearchEngine.ts Search engine tool with pagination and error handling
packages/components/credentials/BrightData.credential.ts Bright Data API credential definition
Comments suppressed due to low confidence (1)

packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts:125

  • [nitpick] The class name contains an underscore; rename to 'BrightDataWebScraperTools' to follow PascalCase naming conventions and maintain consistency.
class BrightDataWebScraper_Tools implements INode {

@HenryHengZJ
Contributor

Thanks! Can you remove the redundant shared folder and allow edits by maintainers?

@Idanvilenski
Author

Hi @HenryHengZJ ,

I've removed the redundant shared folder as requested. Regarding "allow edits by maintainers": this option is not available for PRs from organization forks (brightdata) due to GitHub's policy.

GitHub only allows this feature for forks owned by personal accounts. If you need to make edits, I'm happy to implement any changes you suggest through the normal review process, or to resubmit from a personal account (but the tool will have to be under BrightData).

Thanks!

@0xi4o
Contributor

0xi4o commented Jul 4, 2025

Hey @Idanvilenski. I tried using the Brightdata tools in my chatflow and I can't get them to work. I keep getting 400 errors during tool call. Let me know if I'm doing something wrong or if I should follow certain steps (so we can document it). I'm using the API key from a free Brightdata account.

Flowise-Build-AI-Agents-Visually-07-04-2025_03_03_PM
Flowise-Build-AI-Agents-Visually-07-04-2025_03_02_PM
Flowise-Build-AI-Agents-Visually-07-04-2025_03_01_PM
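
One way to narrow down a 400 error like this is to send the same request outside Flowise and see whether the API key and zone are accepted on their own. The sketch below is not taken from this PR. It only builds the request, and it assumes Bright Data's direct request endpoint (`https://api.brightdata.com/request`) with `zone`, `url`, and `format` body fields; check these against the current Bright Data documentation before relying on them. The function name `buildUnlockerRequest` and the zone name in the usage comment are hypothetical.

```typescript
// Hypothetical helper, not part of the PR. Endpoint and body field names are
// assumptions based on Bright Data's public request API; verify before use.
interface UnlockerRequest {
    endpoint: string
    init: { method: 'POST'; headers: Record<string, string>; body: string }
}

function buildUnlockerRequest(apiToken: string, zone: string, targetUrl: string): UnlockerRequest {
    // Fail early on the two values a user has to supply by hand.
    if (!apiToken.trim()) throw new Error('API token is empty')
    if (!zone.trim()) throw new Error('Zone name is empty')
    return {
        endpoint: 'https://api.brightdata.com/request',
        init: {
            method: 'POST',
            headers: {
                Authorization: `Bearer ${apiToken.trim()}`,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({ zone: zone.trim(), url: targetUrl, format: 'raw' })
        }
    }
}

// Usage (performs a real network call, so it is left commented out):
// const { endpoint, init } = buildUnlockerRequest(myToken, 'my_zone', 'https://example.com')
// const res = await fetch(endpoint, init)
// console.log(res.status, await res.text())
```

If the direct call also returns a 400, the problem is more likely in the key, zone, or request body than in the Flowise node.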

@Idanvilenski
Author

Hi @0xi4o, thanks for checking out the component.

I'm sorry to see that you are having problems with it. Since I can see that the agent is calling the tools correctly, I think this is an API issue.

Please make sure your API key has "Admin permissions" on the Bright Data website (as in the picture), and let me know if that's not the case.
image

Also, I noticed that the agent is trying to use the search_engine function (which is used for SERP searches on Google, Yandex, and Bing) to perform the web_unlocker / structured_data actions (extracting data from a specific website). We will look into that on our end.

Please check the permissions and let me know if that was the problem.

Thanks,
Idan

@0xi4o
Contributor

0xi4o commented Jul 9, 2025

Hey @Idanvilenski. Unfortunately, I'm still running into the same issues. I used an API key with admin permissions.

Bright-Data-Web-Data-Platform

I did test out the tools individually, and got different errors for each one:

Search Engine:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_07_PM

Structured Data:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_00_PM

Web Scraper:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_04_PM

@Idanvilenski
Author

Idanvilenski commented Jul 9, 2025

Hey @0xi4o, I am sorry about the slow process.

Regarding the Search Engine tool:
I tried it just now and it worked. I suspect it was one of two problems:

  • Not pressing save before running the flow - I get the same results as you if I don't save the flow before running.
  • In the "Additional Parameters" section, add a description for the tool (for example: "use this tool to perform search on any search engine - the result will be a list of URLs") to help the agent know how to call the tool. It works without one, but it is good practice.
    This was the result I got for the same prompt:
    image

Regarding the Structured Data tool:
You entered the URL "www.example.com". Note that you need to use a real URL, because we use regex to parse the URL and select the relevant dataset for the request. You can try "https://www.walmart.com/ip/Apple-MacBook-Air-13-3-inch-Laptop-Space-Gray-M1-Chip-Built-for-Apple-Intelligence-8GB-RAM-256GB-storage/609040889?classType=VARIANT&athbdg=L1800" instead.
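
To illustrate why a placeholder URL cannot be routed to a dataset, here is a minimal sketch of regex-based dataset selection. It is not the PR's actual code: the dataset names, the patterns, and the `detectDataset` function are all hypothetical. It assumes the first matching rule wins, which is why narrower rules are listed before broader ones.

```typescript
// Hypothetical rule table: dataset names and patterns are illustrative only.
interface DatasetRule {
    dataset: string
    host: RegExp
    path: RegExp
}

// First match wins, so list narrower rules before broader ones.
const RULES: DatasetRule[] = [
    { dataset: 'amazon_product_reviews', host: /(^|\.)amazon\.[a-z.]+$/i, path: /\/product-reviews\//i },
    { dataset: 'amazon_product', host: /(^|\.)amazon\.[a-z.]+$/i, path: /\/(dp|gp\/product)\//i },
    { dataset: 'walmart_product', host: /(^|\.)walmart\.com$/i, path: /^\/ip\//i }
]

// Returns null for input that is not an absolute URL (e.g. a missing scheme).
function parseUrl(input: string): URL | null {
    try {
        return new URL(input)
    } catch {
        return null
    }
}

// Returns the dataset name for a URL, or null when no rule applies.
function detectDataset(input: string): string | null {
    const url = parseUrl(input)
    if (url === null) return null
    const rule = RULES.find((r) => r.host.test(url.hostname) && r.path.test(url.pathname))
    return rule ? rule.dataset : null
}
```

With rules like these, the Walmart product URL above resolves to a dataset, while "www.example.com" resolves to nothing, both because it has no scheme and because no rule matches its host.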

Regarding the Web Scraper tool:
I apologize for that. It was a problem we had for a few hours, and it's fixed now.

Here is an example of a more comprehensive use of search + structured data extraction. Note that sometimes it does not work, or the correct answer arrives after an error message, because the agent receives the tool's response after answering in the chat (when this happened to me, the agent gave the correct answer after a couple of seconds without an additional prompt).
https://github.com/user-attachments/assets/d572bc1b-98e6-4378-b24d-a9b8f5f0a06f

Let me know if everything works!
Thanks,
Idan

@0xi4o
Contributor

0xi4o commented Jul 11, 2025

@Idanvilenski

I made sure to save and added "use this tool to perform search on any search engine - the result will be a list of URLs" as the tool description in additional parameters. I'm still getting the same result.
Flowise-Build-AI-Agents-Visually-07-10-2025_03_34_PM
Flowise-Build-AI-Agents-Visually-07-10-2025_03_10_PM
Flowise-Build-AI-Agents-Visually-07-10-2025_03_09_PM

For the structured data tool, Walmart links work fine but not Amazon links.
Flowise-Build-AI-Agents-Visually-07-10-2025_03_25_PM
Flowise-Build-AI-Agents-Visually-07-10-2025_03_23_PM

So I added some logs, and it seems like the site detection for Amazon is not working correctly.

walmart-link-detection amazon-link-detection

I'm still getting the same error for web scraper:
Flowise-Build-AI-Agents-Visually-07-09-2025_03_04_PM

@Idanvilenski
Author

Regarding the structured data -
There was a problem only with the Amazon product data. It is fixed now (thank you for pointing this out); please pull the changes and try again.
I used this system prompt when testing the structured data tool; it should improve your results:
"
You are a helpful AI assistant.
Your input is a URL
You will insert this URL into your tools and output the response
Important - your response must ALWAYS contain ALL the details you receive from the tool
"

Regarding the search engine -
Note that the search "PlayStation 5 Pro site:amazon" yields no search results:

image

Searching for "PlayStation 5 Pro" will work better. I also recommend changing the system prompt to display links: "You are a helpful AI assistant.
Your input is a search phrase - you will input it into your tool
As your response you will display the full data that was extracted - INCLUDING LINKS". The result is:

image

If the problem persists after you change the system prompt and the prompt, please send me the logs for the Search Engine request so I can understand the problem. On our end we tried it multiple times and did not encounter this problem.

Regarding the web scraper -
Please use the same system prompt as described for the structured data tool.
I tried the same URL and received a correct answer:

image image

You can send me the logs so I can understand the problem.

You will also get better results if you change the temperature of your LLM of choice from 0.9 to 0.1, since the responses and tool handling will be more accurate.

Thank you,
Idan

@Idanvilenski
Author

Idanvilenski commented Jul 15, 2025

Hi @0xi4o,

Did you have a chance to try the tool package following my last comment?

You are welcome to send a video of your usage so I can examine it further, because we tried multiple times here and didn't run into any problems (you can send it to my email at [email protected] if it is over the size limit).

We are also making a demo that will be released once the package is approved.

@HenryHengZJ I saw that you were also involved in the review - please let me know how we can speed up the process.

Thanks,
Idan

@Idanvilenski
Author

Hi @HenryHengZJ @0xi4o @jimjimovich @matthias,

I would appreciate it if you could address these comments. On our end, we will start promoting the integration once it is launched.

Thanks,
Idan

@toi500
Contributor

toi500 commented Jul 23, 2025

@Idanvilenski for what it's worth, I just checked this PR, and the tools still don't work for me either. All tools are using the default parameters and an admin API key.

image image image

@Idanvilenski
Author

@HenryHengZJ @0xi4o @toi500 @jimjimovich, thank you for the thorough checks.

I was not sure what the problem was, so I started from scratch and created a single component that accepts either a query or a URL and retrieves the SERP page or the scraped content (using Bright Data's API). All the user needs to do is paste their API key and scraping zone. This package is simpler, so I am sure it will be easier to check as well.
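
As a rough illustration of a single entry point that accepts either a query or a URL, the sketch below decides which path an input should take. It is not the code from the new PR: the `classifyInput` function, the `ToolInput` type, and the rule "anything starting with http:// or https:// and containing no spaces is a URL" are assumptions made for this example.

```typescript
// Hypothetical input router, not the new component's actual logic.
type ToolInput = { kind: 'url'; url: string } | { kind: 'query'; query: string }

function classifyInput(raw: string): ToolInput {
    const text = raw.trim()
    // Treat the input as a URL only if it has an http(s) scheme and no whitespace.
    if (/^https?:\/\/\S+$/i.test(text)) {
        try {
            // new URL() both validates and normalizes the address.
            return { kind: 'url', url: new URL(text).toString() }
        } catch {
            // Not a valid URL after all: fall through and treat it as a query.
        }
    }
    return { kind: 'query', query: text }
}
```

A caller would then send `url` inputs to the scraping path and `query` inputs to the SERP path, so the agent only has one tool to choose from.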

Thanks,
Idan
Please check the new PR #5075
