feat: added 3 Bright Data web scraping tools #4700
Conversation
- Add BrightDataWebScraper: Web scraping with markdown/HTML output
- Add BrightDataSearchEngine: Multi-engine search (Google, Bing, Yandex)
- Add BrightDataStructuredData: 40+ dataset auto-detection and extraction

All tools include:
- Comprehensive error handling
- Configurable timeouts and zones
- FlowiseAI integration patterns
- Debug logging for troubleshooting
This update contains the components by Bright Data
- Fixed YouTube video/comments dataset ID conflict
- Updated regex patterns for Zara, Yahoo Finance, X/Twitter, Booking.com
- Enhanced tool descriptions to include all 40+ supported platforms
- Improved pattern detection order for better matching
- Added comprehensive platform support documentation
…ata/brightdata-Flowise-component into feature/brightdata-tools
Pull Request Overview
Adds two new Bright Data–powered tools and a credential definition to support web scraping and search functionality.
- Introduces `BrightDataWebScraperTool` for page scraping with Bright Data Web Unlocker.
- Implements `BrightDataSearchEngineTool` for paginated search results from Google, Bing, and Yandex.
- Defines `BrightDataApiCredential` for managing Bright Data API tokens.
Reviewed Changes
Copilot reviewed 3 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts | Web scraper tool implementation and node registration |
| packages/components/nodes/tools/BrightData/BrightDataSearchEngine/BrightDataSearchEngine.ts | Search engine tool with pagination and error handling |
| packages/components/credentials/BrightData.credential.ts | Bright Data API credential definition |
Comments suppressed due to low confidence (1)
packages/components/nodes/tools/BrightData/BrightDataWebScraper/BrightDataWebScraper.ts:125
- [nitpick] The class name contains an underscore; rename to 'BrightDataWebScraperTools' to follow PascalCase naming conventions and maintain consistency.
class BrightDataWebScraper_Tools implements INode {
Thanks! Can you remove the redundant folder?
Hi @HenryHengZJ, I've removed the redundant shared folder as requested. Regarding "allow edits for maintainer": this option is not available for PRs from organization forks (brightdata) due to GitHub's policy; GitHub only allows it for personal account forks. If you need to make edits, I'm happy to implement any changes you suggest through the normal review process, or to resubmit from a personal account (but the tool will have to stay under BrightData). Thanks!
Hey @Idanvilenski. I tried using the Bright Data tools in my chatflow and I can't get them to work. I keep getting 400 errors during tool calls. Let me know if I'm doing something wrong or if I should follow certain steps (so we can document them). I'm using the API key from a free Bright Data account.
Hi @0xi4o, thanks for checking out the component, and sorry to hear you're having problems with it. Since I can see that the agent is calling the tools correctly, I think this is an API issue. Please make sure your API key has "Admin permissions" on the Bright Data website (as in the picture), and let me know if that's not the case. I also noticed that the agent is trying to use the search_engine function (which is for SERP searches on Google, Yandex, and Bing) to perform the web_unlocker / structured_data actions (extracting data from a specific website). We will look into that on our end. Please check the permissions issue and let me know if that was the problem. Thanks,
Hey @Idanvilenski. Unfortunately, I'm still running into the same issues. I used an API key with admin permissions. I did test out the tools individually, and got different errors for each one:
Hey @0xi4o, I am sorry about the slow process.
Regarding the Search Engine tool:
Regarding the Structured Data tool:
Regarding the Web Scraper tool:
Here is an example of a more comprehensive use of search + structured data extraction. Note that sometimes it doesn't work, or the correct answer arrives after an error message, because the agent receives the tool's response after it has already answered in the chat (when this happened to me, the agent gave the correct answer a couple of seconds later without an additional prompt). Let me know if everything works!
Hi @0xi4o, did you have a chance to try the tool package following the last comment? You are welcome to send a video of your usage so I can examine it further, because we tried multiple times here and didn't run into any problems (you can send it to my email at [email protected] if it is over the size limit). We are also making a demo that will be released once the package is approved. @HenryHengZJ, I saw that you were also part of the review process. Please let me know how we can speed it up. Thanks,
Hi @HenryHengZJ @0xi4o @jimjimovich @matthias, I would appreciate it if you could address these comments. On our end, we will start promoting the integration once it launches. Thanks,
@Idanvilenski for what it's worth, I just checked this PR, and the tools still don't work for me either. All tools are using the default parameters and an admin API key.
@HenryHengZJ @0xi4o @toi500 @jimjimovich, thank you for the thorough checks. I was not sure what the problem was, so I started from scratch and created a single component that accepts either a query or a URL and retrieves the SERP page or the scraped content (using Bright Data's API). All the user needs to do is paste their API key and scraping zone. This package is simpler, so I am sure it will be easier to check as well. Thanks,
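For reviewers, here is a minimal sketch of how a single component might decide whether its input is a URL to scrape or a query to search. This is not the code from the PR: `ToolInput` and `classifyInput` are hypothetical names, and the rule (an `http(s)://` prefix with no whitespace means URL) is an assumption made for illustration.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Result of classifying the user's input: either a page to scrape or a search query.
type ToolInput =
    | { kind: 'url'; url: string }
    | { kind: 'query'; query: string }

function classifyInput(raw: string): ToolInput {
    const text = raw.trim()
    // Assumption: only inputs that start with http:// or https:// and contain
    // no whitespace are treated as URLs; everything else is a search query.
    if (/^https?:\/\//i.test(text) && !/\s/.test(text)) {
        try {
            // new URL() throws on malformed input, so this also validates the URL
            return { kind: 'url', url: new URL(text).toString() }
        } catch {
            // Malformed URL: fall through and treat the text as a query
        }
    }
    return { kind: 'query', query: text }
}
```

With this rule, `classifyInput('https://example.com/page')` is routed to scraping, while `classifyInput('best laptops 2024')` is routed to search. Making the decision in code rather than leaving it to the agent could avoid the earlier issue of the agent picking the wrong tool.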

Added 3 web scraping tools powered by Bright Data
- Structured Data tool - covers 40+ datasets and auto-selects the right one based on the website in the URL
- Web Unlocker tool - unlocks any website by bypassing blocking
- Search Engine tool - uses Bright Data to search Bing, Google, or Yandex
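As an illustration of the auto-select idea, the sketch below matches a URL against an ordered list of regex patterns and returns the first hit. It is an assumption-laden example, not the PR's code: `DatasetPattern`, `DATASET_PATTERNS`, `detectDataset`, the dataset names, and the patterns themselves are all hypothetical, and no real Bright Data dataset IDs are used.

```typescript
// Hypothetical sketch: names and patterns are placeholders, not Bright Data dataset IDs.
interface DatasetPattern {
    name: string
    pattern: RegExp
}

// Order matters: more specific patterns come first so they win over broader ones.
const DATASET_PATTERNS: DatasetPattern[] = [
    { name: 'x_posts', pattern: /(^|\.)(x|twitter)\.com\/[^/]+\/status\/\d+/ },
    { name: 'x_profiles', pattern: /(^|\.)(x|twitter)\.com\/[^/]+\/?$/ },
    { name: 'booking_hotels', pattern: /(^|\.)booking\.com\/hotel\// }
]

function detectDataset(url: string): string | null {
    let parsed: URL
    try {
        parsed = new URL(url)
    } catch {
        // Not a valid absolute URL, so no dataset can be selected
        return null
    }
    // Match against host + path only, ignoring the scheme and query string
    const target = parsed.hostname + parsed.pathname
    for (const { name, pattern } of DATASET_PATTERNS) {
        if (pattern.test(target)) return name
    }
    return null
}
```

Because the first matching pattern wins, a post URL such as `https://x.com/someuser/status/123` resolves to `x_posts` rather than falling through to a broader pattern. Ordering mistakes of this kind are one way two datasets for the same site can end up conflicting.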
You can find the tools in the Tools section (under the "LangChain" column).

For best results, use them as tools connected to an agent in a chatflow or agentflow.

Thanks