MCP and Tool-Call Support #981
Greptile Summary
This pull request adds Model Context Protocol (MCP) and custom tool support to Stagehand agents, significantly expanding the framework's capabilities beyond browser automation to include third-party service integrations. The changes introduce a flexible architecture where agents can connect to external MCP servers and execute custom tools alongside existing browser automation functions.
The implementation provides three ways to configure integrations: (1) simple string URLs that are internally converted to MCP connections, (2) pre-established connections using the built-in `connectToMCPServer` utility, and (3) custom `Client` objects from the `@modelcontextprotocol/sdk`. The agent configuration now accepts `integrations` and `tools` parameters in the `AgentConfig` interface.
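The three accepted forms can be reduced to a single list of connected clients before any tools are listed. The sketch below shows one way that normalization might look. `McpClientLike`, `Integration`, and `toClients` are hypothetical names, and the injected `connect` callback only stands in for `connectToMCPServer`, whose real signature is not shown in this thread.

```typescript
// Hypothetical minimal shape of a connected MCP client (the SDK's Client has more).
interface McpClientLike {
  listTools(): Promise<{ tools: { name: string }[] }>;
}

// An integration is either a server URL or an already-connected client.
type Integration = string | McpClientLike;

// `connect` is injected so the sketch has no network dependency.
// Assumption: in the PR this role is played by connectToMCPServer.
async function toClients(
  integrations: Integration[],
  connect: (url: string) => Promise<McpClientLike>,
): Promise<McpClientLike[]> {
  return Promise.all(
    integrations.map((item) =>
      // Strings are connected on demand; clients pass through untouched.
      typeof item === "string" ? connect(item) : Promise.resolve(item),
    ),
  );
}
```

Passing the connector in as a parameter keeps the string and client cases on the same code path and makes the function easy to exercise with a fake.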
Core architectural changes include:
- New MCP utilities: `lib/mcp/connection.ts` provides `connectToMCPServer` for establishing connections, while `lib/mcp/utils.ts` implements `resolveTools` to convert MCP tools into AI SDK's ToolSet format
- Agent client updates: Both OpenAI and Anthropic CUA clients now accept and execute custom tools alongside their computer use functionality
- Operator handler refactor: Complete rewrite from schema-based responses to a tool-calling architecture that combines built-in Stagehand operations (act, extract, goto, etc.) with external MCP tools
- Type system enhancements: New error handling with `MCPConnectionError`, JSON Schema to Zod conversion utilities, and updated type definitions
The changes maintain backward compatibility while enabling agents to adapt to complex web environments requiring external API access, search capabilities, database interactions, and other third-party services. Examples demonstrate real-world usage with Exa AI search integration.
Confidence score: 4/5
- This PR introduces complex new functionality but appears well-architected with proper error handling and flexible API design
- Score reflects the significant architectural changes in critical files like the operator handler, though the implementation follows good patterns
- Pay close attention to `lib/handlers/operatorHandler.ts` and the MCP connection utilities for potential integration issues
18 files reviewed, 3 comments
}
case "string": {
  if (schema.enum) {
    return z.string().refine((val) => schema.enum!.includes(val));
style: Using `.refine()` for enum validation instead of `.enum()` - this works but `.enum()` would be more direct for string enums
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@@ -949,6 +940,27 @@ export class Stagehand {

    return {
      execute: async (instructionOrOptions: string | AgentExecuteOptions) => {
        const tools = options?.integrations
          ? await resolveTools(options?.integrations, options?.tools)
          : (options?.tools ?? {});
we should probably gate tools behind `experimental:true` until we roll out api support
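A gate like the one suggested here could be a small check that runs before tools are resolved. The following is only a sketch of the idea: `AgentToolOptions`, `assertToolsAllowed`, and the error message are hypothetical, and it assumes `integrations` would be gated together with `tools`.

```typescript
// Hypothetical, simplified view of the agent options relevant to the gate.
interface AgentToolOptions {
  integrations?: unknown[];
  tools?: Record<string, unknown>;
}

// Throws if integrations or custom tools are requested without the
// experimental flag. Assumption: both options are gated the same way.
function assertToolsAllowed(
  options: AgentToolOptions | undefined,
  experimental: boolean,
): void {
  const wantsTools =
    (options?.integrations?.length ?? 0) > 0 ||
    Object.keys(options?.tools ?? {}).length > 0;
  if (wantsTools && !experimental) {
    throw new Error(
      "MCP integrations and custom tools require experimental: true",
    );
  }
}
```

Agents configured without `integrations` or `tools` pass the check regardless of the flag, so existing callers would be unaffected.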
    clientInstance = client;
  }

  let nextCursor: string | undefined = undefined;
what does cursor stand for? a pointer to iterate through the listed tools?
yeah it's for handling pagination of the tools
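For readers unfamiliar with the pattern, cursor pagination here means requesting the tool list repeatedly and passing back the `nextCursor` from each response until the server stops returning one. The sketch below illustrates that loop against a hypothetical `ToolLister` interface and an in-memory fake. It is not the code from this PR, and the real MCP client types carry more fields.

```typescript
// Minimal structural types for illustration only.
interface ToolInfo {
  name: string;
}

interface ListToolsPage {
  tools: ToolInfo[];
  nextCursor?: string; // absent on the last page
}

// Hypothetical stand-in for the part of an MCP client used here.
interface ToolLister {
  listTools(params?: { cursor?: string }): Promise<ListToolsPage>;
}

// Follow nextCursor until the server stops returning one.
async function listAllTools(client: ToolLister): Promise<ToolInfo[]> {
  const all: ToolInfo[] = [];
  let nextCursor: string | undefined = undefined;
  do {
    const page: ListToolsPage = await client.listTools({ cursor: nextCursor });
    all.push(...page.tools);
    nextCursor = page.nextCursor;
  } while (nextCursor);
  return all;
}

// In-memory fake that serves `pageSize` tools per page; the cursor is
// simply the start index of the next page encoded as a string.
function makeFakeLister(names: string[], pageSize = 2): ToolLister {
  return {
    async listTools(params) {
      const start = params?.cursor ? Number(params.cursor) : 0;
      const end = start + pageSize;
      return {
        tools: names.slice(start, end).map((name) => ({ name })),
        nextCursor: end < names.length ? String(end) : undefined,
      };
    },
  };
}
```

The cursor is opaque to the caller: it is only ever echoed back to the server, never parsed on the client side.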
@@ -42,7 +44,7 @@ export class StagehandAgentHandler {
      options.modelName,
      options.clientOptions || {},
      options.userProvidedInstructions,
      options.experimental,
do we want to keep experimental for context pruning? or now that it's deployed it's obsolete
      }),
    },
    executeOptions,
  },
table into separate pr once deployed
e2e tests failing, need to double check

// Add the assistant message with tool_use blocks to the history
if (this.experimental) {
  compressConversationImages(nextInputItems);
}
this removes image compression. don't remove, probably good to move out of experimental though
const agent = stagehand.agent({
  provider: "openai",
  model: "computer-use-preview",
  integrations: [
    `https://mcp.exa.ai/mcp?exaApiKey=${process.env.EXA_API_KEY}`,
  ],
  instructions: `You have access to web search through Exa. Use it to find current information before browsing.`,
  options: {
    apiKey: process.env.OPENAI_API_KEY,
  },
});
when I run this I get Error executing tool web_search_exa: MCP error -32602: MCP error -32602: Invalid arguments for tool web_search_exa: [
why
At the moment, Stagehand excels at browser automation but is limited in its ability to interact with third-party services. Many production use cases require these integrations. As the scope of browser agents expands, it’s becoming increasingly important that agents adapt to the web rather than the web adapting to them.
what changed
Added support for integrations (MCPs) and custom tools to be passed in to the Stagehand agent. Here are some examples:
The above syntax will internally create the MCP connection. However, if you prefer a self-managed connection, there are two more options available. The first option is to use our built-in util to establish a connection:
The alternative is to establish a connection to the MCP server with your own client. Stagehand takes in a `Client` object from the `@modelcontextprotocol/sdk` package. Please refer to their documentation for more information.
test plan
Internal agent evals.