Merged
204 changes: 204 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,204 @@
# Claude Agent Context - GoFetch MCP Server

This document provides essential context for Claude agents working with the GoFetch MCP (Model Context Protocol) server repository.

## Project Overview

**GoFetch** is a Go implementation of an MCP server that retrieves web content. It's designed as a more efficient alternative to the original Python MCP fetch server, offering:

- **Lower memory usage** and **faster startup/shutdown**
- **Single binary deployment** for enhanced security
- **Better concurrent request handling**
- **Container security** with non-root user and distroless images
- **Multiple transport protocols** (SSE and StreamableHTTP)

## Architecture & Key Components

### Core Packages

1. **`pkg/config/`** - Server configuration management
   - Handles command-line flags and environment variables
   - Supports transport types: `sse` and `streamable-http`
   - Configuration for port, user agent, robots.txt compliance, proxy settings

2. **`pkg/server/`** - MCP server implementation
   - Main server logic with transport-specific handlers
   - Tool registration and request handling
   - Endpoint management for SSE and StreamableHTTP

3. **`pkg/fetcher/`** - HTTP content retrieval
   - Handles HTTP requests with proper headers
   - Integrates with robots.txt checking
   - Content processing and error handling

4. **`pkg/processor/`** - Content processing
   - HTML to Markdown conversion using `html-to-markdown`
   - Content extraction using `go-readability`
   - Text formatting with pagination and truncation

5. **`pkg/robots/`** - Robots.txt compliance
   - Fetches and parses robots.txt files
   - Validates URL access permissions
   - Configurable to ignore robots.txt rules

### Entry Point
- **`cmd/server/main.go`** - Application entry point
  - Parses configuration
  - Creates and starts the fetch server
  - Handles graceful shutdown

## MCP Tool: `fetch`

The server provides a single MCP tool called `fetch` with these parameters:

```jsonc
{
  "name": "fetch",
  "arguments": {
    "url": "https://example.com", // Required: URL to fetch
    "max_length": 5000,           // Optional: Max characters (default: 5000, max: 1000000)
    "start_index": 0,             // Optional: Starting character index (default: 0)
    "raw": false                  // Optional: Return raw HTML vs markdown (default: false)
  }
}
```
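
The way `start_index` and `max_length` interact can be sketched as a simple character window over the processed content. This is a minimal illustration, not the code in `pkg/processor/`: the `paginate` helper and its return convention (`-1` when no further page can be produced) are assumptions made for the example.

```go
package main

import "fmt"

// paginate is a hypothetical helper, not taken from the GoFetch source.
// It returns at most maxLength characters of content starting at
// startIndex, plus the index of the next page, or -1 when no further
// page can be produced.
func paginate(content string, startIndex, maxLength int) (string, int) {
	runes := []rune(content) // count characters, not bytes
	if startIndex < 0 {
		startIndex = 0
	}
	if startIndex >= len(runes) || maxLength <= 0 {
		return "", -1
	}
	end := startIndex + maxLength
	if end >= len(runes) {
		return string(runes[startIndex:]), -1
	}
	return string(runes[startIndex:end]), end
}

func main() {
	page, next := paginate("hello, world", 0, 5)
	fmt.Println(page, next) // hello 5
	page, next = paginate("hello, world", next, 100)
	fmt.Println(page, next) // , world -1
}
```

A client would pass the returned index back as `start_index` on its next call until no further page remains.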

## Transport Protocols

### 1. StreamableHTTP (Default - Recommended)
- **Endpoint**: `http://localhost:8080/mcp`
- **Session Management**: HTTP headers (`Mcp-Session-Id`)
- **Communication**: Single endpoint for both streaming and commands
- **Modern**: Preferred transport for new implementations

### 2. SSE (Legacy)
- **SSE Endpoint**: `http://localhost:8080/sse` (server-to-client)
- **Messages Endpoint**: `http://localhost:8080/messages` (client-to-server)
- **Session Management**: Query parameters (`?sessionid=...`)
- **Legacy**: Maintained for backward compatibility

## Configuration Options

### Command Line Flags
```bash
--transport string # Transport type: sse or streamable-http (default: streamable-http)
--port int # Port number (default: 8080)
--user-agent string # Custom User-Agent string
--ignore-robots-txt # Ignore robots.txt rules
--proxy-url string # Proxy URL for requests
```

### Environment Variables
```bash
TRANSPORT=sse # Override transport type
MCP_PORT=8080 # Override port number
```
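
As a rough sketch of how these flags and environment variables could be combined, the example below uses only the standard `flag` package. The `Config` struct, the `parseConfig` function, and the precedence (explicit flag over environment variable over built-in default) are assumptions for illustration; the actual behavior lives in `pkg/config/` and may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the documented options. The struct and its field names
// are hypothetical, not copied from pkg/config.
type Config struct {
	Transport    string
	Port         int
	UserAgent    string
	IgnoreRobots bool
	ProxyURL     string
}

// parseConfig assumes the precedence flag > environment > default.
func parseConfig(args []string, getenv func(string) string) (Config, error) {
	cfg := Config{Transport: "streamable-http", Port: 8080}

	// Environment variables replace the built-in defaults.
	if v := getenv("TRANSPORT"); v != "" {
		cfg.Transport = v
	}
	if v := getenv("MCP_PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil {
			return Config{}, fmt.Errorf("invalid MCP_PORT %q: %w", v, err)
		}
		cfg.Port = p
	}

	// Flags, when given, replace whatever the environment supplied.
	fs := flag.NewFlagSet("gofetch", flag.ContinueOnError)
	fs.StringVar(&cfg.Transport, "transport", cfg.Transport, "transport type: sse or streamable-http")
	fs.IntVar(&cfg.Port, "port", cfg.Port, "port number")
	fs.StringVar(&cfg.UserAgent, "user-agent", "", "custom User-Agent string")
	fs.BoolVar(&cfg.IgnoreRobots, "ignore-robots-txt", false, "ignore robots.txt rules")
	fs.StringVar(&cfg.ProxyURL, "proxy-url", "", "proxy URL for requests")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}

	if cfg.Transport != "sse" && cfg.Transport != "streamable-http" {
		return Config{}, fmt.Errorf("invalid transport %q", cfg.Transport)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing `getenv` as a parameter keeps the function testable without touching the process environment.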

## Dependencies & Key Libraries

### Core Dependencies
- **`github.com/modelcontextprotocol/go-sdk`** v1.0.0 - MCP SDK for Go (stable release)
- **`github.com/JohannesKaufmann/html-to-markdown`** - HTML to Markdown conversion
- **`github.com/go-shiori/go-readability`** - Content extraction
- **`golang.org/x/net`** - HTTP client and HTML parsing

### Build & Development
- **Go 1.24+** required
- **Task** for build automation (see `Taskfile.yml`)
- **golangci-lint** for code quality
- **ko** for container image building

## Development Workflow

### Build Commands
```bash
task build # Build the application
task run # Run the application
task test # Run tests
task test-integration # Run integration tests
task lint # Run linting
task fmt # Format code
task clean # Clean build directory
```

### Testing
- Unit tests in each package (`*_test.go` files)
- Integration tests in `test/` directory
  - `test/integration-test.sh` - Tests SSE and StreamableHTTP transports with tool calls
  - `test/integration-endpoints.sh` - Tests endpoint accessibility and responses
- Requires `yardstick-client` for integration tests (installed via `go install`)

## Container Deployment

### Container Image
- **Registry**: `ghcr.io/stackloklabs/gofetch/server`
- **Security**: Non-root user, distroless image
- **Signing**: Container signing with build provenance

## API Usage Examples

### StreamableHTTP Example
```bash
# Initialize session and capture the Mcp-Session-Id response header.
# Header lines end in CRLF, so the trailing carriage return is stripped.
SESSION_ID=$(curl -s -D - -o /dev/null -X POST "http://localhost:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Protocol-Version: 2025-06-18" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {...}}' \
  | grep -i "^Mcp-Session-Id:" | cut -d' ' -f2 | tr -d '\r')

# Call fetch tool
curl -X POST "http://localhost:8080/mcp" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Session-Id: $SESSION_ID" \
  -d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "fetch", "arguments": {"url": "https://example.com"}}}'
```
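
For Go clients that prefer to build the `tools/call` payload programmatically rather than by hand, the request body above can be produced with `encoding/json`. This is a sketch only: the struct types and the `newFetchCall` helper are hypothetical names introduced for the example and are not part of GoFetch or the MCP Go SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fetchArgs models the documented arguments of the fetch tool.
// Optional fields are omitted from the JSON when left at their zero value.
type fetchArgs struct {
	URL        string `json:"url"`
	MaxLength  int    `json:"max_length,omitempty"`
	StartIndex int    `json:"start_index,omitempty"`
	Raw        bool   `json:"raw,omitempty"`
}

type toolCallParams struct {
	Name      string    `json:"name"`
	Arguments fetchArgs `json:"arguments"`
}

type rpcRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// newFetchCall (hypothetical helper) encodes a JSON-RPC 2.0 tools/call
// request that invokes the fetch tool for the given URL.
func newFetchCall(id int, url string) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: "fetch", Arguments: fetchArgs{URL: url}},
	})
}

func main() {
	body, err := newFetchCall(3, "https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```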

## Security Considerations

1. **Robots.txt Compliance**: Respects robots.txt by default (can be disabled)
2. **User-Agent**: Configurable user agent string
3. **Proxy Support**: HTTP proxy configuration
4. **Container Security**: Non-root user, minimal attack surface
5. **Content Limits**: Configurable content length limits

## Error Handling

The server handles various error conditions:
- **HTTP errors**: Non-200 status codes
- **Robots.txt violations**: Access denied by robots.txt
- **Network errors**: Connection timeouts, DNS failures
- **Content processing errors**: HTML parsing failures
- **Configuration errors**: Invalid transport types, port conflicts
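
One way to keep these error categories distinguishable for callers is to pair a sentinel error with a typed error and inspect them through `errors.Is` and `errors.As`. The sketch below is an assumption about how such a scheme could look; `ErrRobotsDisallowed`, `HTTPStatusError`, `checkStatus`, and `describe` are illustrative names, not identifiers from the GoFetch source.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrRobotsDisallowed is a hypothetical sentinel for robots.txt denials.
var ErrRobotsDisallowed = errors.New("access denied by robots.txt")

// HTTPStatusError is a hypothetical typed error for non-200 responses.
type HTTPStatusError struct {
	StatusCode int
}

func (e *HTTPStatusError) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d %s", e.StatusCode, http.StatusText(e.StatusCode))
}

// checkStatus treats any status other than 200 as an error.
func checkStatus(code int) error {
	if code != http.StatusOK {
		return &HTTPStatusError{StatusCode: code}
	}
	return nil
}

// describe maps an error to a short category label, looking through
// any wrapping added with fmt.Errorf and %w.
func describe(err error) string {
	var statusErr *HTTPStatusError
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrRobotsDisallowed):
		return "robots"
	case errors.As(err, &statusErr):
		return fmt.Sprintf("http %d", statusErr.StatusCode)
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describe(checkStatus(404)))                              // http 404
	fmt.Println(describe(fmt.Errorf("fetch: %w", ErrRobotsDisallowed))) // robots
}
```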

## Logging

The server provides comprehensive logging:
- **Startup information**: Transport, port, user agent, endpoints
- **Request logging**: URL fetching, HTTP status codes, content lengths
- **Error logging**: Detailed error messages for debugging
- **Session management**: Session creation and endpoint communication

## Performance Characteristics

- **Memory efficient**: Lower memory usage than the Python implementation
- **Fast startup**: Faster server initialization and shutdown than the Python implementation
- **Concurrent handling**: Better request concurrency than the Python implementation
- **Content processing**: Efficient HTML to Markdown conversion
- **Caching**: Robots.txt content caching for performance
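
To illustrate the robots.txt caching idea, here is a minimal per-host cache with a time-to-live. It is a sketch under stated assumptions: the `robotsCache` type, the TTL policy, and the injected `fetch` function are all hypothetical, and the cache in `pkg/robots/` may be keyed or expired differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheEntry holds one robots.txt body and the time it was fetched.
type cacheEntry struct {
	body      string
	fetchedAt time.Time
}

// robotsCache is a hypothetical per-host cache; names are illustrative.
type robotsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
	fetch   func(host string) (string, error) // injected so it can be stubbed
}

func newRobotsCache(ttl time.Duration, fetch func(string) (string, error)) *robotsCache {
	return &robotsCache{ttl: ttl, entries: make(map[string]cacheEntry), fetch: fetch}
}

// get returns the cached body for host and calls fetch only on a miss or
// once the entry is older than ttl. The lock is held during fetch, which
// keeps the sketch simple at the cost of serializing lookups.
func (c *robotsCache) get(host string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[host]; ok && time.Since(e.fetchedAt) < c.ttl {
		return e.body, nil
	}
	body, err := c.fetch(host)
	if err != nil {
		return "", err
	}
	c.entries[host] = cacheEntry{body: body, fetchedAt: time.Now()}
	return body, nil
}

func main() {
	calls := 0
	cache := newRobotsCache(time.Hour, func(host string) (string, error) {
		calls++
		return "User-agent: *\nDisallow: /private", nil
	})
	cache.get("example.com")
	cache.get("example.com")
	fmt.Println("fetches:", calls) // fetches: 1
}
```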

## Integration Points

### MCP Protocol Compliance
- Full MCP specification compliance (2025-06-18)
- Tool registration and discovery
- Session management with stateful/stateless support
- Error handling and logging
- Proper HTTP method support (GET for SSE, POST for requests, DELETE for session termination)

### External Dependencies
- **Web content**: HTTP fetching with proper headers
- **Robots.txt**: Automatic compliance checking
- **Content extraction**: Readability-based content extraction
- **Markdown conversion**: HTML to Markdown transformation

This context should help Claude agents understand the codebase structure, implementation details, and usage patterns for effective collaboration on the GoFetch MCP server project.
5 changes: 2 additions & 3 deletions go.mod
@@ -6,9 +6,8 @@ toolchain go1.25.1

require (
github.com/JohannesKaufmann/html-to-markdown v1.6.0
-github.com/JohannesKaufmann/html-to-markdown/v2 v2.4.0
github.com/go-shiori/go-readability v0.0.0-20250217085726-9f5bf5ca7612
-github.com/modelcontextprotocol/go-sdk v0.5.0
+github.com/modelcontextprotocol/go-sdk v1.0.0
golang.org/x/net v0.44.0
)

@@ -18,7 +17,7 @@ require (
github.com/araddon/dateparse v0.0.0-20210429162001-6b43995a97de // indirect
github.com/go-shiori/dom v0.0.0-20230515143342-73569d674e1c // indirect
github.com/gogs/chardet v0.0.0-20211120154057-b7413eaefb8f // indirect
-github.com/google/jsonschema-go v0.2.3 // indirect
+github.com/google/jsonschema-go v0.3.0 // indirect
github.com/sebdah/goldie/v2 v2.7.1 // indirect
github.com/sergi/go-diff v1.4.0 // indirect
github.com/yosida95/uritemplate/v3 v3.0.2 // indirect
9 changes: 4 additions & 5 deletions go.sum
@@ -1,6 +1,5 @@
github.com/JohannesKaufmann/html-to-markdown v1.6.0 h1:04VXMiE50YYfCfLboJCLcgqF5x+rHJnb1ssNmqpLH/k=
github.com/JohannesKaufmann/html-to-markdown v1.6.0/go.mod h1:NUI78lGg/a7vpEJTz/0uOcYMaibytE4BUOQS8k78yPQ=
-github.com/JohannesKaufmann/html-to-markdown/v2 v2.4.0/go.mod h1:OLaKh+giepO8j7teevrNwiy/fwf8LXgoc9g7rwaE1jk=
github.com/PuerkitoBio/goquery v1.9.2 h1:4/wZksC3KgkQw7SQgkKotmKljk0M6V8TUvA8Wb4yPeE=
github.com/PuerkitoBio/goquery v1.9.2/go.mod h1:GHPCaP0ODyyxqcNoFGYlAprUFH81NuRPd0GX3Zu2Mvk=
github.com/andybalholm/cascadia v1.3.2/go.mod h1:7gtRlve5FxPPgIgX36uWBX58OdBsSS6lUvCFb+h7KvU=
@@ -20,14 +19,14 @@ github.com/gogs/chardet v0.0.0-20211120154057-b7413eaefb8f/go.mod h1:Pcatq5tYkCW
github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
-github.com/google/jsonschema-go v0.2.3 h1:dkP3B96OtZKKFvdrUSaDkL+YDx8Uw9uC4Y+eukpCnmM=
-github.com/google/jsonschema-go v0.2.3/go.mod h1:r5quNTdLOYEz95Ru18zA0ydNbBuYoo9tgaYcxEYhJVE=
+github.com/google/jsonschema-go v0.3.0 h1:6AH2TxVNtk3IlvkkhjrtbUc4S8AvO0Xii0DxIygDg+Q=
+github.com/google/jsonschema-go v0.3.0/go.mod h1:r5quNTdLOYEz95Ru18zA0ydNbBuYoo9tgaYcxEYhJVE=
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/mattn/go-runewidth v0.0.10/go.mod h1:RAqKPSqVFrSLVXbA8x7dzmKdmGzieGRCM46jaSJTDAk=
-github.com/modelcontextprotocol/go-sdk v0.5.0 h1:WXRHx/4l5LF5MZboeIJYn7PMFCrMNduGGVapYWFgrF8=
-github.com/modelcontextprotocol/go-sdk v0.5.0/go.mod h1:degUj7OVKR6JcYbDF+O99Fag2lTSTbamZacbGTRTSGU=
+github.com/modelcontextprotocol/go-sdk v1.0.0 h1:Z4MSjLi38bTgLrd/LjSmofqRqyBiVKRyQSJgw8q8V74=
+github.com/modelcontextprotocol/go-sdk v1.0.0/go.mod h1:nYtYQroQ2KQiM0/SbyEPUWQ6xs4B95gJjEalc9AQyOs=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
2 changes: 1 addition & 1 deletion pkg/server/server.go
@@ -174,7 +174,7 @@ func (fs *FetchServer) startSSEServer() error {
// Create SSE handler according to MCP specification
sseHandler := mcp.NewSSEHandler(func(_ *http.Request) *mcp.Server {
return fs.mcpServer
-})
+}, &mcp.SSEOptions{})

// Handle SSE endpoint
mux.Handle("/sse", sseHandler)
5 changes: 3 additions & 2 deletions test/integration-endpoints.sh
@@ -89,8 +89,9 @@ else
fi

echo "🔎 Checking /mcp (GET)"
-MCP_GET_STATUS=$(check_status GET "http://localhost:8081/mcp" "" "" || true)
-if [ "$MCP_GET_STATUS" = "200" ] || [ "$MCP_GET_STATUS" = "400" ]; then
+# GET without session should return 405 according to MCP spec for stateful servers
+MCP_GET_STATUS=$(curl -s -o /dev/null -m 5 -w "%{http_code}" -H 'Accept: text/event-stream' "http://localhost:8081/mcp" 2>/dev/null || echo "000")
+if [ "$MCP_GET_STATUS" = "200" ] || [ "$MCP_GET_STATUS" = "400" ] || [ "$MCP_GET_STATUS" = "405" ]; then
echo "✓ /mcp endpoint reachable via GET ($MCP_GET_STATUS)"
else
echo "! /mcp endpoint GET not reachable (status: $MCP_GET_STATUS)"
8 changes: 6 additions & 2 deletions test/integration-test.sh
@@ -67,10 +67,12 @@ if docker ps | grep -q gofetch-sse-test; then
fi

echo "🔧 Testing tool calling via SSE..."
-if yardstick-client -transport sse -address localhost -port 8080 -action=call-tool -tool=fetch -args='{"url":"https://example.com"}' | grep -q "This domain is for use in illustrative examples in documents"; then
+OUTPUT=$(yardstick-client -transport sse -address localhost -port 8080 -action=call-tool -tool=fetch -args='{"url":"https://example.com"}' 2>/dev/null)
+if echo "$OUTPUT" | grep -q "This domain is for use in illustrative examples in documents"; then
echo "✅ SSE tool call returned expected output"
else
echo "! SSE tool call did not return expected output"
+echo "Output received: $OUTPUT"
exit 1
fi
else
@@ -117,10 +119,12 @@ if docker ps | grep -q gofetch-http-test; then
fi

echo "🔧 Testing tool calling via streamable-http..."
-if yardstick-client -transport streamable-http -address localhost -port 8081 -action=call-tool -tool=fetch -args='{"url":"https://example.com"}' | grep -q "This domain is for use in illustrative examples in documents"; then
+OUTPUT=$(yardstick-client -transport streamable-http -address localhost -port 8081 -action=call-tool -tool=fetch -args='{"url":"https://example.com"}' 2>/dev/null)
+if echo "$OUTPUT" | grep -q "This domain is for use in illustrative examples in documents"; then
echo "✅ Streamable tool call returned expected output"
else
echo "! Streamable tool call did not return expected output"
+echo "Output received: $OUTPUT"
exit 1
fi
else