| 1 | +# Claude Agent Context - GoFetch MCP Server |
| 2 | + |
| 3 | +This document provides essential context for Claude agents working with the GoFetch MCP (Model Context Protocol) server repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +**GoFetch** is a Go implementation of an MCP server that retrieves web content. It's designed as a more efficient alternative to the original Python MCP fetch server, offering: |
| 8 | + |
| 9 | +- **Lower memory usage** and **faster startup/shutdown** |
| 10 | +- **Single binary deployment** for enhanced security |
| 11 | +- **Better concurrent request handling** |
| 12 | +- **Container security** with non-root user and distroless images |
| 13 | +- **Multiple transport protocols** (SSE and StreamableHTTP) |
| 14 | + |
| 15 | +## Architecture & Key Components |
| 16 | + |
| 17 | +### Core Packages |
| 18 | + |
| 19 | +1. **`pkg/config/`** - Server configuration management |
| 20 | + - Handles command-line flags and environment variables |
| 21 | + - Supports transport types: `sse` and `streamable-http` |
| 22 | + - Configuration for port, user agent, robots.txt compliance, proxy settings |
| 23 | + |
| 24 | +2. **`pkg/server/`** - MCP server implementation |
| 25 | + - Main server logic with transport-specific handlers |
| 26 | + - Tool registration and request handling |
| 27 | + - Endpoint management for SSE and StreamableHTTP |
| 28 | + |
| 29 | +3. **`pkg/fetcher/`** - HTTP content retrieval |
| 30 | + - Handles HTTP requests with proper headers |
| 31 | + - Integrates with robots.txt checking |
| 32 | + - Content processing and error handling |
| 33 | + |
| 34 | +4. **`pkg/processor/`** - Content processing |
| 35 | + - HTML to Markdown conversion using `html-to-markdown` |
| 36 | + - Content extraction using `go-readability` |
| 37 | + - Text formatting with pagination and truncation |
| 38 | + |
| 39 | +5. **`pkg/robots/`** - Robots.txt compliance |
| 40 | + - Fetches and parses robots.txt files |
| 41 | + - Validates URL access permissions |
| 42 | + - Configurable to ignore robots.txt rules |
| 43 | + |
| 44 | +### Entry Point |
| 45 | +- **`cmd/server/main.go`** - Application entry point |
| 46 | + - Parses configuration |
| 47 | + - Creates and starts the fetch server |
| 48 | + - Handles graceful shutdown |
| 49 | + |
| 50 | +## MCP Tool: `fetch` |
| 51 | + |
| 52 | +The server provides a single MCP tool called `fetch` with these parameters: |
| 53 | + |
| 54 | +```jsonc |
| 55 | +{ |
| 56 | + "name": "fetch", |
| 57 | + "arguments": { |
| 58 | + "url": "https://example.com", // Required: URL to fetch |
| 59 | + "max_length": 5000, // Optional: Max characters (default: 5000, max: 1000000) |
| 60 | + "start_index": 0, // Optional: Starting character index (default: 0) |
| 61 | + "raw": false // Optional: Return raw HTML vs markdown (default: false) |
| 62 | + } |
| 63 | +} |
| 64 | +``` |
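
The interaction between `start_index` and `max_length` can be sketched as a sliding window over the processed content. The helper below is hypothetical and is not taken from `pkg/processor/`; it assumes indices count Unicode characters (runes) rather than bytes, and it signals "no more content" with `-1`, which is an illustrative convention only.

```go
package main

import "fmt"

// paginate is a hypothetical helper, not GoFetch's actual code. It returns the
// window of content that starts at startIndex and spans at most maxLength
// characters, plus the start index for the next call (-1 when nothing remains).
func paginate(content string, startIndex, maxLength int) (string, int) {
	runes := []rune(content) // assumption: index by character, not by byte
	if startIndex < 0 {
		startIndex = 0
	}
	if startIndex >= len(runes) || maxLength <= 0 {
		return "", -1
	}
	end := startIndex + maxLength
	if end >= len(runes) {
		return string(runes[startIndex:]), -1
	}
	return string(runes[startIndex:end]), end
}

func main() {
	page, next := paginate("héllo wörld", 0, 5)
	fmt.Printf("%q next=%d\n", page, next)
}
```

A client that receives a truncated page would repeat the `fetch` call with `start_index` set to the returned next index until no content remains.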
| 65 | + |
| 66 | +## Transport Protocols |
| 67 | + |
| 68 | +### 1. StreamableHTTP (Default - Recommended) |
| 69 | +- **Endpoint**: `http://localhost:8080/mcp` |
| 70 | +- **Session Management**: HTTP headers (`Mcp-Session-Id`) |
| 71 | +- **Communication**: Single endpoint for both streaming and commands |
| 72 | +- **Modern**: Preferred transport for new implementations |
| 73 | + |
| 74 | +### 2. SSE (Legacy) |
| 75 | +- **SSE Endpoint**: `http://localhost:8080/sse` (server-to-client) |
| 76 | +- **Messages Endpoint**: `http://localhost:8080/messages` (client-to-server) |
| 77 | +- **Session Management**: Query parameters (`?sessionid=...`) |
| 78 | +- **Legacy**: Maintained for backward compatibility |
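
As a quick reference, the mapping from transport name to endpoint URLs described above could be expressed as follows. This is a minimal sketch: the `endpoints` function is hypothetical and only restates the paths listed in this section, it does not reflect how `pkg/server/` registers its handlers.

```go
package main

import "fmt"

// endpoints is a hypothetical helper that lists the URLs a client would use
// for a given transport, based only on the paths documented above.
func endpoints(transport string, port int) ([]string, error) {
	base := fmt.Sprintf("http://localhost:%d", port)
	switch transport {
	case "streamable-http":
		// Single endpoint for both streaming and commands.
		return []string{base + "/mcp"}, nil
	case "sse":
		// Server-to-client stream, then client-to-server messages.
		return []string{base + "/sse", base + "/messages"}, nil
	default:
		return nil, fmt.Errorf("unknown transport %q", transport)
	}
}

func main() {
	urls, err := endpoints("sse", 8080)
	fmt.Println(urls, err)
}
```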
| 79 | + |
| 80 | +## Configuration Options |
| 81 | + |
| 82 | +### Command Line Flags |
| 83 | +```bash |
| 84 | +--transport string # Transport type: sse or streamable-http (default: streamable-http) |
| 85 | +--port int # Port number (default: 8080) |
| 86 | +--user-agent string # Custom User-Agent string |
| 87 | +--ignore-robots-txt # Ignore robots.txt rules |
| 88 | +--proxy-url string # Proxy URL for requests |
| 89 | +``` |
| 90 | + |
| 91 | +### Environment Variables |
| 92 | +```bash |
| 93 | +TRANSPORT=sse # Override transport type |
| 94 | +MCP_PORT=8080 # Override port number |
| 95 | +``` |
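
One way to combine a flag value with an environment override is sketched below. The `resolvePort` helper is hypothetical, and the precedence it uses (a valid `MCP_PORT` replaces the flag value) is an assumption based on the word "Override" above; check `pkg/config/` for the actual behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolvePort is a hypothetical helper: it starts from the flag value and lets
// MCP_PORT replace it when the variable holds a valid TCP port number.
// getenv is injected so the function is easy to test; pass os.Getenv in real use.
func resolvePort(flagPort int, getenv func(string) string) (int, error) {
	raw := getenv("MCP_PORT")
	if raw == "" {
		return flagPort, nil // variable unset: keep the flag value
	}
	port, err := strconv.Atoi(raw)
	if err != nil || port < 1 || port > 65535 {
		return 0, fmt.Errorf("invalid MCP_PORT %q", raw)
	}
	return port, nil
}

func main() {
	env := map[string]string{"MCP_PORT": "9090"}
	port, err := resolvePort(8080, func(key string) string { return env[key] })
	fmt.Println(port, err)
}
```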
| 96 | + |
| 97 | +## Dependencies & Key Libraries |
| 98 | + |
| 99 | +### Core Dependencies |
| 100 | +- **`github.com/modelcontextprotocol/go-sdk`** v1.0.0 - MCP SDK for Go (stable release) |
| 101 | +- **`github.com/JohannesKaufmann/html-to-markdown`** - HTML to Markdown conversion |
| 102 | +- **`github.com/go-shiori/go-readability`** - Content extraction |
| 103 | +- **`golang.org/x/net`** - HTTP client and HTML parsing |
| 104 | + |
| 105 | +### Build & Development |
| 106 | +- **Go 1.24+** required |
| 107 | +- **Task** for build automation (see `Taskfile.yml`) |
| 108 | +- **golangci-lint** for code quality |
| 109 | +- **ko** for container image building |
| 110 | + |
| 111 | +## Development Workflow |
| 112 | + |
| 113 | +### Build Commands |
| 114 | +```bash |
| 115 | +task build # Build the application |
| 116 | +task run # Run the application |
| 117 | +task test # Run tests |
| 118 | +task test-integration # Run integration tests |
| 119 | +task lint # Run linting |
| 120 | +task fmt # Format code |
| 121 | +task clean # Clean build directory |
| 122 | +``` |
| 123 | + |
| 124 | +### Testing |
| 125 | +- Unit tests in each package (`*_test.go` files) |
| 126 | +- Integration tests in `test/` directory |
| 127 | + - `test/integration-test.sh` - Tests SSE and StreamableHTTP transports with tool calls |
| 128 | + - `test/integration-endpoints.sh` - Tests endpoint accessibility and responses |
| 129 | +- Requires `yardstick-client` for integration tests (installed via `go install`) |
| 130 | + |
| 131 | +## Container Deployment |
| 132 | + |
| 133 | +### Container Image |
| 134 | +- **Registry**: `ghcr.io/stackloklabs/gofetch/server` |
| 135 | +- **Security**: Non-root user, distroless image |
| 136 | +- **Signing**: Container signing with build provenance |
| 137 | + |
| 138 | +## API Usage Examples |
| 139 | + |
| 140 | +### StreamableHTTP Example |
| 141 | +```bash |
| 142 | +# Initialize session |
| 143 | +SESSION_ID=$(curl -s -D /dev/stderr -X POST "http://localhost:8080/mcp" \ |
| 144 | + -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" \ |
| 145 | + -H "Mcp-Protocol-Version: 2025-06-18" \ |
| 146 | + -d '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {...}}' \ |
| 147 | + 2>&1 >/dev/null | grep -i "Mcp-Session-Id:" | cut -d' ' -f2 | tr -d '\r') |
| 148 | + |
| 149 | +# Call fetch tool |
| 150 | +curl -X POST "http://localhost:8080/mcp" \ |
| 151 | + -H "Content-Type: application/json" -H "Accept: application/json, text/event-stream" \ |
| 152 | + -H "Mcp-Session-Id: $SESSION_ID" \ |
| 153 | + -d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "fetch", "arguments": {"url": "https://example.com"}}}' |
| 154 | +``` |
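
For Go clients or tests, the `tools/call` payload from the curl example can be built with `encoding/json` instead of a hand-written string. This is a minimal sketch: the `rpcRequest` and `callParams` types and the `fetchCall` helper are hypothetical names for illustration and are not part of the go-sdk or this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest and callParams are hypothetical types that mirror the JSON-RPC
// body used in the curl example above.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// fetchCall encodes a tools/call request for the fetch tool with only the
// required url argument set.
func fetchCall(id int, url string) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: callParams{
			Name:      "fetch",
			Arguments: map[string]any{"url": url},
		},
	})
}

func main() {
	body, err := fetchCall(3, "https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting bytes would be sent as the POST body to `/mcp` with the same headers as in the curl example, including `Mcp-Session-Id`.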
| 155 | + |
| 156 | +## Security Considerations |
| 157 | + |
| 158 | +1. **Robots.txt Compliance**: Respects robots.txt by default (can be disabled) |
| 159 | +2. **User-Agent**: Configurable user agent string |
| 160 | +3. **Proxy Support**: HTTP proxy configuration |
| 161 | +4. **Container Security**: Non-root user, minimal attack surface |
| 162 | +5. **Content Limits**: The `max_length` argument caps returned content (default: 5000, max: 1000000 characters) |
| 163 | + |
| 164 | +## Error Handling |
| 165 | + |
| 166 | +The server handles various error conditions: |
| 167 | +- **HTTP errors**: Non-200 status codes |
| 168 | +- **Robots.txt violations**: Access denied by robots.txt |
| 169 | +- **Network errors**: Connection timeouts, DNS failures |
| 170 | +- **Content processing errors**: HTML parsing failures |
| 171 | +- **Configuration errors**: Invalid transport types, port conflicts |
| 172 | + |
| 173 | +## Logging |
| 174 | + |
| 175 | +The server provides comprehensive logging: |
| 176 | +- **Startup information**: Transport, port, user agent, endpoints |
| 177 | +- **Request logging**: URL fetching, HTTP status codes, content lengths |
| 178 | +- **Error logging**: Detailed error messages for debugging |
| 179 | +- **Session management**: Session creation and endpoint communication |
| 180 | + |
| 181 | +## Performance Characteristics |
| 182 | + |
| 183 | +- **Memory efficient**: Lower memory usage than the Python implementation |
| 184 | +- **Fast startup**: Quick server initialization |
| 185 | +- **Concurrent handling**: Better request concurrency |
| 186 | +- **Content processing**: Efficient HTML to Markdown conversion |
| 187 | +- **Caching**: Robots.txt content caching for performance |
| 188 | + |
| 189 | +## Integration Points |
| 190 | + |
| 191 | +### MCP Protocol Compliance |
| 192 | +- Targets MCP specification revision 2025-06-18 |
| 193 | +- Tool registration and discovery |
| 194 | +- Session management with stateful/stateless support |
| 195 | +- Error handling and logging |
| 196 | +- Proper HTTP method support (GET for SSE, POST for requests, DELETE for session termination) |
| 197 | + |
| 198 | +### External Dependencies |
| 199 | +- **Web content**: HTTP fetching with proper headers |
| 200 | +- **Robots.txt**: Automatic compliance checking |
| 201 | +- **Content extraction**: Readability-based content extraction |
| 202 | +- **Markdown conversion**: HTML to Markdown transformation |
| 203 | + |
| 204 | +This context should help Claude agents understand the codebase structure, implementation details, and usage patterns for effective collaboration on the GoFetch MCP server project. |