
Conversation

@ferenci84 (Contributor) commented Aug 29, 2025

Description

Resolves #7479

What Happens Now

When you use OpenRouter with Anthropic models (like anthropic/claude-sonnet-4, anthropic/claude-opus-4, etc.) and caching is enabled:

  1. System Message Caching: If cacheSystemMessage is true, the system message will be sent with cache_control: { type: "ephemeral" }

  2. Conversation Caching: If cacheConversation is true, the last two user messages in the conversation will be cached

  3. Automatic Processing: OpenRouter automatically handles the caching headers internally - no additional headers needed
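The steps above can be sketched as follows. This is a minimal illustration, not the actual PR code: the function and type names (`addCacheControl`, `TextPart`, `Message`) are assumptions chosen for the example.

```typescript
type TextPart = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};
type Message = { role: string; content: string | TextPart[] };

// Hypothetical sketch: annotate the system message and the last two user
// messages with cache_control, leaving everything else untouched.
function addCacheControl(
  messages: Message[],
  opts: { cacheSystemMessage?: boolean; cacheConversation?: boolean },
): Message[] {
  // Indices of the last two user messages in the conversation.
  const lastUserIndices = messages
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i >= 0)
    .slice(-2);

  return messages.map((msg, i) => {
    const cacheThis =
      (opts.cacheSystemMessage && msg.role === "system") ||
      (opts.cacheConversation && lastUserIndices.includes(i));
    if (!cacheThis) return msg;
    // Normalize string content to an array of text parts, then tag the last part.
    const parts: TextPart[] =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content.map((p) => ({ ...p }));
    parts[parts.length - 1].cache_control = { type: "ephemeral" };
    return { ...msg, content: parts };
  });
}
```

Applied to a four-message conversation like the example below, this tags the system message and both user messages while the assistant message passes through unchanged.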

Example Request Body

When sending a request to OpenRouter with an Anthropic model, the modified body will look like:

{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Previous context...",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    },
    {
      "role": "assistant",
      "content": "Response..."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Current question",
          "cache_control": { "type": "ephemeral" }
        }
      ]
    }
  ]
}

Non-Anthropic Models

When using non-Anthropic models (like GPT-4, Llama, etc.) through OpenRouter, the caching fields are NOT added, ensuring compatibility with all model providers.
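The gate for this behavior could look like the following minimal sketch (the function name is an assumption, not the PR's actual identifier). Since OpenRouter's Anthropic model IDs such as `anthropic/claude-sonnet-4` all contain the substring `claude`, a single substring check suffices:

```typescript
// Hypothetical guard: only Anthropic Claude models get cache_control fields.
function isAnthropicModel(model?: string): boolean {
  if (!model) return false;
  return model.toLowerCase().includes("claude");
}
```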

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Tests

A new test file, OpenRouter.vitest.ts, is added to cover these cases.


Summary by cubic

Add Anthropic caching support to OpenRouter for Claude models. When caching is enabled, we annotate the system and last two user messages with cache_control so OpenRouter handles caching; other models are unchanged.

  • New Features
    • Detect Anthropic/Claude models and modify the chat body when cacheBehavior is set.
    • Add cache_control: ephemeral to the system message (cacheSystemMessage) and the last two user messages (cacheConversation); for array content, tag the last text part.
    • Leave non-Anthropic models untouched; added vitest coverage for these cases.
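The "tag the last text part" rule for array content can be sketched like this (helper name assumed, not from the PR). Non-text parts such as images are skipped so the annotation always lands on a text block:

```typescript
type Part = {
  type: string;
  text?: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: given message content that is already an array of
// parts, attach cache_control to the last "text" part; if there is no text
// part, the array is returned unchanged.
function tagLastTextPart(parts: Part[]): Part[] {
  for (let i = parts.length - 1; i >= 0; i--) {
    if (parts[i].type === "text") {
      const tagged = parts.slice();
      tagged[i] = { ...parts[i], cache_control: { type: "ephemeral" } };
      return tagged;
    }
  }
  return parts;
}
```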

@ferenci84 ferenci84 marked this pull request as ready for review August 29, 2025 21:29
@ferenci84 ferenci84 requested a review from a team as a code owner August 29, 2025 21:29
@ferenci84 ferenci84 requested review from tingwai and removed request for a team August 29, 2025 21:29
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 29, 2025
@RomneyDa (Collaborator) left a comment:
@ferenci84 this will be awesome! some nitpicks, the completion option is the only important thing

  if (!model) return false;
  const modelLower = model.toLowerCase();
  return (
    modelLower.includes("claude") || modelLower.includes("anthropic/claude")
  );

@RomneyDa (Collaborator) replied:

The second check is redundant: any model ID that includes "anthropic/claude" necessarily includes "claude", so the first condition alone is sufficient.

  }

  const shouldCacheConversation = this.cacheBehavior.cacheConversation;
  const shouldCacheSystemMessage = this.cacheBehavior.cacheSystemMessage;

@RomneyDa (Collaborator) replied:

Also handle the deprecated but still relevant completion option completionOptions.promptCaching. See the related hotfix in #7652.
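One way to honor the deprecated flag alongside the newer settings could look like this sketch (names such as `resolveCacheBehavior` and the fallback semantics are assumptions, not the PR's actual code):

```typescript
interface CacheBehavior {
  cacheSystemMessage?: boolean;
  cacheConversation?: boolean;
}

// Hypothetical sketch: an explicit cacheBehavior wins; otherwise fall back to
// the deprecated completionOptions.promptCaching flag, treating it as
// "cache everything" for backward compatibility.
function resolveCacheBehavior(
  cacheBehavior?: CacheBehavior,
  promptCaching?: boolean, // deprecated completionOptions.promptCaching
): CacheBehavior | undefined {
  if (cacheBehavior) return cacheBehavior;
  if (promptCaching) {
    return { cacheSystemMessage: true, cacheConversation: true };
  }
  return undefined;
}
```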

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Issues and PRs Sep 11, 2025

Successfully merging this pull request may close these issues.

No Prompt Caching support for OpenRouter