@yuchen-db yuchen-db commented Aug 31, 2025

Adds a new flag, --receive.metric-name-shards, that shards tenants by metric name.

How it works:

  • When the flag is set to N (e.g. 64), metric names are hashed into N shard tenants
  • Router mode: Rewrites the tenant from my-tenant to hashring-name-35 (where 35 = hash(metric_name) % 64)
  • Ingestor mode: Only queries TSDBs that match the metric's shard instead of all TSDBs
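
As a rough illustration of the router-side rewrite described above, here is a minimal Go sketch. The function names `shardIndex` and `shardTenant` are hypothetical, and FNV-1a from the standard library is an assumption: the PR only shows that a 64-bit hasher is used, not which one.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex maps a metric name to a shard in [0, shards).
// FNV-1a is an assumption; the PR only shows a 64-bit hasher.
// Callers must pass shards > 0.
func shardIndex(metricName string, shards int) uint64 {
	h := fnv.New64a()
	h.Write([]byte(metricName)) // Write on an FNV hash never returns an error
	return h.Sum64() % uint64(shards)
}

// shardTenant returns the tenant the router would attach to a series.
// With sharding disabled (shards <= 0) the original tenant is kept.
func shardTenant(tenant, hashringName, metricName string, shards int) string {
	if shards <= 0 {
		return tenant
	}
	return fmt.Sprintf("%s-%d", hashringName, shardIndex(metricName, shards))
}
```

Calling `shardTenant("my-tenant", "hashring-name", "up", 64)` would yield a tenant of the form `hashring-name-<k>` with `k` between 0 and 63; the exact `k` depends on the hash function actually chosen.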

Benefits:

  • Creates finer-grained tenant separation based on metric names
  • Reduces query fan-out in ingestors, so the cuckoo filter is no longer needed
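
To make the fan-out reduction concrete, the following Go sketch shows one way an ingestor could narrow a query to a single tenant TSDB. Everything here is an assumption for illustration: `tenantsToQuery` is a hypothetical helper, tenants are represented as plain strings, FNV-1a stands in for the unspecified hash, and the metric name is assumed to come from an equality matcher on `__name__` (empty when the query has none).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tenantsToQuery returns the tenant TSDBs a query has to touch.
// If sharding is disabled or the query does not pin a metric name,
// it falls back to querying every tenant.
func tenantsToQuery(allTenants []string, hashringName, metricName string, shards int) []string {
	if shards <= 0 || metricName == "" {
		return append([]string(nil), allTenants...) // copy, full fan-out
	}
	h := fnv.New64a() // assumption: the real hash function may differ
	h.Write([]byte(metricName))
	want := fmt.Sprintf("%s-%d", hashringName, h.Sum64()%uint64(shards))
	for _, t := range allTenants {
		if t == want {
			return []string{t} // only the matching shard is queried
		}
	}
	return nil // the matching shard is not hosted on this ingestor
}
```

Because the shard is derived from the metric name alone, no per-TSDB membership structure is consulted in this sketch, which is the property the cuckoo filter previously provided.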

Note that routing itself is not changed at all. That change is more complicated and will be addressed in a separate PR.

@yuchen-db yuchen-db force-pushed the yuchen-db/name-tenant branch from fac201a to 916a2ed Compare August 31, 2025 14:59
@yuchen-db yuchen-db changed the title metric name based tenant sharding metric name tenant Sep 2, 2025
@yuchen-db yuchen-db requested review from hczhu-db and jnyi September 2, 2025 02:25

@hczhu-db hczhu-db left a comment

Thanks for starting the first V2 PR. There are still some implementation decisions to make and we need to meet and discuss.
At a high level, let's split the write path and the read path into separate PRs.

Comment on lines +1210 to +1211
cmd.Flag("receive.metric-name-shards", "When set and greater than 0, enables metric name sharding. In RouterOnly mode, modifies tenant header to {hashring-name}-{hash(metric_name) % shards}. In IngestorOnly/RouterIngestor mode, optimizes query fan-out by only querying TSDBs matching the metric's shard. Disables cuckoo filter when enabled.").
Default("0").IntVar(&rc.metricNameShards)
We should use a ConfigMap for the sharding schema instead of a command-line flag. Thanos Router needs to support the schema defined in the V2 design doc, which specifies special metric groups in addition to the number of shards for each metric scope.

{
  scopeName: "az-eastus2",
  shards: 20,
  // The metrics that are heavily used by alert rules or have super high cardinality
  // can be in special groups to avoid skewed data partitions.
  specialMetricGroups: [
    {
      name: "kube-metrics",
      metrics: [
        "container_cpu_usage_seconds_total",
        "container_memory_working_set_bytes"
      ]
    },
    {
      name: "rpc-metrics",
      metrics: ["rpc_client_requests_total"]
    }
  ]
}

=== shard name calculation ===

if metric_name in any special metric group then
  shard_name = the special metric group name
else
  shard_name = hash(metric_name) % shards
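
The shard name calculation above could be sketched in Go roughly as follows. The type and method names (`ShardingSchema`, `SpecialMetricGroup`, `ShardName`) are hypothetical and only mirror the fields of the example schema; FNV-1a is an assumed hash, and treating `Shards == 0` as a single shard is a choice made here to avoid a division by zero, not something the design states.

```go
package main

import (
	"hash/fnv"
	"strconv"
)

// SpecialMetricGroup pins a set of metrics to a named shard.
type SpecialMetricGroup struct {
	Name    string
	Metrics []string
}

// ShardingSchema mirrors the fields of the example schema (hypothetical type).
type ShardingSchema struct {
	ScopeName           string
	Shards              uint64
	SpecialMetricGroups []SpecialMetricGroup
}

// ShardName returns the special group name if the metric belongs to one,
// otherwise the decimal form of hash(metric_name) % shards.
func (s ShardingSchema) ShardName(metricName string) string {
	for _, g := range s.SpecialMetricGroups {
		for _, m := range g.Metrics {
			if m == metricName {
				return g.Name
			}
		}
	}
	shards := s.Shards
	if shards == 0 {
		shards = 1 // assumption: treat an unset shard count as one shard
	}
	h := fnv.New64a() // assumption: the real hash function may differ
	h.Write([]byte(metricName))
	return strconv.FormatUint(h.Sum64()%shards, 10)
}
```

With the example schema, `container_cpu_usage_seconds_total` would resolve to `kube-metrics`, while any metric outside the special groups would resolve to a number between 0 and 19. A production version would likely build a map from metric name to group name once at load time instead of scanning the groups on every call.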

hash := hasher.Sum64()

shard := hash % uint64(h.metricNameShards)
modifiedTenant := fmt.Sprintf("%s-%d", hashringName, shard)
What semantics will hashringName have in our context?

@databricks databricks deleted a comment from hczhu-db Sep 2, 2025
@yuchen-db
Let's discuss in the meeting. I deleted the screenshot from this public repo.

3 participants