Allow X or X% hosts to have bad values #29

@shlomi-noach

Say we're measuring replication lag and we have 10 replicas. For some apps, it would be OK if one lags. Maybe two. And it would be better to let them lag and keep operations going than to stall everything.

The suggestion is to have a per-cluster config that indicates how many hosts may have bad values. This would be either an absolute number or a percentage. For smaller setups an absolute number makes more sense (e.g. "1 host can be lagging"). For larger setups it may be better to work with a percentage ("up to 5% of hosts may be lagging").

I'm unsure whether to support both.
