Contributor

@PSUdaemon PSUdaemon commented Oct 15, 2025

Configurable Hash Algorithm for Consistent Hash Parent Selection

Overview

Makes the hash algorithm used in consistent hash parent selection configurable at startup, adding two faster alternatives to the existing SipHash-2-4 implementation.

Motivation

The current implementation hard-codes SipHash-2-4 for consistent hash parent selection. While secure and DoS-resistant, it may not be optimal for all deployments. This change allows operators to choose faster algorithms based on their specific performance requirements and threat models.

Changes

New Hash Implementations (Zero Dependencies)

  • SipHash Template (include/tscore/HashSip.h)

    • Template-based implementation: ATSHashSip<c_rounds, d_rounds>
    • SipHash-1-3 (type alias ATSHash64Sip13): ~50% faster than SipHash-2-4
    • SipHash-2-4 (type alias ATSHash64Sip24): Existing algorithm, now template-based
    • License: CC0 (public domain, ASF Category A)
    • Zero code duplication between variants
    • Header-only template with compile-time optimization
  • HashWyhash (include/tscore/HashWyhash.h, src/tscore/HashWyhash.cc)

    • Wyhash v4.1: ~3-5x faster than SipHash-2-4
    • License: Unlicense (public domain, ASF Category A)
    • DoS-resistant, processes 32-byte blocks
    • Uses seed=0 for deterministic behavior
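
The compile-time round selection described above can be sketched as a single function template. This is a minimal illustration, not the PR's ATSHashSip class: the names SipState, siphash, siphash24 and siphash13 are hypothetical, the function is one-shot rather than streaming, and the key words k0/k1 default to 0 on the assumption that this matches the seed=0 behavior the description mentions.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate left; n is always in 1..63 here.
constexpr std::uint64_t rotl(std::uint64_t x, int n) { return (x << n) | (x >> (64 - n)); }

// Four-word SipHash state with the standard SipRound permutation.
struct SipState {
  std::uint64_t v0, v1, v2, v3;
  void round() {
    v0 += v1; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32);
    v2 += v3; v3 = rotl(v3, 16); v3 ^= v2;
    v0 += v3; v3 = rotl(v3, 21); v3 ^= v0;
    v2 += v1; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32);
  }
};

// SipHash-C-D: C compression rounds per 8-byte word, D finalization rounds.
// C and D are compile-time constants, so the round loops can be unrolled.
template <int C, int D>
std::uint64_t siphash(const void *data, std::size_t len,
                      std::uint64_t k0 = 0, std::uint64_t k1 = 0) {
  const auto *p = static_cast<const unsigned char *>(data);
  SipState s{k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
             k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL};

  auto absorb = [&s](std::uint64_t m) {
    s.v3 ^= m;
    for (int i = 0; i < C; ++i) s.round();
    s.v0 ^= m;
  };

  // Full 8-byte words, read as little-endian regardless of host byte order.
  std::size_t off = 0;
  for (; off + 8 <= len; off += 8) {
    std::uint64_t m = 0;
    for (int i = 0; i < 8; ++i) m |= std::uint64_t{p[off + i]} << (8 * i);
    absorb(m);
  }

  // Final word: remaining 0..7 bytes plus the length in the top byte.
  std::uint64_t b = static_cast<std::uint64_t>(len) << 56;
  for (int i = 0; off + i < len; ++i) b |= std::uint64_t{p[off + i]} << (8 * i);
  absorb(b);

  s.v2 ^= 0xff;
  for (int i = 0; i < D; ++i) s.round();
  return s.v0 ^ s.v1 ^ s.v2 ^ s.v3;
}

// Convenience wrappers, analogous in spirit to the PR's type aliases.
inline std::uint64_t siphash24(const void *d, std::size_t n,
                               std::uint64_t k0 = 0, std::uint64_t k1 = 0) {
  return siphash<2, 4>(d, n, k0, k1);
}
inline std::uint64_t siphash13(const void *d, std::size_t n,
                               std::uint64_t k0 = 0, std::uint64_t k1 = 0) {
  return siphash<1, 3>(d, n, k0, k1);
}
```

With the key bytes 00..0f and the 15-byte message 00..0e from the SipHash reference test vectors, siphash24 in this sketch should return 0xa129ca6149be45e5. Both variants share every line except the two round counts, which is the duplication the template removes.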

Configuration Infrastructure

  • New Configuration Variable:
    proxy.config.http.parent_proxy.consistent_hash_algorithm: siphash24
  • Values: siphash24 (default), siphash13, wyhash
  • Only affects round_robin=consistent_hash in parent.config
  • Requires restart to take effect
  • Implementation:
    • Added ParentHashAlgorithm enum to ParentSelection.h
    • Factory pattern in ParentConsistentHash::createHashInstance()
    • Config reading in ParentRecord::Init()
    • Registered in RecordsConfig.cc
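
The string-to-enum step implied by the list above might look roughly like the following. This is a sketch under assumptions: the enumerator names and the parse_hash_algorithm function are hypothetical, and the exact, case-sensitive matching is a guess (the description only says case sensitivity is tested, not which way it goes).

```cpp
#include <string_view>

// Hypothetical mirror of the ParentHashAlgorithm enum named in the description.
enum class ParentHashAlgorithm { SIPHASH24, SIPHASH13, WYHASH };

// Map the configured string to an enum value.
// Assumption: matching is exact and case-sensitive; anything unrecognized
// (including an empty string) falls back to the siphash24 default.
constexpr ParentHashAlgorithm parse_hash_algorithm(std::string_view name) {
  if (name == "siphash13") {
    return ParentHashAlgorithm::SIPHASH13;
  }
  if (name == "wyhash") {
    return ParentHashAlgorithm::WYHASH;
  }
  return ParentHashAlgorithm::SIPHASH24;
}
```

Falling back to the default instead of failing keeps a typo in the setting from preventing startup, at the cost of silently ignoring it; a real implementation would probably also log a warning.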

Testing

745 assertions, all passing:

  • Unit Tests (44 assertions)
    • test_HashAlgorithms.cc: Comprehensive tests for HashSip13 and Wyhash
    • Tests: determinism, empty input, single byte, block boundaries, incremental updates, URL patterns, clear/reuse
  • Integration Tests (14 assertions)
    • test_ParentHashConfig.cc: Config parsing and validation
    • Tests: valid inputs, invalid input fallback, case sensitivity, backward compatibility
  • Related Tests (687 assertions)
    • test_NextHopConsistentHash: 111 assertions
    • test_NextHopRoundRobin: 55 assertions
    • test_NextHopStrategyFactory: 521 assertions

Documentation

  • records.yaml.en.rst
    • Full configuration documentation
    • Performance characteristics for each algorithm
    • Migration warning about request redistribution
  • parent.config.en.rst
    • Hash algorithm reference in consistent_hash section
    • Cross-reference to records.yaml for details

Backward Compatibility

Fully backward compatible:

  • Default remains siphash24 (existing behavior unchanged)
  • All hash implementations use seed=0 for deterministic behavior across restarts
  • Existing tests pass with no regressions
  • No changes to parent selection logic, only hash implementations

Migration Consideration:

Changing the hash algorithm will cause requests to be redistributed differently across parent proxies. This can lead to cache churn and increased origin load during the transition. Plan migrations carefully and consider doing them during low-traffic periods.
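
To make the redistribution effect concrete, here is a toy consistent-hash ring that can be built with either of two hash functions. It is only an illustration: Ring, count_moved, the replica count, and the key format are hypothetical, and the two hashes (FNV-1a, and FNV-1a passed through the splitmix64 finalizer) are simple stand-ins, not SipHash or Wyhash.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using HashFn = std::function<std::uint64_t(const std::string &)>;

// Stand-in hash #1: 64-bit FNV-1a.
std::uint64_t fnv1a(const std::string &s) {
  std::uint64_t h = 0xcbf29ce484222325ULL;
  for (unsigned char c : s) {
    h ^= c;
    h *= 0x100000001b3ULL;
  }
  return h;
}

// Stand-in hash #2: FNV-1a followed by the splitmix64 finalizer.
std::uint64_t fnv1a_mixed(const std::string &s) {
  std::uint64_t z = fnv1a(s) + 0x9e3779b97f4a7c15ULL;
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

// Toy ring: each parent gets `replicas` virtual nodes; a key maps to the
// first node at or after its hash, wrapping around at the end.
// `parents` must not be empty.
class Ring {
public:
  Ring(const std::vector<std::string> &parents, HashFn hash, int replicas = 64)
    : hash_(std::move(hash)) {
    for (const auto &p : parents) {
      for (int i = 0; i < replicas; ++i) {
        nodes_[hash_(p + "#" + std::to_string(i))] = p;
      }
    }
  }

  const std::string &lookup(const std::string &key) const {
    auto it = nodes_.lower_bound(hash_(key));
    return it == nodes_.end() ? nodes_.begin()->second : it->second;
  }

private:
  HashFn hash_;
  std::map<std::uint64_t, std::string> nodes_;
};

// Count how many of n_keys synthetic keys pick a different parent.
int count_moved(const Ring &a, const Ring &b, int n_keys) {
  int moved = 0;
  for (int i = 0; i < n_keys; ++i) {
    const std::string key = "/obj/" + std::to_string(i);
    if (a.lookup(key) != b.lookup(key)) {
      ++moved;
    }
  }
  return moved;
}
```

Two rings built with the same hash agree on every key, while rings built with different hashes generally disagree on a large share of keys even though the parent list is identical. Each moved key is a potential cache miss at its new parent, which is why the switch is best made during a low-traffic window.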

Performance Characteristics

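| Algorithm | Speed vs SipHash-2-4 | Compression Rounds     | Finalization Rounds | DoS Resistant |
|-----------|----------------------|------------------------|---------------------|---------------|
| siphash24 | Baseline (1.0x)      | 2                      | 4                   | Yes           |
| siphash13 | ~1.5x faster         | 1                      | 3                   | Yes           |
| wyhash    | ~3-5x faster         | N/A (different design) | N/A                 | Yes           |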

Implementation Highlights

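  • Template-Based SipHash: Uses the ATSHashSip<c_rounds, d_rounds> template to eliminate code duplication between SipHash variants. Type aliases ATSHash64Sip24 and ATSHash64Sip13 provide convenient access. Because the round counts are template parameters, they are compile-time constants, so the compiler can unroll the round loops and the template should add no runtime overhead over a hand-written variant.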
  • Deterministic Seeding: All hash implementations use seed=0 to ensure consistent parent selection across server restarts, preventing cache churn.
  • Factory Pattern: ParentConsistentHash::createHashInstance() selects hash algorithm based on config, with fallback to SipHash-2-4 for unknown values.

Future Work

  • Phase 2: Add XXH3 if an external dependency is acceptable
  • Phase 3: Implement per-parent-set hash configuration with configurable seed values

Testing Instructions

  1. Build with changes: cmake --build build
  2. Run hash tests: ./build/src/tscore/test_tscore "[HashSip13]" "[HashWyhash]"
  3. Run config tests: ./build/src/proxy/unit_tests/test_proxy
  4. Verify default: Check that proxy.config.http.parent_proxy.consistent_hash_algorithm defaults to siphash24 in configs/records.yaml.default.in

Configuration Example

Global hash algorithm setting (in records.yaml):

http:
  parent_proxy:
    consistent_hash_algorithm: wyhash  # or siphash24, siphash13

Parent selection rule (in parent.config):

# The hash algorithm configured in records.yaml will be used
dest_domain=example.com parent=p1:80,p2:80 round_robin=consistent_hash

Note: The hash algorithm is a global setting that affects all parent selections using round_robin=consistent_hash. It cannot be configured per-parent-set in this implementation. Future work (Phase 3) may add per-parent-set hash configuration.

@PSUdaemon PSUdaemon requested a review from zwoop October 15, 2025 22:43
@PSUdaemon PSUdaemon self-assigned this Oct 15, 2025
@PSUdaemon PSUdaemon force-pushed the parent_selection_hash_extension branch 5 times, most recently from 235ada1 to 5edaa90 Compare October 16, 2025 00:00
@jrushford
Contributor

jrushford commented Oct 16, 2025 via email

@PSUdaemon PSUdaemon force-pushed the parent_selection_hash_extension branch 4 times, most recently from 056624f to fbdaa1a Compare October 16, 2025 01:23
@PSUdaemon
Contributor Author

> @PSUdaemon this is relevant to strategies also as it uses the SipHash-2-4 for its consistent hash used for parent selection. You might consider a PR for strategies.yaml. I believe that strategies will deprecate parent selection in the future. Regards Jrushford

@jrushford, I looked into this and it seems like a logical next PR. If the community likes this one and merges it, I'll do that next.

@PSUdaemon PSUdaemon force-pushed the parent_selection_hash_extension branch from fbdaa1a to f27095f Compare October 16, 2025 01:55
@PSUdaemon PSUdaemon requested a review from jrushford October 16, 2025 02:30
@PSUdaemon PSUdaemon force-pushed the parent_selection_hash_extension branch from e2591b4 to db539a1 Compare October 16, 2025 18:38
@maskit
Member

maskit commented Oct 17, 2025

It seems like the description was generated by AI, which is fine and sorry if it wasn't. Was the code generated by AI? If so, can you make sure that the code is compliant with the guideline from ASF?
https://www.apache.org/legal/generative-tooling.html
