Rewriting xxHash in Rust
A clean-room Rust reimplementation of xxHash: bit-exact parity across all four variants, NEON-accelerated XXH3, and CLI-level throughput comparable to the C reference on Apple Silicon.
Rewrite study · First benchmark pass
CLI-level throughput across four scenarios on Apple Silicon. The Rust implementation is comparable to the C reference on XXH64 at 16 MiB (ahead in these runs, but within run-to-run variance) and trails by about 8% on XXH3_128 at 1 MiB.
| Metric | Result |
|---|---|
| Parity | 508/508 tests |
| XXH64 16 MiB | Comparable to C |
| XXH3_128 1 MiB | ~8% behind C |
| Scenarios | 4 of 8 declared |
| Samples | Smoke-level (2 per tool) |
- Bit-exact output across XXH32, XXH64, XXH3_64, and XXH3_128 for all tested input lengths, seeds, and streaming patterns.
- NEON-optimized XXH3 long-input paths on Apple Silicon stay bit-exact with the scalar reference.
- On XXH64 at 16 MiB, the Rust and C implementations are comparable (~3,972 vs ~3,694 MB/s cross-run median CLI-level throughput).
- On XXH3_128 at 1 MiB, the C reference leads by about 8% (~448 vs ~414 MB/s).
- At 4 KiB payloads, all comparators converge to a similar throughput floor (~2 MB/s) that is dominated by process startup overhead.
I reimplemented the xxHash family of hash functions in Rust from the published specification, covering all four variants (XXH32, XXH64, XXH3_64, XXH3_128) plus a CLI tool with behavioral parity against the upstream xxhsum. Then I benchmarked the result against the C reference and two contrast comparators.
The short version: the Rust implementation produces bit-exact output for every variant and passes 508 parity tests. On CLI-level throughput, it is comparable to the C reference on XXH64 at 16 MiB (ahead in these runs, but within run-to-run variance) and trails by about 8% on XXH3_128 at 1 MiB. At small payloads, process startup dominates and all tools converge to a similar floor.
Correctness came first
Before touching benchmarks, I validated that the Rust hash core produces the same output as the C reference across every variant and edge case.
The parity suite covers 508 individual test points:
- Boundary-length vectors: lengths 0, 1, 3, 4, 8, 9, 16, 17, 128, 129, 240, 241, and larger long-input cases for each algorithm.
- Seeded variants: both default (seed 0) and non-zero seeds produce reference-compatible digests.
- Streaming equivalence: the reset/update/digest streaming API produces the same results as one-shot hashing across multiple chunking patterns. Repeated digest() calls on unchanged state return stable results, and update(A) → digest() → update(B) matches one-shot hashing on A || B.
All 508 tests pass at the measured revision. (evidence: parity_summary.json)
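The three kinds of check listed above can be sketched as a small harness. This is a minimal illustration, not the project's test suite: `ToyHasher` is a hypothetical stand-in (a seeded FNV-1a variant, not xxHash) so that the block is self-contained, and the function names are invented for the example. Only the shape of the checks is the point: boundary lengths, multiple seeds, several chunking patterns, a repeated `digest()`, and a digest taken between two updates.

```rust
/// Hypothetical stand-in hasher (seeded FNV-1a). NOT xxHash: it only gives
/// the harness something deterministic to exercise.
struct ToyHasher {
    state: u64,
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

impl ToyHasher {
    fn new(seed: u64) -> Self {
        ToyHasher { state: FNV_OFFSET ^ seed }
    }

    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state = (self.state ^ u64::from(byte)).wrapping_mul(FNV_PRIME);
        }
    }

    /// Non-consuming, so repeated calls and later updates stay valid.
    fn digest(&self) -> u64 {
        self.state
    }
}

fn one_shot(data: &[u8], seed: u64) -> u64 {
    let mut hasher = ToyHasher::new(seed);
    hasher.update(data);
    hasher.digest()
}

/// Deterministic test payload of the requested length.
fn payload(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i.wrapping_mul(31) ^ (i >> 3)) as u8).collect()
}

/// Runs every (length, seed, chunk size) combination and returns the number
/// of checks performed. Panics on the first mismatch. Chunk sizes must be > 0.
fn check_streaming_parity(lengths: &[usize], seeds: &[u64], chunk_sizes: &[usize]) -> usize {
    let mut checks = 0;
    for &len in lengths {
        let data = payload(len);
        for &seed in seeds {
            let expected = one_shot(&data, seed);
            for &chunk in chunk_sizes {
                let mut hasher = ToyHasher::new(seed);
                for piece in data.chunks(chunk) {
                    hasher.update(piece);
                }
                assert_eq!(hasher.digest(), expected, "len={} seed={} chunk={}", len, seed, chunk);
                // A repeated digest() on unchanged state must be stable.
                assert_eq!(hasher.digest(), expected);
                checks += 1;
            }
            // update(A) -> digest() -> update(B) must equal one-shot over A || B.
            let (a, b) = data.split_at(len / 2);
            let mut hasher = ToyHasher::new(seed);
            hasher.update(a);
            let _ = hasher.digest();
            hasher.update(b);
            assert_eq!(hasher.digest(), expected);
            checks += 1;
        }
    }
    checks
}
```

Calling `check_streaming_parity(&[0, 1, 3, 4, 8, 9, 16, 17, 128, 129, 240, 241], &[0, 0x9E3779B1], &[1, 7, 64])` walks the boundary lengths from the list above with two seeds and three chunk sizes. In a real suite the expected value would come from the C reference rather than from the same implementation's one-shot path.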
SIMD parity on Apple Silicon
On AArch64, the release build exercises NEON-optimized XXH3 long-input paths. These produce bit-exact output matching the scalar fallback for both XXH3_64 and XXH3_128, covering streaming variants, derived-secret paths, and both seeded and seed-0 inputs.
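One way to check this kind of parity is a differential test that feeds identical inputs to the scalar kernel and the vectorized kernel and compares accumulator state after every stripe. The sketch below is an assumption-heavy illustration: the scalar function is loosely modelled on the XXH3 stripe-accumulate step and is not taken from the xxhash-rs source, and `accumulate_paired` is a portable stand-in that mimics the two-lane data flow of a 128-bit SIMD register rather than using real NEON intrinsics, so the block runs on any host.

```rust
/// Reads the `lane`-th little-endian 64-bit word from a 64-byte stripe.
fn read_le64(bytes: &[u8; 64], lane: usize) -> u64 {
    let mut word = [0u8; 8];
    word.copy_from_slice(&bytes[8 * lane..8 * lane + 8]);
    u64::from_le_bytes(word)
}

/// Scalar reference: fold one 64-byte stripe into eight accumulators.
fn accumulate_scalar(acc: &mut [u64; 8], input: &[u8; 64], secret: &[u8; 64]) {
    for i in 0..8 {
        let data = read_le64(input, i);
        let key = data ^ read_le64(secret, i);
        // The raw data word is added to the neighbouring lane.
        acc[i ^ 1] = acc[i ^ 1].wrapping_add(data);
        // 32x32 -> 64-bit multiply of the two halves of the keyed word.
        acc[i] = acc[i].wrapping_add((key & 0xFFFF_FFFF).wrapping_mul(key >> 32));
    }
}

/// Portable stand-in for a SIMD kernel: processes lanes in pairs, the way a
/// 128-bit register holding two u64 lanes would (swap, add, multiply-accumulate).
fn accumulate_paired(acc: &mut [u64; 8], input: &[u8; 64], secret: &[u8; 64]) {
    for pair in 0..4 {
        let (lo, hi) = (2 * pair, 2 * pair + 1);
        let data = [read_le64(input, lo), read_le64(input, hi)];
        let key = [data[0] ^ read_le64(secret, lo), data[1] ^ read_le64(secret, hi)];
        for lane in 0..2 {
            let product = (key[lane] & 0xFFFF_FFFF).wrapping_mul(key[lane] >> 32);
            acc[lo + lane] = acc[lo + lane]
                .wrapping_add(data[1 - lane]) // swapped neighbour
                .wrapping_add(product);
        }
    }
}

/// Feeds `rounds` pseudo-random stripes to both kernels and reports whether
/// the accumulators stayed identical after every stripe.
fn kernels_agree(rounds: u32) -> bool {
    // xorshift64 generator: deterministic inputs without external crates.
    let mut state = 0x1234_5678_9ABC_DEF0u64;
    let mut next_byte = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        (state >> 24) as u8
    };
    let mut scalar_acc = [0u64; 8];
    let mut paired_acc = [0u64; 8];
    for _ in 0..rounds {
        let mut input = [0u8; 64];
        let mut secret = [0u8; 64];
        input.iter_mut().for_each(|b| *b = next_byte());
        secret.iter_mut().for_each(|b| *b = next_byte());
        accumulate_scalar(&mut scalar_acc, &input, &secret);
        accumulate_paired(&mut paired_acc, &input, &secret);
        if scalar_acc != paired_acc {
            return false;
        }
    }
    true
}
```

Comparing accumulators stripe by stripe, rather than only the final digest, points at the first stripe where the two paths diverge, which is usually easier to debug than a mismatched hash.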
CLI behavioral parity
The CLI tool achieves behavioral parity with the upstream xxhsum for the validated surface, which includes algorithm selection, seed support, file and stdin hashing, GNU and BSD output formats, little-endian output, escaped-filename handling, file-list input, and the full check-mode policy stack (--quiet, --status, --warn, --strict, --ignore-missing).
Parity is validated through direct output comparison: 31 algorithm-selection tests, 69 output-format tests, 53 input-flow tests, and 355 check-mode tests. (evidence: parity_summary.json)
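A direct output comparison of this kind can be sketched as follows. The names `Observed`, `observe`, and `first_divergence` are hypothetical, and the choice to compare only the exit code and stdout bytes is an assumption made for this example (stderr wording is deliberately left out); the project's actual harness may compare more or less than this.

```rust
use std::io;
use std::process::Command;

/// What one CLI invocation produced (hypothetical record for this sketch).
#[derive(Debug, PartialEq, Eq)]
struct Observed {
    exit_code: Option<i32>,
    stdout: Vec<u8>,
}

/// Runs `program` with `args` and captures its exit code and stdout.
fn observe(program: &str, args: &[&str]) -> io::Result<Observed> {
    let output = Command::new(program).args(args).output()?;
    Ok(Observed {
        exit_code: output.status.code(),
        stdout: output.stdout,
    })
}

/// Returns a description of the first difference, or None if the two
/// observations agree on exit code and stdout bytes.
fn first_divergence(reference: &Observed, subject: &Observed) -> Option<String> {
    if reference.exit_code != subject.exit_code {
        return Some(format!(
            "exit code differs: {:?} vs {:?}",
            reference.exit_code, subject.exit_code
        ));
    }
    if reference.stdout != subject.stdout {
        // Either a byte differs inside the common prefix, or one output is
        // a strict prefix of the other and they diverge at the shorter length.
        let at = reference
            .stdout
            .iter()
            .zip(&subject.stdout)
            .position(|(a, b)| a != b)
            .unwrap_or_else(|| reference.stdout.len().min(subject.stdout.len()));
        return Some(format!("stdout differs at byte {}", at));
    }
    None
}
```

A test case would then call `observe` once for each tool with the same arguments and assert that `first_divergence` returns `None`.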
Benchmark methodology
The benchmarks measure end-to-end CLI throughput: each comparator is invoked as an external process that reads a payload file and produces a digest on stdout. This captures the full cost of process startup, I/O, and hashing rather than isolating the hash function in a microbenchmark loop.
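As a rough sketch of what one timed invocation looks like, the block below spawns a tool, waits for it to exit, and converts the wall-clock time into throughput. The function names are hypothetical, and the unit is an assumption: the block takes 1 MB to mean 1,000,000 bytes, which may not match the definition used in the evidence pack.

```rust
use std::io;
use std::process::Command;
use std::time::{Duration, Instant};

/// Throughput in MB/s, assuming 1 MB = 1_000_000 bytes (an assumption of
/// this sketch, not a statement about the study's unit).
fn throughput_mb_per_s(payload_bytes: u64, elapsed: Duration) -> f64 {
    payload_bytes as f64 / 1_000_000.0 / elapsed.as_secs_f64()
}

/// Runs `program args...` once and returns its stdout plus the wall-clock
/// time from spawn to exit. That interval includes process startup, file
/// I/O, hashing, and output formatting.
fn run_once(program: &str, args: &[&str]) -> io::Result<(String, Duration)> {
    let start = Instant::now();
    let output = Command::new(program).args(args).output()?;
    let elapsed = start.elapsed();
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", program, output.status),
        ));
    }
    Ok((String::from_utf8_lossy(&output.stdout).into_owned(), elapsed))
}
```

Because the timer wraps the whole child process, a hash that takes microseconds can still report a few milliseconds, which is why the small-payload scenarios say little about the hash itself.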
Comparators
| ID | Role | Version |
|---|---|---|
| c_xxhsum | Reference | xxhsum 0.8.3 (Yann Collet) |
| rust_xxhash_rs | Subject | xxhash-rs 0.1.0 |
| b3sum | Contrast | b3sum 1.8.3 |
| md5 | Contrast | macOS system /sbin/md5 |
c_xxhsum and rust_xxhash_rs are parity oracles: the harness verifies that they produce the same digest before accepting timing samples. b3sum and md5 provide throughput context from different hash families.
Scenarios
| Scenario | Algorithm | Payload |
|---|---|---|
| xxh64-4k | XXH64 | 4 KiB |
| xxh64-1m | XXH64 | 1 MiB |
| xxh64-16m | XXH64 | 16 MiB |
| xxh3-128-1m | XXH3_128 | 1 MiB |
Each scenario uses warmup iterations (discarded) followed by measured iterations. The summary statistic is the median of measured samples. A hard correctness gate ensures c_xxhsum and rust_xxhash_rs agree on the output digest before timing results are accepted. (evidence: benchmark_summary.json)
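The warmup, median, and correctness-gate rules can be sketched in a few lines. This is an illustration with hypothetical names, not the harness itself. It assumes the conventional median definition, under which an even number of samples yields the mean of the two middle values, so with two measured iterations the median is simply the average of the two.

```rust
/// Median of the samples, or None if there are none. For an even count this
/// averages the two middle values (the conventional definition, assumed here).
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Applies the correctness gate, discards the first `warmup` samples, and
/// returns the median of what remains.
fn gated_median(
    reference_digest: &str,
    subject_digest: &str,
    all_samples: &[f64],
    warmup: usize,
) -> Result<f64, String> {
    // Hard gate: timing is meaningless if the two tools disagree on the digest.
    if reference_digest != subject_digest {
        return Err(format!(
            "digest mismatch: reference={} subject={}",
            reference_digest, subject_digest
        ));
    }
    // Warmup iterations are discarded before the summary statistic is taken.
    let measured = all_samples.get(warmup..).unwrap_or(&[]);
    median(measured).ok_or_else(|| "no measured samples after warmup".to_string())
}
```

Returning an error instead of a number when the digests differ means a mismatched run cannot leak into a results table by accident.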
Results
XXH64 at 16 MiB
At this payload size, process startup is a small fraction of total time, and the numbers primarily reflect hash throughput.
| Comparator | Median throughput |
|---|---|
| c_xxhsum | ~3,694 MB/s |
| rust_xxhash_rs | ~3,972 MB/s |
| b3sum | ~3,965 MB/s |
| md5 | ~532 MB/s |
The Rust implementation, the C reference, and b3sum (BLAKE3) all land in the same range (~3.7–4.0 GB/s) on the 16 MiB payload, while MD5 trails at ~532 MB/s. The Rust and C xxHash numbers are close enough that run-to-run variance could change their relative order.
XXH3_128 at 1 MiB
| Comparator | Median throughput |
|---|---|
| c_xxhsum | ~448 MB/s |
| rust_xxhash_rs | ~414 MB/s |
| b3sum | ~333 MB/s |
| md5 | ~272 MB/s |
For XXH3_128 at 1 MiB, the C reference leads the Rust implementation by about 8% (~448 vs ~414 MB/s). Both C and Rust NEON-optimized XXH3 paths are exercised on this Apple Silicon host.
XXH64 at 1 MiB
| Comparator | Median throughput |
|---|---|
| c_xxhsum | ~565 MB/s |
| rust_xxhash_rs | ~472 MB/s |
| b3sum | ~424 MB/s |
| md5 | ~306 MB/s |
At 1 MiB, process startup is a larger fraction of measured time. The C reference leads the Rust implementation by about 16% (~565 vs ~472 MB/s), though some of that gap reflects startup and I/O variance rather than pure hash throughput differences.
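The percentage gaps quoted for the two 1 MiB scenarios depend on which tool is used as the baseline. The short sketch below reproduces the arithmetic from the rounded medians in the tables, so the results are only as precise as those rounded inputs. The function names are hypothetical.

```rust
/// How far the trailing tool falls short, as a percentage of the leader.
fn shortfall_percent(leader_mb_s: f64, trailer_mb_s: f64) -> f64 {
    (leader_mb_s - trailer_mb_s) / leader_mb_s * 100.0
}

/// How far the leading tool is ahead, as a percentage of the trailer.
fn lead_percent(leader_mb_s: f64, trailer_mb_s: f64) -> f64 {
    (leader_mb_s - trailer_mb_s) / trailer_mb_s * 100.0
}
```

With the rounded medians, `shortfall_percent(448.0, 414.0)` is about 7.6 and `shortfall_percent(565.0, 472.0)` is about 16.5, which matches the "about 8%" and "about 16%" figures. Measured against the Rust numbers instead, `lead_percent` gives about 8.2 and 19.7, so the 16% figure reads most naturally as "Rust is about 16% slower than C".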
XXH64 at 4 KiB
| Comparator | Median throughput |
|---|---|
| c_xxhsum | ~2.2 MB/s |
| rust_xxhash_rs | ~2.0 MB/s |
| b3sum | ~1.7 MB/s |
| md5 | ~2.4 MB/s |
At 4 KiB, process startup overwhelms the hash computation. All comparators converge to a similar throughput floor (~2 MB/s). These numbers say nothing about hash performance and are included only to illustrate the startup-dominated regime.
Interpretation
The CLI-level benchmarks show that xxhash-rs delivers throughput in the same range as the C reference across the measured scenarios. On the largest payload (XXH64 at 16 MiB), the two are comparable. On XXH3_128 at 1 MiB and XXH64 at 1 MiB, the C reference leads by 8–16%, though process startup, file I/O, and output formatting contribute fixed overhead that compresses the apparent gap at smaller payloads.
Applications that embed the hash library directly would see higher throughput from both implementations, with the fixed startup cost removed.
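The effect of a fixed per-run cost can be illustrated with a simple model: total time is a constant overhead plus payload size divided by the raw hash rate. The sketch below uses hypothetical names, and the figures suggested for it afterwards are invented for illustration, not measurements from this study.

```rust
/// Effective throughput (bytes/s) when every run pays a fixed overhead
/// on top of hashing at `raw_rate` bytes/s.
fn effective_rate(payload_bytes: f64, fixed_overhead_s: f64, raw_rate: f64) -> f64 {
    payload_bytes / (fixed_overhead_s + payload_bytes / raw_rate)
}

/// Ratio of effective throughputs (fast over slow) that a CLI-level
/// benchmark would observe for two hashers sharing the same overhead.
fn apparent_ratio(
    payload_bytes: f64,
    fixed_overhead_s: f64,
    fast_rate: f64,
    slow_rate: f64,
) -> f64 {
    effective_rate(payload_bytes, fixed_overhead_s, fast_rate)
        / effective_rate(payload_bytes, fixed_overhead_s, slow_rate)
}
```

With a made-up 2 ms overhead and made-up raw rates of 5 GB/s and 4 GB/s (a true ratio of 1.25), `apparent_ratio(4096.0, 0.002, 5e9, 4e9)` is within 0.1% of 1.0, while `apparent_ratio(1073741824.0, 0.002, 5e9, 4e9)` comes out just under 1.25. Under this model the measured gap only approaches the library-level gap once the payload is large enough for hashing time to outweigh the fixed cost.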
Limitations
- Single-platform benchmarks. All measurements were taken on a single Apple Silicon host (arm64, macOS). Performance on x86_64 may differ, particularly for SIMD-accelerated XXH3 paths where SSE2/AVX2 code paths have not been benchmarked.
- CLI-level measurement. Process startup overhead dominates at small payload sizes and partially masks hash throughput differences at medium sizes.
- Smoke-level sample counts. The pinned runs use 2 measured iterations per comparator per scenario. A production-grade study would use higher sample counts for tighter confidence intervals.
- Subset of declared scenarios. The evidence pack covers 4 of the 8 declared benchmark scenarios. The remaining scenarios are declared in the manifest but not included in the pinned runs.
- Validated CLI surface only. Features outside the validated surface (for example, the upstream --benchmark mode) are not implemented or tested.
- No production deployment evidence. The parity and benchmark evidence demonstrates correctness and baseline performance, not production readiness.
Licensing and clean-room boundary
This is a clean-room reimplementation. The hash algorithms were implemented from the published xxHash specification and the BSD-2-Clause-licensed reference library material. The CLI achieves behavioral compatibility through black-box observation of the upstream xxhsum tool, without translating or copying any GPL-licensed source code.
The upstream project has two license regimes: BSD-2-Clause for the xxHash library and specification (freely usable, informed the Rust hash core), and GPLv2 for the xxhsum CLI tool (treated as an external behavioral oracle only). No xxhsum source files, help text, error messages, or implementation logic were incorporated into this repository.
xxHash was created by Yann Collet. The Rust reimplementation is released under the MIT OR Apache-2.0 dual license.
Reproducibility
The measured revision for all evidence is evidence-v1.
    git clone https://github.com/sagaragas/xxhash-rs.git
    cd xxhash-rs
    git checkout evidence-v1
    cargo build --workspace --release
    cargo test --workspace --all-targets -- --test-threads=3
    python3 publication/claim_map.py --verify
    python3 publication/traceability_check.py
The evidence pack is committed under publication/evidence/ and includes parity test results, benchmark summaries with correctness gate outcomes, raw timing samples for three pinned claim-ready runs, and a claim-to-evidence map.