# Performance Benchmarks
Strata is designed for high-performance distributed workloads where I/O throughput and coordination latency are critical. The project includes a suite of benchmarks using Criterion.rs to provide statistically significant performance measurements across various subsystems.
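For intuition, the core of what Criterion.rs automates can be sketched by hand: run the code under test many times and take a robust statistic over the samples. This harness is purely illustrative and not part of Strata; real benchmarks should use Criterion's `criterion_group!`/`criterion_main!` macros, which add warm-up, outlier detection, and statistical analysis.

```rust
use std::time::Instant;

// Hand-rolled sketch of what Criterion automates: time a closure over many
// iterations and report the median iteration time in nanoseconds.
fn median_iter_ns(mut f: impl FnMut(), iters: usize) -> u128 {
    let mut samples: Vec<u128> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    samples[samples.len() / 2] // median is robust against scheduler noise
}

fn main() {
    let mut acc: u64 = 0;
    // black_box prevents the compiler from optimizing the workload away.
    let ns = median_iter_ns(|| acc = acc.wrapping_add(std::hint::black_box(1)), 1_000);
    println!("median iteration: {ns} ns");
}
```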
## Running Benchmarks

Benchmarks are distributed across the workspace crates. You can run the entire suite or target specific components with `cargo bench`.

### Execute All Benchmarks

```bash
# Run all benchmarks in the workspace
cargo bench
```

### Target Specific Components

```bash
# Benchmark the coordinator's request handling
cargo bench -p coordinator

# Benchmark shard assignment algorithms
cargo bench -p data-shard

# Benchmark checkpoint I/O (Local vs S3)
cargo bench -p storage
```
## Key Benchmark Suites

### Checkpoint Throughput

These benchmarks measure the time taken to serialize model state and persist it to the configured storage backend.

| Metric        | Target             | Baseline Performance |
|---------------|--------------------|----------------------|
| Local Write   | NVMe SSD           | ~500 MB/s            |
| S3 Write      | AWS S3 (US-East-1) | ~200 MB/s            |
| Serialization | Protobuf/Zero-copy | ~1.2 GB/s            |
**Usage example:** to test throughput with a specific payload size:

```bash
# Benchmark writing 1 GB checkpoints
CHECKPOINT_SIZE_MB=1024 cargo bench -p checkpoint
```
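A rough sketch of what a throughput measurement like this computes. The `write_checkpoint` helper and the env-var handling here are illustrative, not the actual Strata API, and the in-memory `Vec` stands in for an NVMe file or S3 writer:

```rust
use std::io::{self, Write};
use std::time::Instant;

// Hypothetical helper: stream a checkpoint payload into any Write sink and
// return the number of bytes persisted.
fn write_checkpoint(payload: &[u8], sink: &mut impl Write) -> io::Result<u64> {
    sink.write_all(payload)?;
    sink.flush()?;
    Ok(payload.len() as u64)
}

fn throughput_mb_s(bytes: u64, secs: f64) -> f64 {
    (bytes as f64 / (1024.0 * 1024.0)) / secs
}

fn main() -> io::Result<()> {
    // Payload size in MB, mirroring the CHECKPOINT_SIZE_MB variable above.
    let size_mb: usize = std::env::var("CHECKPOINT_SIZE_MB")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(16);
    let payload = vec![0u8; size_mb * 1024 * 1024];
    let mut sink: Vec<u8> = Vec::with_capacity(payload.len()); // stand-in sink
    let start = Instant::now();
    let written = write_checkpoint(&payload, &mut sink)?;
    println!("{:.1} MB/s", throughput_mb_s(written, start.elapsed().as_secs_f64()));
    Ok(())
}
```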
### Coordinator Scalability

The coordinator benchmarks simulate high-frequency heartbeats and state requests to measure requests per second (RPS) and P99 latency.

- Capacity: 10,000+ RPS on a single coordinator instance.
- Barrier latency: measures the overhead of worker synchronization.
  - 100 workers: <50 ms P99.
  - 1,000 workers: <120 ms P99.
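The barrier-latency numbers measure how long a worker blocks waiting for stragglers. A minimal local analogy using `std::sync::Barrier` (the coordinator's real barrier is distributed over RPC, so this only illustrates the measurement, not the protocol):

```rust
use std::sync::{Arc, Barrier};
use std::thread;
use std::time::{Duration, Instant};

// Spawn `workers` threads that arrive at a barrier at staggered times and
// report how long each one blocked waiting for the others, in microseconds.
fn barrier_wait_micros(workers: usize) -> Vec<u128> {
    let barrier = Arc::new(Barrier::new(workers));
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || {
                // Simulate uneven worker progress: worker i arrives ~i ms late.
                thread::sleep(Duration::from_millis(i as u64));
                let start = Instant::now();
                b.wait(); // blocks until all `workers` threads have arrived
                start.elapsed().as_micros()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let waits = barrier_wait_micros(8);
    // The earliest arriver pays the largest synchronization cost.
    println!("max barrier wait: {} us", waits.iter().max().unwrap());
}
```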
### Shard Assignment & Consistent Hashing

These benchmarks evaluate the efficiency of the `data-shard` crate when redistributing shards during worker join/leave events.

- Initial assignment: <10 ms for 1,000 workers and 10,000 shards.
- Rebalance overhead: measures the fraction of shards moved during a node failure. Thanks to consistent hashing, this remains near the theoretical minimum of 1/N, where N is the number of workers.
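The ~1/N movement property is easy to demonstrate with a toy hash ring. This sketch is illustrative only; the `data-shard` crate's actual hashing and virtual-node strategy may differ:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(key: &T) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

// Toy consistent-hash ring: each worker owns `vnodes` points on the ring.
fn build_ring(workers: &[&str], vnodes: usize) -> BTreeMap<u64, String> {
    let mut ring = BTreeMap::new();
    for w in workers {
        for v in 0..vnodes {
            ring.insert(hash_of(&format!("{w}#{v}")), w.to_string());
        }
    }
    ring
}

// A shard maps to the first ring point at or after its own hash.
fn assign(ring: &BTreeMap<u64, String>, shard: u64) -> &str {
    let h = hash_of(&shard);
    ring.range(h..)
        .next()
        .or_else(|| ring.iter().next()) // wrap around the ring
        .map(|(_, w)| w.as_str())
        .unwrap()
}

// Fraction of shards whose owner changes when one worker fails.
fn moved_fraction(workers: &[&str], failed: &str, vnodes: usize, shards: u64) -> f64 {
    let before = build_ring(workers, vnodes);
    let survivors: Vec<&str> = workers.iter().copied().filter(|w| *w != failed).collect();
    let after = build_ring(&survivors, vnodes);
    let moved = (0..shards).filter(|s| assign(&before, *s) != assign(&after, *s)).count();
    moved as f64 / shards as f64
}

fn main() {
    let workers: Vec<String> = (0..10).map(|i| format!("worker-{i}")).collect();
    let refs: Vec<&str> = workers.iter().map(String::as_str).collect();
    // With 10 workers, only roughly 1/10 of shards should move on one failure.
    println!("moved: {:.3}", moved_fraction(&refs, "worker-0", 100, 10_000));
}
```

Only the shards owned by the failed worker are reassigned; every other shard's ring lookup is unaffected, which is exactly why the rebalance cost stays near 1/N.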
## Performance Profiling

For deep analysis of I/O bottlenecks or CPU usage during training, Strata supports standard Rust profiling tools.

### Flamegraphs

Generate a flamegraph to visualize where execution time goes in the async runtime:

```bash
# Install the flamegraph tool
cargo install flamegraph

# Run the coordinator benchmarks with profiling
cargo flamegraph --dev -p coordinator --bench request_latency
```
## S3 Performance Tuning

When benchmarking S3 throughput, performance is heavily influenced by your environment. For production-grade results:

- Run the benchmark on an EC2 instance in the same region as your S3 bucket.
- Check that the instance has sufficient network bandwidth (e.g., `m5n.4xlarge` or similar).
- Select the S3 backend with the `STORAGE_BACKEND=s3` environment variable:

```bash
STORAGE_BACKEND=s3 CHECKPOINT_BUCKET=my-bench-bucket cargo bench -p storage
```
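Inside a benchmark, backend selection from these variables might look like the following sketch. The enum and function names are illustrative, not the `storage` crate's actual API; the logic is kept as a pure function so it is testable without mutating the process environment:

```rust
use std::env;

#[derive(Debug, PartialEq)]
enum Backend {
    Local,
    S3 { bucket: String },
}

// `backend` and `bucket` mirror STORAGE_BACKEND / CHECKPOINT_BUCKET.
fn select_backend(backend: Option<&str>, bucket: Option<&str>) -> Result<Backend, String> {
    match backend {
        Some("s3") => {
            // S3 runs are meaningless without a target bucket: fail fast.
            let bucket = bucket.ok_or("STORAGE_BACKEND=s3 requires CHECKPOINT_BUCKET")?;
            Ok(Backend::S3 { bucket: bucket.to_string() })
        }
        Some("local") | None => Ok(Backend::Local),
        Some(other) => Err(format!("unknown STORAGE_BACKEND: {other}")),
    }
}

fn main() {
    let backend = select_backend(
        env::var("STORAGE_BACKEND").ok().as_deref(),
        env::var("CHECKPOINT_BUCKET").ok().as_deref(),
    );
    println!("{backend:?}");
}
```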
## Interpreting Results

After running `cargo bench`, Criterion generates an HTML report with detailed graphs and comparisons:

- Location: `target/criterion/report/index.html`
- Metrics: look for the mean execution time and throughput (MB/s) indicators.
- Regressions: Criterion automatically compares the current run against the last saved baseline (e.g., one recorded with `cargo bench -- --save-baseline main`) and highlights performance regressions in red.