# Performance Benchmarks
Strata is designed for high-performance distributed workloads where I/O throughput and coordination latency are critical. The project includes a suite of benchmarks using Criterion.rs to provide statistically significant performance measurements across various subsystems.
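For intuition, the core of what Criterion.rs automates can be sketched by hand: run the code under test many times and take a robust statistic over the samples. This harness is purely illustrative and not part of Strata; real benchmarks should use Criterion's `criterion_group!`/`criterion_main!` macros, which add warm-up, outlier detection, and statistical analysis.

```rust
use std::time::Instant;

// Hand-rolled sketch of what Criterion automates: time a closure over many
// iterations and report the median iteration time in nanoseconds.
fn median_iter_ns(mut f: impl FnMut(), iters: usize) -> u128 {
    let mut samples: Vec<u128> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    samples[samples.len() / 2] // median is robust against scheduler noise
}

fn main() {
    let mut acc: u64 = 0;
    // black_box prevents the compiler from optimizing the workload away.
    let ns = median_iter_ns(|| acc = acc.wrapping_add(std::hint::black_box(1)), 1_000);
    println!("median iteration: {ns} ns");
}
```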
## Running Benchmarks

Benchmarks are distributed across the workspace crates. You can run the entire suite or target specific components with `cargo bench`.

### Execute All Benchmarks

```bash
# Run all benchmarks in the workspace
cargo bench
```

### Target Specific Components

```bash
# Benchmark the coordinator's request handling
cargo bench -p coordinator

# Benchmark shard assignment algorithms
cargo bench -p data-shard

# Benchmark checkpoint I/O (Local vs S3)
cargo bench -p storage
```
## Key Benchmark Suites

### Checkpoint Throughput

These benchmarks measure the time taken to serialize model state and persist it to the configured storage backend.

| Metric        | Target             | Baseline Performance |
|---------------|--------------------|----------------------|
| Local Write   | NVMe SSD           | ~500 MB/s            |
| S3 Write      | AWS S3 (US-East-1) | ~200 MB/s            |
| Serialization | Protobuf/Zero-copy | ~1.2 GB/s            |
**Usage example:** to test throughput with a specific payload size:

```bash
# Benchmark writing 1 GB checkpoints
CHECKPOINT_SIZE_MB=1024 cargo bench -p checkpoint
```
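A rough sketch of what a throughput measurement like this computes. The `write_checkpoint` helper and the env-var handling here are illustrative, not the actual Strata API, and the in-memory `Vec` stands in for an NVMe file or S3 writer:

```rust
use std::io::{self, Write};
use std::time::Instant;

// Hypothetical helper: stream a checkpoint payload into any Write sink and
// return the number of bytes persisted.
fn write_checkpoint(payload: &[u8], sink: &mut impl Write) -> io::Result<u64> {
    sink.write_all(payload)?;
    sink.flush()?;
    Ok(payload.len() as u64)
}

fn throughput_mb_s(bytes: u64, secs: f64) -> f64 {
    (bytes as f64 / (1024.0 * 1024.0)) / secs
}

fn main() -> io::Result<()> {
    // Payload size in MB, mirroring the CHECKPOINT_SIZE_MB variable above.
    let size_mb: usize = std::env::var("CHECKPOINT_SIZE_MB")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(16);
    let payload = vec![0u8; size_mb * 1024 * 1024];
    let mut sink: Vec<u8> = Vec::with_capacity(payload.len()); // stand-in sink
    let start = Instant::now();
    let written = write_checkpoint(&payload, &mut sink)?;
    println!("{:.1} MB/s", throughput_mb_s(written, start.elapsed().as_secs_f64()));
    Ok(())
}
```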
### Coordinator Scalability

The coordinator benchmarks simulate high-frequency heartbeats and state requests to measure requests per second (RPS) and P99 latency.

- Capacity: 10,000+ RPS on a single coordinator instance.
- Barrier latency: measures the overhead of worker synchronization.
  - 100 workers: <50 ms P99.
  - 1,000 workers: <120 ms P99.
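The barrier-latency numbers measure how long a worker blocks waiting for stragglers. A minimal local analogy using `std::sync::Barrier` (the coordinator's real barrier is distributed over RPC, so this only illustrates the measurement, not the protocol):

```rust
use std::sync::{Arc, Barrier};
use std::thread;
use std::time::{Duration, Instant};

// Spawn `workers` threads that arrive at a barrier at staggered times and
// report how long each one blocked waiting for the others, in microseconds.
fn barrier_wait_micros(workers: usize) -> Vec<u128> {
    let barrier = Arc::new(Barrier::new(workers));
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || {
                // Simulate uneven worker progress: worker i arrives ~i ms late.
                thread::sleep(Duration::from_millis(i as u64));
                let start = Instant::now();
                b.wait(); // blocks until all `workers` threads have arrived
                start.elapsed().as_micros()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let waits = barrier_wait_micros(8);
    // The earliest arriver pays the largest synchronization cost.
    println!("max barrier wait: {} us", waits.iter().max().unwrap());
}
```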
### Shard Assignment & Consistent Hashing

These benchmarks evaluate the efficiency of the `data-shard` crate when redistributing shards during worker join/leave events.

- Initial assignment: <10 ms for 1,000 workers and 10,000 shards.
- Rebalance overhead: measures the fraction of shards moved during a node failure. Thanks to consistent hashing, this remains near the theoretical minimum of 1/N, where N is the number of workers.
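The ~1/N movement property is easy to demonstrate with a toy hash ring. This sketch is illustrative only; the `data-shard` crate's actual hashing and virtual-node strategy may differ:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(key: &T) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

// Toy consistent-hash ring: each worker owns `vnodes` points on the ring.
fn build_ring(workers: &[&str], vnodes: usize) -> BTreeMap<u64, String> {
    let mut ring = BTreeMap::new();
    for w in workers {
        for v in 0..vnodes {
            ring.insert(hash_of(&format!("{w}#{v}")), w.to_string());
        }
    }
    ring
}

// A shard maps to the first ring point at or after its own hash.
fn assign(ring: &BTreeMap<u64, String>, shard: u64) -> &str {
    let h = hash_of(&shard);
    ring.range(h..)
        .next()
        .or_else(|| ring.iter().next()) // wrap around the ring
        .map(|(_, w)| w.as_str())
        .unwrap()
}

// Fraction of shards whose owner changes when one worker fails.
fn moved_fraction(workers: &[&str], failed: &str, vnodes: usize, shards: u64) -> f64 {
    let before = build_ring(workers, vnodes);
    let survivors: Vec<&str> = workers.iter().copied().filter(|w| *w != failed).collect();
    let after = build_ring(&survivors, vnodes);
    let moved = (0..shards).filter(|s| assign(&before, *s) != assign(&after, *s)).count();
    moved as f64 / shards as f64
}

fn main() {
    let workers: Vec<String> = (0..10).map(|i| format!("worker-{i}")).collect();
    let refs: Vec<&str> = workers.iter().map(String::as_str).collect();
    // With 10 workers, only roughly 1/10 of shards should move on one failure.
    println!("moved: {:.3}", moved_fraction(&refs, "worker-0", 100, 10_000));
}
```

Only the shards owned by the failed worker are reassigned; every other shard's ring lookup is unaffected, which is exactly why the rebalance cost stays near 1/N.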
## Performance Profiling

For deep analysis of I/O bottlenecks or CPU usage during training, Strata supports standard Rust profiling tools.

### Flamegraphs

Generate a flamegraph to visualize where execution time goes in the async runtime:

```bash
# Install the flamegraph tool
cargo install flamegraph

# Run the coordinator benchmarks with profiling
cargo flamegraph --dev -p coordinator --bench request_latency
```
## S3 Performance Tuning

When benchmarking S3 throughput, performance is heavily influenced by your environment. For production-grade results:

- Run the benchmark on an EC2 instance in the same region as your S3 bucket.
- Check that the instance has sufficient network bandwidth (e.g., `m5n.4xlarge` or similar).
- Select the S3 backend with the `STORAGE_BACKEND=s3` environment variable:

```bash
STORAGE_BACKEND=s3 CHECKPOINT_BUCKET=my-bench-bucket cargo bench -p storage
```
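Inside a benchmark, backend selection from these variables might look like the following sketch. The enum and function names are illustrative, not the `storage` crate's actual API; the logic is kept as a pure function so it is testable without mutating the process environment:

```rust
use std::env;

#[derive(Debug, PartialEq)]
enum Backend {
    Local,
    S3 { bucket: String },
}

// `backend` and `bucket` mirror STORAGE_BACKEND / CHECKPOINT_BUCKET.
fn select_backend(backend: Option<&str>, bucket: Option<&str>) -> Result<Backend, String> {
    match backend {
        Some("s3") => {
            // S3 runs are meaningless without a target bucket: fail fast.
            let bucket = bucket.ok_or("STORAGE_BACKEND=s3 requires CHECKPOINT_BUCKET")?;
            Ok(Backend::S3 { bucket: bucket.to_string() })
        }
        Some("local") | None => Ok(Backend::Local),
        Some(other) => Err(format!("unknown STORAGE_BACKEND: {other}")),
    }
}

fn main() {
    let backend = select_backend(
        env::var("STORAGE_BACKEND").ok().as_deref(),
        env::var("CHECKPOINT_BUCKET").ok().as_deref(),
    );
    println!("{backend:?}");
}
```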
## Interpreting Results

After running `cargo bench`, Criterion generates an HTML report with detailed graphs and comparisons:

- Location: `target/criterion/report/index.html`
- Metrics: look for the mean execution time and throughput (MB/s) indicators.
- Regressions: Criterion automatically compares the current run against the last saved baseline (e.g., one recorded with `cargo bench -- --save-baseline main`) and highlights performance regressions in red.