# Consistent Hashing & Sharding

## Overview
Strata's sharding strategy is built on consistent hashing and epoch coordination. This keeps data distribution stable as the cluster scales or workers fail, minimizing unnecessary data movement and guaranteeing deterministic access patterns across training runs.

The core logic resides in the `data-shard` crate, which provides a high-level `ShardManager` to coordinate assignments between workers and datasets.
## Shard Management

The `ShardManager` is the primary interface for managing how datasets are sliced into shards and distributed across active workers.
### Basic Usage
To distribute data, you first register workers and dataset parameters. The manager then computes assignments based on the current cluster state.
```rust
use data_shard::ShardManager;

// 1. Initialize the manager
let manager = ShardManager::new();

// 2. Register active workers
manager.register_worker("worker-0");
manager.register_worker("worker-1");

// 3. Define a dataset
// Parameters: ID, total_samples, shard_size, shuffle, seed
manager.register_dataset_params(
    "imagenet-train",
    1_281_167,
    10_000,
    true,
    42,
);

// 4. Retrieve assignments for a specific worker and epoch
let assignments = manager.get_shard_for_worker("imagenet-train", "worker-0", 0);
```
### Dataset Registration Parameters

When registering a dataset via `register_dataset_params`, the following inputs are required:
| Parameter | Type | Description |
| :--- | :--- | :--- |
| dataset_id | DatasetId | Unique identifier for the dataset. |
| total_samples | u64 | Total number of records/images in the dataset. |
| shard_size | u64 | Number of samples per individual shard. |
| shuffle | bool | Whether to enable deterministic shuffling. |
| seed | u64 | Random seed for reproducible shuffling logic. |
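Together, `total_samples` and `shard_size` determine the shard count by ceiling division. A quick check for the ImageNet example above (the `shard_count` helper here is illustrative, not part of the `data-shard` API):

```rust
/// Ceiling division: how many shards a dataset needs.
fn shard_count(total_samples: u64, shard_size: u64) -> u64 {
    (total_samples + shard_size - 1) / shard_size
}

fn main() {
    // 1,281,167 samples in 10,000-sample shards → 128 full shards
    // plus one partial shard holding the remaining 1,167 samples.
    assert_eq!(shard_count(1_281_167, 10_000), 129);
}
```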
## Consistent Hashing
Unlike traditional modulo-based sharding (`shard_id % num_workers`), Strata uses consistent hashing with virtual nodes. This provides two critical benefits:
- Stability: When a worker is added or removed, only $1/N$ of the shards are reassigned on average.
- Load Balancing: Virtual nodes ensure that shards are distributed evenly even when worker counts are small.
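The idea can be sketched with a minimal hash ring built on an ordered map: each worker contributes many points (virtual nodes) on the ring, and a shard is owned by the first point at or after its hash. This is a simplified stand-in, not Strata's actual implementation; the `Ring` type, the virtual-node count, and the use of `DefaultHasher` are all assumptions here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Minimal consistent-hash ring with virtual nodes.
struct Ring {
    /// Maps a point on the ring to the worker that owns it.
    points: BTreeMap<u64, String>,
    vnodes: u32,
}

impl Ring {
    fn new(vnodes: u32) -> Self {
        Ring { points: BTreeMap::new(), vnodes }
    }

    fn hash<T: Hash>(item: &T) -> u64 {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        h.finish()
    }

    /// Each worker is inserted at `vnodes` pseudo-random ring positions,
    /// which evens out load even with few workers.
    fn add_worker(&mut self, worker: &str) {
        for v in 0..self.vnodes {
            self.points.insert(Self::hash(&(worker, v)), worker.to_string());
        }
    }

    fn remove_worker(&mut self, worker: &str) {
        for v in 0..self.vnodes {
            self.points.remove(&Self::hash(&(worker, v)));
        }
    }

    /// The owner of a shard is the first ring point at or after its hash,
    /// wrapping around to the start of the ring.
    fn owner(&self, shard_id: u64) -> Option<&str> {
        let h = Self::hash(&shard_id);
        self.points
            .range(h..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, w)| w.as_str())
    }
}

fn main() {
    let mut ring = Ring::new(64);
    ring.add_worker("worker-0");
    ring.add_worker("worker-1");
    // Every shard maps deterministically to exactly one worker.
    for shard in 0..10u64 {
        println!("shard {} -> {}", shard, ring.owner(shard).unwrap());
    }
}
```

Because ownership depends only on hash positions, lookups are deterministic across processes, which is what makes the same mapping reproducible on every worker.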
## Fault Tolerance & Rebalancing

If a worker's heartbeats time out at the coordinator, the `ShardManager` rebalances its shards. Existing workers typically retain over 80% of their previously assigned shards, allowing them to continue training with minimal cache invalidation.
```rust
// Simulate a worker failure
manager.remove_worker("worker-1");

// Trigger a global rebalance of all registered datasets
let new_assignments = manager.rebalance_shards();
```
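The retention figure follows directly from the consistent-hashing property: removing one of N workers only reassigns the shards that hashed to that worker's ring points. The following self-contained sketch measures this on a toy ring (the `build_ring`/`owner` helpers, worker names, and virtual-node count are all illustrative assumptions, not `data-shard` APIs):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash<T: Hash>(item: &T) -> u64 {
    let mut h = DefaultHasher::new();
    item.hash(&mut h);
    h.finish()
}

/// Build a ring of (point -> worker) with `vnodes` virtual nodes per worker.
fn build_ring(workers: &[&str], vnodes: u32) -> BTreeMap<u64, String> {
    let mut ring = BTreeMap::new();
    for w in workers {
        for v in 0..vnodes {
            ring.insert(hash(&(w, v)), w.to_string());
        }
    }
    ring
}

/// First ring point at or after the shard's hash, wrapping around.
fn owner(ring: &BTreeMap<u64, String>, shard: u64) -> String {
    let h = hash(&shard);
    ring.range(h..)
        .next()
        .or_else(|| ring.iter().next())
        .map(|(_, w)| w.clone())
        .unwrap()
}

fn main() {
    let before = build_ring(&["w0", "w1", "w2", "w3", "w4"], 100);
    let after = build_ring(&["w0", "w1", "w2", "w3"], 100); // w4 failed

    let total: usize = 10_000;
    let moved = (0..total as u64)
        .filter(|s| owner(&before, *s) != owner(&after, *s))
        .count();

    // Only shards owned by the failed worker move; every other shard's
    // nearest ring point is unchanged, so its owner is unchanged.
    println!("moved {moved} of {total} shards");
    assert!(moved > 0);
    assert!(moved < total * 2 / 5); // roughly 1/5 expected with 5 workers
}
```

With five workers, roughly a fifth of the shards move, which matches the "over 80% retained" behavior described above.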
## Epoch Coordination

Strata ensures that data is shuffled deterministically across epochs. While consistent hashing handles *where* a shard goes, the `EpochCoordinator` handles *which* samples are in which shard for a given epoch.
- Deterministic Shuffling: By providing an epoch index to `get_shard_for_worker`, the system generates a stable permutation of shards for that specific epoch.
- Global Sync: Every worker in the cluster sees the same shard-to-worker mapping for a given epoch, preventing overlapping data processing.
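One way such a stable per-epoch permutation can be derived is to seed a deterministic PRNG from the dataset seed and the epoch index, then apply a Fisher–Yates shuffle. This is a sketch of the technique, not the crate's actual scheme; `epoch_permutation`, the SplitMix64 generator, and the seed-mixing constant are all assumptions here.

```rust
/// SplitMix64: a tiny deterministic PRNG (public-domain algorithm).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Deterministic Fisher–Yates shuffle of shard indices, keyed by (seed, epoch).
fn epoch_permutation(num_shards: u64, seed: u64, epoch: u64) -> Vec<u64> {
    // Mix the epoch into the seed so each epoch yields a fresh permutation.
    let mut rng = SplitMix64(seed ^ epoch.wrapping_mul(0xA24B_AED4_963E_E407));
    let mut shards: Vec<u64> = (0..num_shards).collect();
    for i in (1..shards.len()).rev() {
        let j = (rng.next() % (i as u64 + 1)) as usize;
        shards.swap(i, j);
    }
    shards
}

fn main() {
    // Every worker computing (seed=42, epoch=0) derives the same order,
    // with no communication needed beyond agreeing on the epoch index.
    assert_eq!(epoch_permutation(129, 42, 0), epoch_permutation(129, 42, 0));
    // Advancing the epoch produces a different permutation.
    assert_ne!(epoch_permutation(129, 42, 0), epoch_permutation(129, 42, 1));
}
```

Because the permutation is a pure function of `(seed, epoch)`, global sync reduces to agreeing on the current epoch number.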
### Advancing Epochs
When the training loop completes a pass over the data, the coordinator advances the epoch to trigger a new permutation:
```rust
// Advance the dataset state to the next epoch
manager.advance_epoch("imagenet-train");
```
## Data Types

The following types are used across the sharding and coordinator APIs:

### ShardAssignment
Returned when querying worker tasks.
| Field | Type | Description |
| :--- | :--- | :--- |
| dataset_id | String | The ID of the parent dataset. |
| shard_id | u64 | The unique index of the shard. |
| start_index | u64 | The inclusive start sample index. |
| end_index | u64 | The exclusive end sample index. |
| file_paths | Vec<String> | The actual data files associated with this shard. |
| epoch | u64 | The epoch this assignment is valid for. |
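The table above maps naturally onto a plain struct. The following is a hypothetical shape based only on the documented fields, not the crate's actual definition (the `len` helper is likewise an illustration):

```rust
/// Hypothetical shape of a shard assignment, mirroring the fields above.
#[derive(Debug, Clone, PartialEq)]
struct ShardAssignment {
    dataset_id: String,
    shard_id: u64,
    start_index: u64, // inclusive
    end_index: u64,   // exclusive
    file_paths: Vec<String>,
    epoch: u64,
}

impl ShardAssignment {
    /// Number of samples covered by this shard (end is exclusive).
    fn len(&self) -> u64 {
        self.end_index - self.start_index
    }
}

fn main() {
    let a = ShardAssignment {
        dataset_id: "imagenet-train".into(),
        shard_id: 128,
        start_index: 1_280_000,
        end_index: 1_281_167, // final, partial shard of the 1,281,167-sample set
        file_paths: vec!["example-shard.tar".into()],
        epoch: 0,
    };
    assert_eq!(a.len(), 1_167);
}
```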
## Performance Characteristics
- Assignment Latency: <10ms for 1,000 workers and 10,000 shards.
- Scale: Tested to support clusters up to 1,000 workers with negligible coordinator overhead.
- Memory: Uses `DashMap` for thread-safe, concurrent access to shard state without global locks.