Docker & Compose
Strata is designed to run in a containerized environment to ensure consistency across distributed nodes. The project provides a multi-stage Docker architecture that supports local development, simulated scaling, and production-grade deployments with S3 integration.
Quick Start (Development)
The fastest way to try Strata's coordination features is to use the default Docker Compose configuration, which launches the Rust coordinator, the React dashboard, and four simulated workers.
# Build and start the cluster
docker-compose up --build
Once the containers are healthy:
- Dashboard: http://localhost:3000
- Coordinator API: http://localhost:3001/api/status
- gRPC Interface: localhost:50051
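The text above does not say how to tell when the containers are healthy. One way to check is to poll the Coordinator API until it answers. The following is a minimal sketch, not part of Strata: wait_until is a hypothetical helper, and the usage line assumes that curl is installed and that the status endpoint returns a successful HTTP status once the coordinator is ready.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits with status 0
# or ATTEMPTS tries have been used up.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the probe command succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # Every attempt failed.
    return 1
}

# Example (assumes curl; the URL is the Coordinator API endpoint listed above):
#   wait_until 30 2 curl -fsS http://localhost:3001/api/status && echo "coordinator is up"
```

Passing the probe as a command keeps the helper independent of curl, so the same loop can wrap any other check.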
Production Deployment
For production environments, Strata switches to persistent storage (AWS S3) and optimized builds.
1. Environment Configuration
Create a .env file in the root directory to provide your cloud credentials:
# Storage Configuration
STORAGE_BACKEND=s3
CHECKPOINT_BUCKET=my-strata-checkpoints
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_key_id
AWS_SECRET_ACCESS_KEY=your_secret_key
# Logging
RUST_LOG=info
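A missing bucket name or credential tends to surface only after the containers start, so it can help to check the file first. The following is a sketch under stated assumptions: check_env_file is a hypothetical helper, it expects unquoted KEY=value lines as in the example above, and it treats static AWS credentials as required, which may not hold if your deployment obtains credentials another way.

```shell
#!/bin/sh
# check_env_file PATH
# Hypothetical pre-flight check: when the file selects STORAGE_BACKEND=s3,
# confirm that the bucket and credential variables have non-empty values.
check_env_file() {
    env_file=$1
    [ -f "$env_file" ] || { echo "missing file: $env_file" >&2; return 1; }

    # Nothing further to verify unless the S3 backend is selected.
    grep -q '^STORAGE_BACKEND=s3$' "$env_file" || return 0

    status=0
    for key in CHECKPOINT_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Require "KEY=" followed by at least one character on the same line.
        if ! grep -q "^${key}=." "$env_file"; then
            echo "missing or empty: $key" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_env_file .env && echo ".env looks complete"
```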
2. Launch with Production Profile
Start the stack from the production Compose file, which enables the S3 storage backend and disables the simulated demo data:
docker-compose -f docker-compose.prod.yml up -d
Service Architecture
The system is composed of three primary containerized components:
Coordinator (crates/coordinator)
The central brain of the system.
- Ports: 50051 (gRPC), 3001 (HTTP API).
- Role: Manages the worker registry, shard assignments, and barrier synchronization.
- Config: Driven by environment variables for storage backends.
Dashboard (dashboard/)
The visual monitoring interface.
- Ports: 3000.
- Role: Provides real-time visualization of worker health, throughput metrics, and dataset sharding status.
- Note: In Docker environments, VITE_API_URL is automatically configured to point to the coordinator service.
Workers (Distributed)
Simulated or real training nodes.
- Role: Connect to the coordinator via gRPC to request shards and signal checkpoint readiness.
- Scaling: You can scale the number of worker containers using Docker Compose:
docker-compose up --scale worker=10 -d
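When scaling from a script, it can be useful to reject a bad worker count before Compose starts any containers. The wrapper below is a sketch: scale_workers and the DRY_RUN switch are hypothetical. Using MAX_WORKERS as the upper bound is also an assumption, since this section does not say how the coordinator enforces that limit.

```shell
#!/bin/sh
# scale_workers COUNT
# Hypothetical wrapper: accepts only an integer between 1 and MAX_WORKERS
# (1000 if unset), then runs the scale command shown above.
# Set DRY_RUN=1 to print the command instead of running it.
scale_workers() {
    count=$1
    limit=${MAX_WORKERS:-1000}

    # Reject empty input and anything containing a non-digit character.
    case $count in
        ''|*[!0-9]*)
            echo "worker count must be a positive integer" >&2
            return 2
            ;;
    esac

    if [ "$count" -lt 1 ] || [ "$count" -gt "$limit" ]; then
        echo "worker count must be between 1 and $limit" >&2
        return 2
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker-compose up --scale worker=$count -d"
    else
        docker-compose up --scale worker="$count" -d
    fi
}

# Example:
#   DRY_RUN=1 scale_workers 10
```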
Configuration Reference
The following environment variables control the behavior of the Docker containers:
| Variable | Description | Allowed Values | Default |
|----------|-------------|----------------|---------|
| STORAGE_BACKEND | Type of storage for checkpoints | local, s3 | local |
| CHECKPOINT_BUCKET | S3 bucket name (required when STORAGE_BACKEND is s3) | String | - |
| AWS_REGION | AWS region for S3 storage | String | us-east-1 |
| RUST_LOG | Logging verbosity | debug, info, warn | info |
| MAX_WORKERS | Maximum number of workers the coordinator accepts | Integer | 1000 |
| HEARTBEAT_TIMEOUT | Seconds before a worker is marked dead | Integer | 30 |
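To see which values apply once the defaults from the table are filled in, a shell can print them with parameter expansion. This is an illustrative sketch only. print_effective_config is a hypothetical helper that mirrors the documented defaults, and treating an empty value the same as an unset one is an assumption about how the containers behave.

```shell
#!/bin/sh
# print_effective_config
# Hypothetical helper: prints each variable from the table, substituting the
# documented default when the variable is unset or empty.
# CHECKPOINT_BUCKET has no default, so it prints empty when not set.
print_effective_config() {
    echo "STORAGE_BACKEND=${STORAGE_BACKEND:-local}"
    echo "CHECKPOINT_BUCKET=${CHECKPOINT_BUCKET:-}"
    echo "AWS_REGION=${AWS_REGION:-us-east-1}"
    echo "RUST_LOG=${RUST_LOG:-info}"
    echo "MAX_WORKERS=${MAX_WORKERS:-1000}"
    echo "HEARTBEAT_TIMEOUT=${HEARTBEAT_TIMEOUT:-30}"
}

# Example:
#   STORAGE_BACKEND=s3 CHECKPOINT_BUCKET=my-strata-checkpoints print_effective_config
```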
Troubleshooting Containers
Viewing Logs
To debug synchronization issues or shard assignment logic:
# Follow logs from all services
docker-compose logs -f
# Follow coordinator logs specifically
docker-compose logs -f coordinator
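With several workers running, the combined log stream gets noisy. A small filter can narrow it to warnings and errors. This sketch assumes that log lines contain the uppercase level names WARN and ERROR, as common Rust logging crates print them. The actual Strata log format is not shown here, and only_problems is a hypothetical helper.

```shell
#!/bin/sh
# only_problems
# Hypothetical filter: reads log lines on standard input and keeps those
# that mention WARN or ERROR.
only_problems() {
    grep -E 'WARN|ERROR'
}

# Example (--no-color keeps colour escape codes out of the piped output):
#   docker-compose logs --no-color coordinator | only_problems
```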
Resetting the Environment
To stop the cluster and delete its local volumes (local checkpoints and database state), run the following. The deleted data cannot be recovered:
docker-compose down -v