declarative task definition with type-safe sdk
Defines workflow tasks using a TypeScript-first SDK that compiles task definitions into a schema-aware registry, enabling static type checking across task inputs/outputs and automatic serialization of complex types. The Task Definition API creates tasks as first-class objects with built-in support for retries, timeouts, and concurrency limits, stored in a workerCatalog that the run engine references during execution.
Unique: Uses a monorepo-based build system (Turborepo) with task schema compilation that generates a workerCatalog at build time, enabling the run engine to validate task invocations against pre-compiled schemas rather than runtime reflection or JSON schema validation
vs alternatives: Stronger type safety than Temporal or Airflow because task contracts are validated at TypeScript compile time, not runtime, catching integration bugs before deployment
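A minimal sketch of how a typed task definition and a catalog lookup might fit together. The names `defineTask`, `TaskDefinition`, `RetryPolicy`, and `catalogSketch` are hypothetical stand-ins chosen for illustration, not the SDK's actual API, and the catalog here is filled at module load rather than at build time.

```typescript
// Hypothetical shapes for illustration; not the SDK's real API.
interface RetryPolicy {
  maxAttempts: number;
}

interface TaskDefinition<In, Out> {
  id: string;
  retry: RetryPolicy;
  timeoutMs: number;
  concurrencyLimit: number;
  run: (payload: In) => Out;
}

// Stand-in for the workerCatalog: task id -> definition.
// `any` is used only so tasks with different payload types can share one map.
const catalogSketch = new Map<string, TaskDefinition<any, any>>();

function defineTask<In, Out>(def: TaskDefinition<In, Out>): TaskDefinition<In, Out> {
  if (catalogSketch.has(def.id)) {
    throw new Error(`duplicate task id: ${def.id}`);
  }
  catalogSketch.set(def.id, def);
  return def;
}

const resizeImage = defineTask({
  id: "resize-image",
  retry: { maxAttempts: 3 },
  timeoutMs: 30000,
  concurrencyLimit: 5,
  run: (payload: { url: string; width: number }) => ({
    resizedUrl: `${payload.url}?w=${payload.width}`,
  }),
});

// Type-checks: the payload matches the declared input type.
const resized = resizeImage.run({ url: "https://example.com/a.png", width: 200 });

// Would be rejected by the compiler, because `width` must be a number:
// resizeImage.run({ url: "https://example.com/a.png", width: "200" });
```

The input and output types are inferred from `run`, so a caller that passes the wrong payload shape fails type checking before any code executes.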
distributed task execution with checkpoint-resume semantics
Executes tasks across distributed workers using a state machine-driven run engine that persists execution checkpoints, so a run can resume after a failure or a long wait. The checkpoint system captures execution state at defined points (waitpoints), allowing tasks to pause, wait for external events, and resume without re-executing completed work. The run engine implements this with dedicated checkpointSystem and waitpointSystem components that manage state transitions.
Unique: Splits checkpointing into two layers: executionSnapshotSystem captures full execution state at arbitrary points, while checkpointSystem and waitpointSystem provide explicit pause/resume semantics, with distributed locking via Redis to prevent concurrent execution conflicts
vs alternatives: More granular than AWS Step Functions because checkpoints can be placed at any task step, not just between state transitions, enabling true mid-function resumption for long-running operations
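The idea of resuming without re-executing completed work can be illustrated with a simplified model. The sketch below uses step-level memoization against an in-memory store, which is a deliberately simpler technique than snapshotting a running process; `CheckpointStore`, `makeStepRunner`, and `processOrder` are hypothetical names, not part of the run engine.

```typescript
// Hypothetical in-memory checkpoint store: run id -> step name -> saved result.
type CheckpointStore = Map<string, Map<string, unknown>>;

function makeStepRunner(store: CheckpointStore, runId: string) {
  const saved = store.get(runId) ?? new Map<string, unknown>();
  store.set(runId, saved);
  // Runs `fn` once per step name; on resume, returns the saved result instead.
  return function step<T>(stepName: string, fn: () => T): T {
    if (saved.has(stepName)) {
      return saved.get(stepName) as T;
    }
    const result = fn();
    saved.set(stepName, result); // checkpoint after the step completes
    return result;
  };
}

const checkpointStore: CheckpointStore = new Map();
let chargeCalls = 0;
let notifyIsDown = true;

// A two-step run: the second step fails until `notifyIsDown` is cleared.
function processOrder(runId: string): string {
  const step = makeStepRunner(checkpointStore, runId);
  const chargeId = step("charge", () => {
    chargeCalls += 1;
    return "charge-1";
  });
  return step("notify", () => {
    if (notifyIsDown) throw new Error("notification service unavailable");
    return `notified for ${chargeId}`;
  });
}
```

Calling `processOrder("run-1")` once while `notifyIsDown` is true throws after the charge step has been checkpointed; calling it again after clearing the flag reuses the saved charge result, so the charge function runs only once.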
distributed locking and concurrency control
Implements distributed locking via Redis to prevent concurrent execution of the same task or conflicting state transitions. Uses Redis EVAL scripts for atomic lock acquisition and release, so that a run is picked up by exactly one of several coordinator instances. The concurrency management system enforces per-task concurrency limits (e.g., max 5 concurrent executions) and queues excess requests. Locking also prevents race conditions in checkpoint updates and dequeue operations.
Unique: Uses Redis EVAL scripts for atomic lock operations, avoiding the race conditions that separate GET/SET commands could introduce. Integrates with the concurrency management system to enforce per-task limits without requiring a separate rate-limiting service.
vs alternatives: Typically lower latency than database-based locking because Redis operations run in memory, often in under a millisecond, whereas database locks usually incur disk I/O and transaction overhead
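To make the atomicity point concrete, here is a sketch of the widely used token-based lock pattern. The Lua string is the common compare-and-delete release script from general Redis practice, not necessarily the script this project ships; `LockTableSketch` is a hypothetical in-memory stand-in that mirrors the semantics of `SET key token NX PX ttl` so the behaviour can be followed without a Redis server.

```typescript
// Common compare-and-delete release script: delete the key only if the
// caller's token still owns it. Running it via EVAL makes the check and the
// delete a single atomic step.
const RELEASE_LOCK_LUA = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end`;

// In-memory model of the same semantics, with time passed in explicitly.
class LockTableSketch {
  private entries = new Map<string, { token: string; expiresAt: number }>();

  // Mirrors SET key token NX PX ttlMs: succeeds only if no live lock exists.
  acquire(key: string, token: string, ttlMs: number, now: number): boolean {
    const held = this.entries.get(key);
    if (held && held.expiresAt > now) return false; // another owner holds it
    this.entries.set(key, { token, expiresAt: now + ttlMs });
    return true;
  }

  // Mirrors RELEASE_LOCK_LUA: release only a live lock with a matching token.
  release(key: string, token: string, now: number): boolean {
    const held = this.entries.get(key);
    if (!held || held.expiresAt <= now || held.token !== token) return false;
    this.entries.delete(key);
    return true;
  }
}
```

The token check matters because a lock can expire and be re-acquired by another instance; without it, a slow original owner could delete a lock it no longer holds.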
task lifecycle hooks for custom initialization and cleanup
Provides lifecycle hooks (onStart, onSuccess, onFailure, onRetry) that execute custom code before task execution, after success, after failure, or before retry attempts. Hooks are defined in task configuration and executed by the run engine as part of the run state machine. Enables cross-cutting concerns like metrics emission, notification sending, or resource cleanup without modifying task code. Hooks have access to task context and execution metadata.
Unique: Hooks are integrated into the run state machine, executing at specific state transitions rather than as separate event handlers. Provides access to full task context and execution metadata, enabling rich customization without external event systems.
vs alternatives: More integrated than webhook-based approaches because hooks execute in-process with full context access, whereas webhooks require serialization and network round-trips
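The hook ordering described above can be sketched as a small retry loop. `HookContext`, `LifecycleHooks`, and `runWithHooks` are hypothetical names; the sketch also assumes that onStart fires once before the first attempt and that onRetry receives the attempt that just failed, neither of which is confirmed by the description.

```typescript
interface HookContext {
  taskId: string;
  attempt: number;
}

interface LifecycleHooks {
  onStart?: (ctx: HookContext) => void;
  onSuccess?: (ctx: HookContext) => void;
  onFailure?: (ctx: HookContext, error: unknown) => void;
  onRetry?: (ctx: HookContext, error: unknown) => void;
}

function runWithHooks<T>(
  taskId: string,
  maxAttempts: number,
  hooks: LifecycleHooks,
  body: () => T
): T {
  // Assumption: onStart runs once, before the first attempt.
  hooks.onStart?.({ taskId, attempt: 1 });
  for (let attempt = 1; ; attempt++) {
    const ctx: HookContext = { taskId, attempt };
    let outcome: { ok: true; value: T } | { ok: false; error: unknown };
    try {
      outcome = { ok: true, value: body() };
    } catch (error) {
      outcome = { ok: false, error };
    }
    if (outcome.ok) {
      hooks.onSuccess?.(ctx);
      return outcome.value;
    }
    if (attempt >= maxAttempts) {
      // Out of attempts: report the failure and surface the error.
      hooks.onFailure?.(ctx, outcome.error);
      throw outcome.error;
    }
    hooks.onRetry?.(ctx, outcome.error);
  }
}
```

Hooks are invoked outside the `try` block so that an exception thrown by a hook is not mistaken for a task failure and retried.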
build extensions and custom task compilation
Allows developers to define custom build extensions that transform task code during compilation, enabling code generation, instrumentation, or optimization. Build extensions hook into the Turborepo build system and can modify task definitions before they're registered in the workerCatalog. Enables use cases like automatic OpenTelemetry instrumentation, code splitting, or custom serialization logic without manual implementation.
Unique: Integrates with Turborepo build system to allow compile-time task transformation, enabling code generation and instrumentation without runtime overhead. Extensions have access to full TypeScript AST, enabling sophisticated code analysis and generation.
vs alternatives: More powerful than decorator-based approaches because extensions can perform arbitrary code transformation, whereas decorators are limited to metadata attachment
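As a rough illustration of the extension pipeline, the sketch below applies a chain of transforms to each task before registering it. It operates on task definition objects rather than on a TypeScript AST, so it is a simplification of what the description claims; `BuildTaskSketch`, `BuildExtensionSketch`, `tracingExtension`, and `buildCatalog` are all hypothetical names.

```typescript
interface BuildTaskSketch {
  id: string;
  run: (payload: string) => string;
}

interface BuildExtensionSketch {
  extName: string;
  transform: (task: BuildTaskSketch) => BuildTaskSketch;
}

// Records span boundaries so the instrumentation is observable.
const spanLog: string[] = [];

// Hypothetical extension that wraps `run` so every call records a span.
const tracingExtension: BuildExtensionSketch = {
  extName: "tracing",
  transform: (task) => ({
    ...task,
    run: (payload) => {
      spanLog.push(`start ${task.id}`);
      try {
        return task.run(payload);
      } finally {
        spanLog.push(`end ${task.id}`);
      }
    },
  }),
};

// Applies every extension in order, then registers the transformed task.
function buildCatalog(
  tasks: BuildTaskSketch[],
  extensions: BuildExtensionSketch[]
): Map<string, BuildTaskSketch> {
  const catalog = new Map<string, BuildTaskSketch>();
  for (const task of tasks) {
    const built = extensions.reduce((current, ext) => ext.transform(current), task);
    catalog.set(built.id, built);
  }
  return catalog;
}
```

Because the wrapping happens once while the catalog is built, task authors write plain `run` functions and still get instrumentation on every call.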
ttl-based automatic run expiration and cleanup
Automatically expires and cleans up old task runs based on configurable TTL (time-to-live) policies, freeing database storage and improving query performance. The TTL system (ttlSystem component) periodically scans for expired runs and marks them for deletion. Supports per-environment TTL configuration (e.g., dev runs expire after 7 days, prod runs after 90 days). Expired runs are archived to cold storage before permanent deletion.
Unique: Implements TTL as a dedicated system component (ttlSystem) that runs periodically, rather than relying on database-level TTL features. Supports per-environment configuration and integrates with execution snapshot system to archive data before deletion.
vs alternatives: More flexible than database-level TTL because per-environment policies can be configured without database schema changes, and archived data can be queried separately
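The per-environment policy reduces to a simple age comparison, sketched below with the 7-day and 90-day figures from the example above. `StoredRun`, `ttlDays`, and `findExpiredRuns` are hypothetical names, and measuring age from completion time is an assumption; the description does not say which timestamp the scan uses.

```typescript
type EnvironmentKind = "dev" | "prod";

interface StoredRun {
  id: string;
  environment: EnvironmentKind;
  completedAt: number; // epoch milliseconds (assumed reference point for TTL)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Example policy from the text: dev runs expire after 7 days, prod after 90.
const ttlDays: Record<EnvironmentKind, number> = { dev: 7, prod: 90 };

// Returns the runs whose age has reached the TTL for their environment.
function findExpiredRuns(runs: StoredRun[], now: number): StoredRun[] {
  return runs.filter(
    (run) => now - run.completedAt >= ttlDays[run.environment] * DAY_MS
  );
}
```

A periodic job could call `findExpiredRuns` with the current time, archive the returned runs, and then delete them.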
multi-provider task scheduling and dequeue orchestration
Routes task execution across multiple compute providers (Docker, Kubernetes, serverless) through a provider abstraction layer that hides provider-specific deployment details. The dequeue system polls Redis-backed task queues, applies per-task concurrency and rate limits, and dispatches work to available workers based on provider capacity and task affinity. Queue management uses distributed locking to ensure exactly-once dequeue semantics across multiple coordinator instances.
Unique: Uses a pluggable provider architecture (Docker and Kubernetes providers run as separate apps) with a coordinator service that encapsulates provider-specific logic, enabling new providers to be added without modifying core scheduling logic. The dequeue system implements distributed locking via Redis EVAL scripts to guarantee exactly-once semantics.
vs alternatives: More flexible than Celery because provider abstraction allows seamless switching between Docker/K8s/serverless without code changes, whereas Celery requires separate broker/worker configurations per backend
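The dequeue-and-dispatch flow can be sketched in a single process. The interface and function names (`QueuedRun`, `ComputeProvider`, `dispatchRuns`) are hypothetical, the queue is a plain array rather than a Redis structure, and locking, rate limiting, and task affinity are left out to keep the example short.

```typescript
interface QueuedRun {
  runId: string;
  taskId: string;
}

// Hypothetical provider abstraction: the scheduler only sees this interface.
interface ComputeProvider {
  providerName: string;
  freeSlots: number;
  launch: (run: QueuedRun) => string;
}

// Launches as many queued runs as the per-task limit and provider capacity
// allow; runs that cannot start stay in the queue in their original order.
function dispatchRuns(
  queue: QueuedRun[],
  providers: ComputeProvider[],
  perTaskLimit: number,
  running: Map<string, number>
): string[] {
  const launched: string[] = [];
  const remaining: QueuedRun[] = [];
  for (const run of queue) {
    const active = running.get(run.taskId) ?? 0;
    const provider = providers.find((p) => p.freeSlots > 0);
    if (active >= perTaskLimit || !provider) {
      remaining.push(run);
      continue;
    }
    provider.freeSlots -= 1;
    running.set(run.taskId, active + 1);
    launched.push(provider.launch(run));
  }
  queue.length = 0;
  queue.push(...remaining);
  return launched;
}
```

Adding a new backend in this model means supplying another object that satisfies `ComputeProvider`; `dispatchRuns` itself does not change.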
run lifecycle state machine with automatic retry and error handling
Manages the task execution lifecycle through a deterministic state machine (defined in runEngine.server.ts and statuses.ts) that moves runs through the states PENDING → QUEUED → EXECUTING → COMPLETED/FAILED/RETRYING. Implements automatic retry logic with exponential backoff, configurable retry limits per task, and error categorization to distinguish transient from permanent failures. Failed runs trigger the runAttemptSystem, which re-enqueues work based on retry policies.
Unique: Implements a centralized run state machine in the run engine that all coordinator instances reference, with state transitions persisted to database and validated via distributed locking, ensuring no concurrent state conflicts. Retry logic is decoupled from task code via runAttemptSystem, allowing retry policies to be updated without redeploying tasks.
vs alternatives: More deterministic than Temporal because state transitions are explicitly modeled in a single state machine rather than distributed across workflow code, making failure modes easier to reason about
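A transition table plus a backoff formula is enough to sketch the lifecycle described above. The status names come from the text; the RETRYING → QUEUED edge, the 1-second base delay, the 60-second cap, and the function names are assumptions made for illustration, not values taken from runEngine.server.ts or statuses.ts.

```typescript
type RunStatus =
  | "PENDING"
  | "QUEUED"
  | "EXECUTING"
  | "RETRYING"
  | "COMPLETED"
  | "FAILED";

// Allowed edges. RETRYING -> QUEUED is assumed from "re-enqueues work".
const allowedTransitions: Record<RunStatus, RunStatus[]> = {
  PENDING: ["QUEUED"],
  QUEUED: ["EXECUTING"],
  EXECUTING: ["COMPLETED", "FAILED", "RETRYING"],
  RETRYING: ["QUEUED"],
  COMPLETED: [],
  FAILED: [],
};

// Returns the new status, or throws if the edge is not in the table.
function transitionRun(from: RunStatus, to: RunStatus): RunStatus {
  if (allowedTransitions[from].indexOf(to) === -1) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

// Transient errors are retried until the attempt limit; others fail at once.
function statusAfterError(
  attempt: number,
  maxAttempts: number,
  isTransient: boolean
): RunStatus {
  return isTransient && attempt < maxAttempts ? "RETRYING" : "FAILED";
}
```

Keeping the edges in one table means an unexpected transition, such as COMPLETED back to EXECUTING, is rejected in a single place rather than in each caller.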