Hatchet
Workflow · Free
Distributed task queue for AI workloads.
Capabilities (13 decomposed)
DAG-based workflow orchestration with hierarchical concurrency control
Medium confidence: Hatchet executes multi-step workflows defined as directed acyclic graphs (DAGs) stored in the v1_dag table, with hierarchical concurrency management that enforces limits at workflow, step, and action levels. The system uses a state machine approach for task lifecycle management (v1_task table) with automatic persistence, enabling workflows to survive process failures and resume from checkpoints. Concurrency constraints are evaluated at dispatch time via the dispatcher service, preventing resource exhaustion while maintaining fairness across concurrent workflow runs.
Implements hierarchical concurrency control (workflow-level, step-level, action-level) with fairness scheduling via dispatcher state machine, rather than simple queue-based limits. Uses PostgreSQL partitioning on v1_task table by tenant and time for scalability, with automatic payload offloading to external storage when task inputs exceed inline thresholds.
Provides tighter concurrency guarantees than Celery (which uses worker-level limits) and more granular control than Airflow (which lacks action-level concurrency), enabling precise rate-limiting for LLM API calls without overprovisioning workers.
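The hierarchical admission check described above can be sketched in a few lines. This is an illustrative stand-in, not Hatchet's actual Go implementation: a task is dispatched only if the running counts at every level it touches (workflow, step, action) are below their configured limits.

```python
from collections import Counter

class HierarchicalLimiter:
    """Sketch of dispatch-time hierarchical concurrency limits."""
    def __init__(self, limits):
        # limits maps (level, name) keys to a max concurrent count
        self.limits = limits
        self.running = Counter()

    def try_acquire(self, keys):
        """Admit a task identified by its (level, name) keys, or refuse."""
        for key in keys:
            if self.running[key] >= self.limits.get(key, float("inf")):
                return False  # some level is saturated; task stays queued
        for key in keys:
            self.running[key] += 1
        return True

    def release(self, keys):
        for key in keys:
            self.running[key] -= 1

limiter = HierarchicalLimiter({
    ("workflow", "wf-1"): 2,       # at most 2 concurrent runs of wf-1
    ("step", "wf-1/call_llm"): 1,  # at most 1 concurrent LLM step
})
task = [("workflow", "wf-1"), ("step", "wf-1/call_llm")]
assert limiter.try_acquire(task) is True
assert limiter.try_acquire(task) is False  # step-level limit reached first
```

The key property is that the step-level limit binds before the workflow-level one, which is what allows precise rate-limiting of a single expensive step (e.g., an LLM call) without capping the whole workflow.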
Event-driven workflow triggering with CEL expression matching
Medium confidence: Hatchet triggers workflow runs in response to external events matched against CEL (Common Expression Language) filters stored in v1_filter and v1_match tables. The event matching system evaluates incoming events against registered workflow triggers, supporting complex conditional logic (e.g., 'event.type == "payment" && event.amount > 100') without requiring code changes. Events are persisted in the OLAP analytics schema (v1-olap) for audit trails and analytics, enabling both real-time triggering and historical event analysis.
Uses CEL (Common Expression Language) for filter expressions instead of custom DSL or regex, enabling expressive, type-safe event matching without code generation. Separates event persistence (v1-olap OLAP schema) from operational task tracking (v1-core schema), allowing independent scaling of analytics vs. real-time triggering.
More flexible than Airflow's static trigger rules and more performant than Temporal's event replay model because CEL evaluation is stateless and doesn't require full workflow re-execution for filtering.
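The matching model can be mimicked with plain Python predicates standing in for CEL. Hatchet evaluates real CEL expressions server-side; the lambdas and workflow names below are purely illustrative, with the first predicate mirroring the example filter quoted above.

```python
# Stand-ins for CEL trigger filters; the first mirrors the text's example:
#   event.type == "payment" && event.amount > 100
filters = {
    "charge-workflow": lambda e: e.get("type") == "payment" and e.get("amount", 0) > 100,
    "audit-workflow": lambda e: e.get("type") in ("payment", "refund"),
}

def match_event(event: dict) -> list:
    """Return the workflows whose trigger filter accepts this event."""
    return [wf for wf, pred in filters.items() if pred(event)]

assert match_event({"type": "payment", "amount": 250}) == ["charge-workflow", "audit-workflow"]
assert match_event({"type": "payment", "amount": 50}) == ["audit-workflow"]
```

Because evaluation is a pure function of the event, filters can be added or changed at runtime without redeploying workflow code, which is the property the paragraph above emphasizes.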
Payload offloading and external storage integration for large task inputs/outputs
Medium confidence: Hatchet stores task payloads (inputs and outputs) in the v1_task_payload table as JSONB by default, but automatically offloads payloads larger than a configurable threshold (typically 1 MB) to external storage (S3, GCS, Azure Blob Storage). The system stores a reference (URL or object key) in the database and fetches the payload on demand when needed. This prevents PostgreSQL bloat and enables handling of very large payloads (e.g., multi-MB LLM responses, large file contents). Payload offloading is transparent to the application; the SDK handles fetching and caching automatically.
Automatic payload offloading to external storage (S3, GCS) when payload exceeds threshold, with transparent SDK integration. Stores payload reference in database, enabling efficient querying without loading large payloads. Supports multiple storage backends via pluggable storage interface.
More efficient than storing all payloads in PostgreSQL (which causes bloat and slow queries) and more transparent than requiring manual payload management. Automatic threshold-based offloading unlike Temporal which requires explicit payload compression.
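The threshold-based offload decision can be sketched as follows. The in-memory dict stands in for an object store (real backends would be S3/GCS/Azure), and the 1 MB threshold comes from the text; everything else is an assumption for illustration.

```python
import hashlib
import json

THRESHOLD = 1 * 1024 * 1024  # 1 MB inline threshold, per the text

external_store = {}  # stand-in for S3/GCS/Azure Blob Storage

def store_payload(payload: dict) -> dict:
    """Keep small payloads inline; offload large ones and keep a reference."""
    raw = json.dumps(payload).encode()
    if len(raw) <= THRESHOLD:
        return {"inline": raw}            # small: stays as JSONB in the row
    key = hashlib.sha256(raw).hexdigest() # content-addressed object key
    external_store[key] = raw             # large: offloaded
    return {"ref": key}

def load_payload(record: dict) -> dict:
    """Transparent fetch: callers never see where the bytes live."""
    raw = record["inline"] if "inline" in record else external_store[record["ref"]]
    return json.loads(raw)

small = store_payload({"prompt": "hi"})
assert "inline" in small
big = store_payload({"doc": "x" * (2 * 1024 * 1024)})
assert "ref" in big and load_payload(big)["doc"].startswith("x")
```

The important design point is the symmetric `load_payload`: because readers go through one accessor, the offload decision never leaks into application code.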
Message queue abstraction supporting RabbitMQ and PostgreSQL-based PGMQ
Medium confidence: Hatchet abstracts the message queue layer to support both RabbitMQ (for high-throughput deployments) and PostgreSQL-based PGMQ (for simpler deployments without external dependencies). The message queue is used for task distribution, event publishing, and inter-service communication. The abstraction layer (pkg/config/shared/shared.go) allows switching between queue implementations via configuration without code changes. PGMQ is particularly useful for development and small deployments because it requires only PostgreSQL; RabbitMQ is recommended for production deployments with high throughput.
Provides pluggable message queue abstraction supporting both RabbitMQ (high-throughput) and PostgreSQL-based PGMQ (simple, no external deps). Configuration-driven queue selection (pkg/config/shared/shared.go) enables switching implementations without code changes. PGMQ is particularly valuable for reducing operational complexity in smaller deployments.
More flexible than Celery (which requires Redis or RabbitMQ) because PGMQ option eliminates external dependencies. More scalable than Airflow (which uses DAG serialization) because message queue enables true asynchronous task distribution.
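The pluggable-backend pattern can be sketched with a small interface and one stand-in implementation. The names below are illustrative (Hatchet's actual interfaces are in Go); a PGMQ backend would issue SQL against PostgreSQL and a RabbitMQ backend would speak AMQP, but both satisfy the same publish/consume contract selected by configuration.

```python
from __future__ import annotations
from collections import deque
from typing import Protocol

class MessageQueue(Protocol):
    """Hypothetical contract both backends would implement."""
    def publish(self, topic: str, msg: bytes) -> None: ...
    def consume(self, topic: str) -> bytes | None: ...

class InMemoryQueue:
    """Stand-in backend for the sketch (not a real Hatchet backend)."""
    def __init__(self):
        self.topics: dict[str, deque] = {}

    def publish(self, topic: str, msg: bytes) -> None:
        self.topics.setdefault(topic, deque()).append(msg)

    def consume(self, topic: str) -> bytes | None:
        q = self.topics.get(topic)
        return q.popleft() if q else None

def make_queue(kind: str) -> MessageQueue:
    # Config-driven selection, in the spirit of pkg/config/shared/shared.go;
    # real code would branch on "rabbitmq" vs. "pgmq" here.
    return InMemoryQueue()

q = make_queue("pgmq")
q.publish("tasks", b"run-step")
assert q.consume("tasks") == b"run-step"
assert q.consume("tasks") is None  # queue drained
```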
Frontend dashboard for workflow monitoring and management
Medium confidence: Hatchet includes a web-based dashboard (frontend/app/src/lib/api/generated/Api.ts) for monitoring workflow execution, viewing run history, and managing workflows. The dashboard displays real-time workflow status, step-by-step execution details, task logs, and failure reasons. Users can trigger workflow runs manually, view analytics (execution time trends, failure rates), and configure workflow settings. The dashboard is built with TypeScript/React and communicates with the API server via REST endpoints. Authentication is integrated with the API layer, supporting API keys and JWT tokens.
Web-based dashboard built with TypeScript/React, integrated with REST API for real-time workflow monitoring. Displays step-by-step execution details, logs, and failure reasons. Supports manual workflow triggering and analytics visualization. Included in core distribution, no separate deployment needed.
More approachable than Airflow's UI for non-technical users because it focuses on workflow execution rather than DAG editing. Updates are polling-based rather than pushed over WebSockets, so status is near-real-time rather than instantaneous.
gRPC-based worker registration and real-time task assignment
Medium confidence: Hatchet workers register with the dispatcher service via gRPC streaming (internal/services/dispatcher/dispatcher_v1.go), establishing persistent bidirectional connections for real-time task assignment. Workers send heartbeats and availability signals; the dispatcher maintains worker state (ACTIVE, INACTIVE, DRAINING) and assigns tasks based on worker capacity and concurrency constraints. Task assignment is pull-based (workers request work) rather than push-based, reducing dispatcher load and enabling workers to control their own throughput. The dispatcher uses a state machine to track action assignment lifecycle (PENDING_ASSIGNMENT → ASSIGNED → STARTED → COMPLETED).
Implements pull-based task assignment via gRPC streaming (workers request work) rather than push-based (dispatcher sends tasks), reducing dispatcher memory footprint and enabling workers to backpressure. Worker state machine (ACTIVE/INACTIVE/DRAINING) enables graceful shutdown without task loss, unlike Celery's abrupt worker termination.
Lower latency than HTTP-based task assignment (Celery, RQ) because gRPC streaming maintains persistent connections; more resilient than Temporal's worker heartbeat model because workers explicitly request work rather than relying on timeout-based failure detection.
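The pull-based model and worker state machine can be sketched as below. The worker states come from the text; the dispatcher class and method names are assumptions made for illustration. Backpressure falls out naturally: a saturated worker simply stops polling, and a DRAINING worker receives nothing new while it finishes in-flight work.

```python
from collections import deque
from enum import Enum, auto

class WorkerState(Enum):
    ACTIVE = auto()
    INACTIVE = auto()
    DRAINING = auto()  # finishing in-flight work, accepting nothing new

class Dispatcher:
    """Sketch of pull-based assignment: workers ask for work."""
    def __init__(self):
        self.queue = deque()
        self.workers = {}

    def register(self, worker_id: str) -> None:
        self.workers[worker_id] = WorkerState.ACTIVE

    def request_work(self, worker_id: str):
        """Called by a worker (over its gRPC stream) when it has capacity."""
        if self.workers.get(worker_id) is not WorkerState.ACTIVE:
            return None  # draining/inactive workers get no new tasks
        return self.queue.popleft() if self.queue else None

d = Dispatcher()
d.register("w1")
d.queue.append("task-1")
assert d.request_work("w1") == "task-1"
d.workers["w1"] = WorkerState.DRAINING   # graceful shutdown begins
d.queue.append("task-2")
assert d.request_work("w1") is None      # no new assignments while draining
```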
Multi-tenant workflow isolation with configurable resource limits
Medium confidence: Hatchet enforces complete data isolation per tenant at the database schema level (all tables include tenant_id foreign key) and API layer (authentication middleware validates tenant context). Each tenant can configure resource limits (max concurrent workflows, max workers, rate limits) stored in configuration tables. The system uses PostgreSQL row-level security (RLS) policies to prevent cross-tenant data leakage, and the API server validates tenant context on every request via middleware (api/v1/server/middleware/telemetry/telemetry.go). Tenant-scoped metrics and analytics are isolated in the OLAP schema.
Enforces tenant isolation at three layers: database schema (tenant_id on all tables), PostgreSQL RLS policies, and API middleware validation. Resource limits are configurable per tenant and enforced at dispatcher dispatch time, preventing one tenant from starving others. Unlike Airflow (single-tenant) or Temporal (tenant isolation via namespaces), Hatchet's multi-tenancy is built into the core architecture.
Stronger isolation than Temporal's namespace-based approach because Hatchet uses PostgreSQL RLS for row-level enforcement; more flexible than Airflow's single-tenant model because it supports arbitrary tenant configurations without code changes.
Automatic task retry with exponential backoff and timeout enforcement
Medium confidence: Hatchet persists task state in the v1_task table with configurable retry policies (max retries, backoff multiplier, max backoff duration) and timeout constraints. When a task fails or times out, the system automatically reschedules it with exponential backoff (e.g., 1s, 2s, 4s, 8s) up to a maximum retry count. Timeouts are enforced by the dispatcher (soft timeout) and workers (hard timeout via context cancellation). Failed tasks are marked with a failure reason and stack trace for debugging. Retry behavior is deterministic, and task handlers are expected to be idempotent: retrying a task with the same input should produce the same result.
Combines soft timeouts (dispatcher-enforced) with hard timeouts (worker context cancellation) for defense-in-depth. Retry state is persisted in PostgreSQL (v1_task.retry_count, last_retry_at) enabling resumption after dispatcher failure. Backoff calculation is deterministic (no jitter by default) but can be randomized via configuration.
More reliable than Celery's retry mechanism because retry state is persisted in PostgreSQL rather than in-memory; more flexible than Temporal's retry policy because Hatchet allows per-step configuration without workflow code changes.
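The retry schedule described above reduces to a small pure function. This is a sketch under the parameters named in the text (multiplier, max backoff, optional jitter); the argument names are illustrative, not Hatchet's actual configuration keys.

```python
import random

def backoff_seconds(attempt: int, base: float = 1.0,
                    multiplier: float = 2.0, max_backoff: float = 60.0,
                    jitter: bool = False) -> float:
    """Exponential backoff with a cap; deterministic unless jitter is on."""
    delay = min(base * multiplier ** attempt, max_backoff)
    if jitter:
        # randomization spreads retries out to avoid thundering herds
        delay *= random.uniform(0.5, 1.0)
    return delay

# 1s, 2s, 4s, 8s ... matching the example schedule in the text
assert [backoff_seconds(a) for a in range(4)] == [1.0, 2.0, 4.0, 8.0]
assert backoff_seconds(10) == 60.0  # capped at max_backoff
```

Because the function is deterministic by default, the same retry schedule can be recomputed by a new dispatcher after a crash, using only the persisted retry count.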
Rate limiting and fairness scheduling for concurrent API calls
Medium confidence: Hatchet implements rate limiting at multiple levels: per-workflow-run (max concurrent steps), per-step (max concurrent actions), and per-action (via dispatcher fairness scheduling). The dispatcher uses a fairness algorithm to distribute available capacity across competing workflow runs, preventing starvation when multiple workflows request the same action. Rate limits are stored in v1_workflow_concurrency_limit and v1_step_concurrency_limit tables and evaluated at dispatch time. The system supports both hard limits (reject excess requests) and soft limits (queue and backoff). This is particularly useful for LLM API calls where rate limits are strict and overages are expensive.
Implements fairness scheduling at dispatcher level (not worker level), ensuring that when multiple workflows compete for limited API quota, each gets fair access. Uses hierarchical concurrency limits (workflow → step → action) enabling fine-grained control. Integrates with LLM-specific patterns (e.g., token-based rate limiting for OpenAI).
More sophisticated than Celery's rate limiting (which is per-worker, not global) and more efficient than Temporal's approach (which uses external rate limiter services). Fairness scheduling prevents starvation unlike simple queue-based approaches.
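The anti-starvation property can be illustrated with a round-robin sketch: ready tasks are interleaved across competing workflow runs, so a run with a hundred queued calls cannot monopolize limited capacity. This is a simplification of whatever fairness algorithm the dispatcher actually uses, and ignores the concurrency limits applied on top.

```python
from collections import deque

def fair_dispatch(queues: dict, capacity: int) -> list:
    """Round-robin across per-run queues until capacity is used up."""
    dispatched = []
    while len(dispatched) < capacity and any(queues.values()):
        for run_id, q in queues.items():
            if q and len(dispatched) < capacity:
                dispatched.append((run_id, q.popleft()))
    return dispatched

queues = {
    "run-a": deque(["a1", "a2", "a3"]),  # a busy run with many queued calls
    "run-b": deque(["b1"]),              # a small run that must not starve
}
order = fair_dispatch(queues, capacity=3)
# run-b gets a slot before run-a's backlog is drained
assert order == [("run-a", "a1"), ("run-b", "b1"), ("run-a", "a2")]
```

Contrast with a single FIFO queue, where run-b's task would wait behind all of run-a's backlog; that is the "simple queue-based approach" the comparison above refers to.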
Dual-schema database architecture for operational and analytical workloads
Medium confidence: Hatchet uses two separate PostgreSQL schemas: v1-core for operational data (tasks, workflows, runs with high write frequency) and v1-olap for analytics (events, metrics, aggregates optimized for read-heavy queries). The v1-core schema uses row-level partitioning by tenant and time to manage table size and enable efficient pruning. The v1-olap schema stores denormalized event data and pre-aggregated metrics for reporting without impacting operational query performance. Data flows from v1-core to v1-olap via asynchronous ETL (event processing pipeline), enabling independent scaling and optimization of each schema.
Separates v1-core (operational, partitioned by tenant and time) from v1-olap (analytical, denormalized) at schema level, enabling independent optimization. Uses PostgreSQL partitioning for automatic data lifecycle management (old partitions can be archived/deleted). Asynchronous ETL pipeline decouples operational latency from analytical freshness.
More sophisticated than single-schema approaches (Airflow, Temporal) which require complex query optimization to balance operational and analytical workloads. Enables faster operational queries and more flexible analytics than monolithic schemas.
Python and TypeScript SDKs with code generation from OpenAPI specification
Medium confidence: Hatchet provides Python (pkg/client/rest/gen.go) and TypeScript (frontend/app/src/lib/api/generated/Api.ts) SDKs auto-generated from an OpenAPI specification (api-contracts/openapi/openapi.yaml). The SDKs expose high-level APIs for workflow definition, task submission, and result retrieval, abstracting away gRPC and REST details. Code generation ensures SDK consistency with server API changes: when the OpenAPI spec is updated, SDKs are regenerated automatically. The SDKs include type-safe request/response models (data-contracts.ts) and handle authentication, serialization, and error handling transparently.
SDKs are auto-generated from OpenAPI specification (api-contracts/openapi/openapi.yaml), ensuring consistency with server API. Includes type-safe request/response models (data-contracts.ts) and handles authentication/serialization transparently. Supports both REST and gRPC transports via SDK abstraction layer.
More maintainable than hand-written SDKs because code generation ensures consistency; more type-safe than Celery's Python API because SDKs are generated from formal spec. Supports multiple languages (Python, TypeScript) from single spec unlike Temporal which requires separate SDK implementations.
Workflow run state persistence and resumption after failures
Medium confidence: Hatchet persists the complete state of each workflow run in PostgreSQL (v1_workflow_run table with status: PENDING, RUNNING, COMPLETED, FAILED) along with step execution state (v1_step_run table). When a dispatcher or worker crashes, the system can resume workflow execution from the last completed step without re-executing already-finished work. Step outputs are persisted in the v1_task_payload table, enabling downstream steps to access results from previous steps. The system uses optimistic locking (version columns) to prevent concurrent state updates and ensure consistency.
Persists complete workflow and step state in PostgreSQL with optimistic locking for consistency. Step outputs are stored in v1_task_payload table, enabling downstream steps to access results without re-execution. Supports automatic resumption from last completed step without application-level checkpoint logic.
More reliable than Celery (which loses state on worker crash) and simpler than Temporal (which requires explicit activity checkpointing). Automatic resumption without application code changes unlike Airflow (which requires XCom for state passing).
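The version-column approach mentioned above works like this sketch: an update succeeds only if the writer saw the current version, so two dispatchers racing to apply a transition cannot both win. The dict stands in for a database row; in SQL this would be `UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?` with the affected-row count checked.

```python
def update_run(row: dict, expected_version: int, new_status: str) -> bool:
    """Apply a state transition only if no one else updated the row first."""
    if row["version"] != expected_version:
        return False  # stale read; caller must re-read and retry
    row["status"] = new_status
    row["version"] += 1
    return True

run = {"status": "RUNNING", "version": 3}
assert update_run(run, expected_version=3, new_status="COMPLETED") is True
# a second writer still holding version 3 now loses the race
assert update_run(run, expected_version=3, new_status="FAILED") is False
assert run["status"] == "COMPLETED" and run["version"] == 4
```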
Observability and telemetry with structured logging and metrics export
Medium confidence: Hatchet integrates structured logging (via middleware in api/v1/server/middleware/telemetry/telemetry.go) and metrics export for monitoring workflow execution. The system logs all significant events (task assignment, step completion, failures) with structured fields (tenant_id, workflow_id, step_id, duration) enabling easy filtering and correlation. Metrics are exported in Prometheus format (task count, execution duration, failure rate) and can be scraped by monitoring systems. Telemetry middleware captures request/response details and injects trace IDs for distributed tracing across services.
Structured logging middleware (api/v1/server/middleware/telemetry/telemetry.go) captures request context and injects trace IDs automatically. Metrics are exported in Prometheus format for integration with standard monitoring stacks. Telemetry is built into core architecture, not bolted on.
More comprehensive than Celery's basic logging and more integrated than Temporal's optional telemetry. Structured logging with correlation IDs enables easier debugging than unstructured logs.
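Structured logging with correlation fields amounts to emitting one machine-parseable record per event. The field names below match those listed in the text; the helper itself is a sketch, not Hatchet's middleware (which lives in Go and injects trace IDs automatically).

```python
import json
import logging
import uuid

def log_event(logger: logging.Logger, event: str, **fields) -> str:
    """Emit one JSON log line; returns the line so callers can inspect it."""
    record = json.dumps({"event": event, **fields})
    logger.info(record)
    return record

logger = logging.getLogger("hatchet-sketch")
trace_id = str(uuid.uuid4())  # would normally be injected by middleware
rec = log_event(logger, "step_completed",
                tenant_id="t-1", workflow_id="wf-1", step_id="s-2",
                trace_id=trace_id, duration_ms=842)
assert json.loads(rec)["tenant_id"] == "t-1"
```

Because every line carries tenant_id, workflow_id, and trace_id, a log aggregator can reconstruct one workflow run's full history with a single field filter, which is what "correlation" buys over unstructured logs.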
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Hatchet, ranked by overlap. Discovered automatically through the match graph.
n8n
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
crewai
JavaScript implementation of the Crew AI Framework
Portia AI
Open source framework for building agents that pre-express their planned actions, share their progress and can be interrupted by a human....
ms-agent
MS-Agent: a lightweight framework to empower agentic execution of complex tasks
Dart
Transform workflows with AI: intuitive, customizable, seamlessly...
BulkGPT
Transform bulk tasks with AI: scrape, automate, and analyze...
Best For
- ✓Teams building LLM agent orchestration systems with multi-step reasoning
- ✓AI teams managing rate-limited API calls (OpenAI, Anthropic) across concurrent workflows
- ✓Organizations needing guaranteed workflow completion with automatic retry semantics
- ✓Event-driven AI systems (e.g., trigger summarization on document upload, classification on form submission)
- ✓Teams integrating Hatchet with webhook-based services (GitHub, Stripe, custom APIs)
- ✓Organizations requiring audit trails and event replay capabilities
- ✓AI workflows processing large documents or media files
- ✓Systems with multi-MB LLM responses (e.g., code generation, document analysis)
Known Limitations
- ⚠DAG structure must be acyclic — no dynamic loop constructs; loops require explicit step repetition
- ⚠Concurrency limits are enforced per tenant globally — no per-user or per-API-key granularity
- ⚠Workflow definitions are immutable after creation; schema changes require new workflow versions
- ⚠Maximum workflow complexity scales with PostgreSQL performance; very large DAGs (1000+ steps) may require query optimization
- ⚠CEL expression evaluation adds latency (~5-10ms per event) — not suitable for sub-millisecond triggering
- ⚠Event payload size limited by PostgreSQL JSONB column limits (~1GB per event)
About
Distributed task queue and workflow engine built for AI workloads. Hatchet features DAG-based workflows, concurrency controls, rate limiting, and fairness scheduling for LLM calls.