dagster
Dagster is an orchestration platform for the development, production, and observation of data assets.
Capabilities (14 decomposed)
declarative asset definition and dependency graph construction
Medium confidence: Enables developers to define data assets as Python functions decorated with @asset, automatically constructing a directed acyclic graph (DAG) of dependencies through function parameter matching and explicit asset_deps declarations. The system parses asset definitions at load time, resolves dependencies via asset keys, and builds an in-memory graph representation that tracks lineage, partitioning schemes, and materialization requirements without requiring manual DAG specification.
Uses decorator-based asset definitions with automatic dependency inference via function parameters, eliminating explicit DAG construction code; integrates with Python's type system for IDE support and enables asset-centric rather than job-centric pipeline organization
Simpler than Airflow's DAG construction and more asset-focused than dbt's model-only approach; provides automatic lineage without requiring separate metadata files
multi-dimensional asset partitioning with dynamic partition support
Medium confidence: Implements a sophisticated partitioning system allowing assets to be divided across time-based (daily, hourly), static categorical, or dynamically-generated partitions, with support for multi-dimensional partitioning (e.g., date × region). The system tracks partition state, enables targeted backfills, and optimizes execution by only materializing changed partitions. Partition definitions are composable and integrate with the asset graph to automatically determine which partitions need execution.
Supports dynamic partitions that are generated at runtime via user-defined functions, enabling partition schemes that adapt to data without code changes; integrates partition state tracking directly into the asset system rather than as a separate concern
More flexible than dbt's static partitioning; provides first-class support for dynamic partitions unlike Airflow's XCom-based approaches; enables efficient backfills without full DAG re-execution
asset health and freshness tracking with automated alerts
Medium confidence: Tracks asset freshness (time since last materialization) and health status (latest run success/failure) via the asset health system. Freshness policies define expected materialization intervals (e.g., daily); the system compares actual freshness against policies and marks assets as stale. Health status is queryable via GraphQL and can trigger alerts via sensors. Integration with external systems (Slack, PagerDuty) enables notifications when assets become unhealthy.
Integrates freshness policies directly into asset definitions, enabling declarative SLA enforcement; computes health status from event logs without external monitoring tools
More integrated than Airflow's SLA framework; provides asset-level freshness unlike dbt's model-level approach; enables automatic health tracking without external tools
dynamic asset selection and targeted execution
Medium confidence: Provides the AssetSelection API enabling programmatic selection of assets based on keys, tags, groups, or custom predicates. Selections can be composed (union, intersection, difference) and used to target specific assets for execution, backfills, or queries. The system resolves dependencies automatically, ensuring upstream assets are included in execution. Selections are queryable via GraphQL, enabling external systems to discover which assets will be executed.
Provides composable asset selection with automatic dependency resolution, enabling flexible targeting without code changes; selections are first-class objects queryable via GraphQL
More flexible than Airflow's fixed DAG selection; enables tag-based targeting unlike dbt's model-level approach; supports composition operators for complex selections
configuration management with environment-specific overrides
Medium confidence: Implements a configuration system enabling assets, resources, and jobs to accept configuration dictionaries at definition or execution time. Configuration is specified via the ConfigurableResource base class or @resource decorator, with schema validation via Pydantic. Environment-specific configs are loaded from YAML files or environment variables, enabling dev/staging/prod deployments without code changes. Configuration is resolved at execution time and injected into asset context.
Integrates configuration management directly into resource definitions via ConfigurableResource, enabling schema validation and environment-specific overrides without separate config files
More integrated than Airflow's Variable system; provides schema validation unlike dbt's profiles.yml; enables runtime overrides without code changes
asset versioning and lineage tracking with data contracts
Medium confidence: Tracks asset versions based on code changes, enabling detection of when asset definitions change and triggering re-materialization of downstream assets. Asset lineage is reconstructed from event logs, showing data flow across the pipeline. Data contracts (input/output schemas) can be defined on assets, with validation at execution time to detect schema mismatches. Lineage is queryable via GraphQL and visualizable in the UI.
Integrates asset versioning directly into the asset system, enabling automatic detection of code changes and downstream re-materialization; tracks lineage from event logs without external tools
More automated than dbt's version tracking; provides data contracts unlike Airflow; enables lineage reconstruction without external metadata stores
event-driven asset materialization with rich metadata and observability
Medium confidence: Captures detailed execution events (AssetMaterializationEvent, DagsterEventType) during asset computation, including execution time, data quality metrics, row counts, and custom metadata. Events are persisted to configurable event log storage (SQLite, PostgreSQL, in-memory) and queryable via GraphQL, enabling real-time monitoring, data lineage reconstruction, and post-execution analysis without requiring external observability tools.
Implements event sourcing for asset execution, storing immutable event records that enable complete reconstruction of pipeline state; integrates metadata capture directly into the execution model rather than as post-hoc logging
More comprehensive than Airflow's task logs; provides structured event queries via GraphQL unlike dbt's file-based artifacts; enables real-time monitoring without external APM tools
sensor-based and schedule-based declarative automation
Medium confidence: Provides two complementary automation mechanisms: sensors poll external systems (databases, APIs, file systems) on a configurable interval to detect changes and trigger asset materialization, while schedules execute assets on cron expressions or custom timing logic. Both are defined as Python functions decorated with @sensor or @schedule, integrated into the asset daemon that runs continuously to evaluate automation rules and submit runs to the executor.
Unifies schedule and sensor automation under a single declarative model with shared tick tracking; sensors maintain cursor state to avoid reprocessing, enabling efficient polling of external systems
More flexible than Airflow's fixed scheduling; provides built-in sensor framework unlike dbt which relies on external orchestrators; enables event-driven automation without message queues
resource-based dependency injection and i/o manager abstraction
Medium confidence: Implements a dependency injection system where assets and ops declare required resources (databases, APIs, cloud storage) as function parameters, resolved at execution time from a configured resource dictionary. I/O managers abstract data persistence, enabling assets to read/write data to configurable backends (filesystem, S3, Snowflake, BigQuery) without hardcoding storage logic. Resources are scoped (process, step) and support initialization/cleanup hooks, enabling connection pooling and resource lifecycle management.
Combines dependency injection with I/O manager abstraction, enabling both runtime resource resolution and pluggable storage backends; resources support scoped lifecycle management (process, step) for efficient connection pooling
More flexible than dbt's profiles.yml; provides first-class I/O abstraction unlike Airflow's task-level connections; enables environment-agnostic pipeline code
graphql-based asset and run querying with workspace context
Medium confidence: Exposes a comprehensive GraphQL API (dagster_graphql package) enabling queries on asset definitions, run history, event logs, and partition state. Queries execute against DagsterInstance storage, supporting filtering by asset key, run status, time range, and partition. The API includes mutations for triggering runs, launching backfills, and managing dynamic partitions. Workspace context provides multi-tenant isolation and permission scoping, enabling role-based access control in cloud deployments.
Provides a unified GraphQL API for both asset definitions and execution data, enabling single-query access to lineage and run history; workspace context enables multi-tenant isolation without separate databases
More comprehensive than Airflow's REST API; provides structured queries unlike dbt's file-based artifacts; enables programmatic access to lineage without external tools
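A stdlib-only sketch of querying the endpoint served by dagster-webserver; the URL assumes a default local deployment, and the query follows the public schema's `assetNodes` field:

```python
import json
from urllib import request

GRAPHQL_URL = "http://localhost:3000/graphql"  # assumed local webserver

ASSET_KEYS_QUERY = """
query AssetKeys {
  assetNodes {
    assetKey { path }
  }
}
"""

def build_payload(query, variables=None):
    # JSON body for a standard GraphQL POST request.
    return json.dumps({"query": query, "variables": variables or {}}).encode()

def fetch_asset_keys():
    req = request.Request(
        GRAPHQL_URL,
        data=build_payload(ASSET_KEYS_QUERY),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        data = json.load(resp)
    return [node["assetKey"]["path"] for node in data["data"]["assetNodes"]]
```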
dbt integration with asset materialization and metadata sync
Medium confidence: Integrates dbt projects as Dagster assets via the dagster-dbt library, automatically loading dbt models as asset definitions and tracking their dependencies. The integration captures dbt metadata (column descriptions, tests, freshness) and syncs it into Dagster's asset system. Supports dbt Cloud execution via API or local dbt CLI invocation, with event parsing to capture dbt test results and model execution metrics as Dagster events.
Automatically loads dbt models as Dagster assets by parsing manifest.json, enabling dbt to be orchestrated alongside Python code without manual asset definition; captures dbt test results as Dagster events for unified observability
More integrated than dbt's native Airflow provider; enables dbt metadata in asset catalogs unlike standalone dbt; supports both dbt Cloud and local execution
pipes framework for external process execution and event streaming
Medium confidence: Provides the Pipes framework (dagster-pipes package) enabling Dagster to orchestrate external processes (Spark jobs, Kubernetes pods, Lambda functions) while capturing their output as Dagster events. External processes write events to stdout in a structured format, which Dagster parses and converts to AssetMaterializationEvent, DagsterEventType, and custom events. Supports multiple execution contexts (Kubernetes, ECS, Spark) with language-agnostic client libraries (Python, Java, Go).
Enables event streaming from external processes via stdout parsing, allowing Dagster to capture execution details without requiring external process to write to Dagster storage; supports multiple execution contexts with unified event model
More flexible than Airflow's task operators; enables true event streaming unlike dbt's file-based approach; supports polyglot execution without language-specific operators
asset backfill orchestration with partition-aware execution
Medium confidence: Implements a backfill system enabling targeted re-execution of asset partitions across time ranges or custom selections. Backfills are submitted as BackfillRequest objects specifying asset selection and partition range, then executed by the executor with automatic dependency resolution. The system tracks backfill progress, enables cancellation, and optimizes execution by only materializing partitions that don't already exist or have changed upstream.
Integrates backfill logic directly into the asset system with automatic dependency resolution; enables partition-aware backfills that only re-execute changed partitions rather than full re-runs
More efficient than Airflow's full DAG re-execution; provides partition-aware backfills unlike dbt's model-level approach; enables targeted recovery without pipeline-wide re-runs
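Launching a backfill programmatically goes through the GraphQL API; this stdlib-only sketch builds the request payload, where the mutation shape follows the public schema and the asset keys and partition names are illustrative:

```python
import json

BACKFILL_MUTATION = """
mutation LaunchBackfill($params: LaunchBackfillParams!) {
  launchPartitionBackfill(backfillParams: $params) {
    __typename
    ... on LaunchBackfillSuccess { backfillId }
  }
}
"""

def backfill_payload(asset_key, partitions):
    # JSON body POSTed to the /graphql endpoint.
    return json.dumps(
        {
            "query": BACKFILL_MUTATION,
            "variables": {
                "params": {
                    "assetSelection": [{"path": asset_key}],
                    "partitionNames": partitions,
                }
            },
        }
    )
```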
multi-process and distributed executor with resource allocation
Medium confidence: Provides pluggable executors (in-process, multiprocess, Kubernetes, Celery) that determine how ops and assets are executed. The multiprocess executor spawns worker processes for parallel execution with configurable concurrency limits and resource tags. The Kubernetes executor submits jobs as Kubernetes pods with resource requests/limits. Executors integrate with the run launcher to manage process lifecycle, capture output, and handle failures with configurable retry logic.
Provides pluggable executor architecture enabling execution in multiple environments (local, Kubernetes, Celery) without code changes; integrates resource tags for declarative allocation
More flexible than Airflow's fixed executor model; supports Kubernetes natively unlike dbt; enables resource-aware execution without external schedulers
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with dagster, ranked by overlap. Discovered automatically through the match graph.
Asseti
AI-driven platform for optimizing and managing business...
Assets Scout
Streamline asset management with AI-driven verification, real-time insights, and seamless...
Dagster
Data orchestration for ML — software-defined assets, type-checked IO, observability, modern Airflow alternative.
Itemery
Maximize asset control with AI-driven tracking, intuitive dashboards, and mobile...
Hypothetic
Revolutionize 3D/2D asset management and collaboration with AI-powered cloud...
Apache Airflow
Industry-standard workflow orchestration.
Best For
- ✓Data engineers building modular, maintainable data pipelines
- ✓Teams migrating from Airflow who want simpler dependency declaration
- ✓Organizations needing automatic lineage tracking for governance
- ✓Data teams processing time-series data with incremental updates
- ✓Organizations with multi-tenant or multi-region data architectures
- ✓Teams needing fine-grained control over which data subsets to recompute
- ✓Data teams needing SLA enforcement for asset freshness
- ✓Organizations with critical data assets requiring high availability
Known Limitations
- ⚠Circular dependencies are detected at definition time but cannot be resolved; requires manual refactoring
- ⚠Dynamic asset creation (runtime-determined asset counts) requires AssetSelection or dynamic partitions, adding complexity
- ⚠Asset key resolution is string-based; typos in asset names cause runtime failures, not compile-time errors
- ⚠Dynamic partitions require a DynamicPartitionsDefinition with explicit partition key generation; cannot infer partitions from data
- ⚠Partition pruning is manual via AssetSelection; no automatic detection of which partitions changed upstream
- ⚠Multi-dimensional partitioning adds complexity to backfill logic; cross-partition dependencies require careful modeling
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Dagster is an orchestration platform for the development, production, and observation of data assets.
Alternatives to dagster
⭐ AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. 🎯 Say goodbye to information overload with an AI media-monitoring assistant and trending-topic filter: aggregates trending topics across platforms plus RSS subscriptions, with precise keyword filtering. AI-screened news, AI translation, and AI analysis briefs pushed straight to your phone; MCP integration enables natural-language conversational analysis, sentiment insight, and trend prediction. Docker-ready, with data self-hosted locally or in the cloud. Smart push notifications via WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, Slack, and more.
The first "code-first" agent framework for seamlessly planning and executing data analytics tasks.