Ibis
Framework · Free. Portable Python dataframe API across 20+ backends.
Capabilities (16 decomposed)
lazy expression tree construction with symbolic dataframe operations
Medium confidence: Builds an abstract syntax tree (AST) of dataframe operations without executing them, using a composable expression API where each operation (select, filter, join, aggregate) returns an unevaluated symbolic expression. The system uses ibis/expr/operations/ modules to define operation nodes and ibis/expr/types/ to wrap them in user-facing expression objects, enabling deferred computation and backend-agnostic query representation.
Uses a typed expression system with ibis/common/grounds.py for structural validation and ibis/common/patterns.py for pattern matching on expression nodes, enabling compile-time type safety and optimization passes that alternatives like Polars or Pandas lack. The deferred execution model is enforced at the type level, not just at runtime.
Stronger than Pandas/Polars for multi-backend portability because expressions are backend-agnostic by design; stronger than raw SQL because the Python API catches type errors before compilation and enables programmatic query construction.
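The deferred-tree idea can be shown with a toy sketch. This is not Ibis's actual class hierarchy; the `Expr`/`Table`/`Filter`/`Select` names below are invented for illustration. Each method returns a new unevaluated node, and a separate walker inspects the plan without touching data:

```python
# Toy sketch of symbolic, deferred dataframe expressions (illustrative
# classes, not Ibis internals): operations build nodes; nothing executes.
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    def filter(self, predicate: str) -> "Expr":
        return Filter(self, predicate)

    def select(self, *columns: str) -> "Expr":
        return Select(self, columns)


@dataclass(frozen=True)
class Table(Expr):
    name: str


@dataclass(frozen=True)
class Filter(Expr):
    parent: Expr
    predicate: str


@dataclass(frozen=True)
class Select(Expr):
    parent: Expr
    columns: tuple


def describe(expr: Expr) -> str:
    """Walk the tree to show what *would* run; no data is touched."""
    if isinstance(expr, Table):
        return expr.name
    if isinstance(expr, Filter):
        return f"filter({describe(expr.parent)}, {expr.predicate})"
    if isinstance(expr, Select):
        return f"select({describe(expr.parent)}, {', '.join(expr.columns)})"
    raise TypeError(type(expr))


t = Table("events")
expr = t.filter("amount > 0").select("user_id", "amount")
print(describe(expr))  # the plan, built but never executed
```

Because every node is an immutable value, the same tree can later be validated, optimized, or compiled per backend, which is the property the capability above describes.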
multi-backend sql compilation with sqlglot dialect translation
Medium confidence: Compiles lazy expression trees to backend-specific SQL dialects by traversing the AST and translating each operation node to the target backend's SQL syntax. Integrates SQLGlot (ibis/backends/sql/) to handle dialect-specific features (window functions, JSON operations, array handling) and maintains a type mapping registry that converts Ibis types to backend-native types, enabling the same expression to generate correct SQL for DuckDB, BigQuery, Snowflake, PostgreSQL, etc.
Decouples expression semantics from SQL syntax by using SQLGlot's dialect abstraction layer, allowing a single expression tree to compile to 15+ SQL dialects without backend-specific branches in the compiler. The type mapping registry (ibis/backends/sql/type_mapping.py) is extensible per backend, enabling custom type coercion rules.
More flexible than hand-written SQL templates because it generates syntactically correct queries for each dialect automatically; more maintainable than Pandas + backend-specific adapters because the compilation logic is centralized and tested against all backends.
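A minimal sketch of per-dialect rendering makes the idea concrete. In real Ibis this work is delegated to SQLGlot; the dialect table and type map below are purely illustrative, not Ibis's registries:

```python
# Hedged sketch: one logical cast, two dialect renderings. The mapping
# tables are hypothetical; Ibis delegates dialect handling to SQLGlot.
DIALECT_CAST = {
    "duckdb":   lambda col, typ: f"CAST({col} AS {typ})",
    "bigquery": lambda col, typ: f"SAFE_CAST({col} AS {typ})",
}

TYPE_MAP = {
    # one Ibis-style type, two backend-native spellings
    ("int64", "duckdb"): "BIGINT",
    ("int64", "bigquery"): "INT64",
}


def compile_cast(column: str, ibis_type: str, dialect: str) -> str:
    native = TYPE_MAP[(ibis_type, dialect)]
    return DIALECT_CAST[dialect](column, native)


print(compile_cast("x", "int64", "duckdb"))    # CAST(x AS BIGINT)
print(compile_cast("x", "int64", "bigquery"))  # SAFE_CAST(x AS INT64)
```

The key design point survives the simplification: the expression ("cast x to int64") never mentions a dialect, so adding a backend means adding table entries, not branching the compiler.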
window function support with partitioning and ordering
Medium confidence: Implements window functions (rank, row_number, lag, lead, sum over window, etc.) with support for partitioning and ordering, enabling analytical queries like running totals, rankings, and moving averages. The system compiles window functions to backend-specific SQL syntax (OVER clauses in SQL, window specs in Spark), handling differences in window function support across backends and providing fallback implementations where needed.
Abstracts window function syntax across backends by providing a unified API (e.g., t.column.sum().over(ibis.window(partition_by=..., order_by=...))) that compiles to backend-specific window function syntax. The system handles backends with limited window function support by providing fallback implementations.
More portable than raw SQL window functions because the same code works across backends; more readable than Spark's Window API because it uses method chaining instead of function calls.
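To show what a partitioned, ordered window sum actually computes, here is the same running total written in plain Python; Ibis would express it symbolically and compile it to an OVER clause instead of looping:

```python
# What a partitioned, ordered window sum computes, in plain Python.
from itertools import groupby
from operator import itemgetter

rows = [
    {"acct": "a", "day": 1, "amt": 10},
    {"acct": "a", "day": 2, "amt": 5},
    {"acct": "b", "day": 1, "amt": 7},
]


def running_total(rows, partition_key, order_key, value_key):
    out = []
    # sort by (partition, order), then accumulate within each partition
    rows = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(rows, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            total += row[value_key]
            out.append({**row, "running": total})
    return out


for row in running_total(rows, "acct", "day", "amt"):
    print(row)
```

The partition resets the accumulator ("b" starts over at 7), which is exactly the semantics the unified window API abstracts across backends.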
join operations with multiple join types and complex join conditions
Medium confidence: Supports multiple join types (inner, left, right, full outer, cross, anti, semi) with complex join conditions (multi-column joins, inequality joins, complex boolean expressions). The system compiles joins to backend-specific SQL syntax and handles differences in join semantics across backends (e.g., how NULL values are handled in join keys).
Supports complex join conditions beyond simple equality (e.g., t1.a > t2.b) by representing joins as operation nodes with arbitrary boolean expressions, not just column equality. The system compiles these to backend-specific SQL, handling backends with limited join support.
More flexible than Pandas merge (which only supports equality joins) because it supports inequality joins and complex conditions; more portable than raw SQL because the same code works across backends.
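The semantics of an inequality join (t1.a > t2.b) can be sketched as a nested loop over an arbitrary predicate; Ibis represents the predicate symbolically and compiles it to SQL rather than looping:

```python
# Inequality-join semantics as a plain nested-loop sketch: the join
# condition is an arbitrary predicate, not just column equality.
left = [{"a": 3}, {"a": 10}]
right = [{"b": 5}, {"b": 1}]


def join_on(left, right, predicate):
    """Keep every (left, right) pair for which the predicate holds."""
    return [{**lhs, **rhs} for lhs in left for rhs in right if predicate(lhs, rhs)]


pairs = join_on(left, right, lambda lhs, rhs: lhs["a"] > rhs["b"])
print(pairs)  # rows where a > b
```

This is the capability Pandas merge lacks: the predicate is a free-form boolean expression, so conditions like range overlaps or multi-column comparisons fit the same shape.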
aggregation and grouping with multiple aggregation functions
Medium confidence: Implements group_by() and aggregate() operations that support multiple aggregation functions (sum, mean, count, min, max, stddev, etc.) applied to different columns, with optional filtering and ordering of results. The system compiles aggregations to backend-specific SQL GROUP BY clauses and handles differences in aggregate function support and naming across backends.
Supports multiple aggregations in a single operation by building an aggregation expression tree that compiles to a single GROUP BY query, rather than requiring separate aggregations and joins. The system optimizes aggregation order to minimize data movement.
More efficient than Pandas groupby (which materializes intermediate results) because aggregations are compiled to backend SQL; more readable than raw SQL because method chaining makes the operation sequence clear.
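The "multiple aggregations in a single operation" point is easy to picture: one grouped pass produces several aggregates at once, mirroring a single GROUP BY query instead of separate scans joined back together. A minimal sketch:

```python
# Several aggregates computed in one grouped pass, mirroring a single
# GROUP BY query rather than separate aggregations plus joins.
from collections import defaultdict

rows = [("x", 4), ("x", 6), ("y", 1)]

groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

result = {
    key: {"sum": sum(vs), "count": len(vs), "max": max(vs)}
    for key, vs in groups.items()
}
print(result)
```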
data type casting and coercion with explicit type conversion
Medium confidence: Provides explicit type casting operations (cast(), astype()) that convert columns between compatible types (e.g., string to integer, float to decimal). The system validates type compatibility at expression construction time and compiles casts to backend-specific type conversion syntax, handling differences in type coercion semantics across backends.
Validates type compatibility at expression construction time using the type system, catching invalid casts early. The system compiles casts to backend-specific syntax (CAST in SQL, astype in Spark, etc.), handling differences in type conversion semantics.
More type-safe than Pandas (which silently coerces types) because invalid casts are caught at construction time; more portable than raw SQL because the same cast syntax works across backends.
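Construction-time validation means an impossible cast fails before any query exists. A toy sketch of that check (the compatibility table below is hypothetical, not Ibis's rules):

```python
# Construction-time cast validation sketch: reject an unsupported cast
# before any SQL is generated. The compatibility set is hypothetical.
COMPATIBLE_CASTS = {
    ("string", "int64"),
    ("int64", "float64"),
    ("float64", "string"),
}


def cast(column_type: str, target: str) -> str:
    if (column_type, target) not in COMPATIBLE_CASTS:
        raise TypeError(f"cannot cast {column_type} to {target}")
    return f"CAST(col AS {target.upper()})"


print(cast("string", "int64"))  # ok, compiles to a CAST clause
try:
    cast("int64", "array<string>")
except TypeError as exc:
    print(exc)  # rejected at construction time
```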
string operations and text manipulation with backend-specific functions
Medium confidence: Implements string operations (substring, length, upper, lower, replace, split, concatenate, regex matching) that compile to backend-specific string function syntax. The system abstracts over differences in string function names and behavior across backends (e.g., SUBSTR vs SUBSTRING, regex syntax differences), providing a unified API for text manipulation.
Abstracts string function syntax across backends by providing a unified API (e.g., t.column.upper(), t.column.substr(0, 5)) that compiles to backend-specific functions. The system handles backends with limited string function support by providing fallback implementations.
More portable than raw SQL string functions because the same code works across backends; more readable than Pandas string methods because it integrates with the fluent API.
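The SUBSTR-vs-SUBSTRING point reduces to a per-dialect name lookup; this sketch uses an illustrative mapping table, not Ibis's actual function registry:

```python
# One logical substring operation, per-dialect function names.
# The mapping table is illustrative, not Ibis's registry.
SUBSTR_NAME = {"postgres": "SUBSTRING", "sqlite": "SUBSTR"}


def compile_substr(dialect: str, col: str, start: int, length: int) -> str:
    fn = SUBSTR_NAME[dialect]
    return f"{fn}({col}, {start}, {length})"


print(compile_substr("postgres", "name", 1, 5))  # SUBSTRING(name, 1, 5)
print(compile_substr("sqlite", "name", 1, 5))    # SUBSTR(name, 1, 5)
```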
array and struct operations with nested data type support
Medium confidence: Supports operations on complex types (arrays, structs) including element access, flattening, unnesting, and aggregation of nested data. The system compiles array/struct operations to backend-specific syntax (UNNEST in SQL, explode in Spark, LATERAL FLATTEN in Snowflake), handling differences in nested data support across backends.
Provides a unified API for nested data operations across backends with vastly different nested type support, using backend-specific compilation (UNNEST, explode, LATERAL FLATTEN) to handle differences. The system includes type inference for nested structures.
More portable than raw SQL nested operations because the same code works across backends; more flexible than Pandas (which lacks native nested type support) because it works with modern data warehouses' native nested types.
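What UNNEST/explode/LATERAL FLATTEN all compute is the same relational operation: one output row per array element. In plain Python:

```python
# Unnest semantics in plain Python: one output row per array element;
# Ibis compiles the same operation to each backend's native syntax.
rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]

unnested = [
    {"id": row["id"], "tag": tag}
    for row in rows
    for tag in row["tags"]
]
print(unnested)  # id 2 disappears: its array is empty
```

Note the empty-array row vanishes, matching inner UNNEST semantics; backends differ on whether an outer variant keeps it, which is one of the differences the unified API papers over.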
backend abstraction layer with pluggable execution engines
Medium confidence: Provides a unified backend interface (ibis/backends/base.py) that all 20+ execution engines implement, defining standard methods for table creation, query execution, and result fetching. Each backend (DuckDB, Spark, BigQuery, Snowflake, etc.) inherits from BaseBackend and implements dialect-specific connection, compilation, and execution logic, allowing users to swap backends by changing a single connection line (e.g., ibis.duckdb.connect() → ibis.bigquery.connect()).
Uses a template method pattern (BaseBackend class) where each backend implements compile_expression() and execute() hooks, enabling new backends to be added without modifying core Ibis code. The backend registry (ibis/backends/__init__.py) dynamically loads backends, allowing optional dependencies (e.g., BigQuery client only installed if using BigQuery).
More extensible than Pandas/Polars because the backend interface is explicit and testable; more maintainable than Spark's native DataFrame API because all backends expose the same Python API, reducing cognitive load when switching execution engines.
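The template-method shape described above can be sketched in a few lines. The class and hook names here (`BaseBackend`, `compile_expression`, `execute`) follow the description in this section but are simplified stand-ins, not Ibis's exact signatures:

```python
# Template-method sketch of a pluggable backend: the base class owns the
# driver, subclasses fill the hooks. Names are illustrative stand-ins.
from abc import ABC, abstractmethod


class BaseBackend(ABC):
    def run(self, expr: str) -> str:
        """Shared driver: compile the expression, then execute the SQL."""
        return self.execute(self.compile_expression(expr))

    @abstractmethod
    def compile_expression(self, expr: str) -> str: ...

    @abstractmethod
    def execute(self, sql: str) -> str: ...


class ToyDuckDB(BaseBackend):
    def compile_expression(self, expr: str) -> str:
        return f"-- duckdb\n{expr}"

    def execute(self, sql: str) -> str:
        return f"executed: {sql!r}"


print(ToyDuckDB().run("SELECT 1"))
```

Swapping engines then means instantiating a different subclass; the driver logic and the user-facing API never change.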
type-safe schema inference and validation with structured data types
Medium confidence: Maintains a type system (ibis/expr/types/core.py, ibis/common/typing.py) where every expression has a well-defined data type (int64, string, struct, array, etc.) that is validated at expression construction time. The schema system tracks column names and types through transformations, enabling type checking before execution and providing IDE autocomplete for column operations. Type mapping rules (ibis/backends/sql/type_mapping.py per backend) translate between Ibis types and backend-native types.
Uses Python's typing module and custom type annotations (ibis/common/annotations.py) to enforce schema contracts at the expression level, not just at runtime. The grounds.py system provides structural validation of operation arguments, catching invalid transformations before they reach the backend.
Stronger type safety than Pandas (which is dynamically typed) or Polars (which infers types at runtime); comparable to Spark's StructType system but more Pythonic and IDE-friendly due to native type hint integration.
cross-backend test infrastructure with docker-based environment parity
Medium confidence: Provides a comprehensive testing framework (ibis/backends/tests/) that runs the same test suite against all 20+ backends using Docker containers to ensure environment consistency. The test infrastructure includes backend-specific test classes (inheriting from BackendTestBase), TPC-H/TPC-DS benchmark queries for performance validation, and a test discovery system that skips unsupported operations per backend. This ensures that expressions behave identically across backends or clearly documents differences.
Uses a parameterized test approach where a single test function is executed against all backends, with per-backend skip decorators for unsupported operations. The Docker test environment (ibis/backends/tests/docker/) ensures that backends run in isolated, reproducible containers, eliminating environment-specific test failures.
More comprehensive than individual backend test suites because it enforces API consistency across all backends; more maintainable than manual cross-backend testing because tests are written once and run everywhere.
expression optimization via egraph-based rewriting
Medium confidence: Applies automated query optimization using e-graphs (equality graphs) implemented in ibis/common/egraph.py, which represent equivalent expressions as nodes in a graph and apply rewrite rules to find more efficient forms. The system can eliminate redundant operations (e.g., filtering after filtering), push predicates down to reduce data scanned, and reorder operations for better performance. Optimization happens at the symbolic level before compilation to backend SQL.
Uses e-graphs (a technique from compiler optimization and SMT solvers) to represent multiple equivalent expressions compactly, enabling exploration of a large optimization space without enumerating all possibilities. This is more sophisticated than traditional rule-based optimizers because it can find non-obvious optimization opportunities by combining multiple rewrite rules.
More powerful than Pandas/Polars optimizers because it operates on the symbolic expression tree before compilation; comparable to Spark's Catalyst optimizer but more transparent and easier to extend with custom rules.
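One of the rewrites named above, fusing adjacent filters, can be shown as a single destructive rule over a toy plan. A real e-graph holds many equivalent forms simultaneously and picks the cheapest; this sketch applies just one rule once:

```python
# Filter-fusion rewrite rule over a toy plan representation. A real
# e-graph keeps all equivalent plans at once; this applies one rule.
def fuse_filters(plan):
    """plan is a list of ('scan', name) / ('filter', predicate) steps."""
    out = []
    for step in plan:
        if step[0] == "filter" and out and out[-1][0] == "filter":
            # merge consecutive filters into one conjunctive predicate
            out[-1] = ("filter", f"({out[-1][1]}) AND ({step[1]})")
        else:
            out.append(step)
    return out


plan = [("scan", "t"), ("filter", "a > 1"), ("filter", "b < 2")]
print(fuse_filters(plan))
```

The rewritten plan scans once and filters once, which is exactly the "filtering after filtering" elimination described above.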
seamless python-sql interoperability with raw sql fallback
Medium confidence: Allows mixing Ibis expressions with raw SQL strings, enabling users to drop down to backend-specific SQL when needed while maintaining the ability to compose results back into Ibis expressions. The system provides ibis.sql() for embedding SQL fragments and supports executing raw SQL queries through backend connections, with automatic result parsing into Ibis tables. This bridges the gap between Ibis's portability and backend-specific SQL features.
Treats raw SQL as a first-class expression type (ibis.sql.SQL) that can be composed with other Ibis operations, rather than forcing users to choose between pure Ibis or pure SQL. The system automatically handles result parsing and schema inference from SQL queries.
More pragmatic than pure Ibis because it acknowledges that some backend-specific features are worth using; more maintainable than pure SQL because Ibis operations remain portable and composable.
streaming and batch unification with flink backend support
Medium confidence: Provides a unified API for both batch and streaming data processing through the Flink backend (ibis/backends/flink/), enabling the same Ibis expression to execute on historical data (batch) or live streams (streaming) without code changes. The system abstracts over Flink's DataStream and DataSet APIs, allowing users to write once and toggle between batch and streaming execution by changing the backend configuration.
Abstracts over Flink's dual-mode execution model (batch and streaming) by providing a single expression API that compiles to either DataSet (batch) or DataStream (streaming) depending on the backend configuration. This is unique because most frameworks require different APIs for batch vs streaming.
More unified than Spark Structured Streaming (which requires different APIs for batch and streaming) or Flink's native APIs (which expose low-level details); enables code reuse across batch and streaming contexts that alternatives don't support.
deferred expression evaluation with on-demand execution and caching
Medium confidence: Implements lazy evaluation where expressions are not executed until explicitly requested via .execute() or .to_pandas(), allowing users to build complex query plans and execute them only when needed. The system caches compiled expressions and intermediate results (configurable via ibis.config), reducing recompilation overhead when the same expression is executed multiple times. Execution is deferred until the user explicitly materializes results.
Defers execution at the expression level (not just at the backend level), allowing Ibis to apply optimizations and caching before any backend sees the query. The caching system (ibis/common/caching.py) uses expression structure as the cache key, ensuring that semantically identical expressions reuse cached results.
More flexible than eager evaluation (Pandas) because queries can be built and optimized before execution; more transparent than Spark's lazy evaluation because Ibis expressions are explicit about what will be computed.
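Using expression structure as the cache key means two independently built but structurally identical plans share one result. A toy sketch with a hypothetical `execute` function:

```python
# Structural caching sketch: the plan tuple itself is the cache key, so
# structurally identical plans execute once. `execute` is hypothetical.
executions = []


def execute(plan: tuple) -> str:
    executions.append(plan)  # track how many real executions happen
    return f"result of {plan}"


cache = {}


def execute_cached(plan: tuple) -> str:
    if plan not in cache:  # tuple structure is the cache key
        cache[plan] = execute(plan)
    return cache[plan]


p1 = ("filter", ("scan", "t"), "a > 1")
p2 = ("filter", ("scan", "t"), "a > 1")  # built separately, same structure
execute_cached(p1)
execute_cached(p2)
print(len(executions))  # 1: the second call hit the cache
```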
composable table operations with method chaining and fluent api
Medium confidence: Provides a fluent, method-chaining API where table operations (select, filter, join, group_by, aggregate) return new table expressions that can be immediately chained with further operations. Each method is implemented as an operation node in the expression tree, enabling readable, composable query construction that mirrors SQL's logical flow but with Python's syntax and type safety.
Implements a fluent API where every operation returns a new expression object, enabling arbitrary chaining without special syntax. The expression tree structure (ibis/expr/operations/relations.py) ensures that each method call creates a new node, preserving immutability and enabling optimization.
More readable than raw SQL for complex transformations; more Pythonic than Spark's DataFrame API because it uses method chaining instead of function calls; comparable to Pandas but with type safety and multi-backend support.
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Ibis, ranked by overlap. Discovered automatically through the match graph.
polars
Blazingly fast DataFrame library
Polars
Rust-powered DataFrame library 10-100x faster than pandas.
Sdf
SDF is a next-generation build system for data...
Apache Spark
Unified engine for large-scale data processing and ML.
vaex
Out-of-Core DataFrames to visualize and explore big tabular datasets
MariaDB Server
MariaDB server is a community developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry.
Best For
- ✓ Data engineers building portable ETL pipelines across multiple warehouses
- ✓ ML practitioners iterating locally then scaling to cloud compute without code changes
- ✓ Teams standardizing on a single dataframe API across heterogeneous data infrastructure
- ✓ Teams using multiple SQL databases (on-prem PostgreSQL, cloud BigQuery, data warehouse Snowflake)
- ✓ Data engineers who need to migrate queries between backends with minimal refactoring
- ✓ Organizations standardizing on Python for data work but with heterogeneous database infrastructure
- ✓ Analytical queries requiring window functions (finance, time-series analysis, rankings)
- ✓ Teams migrating from SQL window functions to Ibis
Known Limitations
- ⚠ Lazy evaluation means errors in query logic only surface at execution time, not during construction
- ⚠ Memory overhead for large expression trees before compilation; no automatic expression pruning
- ⚠ Custom operations require extending the operation registry; not all pandas/polars operations have direct Ibis equivalents
- ⚠ Some advanced backend-specific features (e.g., Snowflake's LATERAL FLATTEN) may not have direct Ibis equivalents
- ⚠ Type mapping is one-directional (Ibis → backend); backend-specific types returned from queries must be manually mapped back
- ⚠ Compilation overhead adds ~50-200ms per query; not suitable for real-time query generation at sub-millisecond latency
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
Portable Python dataframe library that provides a unified API across 20+ execution backends including DuckDB, Spark, BigQuery, and Snowflake. Write once, run anywhere — same code works locally and at warehouse scale for ML data prep.