Keras 3
Framework · Free · Multi-backend deep learning API for JAX, TensorFlow, and PyTorch.
Capabilities (14 decomposed)
multi-backend neural network compilation and execution
Medium confidence: Runs a single Keras 3 model definition on JAX, TensorFlow, or PyTorch with consistent semantics. The framework translates high-level layer operations into backend-specific computation at model build time, allowing developers to switch backends by changing a single configuration setting without modifying model code. This is achieved through a backend abstraction layer that maps Keras operations (e.g., `keras.ops.matmul`) to equivalent backend implementations, with automatic differentiation and gradient computation delegated to the underlying framework.
Keras 3's backend abstraction is implemented via a unified `keras.ops` module that provides 200+ operations (a NumPy-style core plus neural-network ops) with matching semantics across JAX, TensorFlow, and PyTorch. Operations dispatch directly to native backend functions, and the backend is selected once per process (via the `KERAS_BACKEND` environment variable before Keras is imported), so portability does not come from a slow runtime interpretation layer.
Unlike PyTorch's ONNX export (lossy, requires separate tooling) or TensorFlow's SavedModel (TensorFlow-locked), Keras 3 maintains a single source of truth that runs natively on each backend, with semantics designed to match across backends (subject to the small numerical differences noted under Known Limitations).
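A minimal sketch of backend selection (assumes Keras 3 plus the chosen backend are installed; the backend must be set before `keras` is first imported):

```python
import os

# Select the backend before importing Keras; "tensorflow" and "torch"
# work the same way. The choice is per-process, not per-model.
os.environ["KERAS_BACKEND"] = "jax"

import keras
import numpy as np

# keras.ops calls dispatch to the active backend's implementation.
x = keras.ops.ones((2, 3))
y = keras.ops.relu(x - 0.5)
print(keras.config.backend(), np.asarray(y))
```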
functional api layer composition with symbolic tensor chaining
Medium confidence: Enables declarative model construction by chaining layer calls on symbolic Input tensors, building an acyclic computation graph without executing any operations. Each layer call returns a symbolic tensor representing the output shape and type, allowing developers to compose complex architectures (CNNs, RNNs, Transformers) in a few lines by nesting layer calls. The framework defers actual computation until `model.fit()` or `model.predict()` is invoked, enabling graph-level optimizations and automatic differentiation setup.
Keras 3's functional API uses Python's `__call__` operator overloading to create symbolic tensor chains that build a static computation graph, enabling graph-level optimizations and automatic differentiation without requiring explicit graph construction APIs (unlike TensorFlow 1.x's `tf.Graph` or PyTorch's `torch.jit.trace`).
More concise and readable than hand-written imperative PyTorch modules for complex architectures, and more flexible than the `Sequential` API because it supports arbitrary branching and multi-input/output patterns without boilerplate.
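A small functional-API sketch; the layer calls below only build a symbolic graph, and shapes are checked at definition time:

```python
import keras
from keras import layers

inputs = keras.Input(shape=(28, 28, 1))  # symbolic tensor, no data yet
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.summary()  # shapes were inferred symbolically; nothing was executed
```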
automatic differentiation and gradient computation
Medium confidence: Integrates with the underlying backend's autodiff system (JAX's `grad`, TensorFlow's `GradientTape`, PyTorch's `autograd`) to automatically compute gradients of the loss with respect to model parameters during backpropagation. Developers do not explicitly call gradient computation functions; the framework handles this transparently in `model.fit()` or custom training loops via `model.train_step()`. Gradients are computed using reverse-mode autodiff (backpropagation), enabling efficient gradient computation for deep networks.
Keras 3's autodiff integration is transparent and backend-agnostic: the same model code automatically uses JAX's `grad`, TensorFlow's `GradientTape`, or PyTorch's `autograd` depending on the compiled backend, with no explicit gradient computation calls required in user code.
Simpler than PyTorch's explicit `loss.backward()` calls, and it avoids TensorFlow-specific boilerplate such as `tf.function` annotations; Keras 3 supports both eager and compiled execution transparently.
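For illustration, a `train_step` override showing where the backend's autodiff enters. This sketch assumes the TensorFlow backend (the JAX and PyTorch overrides follow different patterns); `fit()` calls it once per batch:

```python
import keras
import tensorflow as tf  # assumes KERAS_BACKEND="tensorflow"

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:  # backend autodiff records the forward pass
            y_pred = self(x, training=True)
            loss = self.compute_loss(y=y, y_pred=y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}
```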
optimizer abstraction with multiple algorithms and learning rate scheduling
Medium confidence: Provides a unified optimizer interface supporting multiple algorithms (SGD, Adam, RMSprop, Adagrad, etc.) specified as strings (e.g., 'adam') or optimizer objects in `model.compile()`. Optimizers maintain internal state (momentum, adaptive learning rates) across training steps and apply parameter updates based on gradients. Learning rate scheduling is supported via `keras.optimizers.schedules.*` (e.g., `ExponentialDecay`, `CosineDecay`) or custom schedules, enabling dynamic learning rate adjustment during training without manual intervention.
Keras 3's optimizer abstraction is backend-agnostic and maintains optimizer state (momentum, adaptive learning rates) using the backend's native tensor operations, enabling seamless switching between JAX, TensorFlow, and PyTorch without retraining or state conversion.
More unified than PyTorch's separate `torch.optim` and `torch.optim.lr_scheduler` modules: schedules attach directly to the optimizer's learning rate, and optimizers are fully integrated with the training loop.
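A short sketch wiring a decay schedule into an optimizer (the parameter values are illustrative):

```python
import keras

# Learning rate decays by 10% every 10,000 optimizer steps.
schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,
    decay_rate=0.9,
)
optimizer = keras.optimizers.Adam(learning_rate=schedule)
# model.compile(optimizer=optimizer, loss="mse")
```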
loss function abstraction with standard and custom objectives
Medium confidence: Provides a library of loss functions (CrossEntropy, MeanSquaredError, BinaryCrossentropy, etc.) accessible via `keras.losses.*` or as strings (e.g., 'categorical_crossentropy') in `model.compile()`. Loss functions compute a scalar objective value from model predictions and target labels, guiding the optimization process. Custom loss functions can be implemented as Python functions or by subclassing `keras.losses.Loss`, enabling domain-specific objectives (e.g., contrastive loss, focal loss). Loss values are automatically differentiated to compute gradients.
Keras 3's loss functions are backend-agnostic and automatically differentiated using the compiled backend's autodiff system, with support for both built-in losses (optimized implementations) and custom losses (user-defined Python functions), enabling flexible objective specification without backend-specific code.
Custom losses are first-class citizens: a plain Python function works anywhere a built-in loss does and is automatically integrated with the training loop, which is more convenient than wiring custom objectives into PyTorch's `torch.nn` loss modules.
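As an example, a hand-rolled focal-loss sketch written with `keras.ops` so it stays backend-agnostic (the function and its `gamma` default are illustrative, not a Keras built-in):

```python
import keras
from keras import ops

def focal_loss(y_true, y_pred, gamma=2.0):
    # Down-weights well-classified examples; assumes y_pred are probabilities.
    eps = keras.config.epsilon()
    y_pred = ops.clip(y_pred, eps, 1.0 - eps)
    cross_entropy = -y_true * ops.log(y_pred)
    weight = ops.power(1.0 - y_pred, gamma)
    return ops.sum(weight * cross_entropy, axis=-1)

# model.compile(optimizer="adam", loss=focal_loss)  # used like any built-in
```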
batch normalization and layer normalization with training/inference modes
Medium confidence: Provides `keras.layers.BatchNormalization` and `keras.layers.LayerNormalization` layers that normalize layer inputs to improve training stability and convergence. BatchNormalization maintains running statistics (mean, variance) computed during training and uses them during inference, requiring a `training` flag to distinguish modes. The framework automatically handles mode switching during `model.fit()` (training=True) and `model.predict()` (training=False), eliminating manual mode management.
Keras 3's normalization layers automatically manage training/inference mode switching via the `training` flag, which is set by `model.fit()` and `model.predict()` without user intervention, and running statistics are maintained as layer state that is updated during training and frozen during inference.
Simpler than PyTorch's manual `model.train()`/`model.eval()` mode switching; Keras 3 sets the `training` flag transparently in its built-in loops, while still letting you pass it explicitly when calling a layer directly.
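A minimal sketch of the `training` flag when calling a normalization layer directly (inside `fit()`/`predict()` the flag is set for you):

```python
import keras
import numpy as np

bn = keras.layers.BatchNormalization()
x = np.random.rand(8, 4).astype("float32")

y_train = bn(x, training=True)   # batch statistics; running stats updated
y_infer = bn(x, training=False)  # frozen running statistics
```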
subclassed layer and model customization with imperative forward passes
Medium confidence: Allows developers to define custom layers and models by subclassing `keras.layers.Layer` or `keras.Model`, implementing `__init__()` for layer composition and `call()` for the forward pass logic. This imperative approach enables dynamic control flow (conditionals, loops based on tensor values), stateful operations, and fine-grained control over computation that the functional API cannot express. Custom layers are automatically integrated into the training pipeline via `model.fit()` and support automatic differentiation through the backend's autodiff system.
Keras 3's subclassing API uses Python's method overriding pattern to enable imperative forward passes with full access to the backend's tensor operations, while maintaining automatic differentiation through the backend's autodiff system (JAX's `grad`, TensorFlow's `GradientTape`, PyTorch's `autograd`).
More flexible than the functional API for dynamic architectures, and more Pythonic than TensorFlow's `tf.function` decorator approach because it uses standard OOP patterns without requiring graph-mode compilation annotations.
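An illustrative custom layer; the class name is hypothetical, and the residual add assumes `units` matches the input feature dimension:

```python
import keras
from keras import ops

class ResidualDense(keras.layers.Layer):
    """Dense layer with a residual connection (illustrative sketch)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(units)

    def call(self, inputs):
        # Imperative forward pass; gradients flow via the backend's autodiff.
        return ops.relu(self.dense(inputs)) + inputs
```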
unified training loop with automatic differentiation and gradient descent
Medium confidence: Provides a high-level `model.fit()` method that orchestrates the entire training process: forward pass, loss computation, backward pass (automatic differentiation), and optimizer step updates. Developers specify the optimizer (e.g., 'adam', 'rmsprop'), loss function (e.g., 'categorical_crossentropy'), and metrics (e.g., 'accuracy') as strings or objects, and the framework handles gradient computation via the backend's autodiff system, batching, validation, and metric aggregation. The method returns a `History` object with per-epoch metrics for analysis.
Keras 3's `model.fit()` abstracts away backend-specific autodiff details (JAX's `grad`, TensorFlow's `GradientTape`, PyTorch's `autograd`) behind a unified interface, automatically selecting the appropriate differentiation mechanism based on the compiled backend and handling gradient accumulation, clipping, and optimizer state management transparently.
Simpler than PyTorch's manual `loss.backward()` and `optimizer.step()` pattern, while still supporting custom training logic via a `train_step()` override.
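An end-to-end sketch of the compile/fit workflow on random data (shapes and hyperparameters are illustrative):

```python
import keras
import numpy as np

model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(256, 16).astype("float32")
y = np.random.randint(0, 10, size=(256,))

history = model.fit(x, y, epochs=3, batch_size=32, validation_split=0.2)
print(history.history["val_accuracy"])  # per-epoch metrics
```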
model serialization and checkpoint management
Medium confidence: Enables saving and loading trained models via `model.save()` and `keras.models.load_model()`, using the native Keras format (a `.keras` file containing architecture, weights, and optimizer state), with export paths for cross-framework deployment (e.g., ONNX via `model.export()` in recent releases). The framework also provides `keras.callbacks.ModelCheckpoint` for automatic checkpoint saving during training based on validation metrics, allowing recovery of the best model and resumption of training from a checkpoint.
Keras 3's `.keras` format is a self-contained ZIP archive containing model architecture (JSON), weights (HDF5), and optimizer state, enabling single-file model distribution without external dependencies, while ONNX export provides a backend-agnostic interchange format for deployment on non-Python runtimes.
More portable than PyTorch's `.pt` format (which is Python/pickle-specific) and simpler than TensorFlow's SavedModel (which requires a directory structure and metadata files); ONNX export reduces the need for separate conversion tools.
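A sketch of single-file save/load plus best-model checkpointing (file names are illustrative):

```python
import keras

model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

model.save("model.keras")                         # one self-contained archive
restored = keras.models.load_model("model.keras")

# Keep the best weights seen during training, judged by validation loss.
checkpoint = keras.callbacks.ModelCheckpoint(
    "best.keras", monitor="val_loss", save_best_only=True)
# model.fit(x, y, validation_split=0.2, callbacks=[checkpoint])
```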
pretrained model loading and inference via kerashub
Medium confidence: Provides access to pretrained model architectures and weights via KerasHub, a companion library offering task-specific APIs like `CausalLM.from_preset()` for language models and `TextToImage.from_preset()` for diffusion models. Developers specify a model preset name (e.g., 'gemma2_instruct_2b_en', 'stable_diffusion_3_medium') and the framework downloads the architecture and weights from Kaggle Models, instantiating a ready-to-use model for inference or fine-tuning. This abstracts away model architecture details and checkpoint management.
KerasHub's preset system uses a naming convention (e.g., 'gemma2_instruct_2b_en') that maps to model architecture + weights hosted on Kaggle Models, with task-specific APIs (CausalLM, TextToImage) that abstract away model-specific generation logic, enabling one-line inference without knowledge of model internals.
Simpler than HuggingFace Transformers' `from_pretrained()` for specific tasks (no need to pick a model class), and more integrated with Keras training than external model hubs, but with a much smaller model catalog than HuggingFace (largely Google and Stability AI models).
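A sketch of preset-based inference, assuming `keras-hub` is installed and Kaggle credentials are configured (the preset download is several gigabytes):

```python
import keras_hub

# Downloads architecture + weights from Kaggle Models on first use.
lm = keras_hub.models.CausalLM.from_preset("gemma2_instruct_2b_en")
print(lm.generate("Explain backpropagation in one sentence.", max_length=64))
```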
metric computation and monitoring during training
Medium confidence: Provides a `metrics` parameter in `model.compile()` that accepts metric names (strings like 'accuracy', 'mse') or custom `keras.metrics.Metric` objects, computing and aggregating metrics across batches during training and validation. Metrics are evaluated on each batch and accumulated using a stateful object that maintains running averages, enabling per-epoch metric reporting without storing all predictions in memory. Custom metrics can be implemented by subclassing `keras.metrics.Metric` and overriding `update_state()` and `result()` methods.
Keras 3's metrics use a stateful accumulation pattern where each `keras.metrics.Metric` object maintains internal state (e.g., running sum and count for averaging) across batches, enabling memory-efficient metric computation without storing all predictions, and supporting distributed training via state synchronization.
More memory-efficient than the common PyTorch pattern of collecting all predictions and computing metrics post hoc (core PyTorch ships no metrics API), and custom metrics can override any part of the computation pipeline.
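An illustrative stateful metric (the class and its semantics are hypothetical examples, not built-ins):

```python
import keras
from keras import ops

class FractionPositive(keras.metrics.Metric):
    """Fraction of predictions above 0.5, accumulated across batches."""

    def __init__(self, name="fraction_positive", **kwargs):
        super().__init__(name=name, **kwargs)
        self.positives = self.add_weight(name="positives", initializer="zeros")
        self.total = self.add_weight(name="total", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Running sums instead of stored predictions keeps memory constant.
        self.positives.assign_add(ops.sum(ops.cast(y_pred > 0.5, "float32")))
        self.total.assign_add(ops.cast(ops.size(y_pred), "float32"))

    def result(self):
        return self.positives / self.total
```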
callback-based training hooks and custom training logic
Medium confidence: Provides a callback system via `keras.callbacks.Callback` that allows developers to inject custom logic at specific points in the training loop (epoch start/end, batch start/end, training start/end). Built-in callbacks include `ModelCheckpoint` (save best model), `EarlyStopping` (stop training if validation metric plateaus), `ReduceLROnPlateau` (reduce learning rate on metric plateau), and `TensorBoard` (log metrics to TensorBoard). Custom callbacks can be implemented by subclassing `Callback` and overriding hook methods, enabling integration with external systems (logging, hyperparameter tuning, model serving).
Keras 3's callback system uses a hook-based pattern where callbacks register methods (on_epoch_begin, on_batch_end, etc.) that are invoked at specific training loop points, enabling non-invasive extension of training logic without modifying the core `fit()` method or requiring custom training loops.
More flexible than core PyTorch, which has no built-in callback system, and Keras 3 callbacks are backend-agnostic: the same callback works identically across JAX, TensorFlow, and PyTorch.
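A minimal custom-callback sketch (the class is illustrative):

```python
import time
import keras

class EpochTimer(keras.callbacks.Callback):
    """Logs wall-clock time per epoch via the hook methods."""

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch}: {time.time() - self._start:.2f}s")

# model.fit(x, y, epochs=5, callbacks=[EpochTimer()])
```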
layer and model visualization with architecture diagrams
Medium confidence: Provides `keras.utils.plot_model()` to generate visual representations of model architecture as PNG or SVG diagrams, showing layers, connections, and tensor shapes. The function accepts a model object and optional parameters (show_shapes=True to display tensor dimensions, show_layer_names=True to label layers, rankdir='TB' for top-to-bottom layout). This enables quick verification of model structure before training and aids in documentation and communication of architecture to non-technical stakeholders.
Keras 3's `plot_model()` generates architecture diagrams by traversing the model's layer graph and rendering it via graphviz, with optional tensor shape annotations that help identify shape mismatches without running the model, and support for nested models to show hierarchical architectures.
More integrated than external visualization tools (e.g., Netron), and simpler than PyTorch's `torchviz`, which requires dummy input tensors; Keras's symbolic graph enables visualization without execution.
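A sketch of diagram generation (requires the `pydot` package and a Graphviz installation):

```python
import keras

inputs = keras.Input(shape=(32,))
x = keras.layers.Dense(16, activation="relu")(inputs)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

# Renders the symbolic layer graph without executing the model.
keras.utils.plot_model(model, to_file="model.png",
                       show_shapes=True, show_layer_names=True, rankdir="TB")
```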
built-in layer library with 100+ standard neural network components
Medium confidence: Provides a comprehensive library of prebuilt layers (Conv2D, Dense, LSTM, Attention, Dropout, BatchNormalization, etc.) accessible via `keras.layers.*`, each implementing standard neural network operations with configurable parameters (filters, kernel size, activation, regularization). Layers are backend-agnostic, automatically compiled to the selected backend (JAX, TensorFlow, PyTorch) at model instantiation. The library covers convolutional, recurrent, attention-based, and normalization layers, enabling construction of most standard architectures without custom implementations.
Keras 3's layer library is implemented as backend-agnostic abstractions that compile to identical semantics across JAX, TensorFlow, and PyTorch, with automatic shape inference and gradient computation, enabling developers to write architecture code once and run it on any backend without modification.
Covers most standard architectures out of the box, including attention layers like MultiHeadAttention, and behaves consistently across backends: the same layer code is designed to produce matching results on JAX, TensorFlow, and PyTorch (subject to the small numerical differences noted under Known Limitations).
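For instance, a prebuilt attention layer used directly (shapes are illustrative):

```python
import keras
import numpy as np

mha = keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
seq = np.random.rand(2, 10, 64).astype("float32")  # (batch, tokens, features)
out = mha(query=seq, value=seq)                    # self-attention
print(out.shape)                                   # (2, 10, 64)
```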
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with Keras 3, ranked by overlap. Discovered automatically through the match graph.
Keras
High-level deep learning API — multi-backend (JAX, TensorFlow, PyTorch), simple model building.
keras
Multi-backend Keras
tensorflow
TensorFlow is an open source machine learning framework for everyone.
JAX
Google's numerical computing library — autodiff, JIT, vectorization, NumPy API for ML research.
Deep Learning Systems: Algorithms and Implementation - Tianqi Chen, Zico Kolter
An open course on implementing deep learning systems from scratch.
Build a Large Language Model (From Scratch)
A guide to building your own working LLM, by Sebastian Raschka.
Best For
- ✓ ML researchers comparing framework performance on identical models
- ✓ teams with heterogeneous infrastructure (some TPUs, some GPUs, some CPUs) needing portable code
- ✓ developers building framework-agnostic model libraries
- ✓ rapid prototyping and experimentation with standard architectures
- ✓ teams preferring declarative over imperative code style
- ✓ developers building reusable model templates
- ✓ practitioners training standard supervised learning models with automatic gradient computation
- ✓ researchers implementing custom training algorithms that require gradient access
Known Limitations
- ⚠ Not all backend-specific features are exposed; advanced JAX transformations (jit, vmap) may require custom code
- ⚠ Performance overhead from the abstraction layer is unquantified; native backend code may be faster
- ⚠ Some operations may have subtle numerical differences across backends due to implementation variations
- ⚠ Custom layers that use backend-specific APIs (e.g., `torch.nn.Module` directly) break portability
- ⚠ The functional API cannot express dynamic control flow (e.g., branching on tensor values); use the subclassing API for that
- ⚠ Symbolic construction requires explicit Input shape specification; shape mismatches surface at model definition time rather than at data loading time
About
Multi-backend deep learning framework that runs on JAX, TensorFlow, and PyTorch, providing a consistent high-level API for building and training neural networks with seamless backend switching and broad ecosystem support.