multi-backend neural network compilation and execution
Compiles a single Keras 3 model definition to run with identical semantics on the JAX, TensorFlow, or PyTorch backend through a shared operation abstraction. The framework maps high-level layer operations onto backend-native computation when the model is built, allowing developers to switch backends by changing a single configuration setting (the `KERAS_BACKEND` environment variable or the `keras.json` config file) without modifying model code. This is achieved through a backend abstraction layer that maps Keras operations (e.g., `keras.ops.conv`, `keras.ops.matmul`) to equivalent backend implementations, with automatic differentiation and gradient computation delegated to the underlying framework.
Unique: Keras 3's backend abstraction is implemented via a unified `keras.ops` module that provides 200+ operations (including a NumPy-style API) with identical semantics across JAX, TensorFlow, and PyTorch. Calls dispatch directly to backend-native kernels rather than through a runtime interpretation layer, and graph compilation is available opt-in per backend (`jit_compile=True` in `model.compile()`), so backend switching carries no dynamic-dispatch performance penalty.
vs alternatives: Unlike PyTorch's ONNX export (lossy, requires separate tooling) or TensorFlow's SavedModel (TensorFlow-locked), Keras 3 maintains a single source of truth that runs natively on each backend with consistent semantics, rather than translating between formats after the fact.
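A minimal sketch of backend switching (assuming Keras 3 and at least one backend are installed); note that the backend must be selected before `keras` is imported:

```python
import os
# Pick the backend before importing Keras: "jax", "tensorflow", or "torch".
os.environ["KERAS_BACKEND"] = "jax"

import numpy as np
import keras

# The model definition below is identical regardless of the chosen backend.
model = keras.Sequential([
    keras.layers.Input(shape=(8,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(16, 8).astype("float32")
y = np.random.rand(16, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x, verbose=0).shape)  # (16, 1) on every backend
```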
functional api layer composition with symbolic tensor chaining
Enables declarative model construction by chaining layer calls on symbolic `Input` tensors, building a directed acyclic computation graph without executing any operations. Each layer call returns a symbolic tensor carrying the output's shape and dtype, allowing developers to compose complex architectures (CNNs, RNNs, Transformers) in a few lines of chained calls. The framework defers actual computation until data flows through the graph, e.g. in `model.fit()` or `model.predict()`, enabling graph-level optimizations and automatic differentiation setup.
Unique: Keras 3's functional API relies on Python's `__call__` protocol: invoking a layer on a symbolic `KerasTensor` records a node in a static computation graph, enabling graph-level optimizations and automatic differentiation without explicit graph-construction APIs (unlike TensorFlow 1.x's `tf.Graph` or PyTorch's `torch.jit.trace`).
vs alternatives: More expressive than PyTorch's `nn.Sequential` container and Keras's own `Sequential` API, because it supports arbitrary branching and multi-input/multi-output topologies without boilerplate, while staying more concise than a hand-written `nn.Module` forward pass for standard architectures.
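A short functional-API sketch showing branching and merging (the architecture is illustrative):

```python
import keras

inputs = keras.Input(shape=(64,))
a = keras.layers.Dense(32, activation="relu")(inputs)  # branch 1
b = keras.layers.Dense(32, activation="tanh")(inputs)  # branch 2
merged = keras.layers.Concatenate()([a, b])            # join the branches
outputs = keras.layers.Dense(10, activation="softmax")(merged)

# No computation has run yet; only a symbolic graph has been recorded.
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
```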
automatic differentiation and gradient computation
Integrates with the underlying backend's autodiff system (JAX's `grad`, TensorFlow's `GradientTape`, PyTorch's `autograd`) to automatically compute gradients of the loss with respect to model parameters during backpropagation. Developers do not call gradient computation functions explicitly; the framework handles this transparently in `model.fit()`, or in custom training logic supplied by overriding `Model.train_step()`. Gradients are computed using reverse-mode autodiff (backpropagation), which is efficient for deep networks with many parameters and a scalar loss.
Unique: Keras 3's autodiff integration is transparent and backend-agnostic: the same model code automatically uses JAX's `grad`, TensorFlow's `GradientTape`, or PyTorch's `autograd` depending on the compiled backend, with no explicit gradient computation calls required in user code.
vs alternatives: Simpler than PyTorch's explicit `loss.backward()` / `optimizer.step()` pattern, and less ceremony than raw TensorFlow, where gradients must be captured inside a `GradientTape`; graph compilation remains available opt-in via `jit_compile=True`, so Keras 3 supports both eager and compiled execution transparently.
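A sketch of this point under the PyTorch backend (assumes `torch` is installed): training requires no explicit gradient calls in user code.

```python
import os
os.environ["KERAS_BACKEND"] = "torch"  # autodiff handled by torch.autograd

import numpy as np
import keras

model = keras.Sequential([keras.layers.Input(shape=(4,)),
                          keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# No loss.backward() or optimizer.step(): fit() drives reverse-mode
# autodiff through the active backend internally.
model.fit(x, y, epochs=2, verbose=0)
```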
optimizer abstraction with multiple algorithms and learning rate scheduling
Provides a unified optimizer interface supporting multiple algorithms (SGD, Adam, RMSprop, Adagrad, etc.) specified as strings (e.g., 'adam') or optimizer objects in `model.compile()`. Optimizers maintain internal state (momentum, adaptive learning rates) across training steps and apply parameter updates based on gradients. Learning rate scheduling is supported via `keras.optimizers.schedules.*` (e.g., `ExponentialDecay`, `CosineDecay`) or custom schedules, enabling dynamic learning rate adjustment during training without manual intervention.
Unique: Keras 3's optimizer abstraction is backend-agnostic and maintains optimizer state (momentum, adaptive learning rates) as ordinary Keras variables backed by the active backend's tensors, so a compiled model and its optimizer state serialize to the backend-agnostic `.keras` format and can be reloaded under a different backend without manual state conversion.
vs alternatives: More unified than PyTorch's separate `torch.optim` and `torch.optim.lr_scheduler` modules, and less manual than driving a TensorFlow optimizer by hand, where `apply_gradients()` and variable state must be wired up explicitly; Keras 3 optimizers are fully integrated with the training loop.
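A brief sketch of pairing an optimizer with a built-in schedule (hyperparameter values are illustrative):

```python
import keras

# Learning rate decays along a cosine curve over 10,000 steps.
schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,
)
optimizer = keras.optimizers.Adam(learning_rate=schedule)

model = keras.Sequential([keras.layers.Input(shape=(16,)),
                          keras.layers.Dense(1)])
model.compile(optimizer=optimizer, loss="mse")
# The schedule is queried every step; no manual LR updates are needed.
```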
loss function abstraction with standard and custom objectives
Provides a library of loss functions (`CategoricalCrossentropy`, `MeanSquaredError`, `BinaryCrossentropy`, etc.) accessible via `keras.losses.*` or as strings (e.g., 'categorical_crossentropy') in `model.compile()`. Loss functions compute a scalar objective value from model predictions and target labels, guiding the optimization process. Custom loss functions can be implemented as plain Python functions or by subclassing `keras.losses.Loss`, enabling domain-specific objectives (e.g., contrastive loss, focal loss). Loss values are automatically differentiated to compute gradients.
Unique: Keras 3's loss functions are backend-agnostic and automatically differentiated using the compiled backend's autodiff system, with support for both built-in losses (optimized implementations) and custom losses (user-defined Python functions), enabling flexible objective specification without backend-specific code.
vs alternatives: Custom losses are first-class citizens: a plain `(y_true, y_pred)` function plugs directly into `model.compile()` and the training loop, with less subclassing ceremony than PyTorch's `torch.nn` loss modules and without the explicit reduction wiring needed when composing raw TensorFlow loss ops.
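A minimal custom-loss sketch written with `keras.ops` so it differentiates on any backend; `smooth_l1` and `delta` are illustrative names, not Keras built-ins:

```python
import keras
from keras import ops

def smooth_l1(y_true, y_pred, delta=1.0):
    # Huber-style loss: quadratic near zero, linear for large errors.
    error = ops.abs(y_true - y_pred)
    quadratic = ops.minimum(error, delta)
    linear = error - quadratic
    return ops.mean(0.5 * quadratic ** 2 + delta * linear, axis=-1)

model = keras.Sequential([keras.layers.Input(shape=(8,)),
                          keras.layers.Dense(1)])
# A plain Python function is accepted directly by compile().
model.compile(optimizer="adam", loss=smooth_l1)
```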
batch normalization and layer normalization with training/inference modes
Provides `keras.layers.BatchNormalization` and `keras.layers.LayerNormalization` layers that normalize layer inputs to improve training stability and convergence. BatchNormalization maintains running statistics (mean, variance) computed during training and uses them during inference, requiring a `training` flag to distinguish modes. The framework automatically handles mode switching during `model.fit()` (training=True) and `model.predict()` (training=False), eliminating manual mode management.
Unique: Keras 3's normalization layers automatically manage training/inference mode switching via the `training` flag, which is set by `model.fit()` and `model.predict()` without user intervention; for `BatchNormalization`, running statistics are maintained as layer state that is updated during training and frozen during inference.
vs alternatives: Simpler than PyTorch's manual `model.train()` and `model.eval()` mode switching, and more integrated than managing normalization statistics by hand with low-level TensorFlow ops; Keras 3 handles mode switching transparently.
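A small sketch of the `training` flag on `BatchNormalization` (normally set for you by `fit()`/`predict()`):

```python
import numpy as np
import keras

layer = keras.layers.BatchNormalization()
x = np.random.rand(4, 8).astype("float32")

y_train = layer(x, training=True)   # batch statistics; moving averages update
y_infer = layer(x, training=False)  # frozen moving mean/variance
```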
subclassed layer and model customization with imperative forward passes
Allows developers to define custom layers and models by subclassing `keras.layers.Layer` or `keras.Model`, implementing `__init__()` for layer composition, optionally `build()` for deferred weight creation, and `call()` for the forward-pass logic. This imperative approach enables dynamic control flow (conditionals, loops based on tensor values), stateful operations, and fine-grained control over computation that the functional API cannot express. Custom layers are automatically integrated into the training pipeline via `model.fit()` and support automatic differentiation through the backend's autodiff system.
Unique: Keras 3's subclassing API uses Python's method overriding pattern to enable imperative forward passes with full access to the backend's tensor operations, while maintaining automatic differentiation through the backend's autodiff system (JAX's `grad`, TensorFlow's `GradientTape`, PyTorch's `autograd`).
vs alternatives: More flexible than the functional API for dynamic architectures, and more Pythonic than TensorFlow's `tf.function` decorator approach because it uses standard OOP patterns without requiring graph-mode compilation annotations.
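A sketch of a subclassed layer (`ScaledResidual` is an illustrative name); `build()` creates weights lazily once input shapes are known, and `call()` holds the imperative forward pass:

```python
import numpy as np
import keras

class ScaledResidual(keras.layers.Layer):
    """Adds a learnable-scaled Dense branch back onto its input."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense = keras.layers.Dense(units, activation="relu")

    def build(self, input_shape):
        # Trainable scalar gate, created once input shapes are known.
        self.gate = self.add_weight(shape=(), initializer="zeros", name="gate")

    def call(self, inputs):
        # Imperative forward pass; gradients flow via the backend's autodiff.
        # Note: `units` must equal the input width for the residual add.
        return inputs + self.gate * self.dense(inputs)

y = ScaledResidual(16)(np.random.rand(2, 16).astype("float32"))
```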
unified training loop with automatic differentiation and gradient descent
Provides a high-level `model.fit()` method that orchestrates the entire training process: forward pass, loss computation, backward pass (automatic differentiation), and optimizer step updates. Developers specify the optimizer (e.g., 'adam', 'rmsprop'), loss function (e.g., 'categorical_crossentropy'), and metrics (e.g., 'accuracy') as strings or objects, and the framework handles gradient computation via the backend's autodiff system, batching, validation, and metric aggregation. The method returns a `History` object with per-epoch metrics for analysis.
Unique: Keras 3's `model.fit()` abstracts away backend-specific autodiff details (JAX's `grad`, TensorFlow's `GradientTape`, PyTorch's `autograd`) behind a unified interface, automatically selecting the appropriate differentiation mechanism for the compiled backend, and applying gradient clipping and accumulation when configured on the optimizer, alongside transparent optimizer state management.
vs alternatives: Simpler than PyTorch's manual `loss.backward()` and `optimizer.step()` pattern, while remaining extensible: custom training logic can be injected by overriding `train_step()` without giving up `fit()`'s batching, callbacks, and metric plumbing.
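A sketch of a `train_step()` override; note that in Keras 3 the gradient mechanics inside the override are backend-specific, so this version assumes the TensorFlow backend:

```python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

import tensorflow as tf
import keras

class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compute_loss(y=y, y_pred=y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

inputs = keras.Input(shape=(4,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
# fit() still supplies batching, callbacks, and History collection.
```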
+6 more capabilities