Noise Prediction via U-Net with Time Conditioning

1. stable-diffusion-v1-4 · Model · 50/100

via “unet-based iterative noise prediction and denoising”

Text-to-image model. 621,488 downloads.

Unique: Combines a UNet architecture with cross-attention conditioning (injecting CLIP text embeddings at 4 resolution scales) and sinusoidal timestep embeddings. The released scheduler configuration uses a fixed scaled-linear noise schedule (beta_start=0.00085, beta_end=0.012) over 1000 training timesteps; the plain linear schedule with beta_start=0.0001 and beta_end=0.02 is the one from the original DDPM paper.
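The noise schedule mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the model's own code: the function names (`linear_betas`, `scaled_linear_betas`, `alphas_cumprod`) are hypothetical, and the endpoint values are passed in as parameters so either a plain linear or a scaled-linear schedule can be built.

```python
import math

def linear_betas(beta_start, beta_end, num_steps):
    """Betas evenly spaced from beta_start to beta_end (both included)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def scaled_linear_betas(beta_start, beta_end, num_steps):
    """Betas that are linear in sqrt(beta), then squared."""
    roots = linear_betas(math.sqrt(beta_start), math.sqrt(beta_end), num_steps)
    return [r * r for r in roots]

def alphas_cumprod(betas):
    """Running product of (1 - beta_t): share of signal variance left at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

# Plain linear schedule with the DDPM-paper endpoints.
ddpm_betas = linear_betas(1e-4, 0.02, 1000)
# Scaled-linear schedule with endpoints 0.00085 and 0.012.
sd_betas = scaled_linear_betas(0.00085, 0.012, 1000)
abar = alphas_cumprod(sd_betas)
```

The cumulative product is what a sampler or training loop actually consumes: it gives the closed-form mix of clean signal and Gaussian noise at any timestep without stepping through the earlier ones.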

vs others: More parameter-efficient than transformer-based alternatives (e.g., DiT) while maintaining strong semantic conditioning; architecturally comparable to proprietary models, but with openly released weights and code that make results reproducible.

2. video-diffusion-pytorch · Framework · 44/100

via “noise prediction loss computation for diffusion training”

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch

Unique: Implements noise prediction loss by sampling random diffusion steps and computing L2 distance between U-Net predictions and ground-truth added noise, enabling efficient training without unrolling the full diffusion process
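The training objective described above can be sketched as follows. This is a minimal, framework-free illustration rather than the repository's implementation: `noise_prediction_loss`, `zero_model`, and `oracle_model` are hypothetical names, samples are flat Python lists instead of video tensors, and the "models" are toy stand-ins for a U-Net.

```python
import math
import random

def noise_prediction_loss(model, x0, alphas_cumprod, rng=random):
    """One-sample estimate of the L2 loss between predicted and true noise."""
    t = rng.randrange(len(alphas_cumprod))        # random diffusion step
    eps = [rng.gauss(0.0, 1.0) for _ in x0]       # ground-truth added noise
    signal = math.sqrt(alphas_cumprod[t])
    sigma = math.sqrt(1.0 - alphas_cumprod[t])
    # Closed-form noising: jump straight to step t, no unrolling of steps 0..t-1.
    x_t = [signal * x + sigma * e for x, e in zip(x0, eps)]
    pred = model(x_t, t)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)

# Toy linear schedule (assumed values) and its cumulative product.
num_steps = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (num_steps - 1) for i in range(num_steps)]
abar, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    abar.append(running)

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for a clean sample

def zero_model(x_t, t):
    """Baseline that always predicts zero noise."""
    return [0.0 for _ in x_t]

def oracle_model(x_t, t):
    """Cheating predictor that knows x0 and inverts the noising formula."""
    signal, sigma = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [(xt - signal * x) / sigma for xt, x in zip(x_t, x0)]

zero_loss = noise_prediction_loss(zero_model, x0, abar, rng)
oracle_loss = noise_prediction_loss(oracle_model, x0, abar, rng)
```

Because each call draws a single random step, one forward pass of the network is enough per training example; averaging over a batch approximates the expectation over steps with equal weight on each.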

vs others: More computationally efficient than unrolled diffusion training; provides stable gradients compared to some alternative objectives, though equal step weighting may not optimize perceptual quality

3. Denoising Diffusion Probabilistic Models (DDPM) · Product · 24/100

via “noise-prediction-via-u-net-with-time-conditioning”

* 🏆 2020: [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)](https://arxiv.org/abs/2010.11929)

Unique: DDPM encodes the timestep with sinusoidal positional embeddings (borrowed from the Transformer), passes them through a small learned MLP, and adds a learned linear projection of the result to the feature maps of each residual block; scale-and-shift modulation, which multiplies as well as adds, appears in later variants. One shared embedding needs far fewer parameters than a one-hot timestep input with one slot per step, and because neighboring timesteps get similar embeddings, the network can share what it learns across them. The architecture combines convolutional downsampling/upsampling with self-attention at a low-resolution feature map, balancing computational cost and receptive field.
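The timestep conditioning described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: `timestep_embedding` and `project_and_add` are hypothetical names, the frequency spacing follows one common variant (implementations differ slightly in the exact divisor), and a feature map is modeled as a list of channels, each a flat list of values.

```python
import math

def timestep_embedding(t, dim, max_period=10000.0):
    """Sinusoidal embedding of a scalar timestep: [sin block, cos block]."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies from 1 down toward 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]

def project_and_add(feature_map, emb, weight, bias):
    """Add a per-channel linear projection of emb to every position.

    feature_map: [channels][positions]; weight: [channels][len(emb)];
    bias: [channels].
    """
    out = []
    for c, channel in enumerate(feature_map):
        shift = bias[c] + sum(w * e for w, e in zip(weight[c], emb))
        out.append([v + shift for v in channel])
    return out

emb0 = timestep_embedding(0, 8)
emb1 = timestep_embedding(1, 8)

# Toy 2-channel feature map with 2 positions per channel.
features = [[0.0, 0.0], [0.0, 0.0]]
weight = [[0.0] * 8, [0.0] * 8]   # zero weights: only the bias shifts channels
shifted = project_and_add(features, emb0, weight, [1.0, 2.0])
```

The projection yields one scalar per channel, so the same timestep signal is broadcast over all spatial positions; the weights here are fixed by hand, whereas in a trained network they are learned.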

vs others: More efficient than training separate models per timestep and more flexible than fixed timestep embeddings, enabling smooth interpolation across the diffusion schedule and better generalization to unseen timesteps.
