3D U-Net Architecture with ResNet Blocks for Video Denoising

1

video-diffusion-pytorch (Framework, 48/100)

via “3d u-net architecture with resnet blocks for video denoising”

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch

Unique: Extends 2D U-Net design to 3D by using 3D convolutional layers throughout encoder-decoder paths with ResNet-style skip connections, combined with sinusoidal time embeddings that are broadcast and added to feature maps at each resolution level

vs others: More parameter-efficient than some transformer-based video models while maintaining strong inductive biases for spatiotemporal coherence through convolutional locality
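
The mechanism described above (3D convolutions inside a residual block, with a sinusoidal time embedding broadcast and added to the feature map) can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `sinusoidal_embedding`, `ResBlock3D`, the GroupNorm/SiLU choices, and all sizes are assumptions made for the example.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Map timesteps of shape (B,) to sinusoidal features of shape (B, dim); dim must be even."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([args.sin(), args.cos()], dim=-1)


class ResBlock3D(nn.Module):
    """Hypothetical residual block over video tensors of shape (B, C, F, H, W)."""

    def __init__(self, in_ch: int, out_ch: int, time_dim: int, groups: int = 8):
        super().__init__()
        self.norm1 = nn.GroupNorm(groups, in_ch)
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.time_proj = nn.Linear(time_dim, out_ch)  # one scalar per output channel
        self.norm2 = nn.GroupNorm(groups, out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        # 1x1x1 conv on the shortcut only when the channel count changes
        self.shortcut = nn.Conv3d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        h = self.conv1(self.act(self.norm1(x)))
        # Broadcast the (B, out_ch) embedding over frames, height, and width
        h = h + self.time_proj(t_emb)[:, :, None, None, None]
        h = self.conv2(self.act(self.norm2(h)))
        return h + self.shortcut(x)  # residual connection


video = torch.randn(2, 8, 4, 16, 16)  # (batch, channels, frames, height, width)
t = torch.tensor([10, 500])           # one diffusion timestep per sample
block = ResBlock3D(8, 16, time_dim=32)
out = block(video, sinusoidal_embedding(t, 32))
```

In a full U-Net, one such block would typically sit at every resolution level, each with its own `time_proj`, so the same timestep embedding conditions every scale.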

2

make-a-video-pytorch (Framework, 46/100)

via “resnet block with optional temporal processing”

Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch

Unique: Combines ResNet residual pathways with optional temporal processing layers, allowing temporal operations to be selectively enabled at different network depths rather than globally

vs others: More flexible than a fixed temporal processing pattern while keeping the training stability of residual connections, giving fine-grained control over where in the network temporal processing is applied
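
One way to picture a residual block with optional temporal processing is a per-frame 2D convolution followed by a 1D convolution over the frame axis that can be switched off per block. The sketch below is an assumption-laden illustration, not the library's API: `PseudoResBlock` and its `use_temporal` flag are hypothetical names.

```python
import torch
import torch.nn as nn


class PseudoResBlock(nn.Module):
    """Hypothetical residual block: per-frame 2D conv, optionally followed by a 1D conv over time."""

    def __init__(self, channels: int, use_temporal: bool = True):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.temporal = (
            nn.Conv1d(channels, channels, kernel_size=3, padding=1) if use_temporal else None
        )
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, h, w = x.shape
        # Fold frames into the batch axis so the 2D conv sees each frame separately
        y = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        y = self.act(self.spatial(y))
        y = y.reshape(b, f, c, h, w).permute(0, 2, 1, 3, 4)
        if self.temporal is not None:
            # Fold spatial positions into the batch axis so the 1D conv mixes across frames
            z = y.permute(0, 3, 4, 1, 2).reshape(b * h * w, c, f)
            z = self.temporal(z)
            y = z.reshape(b, h, w, c, f).permute(0, 3, 4, 1, 2)
        return x + y  # residual connection


x = torch.randn(1, 4, 3, 5, 5)  # (batch, channels, frames, height, width)
spatial_only = PseudoResBlock(4, use_temporal=False)
with_time = PseudoResBlock(4, use_temporal=True)
y_spatial, y_time = spatial_only(x), with_time(x)
```

With `use_temporal=False` each output frame depends only on the matching input frame, so a network can mix both kinds of block and enable temporal mixing only at the depths where it helps.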

3

VideoCrafter (Model, 36/100)

via “3d unet temporal-spatial denoising with frame coherence”

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Unique: 3D convolutions operate jointly on temporal and spatial dimensions, enabling the model to learn motion patterns directly rather than treating frames independently. Attention layers capture long-range temporal dependencies, maintaining consistency across multiple frames.

vs others: Joint 3D convolutions are intended to give better temporal coherence than frame-by-frame generation or 2D convolutions paired with temporal attention. Processing space and time jointly is also claimed to be more efficient than running separate temporal and spatial pathways, and it lets the model learn motion patterns from data.
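
The attention layers that capture long-range temporal dependencies can be illustrated with self-attention applied along the frame axis at every spatial location. This is a sketch under stated assumptions rather than VideoCrafter's implementation: `TemporalAttention`, the LayerNorm placement, and the head count are hypothetical choices.

```python
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Hypothetical layer: self-attention over frames, applied independently per pixel."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, h, w = x.shape
        # Each spatial location becomes one sequence of f frame tokens with c features
        seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, f, c)
        normed = self.norm(seq)
        attended, _ = self.attn(normed, normed, normed, need_weights=False)
        seq = seq + attended  # residual connection
        return seq.reshape(b, h, w, f, c).permute(0, 4, 3, 1, 2)


frames = torch.randn(2, 8, 6, 4, 4)  # (batch, channels, frames, height, width)
layer = TemporalAttention(8)
attended_frames = layer(frames)
```

Because every frame token can attend to every other frame at the same location, the layer reaches across the whole clip in one step, whereas a 3x3x3 convolution only sees adjacent frames.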

4

Hotshot-XL (Model, 33/100)

via “resnet block-based feature extraction and upsampling/downsampling”

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL

Unique: Applies ResNet blocks uniformly across spatial and temporal dimensions in the UNet3D, enabling efficient multi-scale feature extraction while maintaining temporal coherence through skip connections. The architecture is inherited from SDXL's proven design, adapted for temporal processing.

vs others: Skip connections improve training stability and gradient flow compared with plain convolution stacks, which allows deeper networks to train without vanishing gradients. The trade-off is higher memory use and compute cost than simpler architectures.
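
The downsampling, upsampling, and skip-connection pattern can be sketched with a one-level 3D U-Net. This is not Hotshot-XL's architecture: `TinyUNet3D` is a hypothetical toy, and the choice to downsample only height and width while keeping the frame count fixed is an assumption made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNet3D(nn.Module):
    """Hypothetical one-level U-Net over (B, 3, F, H, W); H and W must be even."""

    def __init__(self, ch: int = 8):
        super().__init__()
        self.enc = nn.Conv3d(3, ch, kernel_size=3, padding=1)
        # Stride (1, 2, 2) halves height and width but keeps every frame
        self.down = nn.Conv3d(ch, ch * 2, kernel_size=3, stride=(1, 2, 2), padding=1)
        self.mid = nn.Conv3d(ch * 2, ch * 2, kernel_size=3, padding=1)
        # Decoder input = upsampled features (2*ch) + encoder skip (ch)
        self.dec = nn.Conv3d(ch * 2 + ch, ch, kernel_size=3, padding=1)
        self.out = nn.Conv3d(ch, 3, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = F.silu(self.enc(x))
        h = F.silu(self.down(skip))
        h = F.silu(self.mid(h))
        h = F.interpolate(h, scale_factor=(1.0, 2.0, 2.0), mode="nearest")
        h = torch.cat([h, skip], dim=1)  # skip connection restores fine spatial detail
        return self.out(F.silu(self.dec(h)))


clip = torch.randn(1, 3, 4, 16, 16)  # (batch, RGB, frames, height, width)
net = TinyUNet3D()
denoised = net(clip)
```

The memory cost mentioned above is visible here: the `skip` tensor must be kept alive at full resolution until the decoder consumes it.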

5

How Diffusion Models Work - DeepLearning.AI (Product, 21/100)

via “u-net architecture for denoising networks”

Level: Medium; Video

Unique: Provides detailed architectural diagrams and code showing how timestep embeddings are injected at multiple scales via addition/concatenation, and how skip connections preserve spatial information while allowing the network to learn hierarchical denoising features

vs others: More accessible than architecture papers, with visual diagrams and runnable PyTorch code showing the exact layer structure and data flow through the network
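
The two injection modes mentioned above, addition and concatenation, differ mainly in how they affect the channel count. The helpers below are a hypothetical sketch for 2D feature maps, not code from the course; `inject_add` and `inject_concat` are names invented for the example.

```python
import torch


def inject_add(feat: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
    """Add a (B, C) embedding to a (B, C, H, W) feature map; channel counts must match."""
    return feat + emb[:, :, None, None]


def inject_concat(feat: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
    """Tile a (B, E) embedding over H and W and append it as E extra channels."""
    b, _, h, w = feat.shape
    planes = emb[:, :, None, None].expand(b, emb.shape[1], h, w)
    return torch.cat([feat, planes], dim=1)


feat = torch.randn(2, 16, 8, 8)  # feature map at one resolution level
added = inject_add(feat, torch.randn(2, 16))
stacked = inject_concat(feat, torch.randn(2, 4))
```

Addition keeps the layer width unchanged but needs a projection to the feature map's channel count at each scale, while concatenation accepts any embedding size at the cost of widening the next convolution's input.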
