Compositional Visual Understanding Through Structured Annotations

1

Visual Genome | Dataset | 56/100

via “compositional-visual-understanding-through-structured-annotations”

108K images with dense scene graphs and 5.4M region descriptions.

Unique: Provides an explicit decomposition of images into objects, attributes, and relationships, so compositional models can be trained to understand a visual scene through its structured components. Scene graphs suit this because they represent each image as a composition of objects and the relationships between them.

vs others: Enables compositional learning, unlike flat image-label datasets, and supports training models that generalize to novel combinations of known components.
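
To make the idea of a scene graph concrete, here is a minimal Python sketch of an image decomposed into objects, attributes, and relationships, plus a check for a "novel combination of known components". All class and function names (SceneObject, Relationship, SceneGraph, is_novel_composition) are hypothetical and chosen for illustration; they do not mirror Visual Genome's actual file format or any official loader.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """An object in the image, with a set of attribute words (hypothetical schema)."""
    name: str
    attributes: frozenset = frozenset()


@dataclass(frozen=True)
class Relationship:
    """A directed edge: subject --predicate--> obj."""
    subject: SceneObject
    predicate: str
    obj: SceneObject


@dataclass
class SceneGraph:
    """One image represented as objects plus the relationships between them."""
    objects: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def triples(self):
        """Return the set of (subject, predicate, object) name triples."""
        return {(r.subject.name, r.predicate, r.obj.name) for r in self.relationships}


def is_novel_composition(triple, training_graphs):
    """True if every part of the triple was seen in training but the full triple was not."""
    seen_triples, seen_names, seen_predicates = set(), set(), set()
    for graph in training_graphs:
        for s, p, o in graph.triples():
            seen_triples.add((s, p, o))
            seen_names.update((s, o))
            seen_predicates.add(p)
    s, p, o = triple
    parts_known = s in seen_names and o in seen_names and p in seen_predicates
    return parts_known and triple not in seen_triples


# Two toy "training" images, invented for this example.
man = SceneObject("man", frozenset({"tall"}))
horse = SceneObject("horse", frozenset({"brown"}))
dog = SceneObject("dog", frozenset({"small"}))
grass = SceneObject("grass", frozenset({"green"}))

graph_a = SceneGraph([man, horse], [Relationship(man, "riding", horse)])
graph_b = SceneGraph([dog, grass], [Relationship(dog, "on", grass)])
training = [graph_a, graph_b]

novel = is_novel_composition(("dog", "riding", "horse"), training)
seen = is_novel_composition(("man", "riding", "horse"), training)
unknown_part = is_novel_composition(("cat", "riding", "horse"), training)
```

In this toy setup, "dog riding horse" counts as a novel composition because "dog", "riding", and "horse" each appear in training while the full triple does not. A flat image-label dataset would have no way to express that distinction.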

2

Make-A-Scene | Model | 22/100

via “composition-aware object placement”

Make-A-Scene by Meta is a multimodal generative AI method that puts creative control in the hands of the people who use it, letting them describe and illustrate their vision through both text descriptions and freeform sketches.

3

Roamaround | Product

via “shared annotation and insight markup”
