Capability

Multimodal Prompt Fusion For Text Sketch Coherence

20 artifacts provide this capability.


Top Matches

via “cross-attention fusion of image features and prompt embeddings”

Meta's foundation model for visual segmentation.

Unique: Uses bidirectional cross-attention: prompt embeddings attend to image features, and image features attend back to the prompt embeddings, so each side refines the other. Prompts disambiguate image regions, and image context sharpens how each prompt is interpreted.

vs others: More principled than concatenation-based fusion. Attention learns which image regions are relevant to each prompt, which avoids diluting features with irrelevant regions and lets multiple prompts be composed explicitly.
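
The bidirectional cross-attention described above can be sketched in a few lines. This is a minimal illustration, not the model's actual implementation: it assumes a single attention head, omits the learned query/key/value projections, layer normalization, and MLP blocks a real layer would have, and uses hypothetical names (`cross_attend`, `bidirectional_fusion`) and random vectors in place of real prompt and image embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, sources):
    """Each query vector attends over `sources` and adds the result residually.

    Assumption: keys and values are the raw source vectors (no learned
    projections), which keeps the sketch short.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        # Scaled dot-product score between this query and every source vector.
        scores = [scale * sum(a * b for a, b in zip(q, s)) for s in sources]
        weights = softmax(scores)
        # Attention-weighted average of the source vectors.
        context = [
            sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dim)
        ]
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


def bidirectional_fusion(prompt_tokens, image_tokens):
    """One round of mutual refinement between prompts and image features."""
    # Prompts attend to image features: each prompt picks out relevant regions.
    prompt_tokens = cross_attend(prompt_tokens, image_tokens)
    # Image features attend to the updated prompts: regions absorb prompt context.
    image_tokens = cross_attend(image_tokens, prompt_tokens)
    return prompt_tokens, image_tokens


rng = random.Random(0)
prompts = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2)]   # 2 prompt tokens
image = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]    # 16 image tokens
fused_prompts, fused_image = bidirectional_fusion(prompts, image)
print(len(fused_prompts), len(fused_image), len(fused_image[0]))
```

Unlike concatenation, the attention weights here are computed per prompt, so an image region that scores low against a prompt contributes little to that prompt's updated embedding.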
