Visual Modality Apr 2026

Align the visual features with textual data (e.g., image captions or user prompts) using techniques such as cross-modal alignment, so that the system "understands" the relationship between words and pictures.
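One common way to realize the alignment step described above is a contrastive objective over paired image and text embeddings. The text does not name a specific method, so the following is only a minimal sketch under that assumption. The function names, the toy two-dimensional embeddings, and the temperature value of 0.07 are all hypothetical and chosen for illustration. In a real system the embeddings would come from trained image and text encoders.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_matrix(image_embs, text_embs):
    """Cosine similarity between every image (rows) and every text (columns)."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(row)[i], treating entry i of row i as the correct pair."""
    total = 0.0
    for idx, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[idx]
    return total / len(logits)

def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: low when image i is most similar to text i."""
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    image_to_text = cross_entropy_on_diagonal(logits)
    text_to_image = cross_entropy_on_diagonal(transposed)
    return 0.5 * (image_to_text + text_to_image)

def best_text_for_each_image(image_embs, text_embs):
    """Index of the most similar text for each image."""
    sims = similarity_matrix(image_embs, text_embs)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]

# Toy, hand-made embeddings (hypothetical): image 0 pairs with text 0, image 1 with text 1.
toy_images = [[1.0, 0.1], [0.1, 1.0]]
toy_texts = [[0.9, 0.2], [0.2, 0.9]]

aligned_loss = contrastive_alignment_loss(toy_images, toy_texts)
mismatched_loss = contrastive_alignment_loss(toy_images, list(reversed(toy_texts)))
matches = best_text_for_each_image(toy_images, toy_texts)
```

With these toy vectors, the correctly paired batch yields a lower loss than the same batch with the captions swapped, which is the signal a training loop would use to pull matching image and text representations together.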

Implement an "Action-Modality Match" approach in which users can switch between typing a brief and uploading a screenshot to iterate on designs or search results visually.

Key Visual Elements to Include

This feature allows a system to understand not just what is in an image, but how those visual elements relate to specific user goals or queries.