Vision and Language Modeling: The Good, the Bad, and the Ugly (with Federico Bianchi)
Abstract:
Vision and Language Models (VLMs) allow us to combine visual information with natural language to build powerful applications. In this talk, we will discuss recent results and explore where VLMs are useful, when they struggle to understand language, and why we should be aware of the ethical problems that can arise from their use.
Reference Papers:
1. Leveraging medical Twitter to build a visual–language foundation model for pathology AI (bioRxiv preprint, 2023). https://www.biorxiv.org/content/10.1101/2023.03.29.534834v1
2. Contrastive language and vision learning of general fashion concepts (Scientific Reports 2022). https://www.nature.com/articles/s41598-022-23052-9
3. When and Why Vision-Language Models Behave like Bags-of-Words, and What to Do About It? (ICLR 2023). https://openreview.net/forum?id=KRLUvxh8uaX
4. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale (FAccT 2023). https://arxiv.org/abs/2211.03759
For any information about this event, please contact Giovanni Tardino (giovanni.tardino@unibocconi.it).