Title: Towards Interactive Multi-Modal Visual Understanding
Time: June 4, 2024, 16:00-17:00
Venue: Conference Room B404
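Speaker: Chun-Mei Feng (冯春梅)
Speaker Nationality: China
Speaker Affiliation: Agency for Science, Technology and Research (A*STAR), Singapore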
Speaker Bio: Chun-Mei Feng is currently a research scientist at A*STAR, Singapore. Before joining A*STAR, she obtained her Ph.D. from Harbin Institute of Technology, Shenzhen, in 2022. During her Ph.D., she interned at the Inception Institute of Artificial Intelligence (IIAI) in 2020 and visited ETH Zurich in 2021. Her research interests lie in multi-modal visual understanding, medical imaging, and decentralized AI in the era of large pretrained models. She has published numerous peer-reviewed papers, mostly in flagship conference proceedings including CVPR, ICCV, ICLR (Spotlight), MICCAI (Early Accept), and AAAI, as well as in journals such as TIP, TNNLS, and TMI.
Abstract: Language and visual interactions play a crucial role in how we comprehend the real world, which makes Interactive Multi-Modal Visual Understanding a promising field. This presentation focuses on two steps: interaction across modalities and interaction with clients. Step one covers Composed Image Retrieval (CIR) and Referring Image Segmentation (RIS). CIR uses relevant captions to refine image retrieval results, while RIS uses language descriptions to precisely identify segmentation targets. Step two, interaction with clients, provides privacy protection for both CIR and medical RIS, because these tasks often involve data held by different platforms, such as Alibaba, Amazon, and eBay, or by different hospitals. The talk is dedicated to enhancing multi-modal collaboration and reasoning capabilities. It will also include a discussion aimed at promoting the integration of additional modalities and interaction techniques, e.g., developing practical systems for real-world e-commerce applications, transforming understanding abilities into execution, and empowering AI agents to achieve human-object interaction.
Host: Xingping Dong (董性平)