Talk title:
Toward Factuality in Information Access: Multimodal Factual Knowledge Acquisition
Time: October 20, 1:30 PM (Beijing Time)
Join Zoom Meeting
https://nus-sg.zoom.us/j/5052386212?pwd=ZHdpTDJwS1FRMjJLSHFJMFFmR0x6UT09
Meeting ID: 505 238 6212
Passcode: 675758
Abstract:
Recent years have witnessed great success in multimodal foundation models. However, although these models achieve decent scores on various benchmarks, they tend to understand images as bags of words: they rely on object recognition as a shortcut but lack the ability to capture abstract semantics such as verbs. To learn physical world knowledge, we first categorize it by its temporal dynamics (static -> dynamic) and by its horizon (short/fast thinking -> long/slow thinking). My research aims to bring this deep factual-knowledge view to the multimodal world. Such a transformation poses significant challenges: (1) understanding abstract multimodal semantic structures (such as events and the semantic roles of objects): I will present our solution of zero-shot cross-modal transfer, an effective way to inject event-level knowledge into vision-language foundation models; (2) understanding long-horizon temporal dynamics: I will introduce typical approaches to long-horizon reasoning, which empower machines to capture complex temporal patterns; (3) I will also briefly analyze the causes of hallucinations and potential ways to ensure factuality via knowledge-driven methods, with example applications such as meeting summarization, timeline generation, and question answering. Finally, I will lay out how I plan to promote factuality and truthfulness in multimodal information access through a structured knowledge view that is easily explainable, highly compositional, and capable of long-horizon reasoning.