FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

The rapid progress of large language models (LLMs) has catalyzed the emergence of multimodal large language models (MLLMs) that unify …

Jin Wang, Yao Lai, Aoxue Li, Shifeng Zhang, Jiacheng Sun, Ning Kang, Chengyue Wu, Zhenguo Li, Ping Luo

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Hallucinations in large vision-language models (LVLMs) pose significant challenges for real-world applications, as LVLMs may generate …

Xin Dong, Shichao Dong, Jin Wang, Jing Huang, Li Zhou, Zenghui Sun, Lihua Jing, Jingsong Lan, Xiaoyong Zhu, Bo Zheng

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Recently, the rapid development of AIGC has significantly boosted the diversity of fake media spread on the Internet, posing …

Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo

Megactor-sigma: Unlocking flexible mixed-modal control in portrait animation with diffusion transformer

Diffusion models have demonstrated superior performance in portrait animation. However, current approaches rely on either visual or …

Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion

It is widely believed that sparse supervision is worse than dense supervision in the field of depth completion, but the underlying …

Huadong Li, Minhao Jing, Jin Wang, Shichao Dong, Jiajun Liang, Haoqiang Fan, Renhe Ji

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

Compositional reasoning capabilities are usually considered as fundamental skills to characterize human perception. Recent studies show …

Jin Wang, Shichao Dong, Yapeng Zhu, Kelu Yao, Weidong Zhao, Chao Li, Ping Luo

Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View

This paper aims to explain the generalization of deepfake detectors from the novel perspective of multi-order interactions among visual …

Kelu Yao, Jin Wang, Boyu Diao, Chao Li

Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization

In this paper, we analyse the generalization ability of binary classifiers for the task of deepfake detection. We find that the …

Shichao Dong, Jin Wang, Renhe Ji, Jiajun Liang, Haoqiang Fan, Zheng Ge
