Jin Wang
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
The rapid progress of large language models (LLMs) has catalyzed the emergence of multimodal large language models (MLLMs) that unify …
Jin Wang, Yao Lai, Aoxue Li, Shifeng Zhang, Jiacheng Sun, Ning Kang, Chengyue Wu, Zhenguo Li, Ping Luo
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Hallucinations in large vision-language models (LVLMs) pose significant challenges for real-world applications, as LVLMs may generate …
Xin Dong, Shichao Dong, Jin Wang, Jing Huang, Li Zhou, Zenghui Sun, Lihua Jing, Jingsong Lan, Xiaoyong Zhu, Bo Zheng
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Recently, the rapid development of AIGC has significantly boosted the diversity of fake media spreading on the Internet, posing …
Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo
Megactor-sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Diffusion models have demonstrated superior performance in portrait animation. However, current approaches rely on either visual or …
Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced …
Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao
Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
It is widely believed that sparse supervision is worse than dense supervision in the field of depth completion, but the underlying …
Huadong Li, Minhao Jing, Jin Wang, Shichao Dong, Jiajun Liang, Haoqiang Fan, Renhe Ji
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Compositional reasoning capabilities are usually considered as fundamental skills to characterize human perception. Recent studies show …
Jin Wang, Shichao Dong, Yapeng Zhu, Kelu Yao, Weidong Zhao, Chao Li, Ping Luo
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and …
Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao
Towards Understanding the Generalization of Deepfake Detectors from a Game-Theoretical View
This paper aims to explain the generalization of deepfake detectors from the novel perspective of multi-order interactions among visual …
Kelu Yao, Jin Wang, Boyu Diao, Chao Li
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
In this paper, we analyse the generalization ability of binary classifiers for the task of deepfake detection. We find that the …
Shichao Dong, Jin Wang, Renhe Ji, Jiajun Liang, Haoqiang Fan, Zheng Ge