Jin Wang

CS PhD Student at HKU

The University of Hong Kong

I am currently a 3rd-year PhD student in Computer Science at The University of Hong Kong (HKU), supervised by Prof. Ping Luo. My research focuses on multimodal foundation models, especially unified systems that connect visual understanding, generation, and evaluation. My earlier work explored deepfake detection and AI interpretability. I have been a research intern at Hunyuan (Tencent Project Up), Huawei Noah’s Ark Lab, and Megvii. Before joining HKU, I obtained my master’s degree from the University of Chinese Academy of Sciences under Prof. Chao Li, and received my bachelor’s degree from Dalian University of Technology (DLUT).

Education

The University of Hong Kong
PhD in Computer Science
October 2023 – July 2027 Hong Kong
 
University of Chinese Academy of Sciences
MEng in Electronic Engineering
September 2020 – June 2023 Beijing
 
Dalian University of Technology
BEng in Digital Media Technology
September 2016 – June 2020 Dalian, Liaoning

Publications

Unified Multimodal Understanding & Generation

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang*, Yao Lai*, Aoxue Li, Shifeng Zhang, Jiacheng Sun, Ning Kang, Chengyue Wu, Zhenguo Li, Ping Luo (* Equal contribution)
In NeurIPS 2025 (Spotlight)

Multimodal Generation

TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment
Jin Wang*, Jianxiang Lu*, Guangzheng Xu, Comi Chen, Haoyu Yang, Linqing Wang, Peng Chen, Mingtao Chen, Zhichao Hu, Longhuang Wu, Shuai Shao, Qinglin Lu, Ping Luo (* Equal contribution)
To appear in ICML 2026
Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation
Jin Wang*, Jianxiang Lu*, Comi Chen, Guangzheng Xu, Haoyu Yang, Peng Chen, Na Zhang, Yifan Xu, Longhuang Wu, Shuai Shao, Qinglin Lu, Ping Luo (* Equal contribution)
Under Review
Megactor-sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang*, Huadong Li*, Juhao Wu*, Minhao Jing*, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang (* Equal contribution)
In AAAI 2025

Multimodal Understanding

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving
Kewei Zhang*, Jin Wang*, Sensen Gao, Chengyue Wu, Yulong Cao, Songyang Han, Boris Ivanovic, Langechuan Liu, Marco Pavone, Song Han, Daquan Zhou, Enze Xie (* Equal contribution)
Preprint
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Chengyue Wu*, Shiyi Lan*, Yonggan Fu, Sensen Gao, Jin Wang, Jincheng Yu, Jose M. Alvarez, Pavlo Molchanov, Ping Luo, Song Han, Ligeng Zhu, Enze Xie (* Equal contribution)
Preprint
Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning
Haonan Jia*, Shichao Dong*, Xin Dong, Zenghui Sun, Jin Wang, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang (* Equal contribution)
In CVPR 2026
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong*, Shichao Dong*, Jin Wang*, Jing Huang, Li Zhou, Zenghui Sun, Lihua Jing, Jingsong Lan, Xiaoyong Zhu, Bo Zheng (* Equal contribution)
In ICCV 2025
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang, Shichao Dong, Yapeng Zhu, Kelu Yao, Weidong Zhao, Chao Li, Ping Luo
In ICML 2024

Multimodal Evaluation

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang*, Chenghui Lv*, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo (* Equal contribution)
In CVPR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng*, Jin Wang*, Chuanhao Li*, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao (* Equal contribution)
In ICLR 2025
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying*, Fanqing Meng*, Jin Wang*, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao (* Equal contribution)
In ICML 2024

Deepfake Detection

Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization
Shichao Dong*, Jin Wang*, Renhe Ji, Jiajun Liang, Haoqiang Fan, Zheng Ge (* Equal contribution)
In CVPR 2023
Explaining Deepfake Detection by Analysing Image Matching
Shichao Dong*, Jin Wang*, Jiajun Liang, Haoqiang Fan, Renhe Ji (* Equal contribution)
In ECCV 2022

AI Interpretability

Interpretable Generative Adversarial Networks
Chao Li*, Kelu Yao*, Jin Wang*, Boyu Diao, Yongjun Xu, Quanshi Zhang (* Equal contribution)
In AAAI 2022 (Oral)

Additional Research

Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency
Riling Wei*, Kelu Yao*, Chuanguang Yang, Jin Wang, Zhuoyan Gao, Chao Li (* Equal contribution)
In AAAI 2026
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu*, Zhantao Ma*, Shuai Zhong, Jin Wang, Dahai Yu, Michael K. Ng, Ping Luo (* Equal contribution)
Preprint
Falcon: A Remote Sensing Vision-Language Foundation Model
Kelu Yao*, Nuo Xu*, Rong Yang*, Yingying Xu*, Zhuoyan Gao, Titinunt Kitrungrotsakul, Yi Ren, Pu Zhang, Jin Wang, Ning Wei, Chao Li (* Equal contribution)
Preprint
Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Huadong Li*, Minhao Jing*, Jin Wang, Shichao Dong, Jiajun Liang, Haoqiang Fan, Renhe Ji (* Equal contribution)
In ECCV 2024
Towards RGB-NIR Cross-modality Image Registration and Beyond
Huadong Li*, Shichao Dong*, Jin Wang*, Rong Fu*, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji (* Equal contribution)
Preprint