TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment

Jin Wang, Jianxiang Lu, Guangzheng Xu, Comi Chen, Haoyu Yang, Linqing Wang, Peng Chen, Mingtao Chen, Zhichao Hu, Longhuang Wu, Shuai Shao, Qinglin Lu, Ping Luo

Megactor-sigma: Unlocking flexible mixed-modal control in portrait animation with diffusion transformer

Diffusion models have demonstrated superior performance in portrait animation. However, current approaches rely on either visual or …

Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning

Haonan Jia, Shichao Dong, Xin Dong, Zenghui Sun, Jin Wang, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang

Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency

Riling Wei, Kelu Yao, Chuanguang Yang, Jin Wang, Zhuoyan Gao, Chao Li

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

The rapid progress of large language models (LLMs) has catalyzed the emergence of multimodal large language models (MLLMs) that unify …

Jin Wang, Yao Lai, Aoxue Li, Shifeng Zhang, Jiacheng Sun, Ning Kang, Chengyue Wu, Zhenguo Li, Ping Luo

INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling

Hallucinations in large vision-language models (LVLMs) pose significant challenges for real-world applications, as LVLMs may generate …

Xin Dong, Shichao Dong, Jin Wang, Jing Huang, Li Zhou, Zenghui Sun, Lihua Jing, Jingsong Lan, Xiaoyong Zhu, Bo Zheng

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

Recently, the rapid development of AIGC has significantly boosted the diversity of fake media spread on the Internet, posing …

Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo

Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion

It is widely believed that sparse supervision is worse than dense supervision in the field of depth completion, but the underlying …

Huadong Li, Minhao Jing, Jin Wang, Shichao Dong, Jiajun Liang, Haoqiang Fan, Renhe Ji

Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

Compositional reasoning capabilities are usually considered fundamental skills that characterize human perception. Recent studies show …

Jin Wang, Shichao Dong, Yapeng Zhu, Kelu Yao, Weidong Zhao, Chao Li, Ping Luo