Ping Luo

TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment
Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
