Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM

Publication
arXiv preprint arXiv:2604.06832
Jin Wang
CS PhD Student at HKU

My research focuses on multimodal foundation models, especially unified systems that connect visual understanding, generation, and evaluation, with earlier work in deepfake detection and AI interpretability.