李一卓 Yizhuo Li

I am a Ph.D. candidate at the University of Hong Kong (since September 2022), supervised by Prof. Ping Luo and co-supervised by Prof. Wenping Wang and Prof. Xiaoou Tang (In Memoriam). I obtained my Master's degree in Computer Science from Shanghai Jiao Tong University, supervised by Prof. Cewu Lu, and my Bachelor's degree in Electronic Engineering from Tsinghua University.

Email | Google Scholar | GitHub

Research

My research centers on video-language learning: building intelligent systems that perceive, understand, and generate content in our dynamic world. My work spans the full development lifecycle, from large-scale data curation (InternVid) and multi-modal pretraining of foundation models (InternVideo, VideoChat, InternVid) to efficient user-end deployment (ARC-Hunyuan-Video).

My recent work has pivoted to visual generation, where I developed novel generative frameworks such as DiCoDe and explored efficient latent space alignment. I am passionate about bridging the gap between perception and creation, and my future research will focus on developing unified models and physically grounded VLMs. My ultimate ambition is to construct sophisticated world models capable of reasoning, prediction, and complex interaction with the physical world.

Selected Papers

Please refer to Google Scholar for a full publication list.