About me
My name is Shukai Gong (龚舒凯). I received my B.S. degree from Renmin University of China in 2026, where I was fortunate to be advised by Prof. Hongteng Xu and Prof. Feng Zhou. I’ll begin my MPhil in Computer Science at Peking University in Fall 2026, advised by Prof. Daquan Zhou.
My current research interests lie in AIGC and Embodied AI. For potential collaborations, please feel free to reach out to me at gongshukai0511[at]gmail.com.
News
- 2026.05: StableVLA has been accepted by ICML 2026.
- 2025.09: TPP-SD has been accepted by NeurIPS 2025.
- 2025.01: USPTO-LLM has been accepted by WWW 2025.
Publications

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining
Juncheng Ma*, Jianxin Bi*, Yufan Deng, Xuanran Zhai, Kewei Zhang, Ye Huang, Bo Liang, Shukai Gong, Jiankai Tu, Xiaotian Tang, Jiaxin Li, Kaiqi Chen, Duomin Wang, Yuqi Wang, Bingyi Kang, Eric Huang, Zhiyang Dou, Zhen Dong, Enze Xie, Wojciech Matusik, Tat-Seng Chua, Daquan Zhou
(* stands for equal contribution)
With a carefully designed filtering and labeling pipeline, we show that egocentric human video is a scalable pretraining source for embodied foundation models that surpasses real-robot data, especially in out-of-distribution generalization.

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang†, Daquan Zhou††
We propose IB-Adapter, a lightweight adapter module for VLA models that selectively filters potential noise from visual inputs. Without requiring any extra data or augmentation strategies, IB-Adapter consistently improves the robustness of VLA models while adding fewer than 10M parameters.

TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding
Shukai Gong*, Yiyang Fu*, Fengyuan Ran*, Quyu Kong, Feng Zhou†
(* stands for equal contribution)
By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop TPP-SD, which preserves the same output distribution as autoregressive sampling while achieving a 2-6× speedup on both synthetic and real datasets.

USPTO-LLM: A Large Language Model-Assisted Information-enriched Chemical Reaction Dataset
Shen Yuan*, Shukai Gong*, Hongteng Xu†
(* stands for equal contribution)
We construct an information-enriched chemical reaction dataset called USPTO-LLM, which comprises over 247K chemical reactions extracted from USPTO patent documents and includes abundant information on reaction conditions. Experiments show that USPTO-LLM helps pre-train existing retrosynthesis methods, and that the condition information in the dataset helps improve model performance.
Experience

Research Intern, Bytedance Seed, Beijing, 2026.04 - present
Research on physical video generation and embodied AI.

Research Intern, Pixverse, Beijing, 2025.07 - 2026.01
Research on the acceleration and long-video adaptation of visual-audio joint generation.
Honors and Awards
- 2025, Jing Dong Future Scholar, Renmin University of China (¥10000 CNY).
- 2025, The ICBC Award for Outstanding Student in Integrated Innovation (¥10000 CNY).
- 2024, The Jing Dong Premium Scholarship, Renmin University of China (¥10000 CNY).
Education
- 2026.09 - present, MPhil in Computer Science, Peking University.
- 2022.09 - 2026.06, B.S. in Data Science and Economics, Renmin University of China.