About me

My name is Shukai Gong (龚舒凯). I received my B.S. in Data Science from Renmin University of China in 2026, where I was advised by Prof. Hongteng Xu and Prof. Feng Zhou. I’ll begin my MPhil in Computer Science at Peking University in Fall 2026, advised by Prof. Daquan Zhou.

My current research interests lie in AIGC and Embodied AI. For potential collaboration, please feel free to reach out to me at gongshukai0511[at]gmail.com.

News

2026.05: StableVLA has been accepted by ICML 2026.
2025.09: TPP-SD has been accepted by NeurIPS 2025.
2025.01: USPTO-LLM has been accepted by WWW 2025.

Publications

Tech Report

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Juncheng Ma*, Jianxin Bi*, Yufan Deng, Xuanran Zhai, Kewei Zhang, Ye Huang, Bo Liang, Shukai Gong, Jiankai Tu, Xiaotian Tang, Jiaxin Li, Kaiqi Chen, Duomin Wang, Yuqi Wang, Bingyi Kang, Eric Huang, Zhiyang Dou, Zhen Dong, Enze Xie, Wojciech Matusik, Tat-Seng Chua, Daquan Zhou

(* stands for equal contribution)

Paper | Code

With a carefully designed filtering and labeling pipeline, we show that egocentric human video is a scalable pretraining source for embodied foundation models that surpasses real-robot data, especially in out-of-distribution generalization.

ICML 2026

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

Yiyang Fu, Chubin Zhang, Shukai Gong, Yufan Deng, Kaiwei Sun, Qiyang Min, Qibin Hou, Yansong Tang, Jianan Wang†, Daquan Zhou††

Paper | Code

We propose IB-Adapter, a lightweight adapter module for VLA models that selectively filters potential noise from visual inputs. Without requiring any extra data or augmentation strategies, IB-Adapter consistently improves the robustness of VLA models while adding fewer than 10M parameters.

NeurIPS 2025

TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding

Shukai Gong*, Yiyang Fu*, Fengyuan Ran*, Quyu Kong, Feng Zhou†

(* stands for equal contribution)

Paper | Code

By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop TPP-SD, which preserves the same output distribution as autoregressive sampling while achieving a 2-6× speedup on both synthetic and real datasets.

WWW 2025

USPTO-LLM: A Large Language Model-Assisted Information-enriched Chemical Reaction Dataset

Shen Yuan*, Shukai Gong*, Hongteng Xu†

(* stands for equal contribution)

Paper | Dataset

We construct USPTO-LLM, an information-enriched chemical reaction dataset comprising over 247K chemical reactions extracted from USPTO patent documents, with abundant information on reaction conditions. Experiments show that USPTO-LLM helps pre-train existing retrosynthesis methods and that the condition information in the dataset improves model performance.

Experience

Research Intern, ByteDance Seed, Beijing, 2026.04 - present

Research on physical video generation and embodied AI.

Research Intern, Pixverse, Beijing, 2025.07 - 2026.01

Research on the acceleration and long-video adaptation of visual-audio joint generation.

Honors and Awards

2025, Jing Dong Future Scholar, Renmin University of China (¥10000 CNY).
2025, The ICBC Award for Outstanding Student in Integrated Innovation (¥10000 CNY).
2024, The Jing Dong Premium Scholarship, Renmin University of China (¥10000 CNY).

Education

2026.09 - present, MPhil in Computer Science, Peking University.
2022.09 - 2026.06, B.S. in Data Science and Economics, Renmin University of China.