Multimodal Machine Learning (多模态机器学习)
Undergraduate course, Renmin University of China, 2025
This is a comprehensive course on multimodal machine learning taught by Prof. Wenbing Huang of GSAI, Renmin University of China. The course focuses mainly on recent advances in computer vision, vision-language models (VLMs), and multimodal large language models (MLLMs).
The slides and assignments for this course are provided below. For the final project, our team implemented text-to-image and image-to-text models using the multimodal machine learning techniques covered in this course.