I am a Principal AI Scientist at Zoom AI, where I focus on Agentic Memory and LLM Post-training.
Currently, I am passionate about building a next-generation platform where humans and agents collaborate seamlessly, from incubating ideas to enterprise-scale automation.
Before joining Zoom, I was at Snap Research and Microsoft Azure AI, where I worked on multimodal generation, video synthesis, and knowledge-grounded language models.
studyfang AT gmail.com | Redmond, Washington
Google Scholar / LinkedIn / GitHub / Twitter
ACL 2024
LoCoMo: A comprehensive benchmark for evaluating long-term conversational memory capabilities of large language model agents across extended dialogues.
CVPR 2024
A large-scale dataset of 70 million video clips with high-quality captions generated using multiple cross-modality teacher models for video-language understanding.
CVPR 2024
A scalable transformer architecture for high-quality text-to-video generation, enabling controllable and coherent video synthesis from textual descriptions.
EMNLP 2024
A framework for controllable video generation that grounds the generation process in multi-modal instructions, enabling precise spatial and temporal control.
AAAI 2023
A unified framework for integrative multimodal learning across vision, language, and speech modalities with composable pre-training objectives.
EMNLP 2020
A hierarchical graph neural network approach to multi-hop reasoning for question answering, achieving state-of-the-art results on HotpotQA.