Research Scientist in Multimodal Interaction and World Model
ByteDance - San Jose, CA
Job Description
About the team
The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team works to advance the exploration and development of multimodal assistant products.

Responsibilities
- Develop multimodal foundation models integrating vision, language, audio, and environment signals.
- Design and optimize world models for reasoning, planning, and interaction.
- Build training pipelines, including data curation, alignment, and reinforcement learning.
- Improve agent capabilities such as perception, memory, decision-making, and tool use.
- Explore next-generation interaction paradigms between humans and intelligent systems.

Minimum Qualifications:
- Currently pursuing a PhD in computer science, mathematics, engineering, or a related field, with an expected graduation date in 2027 and the ability to commit to an onboarding date by the end of 2027.
- Excellent coding ability with strong data structure and algorithm fundamentals; proficient in C/C++, Python, or a similar language.
- Experience in multimodal learning, reinforcement learning, or agent systems.
- Familiarity with large-scale model training or simulation environments.

Preferred Qualifications:
- Strong research track record in relevant areas.
- Strong problem-solving and collaboration skills.
Created: 2026-04-19