This paper introduces a framework for synthesizing multi-stage scene-aware interaction motions
and a comprehensive language-annotated MoCap dataset (LINGO).
LINGO Dataset coming soon.
Code coming soon.
Our proposed method generates realistic character motion in 3D scenes based on a single textual instruction and a target location.
Our method uses an auto-regressive diffusion model with a dual voxel scene encoder and an autonomous scheduler. Text instructions are encoded together with their time frames to provide time-specific semantic guidance.
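To make the time-frame conditioning concrete, below is a minimal PyTorch sketch of how an instruction embedding could be fused with its active time window before it conditions the diffusion model. `TimedTextConditioner`, the dimensions, and the window normalization are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TimedTextConditioner(nn.Module):
    """Fuse a text embedding with its active time window so the model
    receives time-specific semantic guidance (hypothetical sketch)."""

    def __init__(self, text_dim=512, time_dim=64, out_dim=512):
        super().__init__()
        self.time_mlp = nn.Sequential(
            nn.Linear(2, time_dim), nn.SiLU(), nn.Linear(time_dim, time_dim)
        )
        self.fuse = nn.Linear(text_dim + time_dim, out_dim)

    def forward(self, text_emb, start_frame, end_frame, num_frames):
        # Normalize the instruction's active window to [0, 1].
        window = torch.stack(
            [start_frame / num_frames, end_frame / num_frames], dim=-1
        ).float()
        return self.fuse(torch.cat([text_emb, self.time_mlp(window)], dim=-1))

# Usage: one instruction active over frames 30..90 of a 120-frame segment.
cond = TimedTextConditioner()
text_emb = torch.randn(1, 512)  # e.g., a frozen text-encoder embedding
z = cond(text_emb, torch.tensor([30]), torch.tensor([90]), 120)
print(z.shape)  # torch.Size([1, 512])
```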
We qualitatively compare our method against baselines. Our method generates characters that actively avoid penetrating the scene and exhibit natural cues of scene awareness.
We present a comprehensive MoCap dataset comprising 16 hours of motion sequences across 120 indoor scenes, covering 40 motion types, each annotated with a precise language description.
LINGO leverages a "synthetic vision" of the 3D scene projected into a VR headset worn by the motion actor. We use the advanced VICON MoCap system to ensure high motion data quality.
We design a custom Blender addon to facilitate the MoCap process. The addon provides first-person and third-person views that follow the actor's movement.
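As an illustration of the follow-camera idea, the snippet below is a minimal bpy sketch that parents a first-person camera to the actor and aims a third-person camera at the actor via constraints. The object name "Actor", the offsets, and the constraint setup are assumptions, not the addon's actual code.

```python
import bpy

def setup_follow_cameras(actor_name="Actor"):
    """Attach first- and third-person cameras to a MoCap actor.
    Hypothetical sketch; object names and offsets depend on the rig."""
    actor = bpy.data.objects[actor_name]

    # First-person view: parent the camera so it moves with the actor.
    fp_cam = bpy.data.objects.new("FP_Camera", bpy.data.cameras.new("FP"))
    bpy.context.scene.collection.objects.link(fp_cam)
    fp_cam.parent = actor
    fp_cam.location = (0.0, 0.0, 1.7)  # approximate eye height (assumed Z-up)

    # Third-person view: keep a fixed offset and always aim at the actor.
    tp_cam = bpy.data.objects.new("TP_Camera", bpy.data.cameras.new("TP"))
    bpy.context.scene.collection.objects.link(tp_cam)
    tp_cam.location = (0.0, -4.0, 2.0)
    follow = tp_cam.constraints.new(type="COPY_LOCATION")
    follow.target = actor
    follow.use_offset = True  # preserve the offset while following
    track = tp_cam.constraints.new(type="TRACK_TO")
    track.target = actor
    track.track_axis = 'TRACK_NEGATIVE_Z'  # cameras look down their -Z axis
    track.up_axis = 'UP_Y'

setup_follow_cameras()
```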