Previous motion generation methods are limited to pre-rigged 3D human models, which hinders their
application to the animation of diverse non-rigged characters. In this work, we present TapMo, a
Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters.
The pivotal innovation in TapMo is its use of shape deformation-aware features as a condition to guide the
diffusion model, thereby enabling the generation of mesh-specific motions for various characters.
Specifically, TapMo comprises two main components: the Mesh Handle Predictor and the Shape-aware Motion Diffusion module.
The Mesh Handle Predictor predicts skinning weights and clusters mesh vertices into adaptive handles for
deformation control, which eliminates the need for traditional skeletal rigging. Shape-aware Motion
Diffusion synthesizes motion with mesh-specific adaptations. This module takes as input text-guided motions and
the mesh features extracted during the first stage, and preserves the geometric integrity of the animations by
accounting for the character's shape and deformation. Trained in a weakly-supervised manner, TapMo can
accommodate a multitude of non-human meshes, both with and without associated text motions. We demonstrate
the effectiveness and generalizability of TapMo through rigorous qualitative and quantitative experiments.
Our results reveal that TapMo consistently outperforms existing auto-animation methods, delivering
superior-quality animations for both seen and unseen heterogeneous 3D characters.
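
To make the handle-based deformation concrete, the following is a minimal sketch and not TapMo's actual implementation. It assumes a linear-blend formulation in which each handle carries a rigid transform (a rotation plus a translation) applied about the handle's weighted centroid, and the predicted skinning weights blend the per-handle results for every vertex. The function names, the pure-Python data layout, and the centroid convention are illustrative assumptions.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def handle_centers(vertices: Sequence[Vec3],
                   weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return one centre per handle: the skinning-weighted mean of all vertices."""
    num_handles = len(weights[0])
    centers = []
    for k in range(num_handles):
        total = sum(w[k] for w in weights)
        if total == 0.0:
            # Assumption: a handle that owns no vertices sits at the origin.
            centers.append([0.0, 0.0, 0.0])
            continue
        centers.append([
            sum(w[k] * v[axis] for v, w in zip(vertices, weights)) / total
            for axis in range(3)
        ])
    return centers


def deform(vertices: Sequence[Vec3],
           weights: Sequence[Sequence[float]],
           rotations: Sequence[Mat3],
           translations: Sequence[Vec3]) -> List[List[float]]:
    """Blend per-handle rigid transforms using per-vertex skinning weights.

    For vertex v and handle k with centre c_k:
        v' = sum_k w_k * (R_k (v - c_k) + c_k + t_k)
    """
    centers = handle_centers(vertices, weights)
    deformed = []
    for v, w in zip(vertices, weights):
        out = [0.0, 0.0, 0.0]
        for k, (c, rot, t) in enumerate(zip(centers, rotations, translations)):
            local = [v[i] - c[i] for i in range(3)]
            for i in range(3):
                rotated = sum(rot[i][j] * local[j] for j in range(3))
                out[i] += w[k] * (rotated + c[i] + t[i])
        deformed.append(out)
    return deformed


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two vertices, two handles, one-hot weights: each vertex follows its own handle.
verts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
skin = [[1.0, 0.0], [0.0, 1.0]]
moved = deform(verts, skin, [IDENTITY, IDENTITY],
               [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
```

With one-hot weights each vertex moves rigidly with a single handle; with soft weights that sum to one, a vertex near the boundary between two handles receives a blend of both transforms, which is what lets a small set of handles drive a smooth deformation without a skeleton.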