Under review
TL;DR: MMFP can generate motions from text inputs, capturing complex text dependencies in motion distributions while simultaneously addressing the challenges posed by high-dimensional trajectory data and small dataset sizes.
Developing text-based robot trajectory generation models is particularly difficult because of the small dataset size, the high dimensionality of the trajectory space, and the inherent complexity of the text-conditional motion distribution. Recent manifold learning-based methods have partially addressed the dimensionality and dataset size issues, but struggle with the complex text-conditional distribution. In this paper, we propose a text-based trajectory generation model that attempts to address all three challenges while relying on only a handful of demonstration trajectories. Our key idea is to leverage recent flow-based models capable of capturing complex conditional distributions, not directly in the high-dimensional trajectory space, but rather in the low-dimensional latent coordinate space of the motion manifold, with deliberately designed regularization terms to ensure smoothness of motions and robustness to text variations. We show that our Motion Manifold Flow Primitive (MMFP) framework can accurately generate qualitatively distinct motions for a wide range of text inputs, significantly outperforming existing methods.
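As a rough illustration of the key idea (a flow in the low-dimensional latent space, followed by decoding to a trajectory), the following minimal sketch may help. It assumes the common linear-interpolation form of flow matching and simple Euler integration; the names `flow_matching_pair`, `sample_latent`, `toy_velocity`, and `decode`, the linear decoder, and the use of the text embedding as the target latent point are all hypothetical stand-ins, not the authors' implementation, and the regularization terms mentioned above are omitted.

```python
import numpy as np

def flow_matching_pair(z0, z1, t):
    """Linear-path flow matching (assumed form): point on the straight path
    from noise z0 to data latent z1, and the regression target velocity."""
    zt = (1.0 - t) * z0 + t * z1
    target_v = z1 - z0
    return zt, target_v

def sample_latent(velocity, text_emb, latent_dim, n_steps=50, rng=None):
    """Euler-integrate dz/dt = velocity(t, z, text_emb) from t=0 to t=1,
    starting from a standard Gaussian sample in the latent space."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(latent_dim)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        z = z + dt * velocity(k * dt, z, text_emb)
    return z

def toy_velocity(t, z, text_emb):
    """Hypothetical stand-in for a learned network: the exact velocity field
    that transports any start point to the single latent point text_emb."""
    return (text_emb - z) / (1.0 - t)

def decode(z, W, dof):
    """Hypothetical linear decoder: latent coordinates -> (T, dof) trajectory."""
    return (W @ z).reshape(-1, dof)

rng = np.random.default_rng(0)
text_emb = np.array([1.0, -2.0])           # toy 2-D "text embedding"
z1 = sample_latent(toy_velocity, text_emb, latent_dim=2, rng=rng)
W = rng.standard_normal((10 * 3, 2))       # toy decoder: 10 waypoints, 3 DoF
trajectory = decode(z1, W, dof=3)
```

In this toy setup the flow only has to be learned and integrated in two latent dimensions, while the decoder handles the mapping to the 30-dimensional trajectory; that split is the point of working on the motion manifold rather than in trajectory space.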
A naive application of existing algorithms directly in the high-dimensional trajectory space fails when given only a handful of demonstrations.
The conditional autoencoder architecture, where the decoder takes both the latent variable \(z\) and text \(\tau\) as inputs, struggles to capture complex text dependencies in motion distributions while addressing issues related to dimensionality and dataset size.
References:
1. Task-Conditioned Variational Autoencoders for Learning Movement Primitives (Noseworthy et al., CoRL 2019)
2. Equivariant Motion Manifold Primitives (Lee et al., CoRL 2023)
@article{lee2024motion,
title={Motion Manifold Flow Primitives for Language-Guided Trajectory Generation},
author={Lee, Yonghyeon and Lee, Byeongho and Kim, Seungyeon and Park, Frank C},
journal={arXiv preprint arXiv:2407.19681},
year={2024}
}