Abstract
Part segmentation and motion estimation are two fundamental problems in articulated object motion analysis.
In this paper, we present a method to solve these two problems jointly from a sequence of observed point clouds of a single articulated object.
The main challenge in our problem setting is that the point clouds are not generated by a fixed set of moving points.
Instead, each point cloud in the sequence is an arbitrary sampling of the object surface at that particular time step.
Such scenarios occur when the object undergoes major occlusions, or when the data is collected asynchronously
from multiple sensors. In these cases, methods that rely on tracking point correspondences are not applicable.
We present an alternative approach based on a compact yet effective representation that models the object
as a collection of simple building blocks, each a 3D Gaussian.
The set of Gaussians is shared across all time steps, and each Gaussian is parameterized by time-dependent rotations, translations, and scales.
With our representation, part segmentation is obtained by establishing correspondences between
the observed points and the Gaussians. Moreover, each point can be transformed across time
by following the poses of its assigned Gaussian. Experiments show that our method outperforms existing methods
that rely solely on finding point correspondences. Additionally, we extend existing datasets
to emulate real-world scenarios by incorporating viewpoint occlusions. We further demonstrate that our method
is more robust to missing points than existing approaches on these challenging datasets,
even when some parts are not always visible. Notably, our part segmentation
outperforms the state-of-the-art method by 13% on point clouds with occlusions.
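As a minimal sketch of how such a representation supports both tasks, with notation we introduce here for illustration (the method's exact parameterization may differ): let Gaussian $k$ at time $t$ have rotation $R_k^t \in SO(3)$, translation $T_k^t \in \mathbb{R}^3$, and per-axis scale $s_k^t$, so that its covariance is $\Sigma_k^t = R_k^t\,\mathrm{diag}(s_k^t)^2\,(R_k^t)^\top$. Part segmentation then amounts to assigning each observed point $p^t$ to its most likely Gaussian,
\[
k^\ast(p^t) = \arg\max_k \; \mathcal{N}\!\left(p^t \mid T_k^t,\, \Sigma_k^t\right),
\]
and, assuming the scale stays fixed between $t$ and $t'$, the point can be transported across time by composing the assigned Gaussian's poses,
\[
p^{t'} = R_{k^\ast}^{t'}\,(R_{k^\ast}^{t})^\top\!\left(p^t - T_{k^\ast}^{t}\right) + T_{k^\ast}^{t'}.
\]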