(Teaser figure: Input → Per-frame Part Segmentation and Motion Estimation → Kinematic Structure Estimation.)

Abstract

Part segmentation and motion estimation are two fundamental problems for articulated object motion analysis. In this paper, we present a method to solve these two problems jointly from a sequence of observed point clouds of a single articulated object. The main challenge in our problem setting is that the point clouds are not generated by a fixed set of moving points; instead, each point cloud in the sequence is an arbitrary sampling of the object surface at that particular time step. Such scenarios occur when the object undergoes major occlusions, or when the data is collected asynchronously from multiple sensors. In these scenarios, methods that rely on tracking point correspondences are not applicable. We present an alternative approach based on a compact but effective representation in which the object is modeled as a collection of simple building blocks, each a 3D Gaussian. We parameterize each Gaussian with time-dependent rotations, translations, and scales, while the set of Gaussians itself is shared across all time steps. With this representation, part segmentation is achieved by building correspondences between the observed points and the Gaussians, and the transformation of each point across time is obtained by following the poses of its assigned Gaussian. Experiments show that our method outperforms existing methods that rely solely on finding point correspondences. Additionally, we extend existing datasets to emulate real-world scenarios by introducing viewpoint occlusions. On these more challenging datasets, our method is more robust to missing points than existing approaches, even when some parts are not always visible. Notably, our part segmentation outperforms the state-of-the-art method by 13% on point clouds with occlusions.
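For concreteness, here is a minimal sketch of how such a representation could be laid out in PyTorch. The class name GaussianSet, the tensor shapes, the axis-angle rotation parameterization, the reuse of the translations as Gaussian means, and the dependence on pytorch3d's axis_angle_to_matrix are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from pytorch3d.transforms import axis_angle_to_matrix  # assumed dependency


class GaussianSet(nn.Module):
    """Hypothetical sketch: K Gaussians shared across T time steps, each
    carrying a per-time-step rotation, translation, and anisotropic scale."""

    def __init__(self, num_gaussians: int, num_steps: int):
        super().__init__()
        K, T = num_gaussians, num_steps
        self.rot = nn.Parameter(torch.zeros(T, K, 3))        # axis-angle rotations
        self.trans = nn.Parameter(torch.zeros(T, K, 3))      # translations (= means)
        self.log_scale = nn.Parameter(torch.zeros(T, K, 3))  # scales, in log-space

    def covariance(self, t: int) -> torch.Tensor:
        """Covariance of each Gaussian at time step t: R S S^T R^T."""
        R = axis_angle_to_matrix(self.rot[t])                # (K, 3, 3)
        S = torch.diag_embed(self.log_scale[t].exp())        # (K, 3, 3)
        RS = R @ S
        return RS @ RS.transpose(-1, -2)                     # (K, 3, 3)
```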

Method Overview

Our method represents an articulated object as a collection of 3D Gaussians with time-dependent transformations. Each Gaussian carries per-time-step rotations, translations, and scales, while the set of Gaussians itself is shared across all time steps, enabling joint part segmentation and motion estimation from point cloud sequences. During optimization, we randomly sample a point cloud from the sequence at every iteration. For the sampled point cloud, we segment the points by assigning each to the Gaussian with the smallest Mahalanobis distance. We then transform the segmented points to all other time steps using the corresponding Gaussian poses, and enforce consistency by comparing the transformed point clouds with the observed ones at those time steps.
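To make the optimization loop concrete, below is a minimal sketch of one iteration, under stated assumptions: a symmetric Chamfer distance is used as the point-cloud comparison (the text above only says the clouds are compared), translations double as Gaussian means, and a point attached to Gaussian k is carried from step t to step s by the relative pose p_s = R_s R_t^T (p - t_t) + t_s. All function names and tensor layouts are hypothetical, not the authors' code.

```python
import torch


def mahalanobis_sq(points, means, cov_inv):
    """Squared Mahalanobis distance from each point to each Gaussian.
    points: (N, 3), means: (K, 3), cov_inv: (K, 3, 3) -> (N, K)."""
    d = points[:, None, :] - means[None, :, :]                 # (N, K, 3)
    return torch.einsum('nki,kij,nkj->nk', d, cov_inv, d)


def chamfer(a, b):
    """Symmetric Chamfer distance between point sets a: (N, 3), b: (M, 3)."""
    dist = torch.cdist(a, b)                                   # (N, M)
    return dist.min(1).values.mean() + dist.min(0).values.mean()


def iteration_loss(t, points_t, observed, R, trans, cov_inv_t):
    """Loss for one optimization iteration (hypothetical interface).
    t: sampled time step; points_t: (N, 3) cloud sampled at t;
    observed: dict mapping step s -> (M_s, 3) observed cloud at s;
    R: (T, K, 3, 3) per-step Gaussian rotations;
    trans: (T, K, 3) per-step translations (also the Gaussian means);
    cov_inv_t: (K, 3, 3) inverse covariances at step t."""
    # 1) Segment: assign each point to the nearest Gaussian in
    #    Mahalanobis distance at the sampled time step.
    labels = mahalanobis_sq(points_t, trans[t], cov_inv_t).argmin(1)    # (N,)

    loss = points_t.new_zeros(())
    for s, target in observed.items():
        if s == t:
            continue
        # 2) Transform: move each point from step t to step s with the
        #    relative pose of its assigned Gaussian.
        R_rel = R[s] @ R[t].transpose(-1, -2)                           # (K, 3, 3)
        t_rel = trans[s] - torch.einsum('kij,kj->ki', R_rel, trans[t])  # (K, 3)
        moved = torch.einsum('nij,nj->ni', R_rel[labels], points_t) + t_rel[labels]
        # 3) Compare: penalize mismatch with the observed cloud at s.
        loss = loss + chamfer(moved, target)
    return loss / max(len(observed) - 1, 1)
```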

Result on clean point clouds

(Figure: qualitative results on clean point clouds; columns: Input, Predicted Gaussians, Predicted Segmentation, Ground Truth, Input vs. Predicted Motion.)

Result on occluded point clouds

(Figure: qualitative results on occluded point clouds; columns: Input, Predicted Gaussians, Predicted Segmentation, Ground Truth, Input vs. Predicted Motion.)
