Optimal 6D Object Pose Estimation with Commodity Depth Sensors

Landau, Michael, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, Department of Systems and Information Engineering, University of Virginia

Accurate 6D object pose estimation, along with other model-based shape matching objectives such as object detection, classification, and shape inspection, is necessary in a large number of domains and applications. These include household and robotics applications that automate various tasks, in which input depth measurements are aligned to a corresponding model of the object, such as a CAD model composed of several interconnected parts, depending on the model's level of detail and complexity. Industrial metrology systems also compare the shapes of engineered components and equipment to their corresponding 3D models for quality assessment and anomaly detection; this often requires very costly, high-resolution projection and detection mechanisms. Structured-light laser scanners have, however, emerged as a viable option for providing cheap and reasonably accurate depth estimates, and have become commodity-priced within this decade as the technology has matured. These sensors are also robust to semi-transparent and reflective surfaces, dynamic scenery, ambient background light, and temperature drift. Unfortunately, certain structured-light light coding methods must apply a complicated, non-invertible transformation to a distorted light pattern to generate the pixel-by-pixel depth estimates, which leads to a loss of information. This can result in poorly estimated or even missing depth data, especially when the object has small 3D features or non-ideal surface properties. Moreover, the informative rigid body constraint is blurred by the nonlinear transformation from the 2D depth image to the 3D pseudo-measurement point cloud, the domain in which most model-based shape matching methods operate.

Alternatively, estimating pose from information early in the structured-light sensing and processing chain has the potential to alleviate errors induced in subsequent nonlinear processing steps that are in fact locally scene dependent. This motivates one of the main contributions of this dissertation: an asymptotically optimal maximum likelihood estimation method that operates directly on the raw output IR images. This is made possible by an extensive study of a commonly used commodity-priced structured-light sensor, which led to the proposed high-fidelity IR and depth image predictor and simulator that models the physics of the transmit and receive optics, the unique IR pattern, and the statistical speckle and detector noise distributions. A new method is also formulated to compute the Fisher information contained in the IR images of the unique structured-light measurement data in order to establish the Cramer-Rao bound, i.e., the lower bound on the error variance of any unbiased pose estimator. The proposed shape-based matching method is shown to outperform cutting-edge point set registration methods by an order of magnitude in mean square error when applied to object pose estimation, and also to approximately attain the Cramer-Rao bound, thereby demonstrating near optimality. The method is additionally shown to produce nearly identical cost evaluation times across model complexities, and to converge consistently to the neighborhood of the true pose parameter, regardless of the number of pixels on target or the initialized error bound of the global optimization.
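
For readers unfamiliar with the bound referenced above, the standard textbook statement can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the dissertation: \theta denotes the 6D pose parameter, z the raw IR image measurement, and p(z; \theta) its likelihood, with the usual regularity conditions assumed to hold.

```latex
% Maximum likelihood pose estimate (illustrative notation)
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} \, \log p(z;\theta)

% Fisher information matrix for the pose parameter
I(\theta) = \mathbb{E}\!\left[
  \nabla_{\theta} \log p(z;\theta)\,
  \nabla_{\theta} \log p(z;\theta)^{\mathsf{T}}
\right]

% Cramer-Rao bound for any unbiased estimator \hat{\theta}
\operatorname{Cov}\big(\hat{\theta}\big) \succeq I(\theta)^{-1}
\quad\Longrightarrow\quad
\mathbb{E}\,\big\|\hat{\theta}-\theta\big\|^{2} \;\ge\; \operatorname{tr}\!\big(I(\theta)^{-1}\big)
```

In this general form, an unbiased estimator whose covariance equals I(\theta)^{-1} is called efficient, which is the sense of "near optimality" when an estimator approximately attains the bound.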

PhD (Doctor of Philosophy)
Cramer-Rao Bound, Efficiency, Fisher Information Matrix, Industrial Metrology, Maximum Likelihood Estimation, Structured-Light Depth Sensor, Uniformly Minimum Variance Unbiased Estimator
All rights reserved (no additional license for public reuse)
Issued Date: