Authors: Charles R. Qi, Or Litany, Kaiming He, Leonidas Guibas
Keywords:
Abstract: Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data (samples from 2D manifolds in 3D space), we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and is thus hard to regress accurately in one step. To address the challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. Our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D, with a simple design, compact model size, and high efficiency. Remarkably, VoteNet outperforms previous methods by using purely geometric information without relying on color images.