Registration of Noisy Point Clouds using Virtual Interest Points
Mirza Tahir Ahmed∗ , Mustafa Mohamad† , Joshua A. Marshall∗ , and Michael Greenspan∗
Department of Electrical and Computer Engineering∗
School of Computing†
Queen’s University, Kingston ON, Canada
Email: {81mta, 8mm104, joshua.marshall, [email protected]
Abstract—A new method is presented for robustly and
efficiently registering two noisy point clouds. The registration is
driven by establishing correspondences of virtual interest points,
which do not exist in the original point cloud data, and are
defined by the intersection of parametric surfaces extracted
from the data. Parametric surfaces, such as planes, exist in
abundance in both natural and artificial scenes, and can lead
to regions in the data of relatively low noise. This in turn
leads to repeatable virtual interest points, with stable locations
across overlapping images. Experiments were run using virtual
interest points defined by the intersection of three implicit
planes, applied to data sets of four environments comprising
100 point clouds. The proposed method outperformed the
Iterative Closest Point, Generalized Iterative Closest Point, and
a 2.5D SIFT-based RANSAC method, registering overlapping images with a higher success rate and greater efficiency.
Keywords-registration; point cloud; virtual interest points; key points; parametric surfaces; planes
I. INTRODUCTION
Registration is the process of rigidly transforming the
coordinate reference frames of two or more 3D point clouds,
such that their intersecting regions overlap correctly. Therefore, detection of the area of overlap between two point
clouds plays an important role. This is done by finding
corresponding points between the two point clouds, using
one of two approaches. The first approach assumes that the
point clouds are initially approximately aligned, and applies
a nearest neighbor search to establish point correspondences
based on their proximity in 3D space. The registration
process is completed when a convergence criterion is satisfied, such as a minimal threshold of the average separation
between correspondences, or a maximum threshold on the
number of iterations. Such techniques can be termed non-feature based registration. Iterative Closest Point (ICP) [1] methods [2] are based on this approach.
The second approach is feature based registration, in
which the data sets are processed to extract interest points
and feature descriptors. In this approach, there is no requirement for an initial approximate alignment of the data
sets, and the correspondences are established by determining
matches in descriptor space [3], [4]. A transformation is
calculated using the correspondences. Typically, this process
is wrapped in a robust framework such as RANSAC, to
accommodate any false correspondences (i.e., outliers) that
result from occasional and inevitable mismatches in descriptor space.
The existence of noise in point cloud data is one source
of outliers, and presents challenges in determining correspondences for both feature based and non-feature based
registration. Noise is present in the data for several reasons,
such as the fundamental physics and measurement technique
upon which the sensor is based, which is referred to collectively as sensor-specific noise [5]. Sensor-specific noise
includes pixel position error, axial noise, and quantization
error. In contrast, scene-specific noise is a result of the
sensor’s limitation in correctly observing certain areas in
the scene such as corners, edges, and reflective objects
[6]. Feature based registration techniques generally identify
interest points in regions with large shape variations. Unfortunately, these regions of large variation are among those
which are the most highly affected by scene-specific noise,
thereby increasing the chances of feature mismatches and
registration inaccuracies and failures.
In this paper a novel method is proposed for registering
two noisy point clouds. The main idea is to observe the
underlying structure of the scene, extract implicit parametric surfaces (such as planes), identify the intersection
of multiple parametric surfaces as virtual interest points,
and then use these virtual interest points to register the
original point clouds. Parametric surfaces such as planes
are the most stable areas in the scene because they contain
minimal shape variation, and are therefore less susceptible
to both sensor-specific and scene-specific noise [5]. The
stability of the underlying parametric surfaces can result in
stable and repeatable virtual interest points, which leads to
accurate registration results. In addition, only one correct
correspondence of virtual interest points is sufficient to
register two point clouds, which significantly improves the
computational efficiency.
A. Related Work
The most widely-used approach for non-feature based
registration of two point clouds is the ICP algorithm [1], and
many variants of ICP have been developed that use different
error functions and point selection strategies. The three most
commonly used distance functions are point-to-point, point-to-plane, and plane-to-plane [1], [7], [8]. The transformation
is iteratively refined by applying the computed transformation and recalculating the correspondences. For large initial
offsets between two noisy point clouds, however, ICP based
approaches will fail to converge [8].
For feature based registration, there exist a number of
different 3D interest point operators that are based on different properties of the data, and yet extracting interest points
repeatably among multiple point clouds is a challenge. The
Local Surface Patch (LSP) approach [9] searches for areas
with large shape variations as measured by shape indices,
which are calculated through principal curvature. Intrinsic
Shape Signatures (ISS) [10] addresses the detection of view-invariant interest points. 2.5D SIFT [11] detects interest points using an enhanced version of Lowe's Scale Invariant Feature Transform (SIFT) algorithm [12]. A discrete scale-space representation is first created using Gaussian smoothing and Difference of Gaussian (DOG) pyramid techniques, and maxima are detected within the DOG scale-space. The interest points are finally validated and located
by comparing the ratio of the principal curvatures with a
predefined threshold.
Once extracted, the support region for each interest point
is used to compute a unique feature descriptor. Again there
are a number of methods for feature descriptor computation.
Signature based methods describe the local neighborhood
of an interest point by encoding one or more geometric
measures computed individually at each point of a subset of
the neighborhood [13]. Histogram based methods describe
the local neighbourhood of an interest point by accumulating geometric or topological measurements into histograms
[14]. A comprehensive overview of a number of interest
point and feature descriptor techniques is given in [15].
Given two sets of feature descriptors from two acquired
scans, correspondences are computed to find overlapping
parts in the data. For matching in feature descriptor space,
brute force matching and k-d tree nearest-neighbour search
(e.g., FLANN) have been applied [16]. Since incorrect
correspondences can negatively affect the estimation of the
final transformation, some outlier rejection method such
as RANSAC is required. The last step is to compute the
transformation based on the best correspondences, typically
minimizing some least square measure of error.
The proposed method falls under the category of feature
based registration. Virtual interest points (VIPs) are identified by observing the underlying structure of the scene.
The use of the term virtual highlights that these points
do not exist in the original point cloud data; rather, they
are 3D points that are injected at distinct locations based
on a computation. The VIPs between two point clouds are
further annotated, based on the properties of their supporting
implicit parametric surfaces, and are matched in feature
descriptor space. A rigid transformation to register two
point clouds can be computed from a single true VIP
correspondence. As discussed in Section III, experimentation has shown improvements in both registration success and computational efficiency over ICP [1], GICP [7], and conventional feature based registration techniques [11].

Figure 1. Registration of two point clouds. Extracted planes are clustered in different colors, and VIPs are shown as large squares, in (a) point cloud 1 and (b) point cloud 2. (c) Point clouds 1 and 2 aligned using the matched VIPs.
The remainder of this paper is organized as follows: In
Section II, the plane extraction technique is described, as are the criteria used to compute VIPs and the steps of registration using VIPs. In Section III, experimental evidence demonstrates that the method registers noisy point clouds more accurately and efficiently than other well known techniques, and Section IV discusses these results. Finally, Section V concludes the paper with a summary and discussion of future work.
II. VIRTUAL INTEREST POINTS FOR REGISTRATION
Repeatable interest point detection presents a challenge
in feature based registration due to the presence of noise
in the data. To address this, the proposed method limits the interest point search space by avoiding noisy areas with large shape variations. Specifically, planar parametric surfaces that contain minimal shape variation are extracted from the point cloud data. Planar regions are less affected by sensor-specific and scene-specific noise, and are thus considered the most stable regions [17]. Once such planar surfaces are extracted, the intersection of three mutually non-parallel surfaces results in a VIP.
A. Plane Detection
There are multiple plane extraction algorithms available, such as RANSAC [18], the Hough transform [19], and region growing segmentation [20]. We used region growing segmentation, which is based on characterizing the surface curvature of the points in the point cloud: neighbouring points are merged into a region if they are sufficiently similar under the smoothness constraints.
First, all points are sorted by their curvature values so
that the region begins its growth from the point that has
the minimum curvature value. The surface curvature γ for a
point p is computed as
    γ = λ0 / (λ0 + λ1 + λ2),                                  (1)
where λi are the eigenvalues of the covariance matrix for
a support region of 50 nearest points in the neighbourhood
of p. The point with the minimum curvature is identified as
residing in a flat (i.e., planar) region, and is used as a seed
point.
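To make eq. (1) concrete, the following minimal numpy sketch (not from the paper; the function name is illustrative) computes γ for one point from its 50 nearest neighbours:

```python
import numpy as np

def surface_curvature(neighbours):
    """gamma = lambda0 / (lambda0 + lambda1 + lambda2), eq. (1).

    neighbours: (k, 3) array of the k = 50 nearest points around p.
    """
    centered = neighbours - neighbours.mean(axis=0)
    cov = centered.T @ centered / len(neighbours)  # 3x3 covariance matrix
    eigvals = np.linalg.eigvalsh(cov)              # eigenvalues in ascending order
    return eigvals[0] / eigvals.sum()              # smallest over the sum
```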
For every seed point, the algorithm finds the support
region around the seed. The following steps are repeated
until all the points in the point cloud have been tested.
1) Every neighbour point is tested for the angle between its normal and the normal of the current seed point. If the
angle is less than 5 degrees then the current point is
added to the current region.
2) If the surface curvature γ of the neighbour point is
less than a threshold, it is added to the set of seeds.
This helps to grow the region beyond the current
neighbourhood of the seed point.
3) When all the neighbours of the current seed point are
tested, the seed point is removed from the set of seeds.
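A minimal sketch of this loop, assuming a `neighbours_of` lookup and a curvature threshold are supplied (both hypothetical; this is not the authors' implementation):

```python
import numpy as np

def region_growing(points, normals, curvature, neighbours_of,
                   angle_thresh_deg=5.0, curv_thresh=0.05):
    """Sketch of the smoothness-constrained region growing described above."""
    angle_thresh = np.deg2rad(angle_thresh_deg)
    unassigned = set(range(len(points)))
    regions = []
    while unassigned:
        # Each region starts from the unassigned point of minimum curvature.
        start = min(unassigned, key=lambda i: curvature[i])
        unassigned.discard(start)
        seeds, region = [start], {start}
        while seeds:
            s = seeds.pop()  # step 3: the seed is removed once its neighbours are tested
            for n in neighbours_of(s):
                if n not in unassigned:
                    continue
                # Step 1: add the neighbour if its normal is within 5 degrees
                # of the seed's normal (abs() tolerates flipped normals).
                cos_a = abs(float(np.dot(normals[s], normals[n])))
                if np.arccos(np.clip(cos_a, -1.0, 1.0)) < angle_thresh:
                    region.add(n)
                    unassigned.discard(n)
                    # Step 2: low-curvature neighbours become new seeds.
                    if curvature[n] < curv_thresh:
                        seeds.append(n)
        regions.append(region)
    return regions
```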
The output of the algorithm is a set of regions, as shown
in Fig. 1 (a) and (b). Seed points are shown in different
colors, depicting each planar region. After region growing,
points with large curvature values are rejected. Each region is then processed to extract its plane coefficients using eigenvalue decomposition.
Once the plane coefficients are computed, the planes
database is populated with the following information:
• Group index: Parallel planes are grouped together so
that spurious calculations can be avoided.
• Plane coefficients: Coefficients of the plane equation, used to compute the VIPs.
• Number of points Ps : A trust measure, such that a
greater value of Ps indicates a more reliable plane.
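For illustration only, one plausible shape for a plane database entry (the field names are ours, not the paper's):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PlaneRecord:
    """One plane database entry (field names illustrative)."""
    group_index: int          # parallel planes share a group index
    coefficients: np.ndarray  # (A, B, C, D), with unit normal (A, B, C)
    num_points: int           # Ps, the trust measure
```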
B. VIPs and Feature Descriptors

The plane database is processed next to extract VIPs, shown as brown colored blocks in Fig. 1. Three planes, each with a different group index (and therefore mutually non-parallel), are processed to find a common point of intersection. The following system of plane equations is established to calculate the degree of orthogonality of the three planes and their intersection point:

    A1 x + B1 y + C1 z + D1 = 0
    A2 x + B2 y + C2 z + D2 = 0                               (2)
    A3 x + B3 y + C3 z + D3 = 0

Two matrices are derived from this system of planar equations:

    Mc = [ A1  B1  C1 ]
         [ A2  B2  C2 ]                                       (3)
         [ A3  B3  C3 ]

    Ma = [ A1  B1  C1  D1 ]
         [ A2  B2  C2  D2 ]                                   (4)
         [ A3  B3  C3  D3 ]

where Mc and Ma are the coefficient matrix and the augmented matrix, respectively. Three non-parallel planes intersect at a single point only if both Mc and Ma are full rank. Mc contains the row unit normals of the planes, and the closer the determinant of Mc is to 1, the closer the three planes are to being mutually perpendicular.

If Mc and Ma are full rank, then the point of common intersection, which is a candidate VIP, is calculated from (2) as

    [ x ]         [ -D1 ]
    [ y ] = Mc⁻¹  [ -D2 ]                                     (5)
    [ z ]         [ -D3 ]

In total, each set of three planes has to satisfy two conditions to generate a qualified VIP. First, the rank of both Mc and Ma must be 3. Second, det(Mc) > σ, where σ is a threshold close to 1 that sets the required degree of orthogonality of the three planes.

Qualified VIPs are added to a database along with their associated feature descriptors, which are composed of the following properties:
1) Angles between the three planes: α, β, and γ;
2) Degree of orthogonality: det(Mc);
3) Trust factor: tf, calculated as

    tf = (Ps1 + Ps2 + Ps3) / Ts,                              (6)

where Ps1, Ps2, and Ps3 are the numbers of points of the three associated planes and Ts is the size of the point cloud. The higher the value of tf, the more reliable the VIP.
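A minimal numpy sketch of the two conditions and eq. (5), assuming each plane is given as its coefficient vector (A, B, C, D) with (A, B, C) a unit normal; the function name and the σ default are illustrative:

```python
import numpy as np

def compute_vip(p1, p2, p3, sigma=0.5):
    """Candidate VIP from three planes given as (A, B, C, D) rows.

    Returns the intersection point, or None if either condition fails.
    sigma is an assumed value for the orthogonality threshold.
    """
    planes = np.vstack([p1, p2, p3])   # the 3x4 system of eq. (2)
    Mc = planes[:, :3]                 # coefficient matrix, eq. (3)
    Ma = planes                        # augmented matrix, eq. (4)
    # Condition 1: both matrices must have rank 3 (a unique common point).
    if np.linalg.matrix_rank(Mc) < 3 or np.linalg.matrix_rank(Ma) < 3:
        return None
    # Condition 2: |det(Mc)| near 1 means near-mutually-perpendicular planes.
    if abs(np.linalg.det(Mc)) <= sigma:
        return None
    # Eq. (5): [x, y, z]^T = Mc^{-1} (-D).
    return np.linalg.solve(Mc, -planes[:, 3])
```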
C. VIP Correspondences
There are a number of methods available for feature matching to establish a set of VIP correspondences. A nearest-neighbour search is employed to match the VIPs in the two point clouds: a k-d tree is populated in which each property of the feature descriptor serves as a dimension, and a source VIP is matched with the nearest target VIP in descriptor space. An obvious selection of the best correspondence is the one with minimum distance in feature space. Only one true VIP correspondence is needed to register two point clouds; however, a selection based only on minimum distance in descriptor space might be an outlier. Therefore, it is important to have a mechanism that helps select the most probable correct match.
We group the correspondences using a geometric consistency clustering algorithm that enforces simple geometric constraints between pairs of correspondences [9]. Possible corresponding pairs are filtered based on the geometric constraint

    dc1,c2 = ‖ds1,s2 − dm1,m2‖ < ε,                           (7)
where ds1,s2 and dm1,m2 are the Euclidean distances between
the two VIPs in two point clouds. The constraint guarantees
that the distances ds1,s2 and dm1,m2 are consistent, for two
correspondences c1 = (s1, m1) and c2 = (s2, m2). Therefore,
potential corresponding pairs are grouped together. The
larger the group is, the more likely it contains the true
correspondences. We use RANSAC to find the single best
correspondence among the largest group. A transformation
is computed for a random correspondence using the method
explained in Section II-D, and the residual error is minimized. The correspondence with the minimum residual error
is considered to be the best correspondence.
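The grouping step of eq. (7) can be sketched as follows (illustrative names and threshold; the RANSAC selection within the largest group is omitted):

```python
import numpy as np

def largest_consistent_group(correspondences, src_vips, tgt_vips, eps=0.05):
    """Group correspondences by the pairwise distance constraint of eq. (7).

    correspondences: (i, j) pairs indexing source and target VIP arrays;
    eps is an assumed consistency threshold in the units of the clouds.
    """
    groups = []
    for c1 in correspondences:
        group = [c1]
        for c2 in correspondences:
            if c2 == c1:
                continue
            d_src = np.linalg.norm(src_vips[c1[0]] - src_vips[c2[0]])
            d_tgt = np.linalg.norm(tgt_vips[c1[1]] - tgt_vips[c2[1]])
            if abs(d_src - d_tgt) < eps:   # eq. (7)
                group.append(c2)
        groups.append(group)
    # The largest group most likely contains the true correspondences;
    # RANSAC then picks the single best correspondence within it.
    return max(groups, key=len)
```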
D. Registration
The registration process using a single true correspondence is driven by matching VIPs vq and vm, as shown in Fig. 1(c). It is a three-step registration process, with one translation T followed by two rotations. The translation is calculated trivially as the difference between the two VIPs; i.e., T = vq,xyz − vm,xyz. To calculate the two rotations, it is first
recognized that the two VIPs have three matched planes, so
that three angles between the matched planes are computed
as
    αi = cos⁻¹[ (nqi · nmi) / √((nqi · nqi)(nmi · nmi)) ]     (8)
where αi is the angle between the ith normals nqi and nmi
of the two matched VIPs.
We select the two largest angles and calculate the rotation matrix R using Rodrigues' rotation formula, in which the rotation about one axis is given by

    R = I + sin α1 [k]× + (1 − cos α1)(k kᵀ − I),             (9)

where k = (nq1 × nm1) is the cross product of the two normals and [k]× is the cross-product matrix of k.
The two point clouds are then registered using the translation T and the rotation matrix R. Hence, using only one true correspondence, two point clouds are registered, which significantly improves the computational efficiency of the registration process.
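A minimal sketch of this single-correspondence registration, showing the translation and one of the two Rodrigues rotations (eq. (9)); names are illustrative, and the axis is normalized since the formula requires a unit k:

```python
import numpy as np

def rodrigues(n_m, n_q):
    """Rotation matrix taking unit normal n_m onto n_q, per eq. (9)."""
    k = np.cross(n_m, n_q)
    s = np.linalg.norm(k)                  # sin(alpha)
    c = float(np.dot(n_m, n_q))            # cos(alpha)
    if s < 1e-9:                           # normals already aligned: degenerate case
        return np.eye(3)
    k = k / s                              # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],      # [k]x, the cross-product matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # For unit k, (k k^T - I) equals K @ K, so this matches eq. (9).
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def register_single_vip(v_q, v_m, n_q1, n_m1):
    """One translation plus one of the two rotations described above."""
    T = v_q - v_m                          # translation between matched VIPs
    R = rodrigues(n_m1, n_q1)              # align one matched plane normal
    return R, T
```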
III. EXPERIMENTAL RESULTS
Four point cloud data sets, comprising both artificial and natural structures, were collected using different sensors to compare the proposed method against ICP, GICP, and 2.5D SIFT for registration. The sensors used to acquire the data were the Microsoft Kinect (version 1), the Asus Xtion, and the SwissRanger SR4030.

Figure 2. Comparison of ground truth, VIP, and VIP-ICP registration with an original point cloud pair. (a) Raw point clouds overlaid on each other, showing the translation and rotation of the sensor in side and top views. (b) Ground truth registration computed using ICP with 500 iterations. (c) Registration with VIP. (d) VIP registration refined with ICP.

The Kinect is an inexpensive sensor and exhibits
significant quantization error as the sensor moves farther
away from the scene. The Asus Xtion uses an infrared sensor and adaptive depth detection technology, and provides data with 1 cm depth accuracy across its field of view at 30 frames
per second. The SR4030 is a time-of-flight sensor. For an
object at a given distance, the range measurement results in
a distribution of measured values centred on a mean value.
This introduces a significant amount of noise in the data
at the corners and edges in the scene. Each SR4030 scan
consists of 25344 points with depth information.
Each data set is described as follows:
• Panel was captured with the Asus Xtion, moved at a height of half a meter along a zig-zag structure built from wooden panels. It consists of 15 frames and is available online [21]. The ground truth for this data set was generated using ICP with 500 iterations and verified through visual inspection. ICP registers these frames very accurately for two reasons: the inter-frame offset is very small, and the data set has low noise due to the short distance between the sensor and the surface.
• Lab consists of 28 consecutive scans of a small area of
a laboratory environment, using the Kinect. This data
set was captured at a small average distance of 1.5 m to reduce noise effects. The translation between subsequent scans is very small, and there is a large area of overlap between them.
• Office consists of 12 scans of an office environment, captured by rotating the Kinect sensor approximately
around its vertical axis through a 360◦ arc. This data
set was captured at an average distance of 3 m to 5 m, and so it contains more noise and quantization error than the Lab data set. The inter-scan displacement is also larger than that of the Lab data set, but there is still a reasonable area of overlap between successive scans.
• Rocks represents a natural structure. It consists of
50 consecutive scans of a rock cut, captured near a
highway overpass using the SR4030 sensor. This data set was captured at a small average distance of 1 m. The observed rock cut was composed of an exposed
granite outcrop, which is part of the Canadian shield
formation. During highway construction these rocks were blasted, revealing rock faces that exhibit their underlying semi-regular, approximately planar surfaces.
VIP with and without ICP refinement is compared to the ground truth on the Panel data set. In the case of VIP refined with
ICP (VIP-ICP), the iterations of ICP are set to 5 so that
the computation efficiency of registration does not degrade.
The proposed technique converged to a global minimum for
93% of frame pairs. Camera origin and orientation for 15
frames are tracked with VIP, VIP-ICP, and ground truth.
Thus, ground truth is compared to VIP and VIP-ICP for
translation in x, y, and z axes as well as for rotation about
three perpendicular axes, termed pitch, yaw, and roll. To quantify the transformation error with respect to the size of the point clouds, the percentage error is computed as (Terr × 100)/dia(PC),
where Terr is the translation error relative to the ground truth
and dia(PC) is the average diameter of point clouds, PC, of
the data set. The percentage error for translation for VIP and
VIP-ICP are (1.76, 0.55, 0.46) and (1.00, 0.28, 0.16), respectively. The average rotational error in degrees is computed
as (0.64, 1.76, 1.14) and (0.69, 0.74, 0.81) for VIP and VIP-ICP, respectively. The error in each of the six degrees of freedom is very low for both VIP and VIP-ICP, so the quality of registration is good; this can be visually confirmed in Fig. 2. Adding ICP refinement as a post-processing step thus does not significantly improve the proposed technique, and the average computation time per pair for VIP-ICP, 4.27 s, is almost double that of VIP, 2.41 s. Therefore, VIP without refinement is used for the remaining data sets.
Figure 3. Lab data set: registration of two point clouds. (a) The two raw point clouds displayed from a bird's-eye view without registration, to illustrate the displacement between them. (b) Registration using ICP, which takes 100 iterations to converge to a global minimum. (c) GICP registers better after 100 iterations, but is still not accurate. (d) 2.5D SIFT fails to align the two point clouds. (e) VIP registers better than the other techniques, with registration performed using only a single correspondence.
Registration was applied to the three remaining data sets using four different techniques. Two variants of the iterative closest point algorithm, ICP and GICP, were compared as non-feature based registration methods. The number of iterations was limited to a maximum of 100, since convergence is typically achieved before this point, if at all; the rate of convergence predictably dropped when the number of iterations was reduced. 2.5D SIFT was used for feature based registration, as
is the VIP method described above. The feature descriptor,
Fast Point Feature Histograms (FPFH) [14], is used for
feature matching of the extracted 2.5D SIFT interest points.
The success of each method was evaluated qualitatively by
visually comparing the convergence to a global minimum.
The efficiency of each method was compared on an Intel
Core i7 machine with 6 GB of RAM.
On the Lab data set, captured with the Kinect, ICP and GICP converged to a global minimum for 86.2% and 89.6% of
image pairs, respectively. However, 2.5D SIFT converged
only for 18.5% of pairs.

Figure 4. Rocks data set: registration of two point clouds. The SR4030 provides no color information, so the points are assigned different colors for visual clarity. (a) RGB image of the rocks. (b) The two raw point clouds displayed from a bird's-eye view without registration, to illustrate the displacement between them. (c) Registration using ICP, which takes 100 iterations to converge to a global minimum, better than 2.5D SIFT. (d) GICP after 100 iterations; the convergence is not good. (e) 2.5D SIFT fails to align the two point clouds. (f) VIP registers better than the other techniques.

Figure 5. Office data set: registration of two point clouds. (a) The two raw point clouds displayed from a bird's-eye view without registration, to illustrate the displacement between them. (b) Registration using ICP, which takes 100 iterations to converge to a global minimum, better than GICP and 2.5D SIFT. (c) GICP after 100 iterations, still not converged. (d) 2.5D SIFT fails to align the two point clouds. (e) VIP registers better than the other techniques.

The 2.5D SIFT interest points
were not highly repeatable, because the detector searches for interest points in regions of large shape variation, which suffer from scene-specific noise. For this reason, registration failed
even with significant overlap between the point clouds. On
the other hand, VIP successfully registered all 27 pairs of
images (100 %). Fig. 3 illustrates the registration results
using the four techniques on a pair of scans. The average
processing time of VIP is 2.81 s per pair. 2.5D SIFT has an
average processing time of 25.0 s per pair. ICP, at 77.1 s per pair, is considerably more efficient than GICP at 171.3 s
per pair. Therefore, VIP significantly outperformed the other
three techniques in computational efficiency.
On the Office data set, also captured with the Kinect, ICP and GICP converged to a global minimum for 40.9% and 27.3% of the 11 image pairs, respectively, whereas 2.5D SIFT converged for only 9.09% of pairs. On the other hand, VIP successfully converged for 90.9% of the 11 pairs, failing only where the plane detection was erroneous. This is a noisy
data set due to the quantization error that resulted from the
acquisition procedure. Fig. 5 illustrates the registration using
the four techniques on a pair of scans. For its single converged pair, 2.5D SIFT required 45.3 s.
The computation time for ICP and GICP was 169.91 and
567.45 s per pair, respectively; both converged very slowly due to the large scan-to-scan displacement. VIP, in contrast, registered efficiently with an average processing time of 0.6 s per pair.
The Rocks data set was acquired using the SR4030 sensor.
Since only depth information is available in this data set, an RGB image of the rocks is shown for illustration in Fig. 4(a). This data set contained a large amount of noise, and because of the lack of color information it is difficult to distinguish between the two point clouds; the two clouds are therefore shown in different colors. ICP and GICP converged to a global minimum for 85.7% and 87.7% of pairs, respectively. 2.5D SIFT failed to converge for most pairs, succeeding for only 12.24% of them, due to the non-repeatability of its interest points in the presence of noise. VIP, on the other hand, performed well on this data, converging for 93.9% of pairs. The efficiency of all the algorithms was better on this data set due to the smaller number of points per point cloud. VIP shows only a slight improvement here because its computation depends on the extracted parametric surfaces rather than on the number of points in the point cloud.
The average processing time of VIP is 2.32 s. The results
are summarized in Table I.
Table I. Four techniques compared on three data sets: convergence (Con., %) and average computation time per pair (Avg. Time, s). For ICP and GICP the number of iterations (Itr.) is also given.

                  |        ICP              |        GICP             |   2.5D SIFT      |      VIP
Data set   Pairs  | Con.   Avg. Time  Itr.  | Con.   Avg. Time  Itr.  | Con.   Avg. Time | Con.    Avg. Time
Lab        27     | 86.20  77.13      100   | 89.65  171.29     100   | 18.51  25.04     | 100.00  2.81
Office     11     | 40.90  169.91     100   | 27.27  567.45     100   | 9.09   45.30     | 90.90   2.62
Rocks      49     | 85.71  7.84       100   | 87.75  12.83      100   | 12.24  2.26      | 93.87   2.32
Figure 6. Convergence degradation due to noise. Gaussian noise with standard deviation σ = 0.0001, 0.001, and 0.01 is added to the two point clouds, which are then registered with ICP, 2.5D SIFT, and VIP (one panel per method and noise level, plus the original).
Registration normally fails due to noise in the data sets, so another experiment was performed to test convergence in the presence of noise. We added Gaussian
white noise with zero mean and standard deviation σ =
0.0001, 0.001, 0.01 m to the two point clouds shown in Fig. 1
and performed registration using ICP, 2.5D SIFT, and VIP.
The results are given in Fig. 6. Note that the structure is no longer visible when σ = 0.01. 2.5D SIFT failed to converge even for a noise standard deviation of σ = 0.0001, because its features are not repeatable in the presence of noise. ICP, a non-feature based registration technique, converged to a global minimum for σ = 0.0001 and 0.001; however, a close look reveals an offset between the two point clouds when σ = 0.01. Our feature based technique, on the other hand, converged to a global minimum at all three noise levels. VIP converges because its saliency measure, the extraction of parametric surfaces, is less affected by noise.
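The noise model is straightforward to reproduce; a minimal sketch (function name illustrative):

```python
import numpy as np

def add_gaussian_noise(points, sigma, seed=0):
    """Add zero-mean Gaussian white noise of standard deviation sigma
    (in metres) independently to each coordinate, as in the experiment above."""
    rng = np.random.default_rng(seed)
    return points + rng.normal(0.0, sigma, size=points.shape)

# e.g., the three noise levels tested above:
# noisy_clouds = [add_gaussian_noise(cloud, s) for s in (0.0001, 0.001, 0.01)]
```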
IV. DISCUSSION
VIP outperformed all other tested techniques in terms of
convergence and efficiency on all three data sets. In the noisy data sets, where measurement noise and quantization error were very high, VIP was the most successful because its saliency measure searches for stable areas in the point cloud. Parametric surfaces absorb the effect of measurement noise and quantization error. Furthermore, VIPs are not based
on a single planar surface but a group of three, which is
the reason for the good repeatability of VIPs even in the
presence of a significant amount of noise. However, the limitation of VIP is that it requires at least three non-parallel planes in the point cloud to compute a single VIP; in the presence of only parallel planes it normally fails, which is an issue to be addressed in future work. Nevertheless, point clouds acquired from natural and artificial environments are normally rich in areas that exhibit planar surfaces. Using ICP as a post-processing step for further refinement improves the quality of registration, and may also handle the parallel-planes-only case, but increases the runtime of the algorithm. The parallel-planes-only case may also be solved by adopting the plane correspondence determination technique of [22].
For ICP and GICP, runtime increases almost linearly with
the size of the point cloud, as shown in Table I. Convergence decreases when the displacement is large between
the point clouds. On the other hand, interest point detection techniques such as 2.5D SIFT, which use large shape variations as their saliency measure, cannot successfully register noisy point clouds, because such areas contain a significant amount of noise due to the sensors' limitations.
V. CONCLUSION
A novel, robust, feature based registration method has been presented. The method is robust to noise because avoiding noisy areas was a key aspect of its design. The VIPs are computed by observing the most stable areas in the scene; therefore, even in the presence of noise, the performance of the algorithm does not degrade. Furthermore,
the feature descriptors are computed in such a way that
only one true correspondence is required for registration
of noisy point clouds. As a result, compared to other
commonly used registration methods, the proposed method
is more stable and computationally efficient. The method
was experimentally evaluated and shown to outperform the
PCL versions of ICP, GICP, and 2.5D SIFT, in terms of
convergence behaviour and efficiency.
In future work, this method will be expanded to solve the
aforementioned cases by including other types of parametric
surfaces, such as lines, spheres, and conics. The technique
will also be tested against other natural scenes, and generalized for object recognition purposes.
REFERENCES
[1] P. Besl and N. D. McKay, “A method for registration of 3D shapes,” Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 14, no. 2, pp. 239–256, Feb 1992.
[2] Z. Zhang, “Iterative point matching for registration of free-form curves and surfaces,” International Journal of Computer
Vision, vol. 13, no. 2, pp. 119–152, 1994.
[3] P. Forsman and A. Halme, “Feature based registration of
range images for mapping of natural outdoor environments,”
in 3D Data Processing, Visualization and Transmission, 2004.
3DPVT 2004. Proceedings. 2nd International Symposium on,
Sept 2004, pp. 542–549.
[4] F. Bowen, E. Du, and J. Hu, “New region feature descriptor-based image registration method,” in Systems, Man, and
Cybernetics (SMC), 2012 IEEE International Conference on,
Oct 2012, pp. 2489–2494.
[5] M. Andersen, T. Jensen, P. Lisouski, A. Mortensen,
M. Hansen, T. Gregersen, and P. Ahrendt, “Kinect depth
sensor evaluation for computer vision applications,” Electrical
and Computer Engineering, Aarhus University, Tech. Rep.,
2012.
[6] R. Lange, “3D Time-Of-Flight distance measurement with
custom solid-state image sensors in CMOS/CCD-technology,”
Ph.D. dissertation, University of Siegen, 2000.
[7] A. Segal, D. Haehnel, and S. Thrun, “Generalized-ICP,” in
Proceedings of Robotics: Science and Systems, Seattle, USA,
June 2009.
[8] N. J. Mitra, N. Gelfand, H. Pottmann, and L. Guibas, “Registration of point cloud data from a geometric optimization
perspective,” in Symposium on Geometry Processing, 2004,
pp. 23–31.
[9] H. Chen and B. Bhanu, “3D free-form object recognition in
range images using local surface patches,” Pattern Recogn.
Lett., vol. 28, no. 10, pp. 1252–1262, 2007.
[10] Y. Zhong, “Intrinsic shape signatures: A shape descriptor for
3D object recognition,” in Computer Vision Workshops (ICCV
Workshops), 2009 IEEE 12th International Conference on,
2009, pp. 689–696.
[11] T.-W. R. Lo and J. P. Siebert, “Local feature extraction and
matching on range images: 2.5D SIFT,” Computer Vision
and Image Understanding, vol. 113, no. 12, pp. 1235 – 1250,
2009, special issue on 3D Representation for Object and
Scene Recognition.
[12] D. G. Lowe, “Distinctive image features from scale-invariant
keypoints,” Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110,
Nov. 2004.
[13] F. Tombari, S. Salti, and L. Di Stefano, “Unique signatures
of histograms for local surface description,” in Proceedings of the 11th European Conference on Computer Vision: Part III, ser. ECCV’10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 356–369.
[14] R. B. Rusu, N. Blodow, and M. Beetz, “Fast Point Feature
Histograms (FPFH) for 3D Registration,” in Proceedings of
the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan, May 12-17 2009.
[15] F. Sohel, J. Wan, M. Lu, and M. Bennamoun, “3D object
recognition in cluttered scenes with local surface features: A
survey,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 99, no. PrePrints, p. 1, 2014.
[16] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration,” in International
Conference on Computer Vision Theory and Applications (VISAPP’09). INSTICC Press, 2009, pp. 331–340.
[17] L. A. Alexandre, “3D descriptors for object and category
recognition: a comparative evaluation,” in Workshop on
Color-Depth Camera Fusion in Robotics at the IEEE/RSJ
International Conference on Intelligent Robots and Systems
(IROS), Vilamoura, Portugal, Oct 2012.
[18] R. Schnabel, R. Wahl, and R. Klein, “Efficient ransac for
point-cloud shape detection,” in Computer graphics forum,
vol. 26. Wiley Online Library, 2007, pp. 214–226.
[19] F. Tarsha-Kurdi, T. Landes, and P. Grussenmeyer, “Hough-Transform and Extended RANSAC Algorithms for Automatic
Detection of 3D Building Roof Planes from Lidar Data,”
in ISPRS Workshop on Laser Scanning 2007 and SilviLaser
2007, vol. XXXVI, Espoo, Finland, 2007, pp. 407–412.
[Online]. Available: https://halshs.archives-ouvertes.fr/halshs-00264843
[20] T. Rabbani, F. A. van den Heuvel, and G. Vosselmann,
“Segmentation of point clouds using smoothness constraint,”
in IEVM06, 2006.
[21] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in Proc. of the International Conference on Intelligent
Robot Systems (IROS), Oct. 2012.
[22] K. Pathak, A. Birk, N. Vaskevicius, and J. Poppinga, “Fast
Registration Based on Noisy Planes With Unknown Correspondences for 3-D Mapping,” IEEE Transactions on
Robotics, vol. 26, no. 3, pp. 424 –441, June 2010.