Shape matching using Cauchy-Schwarz divergence


Experiments & Results

First, analyzed the effect of kernel bandwidth, \sigma, used for PDF estimations on the convergence rate. A simple experiment of matching two shapes against a wide range of kernel sizes was run as shown in figure below. The results demonstrated two things: 1) the kernel bandwidth is associated with the dynamic range of the data, and 2) the fastest convergence can be achieved by controlling (annealing) the kernel size through the iterations.


Analysis on 2D fish data

Non-rigid Evaluation

The template was warped through a series of non-rigid transformations to test the algorithm’s performance on solving different degrees of deformation. The method was tested against robust point matching (RPM) and coherent point drift (CPD)¬†algorithms. The figure below shows an example where the three algorithms are tested against two clean point sets. All three algorithms produce accurate results for non-rigid registration on clean sets.


Noise and outlier presence

The registration problem becomes more challenging, with the addition of random noise to the reference point set. In addition, the head of the fish in the reference point set and the tail of the one in the template point set are removed. The figure below shows the results of this experiment. Our method is robust against both missing points and corrupted data. CPD performs well against noise but it is affected by the missing data points on the template. While around the head of the fish it performs very well, its approximation on the tail is far off from the reference point set. RPM is affected by both the missing points and the noise. Both, the head of the template fish is bent inwards and the tails of the two point sets are not aligned with each other.


Affine transformation

In this case, both affine and non-rigid transformations are applied to the reference point set to test the algorithms capability to approximate both the global and local transformations. All three algorithms perform fairly well with various degrees of performance depending on the difficulty of the rotation. The figure below shows an example of rotation of 35^{\circ} and some translation along with the non-rigid transformation. In this case, all three algorithms perform well, but the mapping is not as accurate as that of experiment 1. When the rotation angle is above 60^{\circ}, all three algorithms perform poorly. A simple way to overcome this problem would be to have multiple initial orientations for the template point set. We designed a more specific update algorithm where the affine transformation matrix, A, was assumed to be rotation + translation. In this case, rather than solving for each element of the transformation matrix, we solved for the rotation angle, \theta. Setting this constraint on the elements of the transformation matrix allowed for a faster and more accurate convergence up to \pm85^{\circ}. Higher angles would result in a flip of the template.



Analytical Results

To provide analytical results on the robustness of the method to the degree of non-rigid warping and noise in the data, we compare it against the best up-to-date method, the coherent point drift. The figure to the left shows a comparison of the two methods on various degrees of non-rigid warping. The average error rate and its standard deviation are shown for various degrees of warping. Our method performs very well and the results are very close to the CPD, with some improvement in some cases. In the figure to the right, we compare the two methods on different levels of noise. The warping degree was kept at 0.06 (referring to the results mentioned above), and the noise power was increased from 0.02 to 0.10. The results show that our method performs better than the CPD on cases of high noise.


Analysis on 3D bunny data

The extension of the algorithm to 3D is straightforward. The figure below (left) shows two objects: bun000 and bun045 borrowed from the Stanford bunny dataset. The algorithm produces a visually correct registration as shown on the figure (right).

Similar to the 2D case, the algorithm performs well for rotations up to 60^{\circ} in each axis, where the error is zero. The combination of rotations on multiple axes becomes difficult to identify a limitation boundary as the combination of different angles will have different effects depending on the structure of the object and also the order in which the rotations are taken. In the figure below, the performance results for rotations on all three axes for the bunny are shown with the rotation order: rotX*rotY*rotZ. To account for higher rotation angles, we need to place this requirement on the affine transformation matrix by constraining its elements to be prone to rotation behavior first and then scaling so that the system does not get stuck into a local minimum.


Evaluation on MPEG-7 shape database

The final experiment involves the MPEG-7 shape database, specifically the CE-Shape-1 part B. MPEG-7 is a standard database used to test shape classification and retrieval methods. The database consists of 1400 silhouette images: 70 categories with 20 examples per category. The performance is measured using the “bull’s eye test” where every shape in the database is compared to all other shapes and the number of shapes from the same category among the 40 most similar shapes is reported. Since some of the categories have shapes that appear flipped horizontally or vertically and our algorithm cannot address these types of variations in the shape, we modified the cost function to include comparisons against the original shape, flipped horizontally, and flipped vertically:

\mathcal{D_{CS}}(Y \| X) = \min\left\{ \mathcal{D}(Y \| X), \mathcal{D}(Y \| X_h), \mathcal{D}(Y \| X_v) \right\}.

The results that we obtain have a 83 \% retrieval rate. The reason for the low performance is the difficulty of datasamples within a class, as shown in the figure below (left), as well as the similarity of samples among different classes, as shown in the figure (right). The left figure shows a few examples from ‘device 9′ class where the details inside the object make it difficult to obtain low error rate for objects of the same class. The right figure shows examples from two different classes, guitar (top) and spoon (bottom), which have similar samples resulting in mixups during retrieval.