# Shape Matching

Abstract

Describe several algorithms that provide both rigid and non-rigid point-set registration. The point sets are represented as probability density functions and the registration problem is treated as distribution alignment. Using the PDFs instead of the points provides a more robust way of dealing with outliers and noise, and it mitigates the need to establish a correspondence between the points in the two sets. The algorithms operate on the distance between the two PDFs to recover the spatial transformation function needed to register the two point sets. The distance measures used are based on various famous inequalities/divergence measures. The algorithms have been shown to be robust to noise and outliers, and perform well in varying degrees of transformations and noise.

Cauchy-Schwarz Divergence

Given to density functions $f$ and $g$, Cauchy-Schwarz divergence ($\mathcal{D_{CS}}$) measures the similarity between them in similar fashion to the Kullback-Leibler divergence. The measure is derived from the Cauchy-Schwarz inequality

$\sqrt{\int{f^2(x)dx}\int{g^2(x)dx}} \geq \int{f(x)g(x)dx}$,

where, for pdfs, the equality holds if and only if $f(x)=Cg(x)$ with $C = 1$. To simplify the calculations, we take its square and the Cauchy-Schwarz divergence of two PDFs is defined as

$\mathcal{D_{CS}}(f \| g) = -\log\frac{ \displaystyle \left(\int{f(x)g(x)dx}\right)^2}{ \displaystyle \int{f^2(x)dx}\int{g^2(x)dx}}$.

$\mathcal{D_{CS}}(f \| g)$ is a symmetric measure and $\mathcal{D_{CS}}(f,g)\geq0$, where the equality holds if and only if latex $f(x)=g(x)$. However, the triangle inequality property does not hold, so it cannot be considered a metric. $\mathcal{D_{CS}}(f,g)$ can be broken down to and rewritten as:

$\mathcal{D_{CS}}(f \| g) = -2\log\int{f(x)g(x)dx} + \log\int{f^2(x)dx} + \log\int{g^2(x)dx}$.

The argument of the first term, $\int{f(x)g(x)dx}$, estimates the interactions on locations within the support of $f(x)$ when exposed to the potential created by $g(x)$ (or viceversa). This term measures the similarity (distance) between the two PDFs and it is Renyi’s quadratic cross entropy. It can also be interpreted as the information gain from observing $g$ with respect to the “true” density $f$. The other two terms are the negative Renyi’s quadratic entropies of the respective PDFs and are considered as simply normalizing terms that act as regularizers. Thus the Cauchy-Schwarz divergence can be interpreted as:

$\mathcal{D_{CS}}(f \| g) = 2\mathcal{H}_2(f;g) - \mathcal{H}_2(f) - \mathcal{H}_2(g)$.

Matching Algorithm

Suppose we have two point sets ${\bf X} = ({\bf x}_1,\dots,{\bf x}_N)^T$ and ${\bf Y} = ({\bf y}_1,\dots,{\bf y}_M)^T$, where ${\bf x}_i,{\bf y}_j\in \mathbb{R}^d$. The kernel (Parzen) estimate of the PDF of an arbitrary point ${\bf x}$ using an arbitrary kernel function $\kappa(\cdot)$ is given by $\hat{P}_{\bf X} = \frac{1}{N}\sum^N_{i=1}{\kappa\left(\frac{{\bf x}-{\bf x}_i}{\sigma}\right)}$, where $\sigma$ is the bandwidth parameter. The Gaussian kernel, $G_\sigma(x,y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\left\|x-y\right\|^2}{2\sigma^2}\right)$, is considered as the kernel of choice for its properties: symmetric, positive definite, approaches zero as points move away from the center, and the decaying factor can be controlled via the kernel bandwidth.

Substituting the Gaussian kernel PDF estimator in the Cauchy-Schwarz divergence and performing some straightforward manipulations (the integral of the product of two Gaussians is exactly evaluated as the value of the Gaussian computed at the difference of the arguments and whose variance is the sum of the variances of the two original Gaussian functions results in the estimator:

${\small \mathcal{D_{CS}}(P({\bf Y}) \| P({\bf X})) = -2\log\displaystyle\sum_{i=1}^{M}\displaystyle\sum_{j=1}^{N}G({\bf y}_{i}-{\bf x}_{j}) + \log\displaystyle\sum_{i=1}^{M}\displaystyle\sum_{j=1}^{M}G({\bf y}_{i}-{\bf y}_{j}) + \log\displaystyle\sum_{i=1}^{N}\displaystyle\sum_{j=1}^{N}G({\bf x}_{i}-{\bf x}_{j}) }$.

To align the two mixtures $P({\bf Y})$ and $P({\bf X})$, the density function $P({\bf X})$ needs to undergo some transformation which can be broken down into two parts: 1) affine transformation, which is the first step in aligning the point sets and accounts for the major differences between the two PDFs caused due to rotation, scaling, translation, etc. and 2) non-rigid transformation, which accounts for deformations due to nonrigid motions that the objects have gone through. The transformations can then be expressed as:

$P({\bf X}) \stackrel{\mathcal{A}}{\longrightarrow} P_\mathcal{_A}({\bf X}) \stackrel{\mathcal{N_R}}{\longrightarrow} P_\mathcal{_{N_R}}({\bf X})$

Rigid and Affine Point Matching

A basic transformation function involving only rotation and translation applied to a point ${\bf x}_i$ can be written as follows $f({\bf x}, \theta, {\bf t}) = R(\theta)\cdot {\bf x} + {\bf t}$, where $\theta$ is the rotation angle, $R(\theta)$ is the rotation matrix and ${\bf t}$ is the translation vector. To represent both the translation and affine transformations in a matrix form, we consider homogeneous coordinates where the original vector points are extended to $(d+1)$ dimensions with the last element being one. Taking this into consideration and considering that the second term depends only on the desired point set and its value never changes nor does it affect the transformation function, thus removing it from future considerations of the cost function, our cost function becomes:

$\mathcal{D_{CS}}(P({\bf Y})\| P_\mathcal{_A}({\bf X})) = -2\log\displaystyle\sum_{i=1}^{M}\displaystyle\sum_{j=1}^{N}G({\bf y}_{i}-{\bf A}{\bf x}_{j}) + \log\displaystyle\sum_{i=1}^{N}\displaystyle\sum_{j=1}^{N}G({\bf A}({\bf x}_{i} - {\bf x}_{j}))$.

Differentiating with respect to the transformation matrix ${\bf A}$ and equating the gradient to zero results in the fixed point update rule shown below. Notice that the transformation matrix ${\bf A}$ is present in both sides of the equation. While there exist many ways of performing the fixed point update, this particular solution was chosen because of its smooth behavior on ${\bf A}$ across iterations.

${\bf A}^{*} = \left( 2\frac{\sum\limits_{i=1}^{M}\sum\limits_{j=1}^{N}G({\bf y}_{i}-{\bf Ax}_{j})({\bf y}_i {\bf x}_j^T)}{\sum_{i=1}^{M}\sum_{j=1}^{N}G({\bf y}_{i}-{\bf A}{\bf x}_{j})} + {\bf A}\frac{\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}G({\bf A}({\bf x}_{i}-{\bf x}_{j}))({\bf x}_{i}-{\bf x}_{j}) ({\bf x}_{i}-{\bf x}_{j})^T} {\sum_{i=1}^{N}\sum_{j=1}^{N}G({\bf A}({\bf x}_{i}-{\bf x}_{j}))} \right) \left[ 2\frac{\sum\limits_{i=1}^{M}\sum\limits_{j=1}^{N}G({\bf y}_{i}-{\bf Ax}_{j})({\bf x}_i {\bf x}_j^T)} {\sum_{i=1}^{M}\sum_{j=1}^{N}G({\bf y}_{i}-{\bf A}{\bf x}_{j})} \right]^{-1}$

Non-rigid Point Matching

The point matching problem becomes more challenging when non-rigid transformation is also required. To represent the non-rigid deformation, we use radial basis functions (RBF) with the Gaussian kernel as its type. The Gaussian kernel, as mentioned above, is chosen for its symmetric property and because it quickly approaches zero as the distance increases with respect to the bandwidth parameter. Controlling the bandwidth parameter gives us the flexibility to control the decaying factor of the kernel function and thus the rigidity of the transformation. The transformation function becomes:

$f({\bf x}, {\bf A}, {\bf W}) = {\bf A}\cdot {\bf x} + {\bf W}\cdot \phi({\bf x}_i)$,

where ${\bf A}$ is the $(d+1)\times(d+1)$ matrix representing the affine transformation, ${\bf W}$ is a $(d+1)\times N$ warping coefficient matrix representing the non-rigid transformation, and $\phi({\bf x}_i)$ is RBF with size $N \times 1$.

In the non-rigid case, there exist many ways of mapping one point set to another. Thus, a smoothness constraint is required to discourage arbitrary mappings and restrict the flexibility of the correspondence between the two point sets. To control the behavior of the mapping, a smoothness measure is required. The TPS-RPM algorithm uses the space integral of the square of the second order derivatives of the mapping function, the thin-plate spline. The CPD algorithm controls the “oscillatory behavior” by using a low-pass filtering function. In our algorithm, we control the smoothness by constraining the non-rigid freedom with respect to the affine transformation. The cost function becomes:

$\mathcal{E} = \mathcal{D_{CS}}(P({\bf Y})\| P_\mathcal{_{N_R}}({\bf X})) + \lambda \mathcal{D_{CS}}(P_\mathcal{_A}({\bf X}) \| P_\mathcal{_{N_R}}({\bf X}))$.

Since the correspondence between the random variables ${\bf X}_\mathcal{_A}$ and ${\bf X}_\mathcal{_{N_R}}$ is known, we can measure how similar they are in the neighborhood of the joint space by using another measure known as correntropy [1], which is a generalized correlation involving high-order statistics. Cross-correntropy between two random variables ${\bf X}_\mathcal{_A}$ and ${\bf X}_\mathcal{_{N_R}}$ is defined as $v({\bf X}_\mathcal{_A},{\bf X}_\mathcal{_{N_R}}) = \mathbf{E}\left[\kappa_\sigma({\bf X}_\mathcal{_A}-{\bf X}_\mathcal{_{N_R}})\right]$, where $\kappa_\sigma({\bf X}_\mathcal{_A}-{\bf X}_\mathcal{_{N_R}})$ can be any non-negative definite kernel. The cost function then becomes:

$\mathcal{E} = \mathcal{D_{CS}}(P({\bf Y})\| P_\mathcal{_{N_R}}({\bf X})) + \lambda v(P_\mathcal{_A}({\bf X}), P_\mathcal{_{N_R}}({\bf X}))$.

Using an estimator of correntropy based on the data available, we get the non-rigid cost function:

${\tiny \mathcal{E} = -2\log\displaystyle\sum_{i=1}^{M}\displaystyle\sum_{j=1}^{N}G({\bf y}_{i}-{\bf Ax}_{j}-{\bf Wk}_{j}) + \log\displaystyle\sum_{i=1}^{N}\displaystyle\sum_{j=1}^{N}G({\bf A}({\bf x}_{i}-{\bf x}_{j}) + {\bf W}({\bf k}_{i}-{\bf k}_{j})) + \lambda \displaystyle\sum_{i=1}^{N}G({\bf Ax}_{i}-({\bf Ax}_{i}+{\bf Wk}_i)) }$

where ${\bf K}$ is a $N\times N$ symmetric Gram matrix of initial point sets with elements $k_{ij}=\exp(-\frac{1}{2}\|\frac{x_{0i}-x_{0j}}{\beta}\|)^2$. Taking the derivative with respect to ${\bf A}$ and ${\bf W}$ and solving for each of the transformation matrices results in an update rule similar to the fixed point update rule above.

Experiments and Results

Browse the links below to see details and results of shape matching using $\mathcal{D_{CS}}$:

(to be continued)