Correntropy based Matched Filtering



We introduce a nonlinear extension of the matched filter based on a metric called correntropy, whose decision statistic incorporates higher-order moments and improves classification, especially in the presence of noise. Correlation and MSE are second-order statistics, which fully characterize random variables only when they are Gaussian distributed. Correntropy generalizes correlation: it captures higher-order moments of the PDF, yielding solutions that are more accurate in non-Gaussian and nonlinear signal processing. This matters when the noise level in the data is high, the object contours have partially occluded areas, or the boundaries have not been extracted correctly (as is the case with our data).



Correntropy is a generalized correlation measure involving higher-order statistics that quantifies nonlinear similarity between random variables [1]. In this work, correntropy is useful because it is directly related to the probability that two random variables are similar within a neighborhood of the joint space controlled by the kernel bandwidth. Adjusting the kernel bandwidth adjusts the 'observation window' in which similarity is assessed, providing an effective mechanism for eliminating the detrimental effect of outliers. This makes the technique robust against impulsive noise, which may appear as occlusions or other imperfections in the images.

Cross correntropy between two random variables X and Y is defined as V_{\sigma}(X,Y) = {\bf E} \left[ \kappa_{\sigma}(X-Y) \right], where \kappa_{\sigma}(X-Y) can be any non-negative definite kernel. In our case, the Gaussian kernel is used: \kappa_{\sigma}(X-Y) = \frac{1}{\sqrt{2\pi}\sigma} \exp \left( - \frac{(X-Y)^2}{2\sigma^2} \right). In practice, the joint PDF is unknown and only a finite number of samples \left\{ (x_i, y_i) \right\}^{N}_{i=1} is available, leading to the sample estimator of correntropy \hat{V}_{N,\sigma}(X,Y) = \frac{1}{N} \sum^N_{i=1} \kappa_{\sigma}(x_i-y_i).
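The sample estimator is straightforward to implement. A minimal NumPy sketch (the function name `correntropy` is ours, not from any library):

```python
import numpy as np

def correntropy(x, y, sigma=1.0):
    """Sample estimator V_hat = (1/N) * sum_i k_sigma(x_i - y_i)
    using a Gaussian kernel with bandwidth sigma."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    k = np.exp(-d**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return k.mean()
```

For identical inputs the estimator attains its maximum value 1/(\sqrt{2\pi}\sigma), while pairs separated by much more than \sigma contribute almost nothing, which is the source of the outlier robustness discussed above.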


Point Matching using Correntropy

Here, we formulate the matching problem using correntropy and restrict the transformation matrix to the most fundamental affine operations: rotation, scaling, and translation. Focusing on the attributes of these operations constrains the space of transformation matrices and thereby speeds up convergence. We therefore model the object of interest as a rotated, scaled, and translated version of one of the existing patterns. Through a fixed number of parameter updates, given {\bf Y} we can determine whether the pattern {\bf X} is present. The parameters are updated by maximizing the correntropy between the object of interest and the 'transformed' pattern, using the cost function

J({\bf A},{\bf d}) = \sum^N_{i=1} e^{-\frac{ \displaystyle \left\| {\bf y}_i - {\bf Ax}_i - {\bf d} \right\|^2}{2\sigma^2}}, where {\footnotesize {\bf A} = \left[ \begin{array}{ c c } a & -b \\ b & a \end{array} \right] }

is the rotation and scaling matrix, and {\bf y}_i and {\bf x}_i are the i^{th} columns of {\bf Y} and {\bf X}, respectively. To update the matrix {\bf A}, we exploit the repetition of the elements a and b and differentiate with respect to the vector \left[ a, b \right]^T instead, which yields the following fixed-point update rule:

\left[ \begin{array}{ c } a \\ b \end{array} \right] \leftarrow \frac{ \sum^N_{i=1} {\bf z}_i e^{-\frac{ \left\| {\bf y}_i - {\bf Ax}_i - {\bf d} \right\|^2}{2\sigma^2}}} { \sum^N_{i=1} \left\|{\bf x}_i\right\|^2 e^{-\frac{ \left\| {\bf y}_i - {\bf Ax}_i - {\bf d} \right\|^2}{2\sigma^2}}}, with {\scriptsize {\bf z}_i = \left[ \begin{array}{ c } x_{i1}y_{i1} + x_{i2}y_{i2} \\ x_{i1}y_{i2} - x_{i2}y_{i1} \end{array} \right] },

where x_{ik} and y_{ik} are the k^{th} components of {\bf x}_i and {\bf y}_i, respectively. Similarly, differentiating with respect to {\bf d} and equating the gradient to zero gives the update rule for {\bf d}:

{\bf d} \leftarrow \frac{ \sum^N_{i=1} ({\bf y}_i - {\bf Ax}_i) e^{-\frac{ \left\| {\bf y}_i - {\bf Ax}_i - {\bf d} \right\|^2}{2\sigma^2}}} { \sum^N_{i=1} e^{-\frac{ \left\| {\bf y}_i - {\bf Ax}_i - {\bf d} \right\|^2}{2\sigma^2}}}.
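The two fixed-point updates can be combined into an alternating iteration. The NumPy sketch below follows the derivation above, with the offset {\bf d} included in the residual when updating a and b; the function name, initialization, and iteration count are our own choices, not part of the original method:

```python
import numpy as np

def correntropy_match(X, Y, sigma=1.0, iters=200):
    """Alternating fixed-point updates maximizing the correntropy cost
    J(A, d) = sum_i exp(-||y_i - A x_i - d||^2 / (2 sigma^2))
    over a rotation/scaling matrix A = [[a, -b], [b, a]] and translation d.
    X, Y: 2 x N arrays of corresponding template / target points."""
    a, b = 1.0, 0.0
    d = np.zeros((2, 1))
    for _ in range(iters):
        A = np.array([[a, -b], [b, a]])
        res = Y - A @ X - d
        w = np.exp(-np.sum(res**2, axis=0) / (2.0 * sigma**2))  # kernel weights
        Yd = Y - d                        # target with the offset removed
        # z_i components: [x1*y1 + x2*y2, x1*y2 - x2*y1] with y replaced by y - d
        num_a = np.sum(w * (Yd[0] * X[0] + Yd[1] * X[1]))
        num_b = np.sum(w * (Yd[1] * X[0] - Yd[0] * X[1]))
        den = np.sum(w * np.sum(X**2, axis=0))
        a, b = num_a / den, num_b / den
        A = np.array([[a, -b], [b, a]])
        res = Y - A @ X                   # residual before subtracting d
        w = np.exp(-np.sum((res - d)**2, axis=0) / (2.0 * sigma**2))
        d = np.sum(res * w, axis=1, keepdims=True) / np.sum(w)  # d-update
    return A, d
```

On clean, correctly corresponding points the iteration recovers the generating rotation, scale, and translation; with outliers, the kernel weights w shrink their influence toward zero.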

Treating the transformation matrix {\bf A} as rotation plus scaling is important in applications dealing primarily with rigid transformations, such as industrial inspection. Differentiating with respect to the vector \left[ a, b \right]^T constrains the possible transformations of an object to those specific to the application, which both improves accuracy and speeds up convergence. To demonstrate the benefits of restricting the transformation to application-specific requirements, below we provide an example of how template matching is adapted to the requirements of a real-world application.


Application: Side-scan Sonar Imagery Retrieval

Improvements in sonar technology have allowed large portions of the sea bottom to be scanned quickly. Nevertheless, identifying and classifying objects lying on the sea-bed is still a daunting task for a human operator, and there is special interest in automating it. During a survey of the sea bottom, the sensors may detect the same object multiple times due to overlapping tracks caused by the zig-zag nature of the surveys. Thus, over the full course, the same object may be detected many times, producing numerous observations that need to be fused into a single target location label.

The original sonar images have already been segmented into snippets containing an object highlight and its shadow, which arise from the presence or absence of acoustic reverberation, along with some of the background. The images belong to one of four categories: cube, cylinder, cone, and rectangular cuboid. They are taken at ranges of 10 to 106 meters and aspect angles of 0 to 360 degrees. The figure below shows four examples of the same object viewed from different angles and ranges.

Simple template matching techniques do not work in this scenario: extracting the contours of the object highlight and shadow from different snippets and matching them directly is not feasible, because no affine transformation can match the shadow contours of two different snippets. This is due to the shadow being cast by a 3D object, whereas the contours are 2D representations. To overcome this problem, we designed 3D models for each of the categories. The shadow contour is then generated by projecting the object onto the ground from a point of view, as depicted in the figure below. The point-of-view projection follows the actual synthesis of sonar images, which are synthesized one line at a time as the sonar fish follows its path; accordingly, we use a line rather than a single point as our projection viewpoint. Since a snippet contains information from both the object and its shadow, we use both sets of contours during template matching.
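A natural realization of this projection, consistent with the projection matrix {\bf M} used below, is to intersect the ray from the viewpoint through each template point with the ground plane z = 0. A minimal sketch (the function name is ours):

```python
import numpy as np

def project_shadow(p, C):
    """Project 3D template points C (3 x N) onto the ground plane z = 0
    along rays from the viewpoint p = (p_x, p_y, p_z), giving the 2 x N
    shadow contour. Each shadow point is where the line through p and a
    template point crosses z = 0."""
    p = np.asarray(p, dtype=float).reshape(3, 1)
    t = p[2] / (p[2] - C[2])            # ray parameter at which z = 0
    return p[:2] + t * (C[:2] - p[:2])  # x,y coordinates of the shadow
```

Points already on the ground (c_z = 0) project onto themselves, and points closer to the viewpoint's height cast shadows farther away, as expected from the geometry.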

Shadow contour matching is more difficult than highlight matching because creating the correct shadow shape requires considering 3D object templates and projecting them into 2D from a point of view representing the sonar sensor's view of the object. To simplify the problem, we use separate cost functions for updating the rotation matrix and the point of view. To update the rotation/scaling matrix, the correntropy cost function is computed in terms of the shadow contour {\bf sh} = \left[ {\bf sh}_x, {\bf sh}_y \right]^T and the projected template shadow, which is a function of the point of view {\bf p} = \left[ {\bf p}_x, {\bf p}_y, {\bf p}_z \right]^T, the template object {\bf c} = \left[ {\bf c}_x, {\bf c}_y, {\bf c}_z \right]^T, and the rotation/scaling matrix {\bf A}, as shown below:

J({\bf A}) = \sum^N_{i=1} e^{-\frac{ \displaystyle \left\| {\bf sh}_i - f({\bf p}_i,{\bf Ac}_i) \right\|^2}{2\sigma^2}}, where {\scriptsize {\bf A} = \left[ \begin{array}{ c c c } a & -b & 0 \\ b & a & 0 \\ 0 & 0 & r \end{array} \right]}

has an additional row and column accounting for the third dimension. The element r represents the scaling factor embedded in a and b. Again, differentiating with respect to the vector [a, b]^T and equating the gradient to zero results in the following fixed-point update rule:

\left[ \begin{array}{ c } a \\ b \end{array} \right] \leftarrow \frac{ \sum^N_{i=1} {\bf z}_i e^{-\frac{ \left\| {\bf sh}_i - f({\bf p}_i,{\bf Ac}_i) \right\|^2}{2\sigma^2}}} { \sum^N_{i=1} {\bf p}_{zi}( {\bf c}^2_{xi} + {\bf c}^2_{yi} ) e^{-\frac{ \left\| {\bf sh}_i - f({\bf p}_i,{\bf Ac}_i) \right\|^2}{2\sigma^2}}}, with {\scriptsize {\bf z}_i = \left[ \begin{array}{ c } ({\bf sh}_x{\bf c}_x + {\bf sh}_y{\bf c}_y)({\bf p}_z - r{\bf c}_z) + r{\bf c}_z({\bf p}_x{\bf c}_x + {\bf p}_y{\bf c}_y) \\ ({\bf sh}_x{\bf c}_x - {\bf sh}_y{\bf c}_y)({\bf p}_z - r{\bf c}_z) + r{\bf c}_z({\bf p}_x{\bf c}_x - {\bf p}_y{\bf c}_y) \end{array} \right] }.

The scale r is then updated as r = \sqrt{a^2 + b^2}. To update the point of view, the cost function becomes

J({\bf p},{\bf d}) = \sum^N_{i=1} e^{-\frac{ \displaystyle \left\| {\bf sh}_i - {\bf Mc}_i - {\bf d} \right\|^2}{2\sigma^2}}, where {\tiny{\bf M} = \left[ \begin{array}{ c c c c } -{\bf p}_z & 0 & {\bf p}_x & 0 \\ 0 & -{\bf p}_z & {\bf p}_y & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -{\bf p}_z \end{array} \right] }

is the projection matrix, and {\bf sh}_i = \left[ {\bf sh}_x, {\bf sh}_y \right]^T_i and {\bf c}_i = \left[ {\bf c}_x, {\bf c}_y, {\bf c}_z \right]^T_i are the i^{th} column entries of the shadow contour and object template respectively. Notice how the two vectors have different dimensions representing the two different spaces. In addition, the vector {\bf c}_i has a fourth dimension which is needed to homogenize the coordinates. The resulting fixed point update is:

{\bf p} \leftarrow \frac{ \sum^N_{i=1} {\bf z}_i e^{-\frac{ \left\| {\bf sh}_i - {\bf Mc}_i - {\bf d} \right\|^2}{2\sigma^2}}}{ \sum^N_{i=1} e^{-\frac{ \left\| {\bf sh}_i - {\bf Mc}_i - {\bf d} \right\|^2}{2\sigma^2}}}, where {\scriptsize {\bf z}_i = \left[ \begin{array}{ c } \frac{\left( {\bf c}_x{\bf p}_z - {\bf sh}_x({\bf p}_z - {\bf c}_z) \right)}{{\bf c}_z} \\ \frac{\left( {\bf c}_y{\bf p}_z - {\bf sh}_y({\bf p}_z - {\bf c}_z) \right)}{{\bf c}_z} \\ {\bf c}_z \cdot \frac{ ({\bf p}_x - {\bf c}_x)({\bf sh}_x - {\bf p}_z) + ({\bf p}_y - {\bf c}_y)({\bf sh}_y - {\bf p}_y) } { ({\bf p}_x - {\bf c}_x)({\bf sh}_x - {\bf c}_z) + ({\bf p}_y - {\bf c}_y)({\bf sh}y - {\bf c}_y) } \end{array} \right] }.


Experiments and Results

We first test the robustness of correntropy against noise by comparing it with correlation on an example of the object highlight corrupted by noise. The figure below shows the results of template matching using correlation (left) and correntropy (right). The template is shown in red, the noisy object highlight in blue circles, and the point correspondences in black. Correlation attempts to accommodate all samples, whereas correntropy operates within a window and ignores distant points (i.e., outliers).
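This outlier-rejection behavior can be illustrated with a toy one-dimensional, translation-only example on synthetic data: the MSE estimate of a shift is pulled toward the outliers, while a correntropy fixed-point iteration analogous to the {\bf d}-update above ignores them.

```python
import numpy as np

# Synthetic data: y_i = x_i + d with true shift d = 2, plus impulsive outliers.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + 2.0
y[:10] += 50.0                    # 10 impulsive outliers

# MSE/correlation estimate: the sample mean of the differences,
# which is pulled far from the true shift by the outliers.
d_mse = np.mean(y - x)

# Correntropy fixed-point estimate: outliers receive near-zero
# kernel weight and stop influencing the update.
d, sigma = 0.0, 1.0
for _ in range(50):
    w = np.exp(-((y - x - d) ** 2) / (2.0 * sigma**2))
    d = np.sum(w * (y - x)) / np.sum(w)
```

Here d_mse lands near 7 (shifted by the outliers), while the correntropy estimate d converges to the true value 2.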

We then test highlight and shadow template matching based on the improved 3D model. The figure below shows an example snippet with the extracted object highlight and shadow contours (left) and the final 3D template matching (right).


The figure below shows the results of comparing an object against two templates: in (a) against the correct template and in (b) against a wrong template. The incorrect template deforms to fit the shadow shape, but because of the constraints on its shape it matches well only along the sides, producing a poor overall match. The poor match also shows in the highlight, where the template rotates into a diagonal position to accommodate as much of the shape as possible, resulting in a suboptimal solution.


The tables below show the classification results over the entire database for both highlight and shadow template matching. Shadow matching improves the classification because it preserves the object shape more consistently. However, both cases have difficulty distinguishing between classes B and D, owing to the similarity of these two classes: class B is the cylinder category and class D is the rectangular cuboid category.