2017-04-04 Salient Region Detection via High-Dimensional Color Transform and Local Spatial Support

Paper Learning

Notes on "Salient Region Detection via High-Dimensional Color Transform and Local Spatial Support" (J. Kim, D. Han, Y.-W. Tai, and J. Kim, IEEE Transactions on Image Processing, Jan. 2016, pp. 9-23).

1.1.1 Theories

1) This paper improves on the HDCT model proposed in 2014 and obtains better results.

2) Color is a very important visual cue for humans, and HDCT is built upon detecting distinctive colors in an image. Since HDCT uses only color information, it can be easily affected by texture and noise.

3) If a superpixel is closer to the foreground regions than to the background regions, it has a higher chance of being a salient region. Based on this assumption, the paper uses local saliency estimation to obtain another saliency map.

1.1.2 Model overview

1) Extract saliency features for each superpixel, including global features, local features, HOG features, etc.

2) Use a random forest classifier to generate an initial saliency trimap. The trimap divides the image into salient, non-salient, and ambiguous regions.

3) Combine several representative color spaces, such as RGB, CIELab, and HSV, with different power-law transformations to enrich the representative power of the HDCT space. Then use the foreground and background candidate color samples in the trimap to estimate an optimal linear combination of color coefficients that separates the salient region color from the background color.

4) Train a random forest classifier to evaluate the saliency of an ambiguous superpixel by comparing the distance and color contrast of a superpixel to the K-nearest foreground superpixels and the K-nearest background superpixels.

5) Learn the weights of the combination function by training. Repeat the optimization several times from randomly initialized variables to obtain the final solution for the objective function.
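The color-combination idea in step 3) can be sketched as a least-squares fit. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses only RGB channels (the paper also combines CIELab and HSV), the exponents in `GAMMAS` are chosen for illustration, candidates are labeled 1 (foreground) and 0 (background), and the function names `hdct_features`, `fit_color_weights`, and `hdct_saliency` are hypothetical.

```python
import numpy as np

GAMMAS = (0.5, 1.0, 1.5, 2.0)  # assumed exponents, for illustration only


def hdct_features(colors):
    """Stack power-law transformed copies of each color channel.

    colors: (n, 3) array of mean superpixel colors scaled to [0, 1].
    Returns an (n, 3 * len(GAMMAS)) matrix of color coefficients.
    """
    colors = np.asarray(colors, dtype=float)
    return np.hstack([colors ** g for g in GAMMAS])


def fit_color_weights(fg_colors, bg_colors):
    """Least-squares weights that map foreground to 1 and background to 0."""
    K = np.vstack([hdct_features(fg_colors), hdct_features(bg_colors)])
    u = np.concatenate([np.ones(len(fg_colors)), np.zeros(len(bg_colors))])
    alpha, *_ = np.linalg.lstsq(K, u, rcond=None)
    return alpha


def hdct_saliency(colors, alpha):
    """Apply the learned linear combination and clip the result to [0, 1]."""
    return np.clip(hdct_features(colors) @ alpha, 0.0, 1.0)


# Toy trimap candidates: reddish foreground, bluish background.
fg = np.array([[0.9, 0.1, 0.1], [0.8, 0.2, 0.1]])
bg = np.array([[0.1, 0.2, 0.8], [0.2, 0.3, 0.9]])
alpha = fit_color_weights(fg, bg)

# Score two unlabeled (ambiguous) superpixel colors.
scores = hdct_saliency(np.array([[0.85, 0.15, 0.1], [0.15, 0.25, 0.85]]), alpha)
```

With only a few candidates the system is underdetermined, so `np.linalg.lstsq` returns the minimum-norm weights that reproduce the candidate labels; a real pipeline would have many more superpixels than color coefficients.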

1.1.3 Highlights worth learning

1) Feature descriptions: the element distribution of the color features and the singular value feature.

(1) The element distribution of the color features. (F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in Proc. IEEE CVPR, Jun. 2012, pp. 733–740.)

Low variance indicates a spatially compact object which should be considered more salient than spatially widely distributed elements.

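The element distribution of segment $i$ is measured as the spatial variance of its color, $D_i = \sum_{j=1}^{N} \| p_j - \mu_i \|^2 \, w_{ij}^{(c)}$, where $w_{ij}^{(c)}$ describes the similarity between the colors $c_i$ and $c_j$ of segments $i$ and $j$; the greater the value, the more similar the colors. $p_j$ is the position of segment $j$, and $\mu_i = \sum_{j=1}^{N} w_{ij}^{(c)} p_j$ defines the weighted mean position of color $c_i$.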
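The weighted mean position and spatial variance described above can be sketched in a few lines. This is an illustrative sketch only: the Gaussian form of the color similarity, its width `sigma_c`, the normalization of the weights so that they sum to 1, and the function name `element_distribution` are assumptions rather than details taken from these notes.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def element_distribution(colors, positions, sigma_c=0.25):
    """Spatial variance of each segment's color (lower means more compact).

    colors:    list of (r, g, b) tuples, one per segment, values in [0, 1]
    positions: list of (x, y) segment centers
    sigma_c:   assumed width of the Gaussian color similarity
    """
    n = len(colors)
    result = []
    for i in range(n):
        # Color similarity w_ij, normalized so the weights for segment i sum to 1.
        w = [math.exp(-_sq_dist(colors[i], colors[j]) / (2 * sigma_c ** 2))
             for j in range(n)]
        total = sum(w)
        w = [v / total for v in w]
        # Weighted mean position mu_i of color c_i.
        mu = [sum(w[j] * positions[j][d] for j in range(n)) for d in range(2)]
        # Weighted spatial variance around mu_i.
        result.append(sum(w[j] * _sq_dist(positions[j], mu) for j in range(n)))
    return result


# Two red segments close together, two green segments in opposite corners.
colors = [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 0)]
positions = [(0.5, 0.5), (0.55, 0.5), (0.0, 0.0), (1.0, 1.0)]
D = element_distribution(colors, positions)
```

In this toy input the red segments are spatially compact, so their variance is much smaller than that of the widely spread green segments, which matches the intuition that compact colors are more salient.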

(2) Singular value feature. (B. Su, S. Lu, and C. L. Tan, “Blurred image region detection and classification,” in Proc. ACM Int. Conf. Multimedia, 2011, pp. 1397–1400.)

The singular value feature can be used as a blur metric to detect image blur effectively and accurately.

Given an image $I$, its singular value decomposition (SVD) can be written as $I = U \Lambda V^T$, where $U$ and $V$ are orthogonal matrices and $\Lambda$ is a diagonal matrix of singular values arranged in decreasing order. The image can therefore be decomposed into multiple rank-1 matrices (also called eigen-images): the SVD expresses the image as a weighted sum of eigen-images, $I = \sum_{i} \lambda_i u_i v_i^T$, where the weights are exactly the singular values. These eigen-images provide a scale-space analysis of the image. Those with small singular values, which often capture detailed information, are the ones discarded when the image is blurred.

The singular value feature is $\sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{n} \lambda_i$, where $\lambda_i$ denotes the $i$-th singular value evaluated within a local image patch around each pixel and $n$ is the number of singular values. It is the ratio between the sum of the first $k$ most significant singular values and the sum of all singular values. Blurred image regions have a higher blur degree (a larger ratio) than sharp, unblurred image regions.
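The ratio described above can be tried on a single patch with NumPy. This is a sketch under assumptions: the value `k=3`, the 16x16 patch size, the random texture used as a stand-in for a sharp patch, and the helper `box_blur` (a simple mean filter that only simulates blur) are all chosen for illustration. The paper evaluates the feature in a local patch around each pixel, which this sketch does not do.

```python
import numpy as np


def singular_value_feature(patch, k=3):
    """Share of the k largest singular values in the sum of all of them."""
    s = np.linalg.svd(np.asarray(patch, dtype=float), compute_uv=False)
    total = s.sum()
    return float(s[:k].sum() / total) if total > 0 else 1.0


def box_blur(img, radius=2):
    """Separable mean filter, used here only to simulate blur."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, rows)


rng = np.random.default_rng(0)
sharp = rng.random((16, 16))      # texture-rich stand-in for a sharp patch
blurred = box_blur(sharp)         # same patch with fine detail smoothed away

sharp_score = singular_value_feature(sharp)
blurred_score = singular_value_feature(blurred)
```

Blurring removes the fine detail carried by the small singular values, so `blurred_score` comes out larger than `sharp_score`.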

2) Random Forest Classification and Regression.

The paper uses a random forest to classify the feature vector of every superpixel.

A random forest is an ensemble method that operates by constructing multiple decision trees at training time and decides the class by examining each tree’s leaf response value at test time. This method combines the bootstrap aggregating idea and random feature selection to minimize the generalization error.

When classifying a new object based on certain attributes, every tree in the random forest gives its own classification, and the trees "vote" with their results. The output of the forest as a whole is the class with the largest number of votes. In a regression problem, the output of the random forest is the average of the outputs of all trees.
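The aggregation step described above is small enough to show directly. This sketch does not train any trees: the per-tree predictions are hard-coded, hypothetical values, and `forest_classify` and `forest_regress` are illustrative names.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Return the class predicted by the most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Return the average of the per-tree regression outputs."""
    return mean(tree_outputs)


# Hypothetical predictions from five classification trees for one superpixel.
votes = ["salient", "background", "salient", "salient", "background"]
label = forest_classify(votes)

# Hypothetical saliency scores from four regression trees.
score = forest_regress([0.5, 0.75, 1.0, 0.75])
```

Here three of the five trees vote "salient", so that is the forest's label, and the regression output is the mean of the four tree scores.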

3) Power-law transformation.

To further enrich the representative power of the HDCT space, the paper applies a power-law transformation to each color coefficient after normalizing the coefficients to [0, 1]. The transformation stretches or compresses the intensity contrast within different ranges of the color coefficients.

$s = c \, r^{\gamma}$

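When γ < 1, the transformation enhances low pixel values: the dark range is stretched and becomes brighter.

When γ > 1, the transformation enhances high pixel values: the bright range is stretched while dark values are compressed.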
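The effect of γ can be checked on a single normalized value. This is a minimal sketch: the function name `power_law`, the default `c=1.0`, and the sample value are assumptions for illustration.

```python
def power_law(r, gamma, c=1.0):
    """Power-law (gamma) transformation s = c * r**gamma for r in [0, 1]."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be normalized to [0, 1]")
    return c * r ** gamma


value = 0.25
raised = power_law(value, 0.5)    # gamma < 1 pushes the value up
lowered = power_law(value, 2.0)   # gamma > 1 pushes the value down
```

Applying several exponents to the same coefficient yields several differently stretched copies of it, which is how the transformation adds dimensions to the color space.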