# Optimal Gabor features selection for face recognition using an improved margin-based algorithm

Opto-Electronic Engineering, Vol.33, No.9, Sep. 2006. Article ID: 1003-501X(2006)09-0085-06

Optimal Gabor features selection for face recognition using an improved margin-based algorithm

LI Fen-lan 1, CAO Xiao-hui 2, ZUO Kun-long 1, XU Ke-xin 1
(1. State Key Laboratory of Precision Measuring Technology and Instruments, Tianjin University, Tianjin 300072, China; 2. School of Jet Propulsion, Beijing University of Aeronautics and Astronautics, Beijing 100083, China)

Abstract: Face representation based on the 2D Gabor transform has attracted wide attention. However, the existing Gabor features are high-dimensional and redundant, so selecting the optimal Gabor features for face recognition is particularly important. Feature selection algorithms built on the large-margin principle now occupy an important position in machine learning research. Starting from the iterative search margin-based algorithm (Simba), this paper introduces a new selection algorithm, the conjugated gradient margin-based algorithm (Cgmba), which finds the optimal solution in fewer iterations. Experiments on the IMM face database show that, although fewer than half of the features are used, Cgmba and Simba improve the recognition rate by 3.75 and 1.25 percentage points respectively, and confirm that the proposed Cgmba clearly outperforms Simba. Finally, the distribution of the Gabor features selected by Cgmba is analyzed; it shows that larger-scale features are as important as smaller-scale ones for distinguishing subtle facial differences, and that features in the vertical and 135° orientations have stronger discriminative power.

Key words: Gabor; Face recognition; Iterative search margin-based algorithm (Simba); Conjugated gradient margin-based algorithm (Cgmba)

CLC number: TP391.4    Document code: A

Received: 2006-03-12; revised: 2006-07-24. LI Fen-lan (1978-), female, Ph.D. candidate, mainly engaged in image processing research. E-mail: orchidli@126.com

Introduction

A good face recognition methodology should consider representation as well as classification issues. There are mainly two approaches to face representation: holistic and feature-based. Holistic approaches use statistical methods to represent face images as a whole, whereas feature-based methods extract local features from face images. Often, feature-based approaches require the localization of fiducial points. Early attempts used typical geometrical features such as the locations and relative positions of facial features (i.e. eyes, nose). Recently, 2D Gabor wavelet-based methods have been used in references [1, 2] for local feature-based human face representation.
In the 2D Gabor wavelet-based method, sparse sampling at the fiducial points of face images is carried out, where local features are extracted using multi-scale and multi-orientation Gabor kernels. The features extracted by this method typically have large dimensionality. In order to reduce the redundancy of the original features and find optimal features that achieve higher performance, feature selection becomes inevitable. One category of feature selection methods, such as Principal Component Analysis (PCA)[3,4] or Independent Component Analysis (ICA)[5,6], are in fact unsupervised feature extraction algorithms, where the obtained lower-dimensional features are not necessarily subsets of the original coordinates. Another category of methods, such as the genetic algorithm[7,8] and sequential forward selection (SFS)[9], searches for the optimal feature subset by maximizing an evaluation function. However, performing an exhaustive search is usually intractable due to the large number of original features. In this paper, we introduce the idea of margin-based feature selection, which measures the quality of a selected subset of features by the margin it induces, and applies the large-margin principle to feature selection. Among the margin-based selection algorithms, Simba[10] is frequently used; it derives the gradient ascent of the evaluation function in order to maximize it. However, when the optimization process is close to the extreme point, the value of the gradient decreases slowly, which is adverse to the selection process. So, based on the idea of Simba, we propose an improved selection algorithm, Cgmba, which can find the optimal Gabor features with fewer iterations. Afterwards, according to the selected Gabor features, we statistically analyze the contribution of each scale and orientation to the face recognition task, since each individual Gabor feature is constructed from a combination of a scale and an orientation.
1 Facial features extraction

As the main local facial features, the eyes, nose and mouth often show the most distinguishable information of a given individual. However, it is very hard for computers to form a stable geometrical representation of a face the way we describe one in daily life. In our work, fiducial points are annotated in advance and the 2D Gabor wavelet is applied to create a representation of the local facial features. The Gabor kernel can be defined as follows:

$$\varphi_{u,v}(z)=\frac{\|k_{u,v}\|^{2}}{\sigma^{2}}\exp\Bigl(-\frac{\|k_{u,v}\|^{2}\|z\|^{2}}{2\sigma^{2}}\Bigr)\Bigl[\exp(\mathrm{i}\,k_{u,v}\cdot z)-\exp\Bigl(-\frac{\sigma^{2}}{2}\Bigr)\Bigr] \qquad (1)$$

where u and v define the orientation and scale of the Gabor kernels, z=(x, y), ‖·‖ denotes the norm operator, and the wave vector $k_{u,v}$ is defined as:

$$k_{u,v}=k_{v}\exp(\mathrm{i}\,\phi_{u}) \qquad (2)$$

where $k_{v}=k_{\max}/f^{v}$ and $\phi_{u}=\pi u/8$. $k_{\max}$ is the maximum frequency and f is the spacing factor between kernels in the frequency domain. Here we use Gabor wavelets of five different scales, v∈{0, 1, …, 4}, and eight orientations, u∈{0, 1, …, 7}. The other parameters are taken as σ=2π, $k_{\max}$=π/2 and f=1.414.

The Gabor wavelet representation of an image is the convolution of the image with a family of Gabor kernels. Let I(x, y) be the gray-level distribution of an image. The convolution of image I and Gabor kernel $\varphi_{u,v}$ is defined as follows:

$$O_{u,v}(z)=I(z)*\varphi_{u,v}(z) \qquad (3)$$

where z=(x, y), * denotes the convolution operator, and $O_{u,v}$ is the convolution result corresponding to the Gabor kernel at orientation u and scale v. We form a set Q = {O_{u,v}(z): v∈{0, 1, …, 4}, u∈{0, 1, …, 7}}. To facilitate the subsequent feature selection, for each $O_{u,v}(z)$ we construct a vector by concatenating its values at all fiducial points, and then derive an augmented feature vector Ω:

$$\Omega=\bigl(O_{0,0},\,O_{0,1},\,\cdots,\,O_{7,4}\bigr) \qquad (4)$$

In brief, each element in Ω is a vector which encompasses the Gabor features of all fiducial points at the corresponding orientation and scale.
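As a concrete illustration, the kernel of Eq. (1) can be sketched in NumPy as below. This is a minimal sketch, not the authors' implementation: the sampling grid size and the function name are our own choices.

```python
import numpy as np

def gabor_kernel(u, v, size=33, sigma=2 * np.pi, kmax=np.pi / 2, f=np.sqrt(2)):
    """Gabor kernel phi_{u,v}(z) of Eq. (1), sampled on a size x size grid."""
    kv = kmax / f ** v                 # k_v = kmax / f^v
    phi = np.pi * u / 8                # phi_u = pi * u / 8
    kx, ky = kv * np.cos(phi), kv * np.sin(phi)   # wave vector k_{u,v}, Eq. (2)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = x ** 2 + y ** 2               # ||z||^2
    k2 = kv ** 2                       # ||k_{u,v}||^2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

# A single feature is the response O_{u,v}(z) of the image at one fiducial
# point; stacking the responses of all 5 scales x 8 orientations over all
# fiducial points yields the augmented vector Omega of Eq. (4).
```

With σ=2π the DC-compensation term exp(−σ²/2) is numerically negligible, but it is kept for fidelity to Eq. (1).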
Feature vector Ω is referred to as the local facial features.

2 Conjugated gradient margin-based algorithm (Cgmba)

In modern machine learning research, margins play a crucial role. They measure the confidence of a classifier when making its decision. Generally speaking, the larger the margin is, the better the classification performance is. In order to apply the margin concept to feature selection, we introduce an evaluation function which assigns a score to a set of features according to the margin it induces.

Firstly, we formulate the margin as a function of the selected features. Let P be a set of points, x be an instance and w be a weight vector over the feature set; then the margin of x is:

$$\theta_{w}^{P}(x)=\frac{1}{2}\bigl(\|x-\mathrm{nearmiss}(x)\|_{w}-\|x-\mathrm{nearhit}(x)\|_{w}\bigr) \qquad (5)$$

where $\|z\|_{w}=\sqrt{\sum_{i}w_{i}^{2}z_{i}^{2}}$, and nearhit(x) and nearmiss(x) denote the nearest point to x in P with the same and a different label, respectively. Since $\theta_{\lambda w}(x)=|\lambda|\,\theta_{w}(x)$ for any scalar λ, it is natural to introduce some normalization factor. The natural normalization is to require $\max_{i} w_{i}^{2}=1$, since it guarantees that $\|z\|_{w}\le\|z\|$, where the right-hand side is the Euclidean norm of z.

Now we turn to define the evaluation function. The building blocks of this function are the margins of all the sample points. The margin of each instance x is calculated with respect to the sample excluding x. Given a training set S and a weight vector w, the evaluation function is:

$$e(w)=\sum_{x\in S}\theta_{w}^{S\setminus\{x\}}(x) \qquad (6)$$

where S\{x} denotes the samples excluding x in S. We first find the weight vector w that maximizes the evaluation function e(w), and then use a threshold to obtain the optimal feature subset. As described in reference [10], Simba derives the gradient ascent of the evaluation function in order to maximize it.
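The margin of Eq. (5) and the leave-one-out evaluation function of Eq. (6) can be sketched as follows. The function names are ours, and the nearest-neighbour search is deliberately brute force for clarity:

```python
import numpy as np

def wnorm(z, w):
    # weighted norm ||z||_w = sqrt(sum_i w_i^2 z_i^2), as in Eq. (5)
    return np.sqrt(np.sum(w ** 2 * z ** 2))

def margin(i, X, Y, w):
    # theta_w(x_i) computed with respect to the sample excluding x_i
    x, y = X[i], Y[i]
    d = np.array([wnorm(x - X[j], w) for j in range(len(X))])
    d[i] = np.inf                                    # exclude x itself
    nearhit = X[np.argmin(np.where(Y == y, d, np.inf))]
    nearmiss = X[np.argmin(np.where(Y != y, d, np.inf))]
    return 0.5 * (wnorm(x - nearmiss, w) - wnorm(x - nearhit, w))

def evaluation(X, Y, w):
    # e(w): sum of leave-one-out margins over the training set, Eq. (6)
    return sum(margin(i, X, Y, w) for i in range(len(X)))
```

A useful sanity check: putting weight on a class-discriminating coordinate should raise e(w) compared with weighting an irrelevant one.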
The gradient of e(w) when evaluated on a sample S is:

$$(\nabla e(w))_{i}=\frac{\partial e(w)}{\partial w_{i}}=\sum_{x\in S}\frac{\partial\theta_{w}(x)}{\partial w_{i}}=\frac{1}{2}\sum_{x\in S}\Bigl(\frac{(x_{i}-\mathrm{nearmiss}(x)_{i})^{2}}{\|x-\mathrm{nearmiss}(x)\|_{w}}-\frac{(x_{i}-\mathrm{nearhit}(x)_{i})^{2}}{\|x-\mathrm{nearhit}(x)\|_{w}}\Bigr)w_{i} \qquad (7)$$

When the initial point is far away from the extreme point, the gradient of e(w) in Simba drops drastically over the first few iterations. However, close to the extreme point the evaluation function is approximately quadratic and the rate of decrease slows down sharply. So the Simba algorithm is only suitable for the early phase of the optimization.

Exploiting the fact that the evaluation function is close to a quadratic function near the extreme point, the proposed Cgmba can find the optimal solution in fewer iterations by computing the conjugated gradient direction, even when the optimization process is approaching the extreme point. The algorithm is given below.

1) Initialize w = (1, 1, …, 1);
2) For t = 1 … T:
   (a) pick randomly an instance x from S;
   (b) calculate nearmiss(x) and nearhit(x) with respect to S\{x} and the weight vector w;
   (c) for i = 1 … N calculate
   $$\Delta_{i}^{(t)}=\frac{1}{2}\Bigl(\frac{(x_{i}-\mathrm{nearmiss}(x)_{i})^{2}}{\|x-\mathrm{nearmiss}(x)\|_{w}}-\frac{(x_{i}-\mathrm{nearhit}(x)_{i})^{2}}{\|x-\mathrm{nearhit}(x)\|_{w}}\Bigr)w_{i}$$
   (d) if t = 1, set $R_{t}=\Delta^{(1)}$; otherwise $R_{t}=\Delta^{(t)}+\beta_{t-1}R_{t-1}$, where $\beta_{t-1}=\|\Delta^{(t)}\|^{2}/\|\Delta^{(t-1)}\|^{2}$;
   (e) calculate the step length $h_{t}=\dfrac{\Delta^{(t)\mathrm{T}}\Delta^{(t)}}{\Delta^{(t)\mathrm{T}}D\,\Delta^{(t)}}$, where D is the Hessian matrix of the objective function. Because the calculation of the Hessian matrix is time-consuming, $h_{t}$ is replaced by a constant or by a function whose value decreases gradually with the number of iterations. In our work we take $h_{t}=1/[1+\exp(t/\lambda T)]$, where λ is the learning rate and is taken from [0.1, 0.5];
   (f) $w_{t+1}=w_{t}+h_{t}R_{t}$;
3) Output $w\leftarrow w^{2}/\|w^{2}\|_{\infty}$ (element-wise square).

It can be seen that the increment of w is a linear combination of the measure at the current iteration and the measure at the previous iteration. This is the main idea of our Cgmba.
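The steps above can be sketched as a short NumPy loop. This is a minimal sketch under our reading of steps (c)–(f): β is taken as a Fletcher–Reeves-style ratio of squared norms, and the small constants guarding against division by zero are our additions, not part of the paper.

```python
import numpy as np

def wnorm(z, w):
    # weighted norm ||z||_w = sqrt(sum_i w_i^2 z_i^2) of Eq. (5)
    return np.sqrt(np.sum(w ** 2 * z ** 2))

def cgmba(X, Y, T=100, lam=0.3, seed=0):
    """Sketch of the Cgmba loop; variable names are ours, not the paper's."""
    rng = np.random.default_rng(seed)
    n, N = X.shape
    w = np.ones(N)                              # step 1
    R = np.zeros(N)
    prev = 1e-12
    for t in range(1, T + 1):
        i = rng.integers(n)                     # step (a): random instance
        x, y = X[i], Y[i]
        d = np.array([wnorm(x - X[j], w) for j in range(n)])
        d[i] = np.inf                           # step (b): search S \ {x}
        hit = X[np.argmin(np.where(Y == y, d, np.inf))]    # nearhit(x)
        miss = X[np.argmin(np.where(Y != y, d, np.inf))]   # nearmiss(x)
        dm = wnorm(x - miss, w) + 1e-12
        dh = wnorm(x - hit, w) + 1e-12
        # step (c): coordinate-wise measure Delta^(t)
        delta = 0.5 * ((x - miss) ** 2 / dm - (x - hit) ** 2 / dh) * w
        # step (d): conjugate direction with a Fletcher-Reeves style beta
        norm2 = float(delta @ delta)
        R = delta if t == 1 else delta + (norm2 / prev) * R
        prev = max(norm2, 1e-12)
        # step (e): surrogate step length in place of the Hessian formula
        h = 1.0 / (1.0 + np.exp(t / (lam * T)))
        w = w + h * R                           # step (f)
    w2 = w ** 2
    return w2 / np.abs(w2).max()                # step 3: w <- w^2 / ||w^2||_inf
```

On data where one coordinate separates the classes and another is noise, the returned weight on the relevant coordinate ends up dominating.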
3 Experiment results and analysis

3.1 Cgmba against Simba

To compare our improved algorithm Cgmba against Simba and to assess the quality of the margin-based evaluation function, we use the data set of reference [10]. The problem consists of 1000 sample points with 10 real-valued features. The target concept is a xor function over the first 3 features. Hence, the first 3 features are relevant while the other features are irrelevant. If the weights of the first 3 features are larger than those of the others after all the iterations run, we are done. In our experiments the number of iterations is 200: for the first 100 iterations Simba is run, and afterwards Cgmba is used. The experimental results are shown in figure 1.

[Fig.1 Experimental results (for the first 100 iterations Simba is run, then Cgmba): (a) relationship between ‖dw‖² and t; (b) relationship between θ_w^P and t]

Figure 1(a) shows the relationship between the square of the Euclidean norm of the weight increment, ‖dw‖², and the number of iterations t. Obviously, during the first 100 iterations the gradient changes drastically as the number of iterations increases, but a clear decreasing trend is not visible. This shows that the ‖dw‖² computed by Simba may be unstable, and may even fail to converge. However, once the number of iterations exceeds 100 and Cgmba is run, the value of ‖dw‖² decreases remarkably and gradually. This phenomenon indicates that, near the extreme point, Cgmba can adaptively modify the step length and approach the extreme point steadily. Figure 1(b) shows the relationship between the sum of all training samples' margins, θ_w^P, and t. In the phase of Simba, θ_w^P fluctuates
widely, while it changes gently in the phase of Cgmba.

In summary, although Simba shows a decreasing trend in the initial stage as the number of iterations increases, Cgmba is clearly superior at finding the optimal solution. The weight vector we obtain is w = [1.0000, 0.9512, 0.7548, 0.0421, 0.0324, 0.0032, 0.0058, 0.0487, 0.0327, 0.0022], in which the first 3 features have the larger weights. We can claim that Cgmba has converged.

3.2 Comparison results on the IMM face database

3.2.1 IMM face database

We applied our algorithms to the IMM face database, a collection of digital color images of males and females with various facial expressions, illumination conditions, poses and scales. Examples of the images are shown in figure 2. It contains 240 images of 40 persons (6 images per person). The size of the original images is 480×640. Each image has a corresponding .asf file, which records the coordinates of 58 fiducial points. The positions of the 58 fiducial points marked on a face are shown in figure 3. Three images per person are selected for training and all the images are used for testing.

3.2.2 Features representation

The facial features are denoted by the Gabor features of the 58 fiducial points, namely the feature vector Ω in equation (4).

3.2.3 Comparison of Simba and Cgmba on the face database

For feature selection, we generally set a threshold on the weight of each feature: if the weight of a feature is greater than the threshold, the feature is selected; otherwise it is ignored. Here the threshold is set to 0.5. The classifier used is a 1-NN classifier. The experimental results are listed in table 1. From the results we see that Cgmba and Simba improve the classification rate by 3.75 and 1.25 percentage points respectively, even though fewer than half of all features are used. Furthermore, Cgmba is obviously superior.
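The synthetic comparison set and the threshold-plus-1-NN evaluation described above can be sketched as follows. The sampling range of the synthetic features and all helper names are our assumptions; the paper does not specify them.

```python
import numpy as np

def make_xor_data(n=1000, d=10, seed=0):
    # 1000 points with 10 real-valued features; the label is a xor over the
    # signs of the first 3 features, so the remaining 7 are irrelevant.
    # (The uniform sampling range is our assumption.)
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    y = ((X[:, 0] > 0) ^ (X[:, 1] > 0) ^ (X[:, 2] > 0)).astype(int)
    return X, y

def select_features(w, threshold=0.5):
    # keep the features whose learned weight exceeds the threshold
    return np.flatnonzero(w > threshold)

def nn_classify(X_train, y_train, X_test, idx):
    # 1-NN classifier restricted to the selected feature subset
    Xt, Xq = X_train[:, idx], X_test[:, idx]
    preds = [y_train[np.argmin(np.sum((Xt - q) ** 2, axis=1))] for q in Xq]
    return np.array(preds)
```

With the weight vector reported above and a threshold of 0.5, `select_features` keeps exactly the three relevant features.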
Table 1 Comparison results of the two margin-based algorithms

| Methods | Features selected per fiducial point | Classification rate |
|---------|--------------------------------------|---------------------|
| Cgmba   | 17                                   | 93.75%              |
| Simba   | 17                                   | 91.25%              |
| None    | 40                                   | 90.00%              |

[Fig.2 Examples in the IMM face database]
[Fig.3 Annotated fiducial points in the image]
[Fig.4 Distribution of the Gabor features selected by Cgmba: (a) selected Gabor features by Cgmba; (b) distribution of selected features over scales; (c) distribution of selected features over orientations]

In short, it is safe to say that Cgmba can find the optimal Gabor features and choose the most significant components among the orientations and scales more effectively than Simba. The distribution over orientations and scales of the 17 features selected by Cgmba is shown in figure 4(a). From figure 4(b) we can infer that the Gabor features at each scale have approximately equal discriminative power for face recognition. Although some literature suggests that features at smaller scales are preferable for distinguishing the nuances of faces, it is demonstrated here that features at larger scales are also highly significant. As illustrated in figure 4(c), the features in the vertical (u=2) and 135° (u=3) orientations may contain the most distinguishable information, whereas the features in the other orientations have relatively weak discriminative power, especially 345° (u=7).

4 Conclusions

In this study, we ha