Feature Selection (5): Recursive Feature Elimination

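The previous post used the maximal information coefficient to select features; this one uses recursive feature elimination (RFE).

As the name suggests, recursive feature elimination selects features by repeating the same procedure over several rounds. It starts from a base model that you specify, for example logistic regression (LR) or a decision tree. In the words of the official sklearn documentation: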

First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute.

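The base model therefore needs a coef_ or feature_importances_ attribute, which describes the weight of each feature, that is, its importance. In each round the model is retrained and the features with the smallest weights are eliminated, and this repeats until only the requested number of features remains. That is what recursive feature elimination means.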
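To make the loop concrete, here is a minimal hand-written sketch of the idea. It is not sklearn's implementation: `manual_rfe` is a hypothetical helper written for this post, it drops exactly one feature per round, and scoring each feature by the sum of its squared coefficients across classes is an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def manual_rfe(X, y, n_features_to_select):
    """Hypothetical helper: drop the weakest feature one round at a time."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    while len(remaining) > n_features_to_select:
        # Retrain the base model on the surviving columns only
        model = LogisticRegression(max_iter=1000).fit(X[:, remaining], y)
        # For multiclass problems coef_ has one row per class, so collapse it
        # to one score per feature (assumed here: sum of squared weights)
        scores = (model.coef_ ** 2).sum(axis=0)
        weakest = int(np.argmin(scores))
        del remaining[weakest]  # eliminate the least important feature
    return remaining


X, y = load_iris(return_X_y=True)
selected = manual_rfe(X, y, n_features_to_select=2)
print(selected)  # indices of the two surviving columns
```

In practice the RFE class shown in the next section does this bookkeeping for you; the sketch only illustrates why the base model must expose per-feature weights.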

A closer look at the sklearn function

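from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Load the iris dataset used in the examples below
iris = load_iris()

# Recursive feature elimination; returns the data after feature selection
# estimator: the base model
# n_features_to_select: the number of features to keep
RFE(estimator=LogisticRegression(), n_features_to_select=2).fit_transform(iris.data, iris.target)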

Output

array([[ 3.5,  0.2],
       [ 3. ,  0.2],
       [ 3.2,  0.2],
       [ 3.1,  0.2],
       [ 3.6,  0.2],
       [ 3.9,  0.4],
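The transformed array alone does not say which columns survived. As a small sketch, the fitted selector can be kept in a variable (`selector` and `chosen` are names introduced here for illustration) so that its `support_` and `ranking_` attributes can be inspected. `max_iter=1000` is an added assumption to avoid convergence warnings, and the exact features chosen may vary across sklearn versions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(iris.data, iris.target)

# support_ is a boolean mask over the original columns
print(selector.support_)
# ranking_ gives 1 to every selected feature; larger values were eliminated earlier
print(selector.ranking_)
# Map the mask back to readable column names
chosen = [name for name, keep in zip(iris.feature_names, selector.support_) if keep]
print(chosen)
```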

Besides LR, the base model can be any other model, as long as it meets the requirement described above.

from sklearn.svm import SVC

# The linear kernel matters here: SVC only exposes coef_ when kernel="linear"
svc = SVC(kernel="linear", C=1)
RFE(estimator=svc, n_features_to_select=2).fit_transform(iris.data, iris.target)

Output

array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       [ 1.5,  0.2],
       [ 1.4,  0.2],
       [ 1.7,  0.4],
       [ 1.4,  0.3],
       [ 1.5,  0.2],
       [ 1.4,  0.2],
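The quoted documentation also mentions feature_importances_, which is what tree-based models provide instead of coef_. A minimal sketch with a decision tree as the base model follows; `random_state=0` and the variable names are choices made here for reproducibility and readability, not part of the original examples.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# A decision tree has no coef_, but it does expose feature_importances_
tree = DecisionTreeClassifier(random_state=0)
selector = RFE(estimator=tree, n_features_to_select=2)
X_reduced = selector.fit_transform(iris.data, iris.target)

print(X_reduced.shape)    # (150, 2): all rows kept, two columns left
print(selector.support_)  # boolean mask of the columns that were kept
```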
