Iris

Introduction

This article uses the Iris dataset from sklearn to demonstrate feature-processing functionality.

The IRIS dataset was compiled by Fisher in 1936. It contains 4 features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), all positive floating-point values measured in centimeters. The target is the iris species: Iris Setosa, Iris Versicolour, or Iris Virginica.

Loading the Data

from sklearn.datasets import load_iris

# load the dataset
iris = load_iris()

# feature matrix
print(type(iris.data))
print(iris.data[:5])

# target vector
print(type(iris.target))
print(iris.target)
<class 'numpy.ndarray'>
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]
<class 'numpy.ndarray'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

Feature Preprocessing

Feature scaling

Standardization

The following code standardizes the data using the StandardScaler class from the preprocessing module.

from sklearn.preprocessing import StandardScaler

# standardize; returns the standardized data
StandardScaler().fit_transform(iris.data)[:10]
array([[-0.90068117,  1.03205722, -1.3412724 , -1.31297673],
       [-1.14301691, -0.1249576 , -1.3412724 , -1.31297673],
       [-1.38535265,  0.33784833, -1.39813811, -1.31297673],
       [-1.50652052,  0.10644536, -1.2844067 , -1.31297673],
       [-1.02184904,  1.26346019, -1.3412724 , -1.31297673],
       [-0.53717756,  1.95766909, -1.17067529, -1.05003079],
       [-1.50652052,  0.80065426, -1.3412724 , -1.18150376],
       [-1.02184904,  0.80065426, -1.2844067 , -1.31297673],
       [-1.74885626, -0.35636057, -1.3412724 , -1.31297673],
       [-1.14301691,  0.10644536, -1.2844067 , -1.4444497 ]])
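To make the transformation concrete, the scaler's output can be checked against the z-score formula (x - mean) / std computed per column; the sketch below recomputes it with plain NumPy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
scaled = StandardScaler().fit_transform(iris.data)

# StandardScaler applies (x - mean) / std to each column (population std, ddof=0)
manual = (iris.data - iris.data.mean(axis=0)) / iris.data.std(axis=0)
assert np.allclose(scaled, manual)

# each standardized column has zero mean and unit variance
assert np.allclose(scaled.mean(axis=0), 0)
assert np.allclose(scaled.std(axis=0), 1)
```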

Min-max scaling

The following code scales the data to an interval using the MinMaxScaler class from the preprocessing module.

from sklearn.preprocessing import MinMaxScaler

# interval scaling; returns data scaled to the [0, 1] range
MinMaxScaler().fit_transform(iris.data)[:10]
array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
       [ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
       [ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
       [ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
       [ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
       [ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
       [ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
       [ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
       [ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
       [ 0.16666667,  0.45833333,  0.08474576,  0.        ]])
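Likewise, MinMaxScaler's output matches the formula (x - min) / (max - min) applied per column, which a few lines of NumPy confirm.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
scaled = MinMaxScaler().fit_transform(iris.data)

# MinMaxScaler applies (x - min) / (max - min) to each column
mins, maxs = iris.data.min(axis=0), iris.data.max(axis=0)
assert np.allclose(scaled, (iris.data - mins) / (maxs - mins))

# every column now spans exactly [0, 1]
assert np.allclose(scaled.min(axis=0), 0)
assert np.allclose(scaled.max(axis=0), 1)
```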

The difference between standardization and normalization

Standardization works column by column: each feature is centered and scaled using statistics computed over all samples. Normalization works row by row, rescaling each sample vector to unit norm. The following code normalizes the data using the Normalizer class from the preprocessing module.

from sklearn.preprocessing import Normalizer

# normalize; returns the normalized data
Normalizer().fit_transform(iris.data)[:10]
array([[ 0.80377277,  0.55160877,  0.22064351,  0.0315205 ],
       [ 0.82813287,  0.50702013,  0.23660939,  0.03380134],
       [ 0.80533308,  0.54831188,  0.2227517 ,  0.03426949],
       [ 0.80003025,  0.53915082,  0.26087943,  0.03478392],
       [ 0.790965  ,  0.5694948 ,  0.2214702 ,  0.0316386 ],
       [ 0.78417499,  0.5663486 ,  0.2468699 ,  0.05808704],
       [ 0.78010936,  0.57660257,  0.23742459,  0.0508767 ],
       [ 0.80218492,  0.54548574,  0.24065548,  0.0320874 ],
       [ 0.80642366,  0.5315065 ,  0.25658935,  0.03665562],
       [ 0.81803119,  0.51752994,  0.25041771,  0.01669451]])
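The practical difference shows up in which axis is rescaled: with the default L2 norm, Normalizer leaves every row with unit Euclidean length, which the sketch below verifies.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import Normalizer

iris = load_iris()
normed = Normalizer().fit_transform(iris.data)  # default norm='l2'

# unlike StandardScaler (per column), Normalizer rescales each row
# of the feature matrix to unit Euclidean length
assert np.allclose(np.linalg.norm(normed, axis=1), 1.0)
```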

Binarizing quantitative features

The following code binarizes the data using the Binarizer class from the preprocessing module.

from sklearn.preprocessing import Binarizer

# binarize with threshold 3; returns the binarized data
Binarizer(threshold=3).fit_transform(iris.data)[:5]
array([[ 1.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.]])
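Binarizer maps values strictly greater than the threshold to 1 and everything else (including values equal to the threshold, such as a sepal width of exactly 3.0) to 0, which can be verified directly:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import Binarizer

iris = load_iris()
binary = Binarizer(threshold=3).fit_transform(iris.data)

# values strictly greater than the threshold map to 1, the rest to 0
assert np.array_equal(binary, (iris.data > 3).astype(float))
```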

One-hot encoding of qualitative features

Since all features of the IRIS dataset are quantitative, its target values are one-hot encoded instead (in practice this would be unnecessary). The following code one-hot encodes the data using the OneHotEncoder class from the preprocessing module.

from sklearn.preprocessing import OneHotEncoder

# one-hot encode the IRIS target values; returns the encoded data
OneHotEncoder().fit_transform(iris.target.reshape((-1,1)))

<150x3 sparse matrix of type '<class 'numpy.float64'>'
	with 150 stored elements in Compressed Sparse Row format>
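Because the result comes back as a sparse CSR matrix, it can be expanded with toarray() for inspection; the sketch below checks that the encoding has one column per class and exactly one 1 per row.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder

iris = load_iris()
encoded = OneHotEncoder().fit_transform(iris.target.reshape(-1, 1))
dense = encoded.toarray()  # expand the sparse CSR matrix

assert dense.shape == (150, 3)            # 150 samples, 3 classes
assert np.allclose(dense.sum(axis=1), 1)  # exactly one 1 per row
# the position of the 1 recovers the original class label
assert np.array_equal(dense.argmax(axis=1), iris.target)
```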

Missing value imputation

Since the IRIS dataset has no missing values, a new sample with all 4 features set to NaN is appended to represent missing data. The following code imputes the missing values using the Imputer class from the preprocessing module.

from numpy import vstack, array, nan
from sklearn.preprocessing import Imputer

# impute missing values; returns the data with missing values filled in
# parameter missing_values gives the representation of missing entries, default NaN
# parameter strategy gives the fill method, default mean
Imputer().fit_transform(vstack((array([nan, nan, nan, nan]), iris.data)))
array([[ 5.84333333,  3.054     ,  3.75866667,  1.19866667],
       [ 5.1       ,  3.5       ,  1.4       ,  0.2       ],
       [ 4.9       ,  3.        ,  1.4       ,  0.2       ],
       ..., 
       [ 6.5       ,  3.        ,  5.2       ,  2.        ],
       [ 6.2       ,  3.4       ,  5.4       ,  2.3       ],
       [ 5.9       ,  3.        ,  5.1       ,  1.8       ]])
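Note that Imputer was later replaced: from sklearn 0.20 onward the equivalent class is SimpleImputer in sklearn.impute. A sketch of the same computation with the newer API, verified against the column means:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer  # replaces the deprecated Imputer

iris = load_iris()
with_nan = np.vstack((np.full((1, 4), np.nan), iris.data))
filled = SimpleImputer(strategy='mean').fit_transform(with_nan)

# the all-NaN row is filled with each column's mean over the observed rows
assert np.allclose(filled[0], iris.data.mean(axis=0))
# the original rows pass through unchanged
assert np.allclose(filled[1:], iris.data)
```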

Data Transformation

The following code applies a polynomial transformation using the PolynomialFeatures class from the preprocessing module.

from sklearn.preprocessing import PolynomialFeatures

# polynomial transformation
# parameter degree is the polynomial degree, default 2
PolynomialFeatures().fit_transform(iris.data)
array([[  1.  ,   5.1 ,   3.5 , ...,   1.96,   0.28,   0.04],
       [  1.  ,   4.9 ,   3.  , ...,   1.96,   0.28,   0.04],
       [  1.  ,   4.7 ,   3.2 , ...,   1.69,   0.26,   0.04],
       ..., 
       [  1.  ,   6.5 ,   3.  , ...,  27.04,  10.4 ,   4.  ],
       [  1.  ,   6.2 ,   3.4 , ...,  29.16,  12.42,   5.29],
       [  1.  ,   5.9 ,   3.  , ...,  26.01,   9.18,   3.24]])
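With 4 input features and degree 2, the output has 1 bias column, 4 linear terms, and 10 degree-2 products (all pairwise products with replacement), so 15 columns in total:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import PolynomialFeatures

iris = load_iris()
poly = PolynomialFeatures(degree=2).fit_transform(iris.data)

# 1 bias column + 4 linear terms + 10 degree-2 products = 15 columns
assert poly.shape == (150, 15)
assert (poly[:, 0] == 1).all()            # the bias column
assert (poly[:, 1:5] == iris.data).all()  # the original features, unchanged
```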

Transformations based on a univariate function can all be performed in a uniform way. The following code applies a logarithmic transformation using FunctionTransformer from the preprocessing module.

from numpy import log1p
from sklearn.preprocessing import FunctionTransformer

# data transformation with a custom log function
# the first argument is the univariate function to apply
FunctionTransformer(log1p).fit_transform(iris.data)[:10]
array([[ 1.80828877,  1.5040774 ,  0.87546874,  0.18232156],
       [ 1.77495235,  1.38629436,  0.87546874,  0.18232156],
       [ 1.74046617,  1.43508453,  0.83290912,  0.18232156],
       [ 1.7227666 ,  1.41098697,  0.91629073,  0.18232156],
       [ 1.79175947,  1.5260563 ,  0.87546874,  0.18232156],
       [ 1.85629799,  1.58923521,  0.99325177,  0.33647224],
       [ 1.7227666 ,  1.48160454,  0.87546874,  0.26236426],
       [ 1.79175947,  1.48160454,  0.91629073,  0.18232156],
       [ 1.68639895,  1.36097655,  0.87546874,  0.18232156],
       [ 1.77495235,  1.41098697,  0.91629073,  0.09531018]])
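Because expm1 is the exact inverse of log1p, the transformation is easy to verify by round-tripping:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import FunctionTransformer

iris = load_iris()
logged = FunctionTransformer(np.log1p).fit_transform(iris.data)

# log1p(x) = log(1 + x); expm1 undoes it exactly
assert np.allclose(logged, np.log1p(iris.data))
assert np.allclose(np.expm1(logged), iris.data)
```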

Feature Selection

Filter

Variance threshold method

The following code selects features by variance using the VarianceThreshold class from the feature_selection module.

from sklearn.feature_selection import VarianceThreshold

# variance-based selection; returns the data after feature selection
# parameter threshold is the variance threshold
VarianceThreshold(threshold=3).fit_transform(iris.data)
array([[ 1.4],
       [ 1.4],
       [ 1.3],
       ..., 
       [ 5.2],
       [ 5.4],
       [ 5.1]])
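Only petal length survives because it is the only feature whose variance exceeds the threshold of 3; the sketch below checks the per-column variances.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

iris = load_iris()
variances = iris.data.var(axis=0)

# only petal length (column 2) has variance above 3, so it is the single
# column that VarianceThreshold(threshold=3) keeps
assert (variances > 3).sum() == 1
assert variances.argmax() == 2

selected = VarianceThreshold(threshold=3).fit_transform(iris.data)
assert np.array_equal(selected[:, 0], iris.data[:, 2])
```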

Correlation coefficient method

The following code selects features using the SelectKBest class from the feature_selection module combined with the Pearson correlation coefficient.

from numpy import array
from sklearn.feature_selection import SelectKBest
from scipy.stats import pearsonr

# select the K best features; returns the data after feature selection
# the first argument is a scoring function: it takes the feature matrix and
# target vector and returns an array of (score, p-value) pairs, where the
# i-th entry is the score and p-value of the i-th feature; here it computes
# the Pearson correlation coefficient
# parameter k is the number of features to keep

def get_pearsonr(X, y):
    m = map(lambda x: pearsonr(x, y), X.T)
    res = array(list(m)).T
    return (res[0], res[1])

SelectKBest(get_pearsonr, k=2).fit_transform(iris.data, iris.target)
# SelectKBest(lambda X, Y: array(list(map(lambda x: pearsonr(x, Y)[0], X.T))).T, k=2).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ..., 
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])
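The petal measurements correlate most strongly with the target, which is why columns 2 and 3 (petal length and petal width) are the ones kept; a quick check of the per-feature scores:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import load_iris

iris = load_iris()
scores = np.array([pearsonr(col, iris.target)[0] for col in iris.data.T])

# petal length (2) and petal width (3) have the highest correlation with
# the target, so SelectKBest with k=2 keeps those two columns
assert set(scores.argsort()[-2:]) == {2, 3}
```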

Chi-squared test

The following code selects features using the SelectKBest class from the feature_selection module combined with the chi-squared test:

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# select the K best features; returns the data after feature selection
SelectKBest(chi2, k=2).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ..., 
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])
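The chi2 function returns (scores, p-values) directly, so it is easy to see why the same two petal columns win:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

iris = load_iris()
scores, pvalues = chi2(iris.data, iris.target)

# the petal measurements score far higher than the sepal measurements,
# which is why SelectKBest(chi2, k=2) keeps columns 2 and 3
assert scores[2] > scores[0] and scores[3] > scores[1]
assert set(scores.argsort()[-2:]) == {2, 3}
```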

Mutual information method

The following code selects features using the SelectKBest class from the feature_selection module combined with the maximal information coefficient (MIC):

from numpy import array
from sklearn.feature_selection import SelectKBest
from minepy import MINE

# MINE's API is not functional-style, so define a mic function to wrap it
def mic(x, y):
    m = MINE()
    m.compute_score(x, y)
    return m.mic()

# select the K best features; returns the data after feature selection
SelectKBest(lambda X, Y: array(list(map(lambda x: mic(x, Y), X.T))).T, k=2).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ..., 
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])
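minepy is a third-party package. If it is unavailable, sklearn's built-in mutual_info_classif (a nearest-neighbor-based mutual information estimator, not MIC) fills the same filter role; this sketch assumes it ranks the petal columns highest, as it does on this dataset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

iris = load_iris()

# mutual_info_classif estimates mutual information with a k-NN method;
# random_state fixes the small noise it adds for tie-breaking
score_func = lambda X, y: mutual_info_classif(X, y, random_state=0)
selected = SelectKBest(score_func, k=2).fit_transform(iris.data, iris.target)

# the petal columns carry the most information about the species
assert np.array_equal(selected, iris.data[:, 2:4])
```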

Wrapper

Recursive feature elimination

The following code selects features using the RFE class from the feature_selection module.

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# recursive feature elimination; returns the data after feature selection
# parameter estimator is the base model
# parameter n_features_to_select is the number of features to keep
RFE(estimator=LogisticRegression(), n_features_to_select=2).fit_transform(iris.data, iris.target)
array([[ 3.5,  0.2],
       [ 3. ,  0.2],
       [ 3.2,  0.2],
       ..., 
       [ 3. ,  2. ],
       [ 3.4,  2.3],
       [ 3. ,  1.8]])
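The fitted selector exposes which features it kept through support_ and ranking_; a small sketch (using max_iter=1000 so the solver converges cleanly on newer sklearn defaults):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(iris.data, iris.target)

# support_ flags the kept columns; ranking_ is 1 for kept features and
# increases in the order features were eliminated
assert rfe.support_.sum() == 2
assert (rfe.ranking_[rfe.support_] == 1).all()
assert rfe.transform(iris.data).shape == (150, 2)
```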

Embedded

Penalty-based feature selection

The following code selects features using the SelectFromModel class from the feature_selection module with an L1-penalized logistic regression as the base model (note that newer versions of sklearn require passing solver='liblinear' or solver='saga' for the L1 penalty):

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# feature selection with L1-penalized logistic regression as the base model
SelectFromModel(LogisticRegression(penalty="l1", C=0.1)).fit_transform(iris.data, iris.target)
array([[ 5.1,  3.5,  1.4],
       [ 4.9,  3. ,  1.4],
       [ 4.7,  3.2,  1.3],
       [ 4.6,  3.1,  1.5],
       [ 5. ,  3.6,  1.4],
       [ 5.4,  3.9,  1.7],
       [ 4.6,  3.4,  1.4],
       [ 5. ,  3.4,  1.5],
       [ 4.4,  2.9,  1.4],
       [ 4.9,  3.1,  1.5],
       [ 5.4,  3.7,  1.5],
       [ 4.8,  3.4,  1.6],
       [ 4.8,  3. ,  1.4],
       [ 4.3,  3. ,  1.1],
       [ 5.8,  4. ,  1.2],
       [ 5.7,  4.4,  1.5],
       [ 5.4,  3.9,  1.3],
       [ 5.1,  3.5,  1.4],
       [ 5.7,  3.8,  1.7],
       [ 5.1,  3.8,  1.5],
       [ 5.4,  3.4,  1.7],
       [ 5.1,  3.7,  1.5],
       [ 4.6,  3.6,  1. ],
       [ 5.1,  3.3,  1.7],
       [ 4.8,  3.4,  1.9],
       [ 5. ,  3. ,  1.6],
       [ 5. ,  3.4,  1.6],
       [ 5.2,  3.5,  1.5],
       [ 5.2,  3.4,  1.4],
       [ 4.7,  3.2,  1.6],
       [ 4.8,  3.1,  1.6],
       [ 5.4,  3.4,  1.5],
       [ 5.2,  4.1,  1.5],
       [ 5.5,  4.2,  1.4],
       [ 4.9,  3.1,  1.5],
       [ 5. ,  3.2,  1.2],
       [ 5.5,  3.5,  1.3],
       [ 4.9,  3.1,  1.5],
       [ 4.4,  3. ,  1.3],
       [ 5.1,  3.4,  1.5],
       [ 5. ,  3.5,  1.3],
       [ 4.5,  2.3,  1.3],
       [ 4.4,  3.2,  1.3],
       [ 5. ,  3.5,  1.6],
       [ 5.1,  3.8,  1.9],
       [ 4.8,  3. ,  1.4],
       [ 5.1,  3.8,  1.6],
       [ 4.6,  3.2,  1.4],
       [ 5.3,  3.7,  1.5],
       [ 5. ,  3.3,  1.4],
       [ 7. ,  3.2,  4.7],
       [ 6.4,  3.2,  4.5],
       [ 6.9,  3.1,  4.9],
       [ 5.5,  2.3,  4. ],
       [ 6.5,  2.8,  4.6],
       [ 5.7,  2.8,  4.5],
       [ 6.3,  3.3,  4.7],
       [ 4.9,  2.4,  3.3],
       [ 6.6,  2.9,  4.6],
       [ 5.2,  2.7,  3.9],
       [ 5. ,  2. ,  3.5],
       [ 5.9,  3. ,  4.2],
       [ 6. ,  2.2,  4. ],
       [ 6.1,  2.9,  4.7],
       [ 5.6,  2.9,  3.6],
       [ 6.7,  3.1,  4.4],
       [ 5.6,  3. ,  4.5],
       [ 5.8,  2.7,  4.1],
       [ 6.2,  2.2,  4.5],
       [ 5.6,  2.5,  3.9],
       [ 5.9,  3.2,  4.8],
       [ 6.1,  2.8,  4. ],
       [ 6.3,  2.5,  4.9],
       [ 6.1,  2.8,  4.7],
       [ 6.4,  2.9,  4.3],
       [ 6.6,  3. ,  4.4],
       [ 6.8,  2.8,  4.8],
       [ 6.7,  3. ,  5. ],
       [ 6. ,  2.9,  4.5],
       [ 5.7,  2.6,  3.5],
       [ 5.5,  2.4,  3.8],
       [ 5.5,  2.4,  3.7],
       [ 5.8,  2.7,  3.9],
       [ 6. ,  2.7,  5.1],
       [ 5.4,  3. ,  4.5],
       [ 6. ,  3.4,  4.5],
       [ 6.7,  3.1,  4.7],
       [ 6.3,  2.3,  4.4],
       [ 5.6,  3. ,  4.1],
       [ 5.5,  2.5,  4. ],
       [ 5.5,  2.6,  4.4],
       [ 6.1,  3. ,  4.6],
       [ 5.8,  2.6,  4. ],
       [ 5. ,  2.3,  3.3],
       [ 5.6,  2.7,  4.2],
       [ 5.7,  3. ,  4.2],
       [ 5.7,  2.9,  4.2],
       [ 6.2,  2.9,  4.3],
       [ 5.1,  2.5,  3. ],
       [ 5.7,  2.8,  4.1],
       [ 6.3,  3.3,  6. ],
       [ 5.8,  2.7,  5.1],
       [ 7.1,  3. ,  5.9],
       [ 6.3,  2.9,  5.6],
       [ 6.5,  3. ,  5.8],
       [ 7.6,  3. ,  6.6],
       [ 4.9,  2.5,  4.5],
       [ 7.3,  2.9,  6.3],
       [ 6.7,  2.5,  5.8],
       [ 7.2,  3.6,  6.1],
       [ 6.5,  3.2,  5.1],
       [ 6.4,  2.7,  5.3],
       [ 6.8,  3. ,  5.5],
       [ 5.7,  2.5,  5. ],
       [ 5.8,  2.8,  5.1],
       [ 6.4,  3.2,  5.3],
       [ 6.5,  3. ,  5.5],
       [ 7.7,  3.8,  6.7],
       [ 7.7,  2.6,  6.9],
       [ 6. ,  2.2,  5. ],
       [ 6.9,  3.2,  5.7],
       [ 5.6,  2.8,  4.9],
       [ 7.7,  2.8,  6.7],
       [ 6.3,  2.7,  4.9],
       [ 6.7,  3.3,  5.7],
       [ 7.2,  3.2,  6. ],
       [ 6.2,  2.8,  4.8],
       [ 6.1,  3. ,  4.9],
       [ 6.4,  2.8,  5.6],
       [ 7.2,  3. ,  5.8],
       [ 7.4,  2.8,  6.1],
       [ 7.9,  3.8,  6.4],
       [ 6.4,  2.8,  5.6],
       [ 6.3,  2.8,  5.1],
       [ 6.1,  2.6,  5.6],
       [ 7.7,  3. ,  6.1],
       [ 6.3,  3.4,  5.6],
       [ 6.4,  3.1,  5.5],
       [ 6. ,  3. ,  4.8],
       [ 6.9,  3.1,  5.4],
       [ 6.7,  3.1,  5.6],
       [ 6.9,  3.1,  5.1],
       [ 5.8,  2.7,  5.1],
       [ 6.8,  3.2,  5.9],
       [ 6.7,  3.3,  5.7],
       [ 6.7,  3. ,  5.2],
       [ 6.3,  2.5,  5. ],
       [ 6.5,  3. ,  5.2],
       [ 6.2,  3.4,  5.4],
       [ 5.9,  3. ,  5.1]])
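To see which of the four columns the selector actually kept, you can fit the selector and query its boolean support mask. A minimal sketch (the exact columns kept depend on `C` and the scikit-learn version, so the printed list may differ from the three-column output above):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

iris = load_iris()
selector = SelectFromModel(
    LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
).fit(iris.data, iris.target)

# Boolean mask over the four original feature columns
mask = selector.get_support()
print([name for name, keep in zip(iris.feature_names, mask) if keep])
```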

In fact, L1-based dimensionality reduction works by keeping only one of several features that are equally correlated with the target, so a feature that was not selected is not necessarily unimportant. The result can therefore be refined with an L2 penalty. Concretely: for each feature with a nonzero weight in the L1 model, collect the features whose L2 weights are close to its own L2 weight but whose L1 weights are zero; these form a group of mutually redundant features, and the L1 weight is split evenly across the group. This requires building a new logistic regression model:

from sklearn.linear_model import LogisticRegression

class LR(LogisticRegression):
    def __init__(self, threshold=0.01, dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='liblinear', max_iter=100,
                 multi_class='ovr', verbose=0, warm_start=False, n_jobs=1):

        # Threshold below which two L2 weights are considered close
        self.threshold = threshold
        LogisticRegression.__init__(self, penalty='l1', dual=dual, tol=tol, C=C,
                 fit_intercept=fit_intercept, intercept_scaling=intercept_scaling, class_weight=class_weight,
                 random_state=random_state, solver=solver, max_iter=max_iter,
                 multi_class=multi_class, verbose=verbose, warm_start=warm_start, n_jobs=n_jobs)
        # An L2-penalized logistic regression with the same parameters
        self.l2 = LogisticRegression(
            penalty='l2', dual=dual, tol=tol, C=C,
            fit_intercept=fit_intercept, intercept_scaling=intercept_scaling,
            class_weight=class_weight, random_state=random_state, solver=solver,
            max_iter=max_iter, multi_class=multi_class, verbose=verbose,
            warm_start=warm_start, n_jobs=n_jobs)

    def fit(self, X, y, sample_weight=None):
        # Fit the L1-penalized model
        super(LR, self).fit(X, y, sample_weight=sample_weight)
        self.coef_old_ = self.coef_.copy()
        # Fit the L2-penalized model
        self.l2.fit(X, y, sample_weight=sample_weight)

        cnt_of_row, cnt_of_col = self.coef_.shape
        # The coefficient matrix has one row per target class
        for i in range(cnt_of_row):
            for j in range(cnt_of_col):
                coef = self.coef_[i][j]
                # Only nonzero L1 coefficients anchor a group
                if coef != 0:
                    idx = [j]
                    # The corresponding coefficient in the L2 model
                    coef1 = self.l2.coef_[i][j]
                    for k in range(cnt_of_col):
                        coef2 = self.l2.coef_[i][k]
                        # A feature joins the group if its L2 weight is close
                        # to coef1 and its L1 weight is zero
                        if abs(coef1 - coef2) < self.threshold and j != k and self.coef_[i][k] == 0:
                            idx.append(k)
                    # Split the L1 weight evenly across the group
                    mean = coef / len(idx)
                    self.coef_[i][idx] = mean
        return self

The code for selecting features with the SelectFromModel class from feature_selection, combined with the L1- and L2-penalized logistic regression above as the base model, is as follows:

from sklearn.feature_selection import SelectFromModel

# Feature selection with L1- and L2-penalized logistic regression as the base model
# The threshold parameter is the maximum weight difference still counted as "close"
SelectFromModel(LR(threshold=0.5, C=0.1)).fit_transform(iris.data, iris.target)[:5]
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])

Tree-based feature selection

Tree models such as GBDT can also serve as the base model for feature selection. The code using the SelectFromModel class from feature_selection together with a GBDT model is as follows:

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import GradientBoostingClassifier

# Feature selection with GBDT as the base model
SelectFromModel(GradientBoostingClassifier()).fit_transform(iris.data, iris.target)[:5]
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       [ 1.5,  0.2],
       [ 1.4,  0.2]])
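By default SelectFromModel keeps the columns whose importance is above the mean importance (0.25 with four features). As a sketch of what the selector sees, you can fit the GBDT alone and print `feature_importances_`; the exact values vary with the scikit-learn version and the randomness of the fit, which is why `random_state` is pinned here:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

iris = load_iris()
clf = GradientBoostingClassifier(random_state=0).fit(iris.data, iris.target)

# Importances are normalized to sum to 1
for name, imp in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```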

Dimensionality reduction

PCA

The code for reducing the data with the PCA class from the decomposition module is as follows:

from sklearn.decomposition import PCA

# Principal component analysis; returns the reduced data
# n_components is the number of principal components to keep
PCA(n_components=2).fit_transform(iris.data)[:5]
array([[-2.68420713,  0.32660731],
       [-2.71539062, -0.16955685],
       [-2.88981954, -0.13734561],
       [-2.7464372 , -0.31112432],
       [-2.72859298,  0.33392456]])
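How much information the two components retain can be checked with `explained_variance_ratio_`; for the iris data the first principal component alone captures over 90% of the total variance:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)
```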

Linear discriminant analysis (LDA)

The code for reducing the data with the LinearDiscriminantAnalysis class from sklearn.discriminant_analysis is as follows (LDA is supervised, so it also takes the target):

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Linear discriminant analysis; returns the reduced data
# n_components is the number of dimensions after reduction (at most n_classes - 1)
LDA(n_components=2).fit_transform(iris.data, iris.target)[:5]
array([[ 8.0849532 ,  0.32845422],
       [ 7.1471629 , -0.75547326],
       [ 7.51137789, -0.23807832],
       [ 6.83767561, -0.64288476],
       [ 8.15781367,  0.54063935]])
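Unlike PCA, LDA uses the class labels, so the projection is chosen to separate the classes rather than to preserve variance, and the fitted model can also classify directly. A quick sketch:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
lda = LinearDiscriminantAnalysis(n_components=2).fit(iris.data, iris.target)

# Variance explained by each discriminant axis, then training accuracy
print(lda.explained_variance_ratio_)
print(lda.score(iris.data, iris.target))
```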
