
Iris


Introduction

These notes walk through the feature-processing utilities in sklearn.

The IRIS dataset was compiled by Fisher in 1936. It contains 4 features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), all positive floating-point values measured in centimeters. The target is the iris species: Iris Setosa, Iris Versicolour, or Iris Virginica.

References

Loading the Data

from sklearn.datasets import load_iris

# Load the dataset
iris = load_iris()

# Feature matrix
print(type(iris.data))
print(iris.data[:5])

# Target vector
print(type(iris.target))
print(iris.target)
<class 'numpy.ndarray'>
[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]
<class 'numpy.ndarray'>
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

Feature Preprocessing

Scaling (Dimensionless Transformation)

Standardization

The code below uses the StandardScaler class from the preprocessing module to standardize the data.

from sklearn.preprocessing import StandardScaler

# Standardize; returns the standardized data
StandardScaler().fit_transform(iris.data)[:10]
array([[-0.90068117,  1.03205722, -1.3412724 , -1.31297673],
       [-1.14301691, -0.1249576 , -1.3412724 , -1.31297673],
       [-1.38535265,  0.33784833, -1.39813811, -1.31297673],
       [-1.50652052,  0.10644536, -1.2844067 , -1.31297673],
       [-1.02184904,  1.26346019, -1.3412724 , -1.31297673],
       [-0.53717756,  1.95766909, -1.17067529, -1.05003079],
       [-1.50652052,  0.80065426, -1.3412724 , -1.18150376],
       [-1.02184904,  0.80065426, -1.2844067 , -1.31297673],
       [-1.74885626, -0.35636057, -1.3412724 , -1.31297673],
       [-1.14301691,  0.10644536, -1.2844067 , -1.4444497 ]])
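Standardization computes z = (x − mean) / std for each column. As a sanity check, applying the formula by hand should reproduce the scaler's output — a minimal sketch assuming a standard numpy/sklearn setup:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
scaled = StandardScaler().fit_transform(iris.data)

# StandardScaler computes z = (x - mean) / std per column (per feature),
# so the manual formula reproduces its output exactly
manual = (iris.data - iris.data.mean(axis=0)) / iris.data.std(axis=0)
assert np.allclose(scaled, manual)
```

After standardization each column has approximately zero mean and unit variance.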

Interval Scaling (Min-Max)

The code below uses the MinMaxScaler class from the preprocessing module to rescale the data into an interval.

from sklearn.preprocessing import MinMaxScaler

# Interval scaling; returns data scaled to the [0, 1] interval
MinMaxScaler().fit_transform(iris.data)[:10]
array([[ 0.22222222,  0.625     ,  0.06779661,  0.04166667],
       [ 0.16666667,  0.41666667,  0.06779661,  0.04166667],
       [ 0.11111111,  0.5       ,  0.05084746,  0.04166667],
       [ 0.08333333,  0.45833333,  0.08474576,  0.04166667],
       [ 0.19444444,  0.66666667,  0.06779661,  0.04166667],
       [ 0.30555556,  0.79166667,  0.11864407,  0.125     ],
       [ 0.08333333,  0.58333333,  0.06779661,  0.08333333],
       [ 0.19444444,  0.58333333,  0.08474576,  0.04166667],
       [ 0.02777778,  0.375     ,  0.06779661,  0.04166667],
       [ 0.16666667,  0.45833333,  0.08474576,  0.        ]])
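Min-max scaling maps each column onto [0, 1] via (x − min) / (max − min); a short sketch verifying the scaler against the formula:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
scaled = MinMaxScaler().fit_transform(iris.data)

# Min-max scaling: (x - min) / (max - min), computed per column
rng = iris.data.max(axis=0) - iris.data.min(axis=0)
manual = (iris.data - iris.data.min(axis=0)) / rng
assert np.allclose(scaled, manual)
```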

Standardization vs. Normalization

Standardization rescales each column (feature), while normalization rescales each row (sample) to unit norm. The code below uses the Normalizer class from the preprocessing module to normalize the data.

from sklearn.preprocessing import Normalizer

# Normalize; returns the normalized data
Normalizer().fit_transform(iris.data)[:10]
array([[ 0.80377277,  0.55160877,  0.22064351,  0.0315205 ],
       [ 0.82813287,  0.50702013,  0.23660939,  0.03380134],
       [ 0.80533308,  0.54831188,  0.2227517 ,  0.03426949],
       [ 0.80003025,  0.53915082,  0.26087943,  0.03478392],
       [ 0.790965  ,  0.5694948 ,  0.2214702 ,  0.0316386 ],
       [ 0.78417499,  0.5663486 ,  0.2468699 ,  0.05808704],
       [ 0.78010936,  0.57660257,  0.23742459,  0.0508767 ],
       [ 0.80218492,  0.54548574,  0.24065548,  0.0320874 ],
       [ 0.80642366,  0.5315065 ,  0.25658935,  0.03665562],
       [ 0.81803119,  0.51752994,  0.25041771,  0.01669451]])
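The row-wise behavior can be checked directly: with the default L2 norm, every output sample is a unit-length vector. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import Normalizer

iris = load_iris()
normalized = Normalizer().fit_transform(iris.data)

# Normalizer (default norm="l2") rescales each ROW to unit L2 norm,
# unlike StandardScaler, which works per COLUMN
row_norms = np.linalg.norm(normalized, axis=1)
assert np.allclose(row_norms, 1.0)
```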

Binarizing Quantitative Features

The code below uses the Binarizer class from the preprocessing module to binarize the data.

from sklearn.preprocessing import Binarizer

# Binarize with threshold 3; returns the binarized data
Binarizer(threshold=3).fit_transform(iris.data)[:5]
array([[ 1.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.]])
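Binarizer maps values strictly greater than the threshold to 1 and everything else to 0, so it agrees with a plain elementwise comparison:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import Binarizer

iris = load_iris()
binarized = Binarizer(threshold=3).fit_transform(iris.data)

# Values > 3 become 1, values <= 3 become 0
assert np.array_equal(binarized, (iris.data > 3).astype(float))
```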

One-Hot Encoding Qualitative Features

Since all features in the IRIS dataset are quantitative, its target values are used here for dummy (one-hot) encoding instead (not actually needed in practice). The code below uses the OneHotEncoder class from the preprocessing module to encode the data.

from sklearn.preprocessing import OneHotEncoder

# One-hot encode the IRIS target values; returns the encoded data
OneHotEncoder().fit_transform(iris.target.reshape((-1,1)))

<150x3 sparse matrix of type '<class 'numpy.float64'>'
    with 150 stored elements in Compressed Sparse Row format>
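The result is a SciPy sparse matrix; calling toarray() makes it dense for inspection. Each of the 150 rows contains a single 1 in the column of its class:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import OneHotEncoder

iris = load_iris()
encoded = OneHotEncoder().fit_transform(iris.target.reshape((-1, 1)))

# Densify the sparse result: 150 samples x 3 classes,
# with exactly one 1 per row
dense = encoded.toarray()
print(dense.shape)
```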

Missing Value Imputation

The IRIS dataset has no missing values, so a new sample is added with all 4 features set to NaN to represent missing data. The code below uses the Imputer class from the preprocessing module to impute the missing values.

from numpy import vstack, array, nan
from sklearn.preprocessing import Imputer

# Impute missing values; returns the completed data
# missing_values: how missing entries are represented, default NaN
# strategy: how to fill them, default "mean"
Imputer().fit_transform(vstack((array([nan, nan, nan, nan]), iris.data)))
array([[ 5.84333333,  3.054     ,  3.75866667,  1.19866667],
       [ 5.1       ,  3.5       ,  1.4       ,  0.2       ],
       [ 4.9       ,  3.        ,  1.4       ,  0.2       ],
       [ 4.7       ,  3.2       ,  1.3       ,  0.2       ],
       [ 4.6       ,  3.1       ,  1.5       ,  0.2       ],
       [ 5.        ,  3.6       ,  1.4       ,  0.2       ],
       [ 5.4       ,  3.9       ,  1.7       ,  0.4       ],
       [ 4.6       ,  3.4       ,  1.4       ,  0.3       ],
       [ 5.        ,  3.4       ,  1.5       ,  0.2       ],
       [ 4.4       ,  2.9       ,  1.4       ,  0.2       ],
       [ 4.9       ,  3.1       ,  1.5       ,  0.1       ],
       [ 5.4       ,  3.7       ,  1.5       ,  0.2       ],
       [ 4.8       ,  3.4       ,  1.6       ,  0.2       ],
       [ 4.8       ,  3.        ,  1.4       ,  0.1       ],
       [ 4.3       ,  3.        ,  1.1       ,  0.1       ],
       [ 5.8       ,  4.        ,  1.2       ,  0.2       ],
       [ 5.7       ,  4.4       ,  1.5       ,  0.4       ],
       [ 5.4       ,  3.9       ,  1.3       ,  0.4       ],
       [ 5.1       ,  3.5       ,  1.4       ,  0.3       ],
       [ 5.7       ,  3.8       ,  1.7       ,  0.3       ],
       [ 5.1       ,  3.8       ,  1.5       ,  0.3       ],
       [ 5.4       ,  3.4       ,  1.7       ,  0.2       ],
       [ 5.1       ,  3.7       ,  1.5       ,  0.4       ],
       [ 4.6       ,  3.6       ,  1.        ,  0.2       ],
       [ 5.1       ,  3.3       ,  1.7       ,  0.5       ],
       [ 4.8       ,  3.4       ,  1.9       ,  0.2       ],
       [ 5.        ,  3.        ,  1.6       ,  0.2       ],
       [ 5.        ,  3.4       ,  1.6       ,  0.4       ],
       [ 5.2       ,  3.5       ,  1.5       ,  0.2       ],
       [ 5.2       ,  3.4       ,  1.4       ,  0.2       ],
       [ 4.7       ,  3.2       ,  1.6       ,  0.2       ],
       [ 4.8       ,  3.1       ,  1.6       ,  0.2       ],
       [ 5.4       ,  3.4       ,  1.5       ,  0.4       ],
       [ 5.2       ,  4.1       ,  1.5       ,  0.1       ],
       [ 5.5       ,  4.2       ,  1.4       ,  0.2       ],
       [ 4.9       ,  3.1       ,  1.5       ,  0.1       ],
       [ 5.        ,  3.2       ,  1.2       ,  0.2       ],
       [ 5.5       ,  3.5       ,  1.3       ,  0.2       ],
       [ 4.9       ,  3.1       ,  1.5       ,  0.1       ],
       [ 4.4       ,  3.        ,  1.3       ,  0.2       ],
       [ 5.1       ,  3.4       ,  1.5       ,  0.2       ],
       [ 5.        ,  3.5       ,  1.3       ,  0.3       ],
       [ 4.5       ,  2.3       ,  1.3       ,  0.3       ],
       [ 4.4       ,  3.2       ,  1.3       ,  0.2       ],
       [ 5.        ,  3.5       ,  1.6       ,  0.6       ],
       [ 5.1       ,  3.8       ,  1.9       ,  0.4       ],
       [ 4.8       ,  3.        ,  1.4       ,  0.3       ],
       [ 5.1       ,  3.8       ,  1.6       ,  0.2       ],
       [ 4.6       ,  3.2       ,  1.4       ,  0.2       ],
       [ 5.3       ,  3.7       ,  1.5       ,  0.2       ],
       [ 5.        ,  3.3       ,  1.4       ,  0.2       ],
       [ 7.        ,  3.2       ,  4.7       ,  1.4       ],
       [ 6.4       ,  3.2       ,  4.5       ,  1.5       ],
       [ 6.9       ,  3.1       ,  4.9       ,  1.5       ],
       [ 5.5       ,  2.3       ,  4.        ,  1.3       ],
       [ 6.5       ,  2.8       ,  4.6       ,  1.5       ],
       [ 5.7       ,  2.8       ,  4.5       ,  1.3       ],
       [ 6.3       ,  3.3       ,  4.7       ,  1.6       ],
       [ 4.9       ,  2.4       ,  3.3       ,  1.        ],
       [ 6.6       ,  2.9       ,  4.6       ,  1.3       ],
       [ 5.2       ,  2.7       ,  3.9       ,  1.4       ],
       [ 5.        ,  2.        ,  3.5       ,  1.        ],
       [ 5.9       ,  3.        ,  4.2       ,  1.5       ],
       [ 6.        ,  2.2       ,  4.        ,  1.        ],
       [ 6.1       ,  2.9       ,  4.7       ,  1.4       ],
       [ 5.6       ,  2.9       ,  3.6       ,  1.3       ],
       [ 6.7       ,  3.1       ,  4.4       ,  1.4       ],
       [ 5.6       ,  3.        ,  4.5       ,  1.5       ],
       [ 5.8       ,  2.7       ,  4.1       ,  1.        ],
       [ 6.2       ,  2.2       ,  4.5       ,  1.5       ],
       [ 5.6       ,  2.5       ,  3.9       ,  1.1       ],
       [ 5.9       ,  3.2       ,  4.8       ,  1.8       ],
       [ 6.1       ,  2.8       ,  4.        ,  1.3       ],
       [ 6.3       ,  2.5       ,  4.9       ,  1.5       ],
       [ 6.1       ,  2.8       ,  4.7       ,  1.2       ],
       [ 6.4       ,  2.9       ,  4.3       ,  1.3       ],
       [ 6.6       ,  3.        ,  4.4       ,  1.4       ],
       [ 6.8       ,  2.8       ,  4.8       ,  1.4       ],
       [ 6.7       ,  3.        ,  5.        ,  1.7       ],
       [ 6.        ,  2.9       ,  4.5       ,  1.5       ],
       [ 5.7       ,  2.6       ,  3.5       ,  1.        ],
       [ 5.5       ,  2.4       ,  3.8       ,  1.1       ],
       [ 5.5       ,  2.4       ,  3.7       ,  1.        ],
       [ 5.8       ,  2.7       ,  3.9       ,  1.2       ],
       [ 6.        ,  2.7       ,  5.1       ,  1.6       ],
       [ 5.4       ,  3.        ,  4.5       ,  1.5       ],
       [ 6.        ,  3.4       ,  4.5       ,  1.6       ],
       [ 6.7       ,  3.1       ,  4.7       ,  1.5       ],
       [ 6.3       ,  2.3       ,  4.4       ,  1.3       ],
       [ 5.6       ,  3.        ,  4.1       ,  1.3       ],
       [ 5.5       ,  2.5       ,  4.        ,  1.3       ],
       [ 5.5       ,  2.6       ,  4.4       ,  1.2       ],
       [ 6.1       ,  3.        ,  4.6       ,  1.4       ],
       [ 5.8       ,  2.6       ,  4.        ,  1.2       ],
       [ 5.        ,  2.3       ,  3.3       ,  1.        ],
       [ 5.6       ,  2.7       ,  4.2       ,  1.3       ],
       [ 5.7       ,  3.        ,  4.2       ,  1.2       ],
       [ 5.7       ,  2.9       ,  4.2       ,  1.3       ],
       [ 6.2       ,  2.9       ,  4.3       ,  1.3       ],
       [ 5.1       ,  2.5       ,  3.        ,  1.1       ],
       [ 5.7       ,  2.8       ,  4.1       ,  1.3       ],
       [ 6.3       ,  3.3       ,  6.        ,  2.5       ],
       [ 5.8       ,  2.7       ,  5.1       ,  1.9       ],
       [ 7.1       ,  3.        ,  5.9       ,  2.1       ],
       [ 6.3       ,  2.9       ,  5.6       ,  1.8       ],
       [ 6.5       ,  3.        ,  5.8       ,  2.2       ],
       [ 7.6       ,  3.        ,  6.6       ,  2.1       ],
       [ 4.9       ,  2.5       ,  4.5       ,  1.7       ],
       [ 7.3       ,  2.9       ,  6.3       ,  1.8       ],
       [ 6.7       ,  2.5       ,  5.8       ,  1.8       ],
       [ 7.2       ,  3.6       ,  6.1       ,  2.5       ],
       [ 6.5       ,  3.2       ,  5.1       ,  2.        ],
       [ 6.4       ,  2.7       ,  5.3       ,  1.9       ],
       [ 6.8       ,  3.        ,  5.5       ,  2.1       ],
       [ 5.7       ,  2.5       ,  5.        ,  2.        ],
       [ 5.8       ,  2.8       ,  5.1       ,  2.4       ],
       [ 6.4       ,  3.2       ,  5.3       ,  2.3       ],
       [ 6.5       ,  3.        ,  5.5       ,  1.8       ],
       [ 7.7       ,  3.8       ,  6.7       ,  2.2       ],
       [ 7.7       ,  2.6       ,  6.9       ,  2.3       ],
       [ 6.        ,  2.2       ,  5.        ,  1.5       ],
       [ 6.9       ,  3.2       ,  5.7       ,  2.3       ],
       [ 5.6       ,  2.8       ,  4.9       ,  2.        ],
       [ 7.7       ,  2.8       ,  6.7       ,  2.        ],
       [ 6.3       ,  2.7       ,  4.9       ,  1.8       ],
       [ 6.7       ,  3.3       ,  5.7       ,  2.1       ],
       [ 7.2       ,  3.2       ,  6.        ,  1.8       ],
       [ 6.2       ,  2.8       ,  4.8       ,  1.8       ],
       [ 6.1       ,  3.        ,  4.9       ,  1.8       ],
       [ 6.4       ,  2.8       ,  5.6       ,  2.1       ],
       [ 7.2       ,  3.        ,  5.8       ,  1.6       ],
       [ 7.4       ,  2.8       ,  6.1       ,  1.9       ],
       [ 7.9       ,  3.8       ,  6.4       ,  2.        ],
       [ 6.4       ,  2.8       ,  5.6       ,  2.2       ],
       [ 6.3       ,  2.8       ,  5.1       ,  1.5       ],
       [ 6.1       ,  2.6       ,  5.6       ,  1.4       ],
       [ 7.7       ,  3.        ,  6.1       ,  2.3       ],
       [ 6.3       ,  3.4       ,  5.6       ,  2.4       ],
       [ 6.4       ,  3.1       ,  5.5       ,  1.8       ],
       [ 6.        ,  3.        ,  4.8       ,  1.8       ],
       [ 6.9       ,  3.1       ,  5.4       ,  2.1       ],
       [ 6.7       ,  3.1       ,  5.6       ,  2.4       ],
       [ 6.9       ,  3.1       ,  5.1       ,  2.3       ],
       [ 5.8       ,  2.7       ,  5.1       ,  1.9       ],
       [ 6.8       ,  3.2       ,  5.9       ,  2.3       ],
       [ 6.7       ,  3.3       ,  5.7       ,  2.5       ],
       [ 6.7       ,  3.        ,  5.2       ,  2.3       ],
       [ 6.3       ,  2.5       ,  5.        ,  1.9       ],
       [ 6.5       ,  3.        ,  5.2       ,  2.        ],
       [ 6.2       ,  3.4       ,  5.4       ,  2.3       ],
       [ 5.9       ,  3.        ,  5.1       ,  1.8       ]])
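Note that newer sklearn releases removed Imputer in favor of SimpleImputer in the sklearn.impute module. A roughly equivalent sketch, assuming sklearn 0.20 or later:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer  # replacement for the old Imputer

iris = load_iris()

# Prepend one all-NaN sample, as above
data = np.vstack((np.full(4, np.nan), iris.data))

# strategy="mean" (the default) fills each NaN with its column's mean
filled = SimpleImputer(strategy="mean").fit_transform(data)

# The imputed first row equals the per-feature means of the original data
assert np.allclose(filled[0], iris.data.mean(axis=0))
```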

Data Transformation

The code below uses the PolynomialFeatures class from the preprocessing module to apply a polynomial transformation to the data.

from sklearn.preprocessing import PolynomialFeatures

# Polynomial transformation
# degree: the polynomial degree, default 2
PolynomialFeatures().fit_transform(iris.data)
array([[  1.  ,   5.1 ,   3.5 , ...,   1.96,   0.28,   0.04],
       [  1.  ,   4.9 ,   3.  , ...,   1.96,   0.28,   0.04],
       [  1.  ,   4.7 ,   3.2 , ...,   1.69,   0.26,   0.04],
       ..., 
       [  1.  ,   6.5 ,   3.  , ...,  27.04,  10.4 ,   4.  ],
       [  1.  ,   6.2 ,   3.4 , ...,  29.16,  12.42,   5.29],
       [  1.  ,   5.9 ,   3.  , ...,  26.01,   9.18,   3.24]])
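With the default degree of 2 on 4 input features, the output has 1 bias column, 4 linear terms, and 10 products x_i·x_j with i ≤ j, i.e. 15 columns in total:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import PolynomialFeatures

iris = load_iris()
poly = PolynomialFeatures().fit_transform(iris.data)

# 1 bias + 4 linear + 10 degree-2 products = 15 output columns
print(poly.shape)
```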

Transformations based on a univariate function can all be expressed in a uniform way. The code below uses the FunctionTransformer class from the preprocessing module to apply a logarithmic transformation to the data.

from numpy import log1p
from sklearn.preprocessing import FunctionTransformer

# Data transformation with a custom logarithmic function
# The first argument is a univariate function
FunctionTransformer(log1p).fit_transform(iris.data)[:10]
array([[ 1.80828877,  1.5040774 ,  0.87546874,  0.18232156],
       [ 1.77495235,  1.38629436,  0.87546874,  0.18232156],
       [ 1.74046617,  1.43508453,  0.83290912,  0.18232156],
       [ 1.7227666 ,  1.41098697,  0.91629073,  0.18232156],
       [ 1.79175947,  1.5260563 ,  0.87546874,  0.18232156],
       [ 1.85629799,  1.58923521,  0.99325177,  0.33647224],
       [ 1.7227666 ,  1.48160454,  0.87546874,  0.26236426],
       [ 1.79175947,  1.48160454,  0.91629073,  0.18232156],
       [ 1.68639895,  1.36097655,  0.87546874,  0.18232156],
       [ 1.77495235,  1.41098697,  0.91629073,  0.09531018]])
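FunctionTransformer simply applies the given univariate function elementwise, so its output matches calling the function directly. A minimal check (again loading iris via sklearn.datasets as an assumption):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import FunctionTransformer

iris = load_iris()

# FunctionTransformer(log1p) is equivalent to applying log1p directly
transformed = FunctionTransformer(np.log1p).fit_transform(iris.data)
print(np.allclose(transformed, np.log1p(iris.data)))  # True
```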

Feature Selection

Filter

Variance Threshold

The following code uses the VarianceThreshold class of the feature_selection module to select features by their variance.

from sklearn.feature_selection import VarianceThreshold

#Variance-based selection; returns the data with the selected features
#The threshold parameter is the variance cutoff
VarianceThreshold(threshold=3).fit_transform(iris.data)
array([[ 1.4],
       [ 1.4],
       [ 1.3],
       ...,
       [ 5.2],
       [ 5.4],
       [ 5.1]])
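The single surviving column is petal length, the only iris feature whose variance exceeds 3. A sketch that verifies this (VarianceThreshold uses the population variance, i.e. ddof=0):

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# per-feature population variance, as computed by VarianceThreshold
variances = iris.data.var(axis=0)
print(variances.round(3))  # only petal length (index 2) exceeds 3
```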

Correlation Coefficient Method

The following code uses the SelectKBest class of the feature_selection module together with the Pearson correlation coefficient to select features.

from numpy import array
from sklearn.feature_selection import SelectKBest
from scipy.stats import pearsonr

# Select the K best features and return the data with those features
# The first argument is a scoring function: given the feature matrix and the target vector, it returns an array of (score, p-value) pairs, where the i-th entry holds the score and p-value of the i-th feature. Here it computes Pearson correlations.
# The k parameter is the number of features to keep

def get_pearsonr(X, y):
    # compute (scores, p-values) column by column; unpacking the pearsonr
    # result explicitly also works with newer scipy result objects
    results = [pearsonr(x, y) for x in X.T]
    scores = array([r[0] for r in results])
    pvalues = array([r[1] for r in results])
    return (scores, pvalues)

SelectKBest(get_pearsonr, k=2).fit_transform(iris.data, iris.target)
# SelectKBest(lambda X, Y: array(list(map(lambda x: pearsonr(x, Y)[0], X.T))).T, k=2).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ...,
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])
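The two selected columns are petal length and petal width, the features with the highest Pearson correlation to the target. A sketch computing the per-feature scores directly:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import load_iris

iris = load_iris()

# Pearson correlation of each feature column with the target
scores = []
for column in iris.data.T:
    r, p = pearsonr(column, iris.target)
    scores.append(r)
scores = np.array(scores)
print(scores.round(3))  # petal length/width (indices 2, 3) score highest
```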

Chi-squared Test

The following code uses the SelectKBest class of the feature_selection module together with the chi-squared test to select features:

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

#Select the K best features and return the data with those features
SelectKBest(chi2, k=2).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ...,
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])
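chi2 can also be called directly to inspect the per-feature statistics before selection; petal length and petal width again receive by far the highest scores (a sketch, loading iris via sklearn.datasets):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

iris = load_iris()

# chi-squared statistic and p-value of each feature against the target
scores, pvalues = chi2(iris.data, iris.target)
print(scores.round(2))  # petal length (2) and petal width (3) dominate
```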

Mutual Information Method

The following code uses the SelectKBest class of the feature_selection module together with the maximal information coefficient (MIC) to select features:

from numpy import array
from sklearn.feature_selection import SelectKBest
from minepy import MINE

# MINE does not expose a plain scoring function, so mic wraps it as one
def mic(x, y):
    m = MINE()
    m.compute_score(x, y)
    return m.mic()

#Select the K best features and return the data with those features
SelectKBest(lambda X, Y: array(list(map(lambda x: mic(x, Y), X.T))).T, k=2).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ...,
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])

Wrapper

Recursive Feature Elimination

The following code uses the RFE class of the feature_selection module to select features.

from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Recursive feature elimination; returns the data with the selected features
# The estimator parameter is the base model
# The n_features_to_select parameter is the number of features to keep
RFE(estimator=LogisticRegression(), n_features_to_select=2).fit_transform(iris.data, iris.target)
array([[ 3.5,  0.2],
       [ 3. ,  0.2],
       [ 3.2,  0.2],
       ...,
       [ 3. ,  2. ],
       [ 3.4,  2.3],
       [ 3. ,  1.8]])
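After fitting, RFE exposes support_ (a boolean mask of kept features) and ranking_ (1 for kept features, larger values for earlier eliminations). A sketch; max_iter=1000 is an added assumption, not part of the original snippet, to let the solver converge cleanly:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# max_iter=1000 is an assumption added to avoid convergence warnings
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2).fit(iris.data, iris.target)
print(rfe.support_)   # boolean mask over the 4 features
print(rfe.ranking_)   # 1 marks a selected feature
```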

Embedded

Penalty-based Feature Selection

The following code uses the SelectFromModel class of the feature_selection module together with an L1-penalized logistic regression model to select features:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

#Feature selection with L1-penalized logistic regression as the base model
#solver="liblinear" is required for the L1 penalty in recent scikit-learn versions
SelectFromModel(LogisticRegression(penalty="l1", C=0.1, solver="liblinear")).fit_transform(iris.data, iris.target)
array([[ 5.1,  3.5,  1.4],
       [ 4.9,  3. ,  1.4],
       [ 4.7,  3.2,  1.3],
       ...,
       [ 6.5,  3. ,  5.2],
       [ 6.2,  3.4,  5.4],
       [ 5.9,  3. ,  5.1]])

In fact, the L1 penalty reduces dimensionality by keeping only one out of several features that are equally correlated with the target, so a feature that was not selected is not necessarily unimportant. The selection can therefore be refined with an L2 penalty. Concretely: for each feature with a nonzero L1 weight, gather the features that have a similar weight in the L2 model but a zero weight in the L1 model into one group, and split the L1 weight evenly across that group. This requires building a new logistic regression model:

from sklearn.linear_model import LogisticRegression

class LR(LogisticRegression):
    def __init__(self, threshold=0.01, dual=False, tol=1e-4, C=1.0,
                 fit_intercept=True, intercept_scaling=1, class_weight=None,
                 random_state=None, solver='liblinear', max_iter=100,
                 multi_class='ovr', verbose=0, warm_start=False, n_jobs=1):

        #threshold below which two weights are considered similar
        self.threshold = threshold
        LogisticRegression.__init__(self, penalty='l1', dual=dual, tol=tol, C=C,
                 fit_intercept=fit_intercept, intercept_scaling=intercept_scaling, class_weight=class_weight,
                 random_state=random_state, solver=solver, max_iter=max_iter,
                 multi_class=multi_class, verbose=verbose, warm_start=warm_start, n_jobs=n_jobs)
        #create an L2 logistic regression with the same parameters
        self.l2 = LogisticRegression(penalty='l2', dual=dual, tol=tol, C=C, fit_intercept=fit_intercept, intercept_scaling=intercept_scaling, class_weight = class_weight, random_state=random_state, solver=solver, max_iter=max_iter, multi_class=multi_class, verbose=verbose, warm_start=warm_start, n_jobs=n_jobs)

    def fit(self, X, y, sample_weight=None):
        #fit the L1 logistic regression
        super(LR, self).fit(X, y, sample_weight=sample_weight)
        self.coef_old_ = self.coef_.copy()
        #fit the L2 logistic regression
        self.l2.fit(X, y, sample_weight=sample_weight)

        cntOfRow, cntOfCol = self.coef_.shape
        #each row of the coefficient matrix corresponds to one target class
        for i in range(cntOfRow):
            for j in range(cntOfCol):
                coef = self.coef_[i][j]
                #the L1 weight is nonzero
                if coef != 0:
                    idx = [j]
                    #the corresponding weight in the L2 model
                    coef1 = self.l2.coef_[i][j]
                    for k in range(cntOfCol):
                        coef2 = self.l2.coef_[i][k]
                        #the L2 weights differ by less than the threshold and the L1 weight is 0
                        if abs(coef1-coef2) < self.threshold and j != k and self.coef_[i][k] == 0:
                            idx.append(k)
                    #split the L1 weight evenly across this feature group
                    mean = coef / len(idx)
                    self.coef_[i][idx] = mean
        return self

The following code uses the SelectFromModel class of the feature_selection module together with this combined L1/L2-penalized logistic regression to select features:

from sklearn.feature_selection import SelectFromModel

#Feature selection with the L1/L2-penalized logistic regression as the base model
#The threshold parameter is the cutoff on the weight difference
SelectFromModel(LR(threshold=0.5, C=0.1)).fit_transform(iris.data, iris.target)
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       ...,
       [ 6.5,  3. ,  5.2,  2. ],
       [ 6.2,  3.4,  5.4,  2.3],
       [ 5.9,  3. ,  5.1,  1.8]])

Tree-based Feature Selection

Tree models such as GBDT can also serve as the base model for feature selection. The following code uses the SelectFromModel class of the feature_selection module together with a GBDT model to select features:

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import GradientBoostingClassifier

#Feature selection with GBDT as the base model
SelectFromModel(GradientBoostingClassifier()).fit_transform(iris.data, iris.target)
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2],
       ...,
       [ 5.2,  2. ],
       [ 5.4,  2.3],
       [ 5.1,  1.8]])

Dimensionality reduction

PCA (Principal Component Analysis)

The code for dimensionality reduction with the PCA class from sklearn's decomposition module is as follows:

from sklearn.decomposition import PCA

# Principal component analysis: returns the data after dimensionality reduction.
# The parameter n_components is the number of principal components to keep.
PCA(n_components=2).fit_transform(iris.data)
array([[-2.68420713,  0.32660731],
       [-2.71539062, -0.16955685],
       [-2.88981954, -0.13734561],
       [-2.7464372 , -0.31112432],
       [-2.72859298,  0.33392456],
       [-2.27989736,  0.74778271],
       [-2.82089068, -0.08210451],
       [-2.62648199,  0.17040535],
       [-2.88795857, -0.57079803],
       [-2.67384469, -0.1066917 ],
       [-2.50652679,  0.65193501],
       [-2.61314272,  0.02152063],
       [-2.78743398, -0.22774019],
       [-3.22520045, -0.50327991],
       [-2.64354322,  1.1861949 ],
       [-2.38386932,  1.34475434],
       [-2.6225262 ,  0.81808967],
       [-2.64832273,  0.31913667],
       [-2.19907796,  0.87924409],
       [-2.58734619,  0.52047364],
       [-2.3105317 ,  0.39786782],
       [-2.54323491,  0.44003175],
       [-3.21585769,  0.14161557],
       [-2.30312854,  0.10552268],
       [-2.35617109, -0.03120959],
       [-2.50791723, -0.13905634],
       [-2.469056  ,  0.13788731],
       [-2.56239095,  0.37468456],
       [-2.63982127,  0.31929007],
       [-2.63284791, -0.19007583],
       [-2.58846205, -0.19739308],
       [-2.41007734,  0.41808001],
       [-2.64763667,  0.81998263],
       [-2.59715948,  1.10002193],
       [-2.67384469, -0.1066917 ],
       [-2.86699985,  0.0771931 ],
       [-2.62522846,  0.60680001],
       [-2.67384469, -0.1066917 ],
       [-2.98184266, -0.48025005],
       [-2.59032303,  0.23605934],
       [-2.77013891,  0.27105942],
       [-2.85221108, -0.93286537],
       [-2.99829644, -0.33430757],
       [-2.4055141 ,  0.19591726],
       [-2.20883295,  0.44269603],
       [-2.71566519, -0.24268148],
       [-2.53757337,  0.51036755],
       [-2.8403213 , -0.22057634],
       [-2.54268576,  0.58628103],
       [-2.70391231,  0.11501085],
       [ 1.28479459,  0.68543919],
       [ 0.93241075,  0.31919809],
       [ 1.46406132,  0.50418983],
       [ 0.18096721, -0.82560394],
       [ 1.08713449,  0.07539039],
       [ 0.64043675, -0.41732348],
       [ 1.09522371,  0.28389121],
       [-0.75146714, -1.00110751],
       [ 1.04329778,  0.22895691],
       [-0.01019007, -0.72057487],
       [-0.5110862 , -1.26249195],
       [ 0.51109806, -0.10228411],
       [ 0.26233576, -0.5478933 ],
       [ 0.98404455, -0.12436042],
       [-0.174864  , -0.25181557],
       [ 0.92757294,  0.46823621],
       [ 0.65959279, -0.35197629],
       [ 0.23454059, -0.33192183],
       [ 0.94236171, -0.54182226],
       [ 0.0432464 , -0.58148945],
       [ 1.11624072, -0.08421401],
       [ 0.35678657, -0.06682383],
       [ 1.29646885, -0.32756152],
       [ 0.92050265, -0.18239036],
       [ 0.71400821,  0.15037915],
       [ 0.89964086,  0.32961098],
       [ 1.33104142,  0.24466952],
       [ 1.55739627,  0.26739258],
       [ 0.81245555, -0.16233157],
       [-0.30733476, -0.36508661],
       [-0.07034289, -0.70253793],
       [-0.19188449, -0.67749054],
       [ 0.13499495, -0.31170964],
       [ 1.37873698, -0.42120514],
       [ 0.58727485, -0.48328427],
       [ 0.8072055 ,  0.19505396],
       [ 1.22042897,  0.40803534],
       [ 0.81286779, -0.370679  ],
       [ 0.24519516, -0.26672804],
       [ 0.16451343, -0.67966147],
       [ 0.46303099, -0.66952655],
       [ 0.89016045, -0.03381244],
       [ 0.22887905, -0.40225762],
       [-0.70708128, -1.00842476],
       [ 0.35553304, -0.50321849],
       [ 0.33112695, -0.21118014],
       [ 0.37523823, -0.29162202],
       [ 0.64169028,  0.01907118],
       [-0.90846333, -0.75156873],
       [ 0.29780791, -0.34701652],
       [ 2.53172698, -0.01184224],
       [ 1.41407223, -0.57492506],
       [ 2.61648461,  0.34193529],
       [ 1.97081495, -0.18112569],
       [ 2.34975798, -0.04188255],
       [ 3.39687992,  0.54716805],
       [ 0.51938325, -1.19135169],
       [ 2.9320051 ,  0.35237701],
       [ 2.31967279, -0.24554817],
       [ 2.91813423,  0.78038063],
       [ 1.66193495,  0.2420384 ],
       [ 1.80234045, -0.21615461],
       [ 2.16537886,  0.21528028],
       [ 1.34459422, -0.77641543],
       [ 1.5852673 , -0.53930705],
       [ 1.90474358,  0.11881899],
       [ 1.94924878,  0.04073026],
       [ 3.48876538,  1.17154454],
       [ 3.79468686,  0.25326557],
       [ 1.29832982, -0.76101394],
       [ 2.42816726,  0.37678197],
       [ 1.19809737, -0.60557896],
       [ 3.49926548,  0.45677347],
       [ 1.38766825, -0.20403099],
       [ 2.27585365,  0.33338653],
       [ 2.61419383,  0.55836695],
       [ 1.25762518, -0.179137  ],
       [ 1.29066965, -0.11642525],
       [ 2.12285398, -0.21085488],
       [ 2.3875644 ,  0.46251925],
       [ 2.84096093,  0.37274259],
       [ 3.2323429 ,  1.37052404],
       [ 2.15873837, -0.21832553],
       [ 1.4431026 , -0.14380129],
       [ 1.77964011, -0.50146479],
       [ 3.07652162,  0.68576444],
       [ 2.14498686,  0.13890661],
       [ 1.90486293,  0.04804751],
       [ 1.16885347, -0.1645025 ],
       [ 2.10765373,  0.37148225],
       [ 2.31430339,  0.18260885],
       [ 1.92245088,  0.40927118],
       [ 1.41407223, -0.57492506],
       [ 2.56332271,  0.2759745 ],
       [ 2.41939122,  0.30350394],
       [ 1.94401705,  0.18741522],
       [ 1.52566363, -0.37502085],
       [ 1.76404594,  0.07851919],
       [ 1.90162908,  0.11587675],
       [ 1.38966613, -0.28288671]])
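Before fixing n_components=2 as above, it can help to check how much variance each principal component actually explains. A minimal sketch (assuming sklearn's built-in `load_iris` loader in place of the `iris` variable used above):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Fit PCA without limiting n_components to inspect the variance
# explained by each principal component.
pca = PCA().fit(iris.data)
print(pca.explained_variance_ratio_)

# The first component alone explains over 90% of the variance,
# so keeping two components loses very little information.
X_2d = PCA(n_components=2).fit_transform(iris.data)
print(X_2d.shape)  # (150, 2)
```

The `explained_variance_ratio_` attribute is the usual basis for choosing n_components: keep enough components that the cumulative ratio reaches a threshold such as 0.95.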

Linear Discriminant Analysis (LDA)

The code for dimensionality reduction with the LinearDiscriminantAnalysis class from sklearn's discriminant_analysis module is as follows:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Linear discriminant analysis: returns the data after dimensionality reduction.
# The parameter n_components is the target number of dimensions.
LDA(n_components=2).fit_transform(iris.data, iris.target)
array([[ 8.0849532 ,  0.32845422],
       [ 7.1471629 , -0.75547326],
       [ 7.51137789, -0.23807832],
       [ 6.83767561, -0.64288476],
       [ 8.15781367,  0.54063935],
       [ 7.72363087,  1.48232345],
       [ 7.23514662,  0.3771537 ],
       [ 7.62974497,  0.01667246],
       [ 6.58274132, -0.98737424],
       [ 7.36884116, -0.91362729],
       [ 8.42181434,  0.67622968],
       [ 7.24739721, -0.08292417],
       [ 7.35062105, -1.0393597 ],
       [ 7.59646896, -0.77671553],
       [ 9.86936588,  1.61486093],
       [ 9.18033614,  2.75558626],
       [ 8.59760709,  1.85442217],
       [ 7.7995682 ,  0.60905468],
       [ 8.1000091 ,  0.99610981],
       [ 8.04543611,  1.16244332],
       [ 7.52046427, -0.156233  ],
       [ 7.60526378,  1.22757267],
       [ 8.70408249,  0.89959416],
       [ 6.26374139,  0.46023935],
       [ 6.59191505, -0.36199821],
       [ 6.79210164, -0.93823664],
       [ 6.84048091,  0.4848487 ],
       [ 7.948386  ,  0.23871551],
       [ 8.01209273,  0.11626909],
       [ 6.85589572, -0.51715236],
       [ 6.78303525, -0.72933749],
       [ 7.38668238,  0.59101728],
       [ 9.16249492,  1.25094169],
       [ 9.49617185,  1.84989586],
       [ 7.36884116, -0.91362729],
       [ 7.9756525 , -0.13519572],
       [ 8.63115466,  0.4346228 ],
       [ 7.36884116, -0.91362729],
       [ 6.95602269, -0.67887846],
       [ 7.71167183,  0.01995843],
       [ 7.9361354 ,  0.69879338],
       [ 5.6690533 , -1.90328976],
       [ 7.26559733, -0.24793625],
       [ 6.42449823,  1.26152073],
       [ 6.88607488,  1.07094506],
       [ 6.77985104, -0.47815878],
       [ 8.11232705,  0.78881818],
       [ 7.21095698, -0.33438897],
       [ 8.33988749,  0.6729437 ],
       [ 7.69345171, -0.10577397],
       [-1.45772244,  0.04186554],
       [-1.79768044,  0.48879951],
       [-2.41680973, -0.08234044],
       [-2.26486771, -1.57609174],
       [-2.55339693, -0.46282362],
       [-2.41954768, -0.95728766],
       [-2.44719309,  0.79553574],
       [-0.2160281 , -1.57096512],
       [-1.74591275, -0.80526746],
       [-1.95838993, -0.35044011],
       [-1.19023864, -2.61561292],
       [-1.86140718,  0.32050146],
       [-1.15386577, -2.61693435],
       [-2.65942607, -0.63412155],
       [-0.38024071,  0.09211958],
       [-1.20280815,  0.09561055],
       [-2.7626699 ,  0.03156949],
       [-0.76227692, -1.63917546],
       [-3.50940735, -1.6724835 ],
       [-1.08410216, -1.6100398 ],
       [-3.71895188,  1.03509697],
       [-0.99937   , -0.47902036],
       [-3.83709476, -1.39488292],
       [-2.24344339, -1.41079358],
       [-1.25428429, -0.53276537],
       [-1.43952232, -0.12314653],
       [-2.45921948, -0.91961551],
       [-3.52471481,  0.16379275],
       [-2.58974981, -0.17075771],
       [ 0.31197324, -1.29978446],
       [-1.10232227, -1.7357722 ],
       [-0.59844322, -1.92334798],
       [-0.89605882, -0.89192518],
       [-4.49567379, -0.87924754],
       [-2.9265236 ,  0.02499754],
       [-2.10119821,  1.18719828],
       [-2.14367532,  0.09713697],
       [-2.48342912, -1.92190266],
       [-1.31792367, -0.15753271],
       [-1.95529307, -1.14514953],
       [-2.38909697, -1.5823776 ],
       [-2.28614469, -0.32562577],
       [-1.26934019, -1.20042096],
       [-0.28888857, -1.78315025],
       [-2.00077969, -0.8969707 ],
       [-1.16910587, -0.52787187],
       [-1.6092782 , -0.46274252],
       [-1.41813799, -0.53933732],
       [ 0.47271009, -0.78924756],
       [-1.54557146, -0.58518894],
       [-7.85608083,  2.11161905],
       [-5.5156825 , -0.04401811],
       [-6.30499392,  0.46211638],
       [-5.60355888, -0.34236987],
       [-6.86344597,  0.81602566],
       [-7.42481805, -0.1726265 ],
       [-4.68086447, -0.50758694],
       [-6.31374875, -0.96068288],
       [-6.33198886, -1.37715975],
       [-6.87287126,  2.69458147],
       [-4.45364294,  1.33693971],
       [-5.4611095 , -0.21035161],
       [-5.67679825,  0.82435717],
       [-5.97407494, -0.10462115],
       [-6.78782019,  1.5744553 ],
       [-5.82871291,  1.98940576],
       [-5.0664238 , -0.02730214],
       [-6.60847169,  1.7420041 ],
       [-9.18829265, -0.74909806],
       [-4.76573133, -2.14417884],
       [-6.29305487,  1.63373692],
       [-5.37314577,  0.63153087],
       [-7.58557489, -0.97390788],
       [-4.38367513, -0.12213933],
       [-5.73135125,  1.28143515],
       [-5.27583147, -0.0384815 ],
       [-4.0923206 ,  0.18307048],
       [-4.08316687,  0.51770204],
       [-6.53257435,  0.28724638],
       [-4.577648  , -0.84457527],
       [-6.23500611, -0.70621819],
       [-5.21836582,  1.46644917],
       [-6.81795935,  0.56784684],
       [-3.80972091, -0.93451896],
       [-5.09023453, -2.11775698],
       [-6.82119092,  0.85698379],
       [-6.54193229,  2.41858841],
       [-4.99356333,  0.18488299],
       [-3.94659967,  0.60744074],
       [-5.22159002,  1.13613893],
       [-6.67858684,  1.785319  ],
       [-5.13687786,  1.97641389],
       [-5.5156825 , -0.04401811],
       [-6.81196984,  1.44440158],
       [-6.87289126,  2.40383699],
       [-5.67401294,  1.66134615],
       [-5.19712883, -0.36550576],
       [-4.98171163,  0.81297282],
       [-5.90148603,  2.32075134],
       [-4.68400868,  0.32508073]])
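Unlike PCA, LDA is supervised: fit_transform takes the class labels, and n_components can be at most n_classes − 1 (here 3 − 1 = 2). A minimal sketch (again assuming the `load_iris` loader):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

iris = load_iris()

# LDA projects the data onto directions that maximize class
# separation, so it needs iris.target, not just iris.data.
lda = LDA(n_components=2)
X_lda = lda.fit_transform(iris.data, iris.target)
print(X_lda.shape)  # (150, 2)

# The fitted model is also a classifier in its own right.
print(lda.score(iris.data, iris.target))
```

Because LDA optimizes class separability rather than variance, the reduced features are often better inputs for a downstream classifier than PCA features when labels are available.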
Iris dataset

Source: 城东 on Zhihu