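Pipeline
A Pipeline chains several estimators together in sequence, for example:
feature extraction -> dimensionality reduction -> fitting -> prediction
A Pipeline always exposes the same interface as its last estimator (every step before the last must be a transformer):
If the last estimator is a predictor, the Pipeline is a predictor.
If the last estimator is a transformer, the Pipeline is a transformer.
Testing a Pipeline as a transformer: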
- import numpy as np
- from sklearn.pipeline import Pipeline
- from sklearn.impute import SimpleImputer
- from sklearn.preprocessing import MinMaxScaler
-
- # np.nan marks missing values (the np.NAN alias was removed in NumPy 2.0)
- a = np.array([[1, 2, 3, 4, 5, 6, np.nan, 5], [3, 4, 5, 6, np.nan, 3, np.nan, 9]])
- X = np.transpose(a)  # transpose so that each column is one feature
- print(X)
-
- # The SimpleImputer step is named "impute", the MinMaxScaler step "normalize".
- pipp = Pipeline([
-     ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
-     ("normalize", MinMaxScaler()),
- ])
-
- # The last step is a transformer, so pipp is a transformer too
- X_pro = pipp.fit_transform(X)
- print(X_pro)
-
- # Apply the two steps separately for comparison
- aa = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)
- mms = MinMaxScaler().fit_transform(aa)
- print(mms)  # same result as X_pro above

Output:
- F:\开发工具\pythonProject\tools\venv\Scripts\python.exe F:/开发工具/pythonProject/tools/python的sklear学习/sklearn07.py
- [[ 1. 3.]
- [ 2. 4.]
- [ 3. 5.]
- [ 4. 6.]
- [ 5. nan]
- [ 6. 3.]
- [nan nan]
- [ 5. 9.]]
- [[0. 0. ]
- [0.2 0.16666667]
- [0.4 0.33333333]
- [0.6 0.5 ]
- [0.8 0.33333333]
- [1. 0. ]
- [0.54285714 0.33333333]
- [0.8 1. ]]
-
- Process finished with exit code 0
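
The test above covers the transformer case. As a counterpart, here is a minimal sketch of the predictor case. The toy data, the step name "regress" and the choice of LinearRegression as the final estimator are assumptions made only for illustration; they are not part of the original test.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy data (assumed for illustration): one feature with a missing value
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 5.0, 8.0])

# Last step is a transformer -> the pipeline behaves as a transformer
transformer_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("normalize", MinMaxScaler()),
])

# Last step is a predictor -> the pipeline behaves as a predictor
predictor_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("normalize", MinMaxScaler()),
    ("regress", LinearRegression()),
])

# The methods a pipeline exposes follow its final step
print(hasattr(transformer_pipe, "transform"), hasattr(transformer_pipe, "predict"))
print(hasattr(predictor_pipe, "transform"), hasattr(predictor_pipe, "predict"))

# fit runs fit_transform on every step but the last, then fits the last step
predictor_pipe.fit(X, y)
y_pred = predictor_pipe.predict(X)
print(y_pred.shape)
```

The two hasattr lines are a quick way to check which role a given pipeline plays before calling it.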

FeatureUnion
To run several estimators in parallel at the same node and concatenate their outputs column-wise, we can use FeatureUnion.
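
Before the full example, a minimal sketch of what FeatureUnion does with its outputs may help. The toy array and the pairing of MinMaxScaler with StandardScaler are assumptions chosen only to keep the example short: each transformer is fitted on the same input, and the results are stacked side by side.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy input (assumed for illustration): 4 samples, 2 features
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0], [4.0, 80.0]])

# Both transformers receive the same X; their outputs are concatenated column-wise
union = FeatureUnion([
    ("minmax", MinMaxScaler()),
    ("standard", StandardScaler()),
])
combined = union.fit_transform(X)

# The same result built by hand, for comparison
expected = np.hstack([
    MinMaxScaler().fit_transform(X),
    StandardScaler().fit_transform(X),
])

print(combined.shape)  # 2 columns from each transformer
print(np.allclose(combined, expected))
```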
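Strategy:
For categorical variables: select -> impute with the most frequent value -> one-hot encode
For numeric variables: select -> impute with the mean -> min-max scale
The selection is done by the custom DataFrameSelector transformer: its transform method takes the input DataFrame X and returns the values of the columns named in attribute_names.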
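
To see the selector in isolation, here is a small sketch. The class body is the same DataFrameSelector used in the full listing; the two-row DataFrame is an assumption made up for the demonstration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Pick the named columns of a DataFrame and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        # Nothing to learn; selection depends only on the column names
        return self

    def transform(self, X, y=None):
        return X[self.attribute_names].values


# Small DataFrame assumed for illustration
df = pd.DataFrame({
    "height": [1.67, 1.89],
    "age": [26, 34],
    "love": ["apple", None],
})

numeric = DataFrameSelector(["height", "age"]).fit_transform(df)
categorical = DataFrameSelector(["love"]).fit_transform(df)

print(numeric.shape, numeric.dtype)
print(categorical.shape)
```

Because each selector returns a plain array, the imputers and encoders that follow it in a pipeline never see the DataFrame itself.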
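Next we build full_pipe, a FeatureUnion that runs two pipelines in parallel:
categorical_pipe handles the categorical variables
  DataFrameSelector selects the columns
  SimpleImputer fills None with the most frequent value
  OneHotEncoder encodes the values and returns a dense (non-sparse) matrix
numeric_pipe handles the numeric variables
  DataFrameSelector selects the columns
  SimpleImputer fills NaN with the mean
  The normalize step (MinMaxScaler) rescales the values to [0, 1]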
- import pandas as pd
- import numpy as np
- from sklearn.pipeline import Pipeline
- from sklearn.pipeline import FeatureUnion
- from sklearn.impute import SimpleImputer
- from sklearn.preprocessing import MinMaxScaler
- from sklearn.preprocessing import OneHotEncoder
- from sklearn.base import BaseEstimator, TransformerMixin
-
- # Selects the given columns of a DataFrame and returns them as a NumPy array
- class DataFrameSelector(BaseEstimator, TransformerMixin):
-     def __init__(self, attribute_names):
-         self.attribute_names = attribute_names
-     def fit(self, X, y=None):
-         return self
-     def transform(self, X, y=None):
-         return X[self.attribute_names].values
-
- # Build the sample data from a dictionary
- # (np.nan replaces np.NAN, which was removed in NumPy 2.0)
- fe = {"height": [1.67, 1.89, np.nan, 1.66, 1.88, np.nan],
-       "weight": [56, 78, 92, np.nan, 78, 92],
-       "age": [26, 34, 18, 34, 25, 27],
-       "love": ["apple", "origine", "piss", "loss", "good", None]
-       }
- X = pd.DataFrame(fe)
- categorical_feature = ["love"]
- numeric_feature = ["height", "age", "weight"]
-
- categorical_pipe = Pipeline([
-     ("select", DataFrameSelector(categorical_feature)),
-     ("impute", SimpleImputer(missing_values=None, strategy="most_frequent")),
-     # sparse_output requires scikit-learn >= 1.2; older versions use sparse=False
-     ("one_hot_encode", OneHotEncoder(sparse_output=False))
- ])
-
- numeric_pipe = Pipeline([
-     ("select", DataFrameSelector(numeric_feature)),
-     ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
-     ("normalize", MinMaxScaler())
- ])
-
- # Run both pipelines on X and concatenate their outputs column-wise
- full_pipe = FeatureUnion(transformer_list=[
-     ("numeric_pipe", numeric_pipe),
-     ("categorical_pipe", categorical_pipe)
- ])
- x_pro = full_pipe.fit_transform(X)
- print(x_pro)

Output:
- F:\开发工具\pythonProject\tools\venv\Scripts\python.exe F:/开发工具/pythonProject/tools/python的sklear学习/sklearn08.py
- [[0.04347826 0.5 0. 1. 0. 0.
- 0. 0. ]
- [1. 1. 0.61111111 0. 0. 0.
- 1. 0. ]
- [0.5 0. 1. 0. 0. 0.
- 0. 1. ]
- [0. 1. 0.64444444 0. 0. 1.
- 0. 0. ]
- [0.95652174 0.4375 0.61111111 0. 1. 0.
- 0. 0. ]
- [0.5 0.5625 1. 1. 0. 0.
- 0. 0. ]]
-
- Process finished with exit code 0