
sklearn: the Pipeline estimator


Pipeline

A Pipeline chains several estimators together in sequence, for example:

feature extraction -> dimensionality reduction -> fitting -> prediction

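A Pipeline as a whole always behaves like its last estimator:

  • If the last estimator is a predictor, the Pipeline is a predictor

  • If the last estimator is a transformer, the Pipeline is a transformer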
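The two cases above can be checked directly. The following is a minimal sketch, assuming a made-up toy dataset and LinearRegression as the predictor (neither appears in the original article); it only inspects which methods each pipeline exposes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy data, made up for illustration: y = 2 * x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Last step is a predictor, so the pipeline exposes predict()
predictor_pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("model", LinearRegression()),
])
predictor_pipe.fit(X, y)
predictions = predictor_pipe.predict(X)
predictor_has_predict = hasattr(predictor_pipe, "predict")
predictor_has_transform = hasattr(predictor_pipe, "transform")

# Last step is a transformer, so the pipeline exposes transform() instead
transformer_pipe = Pipeline([("scale", MinMaxScaler())])
transformer_has_transform = hasattr(transformer_pipe, "transform")
transformer_has_predict = hasattr(transformer_pipe, "predict")

print(predictor_has_predict, predictor_has_transform)
print(transformer_has_transform, transformer_has_predict)
```

Here the methods available on each pipeline follow the final step: the first pipeline can predict but not transform, and the second can transform but not predict.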

 

Testing a Pipeline as a transformer:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import MinMaxScaler

    # Two features stored as rows; np.nan marks the missing values
    # (np.NAN was removed in NumPy 2.0, so np.nan is used here)
    a = np.array([[1, 2, 3, 4, 5, 6, np.nan, 5],
                  [3, 4, 5, 6, np.nan, 3, np.nan, 9]])
    X = np.transpose(a)  # transpose to shape (8 samples, 2 features)
    print(X)

    # The SimpleImputer step is named "impute", the MinMaxScaler step "normalize"
    pipp = Pipeline([
        ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("normalize", MinMaxScaler()),
    ])
    # Both steps are transformers, so pipp is a transformer too
    X_pro = pipp.fit_transform(X)
    print(X_pro)

    # Run the two steps by hand for comparison
    aa = SimpleImputer(missing_values=np.nan, strategy="mean").fit_transform(X)
    mms = MinMaxScaler().fit_transform(aa)
    print(mms)  # same result as X_pro above

Output:

    F:\开发工具\pythonProject\tools\venv\Scripts\python.exe F:/开发工具/pythonProject/tools/python的sklear学习/sklearn07.py
    [[ 1.  3.]
     [ 2.  4.]
     [ 3.  5.]
     [ 4.  6.]
     [ 5. nan]
     [ 6.  3.]
     [nan nan]
     [ 5.  9.]]
    [[0.         0.        ]
     [0.2        0.16666667]
     [0.4        0.33333333]
     [0.6        0.5       ]
     [0.8        0.33333333]
     [1.         0.        ]
     [0.54285714 0.33333333]
     [0.8        1.        ]]
    Process finished with exit code 0
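Because each step has a name, the fitted steps can be inspected afterwards to see what they learned. The sketch below assumes a smaller, made-up array rather than the data above; `named_steps`, `statistics_`, `data_min_` and `data_max_` are standard scikit-learn attributes.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Small made-up array with one missing value per column
X = np.array([[1.0, 3.0],
              [2.0, np.nan],
              [np.nan, 9.0],
              [6.0, 6.0]])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("normalize", MinMaxScaler()),
])
pipe.fit(X)

# Column means learned by the imputer (NaN entries are ignored)
column_means = pipe.named_steps["impute"].statistics_
# Per-column minimum and maximum seen by the scaler after imputation
column_min = pipe.named_steps["normalize"].data_min_
column_max = pipe.named_steps["normalize"].data_max_

print(column_means, column_min, column_max)
```

The scaler only ever sees the imputed data, which is why a filled-in value lands strictly between 0 and 1 after scaling unless the mean coincides with a column extreme.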

 

FeatureUnion

To run several estimators side by side at the same stage and concatenate their outputs, use FeatureUnion.
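As a minimal sketch of the idea, assuming a made-up numeric array and two scalers chosen purely for illustration, a FeatureUnion fits each transformer on the same input and stacks their outputs column-wise:

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up data: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

union = FeatureUnion([
    ("minmax", MinMaxScaler()),
    ("standard", StandardScaler()),
])
X_union = union.fit_transform(X)

# 2 columns from MinMaxScaler followed by 2 columns from StandardScaler
print(X_union.shape)
```

The column order of the result follows the order of the transformers in the list.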

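Strategy:

  • Categorical variables: select -> fill with the most frequent value -> one-hot encode

  • Numeric variables: select -> fill with the mean -> min-max scale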

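The "select" step is a custom transformer, DataFrameSelector. The important part is its transform method, which picks columns out of the input DataFrame X by attribute name and returns their values.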
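The selection step can be tried in isolation. The sketch below uses a hypothetical two-row DataFrame made up for illustration; the selector simply indexes the DataFrame by column name and returns a NumPy array.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select DataFrame columns by name and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # stateless: nothing is learned from the data

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical toy DataFrame
df = pd.DataFrame({
    "height": [1.67, 1.89],
    "age": [26, 34],
    "love": ["apple", "good"],
})

selected = DataFrameSelector(["height", "age"]).fit_transform(df)
print(selected.shape)
```

Inheriting from TransformerMixin supplies fit_transform, so only fit and transform need to be written.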

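Next, build a FeatureUnion named full_pipe that runs two pipelines in parallel.

categorical_pipe handles the categorical variables:

  • DataFrameSelector selects the columns

  • SimpleImputer fills None with the most frequent value

  • OneHotEncoder encodes the result and returns a dense (non-sparse) matrix

numeric_pipe handles the numeric variables:

  • DataFrameSelector selects the columns

  • SimpleImputer fills NaN with the column mean

  • MinMaxScaler (the step named normalize) rescales each column to the range [0, 1]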

    import numpy as np
    import pandas as pd
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import FeatureUnion, Pipeline
    from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


    class DataFrameSelector(BaseEstimator, TransformerMixin):
        """Select DataFrame columns by name and return them as a NumPy array."""

        def __init__(self, attribute_names):
            self.attribute_names = attribute_names

        def fit(self, X, y=None):
            return self  # nothing to learn

        def transform(self, X, y=None):
            return X[self.attribute_names].values


    # Build the sample data from a dictionary
    # (np.NAN was removed in NumPy 2.0, so np.nan is used here)
    fe = {
        "height": [1.67, 1.89, np.nan, 1.66, 1.88, np.nan],
        "weight": [56, 78, 92, np.nan, 78, 92],
        "age": [26, 34, 18, 34, 25, 27],
        "love": ["apple", "origine", "piss", "loss", "good", None],
    }
    X = pd.DataFrame(fe)

    categorical_feature = ["love"]
    numeric_feature = ["height", "age", "weight"]

    categorical_pipe = Pipeline([
        ("select", DataFrameSelector(categorical_feature)),
        ("impute", SimpleImputer(missing_values=None, strategy="most_frequent")),
        # sparse_output replaces the old sparse argument from scikit-learn 1.2 on;
        # on older versions use OneHotEncoder(sparse=False) instead
        ("one_hot_encode", OneHotEncoder(sparse_output=False)),
    ])

    numeric_pipe = Pipeline([
        ("select", DataFrameSelector(numeric_feature)),
        ("impute", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("normalize", MinMaxScaler()),
    ])

    # Run both pipelines on X and concatenate their outputs column-wise
    full_pipe = FeatureUnion(transformer_list=[
        ("numeric_pipe", numeric_pipe),
        ("categorical_pipe", categorical_pipe),
    ])

    x_pro = full_pipe.fit_transform(X)
    print(x_pro)

Output:

    F:\开发工具\pythonProject\tools\venv\Scripts\python.exe F:/开发工具/pythonProject/tools/python的sklear学习/sklearn08.py
    [[0.04347826 0.5        0.         1.         0.         0.
      0.         0.        ]
     [1.         1.         0.61111111 0.         0.         0.
      1.         0.        ]
     [0.5        0.         1.         0.         0.         0.
      0.         1.        ]
     [0.         1.         0.64444444 0.         0.         1.
      0.         0.        ]
     [0.95652174 0.4375     0.61111111 0.         1.         0.
      0.         0.        ]
     [0.5        0.5625     1.         1.         0.         0.
      0.         0.        ]]
    Process finished with exit code 0

 
