sklearn.preprocessing.OrdinalEncoder(*, categories='auto', dtype=<class 'numpy.float64'>, handle_unknown='error', unknown_value=None, encoded_missing_value=nan)
Encode categorical features as an integer array.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are converted to ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.
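categories: 'auto' or a list of array-like, default='auto'
Allowed value | Meaning |
---|---|
'auto' | Determine the categories automatically from the training data |
list | categories[i] holds the categories expected in the i-th column |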
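The difference between the two settings of categories can be sketched as follows. The clothing-size data is made up for illustration. With 'auto' the categories are sorted, so string codes follow alphabetical order; an explicit list fixes the order yourself.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: a single column of clothing sizes.
X = [["medium"], ["small"], ["large"], ["small"]]

# categories='auto': categories are inferred from X and sorted,
# so the codes are large=0, medium=1, small=2.
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(X)

# Explicit list: categories[0] gives the expected categories of column 0
# in the order that should map to 0, 1, 2.
ordered_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordered_codes = ordered_encoder.fit_transform(X)

print(auto_codes.ravel())     # [1. 2. 0. 2.]
print(ordered_codes.ravel())  # [1. 0. 2. 0.]
```

Passing an explicit list is useful when the categories have a natural order that alphabetical sorting would not preserve.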
dtype: number type, default=np.float64
Desired dtype of the output.
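handle_unknown: {'error', 'use_encoded_value'}, default='error'
When set to 'error', an error is raised if an unknown category is encountered during transform. When set to 'use_encoded_value', unknown categories are encoded with the value given by unknown_value.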
unknown_value: int or np.nan, default=None
Required when handle_unknown is set to 'use_encoded_value'.
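A minimal sketch of how handle_unknown and unknown_value interact, using made-up animal labels. The choice of -1 as the unknown code is an assumption for illustration; any integer that does not collide with an existing category code should work.

```python
from sklearn.preprocessing import OrdinalEncoder

X_train = [["cat"], ["dog"]]
X_new = [["dog"], ["bird"]]  # 'bird' was not seen during fit

# Default behaviour (handle_unknown='error'): transform raises a ValueError.
strict = OrdinalEncoder()
strict.fit(X_train)
try:
    strict.transform(X_new)
    raised = False
except ValueError:
    raised = True

# handle_unknown='use_encoded_value': unknown categories get unknown_value.
tolerant = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
tolerant.fit(X_train)
codes = tolerant.transform(X_new)

# inverse_transform maps the unknown code back to None.
restored = tolerant.inverse_transform(codes)

print(raised)    # True
print(codes)     # 'dog' -> 1, 'bird' -> -1
print(restored)  # 'dog' and None
```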
encoded_missing_value: int or np.nan, default=np.nan
Encoded value of missing categories. If set to np.nan, the dtype parameter must be a float dtype.
categories_: list of arrays
The categories of each feature determined during fit (in order of the features in X and corresponding with the output of transform). This does not include categories that weren’t seen during fit.
n_features_in_: int
Number of features seen during fit.
feature_names_in_: ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
fit(X, y=None)
Fit the OrdinalEncoder to X.
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
get_feature_names_out(input_features=None)
Get output feature names for transformation.
get_params(deep=True)
Get parameters for this estimator.
inverse_transform(X)
Convert the data back to the original representation.
set_params(**params)
Set the parameters of this estimator.
transform(X)
Transform X to ordinal codes.
from sklearn.preprocessing import OrdinalEncoder

encoder = OrdinalEncoder()
x = [['Male', 1], ['Female', 3], ['Female', 2]]
# Fit the encoder and encode each column as ordinal codes
x_transform = encoder.fit_transform(x)
x_transform
# array([[1., 0.], [0., 2.], [0., 1.]])
# Map the codes back to the original categories
encoder.inverse_transform(x_transform)
# array([['Male', 1], ['Female', 3], ['Female', 2]], dtype=object)