SimpleImputer
Univariate imputer that fills in missing values using simple strategies.
It replaces the missing values in each column with a descriptive statistic of that column (mean, median, or most frequent value) or with a constant.
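As a minimal sketch of how the four strategies differ, the snippet below applies each one to the same small matrix. The matrix `X_demo`, the `results` dictionary, and the constant `0.0` are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: each column contains exactly one missing value.
X_demo = np.array(
    [
        [1.0, 2.0],
        [np.nan, 2.0],
        [7.0, np.nan],
        [1.0, 5.0],
    ]
)

results = {}
for strategy in ("mean", "median", "most_frequent", "constant"):
    # fill_value is only passed for the "constant" strategy.
    kwargs = {"fill_value": 0.0} if strategy == "constant" else {}
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy, **kwargs)
    # fit_transform learns the per-column statistic and fills the gaps.
    results[strategy] = imputer.fit_transform(X_demo)
    print(strategy)
    print(results[strategy])
```

Each statistic is computed per column over the observed (non-missing) entries only, so the choice of strategy changes only the values written into the gaps.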
[1]:
import numpy as np
X_train = [
    [1, 2],
    [np.nan, 3],
    [7, 6],
]
X_test = [
    [np.nan, 2],
    [6, np.nan],
    [7, 6],
    [4, np.nan],
]
[2]:
from sklearn.impute import SimpleImputer
simpleImputer = SimpleImputer(
    # -------------------------------------------------------------------------
    # The placeholder for the missing values.
    missing_values=np.nan,
    # -------------------------------------------------------------------------
    # The imputation strategy.
    # - If "mean", replace missing values using the mean along each column.
    # - If "median", replace missing values using the median along each
    #   column.
    # - If "most_frequent", replace missing values using the most frequent
    #   value along each column.
    # - If "constant", replace missing values with fill_value.
    strategy="mean",
    # -------------------------------------------------------------------------
    # When strategy == "constant", fill_value is used to replace all
    # occurrences of missing_values.
    fill_value=None,
)
simpleImputer.fit(X_train)
simpleImputer.transform(X_test)
[2]:
array([[4.        , 2.        ],
       [6.        , 3.66666667],
       [7.        , 6.        ],
       [4.        , 3.66666667]])
[3]:
simpleImputer.statistics_
[3]:
array([4.        , 3.66666667])
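The values in `statistics_` are the column means of the training data, computed while ignoring the missing entries, and `transform` writes those training means into the gaps of the test data. The sketch below reproduces both steps by hand with NumPy as a sanity check. The names `manual_statistics` and `manual_imputed` are illustrative, and the arrays repeat the data from the cells above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1, 2], [np.nan, 3], [7, 6]], dtype=float)
X_test = np.array([[np.nan, 2], [6, np.nan], [7, 6], [4, np.nan]], dtype=float)

# Column means of the training data, skipping NaN entries.
manual_statistics = np.nanmean(X_train, axis=0)

# Replace each NaN in the test data with the training mean of its column.
manual_imputed = np.where(np.isnan(X_test), manual_statistics, X_test)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean").fit(X_train)

# Both comparisons should agree with the manual computation.
statistics_match = np.allclose(imputer.statistics_, manual_statistics)
transform_match = np.allclose(imputer.transform(X_test), manual_imputed)
print(statistics_match, transform_match)
```

Note that the test data never influences the fitted statistics: the missing value in the first column of `X_test` is filled with 4, the mean of the observed training values 1 and 7.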