SimpleImputer

  • Univariate imputer that fills in missing values using simple strategies.

  • Replaces the missing values in each column with a descriptive statistic of that column (mean, median, or most frequent value) or with a constant.
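As a quick illustration of the four strategies listed above, the following sketch fits one imputer per strategy on a small array and collects the value each one would use per column. The names `X_demo` and `stats` are made up for this example, and `fill_value=0` is an arbitrary choice for the constant case.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical demo data: one missing value in the first column.
X_demo = np.array(
    [
        [1, 2],
        [np.nan, 3],
        [7, 6],
    ],
    dtype=float,
)

stats = {}
for strategy in ("mean", "median", "most_frequent", "constant"):
    # fill_value only matters for the "constant" strategy; 0 is an
    # arbitrary value chosen for this sketch.
    kwargs = {"fill_value": 0} if strategy == "constant" else {}
    imputer = SimpleImputer(strategy=strategy, **kwargs)
    imputer.fit(X_demo)
    # statistics_ holds the per-column replacement value learned in fit().
    stats[strategy] = imputer.statistics_
    print(strategy, imputer.statistics_)
```

Every statistic is computed column by column and ignores the missing entries, which is why the imputer is called univariate: no column uses information from another.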

[1]:
import numpy as np

X_train = [
    [1, 2],
    [np.nan, 3],
    [7, 6],
]

X_test = [
    [np.nan, 2],
    [6, np.nan],
    [7, 6],
    [4, np.nan],
]
[2]:
from sklearn.impute import SimpleImputer

simpleImputer = SimpleImputer(
    # -------------------------------------------------------------------------
    # The placeholder for the missing values.
    missing_values=np.nan,
    # -------------------------------------------------------------------------
    # The imputation strategy.
    # - If "mean", then replace missing values using the mean along each
    #   column.
    # - If "median", then replace missing values using the median along each
    #   column.
    # - If "most_frequent", then replace missing values using the most
    #   frequent value along each column.
    # - If "constant", then replace missing values with fill_value.
    strategy="mean",
    # -------------------------------------------------------------------------
    # When strategy == "constant", fill_value is used to replace all
    # occurrences of missing_values.
    fill_value=None,
)

simpleImputer.fit(X_train)

simpleImputer.transform(X_test)
[2]:
array([[4.        , 2.        ],
       [6.        , 3.66666667],
       [7.        , 6.        ],
       [4.        , 3.66666667]])
[3]:
simpleImputer.statistics_
[3]:
array([4.        , 3.66666667])
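The values in `statistics_` are the column means of the training data, computed while ignoring the missing entries, and `transform` reuses those training means on the test data. The sketch below reproduces both steps by hand with NumPy as a sanity check; `expected_stats` and `X_manual` are names introduced only for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1, 2], [np.nan, 3], [7, 6]], dtype=float)
X_test = np.array([[np.nan, 2], [6, np.nan], [7, 6], [4, np.nan]], dtype=float)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer.fit(X_train)

# Column means of the training data, skipping NaN entries.
expected_stats = np.nanmean(X_train, axis=0)

# Manual imputation: wherever X_test is NaN, use the training mean of
# that column (expected_stats broadcasts across the rows).
X_manual = np.where(np.isnan(X_test), expected_stats, X_test)

print(np.allclose(imputer.statistics_, expected_stats))
print(np.allclose(imputer.transform(X_test), X_manual))
```

Note that the test rows never influence the replacement values: the missing entry in the first column of `X_test` is filled with the mean learned from `X_train`, not with a mean of the test column.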