
  • Multivariate imputer that estimates each feature from all the others.

  • At each step, one column is designated as the output y and the remaining columns as the inputs X. A regressor is then fitted on (X, y).

  • The regressor obtained for each feature is used to impute that feature's missing values.

  • The process is repeated iteratively for max_iter imputation rounds.
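The steps above can be sketched with NumPy alone. This is a minimal illustration, not scikit-learn's implementation: the helper name `iterative_impute` is hypothetical, ordinary least squares stands in for the regressor, and the sketch assumes each regressor is fitted only on the rows where the target column was originally observed.

```python
import numpy as np


def iterative_impute(X, max_iter=10):
    """Round-robin imputation with ordinary least squares (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial fill: replace each NaN with the mean of its column.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(max_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column

            # Design matrix: every other column plus an intercept term.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([others, np.ones(len(others))])

            # Assumption of this sketch: fit only where column j was observed.
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)

            # Overwrite the missing entries of column j with the predictions.
            filled[rows, j] = A[rows] @ coef

    return filled


X_train = np.array([[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]])
completed = iterative_impute(X_train, max_iter=100)
print(completed)
```

On this small data set the complete rows lie on the line x1 = 2 * x0, so the two imputed entries should approach 1.5 and 14.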

import numpy as np

X_train = np.array(
    [
        [1, 2],
        [3, 6],
        [4, 8],
        [np.nan, 3],
        [7, np.nan],
    ]
)

X_test = np.array(
    [
        [np.nan, 2],
        [6, np.nan],
        [np.nan, 6],
    ]
)

from sklearn.linear_model import LinearRegression

#     X_train
# ----------------
# [
#     [1, 2],       mean of column 0: (1 + 3 + 4 + 7) / 4 = 3.75
#     [3, 6],
#     [4, 8],       mean of column 1: (2 + 6 + 8 + 3) / 4 = 4.75
#     [np.nan, 3],
#     [7, np.nan],
# ]

# Initial fill: each missing value is replaced by the mean of its column
X_train = np.array(
    [
        [1, 2],
        [3, 6],
        [4, 8],
        [3.75, 3],
        [7, 4.75],
    ]
)

for i in range(100):
    m = LinearRegression()

    # impute column 1 from column 0
    m.fit(
        X_train[:, 0].reshape(-1, 1),
        X_train[:, 1],
    )

    X_train[4, 1] = m.predict(
        X_train[4, 0].reshape(-1, 1),
    )[0]

    # impute column 0 from column 1
    m.fit(
        X_train[:, 1].reshape(-1, 1),
        X_train[:, 0],
    )

    X_train[3, 0] = m.predict(
        X_train[3, 1].reshape(-1, 1),
    )[0]

array([[ 1.        ,  2.        ],
       [ 3.        ,  6.        ],
       [ 4.        ,  8.        ],
       [ 1.5       ,  3.        ],
       [ 7.        , 13.99999999]])
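One way to sanity-check the imputed values in this output: the three complete rows lie exactly on a straight line, so a line fitted to them alone should reproduce roughly the same numbers. The sketch below uses `np.polyfit` for that check; the variable names are illustrative only.

```python
import numpy as np

# The three rows of X_train that have no missing values.
complete = np.array([[1.0, 2.0], [3.0, 6.0], [4.0, 8.0]])

# Fit column 1 as a linear function of column 0 (degree-1 polynomial).
slope, intercept = np.polyfit(complete[:, 0], complete[:, 1], deg=1)

# Row [7, nan]: predict column 1 from column 0.
imputed_col1 = slope * 7 + intercept

# Row [nan, 3]: invert the same line to recover column 0.
imputed_col0 = (3 - intercept) / slope

print(slope, intercept, imputed_col0, imputed_col1)
```

The fitted slope is 2 with a zero intercept (up to floating-point error), which gives 1.5 and 14 for the two missing entries, in line with the result of the iterative loop.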
# X_test = np.array(
#     [
#         [np.nan, 2],
#         [6, np.nan],
#         [np.nan, 6],
#     ]
# )

m = LinearRegression()
# predict column 1 from column 0
m.fit(X_train[:, 0].reshape(-1, 1), X_train[:, 1])
X_test[1, 1] = m.predict([[6]])[0]
# predict column 0 from column 1
m.fit(X_train[:, 1].reshape(-1, 1), X_train[:, 0])
X_test[0, 0] = m.predict([[2]])[0]
X_test[2, 0] = m.predict([[6]])[0]

array([[ 1.        ,  2.        ],
       [ 6.        , 11.99999999],
       [ 3.        ,  6.        ]])
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

iterativeImputer = IterativeImputer(
    # -------------------------------------------------------------------------
    # The estimator to use at each step of the round-robin imputation.
    estimator=LinearRegression(),  # library default is BayesianRidge()
    # -------------------------------------------------------------------------
    # The placeholder for the missing values.
    missing_values=np.nan,  # library default
    # -------------------------------------------------------------------------
    # Whether to sample from the (Gaussian) predictive posterior of the fitted
    # estimator for each imputation.
    sample_posterior=False,  # library default
    # -------------------------------------------------------------------------
    # Maximum number of imputation rounds to perform before returning the
    # imputations computed during the final round.  A round is a single
    # imputation of each feature with missing values.
    max_iter=10,  # library default
    # -------------------------------------------------------------------------
    # Number of other features to use to estimate the missing values of each
    # feature column. Nearness between features is measured using the absolute
    # correlation coefficient between each feature pair (after initial
    # imputation).
    n_nearest_features=None,  # library default: use all features
    # -------------------------------------------------------------------------
    # Which strategy to use to initialize the missing values.
    # - "mean"
    # - "median"
    # - "most_frequent"
    # - "constant"
    initial_strategy="mean",  # library default
    # -------------------------------------------------------------------------
    # The order in which the features will be imputed. Possible values:
    # - "ascending": From features with fewest missing values to most.
    # - "descending": From features with most missing values to fewest.
    # - "roman": Left to right.
    # - "arabic": Right to left.
    # - "random": A random order for each round.
    imputation_order="ascending",  # library default
    # -------------------------------------------------------------------------
    # If True then features with missing values during transform which did not
    # have any missing values during fit will be imputed with the initial
    # imputation method only.
    skip_complete=False,  # library default
    # -------------------------------------------------------------------------
    # Minimum possible imputed value.
    min_value=-np.inf,  # library default
    # -------------------------------------------------------------------------
    # Maximum possible imputed value.
    max_value=np.inf,  # library default
    # -------------------------------------------------------------------------
    # The seed of the pseudo random number generator to use.
    random_state=None,  # library default
)

iterativeImputer.fit(X_train)
iterativeImputer.transform(X_test)

array([[ 1.        ,  2.        ],
       [ 6.        , 11.99999999],
       [ 3.        ,  6.        ]])
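The imputation order options described in the parameter comments can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions: the example matrix is made up, only columns that contain missing values are ordered, and it does not reproduce scikit-learn's internal code.

```python
import numpy as np

# Hypothetical data: columns 0, 2 and 3 contain missing values.
X = np.array(
    [
        [np.nan, 1, np.nan, 4],
        [np.nan, 2, np.nan, 5],
        [3, 3, np.nan, np.nan],
        [4, 4, 7, 6],
    ]
)

# Number of missing entries per column.
n_missing = np.isnan(X).sum(axis=0)

# Indices of the columns that need imputation, in left-to-right order.
with_missing = np.flatnonzero(n_missing)

# "ascending": fewest missing values first; "descending": most first.
ascending = with_missing[np.argsort(n_missing[with_missing], kind="stable")]
descending = ascending[::-1]

# "roman": left to right; "arabic": right to left.
roman = with_missing
arabic = with_missing[::-1]

print(ascending, descending, roman, arabic)
```

Here column 3 has one missing value, column 0 has two and column 2 has three, so the ascending order is 3, 0, 2 while the left-to-right order is 0, 2, 3.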