TheilSenRegressor
Fits a linear model to a data sample by taking the median of the slopes between pairs of points.
It is robust to outliers (in simple linear regression its breakdown point is about 29.3%), but its robustness decreases as the number of dimensions grows.
In sklearn, the spatial median is used as a generalization of the median to multiple dimensions.
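To make the "median of pairwise slopes" idea concrete, here is a minimal sketch of the classic one-dimensional estimator using only the standard library. The function name `theil_sen_1d` and the choice of the median residual as the intercept are illustrative assumptions. This is not how sklearn computes its multivariate fit.

```python
from itertools import combinations
from statistics import median


def theil_sen_1d(x, y):
    """Classic univariate Theil-Sen: median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs that would give a vertical line
    ]
    slope = median(slopes)
    # Assumption: one common choice for the intercept is the median residual.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Points on the line y = 2x + 1, with one gross outlier at the end.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[-1] = 100.0

slope, intercept = theil_sen_1d(x, y)
print(slope, intercept)
```

Only 9 of the 45 pairwise slopes involve the outlier, so the median slope stays at 2 and the median residual stays at 1.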
[1]:
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=200, n_features=2, noise=4.0, random_state=0)
[2]:
from sklearn.linear_model import TheilSenRegressor
estimator = TheilSenRegressor(
    # -------------------------------------------------------------------------
    # Whether or not to fit the intercept.
    fit_intercept=True,
    # -------------------------------------------------------------------------
    # Instead of computing with a set of cardinality ‘n choose k’, where n is
    # the number of samples and k is the number of subsamples (at least number
    # of features), consider only a stochastic subpopulation of a given maximal
    # size if ‘n choose k’ is larger than max_subpopulation.
    max_subpopulation=1e4,
    # -------------------------------------------------------------------------
    # Number of samples to calculate the parameters. This is at least the
    # number of features (plus 1 if fit_intercept=True) and the number of
    # samples as a maximum.
    n_subsamples=None,
    # -------------------------------------------------------------------------
    # Maximum number of iterations for the calculation of spatial median.
    max_iter=300,
    # -------------------------------------------------------------------------
    # Tolerance when calculating spatial median.
    tol=1e-3,
    # -------------------------------------------------------------------------
    # A random number generator instance to define the state of the random
    # permutations generator. Pass an int for reproducible output across
    # multiple function calls.
    random_state=None,
)
[3]:
estimator.fit(X, y)
[3]:
TheilSenRegressor()
[4]:
estimator.coef_
[4]:
array([20.33155521, 34.11974129])
[5]:
estimator.intercept_
[5]:
-0.5003121773453673
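The robustness claim can be checked with a small experiment. The sketch below reuses the same `make_regression` settings, adds `coef=True` to recover the true coefficients, and compares against ordinary least squares on contaminated targets. The contamination scheme (shifting 20 targets by 500) is an arbitrary assumption chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, TheilSenRegressor

X, y, true_coef = make_regression(
    n_samples=200, n_features=2, noise=4.0, coef=True, random_state=0
)

# Hypothetical contamination: shift the targets of the 20 samples with the
# largest first feature, which strongly biases a least-squares fit.
y_outliers = y.copy()
worst = np.argsort(X[:, 0])[-20:]
y_outliers[worst] += 500.0

ols = LinearRegression().fit(X, y_outliers)
theil_sen = TheilSenRegressor(random_state=0).fit(X, y_outliers)

# Distance between the estimated and the true coefficient vectors.
ols_error = np.linalg.norm(ols.coef_ - true_coef)
theil_sen_error = np.linalg.norm(theil_sen.coef_ - true_coef)
print(f"OLS coefficient error:       {ols_error:.2f}")
print(f"Theil-Sen coefficient error: {theil_sen_error:.2f}")
```

With 10% of the targets corrupted, the Theil-Sen coefficients should stay much closer to the true ones than the least-squares coefficients do.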