Dimensionality reduction using SelectFromModel()#
Last modified: 2023-03-11 | YouTube
Linear models#
Linear models penalized with an L1 norm tend to drive many of the feature coefficients to exactly zero, so they can be used for dimensionality reduction (feature selection). The following model types are recommended:
Lasso()
LogisticRegression()
LinearSVC()
[1]:
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
X.shape
[1]:
(150, 4)
[2]:
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC
#
# Create and fit the base estimator
#
linearSVC = LinearSVC(
C=0.01,
penalty="l1",
dual=False,
max_iter=10000,
)
linearSVC.fit(X, y)
#
# Selector
#
model = SelectFromModel(
# -------------------------------------------------------------------------
# The base estimator from which the transformer is built. This can be both
# a fitted (if prefit is set to True) or a non-fitted estimator.
estimator=linearSVC,
# -------------------------------------------------------------------------
# The threshold value to use for feature selection. Features whose
# importance is greater or equal are kept while the others are discarded.
# * float.
# * "median": the threshold value is the median of feature importances.
# * "mean": the threshold value is the mean of feature importances.
# * "1.25*mean": a string with a scaling factor applied to the mean.
# * None: if the estimator has penalty == "l1", the threshold is 1e-5;
#   otherwise, "mean" is used.
threshold=None,
# -------------------------------------------------------------------------
# Whether a prefit model is expected to be passed into the constructor
# directly or not.
prefit=True,
# -------------------------------------------------------------------------
# Order of the norm used to filter the vectors of coefficients below
# threshold in the case where the coef_ attribute of the estimator is of
# dimension 2.
norm_order=1,
# -------------------------------------------------------------------------
# The maximum number of features to select.
max_features=None,
)
X_new = model.transform(X)
X_new.shape
[2]:
(150, 3)
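The same selection can be done with any of the other recommended L1-penalized models. As a sketch, here is the equivalent workflow with LogisticRegression (the value C=0.05 is illustrative, not taken from the original; the "liblinear" solver is used because it supports the L1 penalty):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# L1-penalized logistic regression; C controls the strength of the penalty
# (smaller C -> stronger penalty -> more coefficients driven to zero)
logreg = LogisticRegression(C=0.05, penalty="l1", solver="liblinear")
logreg.fit(X, y)

# prefit=True because the estimator was fitted before being passed in
selector = SelectFromModel(estimator=logreg, prefit=True)
X_new = selector.transform(X)
print(X_new.shape)
```

The number of retained features depends on C: decreasing it shrinks more coefficients to zero and removes more columns.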
Using trees#
[3]:
from sklearn.ensemble import ExtraTreesClassifier
treeClassifier = ExtraTreesClassifier(n_estimators=50)
treeClassifier = treeClassifier.fit(X, y)
treeClassifier.feature_importances_
[3]:
array([0.07210089, 0.04735931, 0.42085408, 0.45968572])
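For tree-based estimators with threshold=None, SelectFromModel keeps the features whose importance is greater than or equal to the mean importance. This can be verified by hand (a sketch; random_state=0 is added here only to make the run reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

X, y = load_iris(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

# Default rule for estimators without an L1 penalty: threshold = mean importance
mask = clf.feature_importances_ >= clf.feature_importances_.mean()
print(mask.sum())  # number of features kept
```

For the iris dataset the two petal features dominate the importances, so two features survive the mean threshold, matching the (150, 2) result below.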
[4]:
from sklearn.feature_selection import SelectFromModel
model = SelectFromModel(
estimator=treeClassifier,
prefit=True,
)
X_new = model.transform(X)
X_new.shape
[4]:
(150, 2)
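In practice, the selector is usually chained with a final estimator so that feature selection is refitted inside each cross-validation fold. A minimal sketch using a Pipeline (the step names and the LogisticRegression classifier are illustrative choices, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline(
    [
        # Fit the trees and keep features above the mean importance
        ("select", SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))),
        # Classify on the reduced feature set
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)
pipe.fit(X, y)
print(pipe.score(X, y))
```

Note that here prefit is left at its default (False): the pipeline fits the ExtraTreesClassifier itself during pipe.fit(X, y).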