Dimensionality reduction using SelectFromModel()#

  • Last modified: 2023-03-11 | YouTube

Linear models#

Linear models penalized with an L1 norm tend to drive many of the feature coefficients to exactly zero, so they can be used for dimensionality reduction (feature selection). The following types of models are recommended:

  • Lasso()

  • LogisticRegression()

  • LinearSVC()
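As an illustrative sketch (not part of the original notebook), Lasso() can feed SelectFromModel directly; the diabetes regression dataset is used here only as an example, since Lasso is a regressor:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# Lasso drives some coefficients exactly to zero; SelectFromModel keeps
# only the features whose absolute coefficient exceeds the threshold
# (1e-5 by default for L1-penalized estimators).
selector = SelectFromModel(estimator=Lasso(alpha=1.0))
X_new = selector.fit_transform(X, y)
X_new.shape
```

With this (fairly strong) `alpha`, only a few of the ten diabetes features survive; lowering `alpha` keeps more of them.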

[1]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X.shape
[1]:
(150, 4)
[2]:
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

#
# Create and fit an estimator
#
linearSVC = LinearSVC(
    C=0.01,
    penalty="l1",
    dual=False,
    max_iter=10000,
)

linearSVC.fit(X, y)

#
# Selector
#
model = SelectFromModel(
    # -------------------------------------------------------------------------
    # The base estimator from which the transformer is built. This can be
    # either a fitted estimator (if prefit is set to True) or an unfitted one.
    estimator=linearSVC,
    # -------------------------------------------------------------------------
    # The threshold value to use for feature selection. Features whose
    # importance is greater or equal are kept while the others are discarded.
    # * float.
    # * "median": the threshold value is the median of feature importances.
    # * "mean": the threshold value is the mean of feature importances.
    # * "1.25*mean": a scaling factor may be combined with "mean" or "median".
    # * None: if the penalty is L1, the threshold used is 1e-5; otherwise,
    #   "mean".
    threshold=None,
    # -------------------------------------------------------------------------
    # Whether a prefit model is expected to be passed into the constructor
    # directly or not.
    prefit=True,
    # -------------------------------------------------------------------------
    # Order of the norm used to filter the vectors of coefficients below
    # threshold in the case where the coef_ attribute of the estimator is of
    # dimension 2.
    norm_order=1,
    # -------------------------------------------------------------------------
    # The maximum number of features to select.
    max_features=None,
)

X_new = model.transform(X)
X_new.shape
[2]:
(150, 3)
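To see which column was discarded (a sketch extending the cell above, not in the original notebook), the selector's get_support() method returns a boolean mask that can be mapped back to the iris feature names:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

data = load_iris()
X, y = data.data, data.target

svc = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=10000).fit(X, y)
selector = SelectFromModel(estimator=svc, prefit=True)

# Boolean mask of retained columns, mapped back to the feature names
mask = selector.get_support()
selected = np.array(data.feature_names)[mask]
```

Three of the four names remain in `selected`, matching the (150, 3) shape above.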

Using trees#

[3]:
from sklearn.ensemble import ExtraTreesClassifier

treeClassifier = ExtraTreesClassifier(n_estimators=50)
treeClassifier = treeClassifier.fit(X, y)
treeClassifier.feature_importances_
[3]:
array([0.07210089, 0.04735931, 0.42085408, 0.45968572])
[4]:
from sklearn.feature_selection import SelectFromModel

model = SelectFromModel(
    estimator=treeClassifier,
    prefit=True,
)

X_new = model.transform(X)
X_new.shape
[4]:
(150, 2)
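When prefit is left at its default of False, SelectFromModel can also be fitted as a step inside a Pipeline, so selection and classification are trained with a single fit call. A minimal sketch (the LogisticRegression final step is an assumption for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    # prefit=False (the default): the inner forest is fitted during
    # pipeline.fit, and only the selected columns reach the classifier
    ("select", SelectFromModel(ExtraTreesClassifier(n_estimators=50))),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)
score = pipeline.score(X, y)
```

This arrangement also lets the selection threshold be tuned with GridSearchCV via the `select__threshold` parameter name.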