Recursive feature elimination (RFE)

Last modified: 2023-03-11

This method relies on an estimator that assigns weights to the features, for example the coefficients of a linear regression model. The magnitude of each weight is used as a measure of that feature's importance.

1. Start with a set that contains all the features.
2. Fit a model and rank the features by importance.
3. Remove the least important feature.
4. Repeat steps 2 and 3 until the desired number of features remains.
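The steps above can be sketched by hand as a short loop. This is only an illustration, not scikit-learn's implementation: the helper name `manual_rfe` is hypothetical, and using the squared coefficients of a `LinearRegression` as the importance score is an assumption made for this example.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression


def manual_rfe(X, y, n_features_to_select):
    """Hypothetical helper: drop one feature per iteration."""
    # Step 1: start with the indices of all the features.
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        # Step 2: fit a model on the remaining features and score them.
        # Assumption: squared coefficients serve as the importance measure.
        model = LinearRegression().fit(X[:, remaining], y)
        importances = model.coef_ ** 2
        # Step 3: remove the least important feature.
        worst = int(np.argmin(importances))
        del remaining[worst]
    # Step 4: the loop ends once the desired number of features is left.
    return remaining


X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selected = manual_rfe(X, y, n_features_to_select=5)
print(selected)  # column indices of the surviving features
```

Coefficient magnitudes are only comparable when the features share a similar scale, which holds here because `make_friedman1` draws every feature uniformly from [0, 1].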

[1]:

from sklearn.datasets import make_friedman1

X, y = make_friedman1(
    n_samples=50,
    n_features=10,
    random_state=0,
)

[2]:

from sklearn.feature_selection import RFE
from sklearn.svm import SVR

estimator = SVR(kernel="linear")

selector = RFE(
    # -------------------------------------------------------------------------
    # A supervised learning estimator with a fit method that provides
    # information about feature importance
    estimator=estimator,
    # -------------------------------------------------------------------------
    # The number of features to select. If None, half of the features are
    # selected. If integer, the parameter is the absolute number of features to
    # select. If float between 0 and 1, it is the fraction of features to
    # select.
    n_features_to_select=5,
    # -------------------------------------------------------------------------
    # If greater than or equal to 1, then step corresponds to the (integer)
    # number of features to remove at each iteration. If within (0.0, 1.0),
    # then step corresponds to the percentage (rounded down) of features to
    # remove at each iteration.
    step=1,
    # -------------------------------------------------------------------------
    # Controls verbosity of output.
    verbose=0,
)

selector = selector.fit(X, y)

X_new = selector.transform(X)
X_new.shape

[2]:

(50, 5)
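The comments in the cell above note that `n_features_to_select` and `step` also accept fractions. A minimal sketch of that variant follows, assuming a scikit-learn version recent enough to accept a float for `n_features_to_select`; the variable names `selector_frac` and `X_half` are chosen only for this example.

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

selector_frac = RFE(
    estimator=SVR(kernel="linear"),
    # Keep half of the 10 features.
    n_features_to_select=0.5,
    # Remove 20% of the original features (2 of 10) per iteration.
    step=0.2,
)

X_half = selector_frac.fit_transform(X, y)
print(X_half.shape)
print(selector_frac.n_features_)  # number of features kept
```

A larger `step` needs fewer model fits, at the cost of a coarser ranking, because several features are discarded per iteration without refitting in between.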

[3]:

#
# The mask of selected features.
#
selector.support_

[3]:

array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
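When column positions are more convenient than a boolean mask, the fitted selector can return them directly. The sketch below refits the same selector so that it runs on its own; `get_support(indices=True)` is part of the scikit-learn selector API, and `selected_idx` is just a name chosen here.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFE(SVR(kernel="linear"), n_features_to_select=5, step=1).fit(X, y)

# Integer positions of the selected columns instead of a boolean mask.
selected_idx = selector.get_support(indices=True)
print(selected_idx)

# Indexing X with those positions gives the same matrix as transform().
print(np.array_equal(X[:, selected_idx], selector.transform(X)))
```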

[4]:

#
# The feature ranking, such that ranking_[i] corresponds to the ranking
# position of the i-th feature. Selected (i.e., estimated best) features are
# assigned rank 1.
#
selector.ranking_

[4]:

array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
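With `step=1`, the ranking also encodes the order of elimination: the feature with the highest rank was discarded first, and rank 2 was the last one removed. The short sketch below recovers that order from the ranking printed above, copied in as a literal array so the example does not need to refit anything.

```python
import numpy as np

# Ranking copied from the output above.
ranking = np.array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

# Features with rank > 1 were eliminated at some iteration.
eliminated = np.flatnonzero(ranking > 1)

# Sort them from highest rank (removed first) to lowest (removed last).
elimination_order = eliminated[np.argsort(-ranking[eliminated])]
print(elimination_order)  # [5 9 6 7 8]
```

Here feature 5 was dropped in the first iteration and feature 8 in the last one before the five survivors remained.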

[5]:

#
# The fitted estimator used to select features.
#
selector.estimator_

[5]:

SVR(kernel='linear')
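Note that `estimator_` is fitted on the reduced feature set only, so its coefficients cover just the selected columns. As a small sketch of what that implies, the selector's own `predict` method reduces `X` and then delegates to this estimator; the variable names below are chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFE(SVR(kernel="linear"), n_features_to_select=5, step=1).fit(X, y)

# One coefficient per selected feature, not per original feature.
coef = selector.estimator_.coef_
print(coef.shape)

# Predicting through the selector is equivalent to transforming X first
# and then calling the inner estimator.
pred_selector = selector.predict(X)
pred_manual = selector.estimator_.predict(selector.transform(X))
print(np.allclose(pred_selector, pred_manual))
```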
