Feature selection based on an FPR test (SelectFpr)

  • Last modified: 2023-03-11 | YouTube

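  • This method keeps the features whose p-values fall below a threshold α, using an FPR (False Positive Rate) test. Each feature is tested independently, so, when the p-values are valid, roughly a fraction α of the irrelevant features can be expected to pass the test by chance.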

  • The selection rule keeps feature i when the p-value computed for it by score_func is below α:

    \text{p-value}_i < \alpha
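The rule can be checked by hand. The following is a minimal sketch, assuming scikit-learn and NumPy are installed: it applies `p-value_i < alpha` directly to the p-values returned by `chi2` and compares the result with the boolean mask that `SelectFpr.get_support()` exposes after fitting.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFpr, chi2

X, y = load_breast_cancer(return_X_y=True)
alpha = 0.01

# chi2 returns one score and one p-value per column of X
scores, pvalues = chi2(X, y)

# Selection rule applied by hand: keep feature i when p-value_i < alpha
manual_mask = pvalues < alpha

# Mask computed by SelectFpr after fitting with the same score_func and alpha
selector = SelectFpr(score_func=chi2, alpha=alpha).fit(X, y)
sk_mask = selector.get_support()

print(np.array_equal(manual_mask, sk_mask))
print(manual_mask.sum())  # number of kept features
```

Note that the comparison is strict: a feature whose p-value is exactly equal to α is discarded.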

[1]:
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X.shape
[1]:
(569, 30)
[2]:
from sklearn.feature_selection import SelectFpr, chi2

selectFpr = SelectFpr(
    # -------------------------------------------------------------------------
    # Function taking two arrays X and y, and returning a pair of arrays
    # (scores, pvalues).
    score_func=chi2,
    # -------------------------------------------------------------------------
    # The highest p-value for features to be kept.
    alpha=0.01,
)

selectFpr.fit(X, y)

X_new = selectFpr.transform(X)
X_new.shape
[2]:
(569, 16)
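The transformed array no longer carries column names. A small sketch, assuming the same dataset and parameters as above, recovers which of the 30 original features survived by indexing `feature_names` (returned by `load_breast_cancer()`) with the mask from `get_support()`:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFpr, chi2

data = load_breast_cancer()
selector = SelectFpr(score_func=chi2, alpha=0.01).fit(data.data, data.target)

# Boolean mask over the original columns; True means the feature is kept
mask = selector.get_support()
selected_names = list(data.feature_names[mask])

for name in selected_names:
    print(name)
```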
[3]:
selectFpr.scores_
[3]:
array([2.66104917e+02, 9.38975081e+01, 2.01110286e+03, 5.39916559e+04,
       1.49899264e-01, 5.40307549e+00, 1.97123536e+01, 1.05440354e+01,
       2.57379775e-01, 7.43065536e-05, 3.46752472e+01, 9.79353970e-03,
       2.50571896e+02, 8.75850471e+03, 3.26620664e-03, 6.13785332e-01,
       1.04471761e+00, 3.05231563e-01, 8.03633831e-05, 6.37136566e-03,
       4.91689157e+02, 1.74449400e+02, 3.66503542e+03, 1.12598432e+05,
       3.97365694e-01, 1.93149220e+01, 3.95169151e+01, 1.34854195e+01,
       1.29886140e+00, 2.31522407e-01])
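Each score is a chi-square statistic. The sketch below mirrors, as an assumption about the internals, how scikit-learn's `chi2` builds it: each feature column is treated as counts, the observed per-class sums are compared with the sums expected if the feature were independent of the class, and the squared deviations are summed over classes. The final line checks the result against `chi2` itself.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import chi2

X, y = load_breast_cancer(return_X_y=True)

# One-hot encode the labels: Y has shape (n_samples, n_classes)
classes = np.unique(y)
Y = (y[:, None] == classes[None, :]).astype(float)

# Observed: total of each feature within each class -> (n_classes, n_features)
observed = Y.T @ X

# Expected under independence: class proportion times the feature's grand total
class_prob = Y.mean(axis=0).reshape(-1, 1)    # (n_classes, 1)
feature_total = X.sum(axis=0).reshape(1, -1)  # (1, n_features)
expected = class_prob @ feature_total         # (n_classes, n_features)

# Chi-square statistic per feature, summed over the classes
manual_scores = ((observed - expected) ** 2 / expected).sum(axis=0)

sk_scores, _ = chi2(X, y)
print(np.allclose(manual_scores, sk_scores))
```

Because the statistic grows with the magnitude of the feature values, features measured on large scales (such as the area columns) tend to get much larger scores than small-valued ones.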
[5]:
selectFpr.pvalues_
[5]:
array([8.01397628e-060, 3.32292194e-022, 0.00000000e+000, 0.00000000e+000,
       6.98631644e-001, 2.01012999e-002, 9.00175712e-006, 1.16563638e-003,
       6.11926026e-001, 9.93122221e-001, 3.89553429e-009, 9.21168192e-001,
       1.94877489e-056, 0.00000000e+000, 9.54425121e-001, 4.33366115e-001,
       3.06726812e-001, 5.80621137e-001, 9.92847410e-001, 9.36379753e-001,
       6.11324751e-109, 7.89668299e-040, 0.00000000e+000, 0.00000000e+000,
       5.28452867e-001, 1.10836762e-005, 3.25230064e-010, 2.40424384e-004,
       2.54421307e-001, 6.30397277e-001])
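Each p-value is the upper tail probability of a chi-square distribution with (number of classes − 1) degrees of freedom, evaluated at the corresponding score. A minimal sketch, assuming SciPy is available, recomputes them with `scipy.stats.chi2.sf`. The entries printed as `0.00000000e+000` are p-values too small to represent in double precision, so they underflow to zero.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import chi2

X, y = load_breast_cancer(return_X_y=True)
scores, pvalues = chi2(X, y)

# Degrees of freedom: number of classes minus one (2 - 1 = 1 here)
dof = len(np.unique(y)) - 1

# Survival function: P(Chi2_dof >= score) for each feature
manual_pvalues = chi2_dist.sf(scores, dof)

print(np.allclose(manual_pvalues, pvalues))
```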