Feature selection with an FPR test (SelectFpr)
Last modified: 2023-03-11
This method selects the features whose p-values fall below a threshold alpha, based on a false positive rate (FPR) test. Each feature is scored individually against the target, and the p-value of that score decides whether the feature is kept.
The selection rule for feature i is:
\text{p-value}_i < \alpha
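The rule above can be checked by hand before using the selector. The following is a minimal sketch, assuming the chi-squared test as the scoring function and alpha = 0.01. The names `manual_mask`, `X_manual`, and `same_mask` are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFpr, chi2

X, y = load_breast_cancer(return_X_y=True)
alpha = 0.01  # assumed threshold for this sketch

# chi2 returns one score and one p-value per feature.
scores, pvalues = chi2(X, y)

# Apply the selection rule directly: keep feature i when p-value_i < alpha.
manual_mask = pvalues < alpha
X_manual = X[:, manual_mask]

# Compare with the mask that SelectFpr reports for the same alpha.
selector = SelectFpr(score_func=chi2, alpha=alpha).fit(X, y)
same_mask = np.array_equal(manual_mask, selector.get_support())
print(X_manual.shape, same_mask)
```

If the two masks agree, the selector is doing nothing more than thresholding the per-feature p-values.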
[1]:
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
X.shape
[1]:
(569, 30)
[2]:
from sklearn.feature_selection import SelectFpr, chi2
selectFpr = SelectFpr(
    # -------------------------------------------------------------------------
    # Function taking two arrays X and y, and returning a pair of arrays
    # (scores, pvalues).
    score_func=chi2,
    # -------------------------------------------------------------------------
    # Threshold on the p-value: features with p-values below alpha are kept.
    alpha=0.01,
)
selectFpr.fit(X, y)
X_new = selectFpr.transform(X)
X_new.shape
[2]:
(569, 16)
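The number of features kept depends on alpha. As a rough illustration, the sketch below refits the selector for a few thresholds and counts the surviving features. The alpha values and the name `kept_per_alpha` are chosen here for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFpr, chi2

X, y = load_breast_cancer(return_X_y=True)

# Map each (assumed) alpha to the number of features that pass the test.
kept_per_alpha = {}
for alpha in (0.001, 0.01, 0.05):
    selector = SelectFpr(score_func=chi2, alpha=alpha).fit(X, y)
    kept_per_alpha[alpha] = int(selector.get_support().sum())

for alpha, n_kept in kept_per_alpha.items():
    print(f"alpha={alpha}: {n_kept} features kept")
```

A larger alpha can only keep the same features or more, since the rule is a simple threshold on fixed p-values.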
[3]:
selectFpr.scores_
[3]:
array([2.66104917e+02, 9.38975081e+01, 2.01110286e+03, 5.39916559e+04,
1.49899264e-01, 5.40307549e+00, 1.97123536e+01, 1.05440354e+01,
2.57379775e-01, 7.43065536e-05, 3.46752472e+01, 9.79353970e-03,
2.50571896e+02, 8.75850471e+03, 3.26620664e-03, 6.13785332e-01,
1.04471761e+00, 3.05231563e-01, 8.03633831e-05, 6.37136566e-03,
4.91689157e+02, 1.74449400e+02, 3.66503542e+03, 1.12598432e+05,
3.97365694e-01, 1.93149220e+01, 3.95169151e+01, 1.34854195e+01,
1.29886140e+00, 2.31522407e-01])
[5]:
selectFpr.pvalues_
[5]:
array([8.01397628e-060, 3.32292194e-022, 0.00000000e+000, 0.00000000e+000,
6.98631644e-001, 2.01012999e-002, 9.00175712e-006, 1.16563638e-003,
6.11926026e-001, 9.93122221e-001, 3.89553429e-009, 9.21168192e-001,
1.94877489e-056, 0.00000000e+000, 9.54425121e-001, 4.33366115e-001,
3.06726812e-001, 5.80621137e-001, 9.92847410e-001, 9.36379753e-001,
6.11324751e-109, 7.89668299e-040, 0.00000000e+000, 0.00000000e+000,
5.28452867e-001, 1.10836762e-005, 3.25230064e-010, 2.40424384e-004,
2.54421307e-001, 6.30397277e-001])
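The raw arrays are hard to read without the feature names. One possible way to inspect the result, sketched here under the same settings (chi2, alpha = 0.01), is to pair each selected feature with its score and p-value. The `report` list is a hypothetical helper structure for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFpr, chi2

# Load the full Bunch to get access to the feature names.
data = load_breast_cancer()
X, y = data.data, data.target

selector = SelectFpr(score_func=chi2, alpha=0.01).fit(X, y)

# Integer indices of the features that passed the test.
kept_idx = selector.get_support(indices=True)

# (name, score, p-value) for each selected feature, highest score first.
report = [
    (data.feature_names[i], selector.scores_[i], selector.pvalues_[i])
    for i in kept_idx
]
report.sort(key=lambda row: row[1], reverse=True)

for name, score, pvalue in report:
    print(f"{name:<25} chi2={score:12.2f}  p={pvalue:.2e}")
```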