Building estimator pipelines with Pipeline and make_pipeline

Creating a pipeline by naming its components

[1]:
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

#
# Create an estimator pipeline, assigning
# an identifier to each step
#
estimators = [
    ("reduce_dim", PCA()),
    ("clf", SVC()),
]

pipeline = Pipeline(
    # -------------------------------------------------------------------------
    # List of (name, transform) tuples (implementing fit/transform) that are
    # chained, in the order in which they are chained, with the last object an
    # estimator
    steps=estimators,
    # -------------------------------------------------------------------------
    # If True, the time elapsed while fitting each step will be printed as it
    # is completed.
    verbose=False,
)
pipeline
[1]:
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
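The steps are chained: `fit` runs `fit_transform` on each intermediate step and passes its output to the next one, and `predict` runs `transform` on the intermediate steps before calling the final estimator. The sketch below checks this by comparing the pipeline with the same chain written by hand. The iris dataset and `n_components=2` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Illustrative data (assumption: any numeric X, y would do)
X, y = load_iris(return_X_y=True)

# Pipeline version: fit() calls PCA.fit_transform, then SVC.fit on its output
pipe = Pipeline(steps=[("reduce_dim", PCA(n_components=2)), ("clf", SVC())])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

# Manual version of the same chain
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
svc = SVC().fit(X_reduced, y)
manual_pred = svc.predict(pca.transform(X))

print(np.array_equal(pipe_pred, manual_pred))  # True
```

Both PCA and SVC are deterministic here, so the two chains produce identical predictions; the pipeline simply saves the bookkeeping.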

Creating a pipeline with make_pipeline

[2]:
#
# Create a pipeline whose components get
# default identifiers (their lowercased class names)
#
from sklearn.pipeline import make_pipeline

make_pipeline(
    # -------------------------------------------------------------------------
    # List of the scikit-learn estimators that are chained together.
    PCA(),
    SVC(),
)
[2]:
Pipeline(steps=[('pca', PCA()), ('svc', SVC())])
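With `make_pipeline`, each step is named after its lowercased class name, and repeated classes receive numeric suffixes. These generated names are the prefixes used later in parameter names such as `svc__C`. The `StandardScaler` step and the duplicated `PCA` below are illustrative additions, not part of the original example.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(StandardScaler(), PCA(), SVC())
names = [name for name, _ in pipe.steps]
print(names)  # ['standardscaler', 'pca', 'svc']

# Two steps of the same class get suffixes -1, -2, ...
dup = make_pipeline(PCA(n_components=10), PCA(n_components=2), SVC())
dup_names = [name for name, _ in dup.steps]
print(dup_names)  # ['pca-1', 'pca-2', 'svc']

# The generated names act as prefixes for nested parameters
pipe.set_params(svc__C=10)
```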

Accessing the pipeline's components

[3]:
#
# Access the (name, estimator) tuple by index
#
pipeline.steps[0]
[3]:
('reduce_dim', PCA())
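Since `pipeline.steps` is an ordinary Python list of `(name, estimator)` tuples, it can be iterated like any list, and `len()` gives the number of steps. A minimal sketch, rebuilding the same two-step pipeline so the block runs on its own:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# Each element of steps is a (name, estimator) tuple
for name, step in pipeline.steps:
    print(name, type(step).__name__)

names = [name for name, _ in pipeline.steps]
print(names)  # ['reduce_dim', 'clf']
print(len(pipeline))  # 2
```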
[4]:
#
# Access the estimator by index
#
pipeline[0]
[4]:
PCA()
[5]:
#
# Access the estimator by its name
#
pipeline["reduce_dim"]
[5]:
PCA()
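Besides integer indices and names, a pipeline also supports negative indices, attribute-style access through `named_steps`, and slicing. A slice returns a new `Pipeline` that contains the same estimator objects (they are not copied), which is handy for inspecting intermediate transformations. A short sketch:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# Negative index: the final estimator
last = pipeline[-1]

# named_steps exposes the steps by name, also as attributes
same_pca = pipeline.named_steps.reduce_dim

# A slice builds a sub-pipeline with only the selected steps
head = pipeline[:1]
print(type(head).__name__, [name for name, _ in head.steps])  # Pipeline ['reduce_dim']
```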

Updating estimator parameters

[6]:
#
# Update a parameter of an estimator in the pipeline
# using the <step name>__<parameter> syntax
#
pipeline.set_params(clf__C=10)
[6]:
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC(C=10))])
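The `<step name>__<parameter>` keys accepted by `set_params` are the same ones listed by `get_params()`. Passing a step name without the double underscore replaces the whole step; the string `"passthrough"` disables a transformer. The iris data used to fit the modified pipeline is an illustrative assumption.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipeline = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# Nested parameters appear as <step>__<param> keys
params = pipeline.get_params()
print("clf__C" in params, "reduce_dim__n_components" in params)  # True True

# Several nested parameters can be set in a single call
pipeline.set_params(clf__C=10, reduce_dim__n_components=2)
print(pipeline.named_steps["clf"].C)  # 10

# Replace the whole PCA step: "passthrough" forwards X unchanged
pipeline.set_params(reduce_dim="passthrough")
X, y = load_iris(return_X_y=True)
pipeline.fit(X, y)
print(pipeline["clf"].n_features_in_)  # 4, since no reduction was applied
```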

Searching for optimal hyperparameters with GridSearchCV

[7]:
#
# Specify a grid of parameter values for the pipeline's steps
# in order to search for their optimal combination
#
from sklearn.model_selection import GridSearchCV

param_grid = dict(
    reduce_dim__n_components=[2, 5, 10],
    clf__C=[0.1, 10, 100],
)

grid_search = GridSearchCV(
    # -------------------------------------------------------------------------
    # estimator/model
    estimator=pipeline,
    # -------------------------------------------------------------------------
    # Dictionary with parameters names (str) as keys and lists of parameter
    # settings to try as values
    param_grid=param_grid,
)
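The cell above only builds the search object; the search runs when `fit` is called. The sketch below fits the same grid on the digits dataset. `cv=3` is an assumption made to keep the run short (the default is 5-fold). With the default `refit=True`, `best_estimator_` is a whole `Pipeline` refit on all the data with the best combination.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

pipeline = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])
param_grid = dict(
    reduce_dim__n_components=[2, 5, 10],
    clf__C=[0.1, 10, 100],
)

# 3 x 3 = 9 candidate combinations, each evaluated with 3-fold CV
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=3)
grid_search.fit(X, y)

print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))

# The refit best model is itself a Pipeline
best = grid_search.best_estimator_
print(best.named_steps["reduce_dim"].n_components)
```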

Creating the models outside the estimator pipeline

[8]:
from sklearn.datasets import load_digits

X_digits, y_digits = load_digits(return_X_y=True)

pca = PCA()
clf = SVC()

pipeline = Pipeline(
    [
        ("reduce_dim", pca),
        ("clf", clf),
    ],
)

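pipeline.fit(X_digits, y_digits)

# With the default memory=None the pipeline keeps references to `pca` and
# `clf` (not copies), so the fitted attributes can be read from them directly
pca.components_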
[8]:
array([[-1.77484909e-19, -1.73094651e-02, -2.23428835e-01, ...,
        -8.94184677e-02, -3.65977111e-02, -1.14684954e-02],
       [ 3.27805401e-18, -1.01064569e-02, -4.90849204e-02, ...,
         1.76697117e-01,  1.94547053e-02, -6.69693895e-03],
       [-1.68358559e-18,  1.83420720e-02,  1.26475543e-01, ...,
         2.32084163e-01,  1.67026563e-01,  3.48043832e-02],
       ...,
       [ 0.00000000e+00, -1.29445414e-16, -2.59448629e-17, ...,
        -1.11022302e-16, -5.55111512e-17,  3.46944695e-17],
       [ 0.00000000e+00, -1.29045086e-16, -9.43433991e-19, ...,
         0.00000000e+00,  0.00000000e+00,  3.12250226e-17],
       [ 1.00000000e+00, -1.68983002e-17,  5.73338351e-18, ...,
         8.66631300e-18, -1.57615962e-17,  4.07058917e-18]])
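Because the pipeline stores the very objects passed to it, fitting the pipeline modifies `pca` and `clf` in place. When an independent, unfitted copy is needed (for example, to fit on different data without touching the original), `sklearn.base.clone` builds one with the same hyperparameters. A minimal sketch:

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X_digits, y_digits = load_digits(return_X_y=True)

pca = PCA()
clf = SVC()
pipeline = Pipeline([("reduce_dim", pca), ("clf", clf)])
pipeline.fit(X_digits, y_digits)

print(pipeline["reduce_dim"] is pca)  # True: same object
print(pca.components_.shape)  # (64, 64): one component per input feature

# clone() returns an unfitted pipeline with freshly constructed steps
pipeline_copy = clone(pipeline)
print(pipeline_copy["reduce_dim"] is pca)  # False
print(hasattr(pipeline_copy["reduce_dim"], "components_"))  # False
```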