Metric evaluation with cross_validate — 6:44

  • Last modified: 2023-02-27 | YouTube

Evaluates the specified metrics on each of the test folds generated by a cross-validation scheme.

  • Allows multiple metrics to be evaluated at once.

  • Returns a dictionary containing the fit and scoring times, along with the test results.

[1]:
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
import numpy as np


diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

estimator = linear_model.Lasso()
[2]:
cv_results = cross_validate(
    # -------------------------------------------------------------------------
    # The object to use to fit the data. Must implement fit()
    estimator=estimator,
    # -------------------------------------------------------------------------
    # The data to fit. Can be for example a list, or an array.
    X=X,
    # -------------------------------------------------------------------------
    # The target variable to try to predict in the case of supervised learning.
    y=y,
    # -------------------------------------------------------------------------
    # Group labels for the samples used while splitting the dataset into
    # train/test set. Only used in conjunction with a “Group” cv instance
    # (e.g., GroupKFold).
    groups=None,
    # -------------------------------------------------------------------------
    # Strategy to evaluate the performance of the cross-validated model on the
    # test set.
    scoring=None,
    # -------------------------------------------------------------------------
    # Determines the cross-validation splitting strategy.
    cv=3,
    # -------------------------------------------------------------------------
    # The verbosity level.
    verbose=0,
    # -------------------------------------------------------------------------
    # Parameters to pass to the fit method of the estimator.
    fit_params=None,
    # -------------------------------------------------------------------------
    # Whether to include train scores.
    return_train_score=False,
    # -------------------------------------------------------------------------
    # Whether to return the estimators fitted on each split.
    return_estimator=False,
    # -------------------------------------------------------------------------
    # Value to assign to the score if an error occurs in estimator fitting.
    error_score=np.nan,
)

sorted(cv_results.keys())
[2]:
['fit_time', 'score_time', 'test_score']
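The `groups` parameter above is only meaningful with a "Group" cv splitter such as `GroupKFold`, which keeps all samples sharing a label in the same fold. A minimal sketch, using synthetic group labels (the `groups = np.arange(150) % 3` assignment is an assumption for illustration, not part of the diabetes dataset):

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import GroupKFold, cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

# Hypothetical group labels: cycle each sample through one of 3 groups.
groups = np.arange(150) % 3

cv_results = cross_validate(
    estimator=linear_model.Lasso(),
    X=X,
    y=y,
    # Group labels are consumed by the GroupKFold splitter, which guarantees
    # that no group appears in both the train and the test side of a split.
    groups=groups,
    cv=GroupKFold(n_splits=3),
)

cv_results["test_score"]
```

With 3 distinct group labels and `n_splits=3`, each fold holds out exactly one group, so `test_score` again contains one entry per fold.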
[3]:
cv_results["test_score"]
[3]:
array([0.3315057 , 0.08022103, 0.03531816])
[4]:
scores = cross_validate(
    estimator,
    X,
    y,
    cv=3,
    scoring=("r2", "neg_mean_squared_error"),
    return_train_score=True,
)

scores["test_neg_mean_squared_error"]
[4]:
array([-3635.52042005, -3573.35050281, -6114.77901585])
[5]:
scores["train_r2"]
[5]:
array([0.28009951, 0.3908844 , 0.22784907])
[6]:
scores = cross_validate(
    estimator,
    X,
    y,
    cv=3,
    scoring=("r2", "neg_mean_squared_error"),
    return_estimator=True,
)

scores['estimator']
[6]:
[Lasso(), Lasso(), Lasso()]
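Besides metric-name strings, `scoring` also accepts callables built with `make_scorer`. A sketch with a hypothetical RMSE scorer (the name `rmse_scorer` and the dict key `"rmse"` are illustrative choices, not fixed API names):

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

# make_scorer wraps a metric function; greater_is_better=False negates the
# returned value so that "higher is better" still holds for model selection.
rmse_scorer = make_scorer(
    lambda y_true, y_pred: np.sqrt(mean_squared_error(y_true, y_pred)),
    greater_is_better=False,
)

scores = cross_validate(
    linear_model.Lasso(),
    X,
    y,
    cv=3,
    # Passing a dict maps custom key names to scorers; the results dict
    # exposes them as test_<key> (and train_<key> if requested).
    scoring={"rmse": rmse_scorer},
)

sorted(scores.keys())
```

Because the scorer is negated, the reported `test_rmse` values are less than or equal to zero; their absolute values are the per-fold RMSE.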