Metric evaluation with cross_validate — 6:44

  • Last modified: 2023-02-27 | YouTube

Evaluates the specified metrics on each of the test folds generated by a cross-validation scheme.

  • Allows multiple metrics to be evaluated at once.

  • Returns a dictionary containing the fit and scoring times, along with the test results.

[1]:
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
import numpy as np


diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

estimator = linear_model.Lasso()
[2]:
cv_results = cross_validate(
    # -------------------------------------------------------------------------
    # The object to use to fit the data. Must implement fit()
    estimator=estimator,
    # -------------------------------------------------------------------------
    # The data to fit. Can be for example a list, or an array.
    X=X,
    # -------------------------------------------------------------------------
    # The target variable to try to predict in the case of supervised learning.
    y=y,
    # -------------------------------------------------------------------------
    # Group labels for the samples used while splitting the dataset into
    # train/test set. Only used in conjunction with a “Group” cv instance
    # (e.g., GroupKFold).
    groups=None,
    # -------------------------------------------------------------------------
    # Strategy to evaluate the performance of the cross-validated model on the
    # test set.
    scoring=None,
    # -------------------------------------------------------------------------
    # Determines the cross-validation splitting strategy.
    cv=3,
    # -------------------------------------------------------------------------
    # The verbosity level.
    verbose=0,
    # -------------------------------------------------------------------------
    # Parameters to pass to the fit method of the estimator.
    fit_params=None,
    # -------------------------------------------------------------------------
    # Whether to include train scores.
    return_train_score=False,
    # -------------------------------------------------------------------------
    # Whether to return the estimators fitted on each split.
    return_estimator=False,
    # -------------------------------------------------------------------------
    # Value to assign to the score if an error occurs in estimator fitting.
    error_score=np.nan,
)

sorted(cv_results.keys())
[2]:
['fit_time', 'score_time', 'test_score']
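The `groups` parameter above is only meaningful with a "Group" cv splitter such as `GroupKFold`, which keeps all samples sharing a label in the same fold. A minimal sketch, using synthetic group labels (the `groups = np.arange(150) % 3` assignment is an assumption for illustration, not part of the diabetes dataset):

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import GroupKFold, cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

# Hypothetical group labels: cycle each sample through one of 3 groups.
groups = np.arange(150) % 3

cv_results = cross_validate(
    estimator=linear_model.Lasso(),
    X=X,
    y=y,
    # Group labels are consumed by the GroupKFold splitter, which guarantees
    # that no group appears in both the train and the test side of a split.
    groups=groups,
    cv=GroupKFold(n_splits=3),
)

cv_results["test_score"]
```

With 3 distinct group labels and `n_splits=3`, each fold holds out exactly one group, so `test_score` again contains one entry per fold.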
[3]:
cv_results["test_score"]
[3]:
array([0.3315057 , 0.08022103, 0.03531816])
[4]:
scores = cross_validate(
    estimator,
    X,
    y,
    cv=3,
    scoring=("r2", "neg_mean_squared_error"),
    return_train_score=True,
)

scores["test_neg_mean_squared_error"]
[4]:
array([-3635.52042005, -3573.35050281, -6114.77901585])
[5]:
scores["train_r2"]
[5]:
array([0.28009951, 0.3908844 , 0.22784907])
[6]:
scores = cross_validate(
    estimator,
    X,
    y,
    cv=3,
    scoring=("r2", "neg_mean_squared_error"),
    return_estimator=True,
)

scores['estimator']
[6]:
[Lasso(), Lasso(), Lasso()]
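Besides metric-name strings, `scoring` also accepts callables built with `make_scorer`. A sketch with a hypothetical RMSE scorer (the name `rmse_scorer` and the dict key `"rmse"` are illustrative choices, not fixed API names):

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

# make_scorer wraps a metric function; greater_is_better=False negates the
# returned value so that "higher is better" still holds for model selection.
rmse_scorer = make_scorer(
    lambda y_true, y_pred: np.sqrt(mean_squared_error(y_true, y_pred)),
    greater_is_better=False,
)

scores = cross_validate(
    linear_model.Lasso(),
    X,
    y,
    cv=3,
    # Passing a dict maps custom key names to scorers; the results dict
    # exposes them as test_<key> (and train_<key> if requested).
    scoring={"rmse": rmse_scorer},
)

sorted(scores.keys())
```

Because the scorer is negated, the reported `test_rmse` values are less than or equal to zero; their absolute values are the per-fold RMSE.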