R2 Score or coefficient of determination

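It represents the proportion of the variance of the output variable that is explained by the model's independent variables.

It is an indicator of goodness of fit: 1.0 is a perfect fit, 0.0 matches a model that always predicts the mean of y, and the score can be negative for models that do worse than that.

It is computed as: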

R^2(y, \hat{y})= 1 - \frac {\sum (y_i - \hat{y}_i)^2} {\sum (y_i - \bar{y})^2}

where:
- \bar{y} = \frac{1}{n} \sum y_i
- \sum (y_i - \hat{y}_i)^2 = \sum e_i^2
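
The formula above can be checked by hand with a few lines of plain Python. This is only a minimal sketch for a single output variable; the helper name `r2_manual` is hypothetical and is not part of scikit-learn.

```python
def r2_manual(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot for a single output (illustrative helper)."""
    y_mean = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors e_i = y_i - y_hat_i
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean of y_true
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


score = r2_manual([3.0, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(round(score, 4))  # 0.9486
```

The sketch assumes that `y_true` is not constant; otherwise the denominator is zero and the division fails.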

[1]:

from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2, 7]
y_pred = [2.5, +0.0, 2, 8]

#
#  y_true  y_pred   e^2  (y_true-y_mean)^2
# -------------------------------------------
#     3.0     2.5  0.25             0.0156
#    -0.5     0.0  0.25            11.3906
#     2.0     2.0  0.00             0.7656
#     7.0     8.0  1.00            17.0156
#                 ------         -----------
#                  1.50            29.1875
#
# y_mean = 0.25 * (3.0 - 0.5 + 2.0 + 7.0) = 2.875
#
# r2 = 1.0 - 1.5 / 29.1875 = 0.9486
#

r2_score(
    # -------------------------------------------------------------------------
    # Ground truth (correct) target values.
    y_true=y_true,
    # -------------------------------------------------------------------------
    # Estimated target values.
    y_pred=y_pred,
    # -------------------------------------------------------------------------
    # Sample weights.
    sample_weight=None,
    # -------------------------------------------------------------------------
    # Defines aggregating of multiple output scores.
    # * 'raw_values': Returns a full set of scores in case of multioutput input.
    # * 'uniform_average': Scores of all outputs are averaged with uniform
    #      weight.
    # * 'variance_weighted': Scores of all outputs are averaged, weighted by
    #      the variances of each individual output.
    multioutput="uniform_average",
    # -------------------------------------------------------------------------
    # Flag indicating if NaN and -Inf scores resulting from constant data
    # should be replaced with real numbers (1.0 if prediction is perfect, 0.0
    # otherwise).
    force_finite=True,
)

[1]:

0.9486081370449679

[2]:

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

#
# y_true_0 = [0.5, -1, 7]
# y_pred_0 = [0, -1, 8]
# r2_score(y_true_0, y_pred_0) = 0.9654
#
# y_true_1 = [1, 1, -6]
# y_pred_1 = [2, 2, -5]
# r2_score(y_true_1, y_pred_1) = 0.9081
#
r2_score(
    y_true,
    y_pred,
    multioutput="raw_values",
)

[2]:

array([0.96543779, 0.90816327])
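
The per-output scores can be reproduced by splitting the data column by column and scoring each column separately. The following is a plain-Python sketch of that idea, not scikit-learn's implementation; `r2_single` is a hypothetical helper.

```python
def r2_single(y_true, y_pred):
    """R^2 for one output column (illustrative helper)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

# zip(*rows) transposes the row-wise samples into one tuple per output column
raw_scores = [r2_single(t, p) for t, p in zip(zip(*y_true), zip(*y_pred))]
print([round(s, 3) for s in raw_scores])  # [0.965, 0.908]
```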

[3]:

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

#
# var(y_true_0) = 12.0556
# var(y_true_1) = 10.8889
#
# r2 = 0.9654 * 12.0556 / (12.0556 + 10.8889) +
#      0.9081 * 10.8889 / (12.0556 + 10.8889)
#    = 0.9383
#
r2_score(
    y_true,
    y_pred,
    multioutput="variance_weighted",
)

[3]:

0.9382566585956417
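
Weighting each output's score by its variance is algebraically the same as pooling the sums of squares over all outputs: 1 - (total residual sum of squares) / (total sum of squares). The sketch below checks this equivalence in plain Python; all variable names are illustrative.

```python
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

ss_res, ss_tot = [], []
for t_col, p_col in zip(zip(*y_true), zip(*y_pred)):
    mean = sum(t_col) / len(t_col)
    ss_res.append(sum((t - p) ** 2 for t, p in zip(t_col, p_col)))
    ss_tot.append(sum((t - mean) ** 2 for t in t_col))

# Form 1: per-output R^2, averaged with weights proportional to each variance
# (ss_tot is n times the variance, so the factor n cancels out)
raw = [1.0 - r / t for r, t in zip(ss_res, ss_tot)]
weighted = sum(s * t for s, t in zip(raw, ss_tot)) / sum(ss_tot)

# Form 2: pooled sums of squares across all outputs
pooled = 1.0 - sum(ss_res) / sum(ss_tot)

print(round(weighted, 4), round(pooled, 4))  # 0.9383 0.9383
```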

[4]:

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

#
# r2 = 0.5 * (0.9654 + 0.9081)
#    = 0.9368
#
r2_score(
    y_true,
    y_pred,
    multioutput="uniform_average",
)

[4]:

0.9368005266622779

[5]:

#
# r2 = 0.9654 * 0.3 + 0.9081 * 0.7
#    = 0.9253
#
r2_score(
    y_true,
    y_pred,
    multioutput=[0.3, 0.7],
)

[5]:

0.9253456221198156
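
Passing a list of weights combines the per-output scores into a weighted average. The sketch below assumes a standard normalized weighted average (the same convention as `numpy.average`) and uses the rounded per-output scores shown for `multioutput="raw_values"`; `weighted_average` is a hypothetical helper.

```python
def weighted_average(scores, weights):
    """Weighted mean normalized by the sum of the weights (illustrative helper)."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Per-output scores, copied from the multioutput="raw_values" result
raw_scores = [0.96543779, 0.90816327]

print(round(weighted_average(raw_scores, [0.3, 0.7]), 4))  # 0.9253
print(round(weighted_average(raw_scores, [0.5, 0.5]), 4))  # 0.9368
```

With equal weights the result matches the `"uniform_average"` score.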

[6]:

r2_score(
    y_true,
    y_pred,
    force_finite=False,
)

[6]:

0.9368005266622779

[7]:

y_true = [-2, -2, -2]
y_pred = [-2, -2, -2 + 1e-8]

r2_score(
    y_true,
    y_pred,
)

[7]:

0.0

[8]:

r2_score(
    y_true,
    y_pred,
    force_finite=False,
)

/usr/local/lib/python3.8/dist-packages/sklearn/metrics/_regression.py:624: RuntimeWarning: divide by zero encountered in divide
  output_scores = 1 - (numerator / denominator)

[8]:

-inf
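
The two results above come from a zero denominator: when `y_true` is constant, the total sum of squares is 0. The following plain-Python sketch mimics the behaviour described for `force_finite` on a single output. It is an approximation for illustration only, not scikit-learn's code, and `r2_with_force_finite` is a hypothetical name.

```python
import math


def r2_with_force_finite(y_true, y_pred, force_finite=True):
    """Single-output R^2 with explicit handling of constant y_true (illustrative)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    if ss_tot != 0:
        return 1.0 - ss_res / ss_tot
    perfect = ss_res == 0
    if force_finite:
        # Replace the undefined score: 1.0 if the prediction is perfect, else 0.0
        return 1.0 if perfect else 0.0
    # Unreplaced values: 0/0 gives NaN, nonzero/0 gives -Inf
    return math.nan if perfect else -math.inf


constant = [-2, -2, -2]
almost = [-2, -2, -2 + 1e-8]
print(r2_with_force_finite(constant, almost))                        # 0.0
print(r2_with_force_finite(constant, almost, force_finite=False))    # -inf
print(r2_with_force_finite(constant, constant))                      # 1.0
print(r2_with_force_finite(constant, constant, force_finite=False))  # nan
```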