R2 score, or coefficient of determination

  • It represents the proportion of the variance of the output variable that is explained by the model's independent variables.

  • It is a measure of goodness of fit: the best possible value is 1.0, and the score can be negative when the model fits worse than a constant prediction of \bar{y}.

  • It is computed as:

    R^2(y, \hat{y}) = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}

    where:

    • \bar{y} = \frac{1}{n} \sum y_i

    • \sum (y_i - \hat{y}_i)^2 = \sum e_i^2
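The formula can be checked by hand. The following is a minimal sketch, assuming NumPy is available; `r2_manual` is a hypothetical helper name, not part of scikit-learn, and it covers only the single-output, unweighted case.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Single-output, unweighted R^2 computed directly from the formula.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # sum of squared residuals e_i^2
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviations from the mean
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]

# Reasonable predictions: most of the variance is explained.
r2_good = r2_manual(y_true, [2.5, 0.0, 2.0, 8.0])

# Always predicting the mean of y_true explains none of the variance.
r2_mean = r2_manual(y_true, [2.875] * 4)

# Predictions worse than the mean give a negative score.
r2_bad = r2_manual([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

print(r2_good, r2_mean, r2_bad)
```

The three calls illustrate the range of the score: close to 1 for a good fit, 0 for the mean predictor, and below 0 for predictions that are worse than the mean.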

[1]:
from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2, 7]
y_pred = [2.5, +0.0, 2, 8]

#
#  y_true  y_pred   e^2  (y_true-y_mean)^2
# -------------------------------------------
#     3.0     2.5  0.25             0.0156
#    -0.5     0.0  0.25            11.3906
#     2.0     2.0  0.00             0.7656
#     7.0     8.0  1.00            17.0156
#                 ------         -----------
#                  1.50            29.1875
#
# y_mean = 0.25 * (3.0 - 0.5 + 2.0 + 7.0) = 2.875
#
# r2 = 1.0 - 1.5 / 29.1875 = 0.9486
#

r2_score(
    # -------------------------------------------------------------------------
    # Ground truth (correct) target values.
    y_true=y_true,
    # -------------------------------------------------------------------------
    # Estimated target values.
    y_pred=y_pred,
    # -------------------------------------------------------------------------
    # Sample weights.
    sample_weight=None,
    # -------------------------------------------------------------------------
    # Defines how the scores of multiple outputs are aggregated.
    # * 'raw_values': Returns the full set of scores, one per output.
    # * 'uniform_average': Scores of all outputs are averaged with uniform
    #      weight.
    # * 'variance_weighted': Scores of all outputs are averaged, weighted by
    #      the variances of each individual output.
    # * array-like of shape (n_outputs,): Custom weights used to average the
    #      scores.
    multioutput="uniform_average",
    # -------------------------------------------------------------------------
    # Flag indicating if NaN and -Inf scores resulting from constant data
    # should be replaced with real numbers (1.0 if prediction is perfect, 0.0
    # otherwise).
    force_finite=True,
)
[1]:
0.9486081370449679
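The `sample_weight` parameter is left at `None` above. As a hedged sketch of what it does, the example below assumes the weighted score uses weighted sums of squares and a weighted mean of `y_true`, and compares that hand computation with `r2_score`. The weights are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
weights = np.array([1.0, 1.0, 1.0, 3.0])  # hypothetical: last sample counts 3x

# Weighted mean of the targets replaces the plain mean.
y_wmean = np.average(y_true, weights=weights)

# Each squared term is multiplied by its sample weight.
ss_res = np.sum(weights * (y_true - y_pred) ** 2)
ss_tot = np.sum(weights * (y_true - y_wmean) ** 2)
r2_weighted_manual = 1.0 - ss_res / ss_tot

r2_weighted = r2_score(y_true, y_pred, sample_weight=weights)

print(r2_weighted_manual, r2_weighted)
```

Giving more weight to the last sample, which has the largest error, lowers the score relative to the unweighted value.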
[2]:
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

#
# y_true_0 = [0.5, -1, 7]
# y_pred_0 = [0, -1, 8]
# r2_score(y_true_0, y_pred_0) = 0.9654
#
# y_true_1 = [1, 1, -6]
# y_pred_1 = [2, 2, -5]
# r2_score(y_true_1, y_pred_1) = 0.9081
#
r2_score(
    y_true,
    y_pred,
    multioutput="raw_values",
)
[2]:
array([0.96543779, 0.90816327])
[3]:
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

#
# var(y_true_0) = 12.0556
# var(y_true_1) = 10.8889
#
# r2 = 0.9654 * 12.0556 / (12.0556 + 10.8889) +
#      0.9081 * 10.8889 / (12.0556 + 10.8889)
#    = 0.9383
#
r2_score(
    y_true,
    y_pred,
    multioutput="variance_weighted",
)
[3]:
0.9382566585956417
[4]:
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

#
# r2 = 0.5 * (0.9654 + 0.9081)
#    = 0.9368
#
r2_score(
    y_true,
    y_pred,
    multioutput="uniform_average",
)
[4]:
0.9368005266622779
[5]:
#
# r2 = 0.9654 * 0.3 + 0.9081 * 0.7
#    = 0.9253
#
r2_score(
    y_true,
    y_pred,
    multioutput=[0.3, 0.7],
)
[5]:
0.9253456221198156
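The four aggregation modes shown so far can be reproduced from the per-output scores. This is a minimal sketch using only NumPy; the variable names are illustrative, and the population variance (`ddof=0`) is assumed for the variance weights.

```python
import numpy as np

y_true = np.array([[0.5, 1], [-1, 1], [7, -6]], dtype=float)
y_pred = np.array([[0, 2], [-1, 2], [8, -5]], dtype=float)

# Per-output sums of squares (axis=0 keeps one value per column).
ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)

# multioutput="raw_values"
r2_per_output = 1.0 - ss_res / ss_tot

# multioutput="uniform_average"
r2_uniform = r2_per_output.mean()

# multioutput="variance_weighted" (weights proportional to each output's variance)
variances = y_true.var(axis=0)
r2_variance_weighted = np.average(r2_per_output, weights=variances)

# multioutput=[0.3, 0.7]
r2_custom = np.average(r2_per_output, weights=[0.3, 0.7])

# The variance-weighted average equals R^2 on the pooled sums of squares.
r2_pooled = 1.0 - ss_res.sum() / ss_tot.sum()

print(r2_per_output, r2_uniform, r2_variance_weighted, r2_custom)
```

The last identity follows because each variance weight is proportional to that output's total sum of squares, so the weights cancel the per-output denominators.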
[6]:
#
# y_true is not constant here, so force_finite does not change the score.
#
r2_score(
    y_true,
    y_pred,
    force_finite=False,
)
[6]:
0.9368005266622779
[7]:
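y_true = [-2, -2, -2]
y_pred = [-2, -2, -2 + 1e-8]

#
# y_true is constant, so the denominator of R^2 is zero. The prediction is not
# perfect, so the default force_finite=True replaces -inf with 0.0.
#
r2_score(
    y_true,
    y_pred,
)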
[7]:
0.0
[8]:
#
# With force_finite=False the raw value 1 - (numerator / 0) = -inf is returned.
#
r2_score(
    y_true,
    y_pred,
    force_finite=False,
)
/usr/local/lib/python3.8/dist-packages/sklearn/metrics/_regression.py:624: RuntimeWarning: divide by zero encountered in divide
  output_scores = 1 - (numerator / denominator)
[8]:
-inf