Hinge loss
Computes the average distance between the model and the data using the hinge loss function.
Binary case
If y_i is the true class in a binary classification problem, encoded as y_i \in \{-1, +1\} for every sample i, and w_i is the corresponding predicted decision (the output of decision_function), then this function is defined as:
L_\text{Hinge}(y, w) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} \text{max} \{1-w_i y_i, 0 \}
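For instance, a sample with true class y_i = +1 and decision w_i = 0.2 lies inside the margin and contributes \text{max}\{1 - 0.2, 0\} = 0.8 to the sum, whereas a confidently correct decision such as w_i = 3 contributes 0.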
[1]:
from sklearn import svm
from sklearn.metrics import hinge_loss
# Train set Test set
# ------------ --------------
# X y X y_true
# [0] -1 [-2] -1
# [1] +1 [3] +1
# [0.5] +1
X = [[0], [1]]
y = [-1, 1]
est = svm.LinearSVC(random_state=0)
est.fit(X, y)
[1]:
LinearSVC(random_state=0)
[2]:
pred_decision = est.decision_function([[-2], [3], [0.5]])
pred_decision
[2]:
array([-2.18177944, 2.36355888, 0.09088972])
[3]:
# decision y_true
# max(1 - -2.18177944 * -1, 0) = max(-1.181779, 0) = 0
# max(1 - +2.36355888 * +1, 0) = max(-1.363558, 0) = 0
# max(1 - +0.09088972 * +1, 0) = max(+0.909110, 0) = 0.909110
#
# (0 + 0 + 0.909110) / 3 = 0.303036
#
hinge_loss([-1, 1, 1], pred_decision)
[3]:
0.3030367603854425
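As a sanity check, the same value can be recomputed directly from the binary formula with NumPy (a minimal sketch reusing the pred_decision array from the cell above):

import numpy as np

y_true = np.array([-1, 1, 1])
# Per-sample terms max(1 - w_i * y_i, 0), averaged over the samples
np.mean(np.maximum(1 - y_true * pred_decision, 0))  # -> 0.3030367603854425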
Multiclass case
If w_{i,y_i} is the predicted decision for the true label y_i of the i-th sample, and \hat{w}_{i,y_i}=\text{max}\{w_{i,y_j} \mid y_j \ne y_i \} is the maximum of the predicted decisions for all the other labels, then this function is defined as:
L_\text{Hinge}(y, w) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} \text{max}\{1+\hat{w}_{i,y_i} - w_{i,y_i}, 0\}
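For instance, if the decision for the true label is w_{i,y_i} = 0.3 and the highest decision among the other labels is \hat{w}_{i,y_i} = -0.1, that sample contributes \text{max}\{1 - 0.1 - 0.3, 0\} = 0.6.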
[4]:
X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
labels = [0, 1, 2, 3]
est = svm.LinearSVC()
est.fit(X, Y)
[4]:
LinearSVC()
[5]:
# y_true = [0, 2, 3]
pred_decision = est.decision_function([[-1], [2], [3]])
pred_decision
[5]:
array([[ 1.2727043 , 0.03421968, -0.68376828, -1.40169125],
[-1.45454086, -0.58122799, -0.37609109, -0.17098113],
[-2.36362258, -0.78637722, -0.27353203, 0.23925558]])
[6]:
#
# max(0, 1 - (+1.27270430) + max(+0.03421968, -0.68376828, -1.40169125))
# max(0, 1 - (-0.37609109) + max(-1.45454086, -0.58122799, -0.17098113))
# max(0, 1 - (+0.23925558) + max(-2.36362258, -0.78637722, -0.27353203))
#
# max(0, 1 - 1.27270430 + 0.03421968) = 0
# max(0, 1 + 0.37609109 - 0.17098113) = 1.20510996
# max(0, 1 - 0.23925558 - 0.27353203) = 0.48721239
#
# (0 + 1.20510996 + 0.48721239) / 3 = 0.56410745
#
y_true = [0, 2, 3]
hinge_loss(y_true, pred_decision, labels=labels)
[6]:
0.564107451353502
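The same value can be recovered by hand from the multiclass formula (a minimal NumPy sketch reusing pred_decision and y_true from the cells above):

import numpy as np

rows = np.arange(len(y_true))
w_true = pred_decision[rows, y_true]        # w_{i, y_i}, decision for the true label
masked = pred_decision.copy()
masked[rows, y_true] = -np.inf              # hide the true label's column
w_hat = masked.max(axis=1)                  # max decision over all other labels
np.mean(np.maximum(1 + w_hat - w_true, 0))  # -> 0.564107451353502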