Logistic Regression#
For a dichotomous (two-class) problem with classes C_1 and C_2, the probability that a point \mathbf{x} belongs to class C_1 depends on the model parameters \mathbf{w}, such that:
p(C_1|\mathbf{x}, \mathbf{w}) = \sigma \left( w_0 + \sum_i w_i x_i \right)
and the probability of belonging to class C_2 equals 1 - p(C_1 | \mathbf{x}, \mathbf{w}). In the equation above, \sigma(u) is the logistic function:
y_i = \sigma(u) = \frac{1}{1+\exp(-u)}
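As a quick illustration (a minimal sketch added here, with a hypothetical helper name sigmoid), the logistic function can be evaluated directly with NumPy and maps any real number into the interval (0, 1):

import numpy as np

def sigmoid(u):
    # Logistic function: maps any real u to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-u))

sigmoid(np.array([-5.0, 0.0, 5.0]))
# array([0.00669285, 0.5       , 0.99330715])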
In logistic regression, the model parameters are estimated by minimizing the cross-entropy loss function, defined as:
L = - \sum_{i=1}^{P} \left[ d_i \cdot \log(y_i) + (1 - d_i) \cdot \log(1 - y_i) \right]
The interpretation is as follows:
If d_i = 1, we want to minimize -\log(y_i), so y_i must be as close as possible to 1.0 (recall that the logistic function produces values between 0 and 1, and that the term (1 - d_i) = 0).
If d_i = 0, we want to minimize -\log(1 - y_i), so y_i must be as close as possible to 0.0.
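To make the loss concrete, the cross-entropy can be evaluated directly and compared against sklearn.metrics.log_loss (a sketch added for illustration; the values of d_true and y_prob are made up):

import numpy as np
from sklearn.metrics import log_loss

d_true = np.array([1, 1, 0, 0])          # desired outputs d_i
y_prob = np.array([0.9, 0.6, 0.2, 0.4])  # predicted probabilities y_i

# Cross-entropy written exactly as in the formula above (sum over the P points)
L = -np.sum(d_true * np.log(y_prob) + (1 - d_true) * np.log(1 - y_prob))

L, log_loss(d_true, y_prob, normalize=False)  # both values coincide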
[1]:
import matplotlib.pyplot as plt
import numpy as np


def plot_entropy():
    # Avoid evaluating log(0) at the endpoints of the interval
    x = np.linspace(0.0001, 0.9999, 100)

    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)
    plt.plot(x, -np.log(x))
    plt.xlabel("Predicted probability (p) with y=1")
    plt.ylabel(r"$-\log(p)$")

    plt.subplot(1, 2, 2)
    plt.plot(x, -np.log(1 - x))
    plt.xlabel("Predicted probability (p) with y=0")
    plt.ylabel(r"$-\log(1-p)$")

    plt.show()


plot_entropy()
For more than two classes, the loss generalizes to:
L = - \sum_c d_c \cdot \log(p_c)
where d_c indicates that the point belongs to class c and p_c is its membership probability (the classifier's output).
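A small sketch of the same computation for three classes, with d_c encoded as a one-hot vector (the numbers are illustrative only):

import numpy as np

d_c = np.array([0, 1, 0])        # the point belongs to class c = 1 (one-hot encoding)
p_c = np.array([0.2, 0.7, 0.1])  # membership probabilities returned by the classifier

-np.sum(d_c * np.log(p_c))       # reduces to -log(0.7)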
scikit-learn implements different cost functions, where y_i \in \{0, 1\} (note that the scikit-learn documentation writes y_i for the target, called d_i above):
l_2 penalty
l_1 penalty
Elastic-Net
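For reference, and using the notation d_i, y_i introduced above, the regularized objective minimized by scikit-learn has roughly the following form (this summary follows the user guide and is added here as a sketch, not part of the original text):

\min_{\mathbf{w}} \; C \sum_{i=1}^{P} \left[ -d_i \log(y_i) - (1 - d_i) \log(1 - y_i) \right] + r(\mathbf{w})

where r(\mathbf{w}) = \tfrac{1}{2} \lVert \mathbf{w} \rVert_2^2 for the l_2 penalty, r(\mathbf{w}) = \lVert \mathbf{w} \rVert_1 for the l_1 penalty, and r(\mathbf{w}) = \tfrac{1-\rho}{2} \lVert \mathbf{w} \rVert_2^2 + \rho \lVert \mathbf{w} \rVert_1 for Elastic-Net, with \rho equal to the l1_ratio parameter.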
[2]:
from sklearn.datasets import make_blobs
NPOINTS = 150
X, d = make_blobs(
n_samples=NPOINTS,
n_features=2,
centers=3,
cluster_std=0.8,
shuffle=False,
random_state=12345,
)
plt.scatter(X[:50, 0], X[:50, 1], color="red")
plt.scatter(X[50:100, 0], X[50:100, 1], color="blue")
plt.scatter(X[100:, 0], X[100:, 1], color="green")
plt.gca().set_aspect("equal", adjustable="box")
[3]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
logisticRegression = LogisticRegression(
# --------------------------------------------------------------------------
# Specify the norm of the penalty:
# * None: no penalty is added.
# * 'l2': add a L2 penalty term and it is the default choice.
# * 'l1': add a L1 penalty term.
# * 'elasticnet': both L1 and L2 penalty terms are added.
penalty=None,
# --------------------------------------------------------------------------
# Tolerance for stopping criteria.
tol=0.0001,
# --------------------------------------------------------------------------
# Inverse of regularization strength; must be a positive float. Like in
# support vector machines, smaller values specify stronger regularization.
C=1.0,
# --------------------------------------------------------------------------
# Specifies if a constant (a.k.a. bias or intercept) should be added to the
# decision function.
fit_intercept=True,
# --------------------------------------------------------------------------
# Weights associated with classes in the form {class_label: weight}. If not
# given, all classes are supposed to have weight one.
# * 'balanced' uses the values of y to automatically adjust weights
# inversely proportional to class frequencies in the input data as
# n_samples / (n_classes * np.bincount(y)).
class_weight=None,
# --------------------------------------------------------------------------
    # Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data.
random_state=None,
# --------------------------------------------------------------------------
    # Algorithm to use in the optimization problem. Default is 'lbfgs'. To
    # choose a solver, you might want to consider the following aspects:
    # * For small datasets, 'liblinear' is a good choice, whereas 'sag' and
    # 'saga' are faster for large ones.
    # * For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
    # handle multinomial loss.
    # * 'liblinear' is limited to one-versus-rest schemes.
    # * 'newton-cholesky' is a good choice for n_samples >> n_features,
    # especially with one-hot encoded categorical features with rare
    # categories. Note that it is limited to binary classification and the
    # one-versus-rest reduction for multiclass classification. Be aware that
    # the memory usage of this solver has a quadratic dependency on n_features
    # because it explicitly computes the Hessian matrix.
solver="lbfgs",
# --------------------------------------------------------------------------
# Maximum number of iterations taken for the solvers to converge.
max_iter=100,
# --------------------------------------------------------------------------
    # {'ovr', 'multinomial', 'auto'}
    # If the option chosen is 'ovr', then a binary problem is fit for each
    # label. For 'multinomial' the loss minimised is the multinomial loss fit
    # across the entire probability distribution, even when the data is binary.
multi_class="auto",
# --------------------------------------------------------------------------
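    # Controls the verbosity level of the solver output (any positive number
    # prints progress information for the liblinear and lbfgs solvers).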
verbose=0,
# --------------------------------------------------------------------------
# When set to True, reuse the solution of the previous call to fit as
# initialization, otherwise, just erase the previous solution.
warm_start=False,
# --------------------------------------------------------------------------
# The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if
# penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using
# penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'.
# For 0 < l1_ratio <1, the penalty is a combination of L1 and L2.
l1_ratio=None,
)
logisticRegression.fit(X, d)
logisticRegression.predict(X)
[3]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
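As a quick sanity check (not one of the original cells), the mean accuracy on the training data can be obtained with score:

logisticRegression.score(X, d)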
[4]:
logisticRegression.predict([[2.5, -4.0]])
[4]:
array([0])
[5]:
logisticRegression.intercept_
[5]:
array([-5.32731038, -2.58361454, 7.91092493])
[6]:
logisticRegression.coef_
[6]:
array([[ 1.47085335, -1.78916427],
[-1.03371155, -1.63099238],
[-0.4371418 , 3.42015665]])
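Each class c has an intercept w_{c,0} (one entry of intercept_) and a coefficient vector (w_{c,1}, w_{c,2}) (one row of coef_). The lines plotted in the next cell are the points where the linear score of class c equals zero:

w_{c,0} + w_{c,1} x_1 + w_{c,2} x_2 = 0 \quad\Longrightarrow\quad x_2 = -\frac{w_{c,0}}{w_{c,2}} - \frac{w_{c,1}}{w_{c,2}} \, x_1

which is exactly the expression used below to compute fronteraA, fronteraB and fronteraC.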
[7]:
import numpy as np
plt.scatter(X[:50, 0], X[:50, 1], color="red")
plt.scatter(X[50:100, 0], X[50:100, 1], color="blue")
plt.scatter(X[100:, 0], X[100:, 1], color="green")
plt.gca().set_aspect("equal", adjustable="box")
xlim = plt.gca().get_xlim()
ylim = plt.gca().get_ylim()
xplot = np.linspace(start=xlim[0], stop=xlim[1], num=20)
fronteraA = (
-logisticRegression.intercept_[0] / logisticRegression.coef_[0][1]
- logisticRegression.coef_[0][0] / logisticRegression.coef_[0][1] * xplot
)
fronteraB = (
-logisticRegression.intercept_[1] / logisticRegression.coef_[1][1]
- logisticRegression.coef_[1][0] / logisticRegression.coef_[1][1] * xplot
)
fronteraC = (
-logisticRegression.intercept_[2] / logisticRegression.coef_[2][1]
- logisticRegression.coef_[2][0] / logisticRegression.coef_[2][1] * xplot
)
plt.plot(xplot, fronteraA, color="red", linewidth=3)
plt.plot(xplot, fronteraB, color="blue", linewidth=3)
plt.plot(xplot, fronteraC, color="green", linewidth=3)
plt.xlim(xlim)
plt.ylim(ylim);
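The per-class membership probabilities are available through predict_proba; each row of the result contains one probability per class and sums to 1. A minimal check (added here as an illustration):

logisticRegression.predict_proba(X).sum(axis=1)[:5]  # every row sums to 1.0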
[8]:
logisticRegression.predict_proba(X)[0:10]
[8]:
array([[9.99999996e-01, 3.37213413e-09, 5.87763928e-10],
[9.99999987e-01, 9.83061700e-10, 1.16679900e-08],
[1.00000000e+00, 4.42184492e-10, 2.02839427e-13],
[9.99999997e-01, 2.32571602e-09, 3.57170430e-10],
[9.99999999e-01, 2.91545720e-10, 1.06746679e-09],
[9.99999793e-01, 2.06351621e-07, 9.43530066e-10],
[1.00000000e+00, 1.30864625e-10, 2.63441344e-12],
[9.99999984e-01, 1.22786241e-08, 3.48299674e-09],
[1.00000000e+00, 5.12857770e-12, 2.08349752e-14],
[9.99999986e-01, 1.26548112e-08, 8.47345532e-10]])
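Finally, the log_loss function imported earlier can be used to evaluate the cross-entropy of the fitted model on the training data (a minimal sketch, not one of the original cells):

log_loss(d, logisticRegression.predict_proba(X))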