Stochastic Gradient Descent for Regression#
Last modified: 2023-03-11 | YouTube
Definition#
In this case, the dataset is assumed to consist of the examples (x_1, y_1), \cdots, (x_n, y_n), where each x_i is an m-dimensional vector and each y_i is a scalar.
In a regression model, the dependent variable y is a linear function of the independent variables x, such that:
f(x) = b + w^T x
Where:
x is a pattern or example from the dataset.
b is the intercept of the model.
w is the vector of coefficients of the independent variables.
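To make the notation concrete, the following minimal sketch evaluates f(x) = b + w^T x for a single example; the values of w, b, and x are hypothetical and chosen only for illustration.

import numpy as np

# Hypothetical parameters of the model (m = 3 features)
w = np.array([2.0, -1.0, 0.5])   # coefficients of the independent variables
b = 1.5                          # intercept
x = np.array([1.0, 3.0, 2.0])    # one pattern from the dataset

# f(x) = b + w^T x
f_x = b + w @ x
print(f_x)  # 1.5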
Loss function#
In this model, the following regularized training error is minimized:
E(w) = \frac{1}{n} \sum_i L \left(y_i, f(x_i) \right) + \alpha R(w)
L is one of the loss functions described below:
squared_loss
:L(d, y) = (d - y)^2
huber
:L(d, y)= \begin{cases} \frac{1}{2}[d - y]^2 & \text{if } |d-y| \le \epsilon \\ \epsilon |d-y|-\frac{1}{2}\epsilon^2 & \text{otherwise} \end{cases}
epsilon_insensitive
:L(d, y)= \begin{cases} 0 & \text{if } |d-y| \le \epsilon \\ |d-y| - \epsilon & \text{otherwise} \end{cases}
squared_epsilon_insensitive
:L(d, y)= [\text{epsilon\_insensitive}(d, y)]^2
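As a quick illustration, the sketch below evaluates the four losses at a few residual values d - y; the value epsilon = 0.5 is an arbitrary choice for this example.

import numpy as np

epsilon = 0.5
residuals = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])  # d - y

squared_loss = residuals ** 2
huber = np.where(
    np.abs(residuals) <= epsilon,
    0.5 * residuals ** 2,
    epsilon * np.abs(residuals) - 0.5 * epsilon ** 2,
)
epsilon_insensitive = np.maximum(np.abs(residuals) - epsilon, 0.0)
squared_epsilon_insensitive = epsilon_insensitive ** 2

print(squared_loss, huber, epsilon_insensitive, squared_epsilon_insensitive, sep="\n")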
Regularization#
R is the regularization term, which can be specified as:
L2:
R(w) = \frac{1}{2} \sum_j w_j^2
L1:
R(w) = \sum_j |w_j|
Elastic Net:
R(w) = \frac{\rho}{2} \sum_j w_j^2 + (1 - \rho) \sum_j |w_j|
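The sketch below computes the three regularization terms for an illustrative weight vector; the vector itself and the mixing parameter rho are hypothetical values used only for this example.

import numpy as np

w = np.array([2.0, -1.0, 0.5])   # hypothetical weight vector
rho = 0.15                       # Elastic Net mixing parameter

r_l2 = 0.5 * np.sum(w ** 2)
r_l1 = np.sum(np.abs(w))
r_elastic_net = rho / 2 * np.sum(w ** 2) + (1 - rho) * np.sum(np.abs(w))

print(r_l2, r_l1, r_elastic_net)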
Training algorithm#
The parameters are estimated with the stochastic gradient descent algorithm, which updates the model parameters as it iterates over the training examples (patterns):
w \leftarrow w - \eta \left[ \alpha \frac{\partial R(w)}{\partial w} + \frac{\partial L(w^T x_i + b, y_i)}{\partial w} \right]
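A minimal sketch of a single update step is shown below, assuming the squared loss and the L2 penalty; the gradients are written out by hand for this particular combination, and the learning rate, parameters, and training example are hypothetical values.

import numpy as np

eta = 0.01      # learning rate (assumed constant for this sketch)
alpha = 0.0001  # regularization strength

# Current parameters and one training example (hypothetical values)
w = np.array([0.5, -0.2])
b = 0.0
x_i = np.array([1.0, 2.0])
y_i = 3.0

# Squared loss L(d, y) = (d - y)^2 with d = w^T x_i + b
d = w @ x_i + b
grad_L_w = 2.0 * (d - y_i) * x_i   # dL/dw
grad_L_b = 2.0 * (d - y_i)         # dL/db
grad_R_w = w                       # dR/dw for R(w) = 1/2 sum(w_j^2)

# Update rule (the intercept is not regularized)
w = w - eta * (alpha * grad_R_w + grad_L_w)
b = b - eta * grad_L_b
print(w, b)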
Numerical example#
[1]:
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
X, y = make_regression(
    n_samples=100,
    n_features=1,
    n_informative=1,
    bias=0.0,
    effective_rank=None,
    tail_strength=0.9,
    noise=10.0,
    shuffle=False,
    coef=False,
    random_state=12356,
)
sgdRegressor = SGDRegressor(
    # -----------------------------------------------------
    # 'squared_error', 'huber', 'epsilon_insensitive',
    # 'squared_epsilon_insensitive'
    loss="squared_error",
    # -----------------------------------------------------
    # 'l2' (ridge), 'l1' (lasso), 'elasticnet', or None
    penalty="l2",
    # -----------------------------------------------------
    # Constant that multiplies the regularization term.
    alpha=0.0001,
    # -----------------------------------------------------
    # The Elastic Net mixing parameter
    l1_ratio=0.15,
    # -----------------------------------------------------
    # Whether the intercept should be estimated or not.
    fit_intercept=True,
    # -----------------------------------------------------
    # The maximum number of passes over the training data
    # (aka epochs)
    max_iter=1000,
    # -----------------------------------------------------
    # Whether or not the training data should be shuffled
    # after each epoch.
    shuffle=True,
    # -----------------------------------------------------
    # Epsilon in the epsilon-insensitive loss functions
    epsilon=0.1,
    # -----------------------------------------------------
    # Used for shuffling the data, when shuffle is set to
    # True.
    random_state=None,
    # -----------------------------------------------------
    # The learning rate schedule
    # 'constant', 'optimal', 'invscaling', 'adaptive'
    learning_rate="invscaling",
    # -----------------------------------------------------
    # The initial learning rate for the 'constant',
    # 'invscaling' or 'adaptive' schedules.
    eta0=0.01,
    # -----------------------------------------------------
    # The exponent for inverse scaling learning rate.
    power_t=0.25,
    # -----------------------------------------------------
    # Whether to use early stopping to terminate training
    # when validation score is not improving.
    early_stopping=False,
    # -----------------------------------------------------
    # The proportion of training data to set aside as
    # validation set for early stopping.
    validation_fraction=0.1,
    # -----------------------------------------------------
    # Number of iterations with no improvement to wait
    # before stopping fitting.
    n_iter_no_change=5,
    # -----------------------------------------------------
    # When set to True, reuse the solution of the previous
    # call to fit as initialization, otherwise, just erase
    # the previous solution.
    warm_start=False,
    # -----------------------------------------------------
    # When set to True, computes the averaged SGD weights
    # across all updates and stores the result in the
    # coef_ attribute.
    average=False,
)
sgdRegressor.fit(X, y)
display(
    sgdRegressor.coef_,
    sgdRegressor.intercept_,
)
array([44.17796037])
array([1.31894612])
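As a quick check of the fit, the estimator's score method returns the coefficient of determination R² on the given data; a minimal sketch:

# R^2 of the fitted model on the training data
print(sgdRegressor.score(X, y))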
[2]:
def plot_regression(y_pred):
    # Scatter plot of the training data and the fitted regression line
    plt.scatter(X, y, marker="o", s=130, edgecolors="white", c="tab:blue")
    plt.plot(X_pred, y_pred, "-", linewidth=4, color="tab:orange")
    plt.gca().spines["left"].set_color("gray")
    plt.gca().spines["bottom"].set_color("gray")
    plt.gca().spines["top"].set_visible(False)
    plt.gca().spines["right"].set_visible(False)
    plt.xlabel("x")
    plt.ylabel("y")
[3]:
import matplotlib.pyplot as plt
import numpy as np
X_pred = np.linspace(start=-2.5, stop=4, num=50).reshape(-1, 1)
y_pred = sgdRegressor.predict(X_pred)
plt.figure(figsize=(8, 8))
plot_regression(y_pred)
plt.show()
Online learning#
[4]:
#
# The constructor defaults are used. Each call to partial_fit
# performs a single pass (one epoch) over the data supplied.
#
sgdRegressor = SGDRegressor()
plt.figure(figsize=(16, 5))
for i in range(4):
    plt.subplot(1, 4, i + 1)
    sgdRegressor.partial_fit(X, y)
    y_pred = sgdRegressor.predict(X_pred)
    plot_regression(y_pred)
plt.show()
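To see the incremental improvement numerically rather than visually, the R² score could be tracked after each partial_fit call; a minimal sketch reusing the same X and y:

sgd = SGDRegressor()
for epoch in range(4):
    sgd.partial_fit(X, y)
    # R^2 on the training data after each additional epoch
    print(epoch + 1, sgd.score(X, y))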
[5]:
def fig1():
    import matplotlib.pyplot as plt
    import numpy as np

    plt.figure(figsize=(10, 10))
    errors = np.linspace(start=-2, stop=2, num=50, endpoint=True)

    def format_plot():
        plt.gca().spines["left"].set_color("gray")
        plt.gca().spines["bottom"].set_color("gray")
        plt.gca().spines["top"].set_visible(False)
        plt.gca().spines["right"].set_visible(False)
        ylim = plt.ylim()
        plt.ylim(-0.1, ylim[1])

    # -------------------------------------------------------------
    plt.subplot(2, 2, 1)
    squared_loss = np.power(errors, 2)
    plt.plot(errors, squared_loss, c="tab:blue", lw=3)
    plt.title("Squared loss")
    format_plot()

    # -------------------------------------------------------------
    def huber(loss, epsilon):
        return np.where(
            np.abs(loss) <= epsilon,
            0.5 * np.power(loss, 2),
            epsilon * np.abs(loss) - 0.5 * epsilon ** 2,
        )

    plt.subplot(2, 2, 2)
    plt.plot(errors, huber(errors, 0.2), c="tab:blue", lw=3, label=r"$\epsilon$=0.2")
    plt.plot(errors, huber(errors, 0.5), c="tab:orange", lw=3, label=r"$\epsilon$=0.5")
    plt.plot(errors, huber(errors, 0.8), c="tab:green", lw=3, label=r"$\epsilon$=0.8")
    plt.title("Huber")
    format_plot()
    plt.legend()

    # -------------------------------------------------------------
    def epsilon_insensitive(loss, epsilon):
        return np.where(np.abs(loss) <= epsilon, 0, np.abs(loss) - epsilon)

    plt.subplot(2, 2, 3)
    plt.plot(
        errors,
        epsilon_insensitive(errors, 0.2),
        c="tab:blue",
        lw=3,
        label=r"$\epsilon$=0.2",
    )
    plt.plot(
        errors,
        epsilon_insensitive(errors, 0.5),
        c="tab:orange",
        lw=3,
        label=r"$\epsilon$=0.5",
    )
    plt.plot(
        errors,
        epsilon_insensitive(errors, 0.8),
        c="tab:green",
        lw=3,
        label=r"$\epsilon$=0.8",
    )
    format_plot()
    plt.title("Epsilon insensitive")
    plt.legend()

    # -------------------------------------------------------------
    def squared_epsilon_insensitive(loss, epsilon):
        return np.power(epsilon_insensitive(loss, epsilon), 2)

    plt.subplot(2, 2, 4)
    plt.plot(
        errors,
        squared_epsilon_insensitive(errors, 0.2),
        c="tab:blue",
        lw=3,
        label=r"$\epsilon$=0.2",
    )
    plt.plot(
        errors,
        squared_epsilon_insensitive(errors, 0.5),
        c="tab:orange",
        lw=3,
        label=r"$\epsilon$=0.5",
    )
    plt.plot(
        errors,
        squared_epsilon_insensitive(errors, 0.8),
        c="tab:green",
        lw=3,
        label=r"$\epsilon$=0.8",
    )
    plt.title("Squared epsilon insensitive")
    format_plot()
    plt.legend()

    plt.tight_layout()
    plt.savefig("assets/loss_functions_regression.png")
    plt.close()
## fig1()