Creating a basic project in a Docker environment

  • Last modified: May 14, 2022

Directory for storing the project

[1]:
#
# Create a folder for the project. The project can also be hosted in
# a GitHub repository.
#
!rm -rf mlruns
!rm -rf /tmp/wine_prj
!mkdir /tmp/wine_prj
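
The shell commands above can also be expressed in Python, which is convenient when the setup runs outside a notebook. The following is a minimal sketch; the helper name `reset_dir` is hypothetical and not part of the original notebook.

```python
import shutil
from pathlib import Path


def reset_dir(path):
    """Delete `path` if it exists, then recreate it empty (like rm -rf + mkdir)."""
    target = Path(path)
    # ignore_errors=True mirrors `rm -rf`: no failure if the directory is missing.
    shutil.rmtree(target, ignore_errors=True)
    # parents=True also creates missing parent folders, unlike a plain `mkdir`.
    target.mkdir(parents=True)
    return target


# Hypothetical usage, equivalent to the project folder step above:
# reset_dir("/tmp/wine_prj")
```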

Python code

[2]:
%%writefile /tmp/wine_prj/train_elasticnet.py

#
# Can be run from the command line as:
# $ python3 train_elasticnet.py {alpha} {l1_ratio} {verbose}
#

def load_data():

    import pandas as pd

    url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    df = pd.read_csv(url, sep=";")

    y = df["quality"]
    x = df.copy()
    x.pop("quality")

    return x, y


def make_train_test_split(x, y):

    from sklearn.model_selection import train_test_split

    (x_train, x_test, y_train, y_test) = train_test_split(
        x,
        y,
        test_size=0.25,
        random_state=123456,
    )
    return x_train, x_test, y_train, y_test


def eval_metrics(y_true, y_pred):

    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)

    return mse, mae, r2


def report(estimator, mse, mae, r2):

    print(estimator, ":", sep="")
    print(f"  MSE: {mse}")
    print(f"  MAE: {mae}")
    print(f"  R2: {r2}")


def run():
    #
    # Trains a scikit-learn ElasticNet model
    #

    import sys

    from sklearn.linear_model import ElasticNet

    import mlflow

    x, y = load_data()
    x_train, x_test, y_train, y_test = make_train_test_split(x, y)

    alpha = float(sys.argv[1])
    l1_ratio = float(sys.argv[2])
    verbose = int(sys.argv[3])

    print('Tracking directory:', mlflow.get_tracking_uri())

    with mlflow.start_run():

        estimator = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=12345)
        estimator.fit(x_train, y_train)
        mse, mae, r2 = eval_metrics(y_test, y_pred=estimator.predict(x_test))
        if verbose > 0:
            report(estimator, mse, mae, r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)

        mlflow.log_metric("mse", mse)
        mlflow.log_metric("mae", mae)
        mlflow.log_metric("r2", r2)

        mlflow.sklearn.log_model(estimator, "model")


if __name__ == "__main__":
    run()
Writing /tmp/wine_prj/train_elasticnet.py
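
To make the three values returned by `eval_metrics` concrete, the sketch below computes them by hand on a tiny set of made-up values (illustrative numbers, not taken from the wine dataset). It assumes the standard definitions of MSE, MAE, and R2, which is what the scikit-learn functions used in the script compute with their default arguments. The name `manual_metrics` is hypothetical.

```python
def manual_metrics(y_true, y_pred):
    """Compute MSE, MAE and R2 from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination

    return mse, mae, r2


# Made-up values for illustration only.
y_true = [5.0, 6.0, 7.0, 6.0]
y_pred = [5.5, 6.0, 6.0, 6.5]

mse, mae, r2 = manual_metrics(y_true, y_pred)
print(mse, mae, r2)  # 0.375 0.5 0.25
```

An R2 of 0.25 means the predictions explain a quarter of the variance in `y_true`, which is the same order of magnitude as the scores the ElasticNet model reaches on the wine data.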

MLproject

[3]:
%%writefile /tmp/wine_prj/MLproject
name: proyecto-de-demostracion

docker_env:
    image:  jdvelasq/mlflow:example

entry_points:
    main:
        parameters:
            alpha: {type: float, default: 0.1}
            l1_ratio: {type: float, default: 0.1}
            verbose: {type: integer, default: 1}
        command: 'python3 train_elasticnet.py {alpha} {l1_ratio} {verbose}'
Writing /tmp/wine_prj/MLproject
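
The `command` entry is a template: each `{name}` placeholder is filled with the value passed through `-P`, and the declared default is used when no value is given. The sketch below imitates that substitution with plain string formatting. It is a simplified illustration, not MLflow's actual implementation, and the names `DEFAULTS`, `COMMAND`, and `render_command` are hypothetical.

```python
# Defaults and command template copied from the MLproject file above.
DEFAULTS = {"alpha": 0.1, "l1_ratio": 0.1, "verbose": 1}
COMMAND = "python3 train_elasticnet.py {alpha} {l1_ratio} {verbose}"


def render_command(overrides=None):
    """Merge user-supplied values over the defaults and fill the template."""
    params = {**DEFAULTS, **(overrides or {})}
    return COMMAND.format(**params)


print(render_command())
# python3 train_elasticnet.py 0.1 0.1 1

print(render_command({"alpha": 0.5, "l1_ratio": 0.5}))
# python3 train_elasticnet.py 0.5 0.5 1
```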

Docker

[4]:
%%writefile /tmp/wine_prj/Dockerfile

FROM condaforge/miniforge3

RUN pip install mlflow \
    && pip install pandas \
    && pip install scikit-learn \
    && pip install cloudpickle
Writing /tmp/wine_prj/Dockerfile

Build the image, replacing the username with your own:

$ docker build -t jdvelasq/mlflow:example .
$ docker push jdvelasq/mlflow:example
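
Because the image name embeds a Docker Hub username, it can help to generate both commands from a single variable. A minimal sketch follows; the function `docker_commands` and the example username `your-user` are hypothetical. If the image name changes, the `image` entry in the MLproject file would need to be updated to match.

```python
import shlex


def docker_commands(username, repo="mlflow", tag="example", context="."):
    """Return the build and push commands for `<username>/<repo>:<tag>`."""
    image = f"{username}/{repo}:{tag}"
    build = ["docker", "build", "-t", image, context]
    push = ["docker", "push", image]
    # shlex.join quotes arguments when needed, so the result is shell-safe.
    return [shlex.join(build), shlex.join(push)]


for line in docker_commands("your-user"):
    print(line)
```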

Running in the local environment with default parameters

[5]:
#
# Run with default parameters
#
!mlflow run /tmp/wine_prj
2022/06/03 22:40:28 INFO mlflow.projects.docker: === Building docker image proyecto-de-demostracion ===
2022/06/03 22:40:28 INFO mlflow.projects.utils: === Created directory /var/folders/34/8tnnc98d5bv6wy7xzfb0qwhh0000gn/T/tmphzu5896c for downloading remote URIs passed to arguments of type 'path' ===
2022/06/03 22:40:28 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns:/mlflow/tmp/mlruns -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/c43b891ced4c490cb90e8468d784ffa6/artifacts:/Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/c43b891ced4c490cb90e8468d784ffa6/artifacts -e MLFLOW_RUN_ID=c43b891ced4c490cb90e8468d784ffa6 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 proyecto-de-demostracion:latest python3 train_elasticnet.py 0.1 0.1 1' in run with ID 'c43b891ced4c490cb90e8468d784ffa6' ===
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Tracking directory: file:///mlflow/tmp/mlruns
ElasticNet(alpha=0.1, l1_ratio=0.1, random_state=12345):
  MSE: 0.489021012335199
  MAE: 0.551252749110561
  R2: 0.29836649473051535
2022/06/03 22:40:32 INFO mlflow.projects: === Run (ID 'c43b891ced4c490cb90e8468d784ffa6') succeeded ===

Running in the local environment with user-supplied parameters

[6]:
!mlflow run /tmp/wine_prj -P alpha=0.2 -P l1_ratio=0.2 -P verbose=1
2022/06/03 22:40:34 INFO mlflow.projects.docker: === Building docker image proyecto-de-demostracion ===
2022/06/03 22:40:35 INFO mlflow.projects.utils: === Created directory /var/folders/34/8tnnc98d5bv6wy7xzfb0qwhh0000gn/T/tmpbke5zwdd for downloading remote URIs passed to arguments of type 'path' ===
2022/06/03 22:40:35 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns:/mlflow/tmp/mlruns -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/215c210f935d4840b0e69b856e79d62e/artifacts:/Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/215c210f935d4840b0e69b856e79d62e/artifacts -e MLFLOW_RUN_ID=215c210f935d4840b0e69b856e79d62e -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 proyecto-de-demostracion:latest python3 train_elasticnet.py 0.2 0.2 1' in run with ID '215c210f935d4840b0e69b856e79d62e' ===
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Tracking directory: file:///mlflow/tmp/mlruns
ElasticNet(alpha=0.2, l1_ratio=0.2, random_state=12345):
  MSE: 0.5170837474931838
  MAE: 0.5701436798648394
  R2: 0.2581028767270219
2022/06/03 22:40:39 INFO mlflow.projects: === Run (ID '215c210f935d4840b0e69b856e79d62e') succeeded ===
[7]:
!mlflow run /tmp/wine_prj -P alpha=0.1 -P l1_ratio=0.1 -P verbose=1
2022/06/03 22:40:40 INFO mlflow.projects.docker: === Building docker image proyecto-de-demostracion ===
2022/06/03 22:40:41 INFO mlflow.projects.utils: === Created directory /var/folders/34/8tnnc98d5bv6wy7xzfb0qwhh0000gn/T/tmp3e_opkq0 for downloading remote URIs passed to arguments of type 'path' ===
2022/06/03 22:40:41 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns:/mlflow/tmp/mlruns -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/eaebccc62bf64f1a87778bebe9ea0d16/artifacts:/Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/eaebccc62bf64f1a87778bebe9ea0d16/artifacts -e MLFLOW_RUN_ID=eaebccc62bf64f1a87778bebe9ea0d16 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 proyecto-de-demostracion:latest python3 train_elasticnet.py 0.1 0.1 1' in run with ID 'eaebccc62bf64f1a87778bebe9ea0d16' ===
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Tracking directory: file:///mlflow/tmp/mlruns
ElasticNet(alpha=0.1, l1_ratio=0.1, random_state=12345):
  MSE: 0.489021012335199
  MAE: 0.551252749110561
  R2: 0.29836649473051535
2022/06/03 22:40:45 INFO mlflow.projects: === Run (ID 'eaebccc62bf64f1a87778bebe9ea0d16') succeeded ===
[8]:
!mlflow run /tmp/wine_prj -P alpha=0.5 -P l1_ratio=0.5 -P verbose=1
2022/06/03 22:40:46 INFO mlflow.projects.docker: === Building docker image proyecto-de-demostracion ===
2022/06/03 22:40:47 INFO mlflow.projects.utils: === Created directory /var/folders/34/8tnnc98d5bv6wy7xzfb0qwhh0000gn/T/tmpmc9470gt for downloading remote URIs passed to arguments of type 'path' ===
2022/06/03 22:40:47 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns:/mlflow/tmp/mlruns -v /Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/e5f3f7f7915d4c2694e477da57578fd8/artifacts:/Volumes/GitHub/courses-source/notebooks/mlflow/mlruns/0/e5f3f7f7915d4c2694e477da57578fd8/artifacts -e MLFLOW_RUN_ID=e5f3f7f7915d4c2694e477da57578fd8 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 proyecto-de-demostracion:latest python3 train_elasticnet.py 0.5 0.5 1' in run with ID 'e5f3f7f7915d4c2694e477da57578fd8' ===
/opt/conda/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Tracking directory: file:///mlflow/tmp/mlruns
ElasticNet(alpha=0.5, random_state=12345):
  MSE: 0.6349429447805036
  MAE: 0.6453803508338732
  R2: 0.0890018368226928
2022/06/03 22:40:51 INFO mlflow.projects: === Run (ID 'e5f3f7f7915d4c2694e477da57578fd8') succeeded ===
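
The three distinct parameter settings above can be compared side by side. The sketch below copies the metrics printed in the run logs into a list and selects the setting with the lowest MSE; the variable names are illustrative.

```python
# Metrics copied from the run logs above (one entry per distinct setting).
runs = [
    {"alpha": 0.1, "l1_ratio": 0.1, "mse": 0.489021012335199,
     "mae": 0.551252749110561, "r2": 0.29836649473051535},
    {"alpha": 0.2, "l1_ratio": 0.2, "mse": 0.5170837474931838,
     "mae": 0.5701436798648394, "r2": 0.2581028767270219},
    {"alpha": 0.5, "l1_ratio": 0.5, "mse": 0.6349429447805036,
     "mae": 0.6453803508338732, "r2": 0.0890018368226928},
]

# Lower MSE is better, so the best run is the one that minimizes it.
best = min(runs, key=lambda run: run["mse"])

for run in runs:
    marker = "*" if run is best else " "
    print(
        f"{marker} alpha={run['alpha']} l1_ratio={run['l1_ratio']} "
        f"mse={run['mse']:.4f} mae={run['mae']:.4f} r2={run['r2']:.4f}"
    )
```

Among these runs, the weakest regularization (alpha=0.1, l1_ratio=0.1) gives the lowest MSE and MAE and the highest R2. Note that the log for the last run shows `ElasticNet(alpha=0.5, random_state=12345)` without `l1_ratio`: 0.5 is the default value of `l1_ratio`, and scikit-learn prints only the parameters that differ from their defaults.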

MLflow UI

To view the interface, run:

mlflow ui

Note: when running inside Docker, use:

mlflow ui --host 0.0.0.0

and open:

http://127.0.0.1:5001

assets/mlflow-project-2-docker-part-0

Run details

assets/mlflow-project-2-docker-part-1 assets/mlflow-project-2-docker-part-2 assets/mlflow-project-2-docker-part-3