(Optional) Step 3: Metrics, parameters, and plots

  • Last modified: May 14, 2022

This notebook continues the previous example.

[1]:
%cd dvcdemo
/workspace/dvcdemo

Collecting metrics

[2]:
cmd = """
dvc run -n evaluate \
          -d src/evaluate.py -d model.pkl -d data/features \
          -M scores.json \
          --plots-no-cache prc.json \
          --plots-no-cache roc.json \
          python3 src/evaluate.py model.pkl \
                  data/features scores.json prc.json roc.json
"""
!{cmd}
Running stage 'evaluate':
> python3 src/evaluate.py model.pkl data/features scores.json prc.json roc.json
INFO:dvclive:Report path (if generated): /workspace/dvcdemo/evaluation/report.html
Arguments error. Usage:
        python evaluate.py model features
ERROR: failed to run: python3 src/evaluate.py model.pkl data/features scores.json prc.json roc.json, exited with 1
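
The stage above fails because, according to the usage message, `src/evaluate.py` accepts only a model and a features path, so `scores.json`, `prc.json`, and `roc.json` are never written and no `evaluate` stage is recorded. As a rough illustration of what such output files could contain, the sketch below computes ROC points from labels and predicted scores and writes them as JSON. It is not the real `evaluate.py`: the function names `roc_points`, `roc_auc`, and `write_outputs`, the metric key `roc_auc`, and the wrapping key `roc` are all assumptions made for this example. The record fields `fpr` and `tpr` match the axes requested later with `dvc plots modify roc.json -x fpr -y tpr`.

```python
import json
from pathlib import Path


def roc_points(labels, scores):
    """Return ROC points as a list of {"fpr": ..., "tpr": ...} records.

    labels are 0/1 ground-truth values; scores are predicted probabilities.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score and lower the decision threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [{"fpr": 0.0, "tpr": 0.0}]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append({"fpr": fp / negatives, "tpr": tp / positives})
    return points


def roc_auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for left, right in zip(points, points[1:]):
        area += (right["fpr"] - left["fpr"]) * (right["tpr"] + left["tpr"]) / 2
    return area


def write_outputs(labels, scores, out_dir="."):
    """Write a metrics file and a plot file (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    points = roc_points(labels, scores)
    # Metrics file: a flat JSON object of scalar values.
    (out / "scores.json").write_text(json.dumps({"roc_auc": roc_auc(points)}, indent=4))
    # Plot file: a list of records, one per point on the curve.
    (out / "roc.json").write_text(json.dumps({"roc": points}, indent=4))
    return points
```

Calling `write_outputs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], "sketch_out")` writes both files into a `sketch_out` directory (a hypothetical path) and yields an area of 0.75 for this small input.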

[3]:
!cat dvc.yaml
stages:
  prepare:
    cmd: python3 src/prepare.py data/data.xml
    deps:
    - data/data.xml
    - src/prepare.py
    params:
    - prepare.seed
    - prepare.split
    outs:
    - data/prepared
  featurize:
    cmd: python3 src/featurization.py data/prepared data/features
    deps:
    - data/prepared
    - src/featurization.py
    params:
    - featurize.max_features
    - featurize.ngrams
    outs:
    - data/features
  train:
    cmd: python3 src/train.py data/features model.pkl
    deps:
    - data/features
    - src/train.py
    params:
    - train.min_split
    - train.n_est
    - train.seed
    outs:
    - model.pkl
[4]:
!dvc metrics show
[5]:
!dvc plots modify prc.json -x recall -y precision
ERROR: Unable to find DVC file with output 'prc.json'

[6]:
!dvc plots modify roc.json -x fpr -y tpr
ERROR: Unable to find DVC file with output 'roc.json'

[7]:
!dvc plots show
file:///workspace/dvcdemo/dvc_plots/index.html

[8]:
!git add scores.json prc.json roc.json
fatal: pathspec 'scores.json' did not match any files
[9]:
!git commit -a -m "Create evaluation stage"
[master 0a78163] Create evaluation stage
 2 files changed, 4 insertions(+), 2 deletions(-)
[10]:
!cat dvc.yaml
stages:
  prepare:
    cmd: python3 src/prepare.py data/data.xml
    deps:
    - data/data.xml
    - src/prepare.py
    params:
    - prepare.seed
    - prepare.split
    outs:
    - data/prepared
  featurize:
    cmd: python3 src/featurization.py data/prepared data/features
    deps:
    - data/prepared
    - src/featurization.py
    params:
    - featurize.max_features
    - featurize.ngrams
    outs:
    - data/features
  train:
    cmd: python3 src/train.py data/features model.pkl
    deps:
    - data/features
    - src/train.py
    params:
    - train.min_split
    - train.n_est
    - train.seed
    outs:
    - model.pkl
[11]:
!cat params.yaml
prepare:
  split: 0.20
  seed: 20170428

featurize:
  max_features: 100
  ngrams: 1

train:
  seed: 20170428
  n_est: 50
  min_split: 0.01

Updating the parameters

[12]:
%%writefile params.yaml
prepare:
  split: 0.20
  seed: 20170428

featurize:
  max_features: 1500
  ngrams: 2

train:
  seed: 20170428
  n_est: 50
  min_split: 2
Overwriting params.yaml
[13]:
!dvc repro --quiet
The input data frame data/prepared/train.tsv size is (16011, 3)
The output matrix data/features/train.pkl size is (16011, 1502) and data type is float64
The input data frame data/prepared/test.tsv size is (3989, 3)
The output matrix data/features/test.pkl size is (3989, 1502) and data type is float64
Input matrix size (16011, 1502)
X matrix size (16011, 1500)
Y matrix size (16011,)

Comparison

[14]:
!dvc params diff
Path         Param                   HEAD    workspace
params.yaml  featurize.max_features  -       1500
params.yaml  featurize.ngrams        -       2
params.yaml  prepare.seed            -       20170428
params.yaml  prepare.split           -       0.2
params.yaml  train.min_split         -       2
params.yaml  train.n_est             -       50
params.yaml  train.seed              -       20170428
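
The table above identifies each parameter by a dotted name such as `featurize.max_features`. The sketch below shows one way to produce that kind of comparison in plain Python: flatten each nested mapping into dotted names, then report the values that differ. It is only an illustration, not DVC's implementation; `flatten` and `diff_params` are hypothetical helpers, and the two dictionaries are copied by hand from the two versions of `params.yaml` shown earlier in this notebook.

```python
def flatten(params, prefix=""):
    """Flatten nested dicts into dotted names such as 'train.n_est'."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def diff_params(old, new):
    """Return {name: (old_value, new_value)} for every parameter that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    changed = {}
    for name in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(name), new_flat.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


# Values before the update (first listing of params.yaml).
old = {
    "prepare": {"split": 0.20, "seed": 20170428},
    "featurize": {"max_features": 100, "ngrams": 1},
    "train": {"seed": 20170428, "n_est": 50, "min_split": 0.01},
}
# Values after the update (written with %%writefile).
new = {
    "prepare": {"split": 0.20, "seed": 20170428},
    "featurize": {"max_features": 1500, "ngrams": 2},
    "train": {"seed": 20170428, "n_est": 50, "min_split": 2},
}

for name, (before, after) in diff_params(old, new).items():
    print(f"{name}: {before} -> {after}")
# featurize.max_features: 100 -> 1500
# featurize.ngrams: 1 -> 2
# train.min_split: 0.01 -> 2
```

Unlike the `dvc params diff` output above, which lists every parameter because the HEAD column is empty, this sketch reports only the three values that actually changed between the two listings.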

[15]:
!dvc metrics diff
[16]:
!dvc plots diff
file:///workspace/dvcdemo/dvc_plots/index.html