(Optional) Step 3: Metrics, parameters, and plots
Last modified: May 14, 2022
This is the continuation of the previous example.
[1]:
%cd dvcdemo
/workspace/dvcdemo
Collecting metrics
[2]:
cmd = """
dvc run -n evaluate \
-d src/evaluate.py -d model.pkl -d data/features \
-M scores.json \
--plots-no-cache prc.json \
--plots-no-cache roc.json \
python3 src/evaluate.py model.pkl \
data/features scores.json prc.json roc.json
"""
!{cmd}
Running stage 'evaluate':
> python3 src/evaluate.py model.pkl data/features scores.json prc.json roc.json
INFO:dvclive:Report path (if generated): /workspace/dvcdemo/evaluation/report.html
Arguments error. Usage:
python evaluate.py model features
ERROR: failed to run: python3 src/evaluate.py model.pkl data/features scores.json prc.json roc.json, exited with 1
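The usage message in the output above indicates that this version of `src/evaluate.py` accepts only two positional arguments (model and features), while the stage command passes five. The script therefore exits with status 1, and the `evaluate` stage does not appear in the `dvc.yaml` listings that follow. The check below is a hypothetical reconstruction of that kind of argument validation, not the script's actual code; the name `has_expected_args` is an assumption made for illustration.

```python
# Hypothetical sketch of the argument check implied by the usage message
# "python evaluate.py model features"; the real script may differ.

def has_expected_args(argv):
    """Return True when argv holds the script name plus exactly two arguments."""
    return len(argv) == 3


# Arguments as passed by the stage command in the cell above.
stage_argv = [
    "src/evaluate.py", "model.pkl", "data/features",
    "scores.json", "prc.json", "roc.json",
]

# Arguments matching the printed usage line.
usage_argv = ["src/evaluate.py", "model.pkl", "data/features"]

print(has_expected_args(stage_argv))  # False: five arguments instead of two
print(has_expected_args(usage_argv))  # True
```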
[3]:
!cat dvc.yaml
stages:
  prepare:
    cmd: python3 src/prepare.py data/data.xml
    deps:
    - data/data.xml
    - src/prepare.py
    params:
    - prepare.seed
    - prepare.split
    outs:
    - data/prepared
  featurize:
    cmd: python3 src/featurization.py data/prepared data/features
    deps:
    - data/prepared
    - src/featurization.py
    params:
    - featurize.max_features
    - featurize.ngrams
    outs:
    - data/features
  train:
    cmd: python3 src/train.py data/features model.pkl
    deps:
    - data/features
    - src/train.py
    params:
    - train.min_split
    - train.n_est
    - train.seed
    outs:
    - model.pkl
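The listing above links stages through files: the `outs` of one stage appear in the `deps` of the stage that consumes them, and the execution order follows from those links. A minimal sketch of that idea using Python's standard `graphlib` module is shown below. The `stages` dictionary is copied by hand from the listing, and the code illustrates dependency ordering in general, not DVC's own implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# deps and outs copied by hand from the dvc.yaml listing above.
stages = {
    "prepare": {
        "deps": ["data/data.xml", "src/prepare.py"],
        "outs": ["data/prepared"],
    },
    "featurize": {
        "deps": ["data/prepared", "src/featurization.py"],
        "outs": ["data/features"],
    },
    "train": {
        "deps": ["data/features", "src/train.py"],
        "outs": ["model.pkl"],
    },
}

# Map each output path to the stage that produces it.
producer = {out: name for name, spec in stages.items() for out in spec["outs"]}

# A stage depends on every stage that produces one of its deps.
graph = {
    name: {producer[dep] for dep in spec["deps"] if dep in producer}
    for name, spec in stages.items()
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['prepare', 'featurize', 'train']
```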
[4]:
!dvc metrics show
[5]:
!dvc plots modify prc.json -x recall -y precision
ERROR: Unable to find DVC file with output 'prc.json'
[6]:
!dvc plots modify roc.json -x fpr -y tpr
ERROR: Unable to find DVC file with output 'roc.json'
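The two `dvc plots modify` calls above name the fields to draw on the x and y axes: `recall` and `precision` for the precision-recall curve, `fpr` and `tpr` for the ROC curve. Both report an error here because no stage in `dvc.yaml` declares these files as outputs. As a rough sketch of what selecting axes means, assume a plot file holds a list of records with those field names. The records below are made-up placeholder values, not output of this pipeline, and `select_axes` is a hypothetical helper.

```python
def select_axes(records, x, y):
    """Return the x and y series taken from a list of dict records."""
    xs = [row[x] for row in records]
    ys = [row[y] for row in records]
    return xs, ys


# Placeholder records for illustration only; not produced by this pipeline.
prc_records = [
    {"precision": 0.5, "recall": 1.0, "threshold": 0.1},
    {"precision": 0.75, "recall": 0.6, "threshold": 0.5},
    {"precision": 1.0, "recall": 0.2, "threshold": 0.9},
]

recall, precision = select_axes(prc_records, x="recall", y="precision")
print(recall)     # [1.0, 0.6, 0.2]
print(precision)  # [0.5, 0.75, 1.0]
```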
[7]:
!dvc plots show
file:///workspace/dvcdemo/dvc_plots/index.html
[8]:
!git add scores.json prc.json roc.json
fatal: pathspec 'scores.json' did not match any files
[9]:
!git commit -a -m "Create evaluation stage"
[master 0a78163] Create evaluation stage
2 files changed, 4 insertions(+), 2 deletions(-)
[10]:
!cat dvc.yaml
stages:
  prepare:
    cmd: python3 src/prepare.py data/data.xml
    deps:
    - data/data.xml
    - src/prepare.py
    params:
    - prepare.seed
    - prepare.split
    outs:
    - data/prepared
  featurize:
    cmd: python3 src/featurization.py data/prepared data/features
    deps:
    - data/prepared
    - src/featurization.py
    params:
    - featurize.max_features
    - featurize.ngrams
    outs:
    - data/features
  train:
    cmd: python3 src/train.py data/features model.pkl
    deps:
    - data/features
    - src/train.py
    params:
    - train.min_split
    - train.n_est
    - train.seed
    outs:
    - model.pkl
[11]:
!cat params.yaml
prepare:
  split: 0.20
  seed: 20170428
featurize:
  max_features: 100
  ngrams: 1
train:
  seed: 20170428
  n_est: 50
  min_split: 0.01
Updating the parameters
[12]:
%%writefile params.yaml
prepare:
  split: 0.20
  seed: 20170428
featurize:
  max_features: 1500
  ngrams: 2
train:
  seed: 20170428
  n_est: 50
  min_split: 2
Overwriting params.yaml
[13]:
!dvc repro --quiet
The input data frame data/prepared/train.tsv size is (16011, 3)
The output matrix data/features/train.pkl size is (16011, 1502) and data type is float64
The input data frame data/prepared/test.tsv size is (3989, 3)
The output matrix data/features/test.pkl size is (3989, 1502) and data type is float64
Input matrix size (16011, 1502)
X matrix size (16011, 1500)
Y matrix size (16011,)
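The sizes printed above can be cross-checked against `params.yaml`. The short calculation below uses only numbers from that output. Reading the two extra feature-matrix columns as non-feature columns (for example an identifier and a label) is an assumption, since the scripts themselves are not shown here.

```python
# Row and column counts copied from the `dvc repro` output above.
train_rows, test_rows = 16011, 3989
matrix_cols, x_cols = 1502, 1500
max_features = 1500  # featurize.max_features in params.yaml

total_rows = train_rows + test_rows
test_fraction = test_rows / total_rows

print(total_rows)               # 20000
print(round(test_fraction, 3))  # 0.199, close to prepare.split = 0.20
print(matrix_cols - x_cols)     # 2 columns that are not part of X
print(x_cols == max_features)   # True
```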
Comparison
[14]:
!dvc params diff
Path         Param                   HEAD  workspace
params.yaml  featurize.max_features  -     1500
params.yaml  featurize.ngrams        -     2
params.yaml  prepare.seed            -     20170428
params.yaml  prepare.split           -     0.2
params.yaml  train.min_split         -     2
params.yaml  train.n_est             -     50
params.yaml  train.seed              -     20170428
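In the table above every `HEAD` value is `-`, so it does not show which values changed relative to the earlier `params.yaml` listing. As a sketch, the two versions shown in this section can be compared directly. The dictionaries are copied by hand from those listings, and `flatten` and `changed_params` are hypothetical helpers, not part of DVC.

```python
def flatten(params, prefix=""):
    """Flatten nested dicts into dotted names such as 'train.n_est'."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def changed_params(old, new):
    """Return {name: (old_value, new_value)} for values that differ."""
    old_flat, new_flat = flatten(old), flatten(new)
    return {
        name: (old_flat.get(name), new_flat.get(name))
        for name in sorted(old_flat.keys() | new_flat.keys())
        if old_flat.get(name) != new_flat.get(name)
    }


# Values copied from the two params.yaml listings in this section.
before = {
    "prepare": {"split": 0.20, "seed": 20170428},
    "featurize": {"max_features": 100, "ngrams": 1},
    "train": {"seed": 20170428, "n_est": 50, "min_split": 0.01},
}
after = {
    "prepare": {"split": 0.20, "seed": 20170428},
    "featurize": {"max_features": 1500, "ngrams": 2},
    "train": {"seed": 20170428, "n_est": 50, "min_split": 2},
}

for name, (old, new) in changed_params(before, after).items():
    print(name, old, "->", new)
# featurize.max_features 100 -> 1500
# featurize.ngrams 1 -> 2
# train.min_split 0.01 -> 2
```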
[15]:
!dvc metrics diff
[16]:
!dvc plots diff
file:///workspace/dvcdemo/dvc_plots/index.html