Summarization using Roll-up

  • Last modified: May 24, 2022

Data file

[1]:
!cat /opt/druid/quickstart/tutorial/rollup-data.json
{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
{"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
{"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
{"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
{"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
{"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
{"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
{"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
{"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":12,"bytes":2818}
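Each record has a timestamp with second precision. The ingestion spec sets `queryGranularity` to `minute`, so Druid stores each timestamp truncated to the start of its minute. The Python sketch below reproduces that truncation on three lines copied from the file. `truncate_to_minute` is a hypothetical helper written for illustration and is not part of Druid. It assumes every timestamp is a UTC ISO 8601 string ending in `Z`, which holds for this file.

```python
import json
from datetime import datetime

# Three sample lines copied verbatim from rollup-data.json
RAW = [
    '{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}',
    '{"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}',
    '{"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}',
]

def truncate_to_minute(ts: str) -> str:
    """Drop the seconds of a UTC timestamp like '2018-01-01T01:01:35Z'."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    # Format it the way the __time column is displayed in the query output
    return dt.replace(second=0).strftime("%Y-%m-%dT%H:%M:%S.000Z")

records = [json.loads(line) for line in RAW]
for r in records:
    print(r["timestamp"], "->", truncate_to_minute(r["timestamp"]))
```

The first two records map to the same minute (`01:01:00`), which is what allows roll-up to combine them.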

Ingestion specification

[2]:
#
# Note `"rollup" : true` on line 26
#
!cat /opt/druid/quickstart/tutorial/rollup-index.json | nl
     1  {
     2    "type" : "index_parallel",
     3    "spec" : {
     4      "dataSchema" : {
     5        "dataSource" : "rollup-tutorial",
     6        "timestampSpec": {
     7          "column": "timestamp",
     8          "format": "iso"
     9        },
    10        "dimensionsSpec" : {
    11          "dimensions" : [
    12            "srcIP",
    13            "dstIP"
    14          ]
    15        },
    16        "metricsSpec" : [
    17          { "type" : "count", "name" : "count" },
    18          { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
    19          { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
    20        ],
    21        "granularitySpec" : {
    22          "type" : "uniform",
    23          "segmentGranularity" : "week",
    24          "queryGranularity" : "minute",
    25          "intervals" : ["2018-01-01/2018-01-03"],
    26          "rollup" : true
    27        }
    28      },
    29      "ioConfig" : {
    30        "type" : "index_parallel",
    31        "inputSource" : {
    32          "type" : "local",
    33          "baseDir" : "quickstart/tutorial",
    34          "filter" : "rollup-data.json"
    35        },
    36        "inputFormat" : {
    37          "type" : "json"
    38        },
    39        "appendToExisting" : false
    40      },
    41      "tuningConfig" : {
    42        "type" : "index_parallel",
    43        "maxRowsPerSegment" : 5000000,
    44        "maxRowsInMemory" : 25000
    45      }
    46    }
    47  }
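The spec determines the roll-up key and the aggregations. Rows that share the truncated `__time` and the same values for every dimension are merged, and each entry in `metricsSpec` says how that entry's column is combined. The Python sketch below reads an abridged copy of the `dataSchema` (with `ioConfig` and `tuningConfig` left out) and summarizes it. `rollup_plan` is a hypothetical helper for illustration only and does not come from Druid.

```python
import json

# Abridged copy of the dataSchema from rollup-index.json
SPEC = json.loads("""
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "rollup-tutorial",
      "dimensionsSpec": {"dimensions": ["srcIP", "dstIP"]},
      "metricsSpec": [
        {"type": "count", "name": "count"},
        {"type": "longSum", "name": "packets", "fieldName": "packets"},
        {"type": "longSum", "name": "bytes", "fieldName": "bytes"}
      ],
      "granularitySpec": {"queryGranularity": "minute", "rollup": true}
    }
  }
}
""")

def rollup_plan(spec: dict) -> dict:
    """List the columns that form the roll-up key and each metric's aggregator type."""
    schema = spec["spec"]["dataSchema"]
    gran = schema["granularitySpec"]
    return {
        "rollup": gran["rollup"],
        # The key is the truncated timestamp plus every declared dimension
        "key": ["__time (" + gran["queryGranularity"] + ")"]
               + schema["dimensionsSpec"]["dimensions"],
        "metrics": {m["name"]: m["type"] for m in schema["metricsSpec"]},
    }

print(rollup_plan(SPEC))
```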

Running the ingestion

[3]:
!post-index-task --file /opt/druid/quickstart/tutorial/rollup-index.json --url http://localhost:8081
Beginning indexing data for rollup-tutorial
Task started: index_parallel_rollup-tutorial_hfmlejkb_2022-05-25T04:36:37.719Z
Task log:     http://localhost:8081/druid/indexer/v1/task/index_parallel_rollup-tutorial_hfmlejkb_2022-05-25T04:36:37.719Z/log
Task status:  http://localhost:8081/druid/indexer/v1/task/index_parallel_rollup-tutorial_hfmlejkb_2022-05-25T04:36:37.719Z/status
Task index_parallel_rollup-tutorial_hfmlejkb_2022-05-25T04:36:37.719Z still running...
Task index_parallel_rollup-tutorial_hfmlejkb_2022-05-25T04:36:37.719Z still running...
Task finished with status: SUCCESS
Completed indexing data for rollup-tutorial. Now loading indexed data onto the cluster...
rollup-tutorial is 0.0% finished loading...
rollup-tutorial is 0.0% finished loading...
rollup-tutorial is 0.0% finished loading...
rollup-tutorial loading complete! You may now query your data
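The log above shows the task endpoints that `post-index-task` works with. The Python sketch below, which uses only the standard library, submits the same spec with an HTTP POST to `/druid/indexer/v1/task` and then polls the status URL printed in the log. The response shapes (`{"task": "<id>"}` on submission and a nested `status.status` field with values such as `RUNNING`, `SUCCESS`, or `FAILED`) are assumptions based on the Druid tasks API and should be checked against your Druid version. Unlike `post-index-task`, this sketch does not wait for the segments to finish loading.

```python
import json
import time
import urllib.request

# Same base URL that post-index-task received above
DRUID_URL = "http://localhost:8081"

def build_submit_request(spec_bytes: bytes, base_url: str = DRUID_URL) -> urllib.request.Request:
    """POST request that submits an ingestion spec to the task endpoint."""
    return urllib.request.Request(
        base_url + "/druid/indexer/v1/task",
        data=spec_bytes,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def status_url(task_id: str, base_url: str = DRUID_URL) -> str:
    """Status endpoint for a task, same shape as the 'Task status' line in the log."""
    return f"{base_url}/druid/indexer/v1/task/{task_id}/status"

def submit_and_wait(spec_path: str, poll_s: float = 5.0) -> str:
    """Submit the spec, then poll until the task is no longer RUNNING."""
    with open(spec_path, "rb") as f:
        req = build_submit_request(f.read())
    with urllib.request.urlopen(req) as resp:
        task_id = json.load(resp)["task"]  # assumed response shape: {"task": "<id>"}
    while True:
        with urllib.request.urlopen(status_url(task_id)) as resp:
            state = json.load(resp)["status"]["status"]  # assumed nested field
        if state != "RUNNING":
            return state
        time.sleep(poll_s)

# Usage, only with the quickstart cluster running:
# submit_and_wait("/opt/druid/quickstart/tutorial/rollup-index.json")
```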

Ingested data

[4]:
!dsql -e 'select * from "rollup-tutorial"'
┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
│ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
│ 2018-01-02T21:35:00.000Z │   2818 │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
Retrieved 5 rows in 0.13s.
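`dsql` is a command-line client for Druid SQL. The same query can also be sent from Python to the Druid SQL HTTP endpoint `/druid/v2/sql`, with the SQL text carried in a JSON body under the key `query`. In the sketch below, the Broker address `http://localhost:8082` is an assumption for a quickstart deployment. The sketch also assumes the default result format, which returns one JSON object per row.

```python
import json
import urllib.request

# Assumption: Druid SQL endpoint served by the Broker at this address
SQL_URL = "http://localhost:8082/druid/v2/sql"

def build_sql_request(query: str, url: str = SQL_URL) -> urllib.request.Request:
    """Build a POST request carrying the SQL text as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def run_sql(query: str, url: str = SQL_URL) -> list:
    """Send the query and return the decoded rows."""
    with urllib.request.urlopen(build_sql_request(query, url)) as resp:
        return json.load(resp)

# Usage, only with a running cluster:
# rows = run_sql('SELECT * FROM "rollup-tutorial"')
```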

[5]:
#
# Note that the first raw records were aggregated using
# {timestamp truncated to the minute, srcIP, dstIP} as the key: the three
# input rows from 2018-01-01T01:01 became a single row with count = 3.
#
!dsql -e 'select * from "rollup-tutorial" limit 3'
┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
Retrieved 3 rows in 0.06s.
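The roll-up can be reproduced outside Druid with the following Python sketch. It groups the nine input records by (minute, `srcIP`, `dstIP`), counts the rows in each group, and sums `packets` and `bytes`, mirroring the `count` and `longSum` entries in `metricsSpec`. This is only an in-memory illustration under the assumption that all rows fall into one segment. It ignores segment partitioning and any other details of how Druid stores data.

```python
import json
from collections import defaultdict

# The nine records of rollup-data.json, copied verbatim
RAW = """\
{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
{"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
{"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
{"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
{"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
{"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
{"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
{"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
{"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":12,"bytes":2818}
"""

def rollup(lines):
    """Group by (minute, srcIP, dstIP); count rows and sum packets and bytes."""
    groups = defaultdict(lambda: {"count": 0, "packets": 0, "bytes": 0})
    for line in lines:
        r = json.loads(line)
        minute = r["timestamp"][:16]  # 'YYYY-MM-DDTHH:MM' = minute granularity
        g = groups[(minute, r["srcIP"], r["dstIP"])]
        g["count"] += 1               # like the "count" aggregator
        g["packets"] += r["packets"]  # like "longSum" on packets
        g["bytes"] += r["bytes"]      # like "longSum" on bytes
    return dict(sorted(groups.items()))

for key, metrics in rollup(RAW.splitlines()).items():
    print(key, metrics)
```

The printed groups have the same counts and sums as the five rows returned by the first query.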

[6]:
#
# The same happened to the records at 2018-01-01T01:02 (two input rows, count = 2)
#
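As a worked check, this short Python snippet adds up the two input records from 2018-01-01T01:02, with their values copied from `rollup-data.json`. The totals match the second row of the query output.

```python
# The two raw records that fall in the minute 2018-01-01T01:02
minute_0102 = [
    {"timestamp": "2018-01-01T01:02:14Z", "packets": 38, "bytes": 6289},
    {"timestamp": "2018-01-01T01:02:29Z", "packets": 377, "bytes": 359971},
]

count = len(minute_0102)                          # "count" metric
packets = sum(r["packets"] for r in minute_0102)  # "packets" longSum
total_bytes = sum(r["bytes"] for r in minute_0102)  # "bytes" longSum

print(count, packets, total_bytes)  # 2 415 366260
```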