Updating data in Apache Druid

  • Last modified: May 24, 2022

Data file

[1]:
!cat /opt/druid/quickstart/tutorial/updates-data.json
{"timestamp":"2018-01-01T01:01:35Z","animal":"tiger", "number":100}
{"timestamp":"2018-01-01T03:01:35Z","animal":"aardvark", "number":42}
{"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":14124}

Ingestion spec

[2]:
!cat /opt/druid/quickstart/tutorial/updates-init-index.json
{
  "type" : "index_parallel",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "updates-tutorial",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec" : {
        "dimensions" : [
          "animal"
        ]
      },
      "metricsSpec" : [
        { "type" : "count", "name" : "count" },
        { "type" : "longSum", "name" : "number", "fieldName" : "number" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "week",
        "queryGranularity" : "minute",
        "intervals" : ["2018-01-01/2018-01-03"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "quickstart/tutorial",
        "filter" : "updates-data.json"
      },
      "inputFormat" : {
        "type" : "json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}
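
The spec combines "rollup" : true with "queryGranularity" : "minute" and two metrics (count and longSum over number). As a rough illustration of what that combination does to the three input rows, here is a minimal Python sketch. It is a simplified model, not Druid's implementation; the helper names truncate_to_minute and rollup are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Rows copied from updates-data.json as printed in the "Data file" cell.
rows = [
    {"timestamp": "2018-01-01T01:01:35Z", "animal": "tiger", "number": 100},
    {"timestamp": "2018-01-01T03:01:35Z", "animal": "aardvark", "number": 42},
    {"timestamp": "2018-01-01T03:01:35Z", "animal": "giraffe", "number": 14124},
]


def truncate_to_minute(iso_timestamp):
    """Parse an ISO timestamp ending in 'Z' and drop the seconds (queryGranularity = minute)."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(second=0, microsecond=0, tzinfo=timezone.utc)


def rollup(input_rows):
    """Group rows by (truncated time, animal) and apply the count and longSum metrics."""
    groups = defaultdict(lambda: {"count": 0, "number": 0})
    for row in input_rows:
        key = (truncate_to_minute(row["timestamp"]), row["animal"])
        groups[key]["count"] += 1
        groups[key]["number"] += row["number"]
    return [
        {"__time": t.strftime("%Y-%m-%dT%H:%M:%S.000Z"), "animal": animal, **metrics}
        for (t, animal), metrics in sorted(groups.items())
    ]


rolled_up = rollup(rows)
for row in rolled_up:
    print(row)
```

With these three rows no two share both the minute and the animal, so each group keeps count = 1, and the timestamps lose their seconds (01:01:35 becomes 01:01:00).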

Ingestion

[3]:
!post-index-task --file /opt/druid/quickstart/tutorial/updates-init-index.json --url http://localhost:8081
Beginning indexing data for updates-tutorial
Task started: index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z
Task log:     http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z/log
Task status:  http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z/status
Task index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z still running...
Task index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z still running...
Task finished with status: SUCCESS
Completed indexing data for updates-tutorial. Now loading indexed data onto the cluster...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial loading complete! You may now query your data
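
The log above prints two URLs per task, one for the log and one for the status. The following sketch rebuilds those URLs from a task ID and reads the task state from a status response. The URL layout is taken from the output above; the response shape ({"status": {"status": ...}}) is an assumption about the task API and should be checked against your Druid version. The function names are hypothetical.

```python
import json
import urllib.request

OVERLORD_URL = "http://localhost:8081"  # same value as the --url flag used above


def task_endpoints(task_id, base_url=OVERLORD_URL):
    """Build the log and status URLs in the layout printed by post-index-task."""
    task_url = f"{base_url}/druid/indexer/v1/task/{task_id}"
    return {"log": f"{task_url}/log", "status": f"{task_url}/status"}


def task_state(status_payload):
    """Extract the state string (for example SUCCESS) from a status response.

    Assumes the payload nests the state under status.status.
    """
    return status_payload["status"]["status"]


def fetch_task_state(task_id, base_url=OVERLORD_URL):
    """Fetch the status document over HTTP; needs a running Druid instance."""
    with urllib.request.urlopen(task_endpoints(task_id, base_url)["status"]) as response:
        return task_state(json.load(response))


task_id = "index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z"
endpoints = task_endpoints(task_id)
print(endpoints["status"])
```

Only fetch_task_state touches the network, so the URL helpers can be tried without a cluster.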
[4]:
!dsql -e 'select * from "updates-tutorial"'
┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘
Retrieved 3 rows in 0.02s.
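
The same query can also be sent from Python through Druid's SQL HTTP endpoint (/druid/v2/sql) instead of the dsql client. The sketch below only builds the request; the base URL http://localhost:8888 is an assumption (the quickstart Router default) and may differ in this environment, and build_sql_request and run_sql are hypothetical helper names.

```python
import json
import urllib.request

SQL_BASE_URL = "http://localhost:8888"  # assumed Router address; adjust to your setup


def build_sql_request(sql, base_url=SQL_BASE_URL):
    """Build a POST request carrying the SQL text as a JSON body."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def run_sql(sql, base_url=SQL_BASE_URL):
    """Send the query and return the decoded JSON rows; needs a running Druid instance."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as response:
        return json.load(response)


request = build_sql_request('select * from "updates-tutorial"')
print(request.get_method(), request.full_url)
```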

Overwriting the initial data

[5]:
!cat /opt/druid/quickstart/tutorial/updates-overwrite-index.json
{
  "type" : "index_parallel",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "updates-tutorial",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec" : {
        "dimensions" : [
          "animal"
        ]
      },
      "metricsSpec" : [
        { "type" : "count", "name" : "count" },
        { "type" : "longSum", "name" : "number", "fieldName" : "number" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "week",
        "queryGranularity" : "minute",
        "intervals" : ["2018-01-01/2018-01-03"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "quickstart/tutorial",
        "filter" : "updates-data2.json"
      },
      "inputFormat" : {
        "type" : "json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}
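
As printed, this spec matches the initial one except for the input file. The datasource, the interval and "appendToExisting" : false are unchanged, which is what makes the task replace the existing data for that interval rather than add to it. A small sketch to confirm the difference, comparing only the ioConfig sections (dataSchema and tuningConfig are identical in both listings); the diff helper is written for this example:

```python
import copy

# ioConfig from updates-init-index.json, as printed earlier.
init_io_config = {
    "type": "index_parallel",
    "inputSource": {
        "type": "local",
        "baseDir": "quickstart/tutorial",
        "filter": "updates-data.json",
    },
    "inputFormat": {"type": "json"},
    "appendToExisting": False,
}

# ioConfig from updates-overwrite-index.json: same structure, different file.
overwrite_io_config = copy.deepcopy(init_io_config)
overwrite_io_config["inputSource"]["filter"] = "updates-data2.json"


def diff(left, right, path=""):
    """Yield (path, left_value, right_value) for every leaf value that differs."""
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            child_path = f"{path}.{key}" if path else key
            yield from diff(left.get(key), right.get(key), child_path)
    elif left != right:
        yield path, left, right


differences = list(diff(init_io_config, overwrite_io_config))
print(differences)
# [('inputSource.filter', 'updates-data.json', 'updates-data2.json')]
```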
[6]:
!post-index-task --file /opt/druid/quickstart/tutorial/updates-overwrite-index.json --url http://localhost:8081
Beginning indexing data for updates-tutorial
Task started: index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z
Task log:     http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z/log
Task status:  http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z/status
Task index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z still running...
Task index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z still running...
Task finished with status: SUCCESS
Completed indexing data for updates-tutorial. Now loading indexed data onto the cluster...
updates-tutorial loading complete! You may now query your data
[7]:
!dsql -e 'select * from "updates-tutorial"'
┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘
Retrieved 3 rows in 0.02s.

Appending data (append)

[8]:
!post-index-task --file /opt/druid/quickstart/tutorial/updates-append-index2.json --url http://localhost:8081
Beginning indexing data for updates-tutorial
Task started: index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z
Task log:     http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z/log
Task status:  http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z/status
Task index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z still running...
Task index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z still running...
Task finished with status: SUCCESS
Completed indexing data for updates-tutorial. Now loading indexed data onto the cluster...
updates-tutorial loading complete! You may now query your data
[9]:
!dsql -e 'select * from "updates-tutorial"'
┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘
Retrieved 3 rows in 0.02s.
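
The contents of updates-append-index2.json are not shown in this notebook. The toy model below assumes that an append task differs from an overwrite task by setting appendToExisting to true, and contrasts the two behaviours on an in-memory list of segments. It is a simplification for illustration only, not how Druid manages segments internally, and the appended row is made up.

```python
INTERVAL = "2018-01-01/2018-01-03"  # interval from the granularitySpec shown earlier


def ingest(segments, interval, rows, append_to_existing):
    """Toy model of a batch task: return the segment list after ingestion."""
    if append_to_existing:
        kept = list(segments)  # existing segments stay queryable
    else:
        kept = [s for s in segments if s["interval"] != interval]  # replaced
    return kept + [{"interval": interval, "rows": list(rows)}]


def query_sum(segments, animal):
    """Query-time aggregation across segments, similar to SUM("number") per animal."""
    return sum(
        row["number"]
        for segment in segments
        for row in segment["rows"]
        if row["animal"] == animal
    )


initial_rows = [
    {"animal": "tiger", "number": 100},
    {"animal": "aardvark", "number": 42},
    {"animal": "giraffe", "number": 14124},
]
hypothetical_rows = [{"animal": "tiger", "number": 1}]  # made up for illustration

segments = ingest([], INTERVAL, initial_rows, append_to_existing=False)
overwritten = ingest(segments, INTERVAL, hypothetical_rows, append_to_existing=False)
appended = ingest(segments, INTERVAL, hypothetical_rows, append_to_existing=True)

print(len(overwritten), query_sum(overwritten, "tiger"))  # 1 1
print(len(appended), query_sum(appended, "tiger"))        # 2 101
```

In this model appended rows sit in their own segment next to the existing one, so a per-animal total has to be computed at query time (for example with GROUP BY and SUM in the SQL query).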