Updating data in Apache Druid
Last modified: May 24, 2022
Data file
[1]:
!cat /opt/druid/quickstart/tutorial/updates-data.json
{"timestamp":"2018-01-01T01:01:35Z","animal":"tiger", "number":100}
{"timestamp":"2018-01-01T03:01:35Z","animal":"aardvark", "number":42}
{"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":14124}
Specification
[2]:
!cat /opt/druid/quickstart/tutorial/updates-init-index.json
{
  "type" : "index_parallel",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "updates-tutorial",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec" : {
        "dimensions" : [
          "animal"
        ]
      },
      "metricsSpec" : [
        { "type" : "count", "name" : "count" },
        { "type" : "longSum", "name" : "number", "fieldName" : "number" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "week",
        "queryGranularity" : "minute",
        "intervals" : ["2018-01-01/2018-01-03"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "quickstart/tutorial",
        "filter" : "updates-data.json"
      },
      "inputFormat" : {
        "type" : "json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}
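The `granularitySpec` and `metricsSpec` in this specification describe a rollup: timestamps are truncated to the minute, rows are grouped by `animal`, and the `count` and `number` metrics are aggregated. As a rough illustration only, the Python sketch below mimics that aggregation on the three rows of `updates-data.json`. The helper name `rollup` is hypothetical, and this is not how Druid implements rollup internally.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Rows copied from updates-data.json (newline-delimited JSON).
RAW = """\
{"timestamp":"2018-01-01T01:01:35Z","animal":"tiger", "number":100}
{"timestamp":"2018-01-01T03:01:35Z","animal":"aardvark", "number":42}
{"timestamp":"2018-01-01T03:01:35Z","animal":"giraffe", "number":14124}
"""


def rollup(lines):
    """Hypothetical helper: emulate queryGranularity=minute with count and longSum."""
    totals = defaultdict(lambda: {"count": 0, "number": 0})
    for line in lines:
        row = json.loads(line)
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        # Truncate to the minute, as queryGranularity "minute" would.
        minute = ts.replace(second=0, tzinfo=timezone.utc)
        key = (minute.strftime("%Y-%m-%dT%H:%M:%S.000Z"), row["animal"])
        totals[key]["count"] += 1                 # "count" metric
        totals[key]["number"] += row["number"]    # "longSum" metric
    return [
        {"__time": t, "animal": a, **metrics}
        for (t, a), metrics in sorted(totals.items())
    ]


rows = rollup(RAW.splitlines())
for r in rows:
    print(r)
```

Because every row here has a distinct (minute, animal) pair, each group contains exactly one input row, so `count` is 1 throughout and `number` is unchanged.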
Ingestion
[3]:
!post-index-task --file /opt/druid/quickstart/tutorial/updates-init-index.json --url http://localhost:8081
Beginning indexing data for updates-tutorial
Task started: index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z
Task log: http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z/log
Task status: http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z/status
Task index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z still running...
Task index_parallel_updates-tutorial_hcnhakpf_2022-05-25T04:37:16.543Z still running...
Task finished with status: SUCCESS
Completed indexing data for updates-tutorial. Now loading indexed data onto the cluster...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial is 0.0% finished loading...
updates-tutorial loading complete! You may now query your data
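The log above shows the task endpoints that `post-index-task` talks to. As a minimal sketch, the same submission could be made directly over HTTP from Python. The assumptions here are that the Overlord API is reachable at the address passed to `--url`, that a spec is submitted by POSTing it to `/druid/indexer/v1/task`, and that the JSON responses carry a `task` field and a nested `status.status` field. The function names are hypothetical, and the response shapes should be checked against your Druid version.

```python
import json
import urllib.request

# Assumption: same address as the --url argument used above.
BASE_URL = "http://localhost:8081"


def status_url(task_id, base_url=BASE_URL):
    """Build the status URL in the same shape as the 'Task status:' log line."""
    return f"{base_url}/druid/indexer/v1/task/{task_id}/status"


def build_submit_request(spec, base_url=BASE_URL):
    """Prepare (but do not send) the POST that submits an ingestion spec."""
    return urllib.request.Request(
        f"{base_url}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_and_get_state(spec, base_url=BASE_URL):
    """Send the spec, then fetch the task status once (no polling loop)."""
    with urllib.request.urlopen(build_submit_request(spec, base_url)) as resp:
        task_id = json.load(resp)["task"]          # assumed response field
    with urllib.request.urlopen(status_url(task_id, base_url)) as resp:
        state = json.load(resp)["status"]["status"]  # assumed response shape
    return task_id, state
```

Nothing is sent until `submit_and_get_state` is called. A real client would repeat the status request until the state is no longer `RUNNING`, as the "still running..." lines in the log suggest.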
[4]:
!dsql -e 'select * from "updates-tutorial"'
┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘
Retrieved 3 rows in 0.02s.
Overwriting the initial data
[5]:
!cat /opt/druid/quickstart/tutorial/updates-overwrite-index.json
{
  "type" : "index_parallel",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "updates-tutorial",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec" : {
        "dimensions" : [
          "animal"
        ]
      },
      "metricsSpec" : [
        { "type" : "count", "name" : "count" },
        { "type" : "longSum", "name" : "number", "fieldName" : "number" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "week",
        "queryGranularity" : "minute",
        "intervals" : ["2018-01-01/2018-01-03"],
        "rollup" : true
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "quickstart/tutorial",
        "filter" : "updates-data2.json"
      },
      "inputFormat" : {
        "type" : "json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}
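Compared with `updates-init-index.json`, this specification changes only the `filter` value (`updates-data2.json` instead of `updates-data.json`); `appendToExisting` stays `false`. One way to make that relationship explicit is to derive one spec from the other. The sketch below uses a hypothetical helper, `derive_spec`, and reproduces only the `ioConfig` portion of the initial spec for brevity.

```python
import copy


def derive_spec(base_spec, input_filter, append=False):
    """Hypothetical helper: copy a spec, changing only the input file and append mode."""
    spec = copy.deepcopy(base_spec)
    io_config = spec["spec"]["ioConfig"]
    io_config["inputSource"]["filter"] = input_filter
    io_config["appendToExisting"] = append
    return spec


# Only the ioConfig portion of updates-init-index.json is reproduced here.
init_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {
                "type": "local",
                "baseDir": "quickstart/tutorial",
                "filter": "updates-data.json",
            },
            "inputFormat": {"type": "json"},
            "appendToExisting": False,
        }
    },
}

overwrite_spec = derive_spec(init_spec, "updates-data2.json")
print(overwrite_spec["spec"]["ioConfig"]["inputSource"]["filter"])
```

The deep copy keeps `init_spec` untouched, so the same base can be reused for further variants.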
[6]:
!post-index-task --file /opt/druid/quickstart/tutorial/updates-overwrite-index.json --url http://localhost:8081
Beginning indexing data for updates-tutorial
Task started: index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z
Task log: http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z/log
Task status: http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z/status
Task index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z still running...
Task index_parallel_updates-tutorial_cennejoa_2022-05-25T04:38:07.689Z still running...
Task finished with status: SUCCESS
Completed indexing data for updates-tutorial. Now loading indexed data onto the cluster...
updates-tutorial loading complete! You may now query your data
[7]:
!dsql -e 'select * from "updates-tutorial"'
┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘
Retrieved 3 rows in 0.02s.
Appending data
[8]:
!post-index-task --file /opt/druid/quickstart/tutorial/updates-append-index2.json --url http://localhost:8081
Beginning indexing data for updates-tutorial
Task started: index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z
Task log: http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z/log
Task status: http://localhost:8081/druid/indexer/v1/task/index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z/status
Task index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z still running...
Task index_parallel_updates-tutorial_kglonpkj_2022-05-25T04:38:18.491Z still running...
Task finished with status: SUCCESS
Completed indexing data for updates-tutorial. Now loading indexed data onto the cluster...
updates-tutorial loading complete! You may now query your data
[9]:
!dsql -e 'select * from "updates-tutorial"'
┌──────────────────────────┬──────────┬───────┬────────┐
│ __time                   │ animal   │ count │ number │
├──────────────────────────┼──────────┼───────┼────────┤
│ 2018-01-01T01:01:00.000Z │ tiger    │     1 │    100 │
│ 2018-01-01T03:01:00.000Z │ aardvark │     1 │     42 │
│ 2018-01-01T03:01:00.000Z │ giraffe  │     1 │  14124 │
└──────────────────────────┴──────────┴───────┴────────┘
Retrieved 3 rows in 0.02s.
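As a simplified mental model of the two ingestion modes used in this notebook, the Python sketch below treats a datasource as a mapping from a time interval to a list of segments. With `appendToExisting` set to `false`, the segments for the interval are replaced; with `true`, a new segment is added alongside the existing ones. This is a toy model under those assumptions, not Druid's actual segment versioning, and the `extra` row is hypothetical rather than taken from any tutorial file.

```python
def ingest(datasource, interval, rows, append_to_existing):
    """Toy model: a datasource maps an interval to a list of segments (lists of rows)."""
    segments = datasource.setdefault(interval, [])
    if not append_to_existing:
        segments.clear()  # overwrite: earlier segments for the interval are replaced
    segments.append(list(rows))
    return datasource


def select_all(datasource):
    """Flatten every segment, roughly as an unaggregated SELECT * would see the rows."""
    return [row for segs in datasource.values() for seg in segs for row in seg]


WEEK = "week containing 2018-01-01"  # label only, not a real Druid interval string
initial = [("tiger", 100), ("aardvark", 42), ("giraffe", 14124)]
extra = [("example", 1)]  # hypothetical row, not taken from any tutorial file

ds = {}
ingest(ds, WEEK, initial, append_to_existing=False)
ingest(ds, WEEK, initial, append_to_existing=False)  # overwrite: still 3 rows
after_overwrite = len(select_all(ds))
ingest(ds, WEEK, extra, append_to_existing=True)     # append: one more segment
after_append = len(select_all(ds))
print(after_overwrite, after_append)
```

In this model, overwriting with the same rows leaves the row count unchanged, while appending increases it by the number of appended rows.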