Transforming input data

  • Last modified: May 24, 2022

Data file

[1]:
!cat /opt/druid/quickstart/tutorial/transform-data.json
{"timestamp":"2018-01-01T07:01:35Z","animal":"octopus",  "location":1, "number":100}
{"timestamp":"2018-01-01T05:01:35Z","animal":"mongoose", "location":2,"number":200}
{"timestamp":"2018-01-01T06:01:35Z","animal":"snake", "location":3, "number":300}
{"timestamp":"2018-01-01T01:01:35Z","animal":"lion", "location":4, "number":300}

Specification

[2]:
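#
# Line 33: transform that prepends 'super-' to the animal column
# Line 38: transform that computes triple-number as number * 3
#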
!cat /opt/druid/quickstart/tutorial/transform-index.json | nl
     1  {
     2    "type" : "index_parallel",
     3    "spec" : {
     4      "dataSchema" : {
     5        "dataSource" : "transform-tutorial",
     6        "timestampSpec": {
     7          "column": "timestamp",
     8          "format": "iso"
     9        },
    10        "dimensionsSpec" : {
    11          "dimensions" : [
    12            "animal",
    13            { "name": "location", "type": "long" }
    14          ]
    15        },
    16        "metricsSpec" : [
    17          { "type" : "count", "name" : "count" },
    18          { "type" : "longSum", "name" : "number", "fieldName" : "number" },
    19          { "type" : "longSum", "name" : "triple-number", "fieldName" : "triple-number" }
    20        ],
    21        "granularitySpec" : {
    22          "type" : "uniform",
    23          "segmentGranularity" : "week",
    24          "queryGranularity" : "minute",
    25          "intervals" : ["2018-01-01/2018-01-03"],
    26          "rollup" : true
    27        },
    28        "transformSpec": {
    29          "transforms": [
    30            {
    31              "type": "expression",
    32              "name": "animal",
    33              "expression": "concat('super-', animal)"
    34            },
    35            {
    36              "type": "expression",
    37              "name": "triple-number",
    38              "expression": "number * 3"
    39            }
    40          ],
    41          "filter": {
    42            "type":"or",
    43            "fields": [
    44              { "type": "selector", "dimension": "animal", "value": "super-mongoose" },
    45              { "type": "selector", "dimension": "triple-number", "value": "300" },
    46              { "type": "selector", "dimension": "location", "value": "3" }
    47            ]
    48          }
    49        }
    50      },
    51      "ioConfig" : {
    52        "type" : "index_parallel",
    53        "inputSource" : {
    54          "type" : "local",
    55          "baseDir" : "quickstart/tutorial",
    56          "filter" : "transform-data.json"
    57        },
    58        "inputFormat" : {
    59          "type" : "json"
    60        },
    61        "appendToExisting" : false
    62      },
    63      "tuningConfig" : {
    64        "type" : "index_parallel",
    65        "maxRowsPerSegment" : 5000000,
    66        "maxRowsInMemory" : 25000
    67      }
    68    }
    69  }
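
The effect of the `transformSpec` above can be previewed without running Druid. The following is a minimal sketch in plain Python, not Druid code: the helpers `apply_transforms` and `passes_filter` are hypothetical names introduced here for illustration. The sketch assumes that the transforms run before the filter, so the filter sees the transformed values, and it approximates the `selector` filters by comparing values as strings.

```python
import json

# The four input rows from transform-data.json, as shown in the data file section.
RAW_ROWS = """\
{"timestamp":"2018-01-01T07:01:35Z","animal":"octopus",  "location":1, "number":100}
{"timestamp":"2018-01-01T05:01:35Z","animal":"mongoose", "location":2,"number":200}
{"timestamp":"2018-01-01T06:01:35Z","animal":"snake", "location":3, "number":300}
{"timestamp":"2018-01-01T01:01:35Z","animal":"lion", "location":4, "number":300}
"""

# (dimension, value) pairs taken from the "or" filter in the spec (lines 44-46).
SELECTORS = [
    ("animal", "super-mongoose"),
    ("triple-number", "300"),
    ("location", "3"),
]


def apply_transforms(row):
    """Mimic the two expression transforms (hypothetical helper, not a Druid API)."""
    out = dict(row)
    out["animal"] = "super-" + row["animal"]   # concat('super-', animal)
    out["triple-number"] = row["number"] * 3   # number * 3
    return out


def passes_filter(row):
    """Mimic the "or" of selector filters; values are compared as strings (assumption)."""
    return any(str(row.get(dim)) == value for dim, value in SELECTORS)


rows = [json.loads(line) for line in RAW_ROWS.splitlines() if line.strip()]
kept = [r for r in map(apply_transforms, rows) if passes_filter(r)]

for r in sorted(kept, key=lambda r: r["timestamp"]):
    print(r["timestamp"], r["animal"], r["location"], r["number"], r["triple-number"])
```

Under these assumptions the lion row is dropped: after the transforms it has animal `super-lion`, triple-number 900 and location 4, so none of the three selectors match it.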
[3]:
!post-index-task --file /opt/druid/quickstart/tutorial/transform-index.json --url http://localhost:8081
Beginning indexing data for transform-tutorial
Task started: index_parallel_transform-tutorial_aeceionl_2022-05-25T04:38:39.312Z
Task log:     http://localhost:8081/druid/indexer/v1/task/index_parallel_transform-tutorial_aeceionl_2022-05-25T04:38:39.312Z/log
Task status:  http://localhost:8081/druid/indexer/v1/task/index_parallel_transform-tutorial_aeceionl_2022-05-25T04:38:39.312Z/status
Task index_parallel_transform-tutorial_aeceionl_2022-05-25T04:38:39.312Z still running...
Task index_parallel_transform-tutorial_aeceionl_2022-05-25T04:38:39.312Z still running...
Task finished with status: SUCCESS
Completed indexing data for transform-tutorial. Now loading indexed data onto the cluster...
transform-tutorial is 0.0% finished loading...
transform-tutorial is 0.0% finished loading...
transform-tutorial is 0.0% finished loading...
transform-tutorial loading complete! You may now query your data

Querying the data

[4]:
!dsql -e 'select * from "transform-tutorial"'
┌──────────────────────────┬────────────────┬───────┬──────────┬────────┬───────────────┐
│ __time                   │ animal         │ count │ location │ number │ triple-number │
├──────────────────────────┼────────────────┼───────┼──────────┼────────┼───────────────┤
│ 2018-01-01T05:01:00.000Z │ super-mongoose │     1 │        2 │    200 │           600 │
│ 2018-01-01T06:01:00.000Z │ super-snake    │     1 │        3 │    300 │           900 │
│ 2018-01-01T07:01:00.000Z │ super-octopus  │     1 │        1 │    100 │           300 │
└──────────────────────────┴────────────────┴───────┴──────────┴────────┴───────────────┘
Retrieved 3 rows in 0.02s.
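
The `__time` values in the result end in `:00`, while the input timestamps end in `:35`. This is consistent with `"queryGranularity" : "minute"` in the spec, which truncates timestamps to the minute at ingestion. A minimal sketch of that truncation in plain Python follows; `truncate_to_minute` is a hypothetical helper written for this illustration, not part of Druid.

```python
from datetime import datetime, timezone


def truncate_to_minute(iso_ts):
    """Floor an ISO-8601 UTC timestamp to the minute (illustrative helper)."""
    ts = datetime.strptime(iso_ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    floored = ts.replace(second=0, microsecond=0)
    # Format the result the way the query output displays __time.
    return floored.strftime("%Y-%m-%dT%H:%M:%S.000Z")


for raw in ("2018-01-01T05:01:35Z", "2018-01-01T06:01:35Z", "2018-01-01T07:01:35Z"):
    print(raw, "->", truncate_to_minute(raw))
```

Because `rollup` is enabled, rows that share the same truncated timestamp and the same dimension values would be merged into one row. In this data set every surviving row has a distinct minute, which matches the `count` of 1 in each result row.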