Transacciones ACID (Insert/ Update / Delete) en Hive#

  • Última modificación: Mayo 17, 2022 | YouTube

El lenguaje SQL estándar provee directivas para la insertar, actualizar y borrar registros en una tabla. En este tutorial se presentan ejemplos representativos de estas instrucciones en Hive.

Cell magic %%hive#

[1]:
from IPython.core.magic import Magics, cell_magic, line_magic, magics_class
from pexpect import spawn

TIMEOUT = 60
PROG = "hive"
PROMPT = ["\r\n    > ", "\r\nhive> "]
QUIT = "quit;"


@magics_class
class Magic(Magics):
    def __init__(self, shell):
        super().__init__(shell)
        self.app = spawn(PROG, timeout=60)
        self.app.expect(PROMPT)

    @cell_magic
    def hive(self, line, cell):
        cell_lines = [cell_line.strip() for cell_line in cell.split("\n")]
        cell_lines = [cell_line for cell_line in cell_lines if cell_line != ""]
        for cell_line in cell_lines:
            self.app.sendline(cell_line)
            self.app.expect(PROMPT, timeout=TIMEOUT)
            output = self.app.before.decode()
            output = output.replace("\r\n", "\n")
            output = output.split("\n")
            output = [output_line.strip() for output_line in output]
            for output_line in output:
                if output_line not in cell_lines:
                    print(output_line)
        return None

    @line_magic
    def quit(self, line):
        self.app.sendline(QUIT)


def load_ipython_extension(ip):
    ip.register_magics(Magic(ip))


load_ipython_extension(ip=get_ipython())

Creación de la tabla#

[2]:
%%hive
DROP DATABASE IF EXISTS demo CASCADE;
CREATE DATABASE demo;
USE demo;

CREATE TABLE persons (
    id        INT,
    firstname STRING,
    surname   STRING,
    birthday  TIMESTAMP,
    quantity  INT
)
PARTITIONED BY (color STRING)
CLUSTERED BY(id) INTO 3 BUCKETS
STORED AS ORC
LOCATION '/tmp/hive-partitioned'
TBLPROPERTIES ('transactional'='true');
OK
Time taken: 5.841 seconds
OK
Time taken: 0.334 seconds
OK
Time taken: 0.023 seconds
OK
Time taken: 0.25 seconds

Preparación#

Se deben habilitar las características de Hive para manejo de transacciones ACID.

[3]:
%%hive
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

INSERT#

INSERT INTO TABLE tablename VALUES PARTITION () values_row [, values_row ...]

values_row:
   (value [, value ...])

Note que a diferencia de SQL, aca no es posible indicar para que columnas se van a insertar los valores, de tal manera que siempre se deben dar valores para todas las columnas.

[4]:
%%hive
--
-- Inserta el registro en la tabla.
-- Los valores están en el mismo orden de los campos.
--
INSERT INTO persons PARTITION (color='green') VALUES
    (1,"Vivian","Hamilton","1971-07-08",1),
    (2,"Karen","Holcomb","1974-05-23",4),
    (12,"Hope","Coffey","1973-12-24",5),
    (17,"Chanda","Boyer","1973-04-01",4);

INSERT INTO persons PARTITION (color='black') VALUES
    (4,"Roth","Fry","1975-01-29",1),
    (10,"Kylan","Sexton","1975-02-28",4);

INSERT INTO persons PARTITION (color='blue') VALUES
    (5,"Zoe","Conway","1974-07-03",2),
    (7,"Driscoll","Klein","1970-10-05",5),
    (15,"Hope","Silva","1970-07-01",5);

INSERT INTO persons PARTITION (color='orange') VALUES
    (3,"Cody","Garrett","1973-04-22",1),
    (16,"Ayanna","Jarvis","1974-02-11",5);

INSERT INTO persons PARTITION (color='violet') VALUES
    (6,"Gretchen","Kinney","1974-10-18",1);

INSERT INTO persons PARTITION (color='red') VALUES
    (8,"Karyn","Diaz","1969-02-24",1),
    (14,"Clio","Noel","1972-12-12",5);

INSERT INTO persons PARTITION (color='indigo') VALUES
    (9,"Merritt","Guy","1974-10-17",4),
    (11,"Jordan","Estes","1969-12-07",4);

INSERT INTO persons PARTITION (color='gray') VALUES
    (13,"Vivian","Crane","1970-08-27",5);

INSERT INTO persons PARTITION (color='yellow') VALUES
    (18,"Chadwick","Knight","1973-04-29",1);

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144650_1350f9e4-7c80-4a1f-b79b-94852ac5f89f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0014, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0014/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:46:58,056 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:47:01,178 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.77 sec
2022-05-17 14:47:07,402 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.71 sec
2022-05-17 14:47:08,427 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.99 sec
2022-05-17 14:47:09,453 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.5 sec
MapReduce Total cumulative CPU time: 10 seconds 500 msec
Ended Job = job_1652793922537_0014
Loading data to table demo.persons partition (color=green)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 10.5 sec   HDFS Read: 20555 HDFS Write: 2996 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 500 msec
OK
Time taken: 20.393 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144711_74e414ca-7765-4970-9306-a584392f1c7e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0015, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0015/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:47:19,919 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:47:24,026 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.92 sec
2022-05-17 14:47:29,169 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.49 sec
2022-05-17 14:47:30,192 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.17 sec
2022-05-17 14:47:31,216 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.6 sec
MapReduce Total cumulative CPU time: 9 seconds 600 msec
Ended Job = job_1652793922537_0015
Loading data to table demo.persons partition (color=black)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.6 sec   HDFS Read: 20491 HDFS Write: 1574 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 600 msec
OK
Time taken: 21.193 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144732_1995c38c-7790-49b6-8534-99d31b09d868
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0016, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0016/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:47:41,805 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:47:45,926 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
2022-05-17 14:47:51,043 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.53 sec
2022-05-17 14:47:52,065 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.2 sec
2022-05-17 14:47:53,089 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.7 sec
MapReduce Total cumulative CPU time: 9 seconds 700 msec
Ended Job = job_1652793922537_0016
Loading data to table demo.persons partition (color=blue)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.7 sec   HDFS Read: 20581 HDFS Write: 2912 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 700 msec
OK
Time taken: 21.718 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144754_d4929f45-b1d3-4145-93f0-601b0d1fb2e7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0017, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0017/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0017
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:48:04,101 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:48:08,213 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
2022-05-17 14:48:13,354 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.67 sec
2022-05-17 14:48:14,377 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.5 sec
2022-05-17 14:48:15,403 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.94 sec
MapReduce Total cumulative CPU time: 9 seconds 940 msec
Ended Job = job_1652793922537_0017
Loading data to table demo.persons partition (color=orange)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.94 sec   HDFS Read: 20586 HDFS Write: 2220 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 940 msec
OK
Time taken: 21.955 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144816_75278cf6-d2e7-418a-89ff-c9903503ce09
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0018, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0018/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:48:25,381 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:48:29,492 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.73 sec
2022-05-17 14:48:35,674 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.63 sec
2022-05-17 14:48:36,699 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.36 sec
2022-05-17 14:48:37,724 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.79 sec
MapReduce Total cumulative CPU time: 9 seconds 790 msec
Ended Job = job_1652793922537_0018
Loading data to table demo.persons partition (color=violet)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.79 sec   HDFS Read: 20559 HDFS Write: 1538 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 790 msec
OK
Time taken: 22.258 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144839_295ad16e-eb7c-44cd-8a8c-17096df9c480
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0019, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0019/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0019
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:48:47,750 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:48:51,842 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.68 sec
2022-05-17 14:48:56,975 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.5 sec
2022-05-17 14:48:59,021 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.79 sec
MapReduce Total cumulative CPU time: 9 seconds 790 msec
Ended Job = job_1652793922537_0019
Loading data to table demo.persons partition (color=red)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.79 sec   HDFS Read: 20526 HDFS Write: 1579 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 790 msec
OK
Time taken: 22.078 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144901_ebf6a1d4-e8cd-4d93-b1ab-63d762d4c433
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0020, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0020/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:49:09,651 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:49:13,773 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.8 sec
2022-05-17 14:49:18,974 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.65 sec
2022-05-17 14:49:21,022 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.39 sec
2022-05-17 14:49:22,048 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.75 sec
MapReduce Total cumulative CPU time: 9 seconds 750 msec
Ended Job = job_1652793922537_0020
Loading data to table demo.persons partition (color=indigo)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.75 sec   HDFS Read: 20584 HDFS Write: 2226 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 750 msec
OK
Time taken: 21.801 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144923_d95da8ef-8248-4d0a-8a41-8d78367a7e08
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0021, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0021/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0021
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:49:31,764 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:49:35,871 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.73 sec
2022-05-17 14:49:41,019 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.37 sec
2022-05-17 14:49:43,072 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.33 sec
MapReduce Total cumulative CPU time: 9 seconds 330 msec
Ended Job = job_1652793922537_0021
Loading data to table demo.persons partition (color=gray)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.33 sec   HDFS Read: 20527 HDFS Write: 1531 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 330 msec
OK
Time taken: 20.871 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144944_3244cc24-4ca7-4412-ade3-5b94faa22883
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0022, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0022/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:49:53,950 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:49:58,052 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.8 sec
2022-05-17 14:50:03,175 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.6 sec
2022-05-17 14:50:04,200 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.29 sec
2022-05-17 14:50:05,301 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.74 sec
MapReduce Total cumulative CPU time: 9 seconds 740 msec
Ended Job = job_1652793922537_0022
Loading data to table demo.persons partition (color=yellow)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.74 sec   HDFS Read: 20551 HDFS Write: 1546 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 740 msec
OK
Time taken: 22.103 seconds
[5]:
%%hive
SELECT * FROM persons;
OK
10      Kylan   Sexton  1975-02-28 00:00:00     4       black
4       Roth    Fry     1975-01-29 00:00:00     1       black
15      Hope    Silva   1970-07-01 00:00:00     5       blue
7       Driscoll        Klein   1970-10-05 00:00:00     5       blue
5       Zoe     Conway  1974-07-03 00:00:00     2       blue
13      Vivian  Crane   1970-08-27 00:00:00     5       gray
12      Hope    Coffey  1973-12-24 00:00:00     5       green
1       Vivian  Hamilton        1971-07-08 00:00:00     1       green
17      Chanda  Boyer   1973-04-01 00:00:00     4       green
2       Karen   Holcomb 1974-05-23 00:00:00     4       green
9       Merritt Guy     1974-10-17 00:00:00     4       indigo
11      Jordan  Estes   1969-12-07 00:00:00     4       indigo
3       Cody    Garrett 1973-04-22 00:00:00     1       orange
16      Ayanna  Jarvis  1974-02-11 00:00:00     5       orange
14      Clio    Noel    1972-12-12 00:00:00     5       red
8       Karyn   Diaz    1969-02-24 00:00:00     1       red
6       Gretchen        Kinney  1974-10-18 00:00:00     1       violet
18      Chadwick        Knight  1973-04-29 00:00:00     1       yellow
Time taken: 0.207 seconds, Fetched: 18 row(s)
[6]:
!hdfs dfs -ls /tmp/hive-partitioned
Found 9 items
drwxrwxrwx   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=black
drwxrwxrwx   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=blue
drwxrwxrwx   - root supergroup          0 2022-05-17 14:49 /tmp/hive-partitioned/color=gray
drwxrwxrwx   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=green
drwxrwxrwx   - root supergroup          0 2022-05-17 14:49 /tmp/hive-partitioned/color=indigo
drwxrwxrwx   - root supergroup          0 2022-05-17 14:48 /tmp/hive-partitioned/color=orange
drwxrwxrwx   - root supergroup          0 2022-05-17 14:49 /tmp/hive-partitioned/color=red
drwxrwxrwx   - root supergroup          0 2022-05-17 14:48 /tmp/hive-partitioned/color=violet
drwxrwxrwx   - root supergroup          0 2022-05-17 14:50 /tmp/hive-partitioned/color=yellow
[7]:
!hdfs dfs -ls /tmp/hive-partitioned/color=black
Found 1 items
drwxr-xr-x   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000
[8]:
!hdfs dfs -ls /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000
Found 3 items
-rw-r--r--   1 root supergroup        223 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000/bucket_00000
-rw-r--r--   1 root supergroup        936 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000/bucket_00001
-rw-r--r--   1 root supergroup        223 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000/bucket_00002

UPDATE#

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

Véase https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

[9]:
%%hive
UPDATE persons SET quantity = 100 WHERE color = 'red';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517145011_7cabef7b-52ff-4671-a383-b636bd0cb2a6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0023, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0023/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0023
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2022-05-17 14:50:15,598 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:50:19,686 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 1.61 sec
2022-05-17 14:50:20,730 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 3.48 sec
2022-05-17 14:50:22,782 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.85 sec
2022-05-17 14:50:24,843 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 7.21 sec
2022-05-17 14:50:25,877 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 8.7 sec
2022-05-17 14:50:26,901 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.69 sec
MapReduce Total cumulative CPU time: 10 seconds 690 msec
Ended Job = job_1652793922537_0023
Loading data to table demo.persons partition (color=null)


Time taken to load dynamic partitions: 0.116 seconds
Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 10.69 sec   HDFS Read: 33100 HDFS Write: 1067 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 690 msec
OK
Time taken: 16.249 seconds
[10]:
%%hive
SELECT * FROM persons;
OK
10      Kylan   Sexton  1975-02-28 00:00:00     4       black
4       Roth    Fry     1975-01-29 00:00:00     1       black
15      Hope    Silva   1970-07-01 00:00:00     5       blue
7       Driscoll        Klein   1970-10-05 00:00:00     5       blue
5       Zoe     Conway  1974-07-03 00:00:00     2       blue
13      Vivian  Crane   1970-08-27 00:00:00     5       gray
12      Hope    Coffey  1973-12-24 00:00:00     5       green
1       Vivian  Hamilton        1971-07-08 00:00:00     1       green
17      Chanda  Boyer   1973-04-01 00:00:00     4       green
2       Karen   Holcomb 1974-05-23 00:00:00     4       green
9       Merritt Guy     1974-10-17 00:00:00     4       indigo
11      Jordan  Estes   1969-12-07 00:00:00     4       indigo
3       Cody    Garrett 1973-04-22 00:00:00     1       orange
16      Ayanna  Jarvis  1974-02-11 00:00:00     5       orange
14      Clio    Noel    1972-12-12 00:00:00     100     red
8       Karyn   Diaz    1969-02-24 00:00:00     100     red
6       Gretchen        Kinney  1974-10-18 00:00:00     1       violet
18      Chadwick        Knight  1973-04-29 00:00:00     1       yellow
Time taken: 0.127 seconds, Fetched: 18 row(s)

DELETE#

DELETE FROM tablename [WHERE expression]

Véase https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

[11]:
%%hive
DELETE FROM persons WHERE id = 10;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517145028_7b530327-329d-4010-a61a-86454a1a66be
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0024, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0024/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0024
Hadoop job information for Stage-1: number of mappers: 27; number of reducers: 3
2022-05-17 14:50:37,402 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:50:42,561 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 2.71 sec
2022-05-17 14:50:43,607 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 5.56 sec
2022-05-17 14:50:44,639 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 9.09 sec
2022-05-17 14:50:47,999 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 15.2 sec
2022-05-17 14:50:49,018 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 27.3 sec
2022-05-17 14:50:50,093 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 31.3 sec
2022-05-17 14:50:51,125 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 40.71 sec
2022-05-17 14:50:53,204 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 44.28 sec
2022-05-17 14:50:54,238 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 48.05 sec
2022-05-17 14:50:55,280 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 51.66 sec
2022-05-17 14:50:56,305 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 59.47 sec
2022-05-17 14:50:58,362 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 62.34 sec
2022-05-17 14:50:59,394 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 65.29 sec
2022-05-17 14:51:01,470 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 73.18 sec
2022-05-17 14:51:02,509 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 76.32 sec
2022-05-17 14:51:04,590 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 83.16 sec
2022-05-17 14:51:05,619 Stage-1 map = 81%,  reduce = 8%, Cumulative CPU 85.91 sec
2022-05-17 14:51:07,690 Stage-1 map = 81%,  reduce = 17%, Cumulative CPU 86.2 sec
2022-05-17 14:51:08,714 Stage-1 map = 89%,  reduce = 17%, Cumulative CPU 93.14 sec
2022-05-17 14:51:09,747 Stage-1 map = 93%,  reduce = 17%, Cumulative CPU 95.85 sec
2022-05-17 14:51:11,796 Stage-1 map = 93%,  reduce = 19%, Cumulative CPU 95.9 sec
2022-05-17 14:51:12,816 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 103.73 sec
2022-05-17 14:51:13,842 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 107.47 sec
MapReduce Total cumulative CPU time: 1 minutes 47 seconds 470 msec
Ended Job = job_1652793922537_0024
Loading data to table demo.persons partition (color=null)


Time taken to load dynamic partitions: 0.08 seconds
Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 27  Reduce: 3   Cumulative CPU: 107.47 sec   HDFS Read: 215812 HDFS Write: 602 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 47 seconds 470 msec
OK
Time taken: 46.458 seconds
[12]:
%%hive
SELECT * FROM persons ORDER BY id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517145115_135ba0fe-fdea-45fc-960f-cbe253040dd8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0025, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0025/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0025
Hadoop job information for Stage-1: number of mappers: 27; number of reducers: 1
2022-05-17 14:51:24,736 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:51:28,845 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 1.81 sec
2022-05-17 14:51:29,883 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 4.03 sec
2022-05-17 14:51:31,967 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 6.33 sec
2022-05-17 14:51:32,997 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 11.12 sec
2022-05-17 14:51:34,044 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 13.19 sec
2022-05-17 14:51:35,100 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 15.72 sec
2022-05-17 14:51:36,151 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 20.93 sec
2022-05-17 14:51:37,194 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 25.54 sec
2022-05-17 14:51:38,227 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 27.83 sec
2022-05-17 14:51:40,326 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 30.42 sec
2022-05-17 14:51:41,390 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 33.17 sec
2022-05-17 14:51:42,432 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 40.69 sec
2022-05-17 14:51:45,564 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 46.69 sec
2022-05-17 14:51:46,592 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 54.02 sec
2022-05-17 14:51:49,680 Stage-1 map = 81%,  reduce = 27%, Cumulative CPU 54.36 sec
2022-05-17 14:51:50,711 Stage-1 map = 89%,  reduce = 27%, Cumulative CPU 60.48 sec
2022-05-17 14:51:51,756 Stage-1 map = 100%,  reduce = 27%, Cumulative CPU 68.54 sec
2022-05-17 14:51:52,783 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 70.51 sec
MapReduce Total cumulative CPU time: 1 minutes 10 seconds 510 msec
Ended Job = job_1652793922537_0025
MapReduce Jobs Launched:
Stage-Stage-1: Map: 27  Reduce: 1   Cumulative CPU: 70.51 sec   HDFS Read: 201369 HDFS Write: 1030 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 10 seconds 510 msec
OK
1       Vivian  Hamilton        1971-07-08 00:00:00     1       green
2       Karen   Holcomb 1974-05-23 00:00:00     4       green
3       Cody    Garrett 1973-04-22 00:00:00     1       orange
4       Roth    Fry     1975-01-29 00:00:00     1       black
5       Zoe     Conway  1974-07-03 00:00:00     2       blue
6       Gretchen        Kinney  1974-10-18 00:00:00     1       violet
7       Driscoll        Klein   1970-10-05 00:00:00     5       blue
8       Karyn   Diaz    1969-02-24 00:00:00     100     red
9       Merritt Guy     1974-10-17 00:00:00     4       indigo
11      Jordan  Estes   1969-12-07 00:00:00     4       indigo
12      Hope    Coffey  1973-12-24 00:00:00     5       green
13      Vivian  Crane   1970-08-27 00:00:00     5       gray
14      Clio    Noel    1972-12-12 00:00:00     100     red
15      Hope    Silva   1970-07-01 00:00:00     5       blue
16      Ayanna  Jarvis  1974-02-11 00:00:00     5       orange
17      Chanda  Boyer   1973-04-01 00:00:00     4       green
18      Chadwick        Knight  1973-04-29 00:00:00     1       yellow
Time taken: 38.642 seconds, Fetched: 17 row(s)

MERGE#

MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>

Véase https://community.hortonworks.com/articles/97113/hive-acid-merge-by-example.html


[13]:
%%hive
-- limpia la base de datos
DROP DATABASE IF EXISTS demo CASCADE;
OK
Time taken: 0.55 seconds
[14]:
%quit
[15]:
!rm *.log