Delete) en Hive#

Última modificación: Mayo 17, 2022 | YouTube

El lenguaje SQL estándar provee directivas para la insertar, actualizar y borrar registros en una tabla. En este tutorial se presentan ejemplos representativos de estas instrucciones en Hive.

Cell magic `%%hive`#

[1]:

from IPython.core.magic import Magics, cell_magic, line_magic, magics_class
from pexpect import spawn

TIMEOUT = 60
PROG = "hive"
PROMPT = ["\r\n    > ", "\r\nhive> "]
QUIT = "quit;"


@magics_class
class Magic(Magics):
    def __init__(self, shell):
        super().__init__(shell)
        self.app = spawn(PROG, timeout=60)
        self.app.expect(PROMPT)

    @cell_magic
    def hive(self, line, cell):
        cell_lines = [cell_line.strip() for cell_line in cell.split("\n")]
        cell_lines = [cell_line for cell_line in cell_lines if cell_line != ""]
        for cell_line in cell_lines:
            self.app.sendline(cell_line)
            self.app.expect(PROMPT, timeout=TIMEOUT)
            output = self.app.before.decode()
            output = output.replace("\r\n", "\n")
            output = output.split("\n")
            output = [output_line.strip() for output_line in output]
            for output_line in output:
                if output_line not in cell_lines:
                    print(output_line)
        return None

    @line_magic
    def quit(self, line):
        self.app.sendline(QUIT)


def load_ipython_extension(ip):
    ip.register_magics(Magic(ip))


load_ipython_extension(ip=get_ipython())

Creación de la tabla#

[2]:

%%hive
DROP DATABASE IF EXISTS demo CASCADE;
CREATE DATABASE demo;
USE demo;

CREATE TABLE persons (
    id        INT,
    firstname STRING,
    surname   STRING,
    birthday  TIMESTAMP,
    quantity  INT
)
PARTITIONED BY (color STRING)
CLUSTERED BY(id) INTO 3 BUCKETS
STORED AS ORC
LOCATION '/tmp/hive-partitioned'
TBLPROPERTIES ('transactional'='true');

OK
Time taken: 5.841 seconds
OK
Time taken: 0.334 seconds
OK
Time taken: 0.023 seconds
OK
Time taken: 0.25 seconds

Preparación#

Se deben habilitar las características de Hive para manejo de transacciones ACID.

[3]:

%%hive
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

INSERT#

INSERT INTO TABLE tablename VALUES PARTITION () values_row [, values_row ...]

values_row:
   (value [, value ...])

Note que a diferencia de SQL, aca no es posible indicar para que columnas se van a insertar los valores, de tal manera que siempre se deben dar valores para todas las columnas.

[4]:

%%hive
--
-- Inserta el registro en la tabla.
-- Los valores están en el mismo orden de los campos.
--
INSERT INTO persons PARTITION (color='green') VALUES
    (1,"Vivian","Hamilton","1971-07-08",1),
    (2,"Karen","Holcomb","1974-05-23",4),
    (12,"Hope","Coffey","1973-12-24",5),
    (17,"Chanda","Boyer","1973-04-01",4);

INSERT INTO persons PARTITION (color='black') VALUES
    (4,"Roth","Fry","1975-01-29",1),
    (10,"Kylan","Sexton","1975-02-28",4);

INSERT INTO persons PARTITION (color='blue') VALUES
    (5,"Zoe","Conway","1974-07-03",2),
    (7,"Driscoll","Klein","1970-10-05",5),
    (15,"Hope","Silva","1970-07-01",5);

INSERT INTO persons PARTITION (color='orange') VALUES
    (3,"Cody","Garrett","1973-04-22",1),
    (16,"Ayanna","Jarvis","1974-02-11",5);

INSERT INTO persons PARTITION (color='violet') VALUES
    (6,"Gretchen","Kinney","1974-10-18",1);

INSERT INTO persons PARTITION (color='red') VALUES
    (8,"Karyn","Diaz","1969-02-24",1),
    (14,"Clio","Noel","1972-12-12",5);

INSERT INTO persons PARTITION (color='indigo') VALUES
    (9,"Merritt","Guy","1974-10-17",4),
    (11,"Jordan","Estes","1969-12-07",4);

INSERT INTO persons PARTITION (color='gray') VALUES
    (13,"Vivian","Crane","1970-08-27",5);

INSERT INTO persons PARTITION (color='yellow') VALUES
    (18,"Chadwick","Knight","1973-04-29",1);

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144650_1350f9e4-7c80-4a1f-b79b-94852ac5f89f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0014, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0014/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:46:58,056 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:47:01,178 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.77 sec
2022-05-17 14:47:07,402 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.71 sec
2022-05-17 14:47:08,427 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.99 sec
2022-05-17 14:47:09,453 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.5 sec
MapReduce Total cumulative CPU time: 10 seconds 500 msec
Ended Job = job_1652793922537_0014
Loading data to table demo.persons partition (color=green)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 10.5 sec   HDFS Read: 20555 HDFS Write: 2996 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 500 msec
OK
Time taken: 20.393 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144711_74e414ca-7765-4970-9306-a584392f1c7e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0015, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0015/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:47:19,919 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:47:24,026 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.92 sec
2022-05-17 14:47:29,169 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.49 sec
2022-05-17 14:47:30,192 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.17 sec
2022-05-17 14:47:31,216 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.6 sec
MapReduce Total cumulative CPU time: 9 seconds 600 msec
Ended Job = job_1652793922537_0015
Loading data to table demo.persons partition (color=black)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.6 sec   HDFS Read: 20491 HDFS Write: 1574 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 600 msec
OK
Time taken: 21.193 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144732_1995c38c-7790-49b6-8534-99d31b09d868
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0016, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0016/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:47:41,805 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:47:45,926 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
2022-05-17 14:47:51,043 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.53 sec
2022-05-17 14:47:52,065 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.2 sec
2022-05-17 14:47:53,089 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.7 sec
MapReduce Total cumulative CPU time: 9 seconds 700 msec
Ended Job = job_1652793922537_0016
Loading data to table demo.persons partition (color=blue)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.7 sec   HDFS Read: 20581 HDFS Write: 2912 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 700 msec
OK
Time taken: 21.718 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144754_d4929f45-b1d3-4145-93f0-601b0d1fb2e7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0017, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0017/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0017
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:48:04,101 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:48:08,213 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
2022-05-17 14:48:13,354 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.67 sec
2022-05-17 14:48:14,377 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.5 sec
2022-05-17 14:48:15,403 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.94 sec
MapReduce Total cumulative CPU time: 9 seconds 940 msec
Ended Job = job_1652793922537_0017
Loading data to table demo.persons partition (color=orange)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.94 sec   HDFS Read: 20586 HDFS Write: 2220 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 940 msec
OK
Time taken: 21.955 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144816_75278cf6-d2e7-418a-89ff-c9903503ce09
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0018, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0018/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:48:25,381 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:48:29,492 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.73 sec
2022-05-17 14:48:35,674 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.63 sec
2022-05-17 14:48:36,699 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.36 sec
2022-05-17 14:48:37,724 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.79 sec
MapReduce Total cumulative CPU time: 9 seconds 790 msec
Ended Job = job_1652793922537_0018
Loading data to table demo.persons partition (color=violet)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.79 sec   HDFS Read: 20559 HDFS Write: 1538 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 790 msec
OK
Time taken: 22.258 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144839_295ad16e-eb7c-44cd-8a8c-17096df9c480
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0019, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0019/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0019
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:48:47,750 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:48:51,842 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.68 sec
2022-05-17 14:48:56,975 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.5 sec
2022-05-17 14:48:59,021 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.79 sec
MapReduce Total cumulative CPU time: 9 seconds 790 msec
Ended Job = job_1652793922537_0019
Loading data to table demo.persons partition (color=red)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.79 sec   HDFS Read: 20526 HDFS Write: 1579 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 790 msec
OK
Time taken: 22.078 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144901_ebf6a1d4-e8cd-4d93-b1ab-63d762d4c433
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0020, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0020/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0020
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:49:09,651 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:49:13,773 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.8 sec
2022-05-17 14:49:18,974 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.65 sec
2022-05-17 14:49:21,022 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.39 sec
2022-05-17 14:49:22,048 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.75 sec
MapReduce Total cumulative CPU time: 9 seconds 750 msec
Ended Job = job_1652793922537_0020
Loading data to table demo.persons partition (color=indigo)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.75 sec   HDFS Read: 20584 HDFS Write: 2226 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 750 msec
OK
Time taken: 21.801 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144923_d95da8ef-8248-4d0a-8a41-8d78367a7e08
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0021, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0021/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0021
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:49:31,764 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:49:35,871 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.73 sec
2022-05-17 14:49:41,019 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.37 sec
2022-05-17 14:49:43,072 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.33 sec
MapReduce Total cumulative CPU time: 9 seconds 330 msec
Ended Job = job_1652793922537_0021
Loading data to table demo.persons partition (color=gray)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.33 sec   HDFS Read: 20527 HDFS Write: 1531 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 330 msec
OK
Time taken: 20.871 seconds
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517144944_3244cc24-4ca7-4412-ade3-5b94faa22883
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0022, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0022/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2022-05-17 14:49:53,950 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:49:58,052 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.8 sec
2022-05-17 14:50:03,175 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.6 sec
2022-05-17 14:50:04,200 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.29 sec
2022-05-17 14:50:05,301 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.74 sec
MapReduce Total cumulative CPU time: 9 seconds 740 msec
Ended Job = job_1652793922537_0022
Loading data to table demo.persons partition (color=yellow)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 9.74 sec   HDFS Read: 20551 HDFS Write: 1546 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 740 msec
OK
Time taken: 22.103 seconds

[5]:

%%hive
SELECT * FROM persons;

OK
10      Kylan   Sexton  1975-02-28 00:00:00     4       black
4       Roth    Fry     1975-01-29 00:00:00     1       black
15      Hope    Silva   1970-07-01 00:00:00     5       blue
7       Driscoll        Klein   1970-10-05 00:00:00     5       blue
5       Zoe     Conway  1974-07-03 00:00:00     2       blue
13      Vivian  Crane   1970-08-27 00:00:00     5       gray
12      Hope    Coffey  1973-12-24 00:00:00     5       green
1       Vivian  Hamilton        1971-07-08 00:00:00     1       green
17      Chanda  Boyer   1973-04-01 00:00:00     4       green
2       Karen   Holcomb 1974-05-23 00:00:00     4       green
9       Merritt Guy     1974-10-17 00:00:00     4       indigo
11      Jordan  Estes   1969-12-07 00:00:00     4       indigo
3       Cody    Garrett 1973-04-22 00:00:00     1       orange
16      Ayanna  Jarvis  1974-02-11 00:00:00     5       orange
14      Clio    Noel    1972-12-12 00:00:00     5       red
8       Karyn   Diaz    1969-02-24 00:00:00     1       red
6       Gretchen        Kinney  1974-10-18 00:00:00     1       violet
18      Chadwick        Knight  1973-04-29 00:00:00     1       yellow
Time taken: 0.207 seconds, Fetched: 18 row(s)

[6]:

!hdfs dfs -ls /tmp/hive-partitioned

Found 9 items
drwxrwxrwx   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=black
drwxrwxrwx   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=blue
drwxrwxrwx   - root supergroup          0 2022-05-17 14:49 /tmp/hive-partitioned/color=gray
drwxrwxrwx   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=green
drwxrwxrwx   - root supergroup          0 2022-05-17 14:49 /tmp/hive-partitioned/color=indigo
drwxrwxrwx   - root supergroup          0 2022-05-17 14:48 /tmp/hive-partitioned/color=orange
drwxrwxrwx   - root supergroup          0 2022-05-17 14:49 /tmp/hive-partitioned/color=red
drwxrwxrwx   - root supergroup          0 2022-05-17 14:48 /tmp/hive-partitioned/color=violet
drwxrwxrwx   - root supergroup          0 2022-05-17 14:50 /tmp/hive-partitioned/color=yellow

[7]:

!hdfs dfs -ls /tmp/hive-partitioned/color=black

Found 1 items
drwxr-xr-x   - root supergroup          0 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000

[8]:

!hdfs dfs -ls /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000

Found 3 items
-rw-r--r--   1 root supergroup        223 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000/bucket_00000
-rw-r--r--   1 root supergroup        936 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000/bucket_00001
-rw-r--r--   1 root supergroup        223 2022-05-17 14:47 /tmp/hive-partitioned/color=black/delta_0000002_0000002_0000/bucket_00002

UPDATE#

UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

Véase https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

[9]:

%%hive
UPDATE persons SET quantity = 100 WHERE color = 'red';

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220517145011_7cabef7b-52ff-4671-a383-b636bd0cb2a6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1652793922537_0023, Tracking URL = http://4feb4ed7d52d:8088/proxy/application_1652793922537_0023/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1652793922537_0023
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2022-05-17 14:50:15,598 Stage-1 map = 0%,  reduce = 0%
2022-05-17 14:50:19,686 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 1.61 sec
2022-05-17 14:50:20,730 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 3.48 sec
2022-05-17 14:50:22,782 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.85 sec
2022-05-17 14:50:24,843 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 7.21 sec
2022-05-17 14:50:25,877 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 8.7 sec
2022-05-17 14:50:26,901 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.69 sec
MapReduce Total cumulative CPU time: 10 seconds 690 msec
Ended Job = job_1652793922537_0023
Loading data to table demo.persons partition (color=null)


Time taken to load dynamic partitions: 0.116 seconds
Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 10.69 sec   HDFS Read: 33100 HDFS Write: 1067 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 690 msec
OK
Time taken: 16.249 seconds

[10]:

%%hive
SELECT * FROM persons;

OK
10      Kylan   Sexton  1975-02-28 00:00:00     4       black
4       Roth    Fry     1975-01-29 00:00:00     1       black
15      Hope    Silva   1970-07-01 00:00:00     5       blue
7       Driscoll        Klein   1970-10-05 00:00:00     5       blue
5       Zoe     Conway  1974-07-03 00:00:00     2       blue
13      Vivian  Crane   1970-08-27 00:00:00     5       gray
12      Hope    Coffey  1973-12-24 00:00:00     5       green
1       Vivian  Hamilton        1971-07-08 00:00:00     1       green
17      Chanda  Boyer   1973-04-01 00:00:00     4       green
2       Karen   Holcomb 1974-05-23 00:00:00     4       green
9       Merritt Guy     1974-10-17 00:00:00     4       indigo
11      Jordan  Estes   1969-12-07 00:00:00     4       indigo
3       Cody    Garrett 1973-04-22 00:00:00     1       orange
16      Ayanna  Jarvis  1974-02-11 00:00:00     5       orange
14      Clio    Noel    1972-12-12 00:00:00     100     red
8       Karyn   Diaz    1969-02-24 00:00:00     100     red
6       Gretchen        Kinney  1974-10-18 00:00:00     1       violet
18      Chadwick        Knight  1973-04-29 00:00:00     1       yellow
Time taken: 0.127 seconds, Fetched: 18 row(s)

DELETE#

DELETE FROM tablename [WHERE expression]