Table management in Apache HBase#

  • Last modified: May 25, 2022

Based on https://hbase.apache.org/book.html#quickstart

Cell magic %%hbase#

[1]:
from IPython.core.magic import Magics, cell_magic, line_magic, magics_class
from pexpect import spawn

TIMEOUT = 60  # seconds to wait for the shell prompt
PROG = "hbase shell"
# Raw string, so that \( and \d reach the regex engine unchanged.
PROMPT = [r"hbase\(main\):\d*:0>"]
QUIT = "exit"


@magics_class
class Magic(Magics):
    def __init__(self, shell):
        super().__init__(shell)
        # Start a single HBase shell and keep it open for the whole session.
        self.app = spawn(PROG, timeout=TIMEOUT)
        self.app.expect(PROMPT)

    @cell_magic
    def hbase(self, line, cell):
        # Send the cell to the shell one non-empty line at a time.
        cell_lines = [cell_line.strip() for cell_line in cell.split("\n")]
        cell_lines = [cell_line for cell_line in cell_lines if cell_line != ""]
        for cell_line in cell_lines:
            self.app.sendline(cell_line)
            self.app.expect(PROMPT, timeout=TIMEOUT)
            # Everything printed before the next prompt is the command output.
            output = self.app.before.decode()
            output = output.replace("\r\n", "\n")
            output = output.split("\n")
            output = [output_line.strip() for output_line in output]
            for output_line in output:
                # Skip the echo of the commands themselves.
                if output_line not in cell_lines:
                    print(output_line)

    @line_magic
    def quit(self, line):
        # Ask the HBase shell to terminate.
        self.app.sendline(QUIT)


def load_ipython_extension(ip):
    ip.register_magics(Magic(ip))


load_ipython_extension(ip=get_ipython())
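
The output handling inside the magic can be tried without a running HBase shell. The following is a minimal sketch: `clean_output` is a hypothetical helper (not part of the notebook), and `raw` is an assumed example of the text pexpect might capture before the next prompt. Unlike the magic above, this variant also drops blank lines.

```python
def clean_output(raw, cell_lines):
    """Return the shell output lines without echoed commands or blank lines."""
    lines = raw.replace("\r\n", "\n").split("\n")
    lines = [line.strip() for line in lines]
    return [line for line in lines if line and line not in cell_lines]


# Assumed capture for the `list` command; a real session may differ.
raw = (
    "list\r\n"
    "TABLE\r\n"
    "Contacts\r\n"
    "1 row(s)\r\n"
    "Took 0.0193 seconds\r\n"
    '=> ["Contacts"]\r\n'
)

for line in clean_output(raw, ["list"]):
    print(line)
```

Filtering by exact line match is simple, but it would also hide an output line that happens to be identical to a command in the cell.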

Table structure#

  • Typical tabular format:

ID        Name       Home phone        Office phone    Office address
-----------------------------------------------------------------------
1000      John Dole  1-425-000-0001    1-425-000-0002  1111 San Gabriel Dr.

  • Columnar format:

          Column family:Personal       Column family:Office
RowKey    Name       Home phone        Office phone    Office address
-----------------------------------------------------------------------
1000      John Dole  1-425-000-0001    1-425-000-0002  1111 San Gabriel Dr.
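
One way to picture the columnar layout is as nested dictionaries: row key, then column family, then column name. The sketch below is only an illustration of that idea, not how HBase stores data internally. The `flatten` helper is hypothetical, and the values are the ones written by the `put` commands later in this notebook.

```python
# Row key -> column family -> column name -> value.
contacts = {
    "1000": {
        "Personal": {"Name": "John Dole", "Phone": "1-425-000-0001"},
        "Office": {"Phone": "1-425-000-0002", "Address": "1111 San Gabriel Dr."},
    }
}


def flatten(table):
    """Yield (row_key, 'family:column', value) triples, one per cell."""
    for row_key, families in table.items():
        for family, columns in families.items():
            for column, value in columns.items():
                yield row_key, f"{family}:{column}", value


cells = list(flatten(contacts))
for cell in cells:
    print(cell)
```

Each triple corresponds to one cell, addressed as `family:column`, which is the same addressing the shell commands below use.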

Creating the table#

[2]:
%%hbase
create 'Contacts', 'Personal', 'Office'
Created table Contacts
Took 1.4929 seconds
=> Hbase::Table - Contacts

Listing tables#

[3]:
%%hbase
list
TABLE
Contacts
1 row(s)
Took 0.0193 seconds
=> ["Contacts"]

Inserting data manually#

put '<table name>', '<row key>', '<colfamily:colname>', '<value>'
[4]:
%%hbase
put 'Contacts', '1000', 'Personal:Name', 'John Dole'
put 'Contacts', '1000', 'Personal:Phone', '1-425-000-0001'
put 'Contacts', '1000', 'Office:Phone', '1-425-000-0002'
put 'Contacts', '1000', 'Office:Address', '1111 San Gabriel Dr.'
Took 0.1265 seconds

Took 0.0040 seconds

Took 0.0108 seconds

Took 0.0033 seconds
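
When a row has many cells, the `put` lines can be generated instead of typed. The sketch below assumes a hypothetical helper, `build_put_commands`, that only builds the command text; it does not talk to HBase and does not escape single quotes inside values.

```python
def build_put_commands(table, row_key, cells):
    """Return one HBase shell `put` line per (column, value) pair."""
    return [
        f"put '{table}', '{row_key}', '{column}', '{value}'"
        for column, value in cells.items()
    ]


# Same cells as in the cell above.
cells = {
    "Personal:Name": "John Dole",
    "Personal:Phone": "1-425-000-0001",
    "Office:Phone": "1-425-000-0002",
    "Office:Address": "1111 San Gabriel Dr.",
}

commands = build_put_commands("Contacts", "1000", cells)
for command in commands:
    print(command)
```

The printed lines could then be pasted into a `%%hbase` cell.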

Inspecting the table#

[5]:
%%hbase
scan 'Contacts'
ROW                   COLUMN+CELL
1000                 column=Office:Address, timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.
1000                 column=Office:Phone, timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002
1000                 column=Personal:Name, timestamp=2022-05-25T19:16:33.389Z, value=John Dole
1000                 column=Personal:Phone, timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001
1 row(s)
Took 0.0368 seconds
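
The `scan` output lists one cell per entry, so a single row key appears several times. A minimal sketch of grouping those entries by row follows. `parse_scan` and `CELL_RE` are hypothetical names, and the parser assumes each cell sits on one line (the terminal may wrap long lines, which would have to be rejoined first) and that values contain no line breaks.

```python
import re

# One scan entry: row key, then column, timestamp and value.
CELL_RE = re.compile(
    r"^(?P<row>\S+)\s+column=(?P<column>[^,]+), "
    r"timestamp=(?P<timestamp>[^,]+), value=(?P<value>.*)$"
)


def parse_scan(lines):
    """Group scan entries as {row_key: {column: value}}; skip other lines."""
    rows = {}
    for line in lines:
        match = CELL_RE.match(line.strip())
        if match:
            rows.setdefault(match["row"], {})[match["column"]] = match["value"]
    return rows


sample = [
    "ROW                   COLUMN+CELL",
    "1000  column=Office:Address, timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.",
    "1000  column=Office:Phone, timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002",
    "1000  column=Personal:Name, timestamp=2022-05-25T19:16:33.389Z, value=John Dole",
    "1000  column=Personal:Phone, timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001",
    "1 row(s)",
]

rows = parse_scan(sample)
print(rows)
```

The result has a single key, `'1000'`, which agrees with the `1 row(s)` summary: four cells, one row.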

Counting rows#

[6]:
%%hbase
count 'Contacts'
1 row(s)
Took 0.0093 seconds
=> 1

Retrieving the contents of a row#

[7]:
%%hbase
get 'Contacts', '1000'
COLUMN                CELL
Office:Address       timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.
Office:Phone         timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002
Personal:Name        timestamp=2022-05-25T19:16:33.389Z, value=John Dole
Personal:Phone       timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001
1 row(s)
Took 0.0174 seconds

Reading a specific column#

[8]:
%%hbase
get 'Contacts', '1000', {COLUMN=>'Personal:Name'}
COLUMN                CELL
Personal:Name        timestamp=2022-05-25T19:16:33.389Z, value=John Dole
1 row(s)
Took 0.0043 seconds

Checking whether a table exists#

[9]:
%%hbase
exists 'Contacts'
Table Contacts does exist
Took 0.0066 seconds
=> true

Disabling the table#

[10]:
%%hbase
disable 'Contacts'
Took 1.1455 seconds

[11]:
%%hbase
is_disabled 'Contacts'
true
Took 0.0056 seconds
=> 1

Disabling all tables that match a pattern#

disable_all takes a regular expression and disables every table whose name matches it:

disable_all 'C.*'
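
Before running a pattern-based command, it can help to check which names a pattern such as `'C.*'` would select. The sketch below uses Python's `re` module as a stand-in. HBase evaluates the pattern with Java regular expressions, so treat this as an approximation. Matching against the whole table name is an assumption here, and both `matching_tables` and the table list are hypothetical.

```python
import re


def matching_tables(pattern, table_names):
    """Return the names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in table_names if regex.fullmatch(name)]


# Hypothetical table names, for illustration only.
tables = ["Contacts", "Customers", "Orders", "contacts_backup"]

selected = matching_tables("C.*", tables)
print(selected)  # ['Contacts', 'Customers']
```

The pattern is case-sensitive, so `contacts_backup` is not selected.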

Enabling the table#

[12]:
%%hbase
enable 'Contacts'
Took 0.6287 seconds

[13]:
%%hbase
scan 'Contacts'
ROW                   COLUMN+CELL
1000                 column=Office:Address, timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.
1000                 column=Office:Phone, timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002
1000                 column=Personal:Name, timestamp=2022-05-25T19:16:33.389Z, value=John Dole
1000                 column=Personal:Phone, timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001
1 row(s)
Took 0.0106 seconds

[14]:
%%hbase
is_enabled 'Contacts'
true
Took 0.0143 seconds
=> true

Describing a table#

[15]:
%%hbase
describe 'Contacts'
Table Contacts is ENABLED
Contacts
COLUMN FAMILIES DESCRIPTION
{NAME => 'Office', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true',
BLOCKSIZE => '65536'}

{NAME => 'Personal', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true',
BLOCKSIZE => '65536'}

2 row(s)

QUOTAS
0 row(s)
Took 0.0502 seconds
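
Each column family is described as a list of `KEY => 'value'` pairs, which is easy to turn into a dictionary for inspection. The following is a minimal sketch. `parse_family` and `ATTR_RE` are hypothetical names, the descriptor is an abbreviated copy of the `Office` entry above, and the pattern assumes values are single-quoted and contain no quotes themselves.

```python
import re

# Matches pairs such as: VERSIONS => '1'
ATTR_RE = re.compile(r"(\w+) => '([^']*)'")


def parse_family(descriptor):
    """Return the attributes of one column family descriptor as a dict."""
    return dict(ATTR_RE.findall(descriptor))


# Abbreviated copy of the 'Office' descriptor shown above.
descriptor = (
    "{NAME => 'Office', VERSIONS => '1', TTL => 'FOREVER', "
    "COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}"
)

family = parse_family(descriptor)
print(family["NAME"], family["VERSIONS"], int(family["BLOCKSIZE"]))
```

All values come back as strings, so numeric attributes such as `BLOCKSIZE` need an explicit conversion.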

Dropping the table#

[16]:
%%hbase
disable 'Contacts'
drop 'Contacts'
Took 0.3249 seconds

Took 0.1268 seconds

[17]:
%%hbase
list
TABLE
0 row(s)
Took 0.0025 seconds
=> []

Dropping all tables that match a pattern#

drop_all takes a regular expression and drops every table whose name matches it:

drop_all 'C.*'

Closing the shell#

Use:

hbase(main):025:0> exit
[4]:
%quit