Table management in Apache HBase
Last modified: May 25, 2022
Based on https://hbase.apache.org/book.html#quickstart
Cell magic %%hbase
[1]:
from IPython.core.magic import Magics, cell_magic, line_magic, magics_class
from pexpect import spawn

TIMEOUT = 60  # seconds to wait for the shell prompt
PROG = "hbase shell"
# Raw string so the regex escapes reach pexpect unchanged.
PROMPT = [r"hbase\(main\):\d*:0>"]
QUIT = "exit"


@magics_class
class Magic(Magics):
    def __init__(self, shell):
        super().__init__(shell)
        # Start one HBase shell and wait for its first prompt.
        self.app = spawn(PROG, timeout=TIMEOUT)
        self.app.expect(PROMPT)

    @cell_magic
    def hbase(self, line, cell):
        # Drop blank lines; each remaining line is sent as one command.
        cell_lines = [cell_line.strip() for cell_line in cell.split("\n")]
        cell_lines = [cell_line for cell_line in cell_lines if cell_line != ""]
        for cell_line in cell_lines:
            self.app.sendline(cell_line)
            self.app.expect(PROMPT, timeout=TIMEOUT)
            # Everything printed before the next prompt is the command's output.
            output = self.app.before.decode()
            output = output.replace("\r\n", "\n")
            output = output.split("\n")
            output = [output_line.strip() for output_line in output]
            for output_line in output:
                # Skip the echo of the command itself.
                if output_line not in cell_lines:
                    print(output_line)
        return None

    @line_magic
    def quit(self, line):
        # Ask the HBase shell to exit.
        self.app.sendline(QUIT)


def load_ipython_extension(ip):
    ip.register_magics(Magic(ip))


load_ipython_extension(ip=get_ipython())
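The magic depends on pexpect recognising the shell prompt after every command. The following is a minimal sketch, using only Python's re module, of how that prompt pattern behaves. The sample prompt strings are illustrative assumptions, and the last one (nesting level 1) is a hypothetical case included only to show a string the pattern rejects.

```python
import re

# Same pattern the magic hands to pexpect, written as a raw string.
PROMPT = r"hbase\(main\):\d*:0>"

# Illustrative prompt strings (assumed); the counter grows with each command.
samples = ["hbase(main):001:0>", "hbase(main):025:0>", "hbase(main):025:1>"]

# True where the pattern is found in the sample.
matches = [bool(re.search(PROMPT, sample)) for sample in samples]
print(matches)
```

Because the pattern ends in `:0>`, the magic only resumes once the shell is back at the top-level prompt.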
Table structure
Typical tabular format:
ID    Name      Home phone      Office phone    Office address
--------------------------------------------------------------------
1000  John Doe  1-425-000-0001  1-425-000-0002  1111 San Gabriel Dr
Columnar format:
        Column family: Personal   Column family: Office
RowKey  Name      Home phone      Office phone    Office address
---------------------------------------------------------------------------
1000    John Doe  1-425-000-0001  1-425-000-0002  1111 San Gabriel Dr
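One way to picture the columnar layout is as nested maps: row key, then column family, then column qualifier, then value. The sketch below is a conceptual model in plain Python, not an HBase API; the `cells` helper is a hypothetical name introduced for illustration.

```python
# Conceptual model only: row key -> column family -> qualifier -> value.
contacts = {
    "1000": {
        "Personal": {"Name": "John Doe", "Home phone": "1-425-000-0001"},
        "Office": {
            "Office phone": "1-425-000-0002",
            "Office address": "1111 San Gabriel Dr",
        },
    }
}


def cells(table):
    """Flatten the nested model into (row, 'family:qualifier', value) triples."""
    return [
        (row, f"{family}:{qualifier}", value)
        for row, families in table.items()
        for family, columns in families.items()
        for qualifier, value in columns.items()
    ]


for cell in cells(contacts):
    print(cell)
```

Each triple corresponds to one cell, which is the unit the `put` and `scan` commands in the following sections operate on.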
Creating the table
[2]:
%%hbase
create 'Contacts', 'Personal', 'Office'
Created table Contacts
Took 1.4929 seconds
=> Hbase::Table - Contacts
Listing the tables
[3]:
%%hbase
list
TABLE
Contacts
1 row(s)
Took 0.0193 seconds
=> ["Contacts"]
Inserting data manually
put '<table name>', 'row1', '<colfamily:colname>', '<value>'
[4]:
%%hbase
put 'Contacts', '1000', 'Personal:Name', 'John Dole'
put 'Contacts', '1000', 'Personal:Phone', '1-425-000-0001'
put 'Contacts', '1000', 'Office:Phone', '1-425-000-0002'
put 'Contacts', '1000', 'Office:Address', '1111 San Gabriel Dr.'
Took 0.1265 seconds
Took 0.0040 seconds
Took 0.0108 seconds
Took 0.0033 seconds
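Since every cell needs its own `put`, the commands for a row can be generated from a mapping. This is a minimal sketch; `put_commands` is a hypothetical helper, and it assumes that none of the values contain a single quote (no escaping is attempted).

```python
def put_commands(table, row_key, values):
    """Build one HBase shell `put` line per 'family:qualifier' entry."""
    return [
        f"put '{table}', '{row_key}', '{column}', '{value}'"
        for column, value in values.items()
    ]


row = {
    "Personal:Name": "John Dole",
    "Personal:Phone": "1-425-000-0001",
    "Office:Phone": "1-425-000-0002",
    "Office:Address": "1111 San Gabriel Dr.",
}

commands = put_commands("Contacts", "1000", row)
for command in commands:
    print(command)
```

The printed lines can be pasted into a `%%hbase` cell or into the shell itself.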
Inspecting the table
[5]:
%%hbase
scan 'Contacts'
ROW   COLUMN+CELL
1000  column=Office:Address, timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.
1000  column=Office:Phone, timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002
1000  column=Personal:Name, timestamp=2022-05-25T19:16:33.389Z, value=John Dole
1000  column=Personal:Phone, timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001
1 row(s)
Took 0.0368 seconds
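The `scan` output lists one cell per entry, so it can be folded back into a per-row dictionary. The sketch below assumes each cell occupies a single line (wrapped terminal lines would have to be joined first) and that values contain no line breaks; `parse_scan` is a hypothetical helper, not part of HBase.

```python
import re

# row key, column, timestamp and value of one scan line.
CELL = re.compile(r"^(\S+)\s+column=([^,]+), timestamp=([^,]+), value=(.*)$")


def parse_scan(lines):
    """Collect scan lines into {row key: {'family:qualifier': value}}."""
    rows = {}
    for line in lines:
        match = CELL.match(line.strip())
        if match:  # header and summary lines do not match and are skipped
            row, column, _timestamp, value = match.groups()
            rows.setdefault(row, {})[column] = value
    return rows


sample = [
    "ROW COLUMN+CELL",
    "1000 column=Office:Address, timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.",
    "1000 column=Personal:Name, timestamp=2022-05-25T19:16:33.389Z, value=John Dole",
    "1 row(s)",
]

table = parse_scan(sample)
print(table)
```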
Counting rows
[6]:
%%hbase
count 'Contacts'
1 row(s)
Took 0.0093 seconds
=> 1
Retrieving the contents of a row
[7]:
%%hbase
get 'Contacts', '1000'
COLUMN          CELL
Office:Address  timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.
Office:Phone    timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002
Personal:Name   timestamp=2022-05-25T19:16:33.389Z, value=John Dole
Personal:Phone  timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001
1 row(s)
Took 0.0174 seconds
Reading a specific column
[8]:
%%hbase
get 'Contacts', '1000', {COLUMN=>'Personal:Name'}
COLUMN CELL
Personal:Name timestamp=2022-05-25T19:16:33.389Z, value=John Dole
1 row(s)
Took 0.0043 seconds
Checking whether a table exists
[9]:
%%hbase
exists 'Contacts'
Table Contacts does exist
Took 0.0066 seconds
=> true
Disabling the table
[10]:
%%hbase
disable 'Contacts'
Took 1.1455 seconds
[11]:
%%hbase
is_disabled 'Contacts'
true
Took 0.0056 seconds
=> 1
Disabling all tables that match a pattern
disable_all 'C.*'
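The argument is a regular expression over table names. The sketch below previews which names a pattern such as 'C.*' would select, using Python's re module as a stand-in for the Java regex engine that HBase uses; it assumes the pattern must match the whole table name, and the table names are hypothetical.

```python
import re


def matching_tables(pattern, table_names):
    """Return the names that the pattern matches in full (assumed semantics)."""
    return [name for name in table_names if re.fullmatch(pattern, name)]


tables = ["Contacts", "Customers", "Orders"]  # hypothetical table names
selected = matching_tables("C.*", tables)
print(selected)
```

Previewing the selection this way is useful because the command acts on every matching table at once.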
Enabling the table
[12]:
%%hbase
enable 'Contacts'
Took 0.6287 seconds
[13]:
%%hbase
scan 'Contacts'
ROW   COLUMN+CELL
1000  column=Office:Address, timestamp=2022-05-25T19:16:33.687Z, value=1111 San Gabriel Dr.
1000  column=Office:Phone, timestamp=2022-05-25T19:16:33.585Z, value=1-425-000-0002
1000  column=Personal:Name, timestamp=2022-05-25T19:16:33.389Z, value=John Dole
1000  column=Personal:Phone, timestamp=2022-05-25T19:16:33.480Z, value=1-425-000-0001
1 row(s)
Took 0.0106 seconds
[14]:
%%hbase
is_enabled 'Contacts'
true
Took 0.0143 seconds
=> true
Describing a table
[15]:
%%hbase
describe 'Contacts'
Table Contacts is ENABLED
Contacts
COLUMN FAMILIES DESCRIPTION
{NAME => 'Office', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',
BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'Personal', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',
KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false',
IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE',
BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
QUOTAS
0 row(s)
Took 0.0502 seconds
Dropping the table
[16]:
%%hbase
disable 'Contacts'
drop 'Contacts'
Took 0.3249 seconds
Took 0.1268 seconds
[17]:
%%hbase
list
TABLE
0 row(s)
Took 0.0025 seconds
=> []
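The cell above disables the table before dropping it because HBase refuses to drop a table that is still enabled. The toy model below sketches that ordering rule in plain Python; `TableState` is a hypothetical class for illustration and does not talk to HBase.

```python
class TableState:
    """Toy model of a table's lifecycle; not an HBase API."""

    def __init__(self, name):
        self.name = name
        self.enabled = True  # tables start out enabled in this model
        self.exists = True

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

    def drop(self):
        # Mirror the rule: an enabled table cannot be dropped.
        if self.enabled:
            raise RuntimeError(f"{self.name} must be disabled before it is dropped")
        self.exists = False


table = TableState("Contacts")
try:
    table.drop()  # rejected: the table is still enabled
except RuntimeError as error:
    print(error)

table.disable()
table.drop()  # accepted after disabling
print(table.exists)
```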
Dropping all tables that match a pattern
drop_all 'C.*'
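Closing the shell
In an interactive session, type exit at the prompt:
hbase(main):025:0> exit
From the notebook, the %quit line magic defined in the first cell sends the same command: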
[4]:
%quit