{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Consultas de Selección: Cruce de Tablas, JOINs\n", "\n", "## Conexión a la Base de Datos" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import psycopg2\n", "import sqlalchemy\n", "import matplotlib as plt\n", "\n", "%matplotlib inline\n", "\n", "from sqlalchemy import create_engine\n", "\n", "POSTGRES_ADDRESS = 'localhost' # Este es el servidor, puede ser una IP\n", "POSTGRES_PORT = '5432'\n", "POSTGRES_USERNAME = 'postgres' \n", "POSTGRES_PASSWORD = '1234' \n", "POSTGRES_DBNAME = 'dvdrental' \n", "\n", "# Ahora se configura la cadena de conexión.\n", "# Esta es una cadena que contiene los parámetros necesarios para establecer una\n", "# conexión con posgres.\n", "postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,\n", " password=POSTGRES_PASSWORD,\n", " ipaddress=POSTGRES_ADDRESS,\n", " port=POSTGRES_PORT,\n", " dbname=POSTGRES_DBNAME))\n", "# A continuación se crea la conexión\n", "cnx = create_engine(postgres_str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cruce de Tablas\n", "\n", "Existen varios tipos de Joins (cruce, unión o combinación) en SQL. En PostgreSQL, se implementa la unión interna (inner join), la unión izquierda (left join), la unión derecha (right join) y la unión externa completa (full outer join).\n", "\n", "La unión de PostgreSQL se utiliza para combinar columnas de una ( autounión - self join ) o más tablas según los valores de las columnas comunes entre tablas relacionadas. Las columnas comunes suelen ser las columnas de clave principal de la primera tabla (origen) y las columnas de clave externa o foránea de la segunda tabla (destino)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Suponga que tenemos dos canastas\n", "sql = '''\n", "DROP TABLE IF EXISTS canastaX;\n", "CREATE TABLE canastaX (\n", " x INT PRIMARY KEY,\n", " frutaX VARCHAR (100) NOT NULL\n", ");\n", "\n", "DROP TABLE IF EXISTS canastaY;\n", "CREATE TABLE canastaY (\n", " y INT PRIMARY KEY,\n", " frutaY VARCHAR (100) NOT NULL\n", ");\n", "\n", "INSERT INTO canastaX (x, frutaX)\n", "VALUES\n", " (1, 'Manzana'),\n", " (2, 'Naranja'),\n", " (3, 'Banano'),\n", " (4, 'Uva');\n", "\n", "INSERT INTO canastaY (y, frutaY)\n", "VALUES\n", " (1, 'Naranja'),\n", " (2, 'Manzana'),\n", " (3, 'Melón'),\n", " (4, 'Pera');\n", "\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xfrutax
01Manzana
12Naranja
23Banano
34Uva
\n", "
" ], "text/plain": [ " x frutax\n", "0 1 Manzana\n", "1 2 Naranja\n", "2 3 Banano\n", "3 4 Uva" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Verificamos que las tablas tengan datos\n", "sql = ''' \n", "SELECT * FROM canastaX;\n", "'''\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yfrutay
01Naranja
12Manzana
23Melón
34Pera
\n", "
" ], "text/plain": [ " y frutay\n", "0 1 Naranja\n", "1 2 Manzana\n", "2 3 Melón\n", "3 4 Pera" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Verificamos que las tablas tengan datos\n", "sql = ''' \n", "SELECT * FROM canastaY;\n", "'''\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unión interna - Inner Join\n", "\n", "La combinación interna examina cada fila de la primera tabla (canastaX), compara el valor de la columna frutaX con el valor de la columna frutaY de cada fila de la segunda tabla (canastaY). Si estos valores son iguales, la combinación interna crea una nueva fila que contiene columnas de ambas tablas y agrega esta nueva fila al conjunto de resultados. En teoría de conjuntos esta es la intersección de canastaX con canastaY.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xfrutaxyfrutay
01Manzana2Manzana
12Naranja1Naranja
\n", "
" ], "text/plain": [ " x frutax y frutay\n", "0 1 Manzana 2 Manzana\n", "1 2 Naranja 1 Naranja" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "SELECT x, frutax, y, frutay\n", "FROM canastaX\n", "INNER JOIN canastaY\n", " ON frutax = frutay;\n", "'''\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
customer_idfirst_namelast_nameamountpayment_date
0416JefferyPinson2.992007-02-14 21:21:59.996577
1516ElmerNoe4.992007-02-14 21:23:39.996577
2239MinnieRomero4.992007-02-14 21:29:00.996577
3592TerranceRoush6.992007-02-14 21:41:12.996577
449JoyceEdwards0.992007-02-14 21:44:52.996577
..................
14591163CathySpencer0.002007-05-14 13:44:29.996577
14592168ReginaBerry0.992007-05-14 13:44:29.996577
14593175AnnetteOlson3.982007-05-14 13:44:29.996577
14594175AnnetteOlson0.002007-05-14 13:44:29.996577
14595178MarionSnyder4.992007-05-14 13:44:29.996577
\n", "

14596 rows × 5 columns

\n", "
" ], "text/plain": [ " customer_id first_name last_name amount payment_date\n", "0 416 Jeffery Pinson 2.99 2007-02-14 21:21:59.996577\n", "1 516 Elmer Noe 4.99 2007-02-14 21:23:39.996577\n", "2 239 Minnie Romero 4.99 2007-02-14 21:29:00.996577\n", "3 592 Terrance Roush 6.99 2007-02-14 21:41:12.996577\n", "4 49 Joyce Edwards 0.99 2007-02-14 21:44:52.996577\n", "... ... ... ... ... ...\n", "14591 163 Cathy Spencer 0.00 2007-05-14 13:44:29.996577\n", "14592 168 Regina Berry 0.99 2007-05-14 13:44:29.996577\n", "14593 175 Annette Olson 3.98 2007-05-14 13:44:29.996577\n", "14594 175 Annette Olson 0.00 2007-05-14 13:44:29.996577\n", "14595 178 Marion Snyder 4.99 2007-05-14 13:44:29.996577\n", "\n", "[14596 rows x 5 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Con la base de datos de alquiler de peliculas\n", "sql = ''' \n", "SELECT\n", "\tcustomer.customer_id,\n", "\tfirst_name,\n", "\tlast_name,\n", "\tamount,\n", "\tpayment_date\n", "FROM\n", "\tcustomer\n", "INNER JOIN payment \n", " ON payment.customer_id = customer.customer_id\n", "ORDER BY payment_date;\n", "'''\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
customer_idfirst_namelast_nameamountpayment_date
0416JefferyPinson2.992007-02-14 21:21:59.996577
1516ElmerNoe4.992007-02-14 21:23:39.996577
2239MinnieRomero4.992007-02-14 21:29:00.996577
3592TerranceRoush6.992007-02-14 21:41:12.996577
449JoyceEdwards0.992007-02-14 21:44:52.996577
..................
14591163CathySpencer0.002007-05-14 13:44:29.996577
14592168ReginaBerry0.992007-05-14 13:44:29.996577
14593175AnnetteOlson3.982007-05-14 13:44:29.996577
14594175AnnetteOlson0.002007-05-14 13:44:29.996577
14595178MarionSnyder4.992007-05-14 13:44:29.996577
\n", "

14596 rows × 5 columns

\n", "
" ], "text/plain": [ " customer_id first_name last_name amount payment_date\n", "0 416 Jeffery Pinson 2.99 2007-02-14 21:21:59.996577\n", "1 516 Elmer Noe 4.99 2007-02-14 21:23:39.996577\n", "2 239 Minnie Romero 4.99 2007-02-14 21:29:00.996577\n", "3 592 Terrance Roush 6.99 2007-02-14 21:41:12.996577\n", "4 49 Joyce Edwards 0.99 2007-02-14 21:44:52.996577\n", "... ... ... ... ... ...\n", "14591 163 Cathy Spencer 0.00 2007-05-14 13:44:29.996577\n", "14592 168 Regina Berry 0.99 2007-05-14 13:44:29.996577\n", "14593 175 Annette Olson 3.98 2007-05-14 13:44:29.996577\n", "14594 175 Annette Olson 0.00 2007-05-14 13:44:29.996577\n", "14595 178 Marion Snyder 4.99 2007-05-14 13:44:29.996577\n", "\n", "[14596 rows x 5 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dado que ambas tablas tienen la misma columna customer_id, puede usar la sintaxis USING.\n", "sql = ''' \n", "SELECT\n", "\tcustomer_id,\n", "\tfirst_name,\n", "\tlast_name,\n", "\tamount,\n", "\tpayment_date\n", "FROM\n", "\tcustomer\n", "INNER JOIN payment USING(customer_id)\n", "ORDER BY payment_date;\n", "'''\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
customer_idcustomer_first_namecustomer_last_namestaff_first_namestaff_last_nameamountpayment_date
0416JefferyPinsonJonStephens2.992007-02-14 21:21:59.996577
1516ElmerNoeJonStephens4.992007-02-14 21:23:39.996577
2239MinnieRomeroMikeHillyer4.992007-02-14 21:29:00.996577
3592TerranceRoushJonStephens6.992007-02-14 21:41:12.996577
449JoyceEdwardsMikeHillyer0.992007-02-14 21:44:52.996577
........................
14591163CathySpencerMikeHillyer0.002007-05-14 13:44:29.996577
14592168ReginaBerryMikeHillyer0.992007-05-14 13:44:29.996577
14593175AnnetteOlsonJonStephens3.982007-05-14 13:44:29.996577
14594175AnnetteOlsonJonStephens0.002007-05-14 13:44:29.996577
14595178MarionSnyderMikeHillyer4.992007-05-14 13:44:29.996577
\n", "

14596 rows × 7 columns

\n", "
" ], "text/plain": [ " customer_id customer_first_name customer_last_name staff_first_name \\\n", "0 416 Jeffery Pinson Jon \n", "1 516 Elmer Noe Jon \n", "2 239 Minnie Romero Mike \n", "3 592 Terrance Roush Jon \n", "4 49 Joyce Edwards Mike \n", "... ... ... ... ... \n", "14591 163 Cathy Spencer Mike \n", "14592 168 Regina Berry Mike \n", "14593 175 Annette Olson Jon \n", "14594 175 Annette Olson Jon \n", "14595 178 Marion Snyder Mike \n", "\n", " staff_last_name amount payment_date \n", "0 Stephens 2.99 2007-02-14 21:21:59.996577 \n", "1 Stephens 4.99 2007-02-14 21:23:39.996577 \n", "2 Hillyer 4.99 2007-02-14 21:29:00.996577 \n", "3 Stephens 6.99 2007-02-14 21:41:12.996577 \n", "4 Hillyer 0.99 2007-02-14 21:44:52.996577 \n", "... ... ... ... \n", "14591 Hillyer 0.00 2007-05-14 13:44:29.996577 \n", "14592 Hillyer 0.99 2007-05-14 13:44:29.996577 \n", "14593 Stephens 3.98 2007-05-14 13:44:29.996577 \n", "14594 Stephens 0.00 2007-05-14 13:44:29.996577 \n", "14595 Hillyer 4.99 2007-05-14 13:44:29.996577 \n", "\n", "[14596 rows x 7 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Dado que ambas tablas tienen la misma columna customer_id, puede usar la sintaxis USING.\n", "sql = ''' \n", "SELECT\n", "\tc.customer_id,\n", "\tc.first_name customer_first_name,\n", "\tc.last_name customer_last_name,\n", "\ts.first_name staff_first_name,\n", "\ts.last_name staff_last_name,\n", "\tamount,\n", "\tpayment_date\n", "FROM\n", "\tcustomer c\n", "INNER JOIN payment p \n", " ON p.customer_id = c.customer_id\n", "INNER JOIN staff s \n", " ON p.staff_id = s.staff_id\n", "ORDER BY payment_date;\n", "'''\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unión a la izquierda - Left Join\n", "\n", "La siguiente consulta usa la cláusula de combinación izquierda para unir la tabla canastaX con la canastaY. En el contexto de unión a la izquierda, la primera tabla se denomina tabla izquierda y la segunda tabla se denomina tabla derecha. \n", "\n", "Se mapean todos los elementos de la tabla izquierda con los que se puedan de la tabla derecha, el resto se mapea con NULL.\n", "\n", "El siguiente diagrama de Venn ilustra la unión izquierda:\n", "\n", "![imagen.png](https://sp.postgresqltutorial.com/wp-content/uploads/2018/12/PostgreSQL-Join-Left-Join.png)\n", "\n", " [Fuente de la imagen](https://www.postgresqltutorial.com/postgresql-joins/).\n", " \n", "* Nota: LEFT JOIN es el mismo que LEFT OUTER JOIN, puede usarlos indistintamente." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xfrutaxyfrutay
01Manzana2.0Manzana
12Naranja1.0Naranja
23BananoNaNNone
34UvaNaNNone
\n", "
" ], "text/plain": [ " x frutax y frutay\n", "0 1 Manzana 2.0 Manzana\n", "1 2 Naranja 1.0 Naranja\n", "2 3 Banano NaN None\n", "3 4 Uva NaN None" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "SELECT x, frutaX, y, frutaY\n", "FROM canastaX\n", "LEFT JOIN canastaY \n", " ON frutaX = frutaY;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
film_idtitleinventory_id
013Ali Forever67.0
113Ali Forever68.0
213Ali Forever69.0
313Ali Forever70.0
414Alice FantasiaNaN
515Alien Center72.0
615Alien Center73.0
715Alien Center74.0
815Alien Center75.0
915Alien Center76.0
1015Alien Center71.0
\n", "
" ], "text/plain": [ " film_id title inventory_id\n", "0 13 Ali Forever 67.0\n", "1 13 Ali Forever 68.0\n", "2 13 Ali Forever 69.0\n", "3 13 Ali Forever 70.0\n", "4 14 Alice Fantasia NaN\n", "5 15 Alien Center 72.0\n", "6 15 Alien Center 73.0\n", "7 15 Alien Center 74.0\n", "8 15 Alien Center 75.0\n", "9 15 Alien Center 76.0\n", "10 15 Alien Center 71.0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Películas que están en inventario\n", "sql = ''' \n", "SELECT\n", "\tfilm.film_id,\n", "\ttitle,\n", "\tinventory_id\n", "FROM\n", "\tfilm\n", "LEFT JOIN inventory \n", " ON inventory.film_id = film.film_id\n", "WHERE film.film_id in (13,14,15)\n", "ORDER BY title;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
film_idtitleinventory_id
014Alice FantasiaNone
133Apollo TeenNone
236Argonauts TownNone
338Ark RidgemontNone
441Arsenic IndependenceNone
587Boondock BallroomNone
6108Butch PantherNone
7128Catch AmistadNone
8144Chinatown GladiatorNone
9148Chocolate DuckNone
10171Commandments ExpressNone
11192Crossing DivorceNone
12195Crowds TelemarkNone
13198Crystal BreakingNone
14217Dazed PunkNone
15221Deliverance MulhollandNone
16318Firehouse VietnamNone
17325Floats GardenNone
18332Frankenstein StrangerNone
19359Gladiator WestwardNone
20386Gump DateNone
21404Hate HandicapNone
22419Hocus FridaNone
23495Kentuckian GiantNone
24497Kill BrotherhoodNone
25607Muppet MileNone
26642Order BetrayedNone
27669Pearl DestinyNone
28671Perdition FargoNone
29701Psycho ShrunkNone
30712Raiders AntitrustNone
31713Rainbow ShockNone
32742Roof ChampionNone
33801Sister FreddyNone
34802Sky MiracleNone
35860Suicides SilenceNone
36874Tadpole ParkNone
37909Treasure CommandNone
38943Villain DesperateNone
39950Volume HouseNone
40954Wake JawsNone
41955Walls ArtistNone
\n", "
" ], "text/plain": [ " film_id title inventory_id\n", "0 14 Alice Fantasia None\n", "1 33 Apollo Teen None\n", "2 36 Argonauts Town None\n", "3 38 Ark Ridgemont None\n", "4 41 Arsenic Independence None\n", "5 87 Boondock Ballroom None\n", "6 108 Butch Panther None\n", "7 128 Catch Amistad None\n", "8 144 Chinatown Gladiator None\n", "9 148 Chocolate Duck None\n", "10 171 Commandments Express None\n", "11 192 Crossing Divorce None\n", "12 195 Crowds Telemark None\n", "13 198 Crystal Breaking None\n", "14 217 Dazed Punk None\n", "15 221 Deliverance Mulholland None\n", "16 318 Firehouse Vietnam None\n", "17 325 Floats Garden None\n", "18 332 Frankenstein Stranger None\n", "19 359 Gladiator Westward None\n", "20 386 Gump Date None\n", "21 404 Hate Handicap None\n", "22 419 Hocus Frida None\n", "23 495 Kentuckian Giant None\n", "24 497 Kill Brotherhood None\n", "25 607 Muppet Mile None\n", "26 642 Order Betrayed None\n", "27 669 Pearl Destiny None\n", "28 671 Perdition Fargo None\n", "29 701 Psycho Shrunk None\n", "30 712 Raiders Antitrust None\n", "31 713 Rainbow Shock None\n", "32 742 Roof Champion None\n", "33 801 Sister Freddy None\n", "34 802 Sky Miracle None\n", "35 860 Suicides Silence None\n", "36 874 Tadpole Park None\n", "37 909 Treasure Command None\n", "38 943 Villain Desperate None\n", "39 950 Volume House None\n", "40 954 Wake Jaws None\n", "41 955 Walls Artist None" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Películas que NO están en inventario\n", "sql = ''' \n", "SELECT\n", "\tf.film_id,\n", "\ttitle,\n", "\tinventory_id\n", "FROM\n", "\tfilm f\n", "LEFT JOIN inventory i USING (film_id)\n", "WHERE i.film_id IS NULL\n", "ORDER BY title;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unión a la derecha - Right Join\n", "\n", "Es una versión inversa de la combinación izquierda. La combinación de la derecha comienza a realizar el mapeo de derecha a izquierda." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xfrutaxyfrutay
01Manzana2Manzana
12Naranja1Naranja
\n", "
" ], "text/plain": [ " x frutax y frutay\n", "0 1 Manzana 2 Manzana\n", "1 2 Naranja 1 Naranja" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "SELECT x, frutaX, y, frutaY\n", "FROM canastaX\n", "RIGTH JOIN canastaY \n", " ON frutaX = frutaY;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unión Externa Completa\n", "\n", "La combinación externa completa o la combinación completa devuelve un conjunto de resultados que contiene todas las filas de las tablas izquierda y derecha, con las filas coincidentes de ambos lados si están disponibles. En caso de que no haya coincidencias, las columnas de la tabla se llenarán con NULL.\n", "\n", "![imagen.png](https://sp.postgresqltutorial.com/wp-content/uploads/2018/12/PostgreSQL-Join-Full-Outer-Join.png)\n", "\n", "[Fuente](https://www.postgresqltutorial.com/postgresql-joins/)\n", "\n", "[Más Ejemplos](https://www.postgresqltutorial.com/postgresql-full-outer-join/)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xfrutaxyfrutay
01.0Manzana2.0Manzana
12.0Naranja1.0Naranja
23.0BananoNaNNone
34.0UvaNaNNone
4NaNNone4.0Pera
5NaNNone3.0Melón
\n", "
" ], "text/plain": [ " x frutax y frutay\n", "0 1.0 Manzana 2.0 Manzana\n", "1 2.0 Naranja 1.0 Naranja\n", "2 3.0 Banano NaN None\n", "3 4.0 Uva NaN None\n", "4 NaN None 4.0 Pera\n", "5 NaN None 3.0 Melón" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "SELECT x, frutaX, y, frutaY\n", "FROM canastaX\n", "FULL OUTER JOIN canastaY \n", " ON frutaX = frutaY;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "El siguiente diagrama de Venn ilustra la combinación externa completa que devuelve filas de una tabla que no tienen las filas correspondientes en la otra tabla\n", "\n", "![imagen.png](https://sp.postgresqltutorial.com/wp-content/uploads/2018/12/PostgreSQL-Join-Full-Outer-Join-with-Where.png)\n", "\n", "[Fuente](https://www.postgresqltutorial.com/postgresql-joins/)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
xfrutaxyfrutay
03.0BananoNaNNone
14.0UvaNaNNone
2NaNNone4.0Pera
3NaNNone3.0Melón
\n", "
" ], "text/plain": [ " x frutax y frutay\n", "0 3.0 Banano NaN None\n", "1 4.0 Uva NaN None\n", "2 NaN None 4.0 Pera\n", "3 NaN None 3.0 Melón" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Para consultar las filas en una tabla que no tienen filas \n", "# coincidentes en la otra, usa la combinación completa\n", "\n", "sql = ''' \n", "SELECT x, frutaX, y, frutaY\n", "FROM canastaX\n", "FULL OUTER JOIN canastaY \n", " ON frutaX = frutaY\n", "WHERE x IS NULL OR y IS NULL;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Autocombinación\n", "\n", "Cuando unes una tabla a sí misma (también conocida como autounión), necesitas usar alias de tabla. En el siguiente ejemplo un empleado puede ser el supervisor, gerente o administrador de otro." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "DROP TABLE IF EXISTS employee;\n", "\n", "CREATE TABLE employee (\n", "\temployee_id INT PRIMARY KEY,\n", "\tfirst_name VARCHAR (255) NOT NULL,\n", "\tlast_name VARCHAR (255) NOT NULL,\n", "\tmanager_id INT,\n", "\tFOREIGN KEY (manager_id) \n", "\tREFERENCES employee (employee_id) \n", "\tON DELETE CASCADE\n", ");\n", "\n", "INSERT INTO employee (\n", "\temployee_id,\n", "\tfirst_name,\n", "\tlast_name,\n", "\tmanager_id\n", ")\n", "VALUES\n", "\t(1, 'Windy', 'Hays', NULL),\n", "\t(2, 'Ava', 'Christensen', 1),\n", "\t(3, 'Hassan', 'Conner', 1),\n", "\t(4, 'Anna', 'Reeves', 2),\n", "\t(5, 'Sau', 'Norman', 2),\n", "\t(6, 'Kelsie', 'Hays', 3),\n", "\t(7, 'Tory', 'Goff', 3),\n", "\t(8, 'Salley', 'Lester', 3);\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "En la tabla employee, la columna manager_id hace referencia a la columna employee_id. El valor en la columna manager_id muestra el gerente al que el empleado reporta directamente. Cuando el valor de la columna manager_id es nulo, ese empleado no informa a nadie. En otras palabras, él o ella es el gerente superior." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
employeemanager
0Sau NormanAva Christensen
1Anna ReevesAva Christensen
2Salley LesterHassan Conner
3Kelsie HaysHassan Conner
4Tory GoffHassan Conner
5Ava ChristensenWindy Hays
6Hassan ConnerWindy Hays
\n", "
" ], "text/plain": [ " employee manager\n", "0 Sau Norman Ava Christensen\n", "1 Anna Reeves Ava Christensen\n", "2 Salley Lester Hassan Conner\n", "3 Kelsie Hays Hassan Conner\n", "4 Tory Goff Hassan Conner\n", "5 Ava Christensen Windy Hays\n", "6 Hassan Conner Windy Hays" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "SELECT\n", " e.first_name || ' ' || e.last_name employee,\n", " m .first_name || ' ' || m .last_name manager\n", "FROM\n", " employee e\n", "INNER JOIN employee m ON m .employee_id = e.manager_id\n", "ORDER BY manager;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = '''\n", "DROP TABLE employee;\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cláusula CROSS JOIN\n", "\n", "Permite producir un Producto cartesiano de filas en dos o más tablas.\n", "\n", "Si T1 tiene n filas y T2 tiene m filas, el conjunto de resultados tendrá nxm filas. Por ejemplo, T1 tiene 100 filas y T2 tiene 100 filas, el conjunto de resultados tendrá 100 x 100= 10000 filas." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = '''\n", "DROP TABLE IF EXISTS T1;\n", "CREATE TABLE T1 (label CHAR(1) PRIMARY KEY);\n", "\n", "DROP TABLE IF EXISTS T2;\n", "CREATE TABLE T2 (score INT PRIMARY KEY);\n", "\n", "INSERT INTO T1 (label)\n", "VALUES\n", "\t('A'),\n", "\t('B');\n", "\n", "INSERT INTO T2 (score)\n", "VALUES\n", "\t(1),\n", "\t(2),\n", "\t(3);\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelscore
0A1
1B1
2A2
3B2
4A3
5B3
\n", "
" ], "text/plain": [ " label score\n", "0 A 1\n", "1 B 1\n", "2 A 2\n", "3 B 2\n", "4 A 3\n", "5 B 3" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = ''' \n", "SELECT *\n", "FROM T1\n", "CROSS JOIN T2;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Eliminar las tablas del ejemplo\n", "sql = '''\n", "DROP TABLE IF EXISTS T1;\n", "DROP TABLE IF EXISTS T2;\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cláusula NATURAL JOIN\n", "\n", "Una combinación natural es una combinación que crea una combinación implícita basada en los mismos nombres de columna en las tablas unidas." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Cada categoría tiene cero o muchos productos y cada producto pertenece a una y solo una categoría.\n", "\n", "sql = '''\n", "DROP TABLE IF EXISTS categories;\n", "CREATE TABLE categories (\n", "\tcategory_id serial PRIMARY KEY,\n", "\tcategory_name VARCHAR (255) NOT NULL\n", ");\n", "\n", "DROP TABLE IF EXISTS products;\n", "CREATE TABLE products (\n", "\tproduct_id serial PRIMARY KEY,\n", "\tproduct_name VARCHAR (255) NOT NULL,\n", "\tcategory_id INT NOT NULL,\n", "\tFOREIGN KEY (category_id) REFERENCES categories (category_id)\n", ");\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = '''\n", "INSERT INTO categories (category_name)\n", "VALUES\n", "\t('Smart Phone'),\n", "\t('Laptop'),\n", "\t('Tablet');\n", "\n", "INSERT INTO products (product_name, category_id)\n", "VALUES\n", "\t('iPhone', 1),\n", "\t('Samsung Galaxy', 1),\n", "\t('HP Elite', 2),\n", "\t('Lenovo Thinkpad', 2),\n", "\t('iPad', 3),\n", "\t('Kindle Fire', 3);\n", "'''\n", "\n", "cnx.execute(sql)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
category_idproduct_idproduct_namecategory_name
011iPhoneSmart Phone
112Samsung GalaxySmart Phone
223HP EliteLaptop
324Lenovo ThinkpadLaptop
435iPadTablet
536Kindle FireTablet
\n", "
" ], "text/plain": [ " category_id product_id product_name category_name\n", "0 1 1 iPhone Smart Phone\n", "1 1 2 Samsung Galaxy Smart Phone\n", "2 2 3 HP Elite Laptop\n", "3 2 4 Lenovo Thinkpad Laptop\n", "4 3 5 iPad Tablet\n", "5 3 6 Kindle Fire Tablet" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NATURAL JOIN\n", "sql = ''' \n", "SELECT * FROM products\n", "NATURAL JOIN categories;\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
category_idproduct_idproduct_namecategory_name
011iPhoneSmart Phone
112Samsung GalaxySmart Phone
223HP EliteLaptop
324Lenovo ThinkpadLaptop
435iPadTablet
536Kindle FireTablet
\n", "
" ], "text/plain": [ " category_id product_id product_name category_name\n", "0 1 1 iPhone Smart Phone\n", "1 1 2 Samsung Galaxy Smart Phone\n", "2 2 3 HP Elite Laptop\n", "3 2 4 Lenovo Thinkpad Laptop\n", "4 3 5 iPad Tablet\n", "5 3 6 Kindle Fire Tablet" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Equivalente al NATURAL JOIN\n", "sql = ''' \n", "SELECT * FROM products\n", "INNER JOIN categories USING (category_id);\n", "'''\n", "\n", "pd.read_sql_query(sql, cnx)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = '''\n", "DROP TABLE IF EXISTS products;\n", "DROP TABLE IF EXISTS categories;\n", "'''\n", "\n", "cnx.execute(sql)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }