Basic operations on text files
Downloading a sample file with wget
[1]:
%%bash
wget https://raw.githubusercontent.com/jdvelasq/playground/master/datasets/orders.csv
iconv -f ISO-8859-1 -t utf-8 orders.csv -o orders1.csv
mv orders1.csv orders.csv
--2024-03-25 23:55:11-- https://raw.githubusercontent.com/jdvelasq/playground/master/datasets/orders.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 434349 (424K) [text/plain]
Saving to: ‘orders.csv’
0K .......... .......... .......... .......... .......... 11% 2.07M 0s
50K .......... .......... .......... .......... .......... 23% 5.61M 0s
100K .......... .......... .......... .......... .......... 35% 3.39M 0s
150K .......... .......... .......... .......... .......... 47% 6.96M 0s
200K .......... .......... .......... .......... .......... 58% 10.4M 0s
250K .......... .......... .......... .......... .......... 70% 3.94M 0s
300K .......... .......... .......... .......... .......... 82% 11.4M 0s
350K .......... .......... .......... .......... .......... 94% 9.77M 0s
400K .......... .......... .... 100% 11.7M=0.08s
2024-03-25 23:55:11 (5.05 MB/s) - ‘orders.csv’ saved [434349/434349]
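The `iconv` step above re-encodes the downloaded file from ISO-8859-1 to UTF-8. The following is a minimal sketch of that conversion on a tiny file built locally, so it does not need network access. The scratch directory and the file names `latin1.txt` and `utf8.txt` are hypothetical and used only for illustration.

```bash
# Hypothetical scratch directory and file names, used only for this sketch.
workdir=$(mktemp -d)

# Write "café" encoded as ISO-8859-1: octal 351 is the single byte for "é".
printf 'caf\351\n' > "$workdir/latin1.txt"

# Convert to UTF-8. Shell redirection is used here instead of the -o option,
# because writing through stdout does not depend on the iconv implementation.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/latin1.txt" > "$workdir/utf8.txt"

# The Latin-1 file has 5 bytes; the UTF-8 file has 6 because "é" now takes two bytes.
latin1_bytes=$(wc -c < "$workdir/latin1.txt" | tr -d ' ')
utf8_bytes=$(wc -c < "$workdir/utf8.txt" | tr -d ' ')
echo "$latin1_bytes -> $utf8_bytes bytes"
```

Comparing byte counts before and after is a quick way to confirm that non-ASCII characters were actually re-encoded.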
cat
[2]:
%%bash
# Prints the entire contents of the file to the screen
# Not run here because the file is very large
# cat orders.csv
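Since `cat` prints everything, it can help to check the size of a file first. The sketch below counts the lines with `wc -l` and falls back to `head` when the file exceeds a threshold. The file `sample.csv`, its contents, and the limit of 100 lines are arbitrary assumptions made for this example.

```bash
# Hypothetical scratch file with 500 lines, used only for this sketch.
workdir=$(mktemp -d)
sample="$workdir/sample.csv"
seq 1 500 > "$sample"

max_lines=100   # arbitrary threshold chosen for the example

# Count lines; tr removes the padding that some wc implementations add.
n_lines=$(wc -l < "$sample" | tr -d ' ')

if [ "$n_lines" -le "$max_lines" ]; then
    cat "$sample"
else
    echo "sample.csv has $n_lines lines; showing only the first 3"
    head -n 3 "$sample"
fi
```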
head
[3]:
%%bash
#
# Display the beginning of the file (the first 10 lines by default)
#
head orders.csv
Order ID;Order Date;Customer Name;City;Country;Region;Segment;Ship Date;Ship Mode;State
BN-2011-7407039;1/01/11;Ruby Patel;Stockholm;Sweden;North;Home Office;5/01/11;Economy Plus;Stockholm
AZ-2011-9050313;3/01/11;Summer Hayward;Southport;United Kingdom;North;Consumer;7/01/11;Economy;England
AZ-2011-6674300;4/01/11;Devin Huddleston;Valence;France;Central;Consumer;8/01/11;Economy;Auvergne-Rhne-Alpes
BN-2011-2819714;4/01/11;Mary Parker;Birmingham;United Kingdom;North;Corporate;9/01/11;Economy;England
AZ-2011-617423;5/01/11;Daniel Burke;Echirolles;France;Central;Home Office;7/01/11;Priority;Auvergne-Rhne-Alpes
AZ-2011-2918397;7/01/11;Fredrick Beveridge;La Seyne-sur-Mer;France;Central;Corporate;8/01/11;Priority;Provence-Alpes-Cte d'Azur
BN-2011-3248724;8/01/11;Archer Hort;Toulouse;France;Central;Consumer;14/01/11;Economy;Languedoc-Roussillon-Midi-Pyrnes
AZ-2011-6712797;11/01/11;Evie Flockhart;Genoa;Italy;South;Consumer;16/01/11;Economy;Liguria
AZ-2011-4827146;11/01/11;Faith Greenwood;Vienna;Austria;Central;Consumer;15/01/11;Economy;Vienna
[4]:
%%bash
#
# First 5 lines of the file
#
head -n 5 orders.csv
Order ID;Order Date;Customer Name;City;Country;Region;Segment;Ship Date;Ship Mode;State
BN-2011-7407039;1/01/11;Ruby Patel;Stockholm;Sweden;North;Home Office;5/01/11;Economy Plus;Stockholm
AZ-2011-9050313;3/01/11;Summer Hayward;Southport;United Kingdom;North;Consumer;7/01/11;Economy;England
AZ-2011-6674300;4/01/11;Devin Huddleston;Valence;France;Central;Consumer;8/01/11;Economy;Auvergne-Rhne-Alpes
BN-2011-2819714;4/01/11;Mary Parker;Birmingham;United Kingdom;North;Corporate;9/01/11;Economy;England
tail
[5]:
%%bash
#
# Display the end of the file (the last 10 lines by default)
#
tail orders.csv
AZ-2014-157670;30/12/14;Max Ludwig;Edinburgh;United Kingdom;North;Home Office;4/01/15;Economy;Scotland
AZ-2014-436448;30/12/14;Georgia Arundale;Naples;Italy;South;Corporate;5/01/15;Economy;Campania
AZ-2014-3870231;30/12/14;Thomas Thompson;Basel;Switzerland;Central;Corporate;2/01/15;Priority;Basel-Stadt
BN-2014-8679573;30/12/14;Dennis Conaway;The Hague;Netherlands;Central;Consumer;1/01/15;Priority;South Holland
AZ-2014-4217323;31/12/14;Evie Morton;Caen;France;Central;Consumer;2/01/15;Economy Plus;Normandy
AZ-2014-8174835;31/12/14;Eloise Sykes;Bielefeld;Germany;Central;Consumer;4/01/15;Economy;North Rhine-Westphalia
AZ-2014-766953;31/12/14;Jose Gambino;Maidenhead;United Kingdom;North;Corporate;5/01/15;Economy;England
AZ-2014-1412225;31/12/14;Leon Barnes;Worcester;United Kingdom;North;Consumer;1/01/15;Priority;England
AZ-2014-7604524;31/12/14;Rebecca Chamberlain;Hamburg;Germany;Central;Home Office;4/01/15;Economy;Hamburg
BN-2014-4140795;31/12/14;Daniel Hamilton;Eindhoven;Netherlands;Central;Home Office;5/01/15;Economy Plus;North Brabant
[6]:
%%bash
#
# Display the last 3 lines of the file
#
tail -n 3 orders.csv
AZ-2014-1412225;31/12/14;Leon Barnes;Worcester;United Kingdom;North;Consumer;1/01/15;Priority;England
AZ-2014-7604524;31/12/14;Rebecca Chamberlain;Hamburg;Germany;Central;Home Office;4/01/15;Economy;Hamburg
BN-2014-4140795;31/12/14;Daniel Hamilton;Eindhoven;Netherlands;Central;Home Office;5/01/15;Economy Plus;North Brabant
[7]:
%%bash
#
# Display from line 4115 to the end of the file
#
tail -n +4115 orders.csv
AZ-2014-766953;31/12/14;Jose Gambino;Maidenhead;United Kingdom;North;Corporate;5/01/15;Economy;England
AZ-2014-1412225;31/12/14;Leon Barnes;Worcester;United Kingdom;North;Consumer;1/01/15;Priority;England
AZ-2014-7604524;31/12/14;Rebecca Chamberlain;Hamburg;Germany;Central;Home Office;4/01/15;Economy;Hamburg
BN-2014-4140795;31/12/14;Daniel Hamilton;Eindhoven;Netherlands;Central;Home Office;5/01/15;Economy Plus;North Brabant
more and less
Unix also provides the commands more (the older of the two) and less for viewing the contents of files. Type less ``filename`` to start viewing.
Use the Space key to move forward one page.
Use Ctrl-F (Forward) and Ctrl-B (Backward) to move forward or back one page.
Use the up and down arrow keys to move one line at a time.
Type a line number followed by G (go to) to jump to that line.
Type q to quit less.
nl
[8]:
%%bash
#
# Number the lines of the file
#
nl orders.csv | tail -n 5
4114 AZ-2014-8174835;31/12/14;Eloise Sykes;Bielefeld;Germany;Central;Consumer;4/01/15;Economy;North Rhine-Westphalia
4115 AZ-2014-766953;31/12/14;Jose Gambino;Maidenhead;United Kingdom;North;Corporate;5/01/15;Economy;England
4116 AZ-2014-1412225;31/12/14;Leon Barnes;Worcester;United Kingdom;North;Consumer;1/01/15;Priority;England
4117 AZ-2014-7604524;31/12/14;Rebecca Chamberlain;Hamburg;Germany;Central;Home Office;4/01/15;Economy;Hamburg
4118 BN-2014-4140795;31/12/14;Daniel Hamilton;Eindhoven;Netherlands;Central;Home Office;5/01/15;Economy Plus;North Brabant
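By default `nl` numbers only non-empty lines, so its numbers match the real line positions only when the file has no blank lines, as is the case for orders.csv. The sketch below shows the difference on a small hypothetical file, `gaps.txt`, using the `-b a` option to number every line.

```bash
# Hypothetical three-line file whose second line is blank.
workdir=$(mktemp -d)
printf 'first\n\nthird\n' > "$workdir/gaps.txt"

# Default behavior: the blank line is not numbered, so "third" gets number 2.
nl "$workdir/gaps.txt"

# With -b a every line is numbered, so "third" gets number 3, its true position.
nl -b a "$workdir/gaps.txt"

# Extract the number assigned to "third" in each case.
default_no=$(nl "$workdir/gaps.txt" | awk '$2 == "third" {print $1}')
all_no=$(nl -b a "$workdir/gaps.txt" | awk '$2 == "third" {print $1}')
echo "default: $default_no, with -b a: $all_no"
```

When line numbers from `nl` are later fed to `head` or `tail`, `-b a` keeps them aligned with the physical lines of the file.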
[9]:
%%bash
#
# List the column names of the file, one per line
#
head -n 1 orders.csv | tr ';' '\n'
Order ID
Order Date
Customer Name
City
Country
Region
Segment
Ship Date
Ship Mode
State
[10]:
%%bash
# Count the number of columns
head -n 1 orders.csv | tr ';' '\n' | wc -l
10
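Counting the header fields says nothing about the remaining rows. As a rough consistency check, `awk` can report the number of fields on every line; if all rows are well formed, only one distinct value appears. The file `sample.csv` and its rows below are made up for illustration, and the check assumes that no field contains a `;` inside quotes.

```bash
# Hypothetical three-column file, used only for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
Order ID;Order Date;City
BN-1;1/01/11;Stockholm
AZ-2;3/01/11;Southport
EOF

# NF is the number of fields awk finds on each line when splitting on ';'.
# sort -u keeps one copy of each distinct count.
field_counts=$(awk -F';' '{print NF}' "$workdir/sample.csv" | sort -u)
echo "$field_counts"
```

More than one line of output would indicate rows with missing or extra delimiters.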
[11]:
%%bash
#
# Extract a subset of records from the beginning of the file
#
head -n 11 orders.csv > orders-head.csv
cat orders-head.csv
Order ID;Order Date;Customer Name;City;Country;Region;Segment;Ship Date;Ship Mode;State
BN-2011-7407039;1/01/11;Ruby Patel;Stockholm;Sweden;North;Home Office;5/01/11;Economy Plus;Stockholm
AZ-2011-9050313;3/01/11;Summer Hayward;Southport;United Kingdom;North;Consumer;7/01/11;Economy;England
AZ-2011-6674300;4/01/11;Devin Huddleston;Valence;France;Central;Consumer;8/01/11;Economy;Auvergne-Rhne-Alpes
BN-2011-2819714;4/01/11;Mary Parker;Birmingham;United Kingdom;North;Corporate;9/01/11;Economy;England
AZ-2011-617423;5/01/11;Daniel Burke;Echirolles;France;Central;Home Office;7/01/11;Priority;Auvergne-Rhne-Alpes
AZ-2011-2918397;7/01/11;Fredrick Beveridge;La Seyne-sur-Mer;France;Central;Corporate;8/01/11;Priority;Provence-Alpes-Cte d'Azur
BN-2011-3248724;8/01/11;Archer Hort;Toulouse;France;Central;Consumer;14/01/11;Economy;Languedoc-Roussillon-Midi-Pyrnes
AZ-2011-6712797;11/01/11;Evie Flockhart;Genoa;Italy;South;Consumer;16/01/11;Economy;Liguria
AZ-2011-4827146;11/01/11;Faith Greenwood;Vienna;Austria;Central;Consumer;15/01/11;Economy;Vienna
AZ-2011-6439906;11/01/11;Summer Hayward;Murcia;Spain;South;Consumer;15/01/11;Economy;Murcia
[12]:
%%bash
#
# Extract a subset of records from the end of the file
#
tail -n 10 orders.csv > orders-tail.csv
cat orders-tail.csv
AZ-2014-157670;30/12/14;Max Ludwig;Edinburgh;United Kingdom;North;Home Office;4/01/15;Economy;Scotland
AZ-2014-436448;30/12/14;Georgia Arundale;Naples;Italy;South;Corporate;5/01/15;Economy;Campania
AZ-2014-3870231;30/12/14;Thomas Thompson;Basel;Switzerland;Central;Corporate;2/01/15;Priority;Basel-Stadt
BN-2014-8679573;30/12/14;Dennis Conaway;The Hague;Netherlands;Central;Consumer;1/01/15;Priority;South Holland
AZ-2014-4217323;31/12/14;Evie Morton;Caen;France;Central;Consumer;2/01/15;Economy Plus;Normandy
AZ-2014-8174835;31/12/14;Eloise Sykes;Bielefeld;Germany;Central;Consumer;4/01/15;Economy;North Rhine-Westphalia
AZ-2014-766953;31/12/14;Jose Gambino;Maidenhead;United Kingdom;North;Corporate;5/01/15;Economy;England
AZ-2014-1412225;31/12/14;Leon Barnes;Worcester;United Kingdom;North;Consumer;1/01/15;Priority;England
AZ-2014-7604524;31/12/14;Rebecca Chamberlain;Hamburg;Germany;Central;Home Office;4/01/15;Economy;Hamburg
BN-2014-4140795;31/12/14;Daniel Hamilton;Eindhoven;Netherlands;Central;Home Office;5/01/15;Economy Plus;North Brabant
[13]:
%%bash
#
# Extract a subset of records from the middle of the file (lines 6 to 11)
#
head -n 11 orders.csv | tail -n 6 > orders-med.csv
cat orders-med.csv
AZ-2011-617423;5/01/11;Daniel Burke;Echirolles;France;Central;Home Office;7/01/11;Priority;Auvergne-Rhne-Alpes
AZ-2011-2918397;7/01/11;Fredrick Beveridge;La Seyne-sur-Mer;France;Central;Corporate;8/01/11;Priority;Provence-Alpes-Cte d'Azur
BN-2011-3248724;8/01/11;Archer Hort;Toulouse;France;Central;Consumer;14/01/11;Economy;Languedoc-Roussillon-Midi-Pyrnes
AZ-2011-6712797;11/01/11;Evie Flockhart;Genoa;Italy;South;Consumer;16/01/11;Economy;Liguria
AZ-2011-4827146;11/01/11;Faith Greenwood;Vienna;Austria;Central;Consumer;15/01/11;Economy;Vienna
AZ-2011-6439906;11/01/11;Summer Hayward;Murcia;Spain;South;Consumer;15/01/11;Economy;Murcia
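The `head | tail` pipeline above selects lines 6 to 11. An alternative, sketched here as an assumption rather than as the approach used in this notebook, is to give `sed` the line range directly. The file `numbers.txt` is a hypothetical stand-in whose content equals its line number, which makes the result easy to check.

```bash
# Hypothetical file with the numbers 1 to 20, one per line.
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# Pipeline version: keep the first 11 lines, then the last 6 of those.
with_pipeline=$(head -n 11 "$workdir/numbers.txt" | tail -n 6)

# sed version: -n disables automatic printing, '6,11p' prints lines 6 to 11,
# and '11q' stops reading once line 11 has been processed.
with_sed=$(sed -n '6,11p;11q' "$workdir/numbers.txt")

echo "$with_sed"
```

Both variables hold the same six lines; the `sed` form states the start and end lines explicitly instead of requiring the subtraction.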
[14]:
%%bash
#
# Extract a single column with cut (the field delimiter of this file is ';')
#
cut -d";" -f2 orders-head.csv
Order Date
1/01/11
3/01/11
4/01/11
4/01/11
5/01/11
7/01/11
8/01/11
11/01/11
11/01/11
11/01/11
[15]:
%%bash
#
# Extract a group of columns with cut (column 2 and columns 4 to 6)
#
cut -d";" -f2,4-6 orders-head.csv
Order Date;City;Country;Region
1/01/11;Stockholm;Sweden;North
3/01/11;Southport;United Kingdom;North
4/01/11;Valence;France;Central
4/01/11;Birmingham;United Kingdom;North
5/01/11;Echirolles;France;Central
7/01/11;La Seyne-sur-Mer;France;Central
8/01/11;Toulouse;France;Central
11/01/11;Genoa;Italy;South
11/01/11;Vienna;Austria;Central
11/01/11;Murcia;Spain;South
[16]:
%%bash
#
# Extract a column with cut from piped input
#
head orders.csv | cut -d";" -f2
Order Date
1/01/11
3/01/11
4/01/11
4/01/11
5/01/11
7/01/11
8/01/11
11/01/11
11/01/11
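`cut` passes any line that does not contain the delimiter through unchanged, so a mistyped `-d` returns whole lines instead of raising an error. The sketch below compares a wrong delimiter, the same wrong delimiter with `-s` (which suppresses lines lacking the delimiter), and the correct one. The file `sample.csv` and its two rows are hypothetical.

```bash
# Hypothetical semicolon-delimited file, used only for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
Order ID;Order Date;City
BN-1;1/01/11;Stockholm
EOF

# Wrong delimiter: no line contains ',', so every line is printed whole.
wrong=$(cut -d',' -f2 "$workdir/sample.csv")

# Wrong delimiter plus -s: lines without the delimiter are dropped,
# so the empty result makes the mistake visible.
strict=$(cut -s -d',' -f2 "$workdir/sample.csv")

# Correct delimiter: only the second field of each line is printed.
right=$(cut -d';' -f2 "$workdir/sample.csv")

echo "wrong:  $wrong"
echo "strict: $strict"
echo "right:  $right"
```

Adding `-s` while exploring an unfamiliar file is a cheap safeguard: an empty result signals that the delimiter is probably wrong.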
[17]:
%%bash
#
# Sort lines: the Region column of the first 19 records, without the header
#
head -n 20 orders.csv | tail -n +2 | cut -d";" -f6 | sort
Central
Central
Central
Central
Central
Central
Central
Central
Central
North
North
North
North
North
North
North
North
South
South
[18]:
%%bash
#
# Getting unique lines (part 1): the unsorted Region column
#
head -n 20 orders.csv | tail -n +2 | cut -d";" -f6
North
North
Central
North
Central
Central
Central
South
Central
South
North
Central
North
North
Central
North
Central
Central
North
[19]:
%%bash
#
# Getting unique lines (part 2): uniq alone only collapses adjacent duplicates
#
head -n 20 orders.csv | tail -n +2 | cut -d";" -f6 | uniq
North
Central
North
Central
South
Central
South
North
Central
North
Central
North
Central
North
[20]:
%%bash
#
# Getting unique lines (part 3): sorting first makes all duplicates adjacent
#
head -n 20 orders.csv | tail -n +2 | cut -d";" -f6 | sort | uniq
Central
North
South
[21]:
%%bash
#
# Count the number of distinct regions
#
head -n 20 orders.csv | tail -n +2 | cut -d";" -f6 | sort | uniq | wc -l
3
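A small extension of the `sort | uniq` pattern is `uniq -c`, which prefixes each distinct line with the number of times it occurs. The sketch below counts how often each region appears in a short hypothetical list, `regions.txt`, and orders the result by frequency.

```bash
# Hypothetical list of region values, one per line.
workdir=$(mktemp -d)
printf '%s\n' North North Central North South Central > "$workdir/regions.txt"

# sort groups equal lines together, uniq -c counts each group,
# sort -rn orders the groups by count (largest first),
# and awk reformats "count value" as "value=count".
counts=$(sort "$workdir/regions.txt" | uniq -c | sort -rn | awk '{print $2 "=" $1}')
echo "$counts"
```

For this input the result lists North with 3 occurrences, Central with 2, and South with 1.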
[22]:
%%bash
#
# Search for patterns with grep
#
grep 'Central' orders.csv | head
AZ-2011-6674300;4/01/11;Devin Huddleston;Valence;France;Central;Consumer;8/01/11;Economy;Auvergne-Rhne-Alpes
AZ-2011-617423;5/01/11;Daniel Burke;Echirolles;France;Central;Home Office;7/01/11;Priority;Auvergne-Rhne-Alpes
AZ-2011-2918397;7/01/11;Fredrick Beveridge;La Seyne-sur-Mer;France;Central;Corporate;8/01/11;Priority;Provence-Alpes-Cte d'Azur
BN-2011-3248724;8/01/11;Archer Hort;Toulouse;France;Central;Consumer;14/01/11;Economy;Languedoc-Roussillon-Midi-Pyrnes
AZ-2011-4827146;11/01/11;Faith Greenwood;Vienna;Austria;Central;Consumer;15/01/11;Economy;Vienna
AZ-2011-5702370;12/01/11;Hershel Snyder;Lohne;Germany;Central;Corporate;19/01/11;Economy;Lower Saxony
BN-2011-4913858;13/01/11;Julian Dobie;Dordrecht;Netherlands;Central;Consumer;19/01/11;Economy;South Holland
AZ-2011-5960662;14/01/11;Ella Troy;Vienna;Austria;Central;Home Office;19/01/11;Economy;Vienna
AZ-2011-7675351;15/01/11;Everett Dunbar;Langen;Germany;Central;Corporate;20/01/11;Economy Plus;Lower Saxony
AZ-2011-2002251;20/01/11;Nathan Iqbal;Villiers-sur-Marne;France;Central;Consumer;25/01/11;Economy;Ile-de-France
[23]:
%%bash
#
# Lines that contain Germany
#
grep Germany orders.csv | head -n 5
AZ-2011-5702370;12/01/11;Hershel Snyder;Lohne;Germany;Central;Corporate;19/01/11;Economy;Lower Saxony
AZ-2011-7675351;15/01/11;Everett Dunbar;Langen;Germany;Central;Corporate;20/01/11;Economy Plus;Lower Saxony
AZ-2011-5357101;21/01/11;Noah Chamberlain;Bielefeld;Germany;Central;Consumer;26/01/11;Economy;North Rhine-Westphalia
AZ-2011-4205736;25/01/11;David Harney;Menden;Germany;Central;Corporate;1/02/11;Economy;North Rhine-Westphalia
AZ-2011-2825684;1/02/11;Hollie Norris;Halle;Germany;Central;Consumer;7/02/11;Economy;North Rhine-Westphalia
[24]:
%%bash
#
# To see the 3 lines before the first occurrence of Germany,
# first find the line number of that occurrence
#
nl orders.csv | grep Germany | head -n 5
13 AZ-2011-5702370;12/01/11;Hershel Snyder;Lohne;Germany;Central;Corporate;19/01/11;Economy;Lower Saxony
19 AZ-2011-7675351;15/01/11;Everett Dunbar;Langen;Germany;Central;Corporate;20/01/11;Economy Plus;Lower Saxony
26 AZ-2011-5357101;21/01/11;Noah Chamberlain;Bielefeld;Germany;Central;Consumer;26/01/11;Economy;North Rhine-Westphalia
31 AZ-2011-4205736;25/01/11;David Harney;Menden;Germany;Central;Corporate;1/02/11;Economy;North Rhine-Westphalia
37 AZ-2011-2825684;1/02/11;Hollie Norris;Halle;Germany;Central;Consumer;7/02/11;Economy;North Rhine-Westphalia
[25]:
%%bash
#
# The first occurrence is on line 13, so the lines to display are
# 10, 11 and 12
#
nl orders.csv | tail -n +10 | head -n 3
10 AZ-2011-4827146;11/01/11;Faith Greenwood;Vienna;Austria;Central;Consumer;15/01/11;Economy;Vienna
11 AZ-2011-6439906;11/01/11;Summer Hayward;Murcia;Spain;South;Consumer;15/01/11;Economy;Murcia
12 AZ-2011-7053593;11/01/11;Gracie Powell;Woking;United Kingdom;North;Consumer;11/01/11;Immediate;England
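The two-step lookup above (find the line number, then slice with `tail` and `head`) can also be done in a single call when the installed `grep` supports context options. This sketch assumes the `-B` (lines of leading context) and `-m` (stop after N matches) options, which GNU and BSD grep provide but POSIX does not require. The file `sample.csv` and its rows are hypothetical.

```bash
# Hypothetical file in which "Germany" first appears on line 5.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
id;country
1;Sweden
2;France
3;Italy
4;Germany
5;Spain
6;Germany
EOF

# -m 1 stops after the first matching line;
# -B 3 also prints the 3 lines that precede it.
context=$(grep -m 1 -B 3 'Germany' "$workdir/sample.csv")
echo "$context"
```

The output contains the three preceding lines followed by the matching line itself; piping it through `head -n 3` would keep only the context.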
Deleting the generated temporary files
[26]:
%%bash
rm *.csv*