{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Regresión Lineal\n",
"\n",
"Ecuación de Regresión:\n",
"\\begin{equation}\n",
"Y_i = \\beta_0 + \\beta_1 X_i + \\epsilon_i\n",
"\\end{equation}\n",
"\n",
"\n",
"Ecuación de la Pendiente:\n",
"\\begin{equation}\n",
"\\hat{\\beta}_1 = \\frac{(X_i - \\bar{X})} {(Y_i - \\bar{Y})}\n",
"\\end{equation}\n",
"\n",
"Este ejercicio se a adaptado de \"Linear Regression in Julia\" por Silaparasetty, V.\n",
"\n",
"[Descargar una muestra de los precios de acciones New York Stock Exchange](https://raw.githubusercontent.com/fernanvilla/data/main/nystocks.csv)\n",
"\n",
"[El conjunto completo de datos de precios](https://www.kaggle.com/dgawlik/nyse)\n",
" \n",
"[Otro Ejemplo Recomendado de Regresión Lineal](https://www.machinelearningplus.com/linear-regression-in-julia/)\n"
]
},
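{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how the coefficients of the regression equation are obtained, the standard ordinary least-squares estimates can be computed directly: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept follows from the sample means. The vectors `x` and `y` are hypothetical toy values, not data from `nystocks.csv`.\n",
"\n",
"```julia\n",
"using Statistics  # standard library, provides mean\n",
"\n",
"# Hypothetical toy data, not taken from nystocks.csv\n",
"x = [1.0, 2.0, 3.0, 4.0, 5.0]\n",
"y = [2.1, 3.9, 6.2, 7.8, 10.1]\n",
"\n",
"x_bar, y_bar = mean(x), mean(y)\n",
"\n",
"# Slope: sum of cross-deviations over the sum of squared deviations of x\n",
"beta1 = sum((x .- x_bar) .* (y .- y_bar)) / sum((x .- x_bar) .^ 2)\n",
"\n",
"# Intercept: the fitted line passes through the point (x_bar, y_bar)\n",
"beta0 = y_bar - beta1 * x_bar\n",
"\n",
"println((beta0, beta1))\n",
"```\n",
"\n",
"The same two numbers should come out of any least-squares fitting routine applied to these vectors, which makes this a convenient sanity check."
]
},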
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n",
"\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n",
"\u001b[32m\u001b[1m Building\u001b[22m\u001b[39m CodecZlib → `C:\\Users\\Fernan\\.julia\\packages\\CodecZlib\\5t9zO\\deps\\build.log`\n"
]
}
],
"source": [
"# Import Packages\n",
"using Pkg # Package to install new packages\n",
"\n",
"# Install packages \n",
"Pkg.add(\"DataFrames\")\n",
"Pkg.add(\"CSV\")\n",
"Pkg.add(\"CSVFiles\")\n",
"Pkg.add(\"Plots\")\n",
"Pkg.add(\"Lathe\")\n",
"Pkg.add(\"GLM\")\n",
"Pkg.add(\"StatsPlots\")\n",
"Pkg.add(\"MLBase\")\n",
"Pkg.add(\"Missings\")\n",
"Pkg.add(\"Statistics\")\n",
"Pkg.add(\"Plots\")\n",
"Pkg.build(\"CodecZlib\")"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Cargar los paquetes instalados\n",
"using DataFrames\n",
"using CSV\n",
"using CSVFiles\n",
"using Plots\n",
"using Lathe\n",
"using GLM\n",
"using Statistics\n",
"using StatsPlots\n",
"using MLBase\n",
"using Missings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5×7 DataFrame\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼───────────┤\n",
"│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │ 3815500 │\n",
"│ 2 │ 04-01-2010 │ AAP │ 40.7 │ 40.38 │ 40.36 │ 41.04 │ 1701700 │\n",
"│ 3 │ 04-01-2010 │ AAPL │ 213.43 │ 214.01 │ 212.38 │ 214.5 │ 123432400 │\n",
"│ 4 │ 04-01-2010 │ ABC │ 26.29 │ 26.63 │ 26.14 │ 26.69 │ 2455900 │\n",
"│ 5 │ 04-01-2010 │ ABT │ 54.19 │ 54.46 │ 53.92 │ 54.56 │ 10829000 │\n"
]
}
],
"source": [
"# Carga el archivo CSV en un DataFrame\n",
"# para más detalles consultar -> https://juliapackages.com/p/csvfiles\n",
"\n",
"using CSVFiles, DataFrames\n",
"\n",
"df = DataFrame(load(\"./Downloads/nystocks.csv\"))\n",
"\n",
"println(first(df,5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploración de los Datos"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7-element Array{String,1}:\n",
" \"date\"\n",
" \"symbol\"\n",
" \"open\"\n",
" \"close\"\n",
" \"low\"\n",
" \"high\"\n",
" \"volume\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Variables Disponibles\n",
"names(df)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
| date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
5 rows × 7 columns
1 | 04-01-2010 | A | 31.39 | 31.3 | 31.13 | 31.63 | 3815500 |
---|
2 | 04-01-2010 | AAP | 40.7 | 40.38 | 40.36 | 41.04 | 1701700 |
---|
3 | 04-01-2010 | AAPL | 213.43 | 214.01 | 212.38 | 214.5 | 123432400 |
---|
4 | 04-01-2010 | ABC | 26.29 | 26.63 | 26.14 | 26.69 | 2455900 |
---|
5 | 04-01-2010 | ABT | 54.19 | 54.46 | 53.92 | 54.56 | 10829000 |
---|
"
],
"text/latex": [
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 04-01-2010 & A & 31.39 & 31.3 & 31.13 & 31.63 & 3815500 \\\\\n",
"\t2 & 04-01-2010 & AAP & 40.7 & 40.38 & 40.36 & 41.04 & 1701700 \\\\\n",
"\t3 & 04-01-2010 & AAPL & 213.43 & 214.01 & 212.38 & 214.5 & 123432400 \\\\\n",
"\t4 & 04-01-2010 & ABC & 26.29 & 26.63 & 26.14 & 26.69 & 2455900 \\\\\n",
"\t5 & 04-01-2010 & ABT & 54.19 & 54.46 & 53.92 & 54.56 & 10829000 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"5×7 DataFrame. Omitted printing of 1 columns\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┤\n",
"│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │\n",
"│ 2 │ 04-01-2010 │ AAP │ 40.7 │ 40.38 │ 40.36 │ 41.04 │\n",
"│ 3 │ 04-01-2010 │ AAPL │ 213.43 │ 214.01 │ 212.38 │ 214.5 │\n",
"│ 4 │ 04-01-2010 │ ABC │ 26.29 │ 26.63 │ 26.14 │ 26.69 │\n",
"│ 5 │ 04-01-2010 │ ABT │ 54.19 │ 54.46 │ 53.92 │ 54.56 │"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Presentar las primeras 5 filas\n",
"first(df,5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" | date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
5 rows × 7 columns
1 | 06-01-2010 | BMY | 25.17 | 25.22 | 25.07 | 25.29 | 15528900 |
---|
2 | 06-01-2010 | BSX | 9.07 | 9.16 | 8.99 | 9.28 | 12923000 |
---|
3 | 06-01-2010 | BWA | 35.39 | 36.69 | 35.3 | 36.78 | 4171000 |
---|
4 | 06-01-2010 | BXP | 68.23 | 68.44 | 68.03 | 68.94 | 1814900 |
---|
5 | 06-01-2010 | C | 3.56 | 3.64 | 3.51 | 3.68 | 67433800 |
---|
"
],
"text/latex": [
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 06-01-2010 & BMY & 25.17 & 25.22 & 25.07 & 25.29 & 15528900 \\\\\n",
"\t2 & 06-01-2010 & BSX & 9.07 & 9.16 & 8.99 & 9.28 & 12923000 \\\\\n",
"\t3 & 06-01-2010 & BWA & 35.39 & 36.69 & 35.3 & 36.78 & 4171000 \\\\\n",
"\t4 & 06-01-2010 & BXP & 68.23 & 68.44 & 68.03 & 68.94 & 1814900 \\\\\n",
"\t5 & 06-01-2010 & C & 3.56 & 3.64 & 3.51 & 3.68 & 67433800 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"5×7 DataFrame\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼──────────┤\n",
"│ 1 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │ 15528900 │\n",
"│ 2 │ 06-01-2010 │ BSX │ 9.07 │ 9.16 │ 8.99 │ 9.28 │ 12923000 │\n",
"│ 3 │ 06-01-2010 │ BWA │ 35.39 │ 36.69 │ 35.3 │ 36.78 │ 4171000 │\n",
"│ 4 │ 06-01-2010 │ BXP │ 68.23 │ 68.44 │ 68.03 │ 68.94 │ 1814900 │\n",
"│ 5 │ 06-01-2010 │ C │ 3.56 │ 3.64 │ 3.51 │ 3.68 │ 67433800 │"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Las últimas 5 filas\n",
"last(df,5)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
" | variable | mean | min | median | max | nunique | nmissing | eltype |
---|
| Symbol | Union… | Any | Union… | Any | Union… | Nothing | DataType |
---|
7 rows × 8 columns
1 | date | | 04-01-2010 | | 06-01-2010 | 5 | | String |
---|
2 | symbol | | A | | ZION | 467 | | String |
---|
3 | open | 46.9074 | 1.53 | 37.07 | 627.181 | | | Float64 |
---|
4 | close | 47.0407 | 1.61 | 37.25 | 626.751 | | | Float64 |
---|
5 | low | 46.4453 | 1.51 | 36.74 | 624.241 | | | Float64 |
---|
6 | high | 47.4197 | 1.61 | 37.76 | 629.511 | | | Float64 |
---|
7 | volume | 7.01361e6 | 10000 | 3.0912e6 | 215620200 | | | Int64 |
---|
"
],
"text/latex": [
"\\begin{tabular}{r|cccccccc}\n",
"\t& variable & mean & min & median & max & nunique & nmissing & eltype\\\\\n",
"\t\\hline\n",
"\t& Symbol & Union… & Any & Union… & Any & Union… & Nothing & DataType\\\\\n",
"\t\\hline\n",
"\t1 & date & & 04-01-2010 & & 06-01-2010 & 5 & & String \\\\\n",
"\t2 & symbol & & A & & ZION & 467 & & String \\\\\n",
"\t3 & open & 46.9074 & 1.53 & 37.07 & 627.181 & & & Float64 \\\\\n",
"\t4 & close & 47.0407 & 1.61 & 37.25 & 626.751 & & & Float64 \\\\\n",
"\t5 & low & 46.4453 & 1.51 & 36.74 & 624.241 & & & Float64 \\\\\n",
"\t6 & high & 47.4197 & 1.61 & 37.76 & 629.511 & & & Float64 \\\\\n",
"\t7 & volume & 7.01361e6 & 10000 & 3.0912e6 & 215620200 & & & Int64 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"7×8 DataFrame. Omitted printing of 2 columns\n",
"│ Row │ variable │ mean │ min │ median │ max │ nunique │\n",
"│ │ \u001b[90mSymbol\u001b[39m │ \u001b[90mUnion…\u001b[39m │ \u001b[90mAny\u001b[39m │ \u001b[90mUnion…\u001b[39m │ \u001b[90mAny\u001b[39m │ \u001b[90mUnion…\u001b[39m │\n",
"├─────┼──────────┼───────────┼────────────┼──────────┼────────────┼─────────┤\n",
"│ 1 │ date │ │ 04-01-2010 │ │ 06-01-2010 │ 5 │\n",
"│ 2 │ symbol │ │ A │ │ ZION │ 467 │\n",
"│ 3 │ open │ 46.9074 │ 1.53 │ 37.07 │ 627.181 │ │\n",
"│ 4 │ close │ 47.0407 │ 1.61 │ 37.25 │ 626.751 │ │\n",
"│ 5 │ low │ 46.4453 │ 1.51 │ 36.74 │ 624.241 │ │\n",
"│ 6 │ high │ 47.4197 │ 1.61 │ 37.76 │ 629.511 │ │\n",
"│ 7 │ volume │ 7.01361e6 │ 10000 │ 3.0912e6 │ 215620200 │ │"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Algunos Indicadores Estadísticos\n",
"describe(df)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"GroupedDataFrame with 467 groups based on key: symbol
First Group (3 rows): symbol = \"A\"
| date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
1 | 04-01-2010 | A | 31.39 | 31.3 | 31.13 | 31.63 | 3815500 |
---|
2 | 05-01-2010 | A | 31.21 | 30.96 | 30.76 | 31.22 | 4186000 |
---|
3 | 06-01-2010 | A | 30.85 | 30.85 | 30.76 | 31.0 | 3243700 |
---|
⋮
Last Group (1 row): symbol = \"CHTR\"
| date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
1 | 05-01-2010 | CHTR | 35.0 | 35.0 | 35.0 | 35.0 | 10000 |
---|
"
],
"text/latex": [
"GroupedDataFrame with 467 groups based on key: symbol\n",
"\n",
"First Group (3 rows): symbol = \"A\"\n",
"\n",
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 04-01-2010 & A & 31.39 & 31.3 & 31.13 & 31.63 & 3815500 \\\\\n",
"\t2 & 05-01-2010 & A & 31.21 & 30.96 & 30.76 & 31.22 & 4186000 \\\\\n",
"\t3 & 06-01-2010 & A & 30.85 & 30.85 & 30.76 & 31.0 & 3243700 \\\\\n",
"\\end{tabular}\n",
"\n",
"$\\dots$\n",
"\n",
"Last Group (1 row): symbol = \"CHTR\"\n",
"\n",
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 05-01-2010 & CHTR & 35.0 & 35.0 & 35.0 & 35.0 & 10000 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"GroupedDataFrame with 467 groups based on key: symbol\n",
"First Group (3 rows): symbol = \"A\"\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n",
"│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │ 3815500 │\n",
"│ 2 │ 05-01-2010 │ A │ 31.21 │ 30.96 │ 30.76 │ 31.22 │ 4186000 │\n",
"│ 3 │ 06-01-2010 │ A │ 30.85 │ 30.85 │ 30.76 │ 31.0 │ 3243700 │\n",
"⋮\n",
"Last Group (1 row): symbol = \"CHTR\"\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼────────┤\n",
"│ 1 │ 05-01-2010 │ CHTR │ 35.0 │ 35.0 │ 35.0 │ 35.0 │ 10000 │"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Separar por grupos\n",
"agrupar = groupby(df, :symbol)"
]
},
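{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the rows are grouped by symbol, a per-group summary is a natural next step. The following is a minimal sketch, assuming a DataFrames version that supports `combine` with `source => function => target` pairs; `toy`, `res` and `mean_close` are hypothetical names, and `toy` is a miniature table rather than the notebook's `df`.\n",
"\n",
"```julia\n",
"using DataFrames, Statistics\n",
"\n",
"# Hypothetical miniature table reusing the column names symbol and close\n",
"toy = DataFrame(symbol = [\"A\", \"A\", \"B\"], close = [31.3, 30.96, 9.16])\n",
"\n",
"# One output row per symbol, holding the mean closing price of that group\n",
"res = combine(groupby(toy, :symbol), :close => mean => :mean_close)\n",
"\n",
"println(res)\n",
"```\n",
"\n",
"Applied to the grouped stock data, the same call pattern would return one row per ticker symbol."
]
},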
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}\n"
]
},
{
"data": {
"text/html": [
" | date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
3 rows × 7 columns
1 | 04-01-2010 | BMY | 25.41 | 25.63 | 25.3 | 25.7 | 14376100 |
---|
2 | 05-01-2010 | BMY | 25.51 | 25.23 | 25.01 | 25.55 | 16973600 |
---|
3 | 06-01-2010 | BMY | 25.17 | 25.22 | 25.07 | 25.29 | 15528900 |
---|
"
],
"text/latex": [
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 04-01-2010 & BMY & 25.41 & 25.63 & 25.3 & 25.7 & 14376100 \\\\\n",
"\t2 & 05-01-2010 & BMY & 25.51 & 25.23 & 25.01 & 25.55 & 16973600 \\\\\n",
"\t3 & 06-01-2010 & BMY & 25.17 & 25.22 & 25.07 & 25.29 & 15528900 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"3×7 SubDataFrame\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼──────────┤\n",
"│ 1 │ 04-01-2010 │ BMY │ 25.41 │ 25.63 │ 25.3 │ 25.7 │ 14376100 │\n",
"│ 2 │ 05-01-2010 │ BMY │ 25.51 │ 25.23 │ 25.01 │ 25.55 │ 16973600 │\n",
"│ 3 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │ 15528900 │"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Obtener un grupo\n",
"losBXP = get(agrupar, (symbol=:\"BMY\",), nothing)\n",
"println(typeof(losBXP))\n",
"losBXP\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}\n"
]
},
{
"data": {
"text/html": [
" | date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
3 rows × 7 columns
1 | 04-01-2010 | BXP | 67.59 | 67.1 | 66.53 | 68.33 | 1511500 |
---|
2 | 05-01-2010 | BXP | 67.24 | 68.12 | 66.45 | 68.2 | 2173700 |
---|
3 | 06-01-2010 | BXP | 68.23 | 68.44 | 68.03 | 68.94 | 1814900 |
---|
"
],
"text/latex": [
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 04-01-2010 & BXP & 67.59 & 67.1 & 66.53 & 68.33 & 1511500 \\\\\n",
"\t2 & 05-01-2010 & BXP & 67.24 & 68.12 & 66.45 & 68.2 & 2173700 \\\\\n",
"\t3 & 06-01-2010 & BXP & 68.23 & 68.44 & 68.03 & 68.94 & 1814900 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"3×7 SubDataFrame\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n",
"│ 1 │ 04-01-2010 │ BXP │ 67.59 │ 67.1 │ 66.53 │ 68.33 │ 1511500 │\n",
"│ 2 │ 05-01-2010 │ BXP │ 67.24 │ 68.12 │ 66.45 │ 68.2 │ 2173700 │\n",
"│ 3 │ 06-01-2010 │ BXP │ 68.23 │ 68.44 │ 68.03 │ 68.94 │ 1814900 │"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Obtener un grupo\n",
"losBXP = get(agrupar, (symbol=:\"BXP\",), nothing)\n",
"println(typeof(losBXP))\n",
"losBXP"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}\n"
]
},
{
"data": {
"text/html": [
" | date | symbol | open | close | low | high | volume |
---|
| String | String | Float64 | Float64 | Float64 | Float64 | Int64 |
---|
3 rows × 7 columns
1 | 04-01-2010 | BMY | 25.41 | 25.63 | 25.3 | 25.7 | 14376100 |
---|
2 | 05-01-2010 | BMY | 25.51 | 25.23 | 25.01 | 25.55 | 16973600 |
---|
3 | 06-01-2010 | BMY | 25.17 | 25.22 | 25.07 | 25.29 | 15528900 |
---|
"
],
"text/latex": [
"\\begin{tabular}{r|ccccccc}\n",
"\t& date & symbol & open & close & low & high & volume\\\\\n",
"\t\\hline\n",
"\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n",
"\t\\hline\n",
"\t1 & 04-01-2010 & BMY & 25.41 & 25.63 & 25.3 & 25.7 & 14376100 \\\\\n",
"\t2 & 05-01-2010 & BMY & 25.51 & 25.23 & 25.01 & 25.55 & 16973600 \\\\\n",
"\t3 & 06-01-2010 & BMY & 25.17 & 25.22 & 25.07 & 25.29 & 15528900 \\\\\n",
"\\end{tabular}\n"
],
"text/plain": [
"3×7 SubDataFrame\n",
"│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n",
"│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n",
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼──────────┤\n",
"│ 1 │ 04-01-2010 │ BMY │ 25.41 │ 25.63 │ 25.3 │ 25.7 │ 14376100 │\n",
"│ 2 │ 05-01-2010 │ BMY │ 25.51 │ 25.23 │ 25.01 │ 25.55 │ 16973600 │\n",
"│ 3 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │ 15528900 │"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"losBMY = agrupar[(symbol= \"BMY\",)]\n",
"println(typeof(losBMY))\n",
"losBMY"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Relación entre los precios de apertura vs los de cierre"
]
},
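{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before fitting a line, a quick numeric check of how strongly the two price series move together is the Pearson correlation. The sketch below assumes only the `Statistics` standard library and uses the five rows printed earlier by `first(df,5)` rather than the full table; the names `open_prices`, `close_prices` and `r` are illustrative.\n",
"\n",
"```julia\n",
"using Statistics  # standard library, provides cor\n",
"\n",
"# open and close values of the five rows shown by first(df,5)\n",
"open_prices  = [31.39, 40.7, 213.43, 26.29, 54.19]\n",
"close_prices = [31.3, 40.38, 214.01, 26.63, 54.46]\n",
"\n",
"# Pearson correlation coefficient; values near 1 suggest a strong linear relationship\n",
"r = cor(open_prices, close_prices)\n",
"\n",
"println(r)\n",
"```\n",
"\n",
"With the full table loaded, `cor(df.open, df.close)` would give the corresponding value for all rows."
]
},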
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"