{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression\n", "\n", "Regression equation:\n", "\\begin{equation}\n", "Y_i = \\beta_0 + \\beta_1 X_i + \\epsilon_i\n", "\\end{equation}\n", "\n", "\n", "Slope estimator (ordinary least squares):\n", "\\begin{equation}\n", "\\hat{\\beta}_1 = \\frac{\\sum_{i=1}^{n} (X_i - \\bar{X})(Y_i - \\bar{Y})}{\\sum_{i=1}^{n} (X_i - \\bar{X})^2}\n", "\\end{equation}\n", "\n", "This exercise has been adapted from \"Linear Regression in Julia\" by Silaparasetty, V.\n", "\n", "[Download a sample of New York Stock Exchange stock prices](https://raw.githubusercontent.com/fernanvilla/data/main/nystocks.csv)\n", "\n", "[The complete price dataset](https://www.kaggle.com/dgawlik/nyse)\n", " \n", "[Another recommended linear regression example](https://www.machinelearningplus.com/linear-regression-in-julia/)\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m\u001b[1m Resolving\u001b[22m\u001b[39m package versions...\n", "\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Project.toml`\n", "\u001b[32m\u001b[1mNo Changes\u001b[22m\u001b[39m to `C:\\Users\\Fernan\\.julia\\environments\\v1.5\\Manifest.toml`\n", "\u001b[32m\u001b[1m Building\u001b[22m\u001b[39m CodecZlib → `C:\\Users\\Fernan\\.julia\\packages\\CodecZlib\\5t9zO\\deps\\build.log`\n" ] } ], "source": [ "# Import packages\n", "using Pkg # Julia's package manager, used to install new packages\n", "\n", "# Install packages (only needed once per environment)\n", "Pkg.add(\"DataFrames\")\n", "Pkg.add(\"CSV\")\n", "Pkg.add(\"CSVFiles\")\n", "Pkg.add(\"Plots\")\n", "Pkg.add(\"Lathe\")\n", "Pkg.add(\"GLM\")\n", "Pkg.add(\"StatsPlots\")\n", "Pkg.add(\"MLBase\")\n", "Pkg.add(\"Missings\")\n", "Pkg.add(\"Statistics\")\n", "Pkg.build(\"CodecZlib\")" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Load the installed packages\n", "using DataFrames\n", "using CSV\n", "using CSVFiles\n", "using Plots\n", "using Lathe\n", "using GLM\n", "using Statistics\n", "using StatsPlots\n", "using MLBase\n", "using Missings" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5×7 DataFrame\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼───────────┤\n", "│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │ 3815500 │\n", "│ 2 │ 04-01-2010 │
AAP │ 40.7 │ 40.38 │ 40.36 │ 41.04 │ 1701700 │\n", "│ 3 │ 04-01-2010 │ AAPL │ 213.43 │ 214.01 │ 212.38 │ 214.5 │ 123432400 │\n", "│ 4 │ 04-01-2010 │ ABC │ 26.29 │ 26.63 │ 26.14 │ 26.69 │ 2455900 │\n", "│ 5 │ 04-01-2010 │ ABT │ 54.19 │ 54.46 │ 53.92 │ 54.56 │ 10829000 │\n" ] } ], "source": [ "# Load the CSV file into a DataFrame\n", "# for more details see -> https://juliapackages.com/p/csvfiles\n", "\n", "using CSVFiles, DataFrames\n", "\n", "df = DataFrame(load(\"./Downloads/nystocks.csv\"))\n", "\n", "println(first(df,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Exploration" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7-element Array{String,1}:\n", " \"date\"\n", " \"symbol\"\n", " \"open\"\n", " \"close\"\n", " \"low\"\n", " \"high\"\n", " \"volume\"" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Available variables (column names)\n", "names(df)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 04-01-2010 & A & 31.39 & 31.3 & 31.13 & 31.63 & 3815500 \\\\\n", "\t2 & 04-01-2010 & AAP & 40.7 & 40.38 & 40.36 & 41.04 & 1701700 \\\\\n", "\t3 & 04-01-2010 & AAPL & 213.43 & 214.01 & 212.38 & 214.5 & 123432400 \\\\\n", "\t4 & 04-01-2010 & ABC & 26.29 & 26.63 & 26.14 & 26.69 & 2455900 \\\\\n", "\t5 & 04-01-2010 & ABT & 54.19 & 54.46 & 53.92 & 54.56 & 10829000 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "5×7 DataFrame. Omitted printing of 1 columns\n", "│ Row │ date │ symbol │ open │ close │ low │ high │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┤\n", "│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │\n", "│ 2 │ 04-01-2010 │ AAP │ 40.7 │ 40.38 │ 40.36 │ 41.04 │\n", "│ 3 │ 04-01-2010 │ AAPL │ 213.43 │ 214.01 │ 212.38 │ 214.5 │\n", "│ 4 │ 04-01-2010 │ ABC │ 26.29 │ 26.63 │ 26.14 │ 26.69 │\n", "│ 5 │ 04-01-2010 │ ABT │ 54.19 │ 54.46 │ 53.92 │ 54.56 │" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show the first 5 rows\n", "first(df,5)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 06-01-2010 & BMY & 25.17 & 25.22 & 25.07 & 25.29 & 15528900 \\\\\n", "\t2 & 06-01-2010 & BSX & 9.07 & 9.16 & 8.99 & 9.28 & 12923000 \\\\\n", "\t3 & 06-01-2010 & BWA & 35.39 & 36.69 & 35.3 & 36.78 & 4171000 \\\\\n", "\t4 & 06-01-2010 & BXP & 68.23 & 68.44 & 68.03 & 68.94 & 1814900 \\\\\n", "\t5 & 06-01-2010 & C & 3.56 & 3.64 & 3.51 & 3.68 & 67433800 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "5×7 DataFrame\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼──────────┤\n", "│ 1 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │ 15528900 │\n", "│ 2 │ 06-01-2010 │ BSX │ 9.07 │ 9.16 │ 8.99 │ 9.28 │ 12923000 │\n", "│ 3 │ 06-01-2010 │ BWA │ 35.39 │ 36.69 │ 35.3 │ 36.78 │ 4171000 │\n", "│ 4 │ 06-01-2010 │ BXP │ 68.23 │ 68.44 │ 68.03 │ 68.94 │ 1814900 │\n", "│ 5 │ 06-01-2010 │ C │ 3.56 │ 3.64 │ 3.51 │ 3.68 │ 67433800 │" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Show the last 5 rows\n", "last(df,5)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|cccccccc}\n", "\t& variable & mean & min & median & max & nunique & nmissing & eltype\\\\\n", "\t\\hline\n", "\t& Symbol & Union… & Any & Union… & Any & Union… & Nothing & DataType\\\\\n", "\t\\hline\n", "\t1 & date & & 04-01-2010 & & 06-01-2010 & 5 & & String \\\\\n", "\t2 & symbol & & A & & ZION & 467 & & String \\\\\n", "\t3 & open & 46.9074 & 1.53 & 37.07 & 627.181 & & & Float64 \\\\\n", "\t4 & close & 47.0407 & 1.61 & 37.25 & 626.751 & & & Float64 \\\\\n", "\t5 & low & 46.4453 & 1.51 & 36.74 & 624.241 & & & Float64 \\\\\n", "\t6 & high & 47.4197 & 1.61 & 37.76 & 629.511 & & & Float64 \\\\\n", "\t7 & volume & 7.01361e6 & 10000 & 3.0912e6 & 215620200 & & & Int64 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "7×8 DataFrame. Omitted printing of 2 columns\n", "│ Row │ variable │ mean │ min │ median │ max │ nunique │\n", "│ │ \u001b[90mSymbol\u001b[39m │ \u001b[90mUnion…\u001b[39m │ \u001b[90mAny\u001b[39m │ \u001b[90mUnion…\u001b[39m │ \u001b[90mAny\u001b[39m │ \u001b[90mUnion…\u001b[39m │\n", "├─────┼──────────┼───────────┼────────────┼──────────┼────────────┼─────────┤\n", "│ 1 │ date │ │ 04-01-2010 │ │ 06-01-2010 │ 5 │\n", "│ 2 │ symbol │ │ A │ │ ZION │ 467 │\n", "│ 3 │ open │ 46.9074 │ 1.53 │ 37.07 │ 627.181 │ │\n", "│ 4 │ close │ 47.0407 │ 1.61 │ 37.25 │ 626.751 │ │\n", "│ 5 │ low │ 46.4453 │ 1.51 │ 36.74 │ 624.241 │ │\n", "│ 6 │ high │ 47.4197 │ 1.61 │ 37.76 │ 629.511 │ │\n", "│ 7 │ volume │ 7.01361e6 │ 10000 │ 3.0912e6 │ 215620200 │ │" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Summary statistics for each column\n", "describe(df)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/latex": [ "GroupedDataFrame with 467 groups based on key: symbol\n", "\n", "First Group (3 rows): symbol = \"A\"\n", "\n", "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 04-01-2010 & A & 31.39 & 31.3 & 31.13 & 31.63 & 3815500 \\\\\n", "\t2 & 05-01-2010 & A & 31.21 & 30.96 & 30.76 & 31.22 & 4186000 \\\\\n", "\t3 & 06-01-2010 & A & 30.85 & 30.85 & 30.76 & 31.0 & 3243700 \\\\\n", "\\end{tabular}\n", "\n", "$\\dots$\n", "\n", "Last Group (1 row): symbol = \"CHTR\"\n", "\n", "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 05-01-2010 & CHTR & 35.0 & 35.0 & 35.0 & 35.0 & 10000 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "GroupedDataFrame with 467 groups based on key: symbol\n", "First Group (3 rows): symbol = \"A\"\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n", "│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │ 3815500 │\n", "│ 2 │ 05-01-2010 │ A │ 31.21 │ 30.96 │ 30.76 │ 31.22 │ 4186000 │\n", "│ 3 │ 06-01-2010 │ A │ 30.85 │ 30.85 │ 30.76 │ 31.0 │ 3243700 │\n", "⋮\n", "Last Group (1 row): symbol = \"CHTR\"\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", 
"├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼────────┤\n", "│ 1 │ 05-01-2010 │ CHTR │ 35.0 │ 35.0 │ 35.0 │ 35.0 │ 10000 │" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Split the rows into groups, one per ticker symbol\n", "agrupar = groupby(df, :symbol)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}\n" ] }, { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 04-01-2010 & BMY & 25.41 & 25.63 & 25.3 & 25.7 & 14376100 \\\\\n", "\t2 & 05-01-2010 & BMY & 25.51 & 25.23 & 25.01 & 25.55 & 16973600 \\\\\n", "\t3 & 06-01-2010 & BMY & 25.17 & 25.22 & 25.07 & 25.29 & 15528900 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "3×7 SubDataFrame\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼──────────┤\n", "│ 1 │ 04-01-2010 │ BMY │ 25.41 │ 25.63 │ 25.3 │ 25.7 │ 14376100 │\n", "│ 2 │ 05-01-2010 │ BMY │ 25.51 │ 25.23 │ 25.01 │ 25.55 │ 16973600 │\n", "│ 3 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │ 15528900 │" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get one group (returns nothing if the key is absent)\n", "losBMY = get(agrupar, (symbol=\"BMY\",), nothing)\n", "println(typeof(losBMY))\n", "losBMY\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}\n" ] }, { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 04-01-2010 & BXP & 67.59 & 67.1 & 66.53 & 68.33 & 1511500 \\\\\n", "\t2 & 05-01-2010 & BXP & 67.24 & 68.12 & 66.45 & 68.2 & 2173700 \\\\\n", "\t3 & 06-01-2010 & BXP & 68.23 & 68.44 & 68.03 & 68.94 & 1814900 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "3×7 SubDataFrame\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼─────────┤\n", "│ 1 │ 04-01-2010 │ BXP │ 67.59 │ 67.1 │ 66.53 │ 68.33 │ 1511500 │\n", "│ 2 │ 05-01-2010 │ BXP │ 67.24 │ 68.12 │ 66.45 │ 68.2 │ 2173700 │\n", "│ 3 │ 06-01-2010 │ BXP │ 68.23 │ 68.44 │ 68.03 │ 68.94 │ 1814900 │" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get one group (returns nothing if the key is absent)\n", "losBXP = get(agrupar, (symbol=\"BXP\",), nothing)\n", "println(typeof(losBXP))\n", "losBXP" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SubDataFrame{DataFrame,DataFrames.Index,Array{Int64,1}}\n" ] }, { "data": { "text/html": [ "

" ], "text/latex": [ "\\begin{tabular}{r|ccccccc}\n", "\t& date & symbol & open & close & low & high & volume\\\\\n", "\t\\hline\n", "\t& String & String & Float64 & Float64 & Float64 & Float64 & Int64\\\\\n", "\t\\hline\n", "\t1 & 04-01-2010 & BMY & 25.41 & 25.63 & 25.3 & 25.7 & 14376100 \\\\\n", "\t2 & 05-01-2010 & BMY & 25.51 & 25.23 & 25.01 & 25.55 & 16973600 \\\\\n", "\t3 & 06-01-2010 & BMY & 25.17 & 25.22 & 25.07 & 25.29 & 15528900 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "3×7 SubDataFrame\n", "│ Row │ date │ symbol │ open │ close │ low │ high │ volume │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mInt64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┼──────────┤\n", "│ 1 │ 04-01-2010 │ BMY │ 25.41 │ 25.63 │ 25.3 │ 25.7 │ 14376100 │\n", "│ 2 │ 05-01-2010 │ BMY │ 25.51 │ 25.23 │ 25.01 │ 25.55 │ 16973600 │\n", "│ 3 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │ 15528900 │" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Index the GroupedDataFrame directly with a NamedTuple key\n", "losBMY = agrupar[(symbol=\"BMY\",)]\n", "println(typeof(losBMY))\n", "losBMY" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Relationship between opening and closing prices" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Scatter plot: opening prices (x axis) vs closing prices (y axis)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Opening prices vs closing prices\n", "scatter(df.open, 
df.close, xlabel=\"Opening price\", ylabel=\"Closing price\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(800, 7)\n", "(200, 7)\n" ] } ], "source": [ "# Train/test split\n", "# Scheme 1: keep the original row order\n", "# Size of the DataFrame\n", "filas, columnas = size(df)\n", "\n", "# Use 80% of the rows for training, rounded to a whole number of rows\n", "nEntrenamiento = round(Int, filas * 0.8)\n", "nValidacion = filas - nEntrenamiento\n", "\n", "train = first(df, nEntrenamiento)\n", "test = last(df, nValidacion)\n", "println(size(train))\n", "println(size(test))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(814×7 DataFrame. Omitted printing of 1 columns\n", "│ Row │ date │ symbol │ open │ close │ low │ high │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┤\n", "│ 1 │ 04-01-2010 │ A │ 31.39 │ 31.3 │ 31.13 │ 31.63 │\n", "│ 2 │ 04-01-2010 │ AAP │ 40.7 │ 40.38 │ 40.36 │ 41.04 │\n", "│ 3 │ 04-01-2010 │ AAPL │ 213.43 │ 214.01 │ 212.38 │ 214.5 │\n", "│ 4 │ 04-01-2010 │ ABC │ 26.29 │ 26.63 │ 26.14 │ 26.69 │\n", "│ 5 │ 04-01-2010 │ ACN │ 41.52 │ 42.07 │ 41.5 │ 42.2 │\n", "│ 6 │ 04-01-2010 │ ADP │ 43.54 │ 42.83 │ 42.7 │ 43.54 │\n", "│ 7 │ 04-01-2010 │ ADS │ 65.0 │ 65.89 │ 64.96 │ 66.0 │\n", "│ 8 │ 04-01-2010 │ ADSK │ 25.61 │ 25.67 │ 25.61 │ 25.83 │\n", "│ 9 │ 04-01-2010 │ AEP │ 35.1 │ 34.94 │ 34.8 │ 36.0 │\n", "│ 10 │ 04-01-2010 │ AET │ 32.06 │ 33.0 │ 31.87 │ 33.08 │\n", "⋮\n", "│ 804 │ 06-01-2010 │ BBBY │ 39.13 │ 39.23 │ 38.47 │ 39.67 │\n", "│ 805 │ 06-01-2010 │ BDX │ 77.99 │ 77.66 │ 77.28 │ 78.08 │\n", "│ 806 │ 06-01-2010 │ BEN │ 108.57 │ 108.92 │ 108.32 │ 109.64 │\n", "│ 807 │ 06-01-2010
│ BHI │ 43.37 │ 45.84 │ 43.37 │ 46.03 │\n", "│ 808 │ 06-01-2010 │ BIIB │ 53.1 │ 53.43 │ 52.8 │ 53.7 │\n", "│ 809 │ 06-01-2010 │ BLL │ 51.92 │ 52.0 │ 51.64 │ 52.12 │\n", "│ 810 │ 06-01-2010 │ BMY │ 25.17 │ 25.22 │ 25.07 │ 25.29 │\n", "│ 811 │ 06-01-2010 │ BSX │ 9.07 │ 9.16 │ 8.99 │ 9.28 │\n", "│ 812 │ 06-01-2010 │ BWA │ 35.39 │ 36.69 │ 35.3 │ 36.78 │\n", "│ 813 │ 06-01-2010 │ BXP │ 68.23 │ 68.44 │ 68.03 │ 68.94 │\n", "│ 814 │ 06-01-2010 │ C │ 3.56 │ 3.64 │ 3.51 │ 3.68 │, 186×7 DataFrame. Omitted printing of 1 columns\n", "│ Row │ date │ symbol │ open │ close │ low │ high │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mString\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │\n", "├─────┼────────────┼────────┼─────────┼─────────┼─────────┼─────────┤\n", "│ 1 │ 04-01-2010 │ ABT │ 54.19 │ 54.46 │ 53.92 │ 54.56 │\n", "│ 2 │ 04-01-2010 │ ADBE │ 36.65 │ 37.09 │ 36.65 │ 37.3 │\n", "│ 3 │ 04-01-2010 │ ADI │ 31.79 │ 31.67 │ 31.61 │ 32.19 │\n", "│ 4 │ 04-01-2010 │ ADM │ 31.48 │ 31.47 │ 31.33 │ 31.84 │\n", "│ 5 │ 04-01-2010 │ AEE │ 28.03 │ 27.76 │ 27.69 │ 28.27 │\n", "│ 6 │ 04-01-2010 │ AES │ 13.38 │ 13.67 │ 13.38 │ 13.7 │\n", "│ 7 │ 04-01-2010 │ AGN │ 39.7 │ 40.29 │ 39.7 │ 40.46 │\n", "│ 8 │ 04-01-2010 │ AIG │ 30.53 │ 29.89 │ 29.41 │ 30.54 │\n", "│ 9 │ 04-01-2010 │ AKAM │ 25.63 │ 25.92 │ 25.53 │ 26.06 │\n", "│ 10 │ 04-01-2010 │ ALL │ 30.36 │ 30.41 │ 30.09 │ 30.51 │\n", "⋮\n", "│ 176 │ 06-01-2010 │ AMAT │ 14.23 │ 14.16 │ 14.1 │ 14.4 │\n", "│ 177 │ 06-01-2010 │ AMG │ 69.21 │ 70.79 │ 69.21 │ 71.47 │\n", "│ 178 │ 06-01-2010 │ AMP │ 41.37 │ 41.38 │ 40.86 │ 41.55 │\n", "│ 179 │ 06-01-2010 │ ARNC │ 16.31 │ 16.97 │ 16.26 │ 17.06 │\n", "│ 180 │ 06-01-2010 │ ATVI │ 11.26 │ 11.26 │ 11.21 │ 11.38 │\n", "│ 181 │ 06-01-2010 │ BAC │ 16.21 │ 16.39 │ 16.03 │ 16.54 │\n", "│ 182 │ 06-01-2010 │ BBT │ 25.95 │ 26.58 │ 25.95 │ 26.91 │\n", "│ 183 │ 06-01-2010 │ BBY │ 41.21 │ 40.89 │ 40.66 │ 41.34 │\n", "│ 184 │ 
06-01-2010 │ BCR │ 79.37 │ 79.01 │ 78.68 │ 79.37 │\n", "│ 185 │ 06-01-2010 │ BK │ 28.48 │ 28.16 │ 28.09 │ 28.51 │\n", "│ 186 │ 06-01-2010 │ BLK │ 238.51 │ 234.67 │ 234.06 │ 238.65 │)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Train/test split\n", "# Scheme 2: random split\n", "# Roughly 80% of the rows go to training; the split is random, so the sizes vary between runs\n", "using Lathe.preprocess: TrainTestSplit\n", "train, test = TrainTestSplit(df, .8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Regression Model" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}\n", "\n", "open ~ 1 + close\n", "\n", "Coefficients:\n", "────────────────────────────────────────────────────────────────────────────\n", " Coef. Std. 
Error t Pr(>|t|) Lower 95% Upper 95%\n", "────────────────────────────────────────────────────────────────────────────\n", "(Intercept) -0.161929 0.0336811 -4.81 <1e-5 -0.228041 -0.0958166\n", "close 1.00038 0.00047325 2113.84 <1e-99 0.999448 1.00131\n", "────────────────────────────────────────────────────────────────────────────" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using GLM\n", "# Fit open = β0 + β1 * close by ordinary least squares\n", "modelo = @formula(open ~ close)\n", "linreg = lm(modelo, train)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9998183099258227" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the R-squared (coefficient of determination) of the model fitted on the training set\n", "r2(linreg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prediction" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Predicted opening prices for the test and training sets\n", "test_pred = predict(linreg, test);\n", "train_pred = predict(linreg, train);" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Test-set residuals: error = observed - predicted\n", "perf_test = DataFrame(y_original = test[!, :open], y_pred = test_pred)\n", "perf_test.error = perf_test[!, :y_original] - perf_test[!, :y_pred]\n", "perf_test.error_sq = perf_test.error .* perf_test.error;" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Training-set residuals: error = observed - predicted\n", "perf_train = DataFrame(y_original = train[!, :open], y_pred = train_pred)\n", "perf_train.error = perf_train[!, :y_original] - perf_train[!, :y_pred]\n", "perf_train.error_sq = perf_train.error .* perf_train.error;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loss Functions" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rmse (generic function with 1 method)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"# Compute the loss functions (total risk) of the predictions\n", "using Statistics  # provides mean\n", "\n", "# MAPE: mean of |error / y_original|, returned as a fraction (multiply by 100 for a percentage)\n", "function mape(perf_df)\n", "    return mean(abs.(perf_df.error ./ perf_df.y_original))\n", "end\n", "\n", "# RMSE: square root of the mean squared error\n", "function rmse(perf_df)\n", "    return sqrt(mean(perf_df.error .* perf_df.error))\n", "end\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Absolute Test Error: 0.5017023151586166\n", "Mean Absolute Percentage Test Error: 0.011701124353951658\n", "Root Mean Square Test Error: 0.7499470152740347\n", "Mean Square Test Error: 0.5624205257184333\n" ] } ], "source": [ "println(\"Mean Absolute Test Error: \", mean(abs.(perf_test.error)))\n", "println(\"Mean Absolute Percentage Test Error: \", mape(perf_test))\n", "println(\"Root Mean Square Test Error: \", rmse(perf_test))\n", "println(\"Mean Square Test Error: \", mean(perf_test.error_sq))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean Absolute Train Error: 0.4845467192962157\n", "Mean Absolute Percentage Train Error: 0.012227805323710275\n", "Root Mean Square Train Error: 0.7201515657678202\n", "Mean Square Train Error: 0.5186182776778432\n" ] } ], "source": [ "println(\"Mean Absolute Train Error: \", mean(abs.(perf_train.error)))\n", "println(\"Mean Absolute Percentage Train Error: \", mape(perf_train))\n", "println(\"Root Mean Square Train Error: \", rmse(perf_train))\n", "println(\"Mean Square Train Error: \", mean(perf_train.error_sq))" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.5.3", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.3" } }, "nbformat": 4, "nbformat_minor": 4 }