Loading features from dictionaries using DictVectorizer
The DictVectorizer class converts feature vectors represented as dictionaries into matrices that sklearn estimators can use.
Note that the input, a list of dictionaries, mirrors the JSON format.
[1]:
#
# Each dictionary represents one row of the dataset
#
measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]
from sklearn.feature_extraction import DictVectorizer
#
# Create an instance
#
dictVectorizer = DictVectorizer(
    # -------------------------------------------------------------------------
    # Separator string used when constructing new features for one-hot coding.
    separator="=",
    # -------------------------------------------------------------------------
    # Whether feature_names_ and vocabulary_ should be sorted when fitting.
    sort=True,
)
#
# Fit
#
dictVectorizer.fit(measurements)
#
# Transform
#
X = dictVectorizer.transform(measurements).toarray()
X
[1]:
array([[ 1.,  0.,  0., 33.],
       [ 0.,  1.,  0., 12.],
       [ 0.,  0.,  1., 18.]])
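The `.toarray()` call is needed because `transform` returns a SciPy sparse matrix by default. As a minimal sketch, assuming a dense NumPy array is what you want, the `sparse=False` constructor argument avoids the extra conversion. The names `dense_vectorizer` and `X_dense` are illustrative only.

```python
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

# sparse=False makes fit_transform/transform return a dense NumPy array
dense_vectorizer = DictVectorizer(sparse=False)
X_dense = dense_vectorizer.fit_transform(measurements)

print(type(X_dense).__name__)
print(X_dense.shape)
```

For datasets with many distinct categories, the default sparse output is usually the more memory-friendly choice, since most one-hot entries are zero.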
[2]:
#
# Fit-Transform
#
dictVectorizer.fit_transform(measurements).toarray()
[2]:
array([[ 1.,  0.,  0., 33.],
       [ 0.,  1.,  0., 12.],
       [ 0.,  0.,  1., 18.]])
[3]:
#
# Column names
#
dictVectorizer.get_feature_names_out()
[3]:
array(['city=Dubai', 'city=London', 'city=San Francisco', 'temperature'],
      dtype=object)
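The column set is fixed at fit time. The following sketch illustrates what happens when a fitted vectorizer transforms a dictionary containing a city and a feature it has never seen: those entries are silently ignored rather than raising an error. The `humidity` key, the `Madrid` value and the variable names are made up for this example.

```python
from sklearn.feature_extraction import DictVectorizer

vectorizer = DictVectorizer(sparse=False)
vectorizer.fit(
    [
        {"city": "Dubai", "temperature": 33.0},
        {"city": "London", "temperature": 12.0},
        {"city": "San Francisco", "temperature": 18.0},
    ]
)

# "city=Madrid" and "humidity" were not present during fit,
# so they have no column and are dropped from the result
new_rows = [{"city": "Madrid", "temperature": 25.0, "humidity": 0.4}]
X_new = vectorizer.transform(new_rows)

print(X_new)
```

Only `temperature` survives in this row; all three city columns are zero.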
[4]:
import pandas as pd
pd.DataFrame(
    dictVectorizer.fit_transform(measurements).toarray(),
    columns=dictVectorizer.get_feature_names_out(),
)
[4]:
| | city=Dubai | city=London | city=San Francisco | temperature |
|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 0.0 | 33.0 |
| 1 | 0.0 | 1.0 | 0.0 | 12.0 |
| 2 | 0.0 | 0.0 | 1.0 | 18.0 |
[5]:
#
# Inverse transform
#
dictVectorizer.inverse_transform(X)
[5]:
[{'city=Dubai': 1.0, 'temperature': 33.0},
{'city=London': 1.0, 'temperature': 12.0},
{'city=San Francisco': 1.0, 'temperature': 18.0}]
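Note that `inverse_transform` returns the one-hot feature names (`city=Dubai`) rather than the original `city` key, and omits zero-valued entries. One possible way to get back to the original dictionaries is sketched below. `restore_categories` is a hypothetical helper, not part of sklearn, and it assumes the separator never appears inside an original feature name.

```python
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

vectorizer = DictVectorizer(separator="=", sparse=False)
X = vectorizer.fit_transform(measurements)


def restore_categories(row, separator="="):
    """Hypothetical helper: map 'name=value' keys back to name -> value."""
    restored = {}
    for key, value in row.items():
        name, found, category = key.partition(separator)
        if found:
            # One-hot column: recover the category string from the key
            restored[name] = category
        else:
            # Numeric column: keep the stored value
            restored[name] = value
    return restored


restored = [restore_categories(row) for row in vectorizer.inverse_transform(X)]
print(restored)
```

Because zero-valued entries are omitted, a numeric feature whose value is exactly 0 would not be recovered by this sketch.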