Loading features from dictionaries using DictVectorizer

The DictVectorizer class converts feature vectors represented as dictionaries into matrices that scikit-learn estimators can consume. String-valued features are one-hot encoded, while numeric features are passed through unchanged.

Note that this list-of-dictionaries layout mirrors the JSON format: each record is an object made of key-value pairs.
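To illustrate the JSON connection, the sketch below parses a JSON string with the standard `json` module and passes the resulting list of dictionaries to DictVectorizer. The payload `raw` is made up for this example; in practice it might come from a file or a web API.

```python
import json

from sklearn.feature_extraction import DictVectorizer

# Hypothetical JSON payload, invented for this illustration.
raw = (
    '[{"city": "Dubai", "temperature": 33.0},'
    ' {"city": "London", "temperature": 12.0}]'
)

# json.loads returns a list of plain Python dictionaries.
records = json.loads(raw)

# The parsed records can be vectorized directly.
json_X = DictVectorizer().fit_transform(records).toarray()
print(json_X)
```

The resulting matrix has three columns: one per city seen in the payload, plus the numeric temperature.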

[1]:

#
# Each dictionary represents one row of the dataset
#
measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

from sklearn.feature_extraction import DictVectorizer

#
# Create an instance
#
dictVectorizer = DictVectorizer(
    # -------------------------------------------------------------------------
    # Separator string used when constructing new features for one-hot coding.
    separator="=",
    # -------------------------------------------------------------------------
    # Whether feature_names_ and vocabulary_ should be sorted when fitting.
    sort=True,
)

#
# Fit: learn the feature names and their column indices
#
dictVectorizer.fit(measurements)

#
# Transform: the result is sparse by default, so convert it to a dense array
#
X = dictVectorizer.transform(measurements).toarray()
X

[1]:

array([[ 1.,  0.,  0., 33.],
       [ 0.,  1.,  0., 12.],
       [ 0.,  0.,  1., 18.]])

[2]:

#
# Fit and transform in a single step
#
dictVectorizer.fit_transform(measurements).toarray()

[2]:

array([[ 1.,  0.,  0., 33.],
       [ 0.,  1.,  0., 12.],
       [ 0.,  0.,  1., 18.]])

[3]:

#
# Column names
#
dictVectorizer.get_feature_names_out()

[3]:

array(['city=Dubai', 'city=London', 'city=San Francisco', 'temperature'],
      dtype=object)

[4]:

import pandas as pd

pd.DataFrame(
    dictVectorizer.fit_transform(measurements).toarray(),
    columns=dictVectorizer.get_feature_names_out(),
)

[4]:

	city=Dubai	city=London	city=San Francisco	temperature
0	1.0	0.0	0.0	33.0
1	0.0	1.0	0.0	12.0
2	0.0	0.0	1.0	18.0

[5]:

#
# Inverse transform
#
dictVectorizer.inverse_transform(X)

[5]:

[{'city=Dubai': 1.0, 'temperature': 33.0},
 {'city=London': 1.0, 'temperature': 12.0},
 {'city=San Francisco': 1.0, 'temperature': 18.0}]
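As the output shows, `inverse_transform` returns the one-hot feature names rather than the original key-value pairs. One possible way to recover the original form is to split each name on the separator. The helper `decode_row` below is hypothetical, not part of scikit-learn, and it assumes that the separator never appears in the name of a numeric feature.

```python
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

vectorizer = DictVectorizer(sparse=False, separator="=")
X_encoded = vectorizer.fit_transform(measurements)


def decode_row(row, separator="="):
    """Hypothetical helper: turn {'city=Dubai': 1.0} back into {'city': 'Dubai'}."""
    decoded = {}
    for name, value in row.items():
        if separator in name:
            # One-hot feature: split once into the original key and category.
            key, category = name.split(separator, 1)
            decoded[key] = category
        else:
            # Numeric feature: keep the value as a plain float.
            decoded[name] = float(value)
    return decoded


decoded = [decode_row(row) for row in vectorizer.inverse_transform(X_encoded)]
print(decoded)
```

Note that `inverse_transform` omits entries whose value is zero, so a numeric feature equal to 0 would be missing from the decoded row.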