Olivetti faces dataset

  • 3:45 min | Last modified: September 27, 2021 | YouTube

https://scikit-learn.org/stable/datasets/toy_dataset.html

This dataset contains 10 different images of each of 40 distinct subjects (400 images in total). For some subjects, the images were taken at different times, varying the lighting, the facial expressions (eyes open/closed, smiling/not smiling), and the facial details (with/without glasses). All images were taken against a dark, homogeneous background. Each image consists of 4,096 elements (64 × 64 pixels), each taking a value between 0 and 1.

The response variable is an integer between 0 and 39 that identifies the person.
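The label layout described above can be checked with a few lines of NumPy. The following is a minimal sketch that uses a synthetic stand-in for the target vector (40 subjects, 10 images each, grouped by subject) so that it runs without downloading anything; the names `n_subjects`, `images_per_subject`, and `counts` are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np

# Synthetic stand-in for the target vector (assumption: same layout as the
# real dataset when it is loaded without shuffling).
n_subjects, images_per_subject = 40, 10
target = np.repeat(np.arange(n_subjects), images_per_subject)

# Count how many images belong to each subject.
counts = np.bincount(target, minlength=n_subjects)

print(target.shape)                # (400,)
print(target.min(), target.max())  # 0 39
print(np.unique(counts))           # [10]
```

The same `np.bincount` call can be applied to the real target array once the dataset has been fetched.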

[1]:
from sklearn.datasets import fetch_olivetti_faces
[2]:
bunch = fetch_olivetti_faces(
    # -----------------------------------------------------
    # Specify another download and cache folder for the
    # datasets. By default all scikit-learn data is stored
    # in ‘~/scikit_learn_data’ subfolders.
    data_home=None,
    # -----------------------------------------------------
    # If True the order of the dataset is shuffled to avoid
    # having images of the same person grouped.
    shuffle=False,
    # -----------------------------------------------------
    # If True, returns (data, target) instead of a Bunch
    # object.
    return_X_y=False,
)

bunch.keys()
[2]:
dict_keys(['data', 'images', 'target', 'DESCR'])
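The `data` and `images` keys hold the same pixels in two layouts: `images` stores each face as a 64 × 64 array, and `data` stores it flattened row by row into 4,096 values. The sketch below illustrates that relationship with a random stand-in array instead of the real download; `images`, `data`, and `first_face` here are local illustrative variables, not the attributes of the fetched bunch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for the image array (assumption): 400 grayscale images
# of 64 x 64 pixels with float32 values in [0, 1).
images = rng.random((400, 64, 64), dtype=np.float32)

# Flatten each image row by row into a 4,096-element feature vector.
data = images.reshape(len(images), -1)

# Reshape one feature vector back into a 64 x 64 image, e.g. for imshow.
first_face = data[0].reshape(64, 64)

print(data.shape)                             # (400, 4096)
print(np.array_equal(first_face, images[0]))  # True
```

With the real dataset, `bunch.data[i].reshape(64, 64)` would recover the image stored in `bunch.images[i]` in the same way.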
[3]:
import matplotlib.pyplot as plt


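def print_faces(images, target, top_n):
    """Plot the first top_n faces (at most 100) in a 10 x 10 grid."""
    fig = plt.figure(figsize=(12, 12))
    # Remove the outer margins and leave a small gap between subplots.
    fig.subplots_adjust(
        left=0,
        right=1,
        bottom=0,
        top=1,
        hspace=0.05,
        wspace=0.05,
    )
    for i in range(top_n):
        # One axis per image, without tick marks.
        p = fig.add_subplot(10, 10, i + 1, xticks=[], yticks=[])
        p.imshow(images[i], cmap=plt.cm.bone)
        # Annotate each face with its target label (top) and its index (bottom).
        p.text(0, 14, str(target[i]))
        p.text(0, 60, str(i))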


print_faces(bunch.images, bunch.target, 100)
[Figure: 10 × 10 grid of the first 100 face images, each annotated with its target label and its index]
[4]:
bunch.data[:5, :]
[4]:
array([[0.30991736, 0.3677686 , 0.41735536, ..., 0.15289256, 0.16115703,
        0.1570248 ],
       [0.45454547, 0.47107437, 0.5123967 , ..., 0.15289256, 0.15289256,
        0.15289256],
       [0.3181818 , 0.40082645, 0.49173555, ..., 0.14049587, 0.14876033,
        0.15289256],
       [0.1983471 , 0.19421488, 0.19421488, ..., 0.75206614, 0.75206614,
        0.73966944],
       [0.5       , 0.54545456, 0.58264464, ..., 0.17768595, 0.17355372,
        0.17355372]], dtype=float32)
[5]:
bunch.target
[5]:
array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,
        3,  3,  3,  3,  3,  3,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  5,
        5,  5,  5,  5,  5,  5,  5,  5,  5,  6,  6,  6,  6,  6,  6,  6,  6,
        6,  6,  7,  7,  7,  7,  7,  7,  7,  7,  7,  7,  8,  8,  8,  8,  8,
        8,  8,  8,  8,  8,  9,  9,  9,  9,  9,  9,  9,  9,  9,  9, 10, 10,
       10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11,
       11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13,
       13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15,
       15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
       17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18,
       18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20,
       20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22,
       22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
       23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25,
       25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27,
       27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28,
       28, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30,
       30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32,
       32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
       34, 34, 34, 34, 34, 34, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 35,
       35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37,
       37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39,
       39, 39, 39, 39, 39, 39, 39, 39, 39])
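Because the labels above are ordered by subject, slicing the rows in order would put whole subjects in one split and none of their images in the other. The following sketch, again on a synthetic stand-in for the target vector, contrasts such a naive split with a per-subject hold-out; the hold-out size of 2 images per subject and the names `test_mask`, `train_idx`, and `test_idx` are arbitrary choices for illustration, not something the dataset or scikit-learn prescribes.

```python
import numpy as np

# Synthetic stand-in for the unshuffled target vector (assumption):
# 40 subjects, 10 consecutive images each.
target = np.repeat(np.arange(40), 10)

# Naive split: the first 300 rows for training, the remaining 100 for testing.
naive_train, naive_test = target[:300], target[300:]
# No subject appears in both parts, so test subjects are never seen in training.
print(np.intersect1d(naive_train, naive_test).size)  # 0

# Per-subject split: hold out 2 randomly chosen images of every subject.
rng = np.random.default_rng(0)
test_mask = np.zeros(target.size, dtype=bool)
for subject in np.unique(target):
    idx = np.flatnonzero(target == subject)
    test_mask[rng.choice(idx, size=2, replace=False)] = True

train_idx = np.flatnonzero(~test_mask)
test_idx = np.flatnonzero(test_mask)

print(train_idx.size, test_idx.size)      # 320 80
print(np.unique(target[train_idx]).size)  # 40
```

The resulting index arrays can be used to select rows from the feature matrix and the target vector alike.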
[6]:
X, y = fetch_olivetti_faces(
    # -----------------------------------------------------
    # If True, returns (data, target) instead of a Bunch
    # object.
    return_X_y=True,
)

display(
    X[:5, :],
    y[:5],
)
array([[0.30991736, 0.3677686 , 0.41735536, ..., 0.15289256, 0.16115703,
        0.1570248 ],
       [0.45454547, 0.47107437, 0.5123967 , ..., 0.15289256, 0.15289256,
        0.15289256],
       [0.3181818 , 0.40082645, 0.49173555, ..., 0.14049587, 0.14876033,
        0.15289256],
       [0.1983471 , 0.19421488, 0.19421488, ..., 0.75206614, 0.75206614,
        0.73966944],
       [0.5       , 0.54545456, 0.58264464, ..., 0.17768595, 0.17355372,
        0.17355372]], dtype=float32)
array([0, 0, 0, 0, 0])