The German Credit Dataset

  • 2:31 min | Last modified: September 28, 2021 | YouTube

This dataset classifies people as risky or not risky according to their payment habits.

The dataset contains 1,000 instances and 20 attributes, of which 13 are categorical (qualitative) and 7 are numerical. There are no missing values.

The output variable takes the following values:

1 - Good
2 - Bad
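
As a minimal sketch, the two output codes can be mapped to readable labels with pandas. The column name `default` is the one that appears in the loaded CSV in this notebook; the names `TARGET_LABELS`, `risk_label`, and `is_bad` are illustrative, and the five values are a toy sample, not the full dataset.

```python
import pandas as pd

# Codes as documented above: 1 = Good, 2 = Bad.
TARGET_LABELS = {1: "Good", 2: "Bad"}

# Toy values for illustration only (not the full dataset).
target = pd.Series([1, 2, 1, 1, 2], name="default")

# Map the numeric codes to labels; any code missing from the dict becomes NaN.
risk_label = target.map(TARGET_LABELS)

# A binary encoding that is often convenient for modelling: 1 = Bad, 0 = Good.
is_bad = (target == 2).astype(int)

print(risk_label.tolist())
print(is_bad.tolist())
```

Encoding "Bad" as 1 is only a convention chosen here; the opposite choice works equally well as long as it is applied consistently.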

The attributes and their values are the following:

Attribute 1:  (qualitative)
         Status of existing checking account
         A11 :      ... <    0 DM
         A12 : 0 <= ... <  200 DM
         A13 :      ... >= 200 DM /
               salary assignments for at least 1 year
         A14 : no checking account

Attribute 2:  (numerical)
         Duration in month

Attribute 3:  (qualitative)
         Credit history
         A30 : no credits taken/
               all credits paid back duly
         A31 : all credits at this bank paid back duly
         A32 : existing credits paid back duly till now
         A33 : delay in paying off in the past
         A34 : critical account/
               other credits existing (not at this bank)

Attribute 4:  (qualitative)
         Purpose
         A40 : car (new)
         A41 : car (used)
         A42 : furniture/equipment
         A43 : radio/television
         A44 : domestic appliances
         A45 : repairs
         A46 : education
         A47 : (vacation - does not exist?)
         A48 : retraining
         A49 : business
         A410 : others

Attribute 5:  (numerical)
         Credit amount

Attribute 6:  (qualitative)
         Savings account/bonds
         A61 :          ... <  100 DM
         A62 :   100 <= ... <  500 DM
         A63 :   500 <= ... < 1000 DM
         A64 :         ... >= 1000 DM
         A65 :   unknown/ no savings account

Attribute 7:  (qualitative)
         Present employment since
         A71 : unemployed
         A72 :       ... < 1 year
         A73 : 1  <= ... < 4 years
         A74 : 4  <= ... < 7 years
         A75 :      ... >= 7 years

Attribute 8:  (numerical)
         Installment rate in percentage of disposable income

Attribute 9:  (qualitative)
         Personal status and sex
         A91 : male   : divorced/separated
         A92 : female : divorced/separated/married
         A93 : male   : single
         A94 : male   : married/widowed
         A95 : female : single

Attribute 10: (qualitative)
         Other debtors / guarantors
         A101 : none
         A102 : co-applicant
         A103 : guarantor

Attribute 11: (numerical)
         Present residence since

Attribute 12: (qualitative)
         Property
         A121 : real estate
         A122 : if not A121 : building society savings agreement/
                  life insurance
         A123 : if not A121/A122 : car or other, not in attribute 6
         A124 : unknown / no property

Attribute 13: (numerical)
         Age in years

Attribute 14: (qualitative)
         Other installment plans
         A141 : bank
         A142 : stores
         A143 : none

Attribute 15: (qualitative)
         Housing
         A151 : rent
         A152 : own
         A153 : for free

Attribute 16: (numerical)
         Number of existing credits at this bank

Attribute 17: (qualitative)
         Job
         A171 : unemployed/ unskilled  - non-resident
         A172 : unskilled - resident
         A173 : skilled employee / official
         A174 : management/ self-employed/
                highly qualified employee/ officer

Attribute 18: (numerical)
         Number of people being liable to provide maintenance for

Attribute 19: (qualitative)
         Telephone
         A191 : none
         A192 : yes, registered under the customer's name

Attribute 20: (qualitative)
         foreign worker
         A201 : yes
         A202 : no
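
The codes listed above (`A11`, `A152`, and so on) can be translated into readable labels with a plain dictionary lookup. The sketch below assumes a raw file that still stores the codes; the CSV loaded in this notebook already contains readable values, so it would not need this step. The helper `decode_codes`, the column names `checking`, `housing`, and `duration`, and the three records are hypothetical.

```python
import pandas as pd

# Code-to-label tables copied from the attribute list above (Attributes 1 and 15).
CHECKING_ACCOUNT = {
    "A11": "< 0 DM",
    "A12": "0 <= ... < 200 DM",
    "A13": ">= 200 DM / salary assignments for at least 1 year",
    "A14": "no checking account",
}
HOUSING = {"A151": "rent", "A152": "own", "A153": "for free"}


def decode_codes(frame, code_tables):
    """Return a copy of `frame` with the coded columns replaced by labels."""
    decoded = frame.copy()
    for column, table in code_tables.items():
        decoded[column] = decoded[column].map(table)
    return decoded


# Hypothetical raw records; the column names are placeholders.
raw = pd.DataFrame(
    {
        "checking": ["A11", "A14", "A12"],
        "housing": ["A152", "A153", "A151"],
        "duration": [6, 12, 48],
    }
)

decoded = decode_codes(raw, {"checking": CHECKING_ACCOUNT, "housing": HOUSING})
print(decoded)
```

Columns that are not named in the lookup tables (here `duration`) pass through unchanged, and the input frame is left untouched because the function works on a copy.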
[1]:
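import pandas as pd

# Load the dataset; in this CSV the categorical codes are already replaced by readable values.
df = pd.read_csv(
    "https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/german.csv",
)

# Summarize column names, non-null counts, and dtypes.
df.info()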
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   checking_balance      1000 non-null   object
 1   months_loan_duration  1000 non-null   int64
 2   credit_history        1000 non-null   object
 3   purpose               1000 non-null   object
 4   amount                1000 non-null   int64
 5   savings_balance       1000 non-null   object
 6   employment_length     1000 non-null   object
 7   installment_rate      1000 non-null   int64
 8   personal_status       1000 non-null   object
 9   other_debtors         1000 non-null   object
 10  residence_history     1000 non-null   int64
 11  property              1000 non-null   object
 12  age                   1000 non-null   int64
 13  installment_plan      1000 non-null   object
 14  housing               1000 non-null   object
 15  existing_credits      1000 non-null   int64
 16  default               1000 non-null   int64
 17  dependents            1000 non-null   int64
 18  telephone             1000 non-null   object
 19  foreign_worker        1000 non-null   object
 20  job                   1000 non-null   object
dtypes: int64(8), object(13)
memory usage: 164.2+ KB
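
The summary above lists 13 `object` columns and 8 `int64` columns, one of which is the target `default`. As a rough sketch, the split into categorical and numerical features and the missing-value count could be computed as follows. The helper `summarize_columns` is an illustrative name, and the toy frame reuses a few values from the first rows of the dataset so that the example runs without downloading anything.

```python
import pandas as pd


def summarize_columns(frame, target="default"):
    """Split feature columns by dtype and count missing values."""
    features = frame.drop(columns=[target])
    # Everything that is not numeric is treated as categorical here.
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    numerical = features.select_dtypes(include="number").columns.tolist()
    return {
        "categorical": categorical,
        "numerical": numerical,
        "n_missing": int(frame.isna().sum().sum()),
    }


# Toy frame with a handful of columns (not the full dataset).
sample = pd.DataFrame(
    {
        "checking_balance": ["< 0 DM", "1 - 200 DM", "unknown"],
        "months_loan_duration": [6, 48, 12],
        "amount": [1169, 5951, 2096],
        "housing": ["own", "own", "own"],
        "default": [1, 2, 1],
    }
)

summary = summarize_columns(sample)
print(summary)
```

Applied to the full `df`, the same call would be expected to report 13 categorical and 7 numerical features, given the dtypes listed above.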
[2]:
df.head()
[2]:
   checking_balance  months_loan_duration  credit_history  purpose    amount
0  < 0 DM            6                     critical        radio/tv   1169
1  1 - 200 DM        48                    repaid          radio/tv   5951
2  unknown           12                    critical        education  2096
3  < 0 DM            42                    repaid          furniture  7882
4  < 0 DM            24                    delayed         car (new)  4870

   savings_balance  employment_length  installment_rate  personal_status  other_debtors  ...
0  unknown          > 7 yrs            4                 single male      none           ...
1  < 100 DM         1 - 4 yrs          2                 female           none           ...
2  < 100 DM         4 - 7 yrs          2                 single male      none           ...
3  < 100 DM         4 - 7 yrs          2                 single male      guarantor      ...
4  < 100 DM         1 - 4 yrs          3                 single male      none           ...

   property                  age  installment_plan  housing   existing_credits
0  real estate               67   none              own       2
1  real estate               22   none              own       1
2  real estate               49   none              own       1
3  building society savings  45   none              for free  1
4  unknown/none              53   none              for free  2

   default  dependents  telephone  foreign_worker  job
0  1        1           yes        yes             skilled employee
1  2        1           none       yes             skilled employee
2  1        2           none       yes             unskilled resident
3  1        2           none       yes             skilled employee
4  2        2           none       yes             skilled employee

5 rows × 21 columns
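
The `...` in the preview hides part of the frame because pandas truncates wide tables when displaying them. Two common ways to inspect every column are sketched below, using a hypothetical 21-column frame (`wide`, with placeholder column names) in place of the real data.

```python
import pandas as pd

# Hypothetical wide frame standing in for the 21-column dataset.
wide = pd.DataFrame({f"col_{i:02d}": range(5) for i in range(21)})

# Option 1: lift the column limit, but only inside this block.
with pd.option_context("display.max_columns", None):
    full_text = repr(wide.head())

# Option 2: transpose the preview so that every column becomes a row.
transposed = wide.head().T

print(full_text)
print(transposed)
```

With the real data, `df.head().T` is usually the more readable of the two, since each attribute then sits on its own line next to its first five values.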