import marimo as mo
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from sklearn.model_selection import train_test_split
from IPython.display import Markdown, display

Introduction¶
- This notebook is the assessment for M515 - Ethical Issues of AI.
- This notebook uses the Bank Marketing Dataset from the UCI Machine Learning Repository.
- The dataset contains information about direct marketing campaigns of a Portuguese banking institution.
- The goal is to predict whether a client will subscribe to a term deposit based on various features and understand the issues with respect to bias and fairness in AI models.
GitHub Repository & Dataset:¶
- GitHub Code Link: https://github.com/c2p-cmd/EthicalIssuesOfAI
- Dataset Link: https://archive.ics.uci.edu/dataset/222/bank+marketing
Problem Statement¶
- To analyze the Bank Marketing Dataset for potential ethical issues, including bias and fairness in AI models.
- To identify any disparities in model performance across different demographic groups.
About the data:¶
Summary:¶
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').
There are four datasets:
- bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]
- bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.
- bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).
- bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (older version of this dataset with fewer inputs). The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).
The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y).
Variable Info:¶
Input variables:
1. age (numeric)
2. job: type of job (categorical: "admin.", "unknown", "unemployed", "management", "housemaid", "entrepreneur", "student", "blue-collar", "self-employed", "retired", "technician", "services")
3. marital: marital status (categorical: "married", "divorced", "single"; note: "divorced" means divorced or widowed)
4. education (categorical: "unknown", "secondary", "primary", "tertiary")
5. default: has credit in default? (binary: "yes", "no")
6. balance: average yearly balance, in euros (numeric)
7. housing: has housing loan? (binary: "yes", "no")
8. loan: has personal loan? (binary: "yes", "no")
9. contact: contact communication type (categorical: "unknown", "telephone", "cellular")
10. day: last contact day of the month (numeric)
11. month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12. duration: last contact duration, in seconds (numeric)
13. campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14. pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15. previous: number of contacts performed before this campaign and for this client (numeric)
16. poutcome: outcome of the previous marketing campaign (categorical: "unknown", "other", "failure", "success")
Output variable (desired target):
17. y - has the client subscribed a term deposit? (binary: "yes","no")
Data Loading & Preprocessing¶
df = pd.read_csv(
"https://raw.githubusercontent.com/c2p-cmd/EthicalIssuesOfAI/refs/heads/main/bank_marketing_data.csv"
)
df.head()
| | age | job | marital | education | default | balance | housing | loan | contact | day_of_week | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | management | married | tertiary | no | 2143 | yes | no | NaN | 5 | may | 261 | 1 | -1 | 0 | NaN | no |
| 1 | 44 | technician | single | secondary | no | 29 | yes | no | NaN | 5 | may | 151 | 1 | -1 | 0 | NaN | no |
| 2 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | NaN | 5 | may | 76 | 1 | -1 | 0 | NaN | no |
| 3 | 47 | blue-collar | married | NaN | no | 1506 | yes | no | NaN | 5 | may | 92 | 1 | -1 | 0 | NaN | no |
| 4 | 33 | NaN | single | NaN | no | 1 | no | no | NaN | 5 | may | 198 | 1 | -1 | 0 | NaN | no |
Markdown(f"""### **Observation** The dataset has {len(df)} samples with {len(df.columns)} columns.""")
Observation The dataset has 45211 samples with 17 columns.¶
df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 17 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   age          45211 non-null  int64
 1   job          44923 non-null  object
 2   marital      45211 non-null  object
 3   education    43354 non-null  object
 4   default      45211 non-null  object
 5   balance      45211 non-null  int64
 6   housing      45211 non-null  object
 7   loan         45211 non-null  object
 8   contact      32191 non-null  object
 9   day_of_week  45211 non-null  int64
 10  month        45211 non-null  object
 11  duration     45211 non-null  int64
 12  campaign     45211 non-null  int64
 13  pdays        45211 non-null  int64
 14  previous     45211 non-null  int64
 15  poutcome     8252 non-null   object
 16  y            45211 non-null  object
dtypes: int64(7), object(10)
memory usage: 5.9+ MB
pd.DataFrame(df.isnull().sum(), columns=["Count"])
| | Count |
|---|---|
| age | 0 |
| job | 288 |
| marital | 0 |
| education | 1857 |
| default | 0 |
| balance | 0 |
| housing | 0 |
| loan | 0 |
| contact | 13020 |
| day_of_week | 0 |
| month | 0 |
| duration | 0 |
| campaign | 0 |
| pdays | 0 |
| previous | 0 |
| poutcome | 36959 |
| y | 0 |
Observation: There are missing values in the job, education, contact, and poutcome columns¶
df[df.isnull().sum()[df.isnull().sum() != 0].index.tolist()].head(10)
| | job | education | contact | poutcome |
|---|---|---|---|---|
| 0 | management | tertiary | NaN | NaN |
| 1 | technician | secondary | NaN | NaN |
| 2 | entrepreneur | secondary | NaN | NaN |
| 3 | blue-collar | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN |
| 5 | management | tertiary | NaN | NaN |
| 6 | management | tertiary | NaN | NaN |
| 7 | entrepreneur | tertiary | NaN | NaN |
| 8 | retired | primary | NaN | NaN |
| 9 | technician | secondary | NaN | NaN |
Imputation Strategy¶
- For the job column, mark missing values as "unknown".
- For the education column, mark missing values as "unknown".
- Drop the contact column, as it has too many missing values.
- For the poutcome column, mark missing values as "not-contacted".
def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()  # avoid mutating the caller's frame
    # Missing job/education become an explicit "unknown" category.
    df["job"] = df["job"].fillna("unknown")
    df["education"] = df["education"].fillna("unknown")
    # contact has ~29% missing values, so drop it entirely.
    df = df.drop(columns=["contact"])
    # Missing poutcome means the client was never previously contacted.
    df["poutcome"] = df["poutcome"].fillna("not-contacted")
    return df
cleaned_df = df.pipe(clean_data)
cleaned_df
| | age | job | marital | education | default | balance | housing | loan | day_of_week | month | duration | campaign | pdays | previous | poutcome | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | management | married | tertiary | no | 2143 | yes | no | 5 | may | 261 | 1 | -1 | 0 | not-contacted | no |
| 1 | 44 | technician | single | secondary | no | 29 | yes | no | 5 | may | 151 | 1 | -1 | 0 | not-contacted | no |
| 2 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | 5 | may | 76 | 1 | -1 | 0 | not-contacted | no |
| 3 | 47 | blue-collar | married | unknown | no | 1506 | yes | no | 5 | may | 92 | 1 | -1 | 0 | not-contacted | no |
| 4 | 33 | unknown | single | unknown | no | 1 | no | no | 5 | may | 198 | 1 | -1 | 0 | not-contacted | no |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45206 | 51 | technician | married | tertiary | no | 825 | no | no | 17 | nov | 977 | 3 | -1 | 0 | not-contacted | yes |
| 45207 | 71 | retired | divorced | primary | no | 1729 | no | no | 17 | nov | 456 | 2 | -1 | 0 | not-contacted | yes |
| 45208 | 72 | retired | married | secondary | no | 5715 | no | no | 17 | nov | 1127 | 5 | 184 | 3 | success | yes |
| 45209 | 57 | blue-collar | married | secondary | no | 668 | no | no | 17 | nov | 508 | 4 | -1 | 0 | not-contacted | no |
| 45210 | 37 | entrepreneur | married | secondary | no | 2971 | no | no | 17 | nov | 361 | 2 | 188 | 11 | other | no |
45211 rows × 16 columns
pd.DataFrame(cleaned_df.isnull().sum(), columns=["Count"])
| | Count |
|---|---|
| age | 0 |
| job | 0 |
| marital | 0 |
| education | 0 |
| default | 0 |
| balance | 0 |
| housing | 0 |
| loan | 0 |
| day_of_week | 0 |
| month | 0 |
| duration | 0 |
| campaign | 0 |
| pdays | 0 |
| previous | 0 |
| poutcome | 0 |
| y | 0 |
len(cleaned_df), len(cleaned_df.columns)
(45211, 16)
Observation: After cleaning, the dataset has 45211 samples with 16 columns and no missing values.¶
Exploratory Data Analysis (EDA)¶
features = cleaned_df.drop(columns="y").columns.tolist()
pd.DataFrame(features, columns=["Features of dataset"])
| | Features of dataset |
|---|---|
| 0 | age |
| 1 | job |
| 2 | marital |
| 3 | education |
| 4 | default |
| 5 | balance |
| 6 | housing |
| 7 | loan |
| 8 | day_of_week |
| 9 | month |
| 10 | duration |
| 11 | campaign |
| 12 | pdays |
| 13 | previous |
| 14 | poutcome |
Statistical Summary of Numerical & Categorical Features¶
cleaned_df[features].describe(include=[np.number])
| | age | balance | day_of_week | duration | campaign | pdays | previous |
|---|---|---|---|---|---|---|---|
| count | 45211.000000 | 45211.000000 | 45211.000000 | 45211.000000 | 45211.000000 | 45211.000000 | 45211.000000 |
| mean | 40.936210 | 1362.272058 | 15.806419 | 258.163080 | 2.763841 | 40.197828 | 0.580323 |
| std | 10.618762 | 3044.765829 | 8.322476 | 257.527812 | 3.098021 | 100.128746 | 2.303441 |
| min | 18.000000 | -8019.000000 | 1.000000 | 0.000000 | 1.000000 | -1.000000 | 0.000000 |
| 25% | 33.000000 | 72.000000 | 8.000000 | 103.000000 | 1.000000 | -1.000000 | 0.000000 |
| 50% | 39.000000 | 448.000000 | 16.000000 | 180.000000 | 2.000000 | -1.000000 | 0.000000 |
| 75% | 48.000000 | 1428.000000 | 21.000000 | 319.000000 | 3.000000 | -1.000000 | 0.000000 |
| max | 95.000000 | 102127.000000 | 31.000000 | 4918.000000 | 63.000000 | 871.000000 | 275.000000 |
cleaned_df[features].describe(include=["object"])
| | job | marital | education | default | housing | loan | month | poutcome |
|---|---|---|---|---|---|---|---|---|
| count | 45211 | 45211 | 45211 | 45211 | 45211 | 45211 | 45211 | 45211 |
| unique | 12 | 3 | 4 | 2 | 2 | 2 | 12 | 4 |
| top | blue-collar | married | secondary | no | yes | no | may | not-contacted |
| freq | 9732 | 27214 | 23202 | 44396 | 25130 | 37967 | 13766 | 36959 |
plt.figure(figsize=(24, 26))
for f in features:
    if cleaned_df[f].dtype == "object":
        plt.subplot(4, 4, features.index(f) + 1)
        plt.pie(
            cleaned_df[f].value_counts(),
            autopct="%1.1f%%",
            labels=cleaned_df[f].value_counts().index,
            colors=sns.color_palette("pastel"),
        )
        plt.title(f"Distribution of {f}")
        plt.xticks(rotation=45)
        plt.grid()
    else:
        plt.subplot(4, 4, features.index(f) + 1)
        sns.histplot(cleaned_df[f], bins=30, kde=True, color="skyblue")
        plt.title(f"Distribution of {f}")
        plt.grid()
plt.show()
EDA Observations:¶
- age: The distribution is right-skewed, with the majority of clients aged between 30 and 60.
- job: "Blue-collar" (21.5%), "management" (20.9%), and "technician" (16.8%) are the three most common job types. "Student" (2.1%) and "unemployed" (2.9%) are among the least represented.
- marital: Most clients are "married" (60.1%), followed by "single" (28.3%) and "divorced" (11.5%).
- education: "Secondary" (51.3%) and "tertiary" (29.4%) education levels make up the vast majority of the dataset.
- default: An overwhelming majority of clients (98.2%) have no credit in default.
- balance: The distribution is extremely right-skewed, indicating that most clients have a low balance, while a few outliers have very high balances.
- housing: A slight majority of clients (55.6%) have a housing loan.
- loan: The vast majority of clients (84.0%) do not have a personal loan.
- day_of_week: Despite its name, this column holds the day of the month (1-31); the distribution of calls appears relatively uniform, with slightly higher counts mid-month.
- month: Marketing activity is not uniform. It peaks heavily in "May" (~30%), followed by "July" (15.3%), "Aug" (13.8%), and "Jun" (11.8%).
- duration: The call duration is heavily right-skewed, showing that most calls are short, with a long tail of longer-duration calls.
- campaign: This feature is also very right-skewed. Most clients are contacted only a few times (1-3), while a small number of clients are contacted many times.
- pdays: The histogram is dominated by a single value (-1, indicating not previously contacted), with very few clients having been contacted recently.
- previous: This distribution is extremely skewed, with the vast majority of clients having 0 previous contacts.
- poutcome: The outcome of previous campaigns is "not-contacted" for 81.7% of clients, which corresponds to the previous and pdays plots.
Summary:¶
age, marital, job, and education are the "sensitive attributes."¶
Analysis of sensitive targets with target variable¶
plt.figure(figsize=(18, 12))
sensitive_attributes = ["age", "marital", "job", "education"]
for _attr in sensitive_attributes:
    plt.subplot(2, 2, sensitive_attributes.index(_attr) + 1)
    if cleaned_df[_attr].dtype == "object":
        sns.countplot(data=cleaned_df, x=_attr, hue="y", palette="Set2")
        plt.xticks(rotation=45)
    else:
        sns.histplot(
            data=cleaned_df,
            x=_attr,
            hue="y",
            multiple="stack",
            bins=30,
            palette="Set2",
        )
    plt.title(f"{_attr} vs Target Variable")
    if _attr == "age":
        plt.ylabel("Count")
    else:
        plt.ylabel("")
    plt.grid()
plt.show()
Insights from the plots on sensitive attributes.¶
age vs Target Variable¶
- Absolute Counts: Most clients—both those who subscribed ("yes") and those who did not ("no")—fall within the 30–50 age range.
- Subscription Rate (Proportion): The relative share of "yes" responses is highest among younger clients (around 20–30) and older clients (over 60). The middle-aged group (30–50) shows a lower overall subscription rate.
marital vs Target Variable¶
- Absolute Counts: Married clients make up the largest share of both positive ("yes") and negative ("no") outcomes.
- Subscription Rate (Proportion): Single clients show a higher likelihood of subscription compared to married or divorced clients.
job vs Target Variable¶
- Absolute Counts: The majority of clients belong to the "blue-collar," "management," or "technician" categories.
- Subscription Rate (Proportion): Subscription likelihood differs widely across occupations:
- Higher Rates: "Student" and "retired" groups have the highest proportion of "yes" responses.
- Lower Rates: "Blue-collar" and "entrepreneur" groups show the lowest proportion of subscriptions.
- This indicates a notable disparity tied to socio-economic status.
education vs Target Variable¶
- Absolute Counts: Most clients have "secondary" or "tertiary" education.
- Subscription Rate (Proportion): The "tertiary" group shows a higher rate of subscriptions than the "secondary" and "primary" groups, while the "unknown" category also performs relatively well.
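The subscription rates described above can be computed directly with a groupby. A minimal sketch on a toy frame (the notebook itself would pass cleaned_df and any of the sensitive attributes):

```python
import pandas as pd

# Toy stand-in for cleaned_df; the notebook would use the full dataset.
toy = pd.DataFrame({
    "education": ["primary", "primary", "tertiary", "tertiary", "secondary"],
    "y": ["no", "no", "yes", "no", "no"],
})

# Share of "yes" outcomes per education level.
rates = toy.groupby("education")["y"].apply(lambda s: (s == "yes").mean())
print(rates)
```

Applied to the real data, this yields the per-group proportions that the stacked plots above show visually.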
Analysis of non-sensitive targets with target variable¶
plt.figure(figsize=(21, 18))
non_sensitive_attributes = cleaned_df.drop(
    columns=["y"] + sensitive_attributes
).columns.tolist()
for _attr in non_sensitive_attributes:
    plt.subplot(4, 3, non_sensitive_attributes.index(_attr) + 1)
    if cleaned_df[_attr].dtype == "object":
        sns.countplot(data=cleaned_df, x=_attr, hue="y", palette="Set2")
    else:
        sns.histplot(
            data=cleaned_df,
            x=_attr,
            hue="y",
            multiple="stack",
            bins=30,
            palette="Set2",
        )
    plt.title(f"{_attr} vs Target Variable")
    if _attr == "default":
        plt.ylabel("Count")
    else:
        plt.ylabel("")
    plt.xlabel("")
    plt.grid()
plt.show()
Insights for the non-sensitive features.¶
default vs Target Variable¶
- Observation: Nearly all clients do not have credit in default. The small subset of clients with a default shows a slightly lower subscription rate.
balance vs Target Variable¶
- Observation: Most clients are concentrated at lower balance levels, where most outcomes also occur. However, the proportion of subscriptions tends to rise with higher balances, suggesting that clients with greater financial resources are more likely to subscribe.
housing vs Target Variable¶
- Observation: Clients without a housing loan show a noticeably higher subscription rate compared to those who have one.
loan vs Target Variable¶
- Observation: Similar to housing, clients without a personal loan are far more likely to subscribe than those with an existing loan.
day_of_week vs Target Variable¶
- Observation: Subscription rates appear fairly stable across the days of the month, indicating that this feature may have limited predictive power.
month vs Target Variable¶
- Observation: The subscription rate varies strongly by month.
- Higher Rates: March, September, October, and December show a high proportion of subscriptions despite relatively low call volumes.
- Lower Rates: May has the highest call volume but one of the lowest subscription rates.
duration vs Target Variable¶
- Observation: Call duration is the most influential feature. Longer calls correspond to a much higher proportion of "yes" responses, while very short calls are mostly "no."
- Critical Note: This likely represents data leakage. Since duration is only known after the call ends, it cannot be used in a realistic prediction setting.
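The leakage concern can be made concrete by comparing mean call duration across outcomes. A minimal sketch on toy data (the notebook would group cleaned_df by y):

```python
import pandas as pd

# Toy frame mimicking the pattern in the real data: subscribers ("yes")
# tend to have much longer calls, because duration is only known post-call.
toy = pd.DataFrame({
    "duration": [50, 80, 120, 600, 700, 900],
    "y": ["no", "no", "no", "yes", "yes", "yes"],
})

mean_duration = toy.groupby("y")["duration"].mean()
print(mean_duration)
```

A large gap between the two group means is exactly why the modeling section below drops duration from the feature set.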
campaign vs Target Variable¶
- Observation: The highest subscription rate occurs on the first contact, then declines sharply as the number of contacts within the same campaign increases.
pdays vs Target Variable¶
- Observation: Most clients were not contacted previously (represented by the large "-1" category). Among those who were, more recent contacts (lower pdays values) tend to correlate with higher subscription rates.
previous vs Target Variable¶
- Observation: The majority of clients have no previous contact history. For those who do (even one or two prior interactions), the subscription rate is noticeably higher.
poutcome vs Target Variable¶
- Observation: This feature is a strong predictor.
- Clients with a previous campaign outcome of "success" show a very high subscription rate.
- Those with outcomes of "failure" or "other" have lower rates.
- Clients with an "unknown" outcome (the majority) show the lowest subscription rate overall.
Data Preparation¶
# Drop duration (data leakage), plus day_of_week and default (little signal).
X = cleaned_df.drop(columns=["duration", "day_of_week", "default", "y"])
y = cleaned_df["y"]
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=19,
stratify=y,
)
Markdown(
f"""
### Data Sizes
* X train shape: `{X_train.shape}`
* X test shape: `{X_test.shape}`
* y train shape: `{y_train.shape}`
* y test shape: `{y_test.shape}`
"""
)
Data Sizes¶
- X train shape: (36168, 12)
- X test shape: (9043, 12)
- y train shape: (36168,)
- y test shape: (9043,)
from IPython.display import JSON
numerical_features = X_train.select_dtypes(
include=[np.number]
).columns.tolist()
categorical_features = X_train.select_dtypes(
exclude=[np.number]
).columns.tolist()
JSON({
"Numerical features": numerical_features,
"Categorical features": categorical_features,
})
plt.figure(figsize=(10, 6))
plt.subplot(1, 2, 1)
plt.pie(
y_train.value_counts(),
autopct="%1.1f%%",
labels=y_train.value_counts().index,
colors=sns.color_palette("pastel"),
)
plt.title("Training Distribution of Target")
plt.grid()
plt.subplot(1, 2, 2)
plt.pie(
y_test.value_counts(),
autopct="%1.1f%%",
labels=y_test.value_counts().index,
colors=sns.color_palette("pastel"),
)
plt.title("Test Distribution of Target")
plt.grid()
plt.show()
Observation: Both the training and test sets have roughly 88% "no" and 12% "yes" labels¶
Due to the class imbalance in the target, we compute class weights so the models do not simply favor the majority class¶
from sklearn.utils import compute_class_weight
class_names = np.unique(y)
weights = dict(
zip(
class_names,
compute_class_weight(
class_weight="balanced",
y=y_train,
classes=class_names,
),
)
)
JSON(weights)
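For reference, "balanced" weights follow n_samples / (n_classes * n_c), so the minority class receives the larger weight. A quick sketch on toy labels (the notebook computes the same quantity on y_train):

```python
import numpy as np
from sklearn.utils import compute_class_weight

# Toy labels: 8 "no" vs 2 "yes", imbalanced like the real target.
y_toy = np.array(["no"] * 8 + ["yes"] * 2)
classes = np.unique(y_toy)
w = compute_class_weight(class_weight="balanced", classes=classes, y=y_toy)

# n_samples / (n_classes * n_c): no -> 10/(2*8) = 0.625, yes -> 10/(2*2) = 2.5
print(dict(zip(classes, w)))
```

On the real training set this is what produces the ~0.57 / ~4.27 weights visible in the pipeline reprs below.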
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer(
[
("cat", OrdinalEncoder(), categorical_features),
("num", MinMaxScaler(), numerical_features),
]
)
svm = Pipeline(
steps=[
("preprocessor", ct),
(
"classifier",
SVC(
random_state=19,
# gamma="auto",
class_weight=weights,
),
),
]
)
random_forest = Pipeline(
steps=[
("preprocessor", ct),
(
"classifier",
RandomForestClassifier(
n_estimators=300,
random_state=19,
criterion="log_loss",
class_weight=weights,
),
),
]
)
logistic_regression = Pipeline(
steps=[
("preprocessor", ct),
(
"classifier",
LogisticRegression(
class_weight=weights,
random_state=19,
solver="newton-cholesky",
max_iter=10_000,
),
),
]
)
svm
Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('cat', OrdinalEncoder(),
['job', 'marital',
'education', 'housing',
'loan', 'month',
'poutcome']),
('num', MinMaxScaler(),
['age', 'balance', 'campaign',
'pdays', 'previous'])])),
('classifier',
SVC(class_weight={'no': np.float64(0.5662397845758838),
'yes': np.float64(4.27416686362562)},
random_state=19))])
random_forest
Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('cat', OrdinalEncoder(),
['job', 'marital',
'education', 'housing',
'loan', 'month',
'poutcome']),
('num', MinMaxScaler(),
['age', 'balance', 'campaign',
'pdays', 'previous'])])),
('classifier',
RandomForestClassifier(class_weight={'no': np.float64(0.5662397845758838),
'yes': np.float64(4.27416686362562)},
criterion='log_loss', n_estimators=300,
random_state=19))])
logistic_regression
Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('cat', OrdinalEncoder(),
['job', 'marital',
'education', 'housing',
'loan', 'month',
'poutcome']),
('num', MinMaxScaler(),
['age', 'balance', 'campaign',
'pdays', 'previous'])])),
('classifier',
LogisticRegression(class_weight={'no': np.float64(0.5662397845758838),
'yes': np.float64(4.27416686362562)},
max_iter=10000, random_state=19,
solver='newton-cholesky'))])
from tqdm import tqdm
predictions = []
for _model in tqdm([svm, random_forest, logistic_regression], desc="Training Model", unit="model"):
    _model.fit(X_train, y_train)
    predictions.append(_model.predict(X_test))
Markdown("### Training Complete")
Training Model: 100%|███████████████████████████████████████████████| 3/3 [00:28<00:00, 9.53s/model]
Training Complete¶
from sklearn.metrics import classification_report
_reports = []
for _name, _preds in tqdm(
    zip(["SVM", "Random Forest", "Logistic Regression"], predictions),
    desc="Predicting...",
):
    _reports.append(
        pd.DataFrame(
            classification_report(
                y_test,
                _preds,
                output_dict=True,
            )
        )
    )
for _name, _report in zip(["SVM", "Random Forest", "Logistic Regression"], _reports):
    display(Markdown(f"### Classification Report for {_name}"))
    display(_report)
Predicting...: 3it [00:00, 27.29it/s]
Classification Report for SVM¶
| | no | yes | accuracy | macro avg | weighted avg |
|---|---|---|---|---|---|
| precision | 0.930701 | 0.216060 | 0.698441 | 0.573381 | 0.847091 |
| recall | 0.711459 | 0.600189 | 0.698441 | 0.655824 | 0.698441 |
| f1-score | 0.806445 | 0.317738 | 0.698441 | 0.562092 | 0.749268 |
| support | 7985.000000 | 1058.000000 | 0.698441 | 9043.000000 | 9043.000000 |
Classification Report for Random Forest¶
| | no | yes | accuracy | macro avg | weighted avg |
|---|---|---|---|---|---|
| precision | 0.904372 | 0.612299 | 0.892292 | 0.758336 | 0.870200 |
| recall | 0.981841 | 0.216446 | 0.892292 | 0.599144 | 0.892292 |
| f1-score | 0.941516 | 0.319832 | 0.892292 | 0.630674 | 0.868781 |
| support | 7985.000000 | 1058.000000 | 0.892292 | 9043.000000 | 9043.000000 |
Classification Report for Logistic Regression¶
| | no | yes | accuracy | macro avg | weighted avg |
|---|---|---|---|---|---|
| precision | 0.933299 | 0.207688 | 0.674444 | 0.570493 | 0.848405 |
| recall | 0.679900 | 0.633270 | 0.674444 | 0.656585 | 0.674444 |
| f1-score | 0.786698 | 0.312792 | 0.674444 | 0.549745 | 0.731252 |
| support | 7985.000000 | 1058.000000 | 0.674444 | 9043.000000 | 9043.000000 |
pd.DataFrame(sensitive_attributes, columns=["Sensitive Attributes"])
| | Sensitive Attributes |
|---|---|
| 0 | age |
| 1 | marital |
| 2 | job |
| 3 | education |
def disparate_impact(y_pred, name, feature):
    # Compare the share of "yes" predictions across groups of `feature`.
    eval_df = pd.DataFrame(
        {
            feature: X_test[feature],
            "Prediction": y_pred,
        }
    )
    disparity = (
        eval_df.groupby([feature, "Prediction"]).size().unstack(fill_value=0)
    )
    disparity["Total"] = disparity.sum(axis=1)
    disparity["Proportion No"] = (disparity["no"] / disparity["Total"]) * 100
    disparity["Proportion Yes"] = (disparity["yes"] / disparity["Total"]) * 100
    display(Markdown(f"## Disparate Impact on **{feature}** for **{name}**"))
    return disparity
for _name, _preds in zip(
    ["SVM", "Random Forest", "Logistic Regression"],
    predictions,
):
    display(disparate_impact(_preds, _name, feature="marital"))
    display(disparate_impact(_preds, _name, feature="education"))
Disparate Impact on marital for SVM¶
| Prediction | no | yes | Total | Proportion No | Proportion Yes |
|---|---|---|---|---|---|
| marital | |||||
| divorced | 741 | 310 | 1051 | 70.504282 | 29.495718 |
| married | 3858 | 1626 | 5484 | 70.350109 | 29.649891 |
| single | 1505 | 1003 | 2508 | 60.007974 | 39.992026 |
Disparate Impact on education for SVM¶
| Prediction | no | yes | Total | Proportion No | Proportion Yes |
|---|---|---|---|---|---|
| education | |||||
| primary | 1127 | 247 | 1374 | 82.023290 | 17.976710 |
| secondary | 3449 | 1221 | 4670 | 73.854390 | 26.145610 |
| tertiary | 1351 | 1297 | 2648 | 51.019637 | 48.980363 |
| unknown | 177 | 174 | 351 | 50.427350 | 49.572650 |
Disparate Impact on marital for Random Forest¶
| Prediction | no | yes | Total | Proportion No | Proportion Yes |
|---|---|---|---|---|---|
| marital | |||||
| divorced | 1011 | 40 | 1051 | 96.194101 | 3.805899 |
| married | 5290 | 194 | 5484 | 96.462436 | 3.537564 |
| single | 2368 | 140 | 2508 | 94.417863 | 5.582137 |
Disparate Impact on education for Random Forest¶
| Prediction | no | yes | Total | Proportion No | Proportion Yes |
|---|---|---|---|---|---|
| education | |||||
| primary | 1323 | 51 | 1374 | 96.288210 | 3.711790 |
| secondary | 4520 | 150 | 4670 | 96.788009 | 3.211991 |
| tertiary | 2489 | 159 | 2648 | 93.995468 | 6.004532 |
| unknown | 337 | 14 | 351 | 96.011396 | 3.988604 |
Disparate Impact on marital for Logistic Regression¶
| Prediction | no | yes | Total | Proportion No | Proportion Yes |
|---|---|---|---|---|---|
| marital | |||||
| divorced | 768 | 283 | 1051 | 73.073264 | 26.926736 |
| married | 3694 | 1790 | 5484 | 67.359592 | 32.640408 |
| single | 1355 | 1153 | 2508 | 54.027113 | 45.972887 |
Disparate Impact on education for Logistic Regression¶
| Prediction | no | yes | Total | Proportion No | Proportion Yes |
|---|---|---|---|---|---|
| education | |||||
| primary | 1101 | 273 | 1374 | 80.131004 | 19.868996 |
| secondary | 3241 | 1429 | 4670 | 69.400428 | 30.599572 |
| tertiary | 1335 | 1313 | 2648 | 50.415408 | 49.584592 |
| unknown | 140 | 211 | 351 | 39.886040 | 60.113960 |
Observations on Disparate Impact Analysis¶
Disparate Impact on Marital Status¶
- Definition: Disparate impact occurs when a model's predictions disproportionately affect different demographic groups.
- Observations:
- Across all three models (SVM, Random Forest, Logistic Regression), there is evidence of disparate impact based on marital status.
- Single individuals consistently receive a higher proportion of "yes" predictions than married or divorced individuals (SVM: ~40% vs. ~30%; Logistic Regression: ~46% vs. ~27-33%; Random Forest: ~5.6% vs. ~3.5-3.8%).
- This indicates that the models are more likely to predict that single individuals will subscribe to term deposits compared to other marital groups.
- The Random Forest model shows the largest relative disparity between marital groups (single clients are roughly 1.6 times as likely as married clients to receive a "yes"), suggesting it may be amplifying patterns in the training data.
Disparate Impact on Education¶
- Observations:
- There is substantial disparate impact across education levels.
- Individuals with tertiary education consistently receive a higher proportion of "yes" predictions than those with primary education (SVM: ~49% vs. ~18%; Logistic Regression: ~50% vs. ~20%; Random Forest: ~6% vs. ~3.7%).
- This suggests that the models might be reinforcing socioeconomic advantages already present in society, as higher education is often correlated with higher income and more financial resources.
- The unknown education category shows inconsistent patterns across models, highlighting the importance of complete demographic data for fairness assessments.
Ethical Implications¶
- The observed disparate impact could lead to reinforcing existing inequalities in financial opportunity.
- Financial institutions might inadvertently target marketing campaigns toward already privileged groups (single individuals or those with tertiary education).
- This could result in less access to beneficial financial products for married individuals or those with lower educational attainment.
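These disparities can be quantified with the disparate-impact ratio (the "four-fifths rule"): each group's selection rate divided by the most-favored group's rate, with values below 0.8 conventionally flagged as adverse impact. A sketch using the SVM marital proportions from the tables above:

```python
# "Yes"-prediction rates per marital group, taken from the SVM table above.
rates = {"divorced": 0.2950, "married": 0.2965, "single": 0.3999}

favored = max(rates, key=rates.get)  # group with the highest selection rate
di_ratio = {g: r / rates[favored] for g, r in rates.items()}

# Ratios below 0.8 fail the four-fifths rule.
for g in sorted(di_ratio):
    print(f"{g}: {di_ratio[g]:.2f}")
```

By this rule, the SVM's treatment of divorced and married clients (ratios around 0.74) would already be flagged relative to single clients.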
from sklearn.metrics import accuracy_score
def disparity_mistreatment(y_pred, name, feature):
    # Per-group accuracy: large gaps indicate disparate mistreatment.
    eval_df = pd.DataFrame(
        {
            feature: X_test[feature],
            "Prediction": y_pred,
            "Actual": y_test,
        }
    )
    # Select the needed columns before apply to avoid the pandas
    # FutureWarning about operating on the grouping columns.
    accuracy = (
        eval_df.groupby(feature)[["Actual", "Prediction"]]
        .apply(lambda x: accuracy_score(x["Actual"], x["Prediction"]))
        .rename("Accuracy")
        .reset_index()
    )
    accuracy["Accuracy"] = accuracy["Accuracy"] * 100
    display(Markdown(f"## Disparity Mistreatment (Accuracy) on **{feature}** for **{name}**"))
    return accuracy
for _name, _preds in zip(
    ["SVM", "Random Forest", "Logistic Regression"],
    predictions,
):
    display(disparity_mistreatment(_preds, _name, "marital"))
    display(disparity_mistreatment(_preds, _name, "education"))
Disparity Mistreatment (Accuracy) on marital for SVM¶
| | marital | Accuracy |
|---|---|---|
| 0 | divorced | 72.312084 |
| 1 | married | 71.444201 |
| 2 | single | 65.311005 |
Disparity Mistreatment (Accuracy) on education for SVM¶
| education | Accuracy | |
|---|---|---|
| 0 | primary | 82.823872 |
| 1 | secondary | 73.447537 |
| 2 | tertiary | 58.345921 |
| 3 | unknown | 57.834758 |
Disparity Mistreatment (Accuracy) on marital for Random Forest¶
| | marital | Accuracy |
|---|---|---|
| 0 | divorced | 89.819220 |
| 1 | married | 90.663749 |
| 2 | single | 85.845295 |
Disparity Mistreatment (Accuracy) on education for Random Forest¶
| | education | Accuracy |
|---|---|---|
| 0 | primary | 90.829694 |
| 1 | secondary | 90.299786 |
| 2 | tertiary | 86.744713 |
| 3 | unknown | 87.464387 |
Disparity Mistreatment (Accuracy) on marital for Logistic Regression¶
| | marital | Accuracy |
|---|---|---|
| 0 | divorced | 74.500476 |
| 1 | married | 69.256018 |
| 2 | single | 60.526316 |
Disparity Mistreatment (Accuracy) on education for Logistic Regression¶
| | education | Accuracy |
|---|---|---|
| 0 | primary | 80.349345 |
| 1 | secondary | 69.978587 |
| 2 | tertiary | 58.496979 |
| 3 | unknown | 50.712251 |
Observations on Disparate Mistreatment Analysis¶
Disparate Mistreatment on Marital Status¶
- Definition: Disparate mistreatment occurs when a model's accuracy differs across demographic groups.
- Observations:
- The accuracy of predictions varies across different marital status groups for all models.
- SVM Model: Highest accuracy for divorced clients (72.3%) and married clients (71.4%), with single clients noticeably lower (65.3%).
- Random Forest Model: Strongest overall, with married (90.7%) and divorced (89.8%) clients ahead of single clients (85.8%).
- Logistic Regression: Shows the widest spread, from 74.5% for divorced clients down to 60.5% for single clients.
- All models show a roughly 5-14 percentage point accuracy gap between the highest and lowest performing groups, with single clients consistently the worst served.
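The gap quoted above can be computed as a single summary statistic per model and feature. A minimal sketch on illustrative arrays (the `y_true`, `y_pred`, and `groups` values below are made up, not the notebook's variables):

```python
import pandas as pd

def accuracy_gap(y_true, y_pred, groups):
    """Per-group accuracy (%) and the max-min gap in percentage points."""
    df = pd.DataFrame({"group": groups, "y_true": y_true, "y_pred": y_pred})
    # Select the metric columns before .apply so the grouping column
    # stays out of the lambda (avoids the pandas FutureWarning).
    per_group = (
        df.groupby("group")[["y_true", "y_pred"]]
        .apply(lambda g: (g["y_true"] == g["y_pred"]).mean() * 100)
    )
    return per_group, per_group.max() - per_group.min()

per_group, gap = accuracy_gap(
    y_true=[1, 0, 1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 1, 1, 0, 1],
    groups=["a"] * 4 + ["b"] * 4,
)
print(per_group)  # group a: 100.0, group b: 25.0
print(gap)        # 75.0
```

Reporting the gap alongside the per-group table makes it easy to track whether mitigation steps actually narrow the disparity.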
Disparate Mistreatment on Education¶
- Observations:
- More pronounced accuracy disparities exist across education levels compared to marital status.
- Primary Education: Receives the highest prediction accuracy in every model (80-91%).
- Tertiary Education: Shows markedly lower accuracy (58-87%), a gap of over 20 percentage points from primary education in the SVM and Logistic Regression models.
- Secondary Education: Falls in between, closer to primary education performance.
- Unknown Education: Tracks tertiary education and drops to 50.7% for Logistic Regression, highlighting potential issues with missing data.
Ethical Implications¶
- Counterintuitively, the models are least accurate for the very groups the disparate impact analysis showed being favored in targeting: single clients and those with tertiary or unknown education.
- Whichever direction a gap runs, accuracy disparities of this size mean some groups systematically receive more incorrect decisions than others, potentially perpetuating disadvantage.
- The disparity in accuracy suggests that the features used by the models better represent the behavior of certain demographic groups, possibly because those groups dominate the training data, creating an inherent bias in the predictive capability.
def disparity_treatment(y_pred, name, feature):
    """Per-group error rate for one model's predictions on a sensitive feature."""
    eval_df = pd.DataFrame(
        {
            feature: X_test[feature],
            "Prediction": y_pred,
            "Actual": y_test,
        }
    )
    # Selecting the metric columns before .apply keeps the grouping column out
    # of the lambda and avoids the pandas DataFrameGroupBy.apply FutureWarning.
    error_rate = (
        eval_df.groupby(feature)[["Actual", "Prediction"]]
        .apply(lambda x: (x["Actual"] != x["Prediction"]).mean())
        .rename("Error Rate")
        .reset_index()
    )
    display(Markdown(f"## Disparity Treatment on **{feature}** for **{name}**"))
    return error_rate
for _name, _preds in zip(
    ["SVM", "Random Forest", "Logistic Regression"],
    predictions,
):
    display(disparity_treatment(_preds, _name, "marital"))
    display(disparity_treatment(_preds, _name, "education"))
Disparity Treatment on marital for SVM¶
| | marital | Error Rate |
|---|---|---|
| 0 | divorced | 0.276879 |
| 1 | married | 0.285558 |
| 2 | single | 0.346890 |
Disparity Treatment on education for SVM¶
| | education | Error Rate |
|---|---|---|
| 0 | primary | 0.171761 |
| 1 | secondary | 0.265525 |
| 2 | tertiary | 0.416541 |
| 3 | unknown | 0.421652 |
Disparity Treatment on marital for Random Forest¶
| | marital | Error Rate |
|---|---|---|
| 0 | divorced | 0.101808 |
| 1 | married | 0.093363 |
| 2 | single | 0.141547 |
Disparity Treatment on education for Random Forest¶
| | education | Error Rate |
|---|---|---|
| 0 | primary | 0.091703 |
| 1 | secondary | 0.097002 |
| 2 | tertiary | 0.132553 |
| 3 | unknown | 0.125356 |
Disparity Treatment on marital for Logistic Regression¶
| | marital | Error Rate |
|---|---|---|
| 0 | divorced | 0.254995 |
| 1 | married | 0.307440 |
| 2 | single | 0.394737 |
Disparity Treatment on education for Logistic Regression¶
| | education | Error Rate |
|---|---|---|
| 0 | primary | 0.196507 |
| 1 | secondary | 0.300214 |
| 2 | tertiary | 0.415030 |
| 3 | unknown | 0.492877 |
Observations on Disparate Treatment Analysis¶
Disparate Treatment on Marital Status¶
- Definition: Disparate treatment is measured here as differences in error rates across demographic groups.
- Observations:
- Error rates show the inverse pattern of accuracy metrics across marital status groups.
- Single clients consistently have the highest error rates (14-39%) across all models.
- Married and divorced groups show lower error rates (9-31%).
- Logistic Regression displays the largest disparity between groups (about 14 percentage points); Random Forest the smallest (about 5 points).
- These error rate differences indicate that single clients are the most likely to receive incorrect predictions.
Disparate Treatment on Education¶
- Observations:
- Primary Education: Consistently experiences the lowest error rates (9-20%) across all models.
- Tertiary and Unknown Education: Show the highest error rates (13-49%), creating a substantial gap with primary education.
- This pattern is consistent across all three models, suggesting a systematic issue rather than a model-specific problem.
- The gap in error rates between education levels (4-30 percentage points depending on the model) is more substantial than the marital status disparities.
Ethical Implications¶
- The higher error rates for single clients and for those with tertiary or unknown education could lead to systemic disadvantages for these groups.
- In a banking context, these disparities could translate into:
- Reduced opportunity: Higher false negative rates might cause marketing campaigns to miss potential customers among these groups.
- Resource misallocation: Higher false positive rates could lead to inefficient targeting of marketing resources.
- Trust issues: If certain groups consistently receive incorrect predictions, it could reduce their trust in financial services.
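The false negative and false positive rates mentioned above can be broken out per group in the same groupby style used elsewhere in this notebook. A minimal sketch on made-up arrays (not the notebook's variables):

```python
import pandas as pd

def group_error_rates(y_true, y_pred, groups):
    """Per-group false positive rate (FPR) and false negative rate (FNR)."""
    df = pd.DataFrame({"group": groups, "y_true": y_true, "y_pred": y_pred})

    def rates(g):
        neg = g[g["y_true"] == 0]  # actual non-subscribers
        pos = g[g["y_true"] == 1]  # actual subscribers
        return pd.Series({
            "FPR": (neg["y_pred"] == 1).mean(),  # wasted marketing contacts
            "FNR": (pos["y_pred"] == 0).mean(),  # missed potential customers
        })

    return df.groupby("group")[["y_true", "y_pred"]].apply(rates)

# Illustrative: group "b" misses every actual subscriber (FNR = 1.0).
rates = group_error_rates(
    y_true=[1, 1, 0, 0, 1, 1, 0, 0],
    y_pred=[1, 1, 0, 0, 0, 0, 0, 1],
    groups=["a"] * 4 + ["b"] * 4,
)
print(rates)
```

Splitting the error rate into FPR and FNR shows which of the two harms (reduced opportunity vs. resource misallocation) dominates for each group.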
Comparison Across Fairness Metrics¶
- The three fairness metrics (impact, mistreatment, and treatment) collectively indicate that the models treat demographic groups unevenly, though not always in the same direction:
- The disparate impact analysis favors single clients and those with tertiary education, who receive higher rates of positive predictions.
- The mistreatment and treatment metrics show those same groups receiving the least accurate predictions and the highest error rates.
- This combination of heavier targeting and lower accuracy suggests the models generalize best to the majority groups, pointing to biases rooted in the dataset and modeling approach.
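The three metrics can be laid side by side for any one model, which makes directional mismatches like the one above easy to spot. A sketch on illustrative arrays (positive-prediction rate stands in for disparate impact; the data below is made up):

```python
import pandas as pd

def fairness_summary(y_true, y_pred, groups):
    """Per-group positive-prediction rate (impact), accuracy (mistreatment),
    and error rate (treatment) in one table."""
    df = pd.DataFrame({"group": groups, "y_true": y_true, "y_pred": y_pred})

    def metrics(g):
        acc = (g["y_true"] == g["y_pred"]).mean()
        return pd.Series({
            "Positive rate": (g["y_pred"] == 1).mean(),
            "Accuracy": acc,
            "Error rate": 1 - acc,
        })

    return df.groupby("group")[["y_true", "y_pred"]].apply(metrics)

summary = fairness_summary(
    y_true=[1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 0, 0, 1, 1],
    groups=["a", "a", "a", "b", "b", "b"],
)
print(summary)
```

A group can have a higher positive rate and identical accuracy, or vice versa, so no single metric is sufficient on its own.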
Summary and Mitigation Strategies¶
Overall Fairness Assessment¶
- The analysis reveals consistent disparity patterns across multiple fairness metrics and machine learning models:
- Demographic Disparities: Positive prediction rates favor single clients and those with tertiary education, yet accuracy is lowest and error rates are highest for exactly those groups, while married clients and those with primary education are predicted most reliably.
- Model Consistency: All three models (SVM, Random Forest, and Logistic Regression) show similar patterns, suggesting that the issue lies in the data rather than in specific modeling choices.
- Multiple Fairness Dimensions: Disparities appear across impact (prediction rates), mistreatment (accuracy), and treatment (error rates) metrics, indicating a fundamental fairness issue.
Potential Causes of Bias¶
- Historical Data Patterns: The training data likely reflects historical banking practices that favored certain demographic groups.
- Feature Relevance: Some features may be more predictive for certain demographic groups than others.
- Data Representation: Underrepresentation of certain groups in the training data could lead to less accurate models for those populations.
- Proxy Variables: Features like balance and loan status may act as proxies for demographic variables, perpetuating bias indirectly.
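The proxy-variable concern above can be checked directly: correlate each candidate feature with the sensitive attribute and treat a strong correlation as a flag. A sketch on synthetic data (the feature names `balance` and `duration` are borrowed from the dataset, but the values here are simulated, with `balance` deliberately constructed to leak the attribute):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Simulated binary sensitive attribute (e.g. tertiary education, yes/no).
sensitive = rng.integers(0, 2, n)
# "balance" is constructed to leak the attribute; "duration" is pure noise.
balance = sensitive * 2.0 + rng.normal(size=n)
duration = rng.normal(size=n)

# Point-biserial correlation with the sensitive attribute flags proxies.
r_balance = np.corrcoef(balance, sensitive)[0, 1]
r_duration = np.corrcoef(duration, sensitive)[0, 1]
print(f"balance:  r = {r_balance:.2f}")   # strong: acts as a proxy
print(f"duration: r = {r_duration:.2f}")  # near zero
```

Correlation only catches linear leakage; a stricter test is to train a classifier to predict the sensitive attribute from the features and compare its accuracy against the base rate.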
Recommended Mitigation Strategies¶
- Fairness-Aware Learning:
  - Implement fairness constraints during model training to equalize error rates across groups.
  - Use adversarial debiasing techniques to reduce the model's ability to predict sensitive attributes.
- Data Interventions:
  - Resampling: Balance the dataset to ensure equal representation of different demographic groups.
  - Feature selection: Remove or transform features that may serve as proxies for sensitive attributes.
- Post-Processing Approaches:
  - Adjust decision thresholds differently for each demographic group to equalize outcome rates.
  - Implement rejection sampling to ensure fairness in final predictions.
- Monitoring and Evaluation:
  - Establish continuous monitoring of fairness metrics in production.
  - Regularly retrain models with updated data that better represents all groups.
- Holistic Approach:
  - Consider the broader social context of banking decisions.
  - Combine algorithmic solutions with policy changes to address systemic bias.
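The post-processing strategy above can be sketched as group-specific decision thresholds chosen so each group's positive-prediction rate matches a common target. This is a toy illustration on simulated scores, not a recommendation of any particular threshold:

```python
import numpy as np

def group_thresholds(scores, groups, target_rate):
    """Pick a per-group score threshold so each group's positive rate is near target_rate."""
    thresholds = {}
    for g in np.unique(groups):
        s = scores[groups == g]
        # The (1 - target_rate) quantile leaves roughly target_rate of scores above it.
        thresholds[g] = np.quantile(s, 1 - target_rate)
    return thresholds

rng = np.random.default_rng(1)
groups = np.array(["a"] * 500 + ["b"] * 500)
# Group "b" systematically receives lower model scores.
scores = np.concatenate([rng.uniform(0.2, 1.0, 500), rng.uniform(0.0, 0.8, 500)])

thr = group_thresholds(scores, groups, target_rate=0.3)
rates = {g: (scores[groups == g] >= t).mean() for g, t in thr.items()}
print(thr)    # group "b" gets a lower threshold
print(rates)  # both positive rates land near 0.30
```

A single shared threshold here would select far fewer clients from group `b`; the per-group quantile equalizes selection rates at the cost of applying the score scale differently per group, which is precisely the trade-off regulators scrutinize.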
Ethical Considerations¶
The ethical use of machine learning in banking requires balancing predictive performance with fairness concerns. Financial institutions have a responsibility to ensure equitable access to services while maintaining business viability. Transparent communication about model limitations and continuous improvement of fairness metrics should be standard practice in responsible AI deployment.
Model Deployability Comparison¶
The table below provides a comprehensive comparison of the three machine learning models evaluated in this analysis, with a focus on their deployability in a real-world banking context.
| Factor | SVM | Random Forest | Logistic Regression | Notes |
|---|---|---|---|---|
| Overall Accuracy | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Random Forest achieves highest overall accuracy |
| Fairness - Marital Status | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | SVM and LR have more consistent performance across marital groups |
| Fairness - Education | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | LR shows smallest disparity across education levels |
| Computational Efficiency | ⭐⭐ | ⭐ | ⭐⭐⭐⭐⭐ | LR is significantly more efficient for large-scale deployment |
| Interpretability | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | LR coefficients directly indicate feature importance |
| Robustness to Outliers | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | SVM is least affected by outliers |
| Scalability | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | LR scales better to large datasets |
| Regulatory Compliance | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | LR's interpretability makes it easier to explain to regulators |
| Ease of Updates | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | LR models can be updated incrementally with new data |
| Bias Mitigation Potential | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | LR allows for more straightforward bias mitigation strategies |