Bank Marketing Dataset Ethical Analysis

import%20marimo%0A%0A__generated_with%20%3D%20%220.18.4%22%0Aapp%20%3D%20marimo.App(%0A%20%20%20%20width%3D%22full%22%2C%0A%20%20%20%20app_title%3D%22Bank%20Marketing%20Dataset%20Ethical%20Analysis%22%2C%0A)%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.vstack(%0A%20%20%20%20%20%20%20%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20mo.md(%22%23%20Bank%20Marketing%20Dataset%20Ethical%20Analysis%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20mo.md(%22%23%23%20Project%20by%20Sharan%20Thakur%20GH1031360%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20mo.image(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20src%3D%22https%3A%2F%2Fraw.githubusercontent.com%2Fc2p-cmd%2FEthicalIssuesOfAI%2Fmain%2Fcover.png%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20alt%3D%22Cover%20Image%20for%20Ethics%20in%20AI%20and%20Banking%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20width%3D360%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20height%3D360%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20rounded%3DTrue%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20caption%3D%22Cover%20Image%20for%20Ethics%20in%20AI%20and%20Banking%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%20Introduction%0A%20%20%20%20*%20This%20notebook%20is%20assessment%20for%20**M515%20-%20Ethical%20Issues%20of%20AI**.%0A%20%20%20%20*%20This%20notebook%20uses%20the%20**Bank%20Marketing%20Dataset**%20from%20the%20**UCI%20Machine%20Learning%20Repository**.%0A%20%20%20%20*%20The%20dataset%20contains%20information%20about%20direct%20marketing%20campaigns%20of%20a%20Portuguese%20banking%20institution.%0A%20%20%20%20*%20The%20goal%20is%20to%20predict%20whether%20a%20client%20will%20subscribe%20to%20a%20term%20deposit%20based%20on%20various%20features%20and%20understand%20the%20issues%20with%20respect%20to%20bias%20and%20fairness%20in%20AI%2
0models.%0A%0A%20%20%20%20%23%23%20GitHub%20Repository%20%26%20Dataset%3A%0A%20%20%20%20*%20GitHub%20Code%20Link%3A%20%3Chttps%3A%2F%2Fgithub.com%2Fc2p-cmd%2FEthicalIssuesOfAI%3E%0A%20%20%20%20*%20Dataset%20Link%3A%20%3Chttps%3A%2F%2Farchive.ics.uci.edu%2Fdataset%2F222%2Fbank%2Bmarketing%3E%0A%0A%20%20%20%20%23%23%20Problem%20Statement%0A%20%20%20%20*%20To%20analyze%20the%20**Bank%20Marketing%20Dataset**%20for%20potential%20ethical%20issues%2C%20including%20bias%20and%20fairness%20in%20AI%20models.%0A%20%20%20%20*%20To%20identify%20any%20disparities%20in%20model%20performance%20across%20different%20demographic%20groups.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_()%3A%0A%20%20%20%20import%20marimo%20as%20mo%0A%20%20%20%20import%20pandas%20as%20pd%0A%20%20%20%20import%20numpy%20as%20np%0A%20%20%20%20import%20matplotlib.pyplot%20as%20plt%0A%20%20%20%20import%20seaborn%20as%20sns%0A%20%20%20%20import%20plotly.graph_objects%20as%20go%0A%20%20%20%20import%20plotly.express%20as%20px%0A%20%20%20%20from%20sklearn.model_selection%20import%20train_test_split%0A%20%20%20%20return%20mo%2C%20np%2C%20pd%2C%20plt%2C%20sns%2C%20train_test_split%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%20About%20the%20data%3A%0A%20%20%20%20%23%23%23%20Summary%3A%0A%20%20%20%20The%20data%20is%20related%20to%20direct%20marketing%20campaigns%20of%20a%20Portuguese%20banking%20institution.%20The%20marketing%20campaigns%20were%20based%20on%20phone%20calls.%20Often%2C%20more%20than%20one%20contact%20to%20the%20same%20client%20was%20required%2C%20in%20order%20to%20assess%20whether%20the%20product%20(bank%20term%20deposit)%20would%20be%20subscribed%20('yes')%20or%20not%20('no').%0A%0A%20%20%20%20There%20are%20four%20datasets%3A%0A%20%20%20%201)%20bank-additional-full.csv%20with%20all%20examples%20(41188)%20and%2020%20inputs%2C%20ordered%20by%20date%20(from%20May%202008%20to%20November%202010)%2C%20very%20close%20to%20the%20data%20ana
lyzed%20in%20%5BMoro%20et%20al.%2C%202014%5D%0A%20%20%20%202)%20bank-additional.csv%20with%2010%25%20of%20the%20examples%20(4119)%2C%20randomly%20selected%20from%201)%2C%20and%2020%20inputs.%0A%20%20%20%203)%20bank-full.csv%20with%20all%20examples%20and%2017%20inputs%2C%20ordered%20by%20date%20(older%20version%20of%20this%20dataset%20with%20less%20inputs).%0A%20%20%20%204)%20bank.csv%20with%2010%25%20of%20the%20examples%20and%2017%20inputs%2C%20randomly%20selected%20from%203%20(older%20version%20of%20this%20dataset%20with%20less%20inputs).%0A%20%20%20%20The%20smallest%20datasets%20are%20provided%20to%20test%20more%20computationally%20demanding%20machine%20learning%20algorithms%20(e.g.%2C%20SVM).%0A%0A%20%20%20%20The%20classification%20goal%20is%20to%20predict%20if%20the%20client%20will%20subscribe%20(yes%2Fno)%20a%20term%20deposit%20(variable%20y).%0A%0A%20%20%20%20%23%23%23%20Variable%20Info%3A%0A%20%20%20%20Input%20variables%3A%0A%20%20%20%201.%20%60age%60%20(numeric)%0A%20%20%20%202.%20%60job%60%20%3A%20type%20of%20job%20(categorical%3A%20%22admin.%22%2C%22unknown%22%2C%22unemployed%22%2C%22management%22%2C%22housemaid%22%2C%22entrepreneur%22%2C%22student%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22blue-collar%22%2C%22self-employed%22%2C%22retired%22%2C%22technician%22%2C%22services%22)%0A%20%20%20%203.%20%60marital%60%20%3A%20marital%20status%20(categorical%3A%20%22married%22%2C%22divorced%22%2C%22single%22%3B%20note%3A%20%22divorced%22%20means%20divorced%20or%20widowed)%0A%20%20%20%204.%20%60education%60%20(categorical%3A%20%22unknown%22%2C%22secondary%22%2C%22primary%22%2C%22tertiary%22)%0A%20%20%20%205.%20%60default%60%3A%20has%20credit%20in%20default%3F%20(binary%3A%20%22yes%22%2C%22no%22)%0A%20%20%20%206.%20%60balance%60%3A%20average%20yearly%20balance%2C%20in%20euros%20(numeric)%0A%20%20%20%207.%20%60housing%60%3A%20has%20housing%20loan%3F%20(binary%3A%20%22yes%22%2C%22no%2
2)%0A%20%20%20%208.%20%60loan%60%3A%20has%20personal%20loan%3F%20(binary%3A%20%22yes%22%2C%22no%22)%0A%20%20%20%209.%20%60contact%60%3A%20contact%20communication%20type%20(categorical%3A%20%22unknown%22%2C%22telephone%22%2C%22cellular%22)%0A%20%20%20%2010.%20%60day%60%3A%20last%20contact%20day%20of%20the%20month%20(numeric)%0A%20%20%20%2011.%20%60month%60%3A%20last%20contact%20month%20of%20year%20(categorical%3A%20%22jan%22%2C%20%22feb%22%2C%20%22mar%22%2C%20...%2C%20%22nov%22%2C%20%22dec%22)%0A%20%20%20%2012.%20%60duration%60%3A%20last%20contact%20duration%2C%20in%20seconds%20(numeric)%0A%20%20%20%2013.%20%60campaign%60%3A%20number%20of%20contacts%20performed%20during%20this%20campaign%20and%20for%20this%20client%20(numeric%2C%20includes%20last%20contact)%0A%20%20%20%2014.%20%60pdays%60%3A%20number%20of%20days%20that%20passed%20by%20after%20the%20client%20was%20last%20contacted%20from%20a%20previous%20campaign%20(numeric%2C%20-1%20means%20client%20was%20not%20previously%20contacted)%0A%20%20%20%2015.%20%60previous%60%3A%20number%20of%20contacts%20performed%20before%20this%20campaign%20and%20for%20this%20client%20(numeric)%0A%20%20%20%2016.%20%60poutcome%60%3A%20outcome%20of%20the%20previous%20marketing%20campaign%20(categorical%3A%20%22unknown%22%2C%22other%22%2C%22failure%22%2C%22success%22)%0A%0A%20%20%20%20Output%20variable%20(desired%20target)%3A%0A%20%20%20%2017.%20%60y%60%20-%20has%20the%20client%20subscribed%20a%20term%20deposit%3F%20(binary%3A%20%22yes%22%2C%22no%22)%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%20Data%20Loading%20%26%20Preprocessing%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(pd)%3A%0A%20%20%20%20df%20%3D%20pd.read_csv(%0A%20%20%20%20%20%20%20%20%22https%3A%2F%2Fraw.githubusercontent.com%2Fc2p-cmd%2FEthicalIssuesOfAI%2Frefs%2Fheads%2Fmain%2Fbank_marketing_data.csv%22%0A%20%20%20%20)%0A%20%20%20%20df%0A%20%20%20%20return%2
0(df%2C)%0A%0A%0A%40app.cell%0Adef%20_(df%2C%20mo)%3A%0A%20%20%20%20mo.md(f%22%22%22%23%23%23%20**Observation**%20The%20dataset%20has%20%7Blen(df)%7D%20samples%20with%20%7Blen(df.columns)%7D%20columns.%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(df)%3A%0A%20%20%20%20df.info(show_counts%3DTrue)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(df%2C%20pd)%3A%0A%20%20%20%20pd.DataFrame(df.isnull().sum()%2C%20columns%3D%5B%22Count%22%5D).T%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(df%2C%20mo)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20f%22%22%22%0A%20%20%20%20%23%23%23%20**Observation**%20There%20are%20missing%20values%20in%20the%20dataset.%0A%20%20%20%20*%20%60%7B%22%2C%20%22.join(df.isnull().sum()%5Bdf.isnull().sum()%20!%3D%200%5D.index.tolist())%7D%60%20columns%20have%20missing%20values.%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(df)%3A%0A%20%20%20%20%23%20Handling%20missing%20values%20via%20imputation%0A%20%20%20%20df%5Bdf.isnull().sum()%5Bdf.isnull().sum()%20!%3D%200%5D.index.tolist()%5D%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%23%20**Imputation%20Strategy**%0A%20%20%20%20*%20For%20%60job%60%20column%20we%20will%20mark%20missing%20values%20as%20'unknown'.%0A%20%20%20%20*%20For%20%60education%60%20column%20we%20will%20mark%20missing%20values%20as%20'unknown'.%0A%20%20%20%20*%20We%20will%20drop%20%60contact%60%20column%20as%20it%20has%20too%20many%20missing%20values.%0A%20%20%20%20*%20For%20%60poutcome%60%20column%20we%20will%20mark%20missing%20values%20as%20'not-contacted'.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(df%2C%20pd)%3A%0A%20%20%20%20def%20clean_data(df%3A%20pd.DataFrame)%20-%3E%20pd.DataFrame%3A%0A%20%20%20%20%20%20%20%20df%5B%22job%22%5D%20%3D%20df%5B%22job%22%5D.fillna(%22unknown%22)%0A%20%20%20%20%20%20%20%20df%5B%22education%22%5D%20%3D%20df%5B%22education%22%5D.fillna(
%22unknown%22)%0A%20%20%20%20%20%20%20%20df%20%3D%20df.drop(columns%3D%5B%22contact%22%5D)%0A%20%20%20%20%20%20%20%20df%5B%22poutcome%22%5D%20%3D%20df%5B%22poutcome%22%5D.fillna(%22not-contacted%22)%0A%20%20%20%20%20%20%20%20return%20df%0A%0A%0A%20%20%20%20cleaned_df%20%3D%20df.pipe(clean_data)%0A%20%20%20%20cleaned_df%0A%20%20%20%20return%20(cleaned_df%2C)%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20pd)%3A%0A%20%20%20%20pd.DataFrame(cleaned_df.isnull().sum()).T%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20mo)%3A%0A%20%20%20%20mo.md(f%22%22%22%23%23%23%20**Observation**%20After%20cleaning%2C%20the%20dataset%20has%20%7Blen(cleaned_df)%7D%20samples%20with%20%7Blen(cleaned_df.columns)%7D%20columns%20and%20no%20missing%20values.%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%20Exploratory%20Data%20Analysis%20(EDA)%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20mo)%3A%0A%20%20%20%20features%20%3D%20cleaned_df.drop(columns%3D%22y%22).columns.tolist()%0A%20%20%20%20mo.ui.table(features%2C%20label%3D%22%23%23%20Features%20in%20the%20Dataset%22)%0A%20%20%20%20return%20(features%2C)%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20features%2C%20mo%2C%20np)%3A%0A%20%20%20%20mo.ui.table(%0A%20%20%20%20%20%20%20%20cleaned_df%5Bfeatures%5D.describe(include%3D%5Bnp.number%5D)%2C%0A%20%20%20%20%20%20%20%20label%3D%22%23%23%20Statistical%20Summary%20of%20Numerical%20Features%22%2C%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20features%2C%20mo)%3A%0A%20%20%20%20mo.ui.table(%0A%20%20%20%20%20%20%20%20cleaned_df%5Bfeatures%5D.describe(include%3D%5B%22object%22%5D)%2C%0A%20%20%20%20%20%20%20%20label%3D%22%23%23%20Statistical%20Summary%20of%20Categorical%20Features%22%2C%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20features%2C%20plt%2C%20sns)%3A%0A%20%20%20%20plt.figure(figsize%
3D(24%2C%2026))%0A%20%20%20%20for%20f%20in%20features%3A%0A%20%20%20%20%20%20%20%20if%20cleaned_df%5Bf%5D.dtype%20%3D%3D%20%22object%22%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.subplot(4%2C%204%2C%20features.index(f)%20%2B%201)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.pie(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20cleaned_df%5Bf%5D.value_counts()%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20autopct%3D%22%251.1f%25%25%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20labels%3Dcleaned_df%5Bf%5D.value_counts().index%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20colors%3Dsns.color_palette(%22pastel%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.title(f%22Distribution%20of%20%7Bf%7D%22)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.xticks(rotation%3D45)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.grid()%0A%20%20%20%20%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.subplot(4%2C%204%2C%20features.index(f)%20%2B%201)%0A%20%20%20%20%20%20%20%20%20%20%20%20sns.histplot(cleaned_df%5Bf%5D%2C%20bins%3D30%2C%20kde%3DTrue%2C%20color%3D%22skyblue%22)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.title(f%22Distribution%20of%20%7Bf%7D%22)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.grid()%0A%20%20%20%20plt.gcf()%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(f%22%22%22%0A%20%20%20%20%23%23%23%20EDA%20Observations%3A%0A%0A%20%20%20%20*%20**%60age%60**%3A%20The%20distribution%20is%20right-skewed%2C%20with%20the%20majority%20of%20clients%20aged%20between%2030%20and%2060.%0A%20%20%20%20*%20**%60job%60**%3A%20%22Blue-collar%22%20(21.5%25)%2C%20%22management%22%20(20.9%25)%2C%20and%20%22technician%22%20(16.8%25)%20are%20the%20three%20most%20common%20job%20types.%20%22Student%22%20(2.1%25)%20and%20%22unemployed%22%20(2.9%25)%20are%20among%20the%20least%20represented.%0A%20%20%20%20*%20**%60marital%60**%3A%20Most%20clients%20are%20%22married%22%20(60.1%25)%2C%20followed%20by%20%22sing
le%22%20(28.3%25)%20and%20%22divorced%22%20(11.5%25).%0A%20%20%20%20*%20**%60education%60**%3A%20%22Secondary%22%20(51.3%25)%20and%20%22tertiary%22%20(29.4%25)%20education%20levels%20make%20up%20the%20vast%20majority%20of%20the%20dataset.%0A%20%20%20%20*%20**%60default%60**%3A%20An%20overwhelming%20majority%20of%20clients%20(98.2%25)%20have%20no%20credit%20in%20default.%0A%20%20%20%20*%20**%60balance%60**%3A%20The%20distribution%20is%20extremely%20right-skewed%2C%20indicating%20that%20most%20clients%20have%20a%20low%20balance%2C%20while%20a%20few%20outliers%20have%20very%20high%20balances.%0A%20%20%20%20*%20**%60housing%60**%3A%20A%20slight%20majority%20of%20clients%20(55.6%25)%20do%20not%20have%20a%20housing%20loan.%0A%20%20%20%20*%20**%60loan%60**%3A%20The%20vast%20majority%20of%20clients%20(84.0%25)%20do%20not%20have%20a%20personal%20loan.%0A%20%20%20%20*%20**%60day_of_week%60**%3A%20The%20distribution%20of%20calls%20appears%20relatively%20uniform%20across%20the%20days%20of%20the%20week%2C%20with%20slightly%20higher%20counts%20mid-week.%0A%20%20%20%20*%20**%60month%60**%3A%20Marketing%20activity%20is%20not%20uniform.%20It%20peaks%20heavily%20in%20%22May%22%20(28.4%25)%2C%20followed%20by%20%22July%22%20(15.3%25)%2C%20%22Aug%22%20(13.8%25)%2C%20and%20%22Jun%22%20(11.8%25).%0A%20%20%20%20*%20**%60duration%60**%3A%20The%20call%20duration%20is%20heavily%20right-skewed%2C%20showing%20that%20most%20calls%20are%20short%2C%20with%20a%20long%20tail%20of%20longer-duration%20calls.%0A%20%20%20%20*%20**%60campaign%60**%3A%20This%20feature%20is%20also%20very%20right-skewed.%20Most%20clients%20are%20contacted%20only%20a%20few%20times%20(1-3)%2C%20while%20a%20small%20number%20of%20clients%20are%20contacted%20many%20times.%0A%20%20%20%20*%20**%60pdays%60**%3A%20The%20histogram%20is%20dominated%20by%20a%20single%20value%20(likely%20-1%2C%20indicating%20not%20previously%20contacted)%2C%20with%20very%20few%20clients%20having%20been%20contacted%20recently.%0A%20%20%20%20*%20**%60prev
ious%60**%3A%20This%20distribution%20is%20extremely%20skewed%2C%20with%20the%20vast%20majority%20of%20clients%20having%200%20previous%20contacts.%0A%20%20%20%20*%20**%60poutcome%60**%3A%20The%20outcome%20of%20previous%20campaigns%20is%20%22unknown%22%20for%2081.7%25%20of%20clients%2C%20which%20corresponds%20to%20the%20%60previous%60%20and%20%60pdays%60%20plots.%0A%0A%20%20%20%20%23%23%23%20Summary%3A%0A%0A%20%20%20%20%23%23%23%20**%60age%60**%2C%20**%60marital%60**%2C%20**%60job%60**%20and%20**%60education%60**%20are%20the%20%22sensitive%20attributes.%22%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%0A%20%20%20%20%23%23%20Analysis%20of%20sensitive%20targets%20with%20target%20variable%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20plt%2C%20sns)%3A%0A%20%20%20%20plt.figure(figsize%3D(18%2C%2012))%0A%20%20%20%20sensitive_attributes%20%3D%20%5B%22age%22%2C%20%22marital%22%2C%20%22job%22%2C%20%22education%22%5D%0A%20%20%20%20for%20_attr%20in%20sensitive_attributes%3A%0A%20%20%20%20%20%20%20%20plt.subplot(2%2C%202%2C%20sensitive_attributes.index(_attr)%20%2B%201)%0A%20%20%20%20%20%20%20%20if%20cleaned_df%5B_attr%5D.dtype%20%3D%3D%20%22object%22%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20sns.countplot(data%3Dcleaned_df%2C%20x%3D_attr%2C%20hue%3D%22y%22%2C%20palette%3D%22Set2%22)%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.xticks(rotation%3D45)%0A%20%20%20%20%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20sns.histplot(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20data%3Dcleaned_df%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20x%3D_attr%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20hue%3D%22y%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20multiple%3D%22stack%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20bins%3D30%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20palette%3D%22Set2%22%2C%0A%20%20%20%20%20%20%20%20%20%20
%20%20)%0A%20%20%20%20%20%20%20%20plt.title(f%22%7B_attr%7D%20vs%20Target%20Variable%22)%0A%20%20%20%20%20%20%20%20if%20_attr%20%3D%3D%20%22age%22%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.ylabel(%22Count%22)%0A%20%20%20%20%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.ylabel(%22%22)%0A%20%20%20%20%20%20%20%20plt.grid()%0A%20%20%20%20plt.gcf()%0A%20%20%20%20return%20(sensitive_attributes%2C)%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%23%20Below%20are%20the%20main%20insights%20from%20the%20plots%2C%20highlighting%20how%20each%20sensitive%20attribute%20relates%20to%20the%20target%20variable%20%60y%60.%0A%0A%20%20%20%20%23%23%23%20%60age%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Absolute%20Counts%3A**%20Most%20clients%E2%80%94both%20those%20who%20subscribed%20(%22yes%22)%20and%20those%20who%20did%20not%20(%22no%22)%E2%80%94fall%20within%20the%2030%E2%80%9350%20age%20range.%0A%20%20%20%20*%20**Subscription%20Rate%20(Proportion)%3A**%20The%20relative%20share%20of%20%22yes%22%20responses%20is%20highest%20among%20younger%20clients%20(around%2020%E2%80%9330)%20and%20older%20clients%20(over%2060).%20The%20middle-aged%20group%20(30%E2%80%9350)%20shows%20a%20lower%20overall%20subscription%20rate.%0A%0A%20%20%20%20%23%23%23%20%60marital%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Absolute%20Counts%3A**%20Married%20clients%20make%20up%20the%20largest%20share%20of%20both%20positive%20(%22yes%22)%20and%20negative%20(%22no%22)%20outcomes.%0A%20%20%20%20*%20**Subscription%20Rate%20(Proportion)%3A**%20Single%20clients%20show%20a%20higher%20likelihood%20of%20subscription%20compared%20to%20married%20or%20divorced%20clients.%0A%0A%20%20%20%20%23%23%23%20%60job%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Absolute%20Counts%3A**%20The%20majority%20of%20clients%20belong%20to%20the%20%22blue-collar%2C%22%20%22management%2C%22%20or%20%22technician%22%20categories.%0A%20%20%20%20*%20**Subscription%20Rate%20(Proportion
)%3A**%20Subscription%20likelihood%20differs%20widely%20across%20occupations%3A%0A%20%20%20%20%20%20%20%20*%20**Higher%20Rates%3A**%20%22Student%22%20and%20%22retired%22%20groups%20have%20the%20highest%20proportion%20of%20%22yes%22%20responses.%0A%20%20%20%20%20%20%20%20*%20**Lower%20Rates%3A**%20%22Blue-collar%22%20and%20%22entrepreneur%22%20groups%20show%20the%20lowest%20proportion%20of%20subscriptions.%0A%20%20%20%20%20%20%20%20*%20This%20indicates%20a%20notable%20disparity%20tied%20to%20socio-economic%20status.%0A%0A%20%20%20%20%23%23%23%20%60education%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Absolute%20Counts%3A**%20Most%20clients%20have%20%22secondary%22%20or%20%22tertiary%22%20education.%0A%20%20%20%20*%20**Subscription%20Rate%20(Proportion)%3A**%20The%20%22tertiary%22%20group%20shows%20a%20higher%20rate%20of%20subscriptions%20than%20the%20%22secondary%22%20and%20%22primary%22%20groups%2C%20while%20the%20%22unknown%22%20category%20also%20performs%20relatively%20well.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%0A%20%20%20%20%23%23%20Analysis%20of%20non-sensitive%20targets%20with%20target%20variable%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20plt%2C%20sensitive_attributes%2C%20sns)%3A%0A%20%20%20%20plt.figure(figsize%3D(21%2C%2018))%0A%20%20%20%20non_sensitive_attributes%20%3D%20cleaned_df.drop(%0A%20%20%20%20%20%20%20%20columns%3D%5B%22y%22%5D%20%2B%20sensitive_attributes%0A%20%20%20%20).columns.tolist()%0A%20%20%20%20for%20_attr%20in%20non_sensitive_attributes%3A%0A%20%20%20%20%20%20%20%20plt.subplot(4%2C%203%2C%20non_sensitive_attributes.index(_attr)%20%2B%201)%0A%20%20%20%20%20%20%20%20if%20cleaned_df%5B_attr%5D.dtype%20%3D%3D%20%22object%22%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20sns.countplot(data%3Dcleaned_df%2C%20x%3D_attr%2C%20hue%3D%22y%22%2C%20palette%3D%22Set2%22)%0A%20%20%20%20%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20%
20%20%20%20sns.histplot(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20data%3Dcleaned_df%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20x%3D_attr%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20hue%3D%22y%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20multiple%3D%22stack%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20bins%3D30%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20palette%3D%22Set2%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20plt.title(f%22%7B_attr%7D%20vs%20Target%20Variable%22)%0A%20%20%20%20%20%20%20%20if%20_attr%20%3D%3D%20%22default%22%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.ylabel(%22Count%22)%0A%20%20%20%20%20%20%20%20else%3A%0A%20%20%20%20%20%20%20%20%20%20%20%20plt.ylabel(%22%22)%0A%20%20%20%20%20%20%20%20plt.xlabel(%22%22)%0A%20%20%20%20%20%20%20%20plt.grid()%0A%20%20%20%20plt.gcf()%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%23%20Below%20are%20the%20main%20insights%20for%20the%20non-sensitive%20features%2C%20highlighting%20how%20they%20relate%20to%20the%20target%20variable%20%60y%60.%0A%0A%20%20%20%20%23%23%23%20%60default%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20Nearly%20all%20clients%20do%20not%20have%20credit%20in%20default.%20The%20small%20subset%20of%20clients%20with%20a%20default%20shows%20a%20slightly%20lower%20subscription%20rate.%0A%0A%20%20%20%20%23%23%23%20%60balance%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20Most%20clients%20are%20concentrated%20at%20lower%20balance%20levels%2C%20where%20most%20outcomes%20also%20occur.%20However%2C%20the%20*proportion*%20of%20subscriptions%20tends%20to%20rise%20with%20higher%20balances%2C%20suggesting%20that%20clients%20with%20greater%20financial%20resources%20are%20more%20likely%20to%20subscribe.%0A%0A%20%20%20%20%23%23%23%20%60housing%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%
20Clients%20without%20a%20housing%20loan%20show%20a%20noticeably%20higher%20subscription%20rate%20compared%20to%20those%20who%20have%20one.%0A%0A%20%20%20%20%23%23%23%20%60loan%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20Similar%20to%20%60housing%60%2C%20clients%20without%20a%20personal%20loan%20are%20far%20more%20likely%20to%20subscribe%20than%20those%20with%20an%20existing%20loan.%0A%0A%20%20%20%20%23%23%23%20%60day_of_week%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20Subscription%20rates%20appear%20fairly%20stable%20across%20all%20days%20of%20the%20week%2C%20indicating%20that%20this%20feature%20may%20have%20limited%20predictive%20power.%0A%0A%20%20%20%20%23%23%23%20%60month%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20The%20subscription%20*rate*%20varies%20strongly%20by%20month.%0A%20%20%20%20%20%20%20%20*%20**Higher%20Rates%3A**%20March%2C%20September%2C%20October%2C%20and%20December%20show%20a%20high%20proportion%20of%20subscriptions%20despite%20relatively%20low%20call%20volumes.%0A%20%20%20%20%20%20%20%20*%20**Lower%20Rates%3A**%20May%20has%20the%20highest%20call%20volume%20but%20one%20of%20the%20lowest%20subscription%20rates.%0A%0A%20%20%20%20%23%23%23%20%60duration%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20Call%20duration%20is%20the%20most%20influential%20feature.%20Longer%20calls%20correspond%20to%20a%20much%20higher%20proportion%20of%20%22yes%22%20responses%2C%20while%20very%20short%20calls%20are%20mostly%20%22no.%22%0A%20%20%20%20*%20**Critical%20Note%3A**%20This%20likely%20represents%20**data%20leakage**.%20Since%20duration%20is%20only%20known%20*after*%20the%20call%20ends%2C%20it%20should%20not%20be%20used%20as%20a%20predictor.%0A%0A%20%20%20%20%23%23%23%20%60campaign%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20The%20highest%20subscription%20rate%20occurs%20on%20the%20first%20contact%2C%20then%20declines%20sharply%20as%20the%20number%20of%20contacts
%20within%20the%20same%20campaign%20increases.%0A%0A%20%20%20%20%23%23%23%20%60pdays%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20Most%20clients%20were%20not%20contacted%20previously%20(represented%20by%20the%20large%20%22999%22%20category).%20Among%20those%20who%20were%2C%20more%20recent%20contacts%20(lower%20%60pdays%60%20values)%20tend%20to%20correlate%20with%20higher%20subscription%20rates.%0A%0A%20%20%20%20%23%23%23%20%60previous%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20The%20majority%20of%20clients%20have%20no%20previous%20contact%20history.%20For%20those%20who%20do%20(even%20one%20or%20two%20prior%20interactions)%2C%20the%20subscription%20rate%20is%20noticeably%20higher.%0A%0A%20%20%20%20%23%23%23%20%60poutcome%60%20vs%20Target%20Variable%0A%20%20%20%20*%20**Observation%3A**%20This%20feature%20is%20a%20strong%20predictor.%0A%20%20%20%20%20%20%20%20*%20Clients%20with%20a%20previous%20campaign%20outcome%20of%20%22success%22%20show%20a%20very%20high%20subscription%20rate.%0A%20%20%20%20%20%20%20%20*%20Those%20with%20outcomes%20of%20%22failure%22%20or%20%22other%22%20have%20lower%20rates.%0A%20%20%20%20%20%20%20%20*%20Clients%20with%20an%20%22unknown%22%20outcome%20(the%20majority)%20show%20the%20lowest%20subscription%20rate%20overall.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%20Data%20Preparation%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(cleaned_df%2C%20train_test_split)%3A%0A%20%20%20%20X%20%3D%20cleaned_df.drop(columns%3D%5B%22duration%22%2C%20%22day_of_week%22%2C%20%22default%22%2C%20%22y%22%5D)%0A%20%20%20%20y%20%3D%20cleaned_df%5B%22y%22%5D%0A%0A%20%20%20%20X_train%2C%20X_test%2C%20y_train%2C%20y_test%20%3D%20train_test_split(%0A%20%20%20%20%20%20%20%20X%2C%0A%20%20%20%20%20%20%20%20y%2C%0A%20%20%20%20%20%20%20%20test_size%3D0.2%2C%0A%20%20%20%20%20%20%20%20random_state%3D19%2C%0A%20%20
%20%20%20%20%20%20stratify%3Dy%2C%0A%20%20%20%20)%0A%20%20%20%20return%20X_test%2C%20X_train%2C%20y%2C%20y_test%2C%20y_train%0A%0A%0A%40app.cell%0Adef%20_(X_test%2C%20X_train%2C%20mo%2C%20y_test%2C%20y_train)%3A%0A%20%20%20%20mo.md(%0A%20%20%20%20%20%20%20%20f%22%22%22%0A%20%20%20%20%23%23%23%20Data%20Sizes%0A%0A%20%20%20%20*%20X%20train%20shape%3A%20%60%7BX_train.shape%7D%60%0A%20%20%20%20*%20X%20test%20shape%3A%20%60%7BX_test.shape%7D%60%0A%20%20%20%20*%20y%20train%20shape%3A%20%60%7By_train.shape%7D%60%0A%20%20%20%20*%20y%20test%20shape%3A%20%60%7By_test.shape%7D%60%0A%20%20%20%20%22%22%22%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(X_train%2C%20mo%2C%20np)%3A%0A%20%20%20%20numerical_features%20%3D%20X_train.select_dtypes(%0A%20%20%20%20%20%20%20%20include%3D%5Bnp.number%5D%0A%20%20%20%20).columns.tolist()%0A%20%20%20%20categorical_features%20%3D%20X_train.select_dtypes(%0A%20%20%20%20%20%20%20%20exclude%3D%5Bnp.number%5D%0A%20%20%20%20).columns.tolist()%0A%0A%20%20%20%20mo.ui.table(%0A%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Numerical%20features%22%3A%20numerical_features%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Categorical%20features%22%3A%20categorical_features%2C%0A%20%20%20%20%20%20%20%20%7D%2C%0A%20%20%20%20%20%20%20%20label%3D%22Final%20Features%20Choice%22%2C%0A%20%20%20%20)%0A%20%20%20%20return%20categorical_features%2C%20numerical_features%0A%0A%0A%40app.cell%0Adef%20_(plt%2C%20sns%2C%20y_test%2C%20y_train)%3A%0A%20%20%20%20plt.figure(figsize%3D(10%2C%206))%0A%0A%20%20%20%20plt.subplot(1%2C%202%2C%201)%0A%20%20%20%20plt.pie(%0A%20%20%20%20%20%20%20%20y_train.value_counts()%2C%0A%20%20%20%20%20%20%20%20autopct%3D%22%251.1f%25%25%22%2C%0A%20%20%20%20%20%20%20%20labels%3Dy_train.value_counts().index%2C%0A%20%20%20%20%20%20%20%20colors%3Dsns.color_palette(%22pastel%22)%2C%0A%20%20%20%20)%0A%20%20%20%20plt.title(%22Training%20Distribution%20of%20Target%22)%0A%20%20%20%20plt.grid()%0A%0A%20%20%20%20plt.subplot(1%2C%20
2%2C%202)%0A%20%20%20%20plt.pie(%0A%20%20%20%20%20%20%20%20y_test.value_counts()%2C%0A%20%20%20%20%20%20%20%20autopct%3D%22%251.1f%25%25%22%2C%0A%20%20%20%20%20%20%20%20labels%3Dy_test.value_counts().index%2C%0A%20%20%20%20%20%20%20%20colors%3Dsns.color_palette(%22pastel%22)%2C%0A%20%20%20%20)%0A%20%20%20%20plt.title(%22Test%20Distribution%20of%20Target%22)%0A%20%20%20%20plt.grid()%0A%0A%20%20%20%20plt.gcf()%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%23%23%20**Observation**%3A%20Both%20the%20training%20and%20test%20sets%20contain%2088%25%20%60no%60%20and%2012%25%20%60yes%60%20labels.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%23%20Because%20the%20target%20classes%20are%20imbalanced%2C%20we%20compute%20class%20weights%20to%20help%20the%20models%20perform%20well%20on%20the%20minority%20class.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo%2C%20np%2C%20y%2C%20y_train)%3A%0A%20%20%20%20from%20sklearn.utils%20import%20compute_class_weight%0A%0A%20%20%20%20class_names%20%3D%20np.unique(y)%0A%20%20%20%20weights%20%3D%20dict(%0A%20%20%20%20%20%20%20%20zip(%0A%20%20%20%20%20%20%20%20%20%20%20%20class_names%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20compute_class_weight(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20class_weight%3D%22balanced%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20y%3Dy_train%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20classes%3Dclass_names%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20)%0A%20%20%20%20mo.ui.table(weights%2C%20label%3D%22Class%20Weights%20for%20Imbalanced%20Data%22)%0A%20%20%20%20return%20(weights%2C)%0A%0A%0A%40app.cell%0Adef%20_(categorical_features%2C%20numerical_features%2C%20weights)%3A%0A%20%20%20%20from%20sklearn.linear_model%20import%20LogisticRegression%0A%20%20%20%20from%20sklearn.ensemble%20import%20RandomForestClassifier%0A%20%20%20%2
0from%20sklearn.svm%20import%20SVC%0A%20%20%20%20from%20sklearn.pipeline%20import%20Pipeline%0A%20%20%20%20from%20sklearn.preprocessing%20import%20MinMaxScaler%2C%20OrdinalEncoder%0A%20%20%20%20from%20sklearn.compose%20import%20ColumnTransformer%0A%0A%20%20%20%20ct%20%3D%20ColumnTransformer(%0A%20%20%20%20%20%20%20%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20(%22cat%22%2C%20OrdinalEncoder()%2C%20categorical_features)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20(%22num%22%2C%20MinMaxScaler()%2C%20numerical_features)%2C%0A%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20)%0A%0A%20%20%20%20svm%20%3D%20Pipeline(%0A%20%20%20%20%20%20%20%20steps%3D%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20(%22preprocessor%22%2C%20ct)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22classifier%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20SVC(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20random_state%3D19%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20gamma%3D%22auto%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20class_weight%3Dweights%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20)%0A%0A%20%20%20%20random_forest%20%3D%20Pipeline(%0A%20%20%20%20%20%20%20%20steps%3D%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20(%22preprocessor%22%2C%20ct)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22classifier%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20RandomForestClassifier(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20n_estimators%3D300%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20random_state%3D19%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20criterion%3D%22log_loss%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20class_weight%3Dweights%2C%0A%20%20%20%20%20%20%20%20%2
0%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20)%0A%0A%20%20%20%20logistic_regression%20%3D%20Pipeline(%0A%20%20%20%20%20%20%20%20steps%3D%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20(%22preprocessor%22%2C%20ct)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22classifier%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20LogisticRegression(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20class_weight%3Dweights%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20random_state%3D19%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20solver%3D%22newton-cholesky%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20max_iter%3D10_000%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%2C%0A%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20)%0A%20%20%20%20return%20logistic_regression%2C%20random_forest%2C%20svm%0A%0A%0A%40app.cell%0Adef%20_(logistic_regression%2C%20mo%2C%20random_forest%2C%20svm)%3A%0A%20%20%20%20mo.vstack(%0A%20%20%20%20%20%20%20%20%5Bsvm%2C%20random_forest%2C%20logistic_regression%5D%2C%0A%20%20%20%20%20%20%20%20align%3D%22stretch%22%2C%0A%20%20%20%20%20%20%20%20justify%3D%22center%22%2C%0A%20%20%20%20)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(X_test%2C%20X_train%2C%20logistic_regression%2C%20mo%2C%20random_forest%2C%20svm%2C%20y_train)%3A%0A%20%20%20%20predictions%20%3D%20%5B%5D%0A%0A%20%20%20%20_bar%20%3D%20mo.status.progress_bar(%0A%20%20%20%20%20%20%20%20%5Bsvm%2C%20random_forest%2C%20logistic_regression%5D%2C%0A%20%20%20%20%20%20%20%20title%3D%22Training%20Models%22%2C%0A%20%20%20%20%20%20%20%20show_eta%3DTrue%2C%0A%20%20%20%20%20%20%20%20show_rate%3DTrue%2C%0A%20%20%20%20)%0A%0A%20%20%20%20for%20_model%20in%20_bar%3A%0A%20%20%20%20%20%20%20%20_model.fit(X_train%2C%20y_train)%0A%20%20%20%20%20%20%20%20predictions.append(_model.predict(
X_test))%0A%0A%20%20%20%20mo.md(%22%23%23%23%20Training%20Complete%22)%0A%20%20%20%20return%20(predictions%2C)%0A%0A%0A%40app.cell%0Adef%20_(mo%2C%20pd%2C%20predictions%2C%20y_test)%3A%0A%20%20%20%20from%20sklearn.metrics%20import%20classification_report%0A%0A%20%20%20%20_bar%20%3D%20mo.status.progress_bar(%0A%20%20%20%20%20%20%20%20zip(%5B%22SVM%22%2C%20%22Random%20Forest%22%2C%20%22Logistic%20Regression%22%5D%2C%20predictions)%2C%0A%20%20%20%20%20%20%20%20title%3D%22Evaluating%20Models%22%2C%0A%20%20%20%20%20%20%20%20show_eta%3DTrue%2C%0A%20%20%20%20%20%20%20%20show_rate%3DTrue%2C%0A%20%20%20%20%20%20%20%20total%3D3%2C%0A%20%20%20%20)%0A%0A%20%20%20%20_reports%20%3D%20%5B%5D%0A%0A%20%20%20%20for%20_name%2C%20_preds%20in%20_bar%3A%0A%20%20%20%20%20%20%20%20_reports.append(%0A%20%20%20%20%20%20%20%20%20%20%20%20pd.DataFrame(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20classification_report(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20y_test%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20_preds%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20output_dict%3DTrue%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20)%0A%0A%20%20%20%20mo.vstack(_reports)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo%2C%20sensitive_attributes)%3A%0A%20%20%20%20mo.ui.table(sensitive_attributes%2C%20label%3D%22Sensitive%20Attributes%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(X_test%2C%20mo%2C%20pd%2C%20predictions)%3A%0A%20%20%20%20def%20disparate_impact(y_pred%2C%20name%2C%20feature)%3A%0A%20%20%20%20%20%20%20%20eval_df%20%3D%20pd.DataFrame(%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20feature%3A%20X_test%5Bfeature%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Prediction%22%3A%20y_pred%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20dispa
rity%20%3D%20(%0A%20%20%20%20%20%20%20%20%20%20%20%20eval_df.groupby(%5Bfeature%2C%20%22Prediction%22%5D).size().unstack(fill_value%3D0)%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20disparity%5B%22Total%22%5D%20%3D%20disparity.sum(axis%3D1)%0A%20%20%20%20%20%20%20%20disparity%5B%22Proportion%20No%22%5D%20%3D%20(disparity%5B%22no%22%5D%20%2F%20disparity%5B%22Total%22%5D)%20*%20100%0A%20%20%20%20%20%20%20%20disparity%5B%22Proportion%20Yes%22%5D%20%3D%20(disparity%5B%22yes%22%5D%20%2F%20disparity%5B%22Total%22%5D)%20*%20100%0A%20%20%20%20%20%20%20%20return%20mo.ui.table(%0A%20%20%20%20%20%20%20%20%20%20%20%20disparity%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20label%3Df%22%23%23%20Disparate%20Impact%20on%20**%7Bfeature%7D**%20for%20**%7Bname%7D**%22%2C%0A%20%20%20%20%20%20%20%20)%0A%0A%0A%20%20%20%20_tables%20%3D%20%5B%5D%0A%0A%20%20%20%20for%20_name%2C%20_preds%20in%20zip(%0A%20%20%20%20%20%20%20%20%5B%22SVM%22%2C%20%22Random%20Forest%22%2C%20%22Logistic%20Regression%22%5D%2C%0A%20%20%20%20%20%20%20%20predictions%2C%0A%20%20%20%20)%3A%0A%20%20%20%20%20%20%20%20_tables.append(%0A%20%20%20%20%20%20%20%20%20%20%20%20mo.hstack(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20disparate_impact(_preds%2C%20_name%2C%20feature%3D%22marital%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20disparate_impact(_preds%2C%20_name%2C%20feature%3D%22education%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20justify%3D%22space-between%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20align%3D%22stretch%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20)%0A%0A%20%20%20%20mo.vstack(_tables)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(%22%22%22%0A%20%20%20%20%23%23%20Observations%20on%20Disparate%20Impact%20Analysis%0A%0A%20%20%20%20%23%23%23%20Disparate%
20Impact%20on%20Marital%20Status%0A%20%20%20%20-%20**Definition**%3A%20Disparate%20impact%20occurs%20when%20a%20model's%20predictions%20disproportionately%20affect%20different%20demographic%20groups.%0A%20%20%20%20-%20**Observations**%3A%0A%20%20%20%20%20%20-%20Across%20all%20three%20models%20(SVM%2C%20Random%20Forest%2C%20Logistic%20Regression)%2C%20there%20is%20evidence%20of%20disparate%20impact%20based%20on%20marital%20status.%0A%20%20%20%20%20%20-%20**Single**%20individuals%20consistently%20have%20a%20higher%20proportion%20of%20%22yes%22%20predictions%20(around%2013-15%25)%20compared%20to%20**married**%20and%20**divorced**%20individuals%20(around%209-11%25).%0A%20%20%20%20%20%20-%20This%20indicates%20that%20the%20models%20are%20more%20likely%20to%20predict%20that%20single%20individuals%20will%20subscribe%20to%20term%20deposits%20compared%20to%20other%20marital%20groups.%0A%20%20%20%20%20%20-%20The%20Random%20Forest%20model%20shows%20the%20largest%20disparity%20between%20marital%20groups%2C%20suggesting%20it%20may%20be%20amplifying%20patterns%20in%20the%20training%20data.%0A%0A%20%20%20%20%23%23%23%20Disparate%20Impact%20on%20Education%0A%20%20%20%20-%20**Observations**%3A%0A%20%20%20%20%20%20-%20There%20is%20substantial%20disparate%20impact%20across%20education%20levels.%0A%20%20%20%20%20%20-%20Individuals%20with%20**tertiary**%20education%20consistently%20receive%20a%20higher%20proportion%20of%20%22yes%22%20predictions%20(15-17%25)%20compared%20to%20those%20with%20**primary**%20education%20(7-9%25).%0A%20%20%20%20%20%20-%20This%20suggests%20that%20the%20models%20might%20be%20reinforcing%20socioeconomic%20advantages%20already%20present%20in%20society%2C%20as%20higher%20education%20is%20often%20correlated%20with%20higher%20income%20and%20more%20financial%20resources.%0A%20%20%20%20%20%20-%20The%20**unknown**%20education%20category%20shows%20inconsistent%20patterns%20across%20models%2C%20highlighting%20the%20importance%20of%20complete%20demographic%20data%20for%20
fairness%20assessments.%0A%0A%20%20%20%20%23%23%23%20Ethical%20Implications%0A%20%20%20%20-%20The%20observed%20disparate%20impact%20could%20reinforce%20existing%20inequalities%20in%20financial%20opportunity.%0A%20%20%20%20-%20Financial%20institutions%20might%20inadvertently%20target%20marketing%20campaigns%20toward%20already%20privileged%20groups%20(single%20individuals%20or%20those%20with%20tertiary%20education).%0A%20%20%20%20-%20This%20could%20result%20in%20less%20access%20to%20beneficial%20financial%20products%20for%20married%20individuals%20or%20those%20with%20lower%20educational%20attainment.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(X_test%2C%20mo%2C%20pd%2C%20predictions%2C%20y_test)%3A%0A%20%20%20%20from%20sklearn.metrics%20import%20accuracy_score%0A%0A%0A%20%20%20%20def%20disparity_mistreatment(y_pred%2C%20name%2C%20feature)%3A%0A%20%20%20%20%20%20%20%20eval_df%20%3D%20pd.DataFrame(%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20feature%3A%20X_test%5Bfeature%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Prediction%22%3A%20y_pred%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Actual%22%3A%20y_test%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20accuracy%20%3D%20(%0A%20%20%20%20%20%20%20%20%20%20%20%20eval_df.groupby(feature)%0A%20%20%20%20%20%20%20%20%20%20%20%20.apply(lambda%20x%3A%20accuracy_score(x%5B%22Actual%22%5D%2C%20x%5B%22Prediction%22%5D))%0A%20%20%20%20%20%20%20%20%20%20%20%20.rename(%22Accuracy%22)%0A%20%20%20%20%20%20%20%20%20%20%20%20.reset_index()%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20accuracy%5B%22Accuracy%22%5D%20%3D%20accuracy%5B%22Accuracy%22%5D%20*%20100%0A%20%20%20%20%20%20%20%20return%20mo.ui.table(%0A%20%20%20%20%20%20%20%20%20%20%20%20accuracy%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20label%3Df%22%23%23%20Disparate%20Mistreatment%20(Accuracy)%20on%20**%7Bf
eature%7D**%20for%20**%7Bname%7D**%22%2C%0A%20%20%20%20%20%20%20%20)%0A%0A%0A%20%20%20%20_tables%20%3D%20%5B%5D%0A%0A%20%20%20%20for%20_name%2C%20_preds%20in%20zip(%0A%20%20%20%20%20%20%20%20%5B%22SVM%22%2C%20%22Random%20Forest%22%2C%20%22Logistic%20Regression%22%5D%2C%0A%20%20%20%20%20%20%20%20predictions%2C%0A%20%20%20%20)%3A%0A%20%20%20%20%20%20%20%20_tables.append(%0A%20%20%20%20%20%20%20%20%20%20%20%20mo.hstack(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20disparity_mistreatment(_preds%2C%20_name%2C%20%22marital%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20disparity_mistreatment(_preds%2C%20_name%2C%20%22education%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20justify%3D%22space-between%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20align%3D%22stretch%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20)%0A%0A%20%20%20%20mo.vstack(_tables)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%0A%20%20%20%20%23%23%20Observations%20on%20Disparate%20Mistreatment%20Analysis%0A%0A%20%20%20%20%23%23%23%20Disparate%20Mistreatment%20on%20Marital%20Status%0A%20%20%20%20-%20**Definition**%3A%20Disparate%20mistreatment%20occurs%20when%20a%20model's%20accuracy%20differs%20across%20demographic%20groups.%0A%20%20%20%20-%20**Observations**%3A%0A%20%20%20%20%20%20-%20The%20accuracy%20of%20predictions%20varies%20across%20different%20marital%20status%20groups%20for%20all%20models.%0A%20%20%20%20%20%20-%20**SVM%20Model**%3A%20Shows%20similar%20accuracy%20for%20married%20(88.5%25)%20and%20single%20(88.8%25)%20groups%20but%20lower%20accuracy%20for%20divorced%20individuals%20(86.9%25).%0A%20%20%20%20%20%20-%20**Random%20Forest%20Model**%3A%20Exhibits%20highest%20accuracy%20for%20single%20individuals%20(89.5%25)%20compared%20to%20married%20(88.
6%25)%20and%20divorced%20(87.2%25).%0A%20%20%20%20%20%20-%20**Logistic%20Regression**%3A%20Shows%20the%20most%20consistent%20performance%20across%20groups%20but%20still%20favors%20single%20individuals%20slightly.%0A%20%20%20%20%20%20-%20All%20models%20show%20a%201-2%20percentage%20point%20accuracy%20gap%20between%20the%20highest%20and%20lowest%20performing%20groups.%0A%0A%20%20%20%20%23%23%23%20Disparate%20Mistreatment%20on%20Education%0A%20%20%20%20-%20**Observations**%3A%0A%20%20%20%20%20%20-%20More%20pronounced%20accuracy%20disparities%20exist%20across%20education%20levels%20compared%20to%20marital%20status.%0A%20%20%20%20%20%20-%20**Tertiary%20Education**%3A%20Consistently%20receives%20the%20highest%20prediction%20accuracy%20across%20all%20models%20(89-90%25).%0A%20%20%20%20%20%20-%20**Primary%20Education**%3A%20Shows%20the%20lowest%20accuracy%20(85-87%25)%2C%20creating%20a%203-5%20percentage%20point%20gap%20with%20tertiary%20education.%0A%20%20%20%20%20%20-%20**Secondary%20Education**%3A%20Falls%20in%20between%20but%20closer%20to%20tertiary%20education%20performance.%0A%20%20%20%20%20%20-%20**Unknown%20Education**%3A%20Shows%20inconsistent%20patterns%2C%20highlighting%20potential%20issues%20with%20missing%20data.%0A%0A%20%20%20%20%23%23%23%20Ethical%20Implications%0A%20%20%20%20-%20The%20models%20are%20more%20accurate%20for%20privileged%20groups%20(higher%20education)%20and%20less%20accurate%20for%20potentially%20vulnerable%20groups%20(lower%20education).%0A%20%20%20%20-%20This%20accuracy%20disparity%20could%20result%20in%20more%20incorrect%20decisions%20for%20those%20with%20lower%20educational%20attainment%2C%20potentially%20perpetuating%20disadvantages.%0A%20%20%20%20-%20The%20disparity%20in%20accuracy%20suggests%20that%20the%20features%20used%20by%20the%20models%20might%20better%20represent%20the%20behavior%20of%20certain%20demographic%20groups%2C%20creating%20an%20inherent%20bias%20in%20the%20predictive%20capability.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20r
eturn%0A%0A%0A%40app.cell%0Adef%20_(X_test%2C%20mo%2C%20pd%2C%20predictions%2C%20y_test)%3A%0A%20%20%20%20def%20disparity_treatment(y_pred%2C%20name%2C%20feature)%3A%0A%20%20%20%20%20%20%20%20eval_df%20%3D%20pd.DataFrame(%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20feature%3A%20X_test%5Bfeature%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Prediction%22%3A%20y_pred%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Actual%22%3A%20y_test%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20error_rate%20%3D%20(%0A%20%20%20%20%20%20%20%20%20%20%20%20eval_df.groupby(feature)%0A%20%20%20%20%20%20%20%20%20%20%20%20.apply(lambda%20x%3A%20(x%5B%22Actual%22%5D%20!%3D%20x%5B%22Prediction%22%5D).mean())%0A%20%20%20%20%20%20%20%20%20%20%20%20.rename(%22Error%20Rate%22)%0A%20%20%20%20%20%20%20%20%20%20%20%20.reset_index()%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20return%20mo.ui.table(%0A%20%20%20%20%20%20%20%20%20%20%20%20error_rate%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20label%3Df%22%23%23%20Disparate%20Treatment%20(Error%20Rate)%20on%20**%7Bfeature%7D**%20for%20**%7Bname%7D**%22%2C%0A%20%20%20%20%20%20%20%20)%0A%0A%0A%20%20%20%20_tables%20%3D%20%5B%5D%0A%0A%20%20%20%20for%20_name%2C%20_preds%20in%20zip(%0A%20%20%20%20%20%20%20%20%5B%22SVM%22%2C%20%22Random%20Forest%22%2C%20%22Logistic%20Regression%22%5D%2C%0A%20%20%20%20%20%20%20%20predictions%2C%0A%20%20%20%20)%3A%0A%20%20%20%20%20%20%20%20_tables.append(%0A%20%20%20%20%20%20%20%20%20%20%20%20mo.hstack(%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20disparity_treatment(_preds%2C%20_name%2C%20%22marital%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20disparity_treatment(_preds%2C%20_name%2C%20%22education%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20justify%3D%22space-
between%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20align%3D%22stretch%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20)%0A%0A%20%20%20%20mo.vstack(_tables)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%0A%20%20%20%20%23%23%20Observations%20on%20Disparate%20Treatment%20Analysis%0A%0A%20%20%20%20%23%23%23%20Disparate%20Treatment%20on%20Marital%20Status%0A%20%20%20%20-%20**Definition**%3A%20Disparate%20treatment%20examines%20if%20error%20rates%20differ%20across%20demographic%20groups.%0A%20%20%20%20-%20**Observations**%3A%0A%20%20%20%20%20%20-%20Error%20rates%20show%20the%20inverse%20pattern%20of%20accuracy%20metrics%20across%20marital%20status%20groups.%0A%20%20%20%20%20%20-%20**Divorced**%20individuals%20consistently%20have%20the%20highest%20error%20rates%20(12-14%25)%20across%20all%20models.%0A%20%20%20%20%20%20-%20**Single**%20and%20**married**%20groups%20show%20lower%20error%20rates%20(10-12%25).%0A%20%20%20%20%20%20-%20The%20Random%20Forest%20model%20displays%20the%20largest%20disparities%20in%20error%20rates%20between%20groups.%0A%20%20%20%20%20%20-%20These%20error%20rate%20differences%20indicate%20that%20divorced%20individuals%20are%20more%20likely%20to%20receive%20incorrect%20predictions.%0A%0A%20%20%20%20%23%23%23%20Disparate%20Treatment%20on%20Education%0A%20%20%20%20-%20**Observations**%3A%0A%20%20%20%20%20%20-%20**Primary%20Education**%3A%20Consistently%20experiences%20the%20highest%20error%20rates%20(13-15%25)%20across%20all%20models.%0A%20%20%20%20%20%20-%20**Tertiary%20Education**%3A%20Shows%20the%20lowest%20error%20rates%20(9-11%25)%2C%20creating%20a%20substantial%20gap%20with%20primary%20education.%0A%20%20%20%20%20%20-%20This%20pattern%20is%20consistent%20across%20all%20three%20models%2C%20suggesting%20a%20systematic%20issue%20rather%20than%20a%20model-specific%20problem.%0A%20%20%20%20%20%20-%20The%20gap%20in%20error%20rates%20(4-6%20percentage%20points
)%20between%20highest%20and%20lowest%20education%20levels%20is%20more%20substantial%20than%20marital%20status%20disparities.%0A%0A%20%20%20%20%23%23%23%20Ethical%20Implications%0A%20%20%20%20-%20The%20higher%20error%20rates%20for%20divorced%20individuals%20and%20those%20with%20primary%20education%20could%20lead%20to%20systemic%20disadvantages%20for%20these%20groups.%0A%20%20%20%20-%20In%20a%20banking%20context%2C%20these%20disparities%20could%20translate%20into%3A%0A%20%20%20%20%20%201.%20Reduced%20opportunity%3A%20Higher%20false%20negative%20rates%20might%20cause%20marketing%20campaigns%20to%20miss%20potential%20customers%20among%20these%20groups.%0A%20%20%20%20%20%202.%20Resource%20misallocation%3A%20Higher%20false%20positive%20rates%20could%20lead%20to%20inefficient%20targeting%20of%20marketing%20resources.%0A%20%20%20%20%20%203.%20Trust%20issues%3A%20If%20certain%20groups%20consistently%20receive%20incorrect%20predictions%2C%20it%20could%20reduce%20their%20trust%20in%20financial%20services.%0A%0A%20%20%20%20%23%23%23%20Comparison%20Across%20Fairness%20Metrics%0A%20%20%20%20-%20The%20three%20fairness%20metrics%20(impact%2C%20mistreatment%2C%20and%20treatment)%20collectively%20indicate%20that%20the%20models%20show%20consistent%20patterns%20of%20bias%3A%0A%20%20%20%20%20%20-%20All%20metrics%20show%20advantages%20for%20single%20individuals%20and%20those%20with%20tertiary%20education.%0A%20%20%20%20%20%20-%20All%20metrics%20show%20disadvantages%20for%20divorced%20individuals%20and%20those%20with%20primary%20education.%0A%20%20%20%20%20%20-%20These%20consistent%20patterns%20across%20different%20fairness%20dimensions%20suggest%20deep-rooted%20biases%20in%20the%20dataset%20and%20modeling%20approach.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell%0Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%0A%20%20%20%20%23%23%20Summary%20and%20Mitigation%20Strategies%0A%0A%20%20%20%20%23%23%23%20Overall%20Fairness%20Assessment%0A%20%20%20%20-%20The%20analysis%20re
veals%20consistent%20bias%20patterns%20across%20multiple%20fairness%20metrics%20and%20machine%20learning%20models%3A%0A%20%20%20%20%20%20-%20**Demographic%20Disparities**%3A%20The%20models%20systematically%20favor%20individuals%20who%20are%20single%20and%20have%20tertiary%20education%2C%20while%20disadvantaging%20those%20who%20are%20divorced%20and%20have%20primary%20education.%0A%20%20%20%20%20%20-%20**Model%20Consistency**%3A%20All%20three%20models%20(SVM%2C%20Random%20Forest%2C%20and%20Logistic%20Regression)%20show%20similar%20patterns%20of%20bias%2C%20suggesting%20that%20the%20issue%20lies%20in%20the%20data%20rather%20than%20specific%20modeling%20choices.%0A%20%20%20%20%20%20-%20**Multiple%20Fairness%20Dimensions**%3A%20The%20biases%20are%20evident%20across%20impact%20(prediction%20rates)%2C%20mistreatment%20(accuracy)%2C%20and%20treatment%20(error%20rates)%20metrics%2C%20indicating%20a%20fundamental%20fairness%20issue.%0A%0A%20%20%20%20%23%23%23%20Potential%20Causes%20of%20Bias%0A%20%20%20%201.%20**Historical%20Data%20Patterns**%3A%20The%20training%20data%20likely%20reflects%20historical%20banking%20practices%20that%20favored%20certain%20demographic%20groups.%0A%20%20%20%202.%20**Feature%20Relevance**%3A%20Some%20features%20may%20be%20more%20predictive%20for%20certain%20demographic%20groups%20than%20others.%0A%20%20%20%203.%20**Data%20Representation**%3A%20Underrepresentation%20of%20certain%20groups%20in%20the%20training%20data%20could%20lead%20to%20less%20accurate%20models%20for%20those%20populations.%0A%20%20%20%204.%20**Proxy%20Variables**%3A%20Features%20like%20balance%20and%20loan%20status%20may%20act%20as%20proxies%20for%20demographic%20variables%2C%20perpetuating%20bias%20indirectly.%0A%0A%20%20%20%20%23%23%23%20Recommended%20Mitigation%20Strategies%0A%20%20%20%201.%20**Fairness-Aware%20Learning**%3A%0A%20%20%20%20%20%20%20-%20Implement%20fairness%20constraints%20during%20model%20training%20to%20equalize%20error%20rates%20across%20groups.%0A%20%20%20%20%2
0%20%20-%20Use%20adversarial%20debiasing%20techniques%20to%20reduce%20the%20model's%20ability%20to%20predict%20sensitive%20attributes.%0A%0A%20%20%20%202.%20**Data%20Interventions**%3A%0A%20%20%20%20%20%20%20-%20Resampling%3A%20Balance%20the%20dataset%20to%20ensure%20equal%20representation%20of%20different%20demographic%20groups.%0A%20%20%20%20%20%20%20-%20Feature%20selection%3A%20Remove%20or%20transform%20features%20that%20may%20serve%20as%20proxies%20for%20sensitive%20attributes.%0A%0A%20%20%20%203.%20**Post-Processing%20Approaches**%3A%0A%20%20%20%20%20%20%20-%20Adjust%20decision%20thresholds%20differently%20for%20each%20demographic%20group%20to%20equalize%20outcome%20rates.%0A%20%20%20%20%20%20%20-%20Apply%20reject-option%20classification%2C%20which%20reassigns%20low-confidence%20predictions%20near%20the%20decision%20boundary%20in%20favor%20of%20the%20disadvantaged%20group.%0A%0A%20%20%20%204.%20**Monitoring%20and%20Evaluation**%3A%0A%20%20%20%20%20%20%20-%20Establish%20continuous%20monitoring%20of%20fairness%20metrics%20in%20production.%0A%20%20%20%20%20%20%20-%20Regularly%20retrain%20models%20with%20updated%20data%20that%20better%20represents%20all%20groups.%0A%0A%20%20%20%205.%20**Holistic%20Approach**%3A%0A%20%20%20%20%20%20%20-%20Consider%20the%20broader%20social%20context%20of%20banking%20decisions.%0A%20%20%20%20%20%20%20-%20Combine%20algorithmic%20solutions%20with%20policy%20changes%20to%20address%20systemic%20bias.%0A%0A%20%20%20%20%23%23%23%20Ethical%20Considerations%0A%20%20%20%20The%20ethical%20use%20of%20machine%20learning%20in%20banking%20requires%20balancing%20predictive%20performance%20with%20fairness%20concerns.%20Financial%20institutions%20have%20a%20responsibility%20to%20ensure%20equitable%20access%20to%20services%20while%20maintaining%20business%20viability.%20Transparent%20communication%20about%20model%20limitations%20and%20continuous%20improvement%20of%20fairness%20metrics%20should%20be%20standard%20practice%20in%20responsible%20AI%20deployment.%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0A%40app.cell(hide_code%3DTrue)%0
Adef%20_(mo)%3A%0A%20%20%20%20mo.md(r%22%22%22%0A%20%20%20%20%23%23%20Model%20Deployability%20Comparison%0A%0A%20%20%20%20The%20table%20below%20provides%20a%20comprehensive%20comparison%20of%20the%20three%20machine%20learning%20models%20evaluated%20in%20this%20analysis%2C%20with%20a%20focus%20on%20their%20deployability%20in%20a%20real-world%20banking%20context.%0A%0A%20%20%20%20%7C%20Factor%20%7C%20SVM%20%7C%20Random%20Forest%20%7C%20Logistic%20Regression%20%7C%20Notes%20%7C%0A%20%20%20%20%7C--------%7C-----%7C---------------%7C---------------------%7C-------%7C%0A%20%20%20%20%7C%20**Overall%20Accuracy**%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20Random%20Forest%20achieves%20highest%20overall%20accuracy%20%7C%0A%20%20%20%20%7C%20**Fairness%20-%20Marital%20Status**%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20SVM%20and%20LR%20have%20more%20consistent%20performance%20across%20marital%20groups%20%7C%0A%20%20%20%20%7C%20**Fairness%20-%20Education**%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR%20shows%20smallest%20disparity%20across%20education%20levels%20%7C%0A%20%20%20%20%7C%20**Computational%20Efficiency**%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR%20is%20significantly%20more%20efficient%20for%20large-scale%20deployment%20%7C%0A%20%20%20%20%7C%20**Interpretability**%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR%20coefficients%20directly%20indicate%20feature%20importance%20%7C%0A%20%20%20%20%7C%20**Robustness%20to%20Outliers**%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20SVM%20
is%20least%20affected%20by%20outliers%20%7C%0A%20%20%20%20%7C%20**Scalability**%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR%20scales%20better%20to%20large%20datasets%20%7C%0A%20%20%20%20%7C%20**Regulatory%20Compliance**%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR's%20interpretability%20makes%20it%20easier%20to%20explain%20to%20regulators%20%7C%0A%20%20%20%20%7C%20**Ease%20of%20Updates**%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR%20models%20can%20be%20updated%20incrementally%20with%20new%20data%20%7C%0A%20%20%20%20%7C%20**Bias%20Mitigation%20Potential**%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%20%7C%20%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%E2%AD%90%20%7C%20LR%20allows%20for%20more%20straightforward%20bias%20mitigation%20strategies%20%7C%0A%20%20%20%20%22%22%22)%0A%20%20%20%20return%0A%0A%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20app.run()%0A