Week 6 HW

Author
Affiliation

Ben Akyrueklier

George Washington University

Published

October 7, 2025

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
from sklearn.datasets import load_iris
import plotly.express as px
import plotly.io as pio
pio.renderers.default='plotly_mimetype+notebook_connected'
Code
wb = pd.read_csv("Data/WBnew.csv")
new_column_names = {'2015 [YR2015]': '2015', '2016 [YR2016]': '2016', '2017 [YR2017]': '2017', '2018 [YR2018]': '2018', '2019 [YR2019]': '2019'}
wb1519 = wb.rename(columns=new_column_names)
wb1519 = wb1519.drop(columns=['2005 [YR2005]', '2006 [YR2006]', '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]', '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]', '2020 [YR2020]', '2021 [YR2021]', '2022 [YR2022]', '2023 [YR2023]', '2024 [YR2024]'])
wb1519.head()
Country Name Country Code Series Name Series Code 2015 2016 2017 2018 2019
0 Afghanistan AFG GDP per capita (current US$) NY.GDP.PCAP.CD 565.569730408751 522.082215583898 525.469770891619 491.337221382603 496.6025042585
1 Afghanistan AFG Hospital beds (per 1,000 people) SH.MED.BEDS.ZS 0.44 0.45 0.42 0.4 0.38
2 Afghanistan AFG Life expectancy at birth, total (years) SP.DYN.LE00.IN 62.27 62.646 62.406 62.443 62.941
3 Afghanistan AFG Net migration SM.POP.NETM -286314 -143049 -71491 -36753 9159
4 Afghanistan AFG Secure Internet servers (per 1 million people) IT.NET.SECR.P6 2.18729357416894 12.2764405423167 44.1873650754779 53.4795175761047 27.6573503133086
Code
wbmelt = pd.melt(wb1519, id_vars=['Country Name','Series Name'], value_vars=['2015', '2016', '2017', '2018', '2019'], var_name='Year', value_name='Value')
wbmelt = wbmelt[wbmelt['Country Name'].isin(['Japan', 'France', 'Brazil', 'United States', 'Canada', 'China'])]
wbmelt = wbmelt.dropna()
wbpivot = wbmelt.pivot(index=['Country Name', 'Year'], columns='Series Name', values='Value').reset_index()
wbpivot = wbpivot.dropna(axis=1, how='all')
wbpivot.head()
Series Name Country Name Year GDP per capita (current US$) Hospital beds (per 1,000 people) Income share held by highest 10% Life expectancy at birth, total (years) Net migration Real interest rate (%) Researchers in R&D (per million people) Secure Internet servers (per 1 million people)
0 Brazil 2015 8936.19661712113 2.35 40.9 75.106 -173611 33.8323439727973 .. 161.164815967859
1 Brazil 2016 8836.28652735657 2.32 42.1 75.081 -92989 40.6983614262467 .. 415.986539467638
2 Brazil 2017 10080.5092819305 2.3 42 75.383 -156296 41.7138078856955 .. 1605.82544177505
3 Brazil 2018 9300.66164923219 2.26 42.5 75.633 -230334 33.1023342519639 .. 2069.60200203718
4 Brazil 2019 9029.83326681073 2.24 42 75.809 -129216 31.9030727578921 .. 2788.39613470957
Code
wb19 = wbpivot[wbpivot['Year'].isin(['2019'])]
wb19["Hospital beds (per 1,000 people)"] = pd.to_numeric(wb19["Hospital beds (per 1,000 people)"])
hospital_sort = wb19.sort_values("Hospital beds (per 1,000 people)")
x=hospital_sort["Country Name"]
y=hospital_sort["Hospital beds (per 1,000 people)"]

plt.plot(x, y, 'o')
plt.ylim(0, y.max()*1.1)
plt.ylabel("Hospital beds (per 1,000 people)") 
plt.xlabel("Country")
plt.title("Hospital Beds per 1,000 People (2019)")
for xi, yi in zip(x, y):
    plt.text(xi, yi+0.4, str(yi), ha='center')
plt.show()
/var/folders/9b/bb7yf3dj4qzbc23czfb1z3rh0000gn/T/ipykernel_27695/4273991784.py:2: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

This graph is short and simple, which is why I think it is so effective. You can see the clear gap between Japan and the other countries, but most interestingly, China has more hospital beds per capita and countries like Canada and the United States. This is a surprise since China not only has a population of over a billion, but in most per capita metrics, China seems to fall behind other superpower nations because of the overwhelming size of their population. The next thing I would investigate would be how the other European nations shape up against France, since the size of their healthcare industry may be related to other factors or it may just be the standard in Europe.

Code
inc = wb1519[wb1519['Series Name'].isin(['Income share held by highest 10%'])]
inc = inc[inc['Country Name'].isin(['Japan', 'France', 'Brazil', 'United States', 'Canada', 'China'])]
inc['2015'] = pd.to_numeric(inc['2015'])
inc['2019'] = pd.to_numeric(inc['2019'])
inc = inc.sort_values('2015')

for c in inc.index:
    plt.plot([0,1], [inc.loc[c,'2015'], inc.loc[c,'2019']], label=inc.loc[c,'Country Name'], marker='o')
plt.xticks([0,1], ["2015","2019"])
plt.ylabel("Income Share of Highest 10% Earners (%)")
plt.title("Wealth Inequality (2015 vs 2019)")
plt.legend(title="Country", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.ylim(22.5, 42.5)
plt.show()

The y-value on this graph shows the percentage of income that the top 10% of highest earners have over the rest of the population. Thus, if everyone had the same wage, we would expect this percentage to be 10%. Although the slope aspect of this graph does not really highlight any major changes with these 6 countries over a 5 year timeframe, the graph itself is a nice visualization of the differences in wealth disparity between different countries. Brazil is definitely the outlier out of these nations, with the top 10% earning almost twice as much (percentage wise) as the top 10% in Japan, Canada, and France. It is important to note this is not measuring the total wealth of the top 10%, but on the year-to-year income of the highest earners, in each country, compared to the other 90% of the population.

Code
interest=wb1519[wb1519['Country Name'].isin(['Australia', 'United States', 'China'])]
interest=interest[interest['Series Name'].isin(['Real interest rate (%)'])]
years=['2015', '2016', '2017', '2018', '2019']
total=0
for y in years:
    interest[y]=pd.to_numeric(interest[y])
    total+=interest[y]
avg=total/5
totavg=avg.sum()/3
i=0
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
for c in interest.index:
    axes[i].plot(years, interest.loc[c, years], marker='o')
    axes[i].set_title(interest.loc[c,'Country Name'])
    axes[i].set_xlabel("Year")
    axes[0].set_ylabel("Real Interest Rate (%)")
    for xi, yi in zip(years, interest.loc[c, years]):
        axes[i].text(xi, yi-0.3, str(round(yi, 2)), va='top', ha='center')
    i+=1
for ax in axes:
    ax.axhline(y=totavg, color='gray', linestyle='--', linewidth=0.7)
axes[0].text(x=years[0], y=totavg-0.1, s='Average Interest Rate', color='gray', va='top')
fig.suptitle('Real Interest Rates (2015-2019)')
plt.ylim(-0.5, 7)
plt.tight_layout()
plt.show()

Although this is not an index with a baseline, I included a horizontel line to indicate the average interest rate across all the years of the 3 selected countries. While these graphs do not have much meaning without context, I still believe this is a great way to visualize the given information. The shared Y-axis is perfect for something simple like interest rates, but having the text labels for each datapoint makes this visualization sufficient in every aspect.

Final Reflection: I think the facetted graphs have the most possible use cases in data visualization, as it relays the same information as having multiple plots on the same graph, but it seperates the different groups which helps with readability. My favorite visaulization from this homework would be the sorted Cleveland dot plot, similaraly to a bar graph, it does a nice job at comparing a single measurement while keeping the layout simple. In my opinion, the slope graph has the most potential here as I feel like the data I chose was not the best use for the slope graph. Maybe if I extended to timeframe to 10 years, we would see more drastic changes that can tell us more about the underlying political and economic situations that occured in each nation.