import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import altair as alt
from sklearn.datasets import load_iris
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'plotly_mimetype+notebook_connected'
This graph is short and simple, which is why I think it is so effective. You can see the clear gap between Japan and the other countries, but most interestingly, China has more hospital beds per capita than countries like Canada and the United States. This is a surprise: China has a population of over a billion, and in most per capita metrics it seems to fall behind other superpower nations because of the sheer size of its population. The next thing I would investigate is how the other European nations compare with France, since the size of France's healthcare industry may be related to other factors or may simply be the standard in Europe.
Code
# Keep only the income-share series for the six countries of interest
inc = wb1519[wb1519['Series Name'].isin(['Income share held by highest 10%'])]
# .copy() so the column assignments below act on a real DataFrame, not a slice
inc = inc[inc['Country Name'].isin(['Japan', 'France', 'Brazil', 'United States', 'Canada', 'China'])].copy()
inc['2015'] = pd.to_numeric(inc['2015'])
inc['2019'] = pd.to_numeric(inc['2019'])
inc = inc.sort_values('2015')
# One line per country, from its 2015 value (x=0) to its 2019 value (x=1)
for c in inc.index:
    plt.plot([0, 1], [inc.loc[c, '2015'], inc.loc[c, '2019']], label=inc.loc[c, 'Country Name'], marker='o')
plt.xticks([0, 1], ["2015", "2019"])
plt.ylabel("Income Share of Highest 10% Earners (%)")
plt.title("Wealth Inequality (2015 vs 2019)")
plt.legend(title="Country", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.ylim(22.5, 42.5)
plt.show()
The y-axis of this graph shows the share of total income held by the top 10% of earners in each country. Thus, if everyone earned the same wage, we would expect this share to be exactly 10%. Although the slopes do not highlight any major changes for these six countries between 2015 and 2019, the graph is still a nice visualization of how income inequality differs between countries. Brazil is clearly the outlier, with its top 10% holding almost twice the income share of the top 10% in Japan, Canada, and France. It is important to note that this does not measure the total wealth of the top 10%; it measures the yearly income of the highest earners in each country relative to the other 90% of the population.
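To make the 10% baseline concrete, here is a minimal sketch of how a top-10% income share can be computed from a list of individual incomes. The function name `top_decile_share` and both income lists are made up for illustration, and this is a simplification rather than the World Bank's actual methodology.

```python
def top_decile_share(incomes):
    """Return the percentage of total income earned by the top 10% of earners."""
    ranked = sorted(incomes, reverse=True)
    top_n = max(1, len(ranked) // 10)  # size of the top decile (at least one person)
    return 100 * sum(ranked[:top_n]) / sum(ranked)


# Hypothetical populations of 100 people each (illustrative numbers only)
equal = [50_000] * 100          # everyone earns the same wage
unequal = list(range(1, 101))   # incomes of 1, 2, ..., 100

print(top_decile_share(equal))    # 10.0, the perfect-equality baseline
print(top_decile_share(unequal))  # about 18.9, above the baseline
```

Any value above 10% therefore means the top earners take home more than an equal split would give them, which is how I read the y-axis of the slope graph.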
Although this is not an index with a baseline, I included a horizontal line to indicate the average interest rate across all years for the 3 selected countries. While these graphs do not carry much meaning without context, I still believe this is a great way to visualize the given information. The shared y-axis works well for something as simple as interest rates, and the text labels on each data point make the exact values easy to read.
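The layout described above (one panel per country, a shared y-axis, an overall-average reference line, and a text label on every point) can be sketched roughly as follows. This is not the code used for the original figure: the country names, years, and rates are placeholder values, and matplotlib is assumed as the plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only; these are NOT the World Bank figures.
years = [2015, 2016, 2017]
rates = {
    "Country A": [1.0, 2.0, 3.0],
    "Country B": [4.0, 5.0, 6.0],
    "Country C": [7.0, 8.0, 9.0],
}

# Average over every country and every year, reused as the reference line in each panel
all_values = [v for series in rates.values() for v in series]
overall_mean = sum(all_values) / len(all_values)

# sharey=True gives every panel the same y-axis so the countries are directly comparable
fig, axes = plt.subplots(1, len(rates), sharey=True, figsize=(9, 3))
for ax, (country, values) in zip(axes, rates.items()):
    ax.plot(years, values, marker="o")
    ax.axhline(overall_mean, color="gray", linestyle="--")
    # Label each data point with its value, slightly above the marker
    for x, y in zip(years, values):
        ax.annotate(f"{y:.1f}", (x, y), textcoords="offset points", xytext=(0, 6), ha="center")
    ax.set_title(country)
    ax.set_xticks(years)
axes[0].set_ylabel("Interest rate (%)")
fig.tight_layout()
```

In a notebook, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` added at the end to display the figure inline.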
Final Reflection: I think faceted graphs have the most possible use cases in data visualization, as they relay the same information as multiple series on one plot while separating the groups, which helps with readability. My favorite visualization from this homework is the sorted Cleveland dot plot. Similarly to a bar graph, it does a nice job of comparing a single measurement while keeping the layout simple. In my opinion, the slope graph has the most untapped potential here, as I feel the data I chose was not the best fit for it. Maybe if I extended the timeframe to 10 years, we would see more drastic changes that could tell us more about the underlying political and economic situations that occurred in each nation.