Nobel Prize Winners

HongShu, DataCamp, about 26 minutes

1. The most Nobel of Prizes

The Nobel Prize is perhaps the world's best-known scientific award. Apart from the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?

Well, we're going to find out! The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016. Let's load it in and take a look.

# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Reading in the Nobel Prize data
nobel = pd.read_csv("datasets/nobel.csv")

# Taking a look at the first several winners
nobel.head(n = 6)
(index) | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | birth_country | sex | organization_name | organization_city | organization_country | death_date | death_city | death_country
0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | 160 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Male | Berlin University | Berlin | Germany | 1911-03-01 | Berlin | Germany
1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | 569 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | Male | NaN | NaN | NaN | 1907-09-07 | Châtenay | France
2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | 293 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Male | Marburg University | Marburg | Germany | 1917-03-31 | Marburg | Germany
3 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 462 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Male | NaN | NaN | NaN | 1910-10-30 | Heiden | Switzerland
4 | 1901 | Peace | The Nobel Peace Prize 1901 | NaN | 1/2 | 463 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | Male | NaN | NaN | NaN | 1912-06-12 | Paris | France
5 | 1901 | Physics | The Nobel Prize in Physics 1901 | "in recognition of the extraordinary services ... | 1/1 | 1 | Individual | Wilhelm Conrad Röntgen | 1845-03-27 | Lennep (Remscheid) | Prussia (Germany) | Male | Munich University | Munich | Germany | 1923-02-10 | Munich | Germany

2. So, who gets the Nobel Prize?

Just looking at the first couple of prize winners, or Nobel laureates as they are also called, we already see a celebrity: Wilhelm Conrad Röntgen, the guy who discovered X-rays. And actually, we see that all of the winners in 1901 were guys that came from Europe. But that was back in 1901. Looking at all winners in the dataset, from 1901 to 2016, which sex and which country are the most commonly represented?

(For country, we will use the birth_country of the winner, as the organization_country is NaN for all shared Nobel Prizes.)

In [75]:

# Display the number of (possibly shared) Nobel Prizes handed
# out between 1901 and 2016
print(len(nobel))

# Display the number of prizes won by male and female recipients.
print(nobel["sex"].value_counts())

# Display the number of prizes won by the top 10 nationalities.
nobel['birth_country'].value_counts().head(10)

Out[75]:

911

Male      836
Female     49
Name: sex, dtype: int64

United States of America    259
United Kingdom               85
Germany                      61
France                       51
Sweden                       29
Japan                        24
Netherlands                  18
Canada                       18
Russia                       17
Italy                        17
Name: birth_country, dtype: int64
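
Note that the male and female counts add up to 885, not 911. A likely explanation is that `value_counts()` skips missing values by default, and the rows for organizations carry NaN in the `sex` column. The sketch below illustrates this on a made-up five-row frame (the toy rows are an assumption for illustration, not taken from nobel.csv), using `dropna=False` to count missing values and `normalize=True` to get shares instead of counts:

```python
import pandas as pd

# Toy stand-in for the nobel DataFrame (hypothetical rows, for illustration only)
toy = pd.DataFrame({
    "full_name": ["A", "B", "C", "D", "Some Organization"],
    "sex": ["Male", "Male", "Male", "Female", None],
})

# Default behaviour: rows with a missing value in `sex` are skipped
counts = toy["sex"].value_counts()

# dropna=False keeps the missing values as their own bucket
counts_with_missing = toy["sex"].value_counts(dropna=False)

# normalize=True returns shares (of the non-missing rows) instead of raw counts
shares = toy["sex"].value_counts(normalize=True)

print(counts.sum(), "of", len(toy), "rows have a recorded sex")
print(counts_with_missing)
print(shares)
```

Running the same two variants on `nobel["sex"]` should show how many of the 911 rows have no recorded sex.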

3. USA dominance

Not so surprising perhaps: the most common Nobel laureate between 1901 and 2016 was a man born in the United States of America. But in 1901 all the winners were European. When did the USA start to dominate the Nobel Prize charts?

In [77]:

# Calculating the proportion of USA born winners per decade
nobel['usa_born_winner'] = nobel["birth_country"] == "United States of America"
nobel['decade'] = (np.floor((nobel["year"]/10)) * 10).astype(int)
prop_usa_winners = nobel.groupby("decade", as_index=False)["usa_born_winner"].mean()

# Display the proportions of USA born winners per decade
prop_usa_winners

Out[77]:

    decade  usa_born_winner
0     1900         0.017544
1     1910         0.075000
2     1920         0.074074
3     1930         0.250000
4     1940         0.302326
5     1950         0.291667
6     1960         0.265823
7     1970         0.317308
8     1980         0.319588
9     1990         0.403846
10    2000         0.422764
11    2010         0.292683
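
The calculation above rests on two small tricks: flooring the year to its decade, and taking the mean of a boolean column to get a proportion. The sketch below shows both on a made-up set of six rows (the years and flags are invented for illustration). It also shows integer floor division as an equivalent alternative to `np.floor`; this is a variation, not the approach used in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical award years and USA-born flags, for illustration only
years = pd.Series([1901, 1903, 1905, 1909, 1910, 1919])
usa_born = [False, False, False, True, True, False]

# Approach used in the notebook: float division, floor, back to int
decade_floor = (np.floor(years / 10) * 10).astype(int)

# Alternative: integer floor division keeps everything as integers
decade_int = (years // 10) * 10

toy = pd.DataFrame({"decade": decade_int, "usa_born_winner": usa_born})

# The mean of a boolean column is the share of True values in each group
prop = toy.groupby("decade", as_index=False)["usa_born_winner"].mean()

print(decade_int.tolist())  # [1900, 1900, 1900, 1900, 1910, 1910]
print(prop)
```

In this toy frame one of the four rows in the 1900s and one of the two rows in the 1910s are flagged, so the proportions come out as 0.25 and 0.5.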

4. USA dominance, visualized

A table is OK, but to see when the USA started to dominate the Nobel charts we need a plot!

In [79]:

# Setting the plotting theme
sns.set()
# and setting the size of all plots.
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [11, 7]

# Plotting USA born winners 
ax = sns.lineplot(x = "decade", y = "usa_born_winner", data=prop_usa_winners)

# Adding %-formatting to the y-axis
from matplotlib.ticker import PercentFormatter
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
[Figure: line plot of the proportion of USA-born winners per decade]

5. What is the gender of a typical Nobel Prize winner?

So the USA first became the dominant winner of the Nobel Prize in the 1930s and has kept the leading position ever since. But one group that was in the lead from the start, and never seems to let go, is men. Maybe it shouldn't come as a shock that there is some imbalance between how many male and female prize winners there are, but how significant is this imbalance? And is it better or worse within specific prize categories like physics, medicine, literature, etc.?

In [81]:

# Calculating the proportion of female laureates per decade and category
nobel['female_winner'] = nobel["sex"] == "Female"
prop_female_winners = nobel.groupby(["decade", "category"], as_index=False)["female_winner"].mean()

# Plotting female winners with % winners on the y-axis
ax = sns.lineplot(x='decade', y='female_winner', hue='category', data=prop_female_winners)
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
[Figure: line plot of the proportion of female winners per decade, one line per prize category]

6. The first woman to win the Nobel Prize

The plot above is a bit messy as the lines are overplotting. But it does show some interesting trends and patterns. Overall the imbalance is pretty large with physics, economics, and chemistry having the largest imbalance. Medicine has a somewhat positive trend, and since the 1990s the literature prize is also now more balanced. The big outlier is the peace prize during the 2010s, but keep in mind that this just covers the years 2010 to 2016.
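
Because the lines overlap, a table can be easier to read than the plot. As a rough sketch, the long-format proportions can be pivoted into one row per decade and one column per category. The numbers below are invented for illustration and are not taken from the Nobel data; only the column names match `prop_female_winners`.

```python
import pandas as pd

# Hypothetical proportions in the same long format as prop_female_winners
toy = pd.DataFrame({
    "decade": [1900, 1900, 1910, 1910],
    "category": ["Physics", "Peace", "Physics", "Peace"],
    "female_winner": [0.1, 0.2, 0.0, 0.0],
})

# One row per decade, one column per category
wide = toy.pivot(index="decade", columns="category", values="female_winner")
print(wide)
```

Applied to the real `prop_female_winners`, the same call would give a decade-by-category grid; a category with no prizes in a given decade would show up as NaN.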

Given this imbalance, who was the first woman to receive a Nobel Prize? And in what category?

In [83]:

# Picking out the first woman to win a Nobel Prize
# (nsmallest(2, ...) also shows the second, for comparison)
nobel[nobel['sex'] == 'Female'].nsmallest(2, 'year')

Out[83]:

(index) | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | ... | sex | organization_name | organization_city | organization_country | death_date | death_city | death_country | usa_born_winner | decade | female_winner
19 | 1903 | Physics | The Nobel Prize in Physics 1903 | "in recognition of the extraordinary services ... | 1/4 | 6 | Individual | Marie Curie, née Sklodowska | 1867-11-07 | Warsaw | ... | Female | NaN | NaN | NaN | 1934-07-04 | Sallanches | France | False | 1900 | True
29 | 1905 | Peace | The Nobel Peace Prize 1905 | NaN | 1/1 | 468 | Individual | Baroness Bertha Sophie Felicita von Suttner, n... | 1843-06-09 | Prague | ... | Female | NaN | NaN | NaN | 1914-06-21 | Vienna | Austria | False | 1900 | True

2 rows × 21 columns

7. Repeat laureates

For most scientists/writers/activists a Nobel Prize would be the crowning achievement of a long career. But for some people, one is just not enough, and a few have gotten it more than once. Who are these lucky few? (Having won no Nobel Prize myself, I'll assume it's just about luck.)

In [85]:

# Selecting the laureates that have received 2 or more prizes.
nobel.groupby("full_name").filter(lambda group: len(group) > 1)

Out[85]:

(index) | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | ... | sex | organization_name | organization_city | organization_country | death_date | death_city | death_country | usa_born_winner | decade | female_winner
19 | 1903 | Physics | The Nobel Prize in Physics 1903 | "in recognition of the extraordinary services ... | 1/4 | 6 | Individual | Marie Curie, née Sklodowska | 1867-11-07 | Warsaw | ... | Female | NaN | NaN | NaN | 1934-07-04 | Sallanches | France | False | 1900 | True
62 | 1911 | Chemistry | The Nobel Prize in Chemistry 1911 | "in recognition of her services to the advance... | 1/1 | 6 | Individual | Marie Curie, née Sklodowska | 1867-11-07 | Warsaw | ... | Female | Sorbonne University | Paris | France | 1934-07-04 | Sallanches | France | False | 1910 | True
89 | 1917 | Peace | The Nobel Peace Prize 1917 | NaN | 1/1 | 482 | Organization | Comité international de la Croix Rouge (Intern... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | 1910 | False
215 | 1944 | Peace | The Nobel Peace Prize 1944 | NaN | 1/1 | 482 | Organization | Comité international de la Croix Rouge (Intern... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | 1940 | False
278 | 1954 | Chemistry | The Nobel Prize in Chemistry 1954 | "for his research into the nature of the chemi... | 1/1 | 217 | Individual | Linus Carl Pauling | 1901-02-28 | Portland, OR | ... | Male | California Institute of Technology (Caltech) | Pasadena, CA | United States of America | 1994-08-19 | Big Sur, CA | United States of America | True | 1950 | False
283 | 1954 | Peace | The Nobel Peace Prize 1954 | NaN | 1/1 | 515 | Organization | Office of the United Nations High Commissioner... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | 1950 | False
298 | 1956 | Physics | The Nobel Prize in Physics 1956 | "for their researches on semiconductors and th... | 1/3 | 66 | Individual | John Bardeen | 1908-05-23 | Madison, WI | ... | Male | University of Illinois | Urbana, IL | United States of America | 1991-01-30 | Boston, MA | United States of America | True | 1950 | False
306 | 1958 | Chemistry | The Nobel Prize in Chemistry 1958 | "for his work on the structure of proteins, es... | 1/1 | 222 | Individual | Frederick Sanger | 1918-08-13 | Rendcombe | ... | Male | University of Cambridge | Cambridge | United Kingdom | 2013-11-19 | Cambridge | United Kingdom | False | 1950 | False
340 | 1962 | Peace | The Nobel Peace Prize 1962 | NaN | 1/1 | 217 | Individual | Linus Carl Pauling | 1901-02-28 | Portland, OR | ... | Male | California Institute of Technology (Caltech) | Pasadena, CA | United States of America | 1994-08-19 | Big Sur, CA | United States of America | True | 1960 | False
348 | 1963 | Peace | The Nobel Peace Prize 1963 | NaN | 1/2 | 482 | Organization | Comité international de la Croix Rouge (Intern... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | 1960 | False
424 | 1972 | Physics | The Nobel Prize in Physics 1972 | "for their jointly developed theory of superco... | 1/3 | 66 | Individual | John Bardeen | 1908-05-23 | Madison, WI | ... | Male | University of Illinois | Urbana, IL | United States of America | 1991-01-30 | Boston, MA | United States of America | True | 1970 | False
505 | 1980 | Chemistry | The Nobel Prize in Chemistry 1980 | "for their contributions concerning the determ... | 1/4 | 222 | Individual | Frederick Sanger | 1918-08-13 | Rendcombe | ... | Male | MRC Laboratory of Molecular Biology | Cambridge | United Kingdom | 2013-11-19 | Cambridge | United Kingdom | False | 1980 | False
523 | 1981 | Peace | The Nobel Peace Prize 1981 | NaN | 1/1 | 515 | Organization | Office of the United Nations High Commissioner... | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | 1980 | False

13 rows × 21 columns
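
Grouping on `full_name` assumes that a laureate's name is spelled identically on every row and that no two laureates share a name. As a sketch of an alternative (not the notebook's method), the `laureate_id` column can be checked with `duplicated(keep=False)`, which flags every row whose id occurs more than once. The toy frame below is made up: ids 6 and 217 are the ones shown for Marie Curie and Linus Carl Pauling in the output above, while id 999 and "Single Winner" are hypothetical.

```python
import pandas as pd

# Small illustrative frame; the last row is a hypothetical one-time winner
toy = pd.DataFrame({
    "year": [1903, 1911, 1954, 1962, 1950],
    "laureate_id": [6, 6, 217, 217, 999],
    "full_name": [
        "Marie Curie, née Sklodowska", "Marie Curie, née Sklodowska",
        "Linus Carl Pauling", "Linus Carl Pauling", "Single Winner",
    ],
})

# keep=False marks all occurrences of a repeated id, not just the later ones
repeat = toy[toy["laureate_id"].duplicated(keep=False)]

# Same selection via the groupby/filter idiom used above, keyed on the id
repeat_groupby = toy.groupby("laureate_id").filter(lambda g: len(g) > 1)

print(repeat[["year", "full_name"]])
```

Both selections pick the same four rows here; the `duplicated` version avoids calling a Python function once per group.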

8. How old are you when you get the prize?

The list of repeat winners contains some illustrious names! We again meet Marie Curie, who got the prize in physics for her research on radiation and in chemistry for isolating radium and polonium. John Bardeen got it twice in physics for transistors and superconductivity, Frederick Sanger got it twice in chemistry, and Linus Carl Pauling got it first in chemistry and later in peace for his work in promoting nuclear disarmament. We also learn that organizations can get the prize: the output above lists the Red Cross three times (1917, 1944, and 1963) and the UNHCR twice.

But how old are you generally when you get the prize?

In [87]:

# Converting birth_date from String to datetime
nobel['birth_date'] = pd.to_datetime(nobel['birth_date'])

# Calculating the age of Nobel Prize winners
nobel['age'] = nobel['year'] - nobel['birth_date'].dt.year

# Plotting the age of Nobel Prize winners against the award year
# (lowess=True requires the statsmodels package)
sns.lmplot(x="year", y="age", data=nobel, lowess=True,
           line_kws={'color': 'black'})
[Figure: scatter plot of laureate age against award year, with a LOWESS trend line]
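
The age computed above is the award year minus the birth year, so it can be one year higher than the laureate's actual age on the day of the award. A small sketch of the calculation on two made-up rows (names and dates are hypothetical):

```python
import pandas as pd

# Two hypothetical laureates awarded in the same year
toy = pd.DataFrame({
    "full_name": ["Person A", "Person B"],
    "year": [2000, 2000],
    "birth_date": ["1950-01-15", "1950-12-30"],
})

# Parse the strings into datetime values so that the .dt accessor works
toy["birth_date"] = pd.to_datetime(toy["birth_date"])

# Year-difference age: ignores whether the birthday has already passed
toy["age"] = toy["year"] - toy["birth_date"].dt.year
print(toy[["full_name", "age"]])
```

Both rows get an age of 50, even though Person B would still be 49 for most of the year 2000. For a trend that spans more than a century, this approximation is unlikely to matter.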

9. Age differences between prize categories

The plot above shows us a lot! We see that people used to be around 55 when they received the prize, but nowadays the average is closer to 65. But there is a large spread in the laureates' ages, and while most are 50+, some are very young.

We also see that the density of points is much higher nowadays than in the early 1900s: many more of the prizes are now shared, and so there are many more winners. We also see that there was a disruption in awarded prizes around the Second World War (1939 - 1945).

Let's look at age trends within different prize categories.

In [89]:

# Same plot as above, but separate plots for each type of Nobel Prize
sns.lmplot(y = "age", x = "year", row='category', aspect=2, line_kws={'color' : 'black'}, data = nobel)
[Figure: scatter plots of laureate age against award year with a trend line, one panel per prize category]

10. Oldest and youngest winners

More plots with lots of exciting stuff going on! We see that winners of the chemistry, medicine, and physics prizes have gotten older over time. The trend is strongest for physics: the average age used to be below 50, and now it's almost 70. Literature and economics are more stable. We also see that economics is a newer category. But peace shows an opposite trend where winners are getting younger!

In the peace category we also see a winner around 2010 who seems exceptionally young. This raises the question: who are the oldest and youngest people ever to have won a Nobel Prize?

In [91]:

# The oldest winner of a Nobel Prize as of 2016
display(nobel.nlargest(1, 'age'))
# The youngest winner of a Nobel Prize as of 2016
nobel.nsmallest(1, 'age')
(index) | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | ... | organization_name | organization_city | organization_country | death_date | death_city | death_country | usa_born_winner | decade | female_winner | age
793 | 2007 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for having laid the foundations of mechanism ... | 1/3 | 820 | Individual | Leonid Hurwicz | 1917-08-21 | Moscow | ... | University of Minnesota | Minneapolis, MN | United States of America | 2008-06-24 | Minneapolis, MN | United States of America | False | 2000 | False | 90.0

1 rows × 22 columns

Out[91]:

(index) | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | ... | organization_name | organization_city | organization_country | death_date | death_city | death_country | usa_born_winner | decade | female_winner | age
885 | 2014 | Peace | The Nobel Peace Prize 2014 | "for their struggle against the suppression of... | 1/2 | 914 | Individual | Malala Yousafzai | 1997-07-12 | Mingora | ... | NaN | NaN | NaN | NaN | NaN | NaN | False | 2010 | True | 17.0

1 rows × 22 columns

11. You get a prize!

# The name of the youngest winner of the Nobel Prize as of 2016
youngest_winner = nobel.nsmallest(1, 'age')['full_name'].iloc[0]