I enjoy doing new tunes; it gives me a little bit to perk up, to pay a little bit more attention (Earl Scruggs, American musician)
I recently finished reading The Signal and the Noise, a book by Nate Silver, creator of the equally famous FiveThirtyEight blog. The book is a very good read for all data science professionals, and a must for anyone whose work involves predicting the future. It praises the Bayesian way of thinking as the best way to make and revise predictions, and criticizes rigid ways of thinking with many examples of disastrous predictions. I especially enjoyed the chapter dedicated to chess and how Deep Blue finally beat Kasparov. In a nutshell: I strongly recommend it.
One of the plots in Silver’s book presents a case of a false negative, showing the relationship between obesity and calorie consumption across the countries of the world. The plot shows no evidence of a connection between the two variables. Since this seemed very strange to me, I decided to reproduce the plot myself.
I compared these two variables:
- Dietary Energy Consumption (kcal/person/day) estimated by the FAO Food Balance Sheets.
- Prevalence of Obesity as percentage of defined population with a body mass index (BMI) of 30 kg/m2 or higher estimated by the World Health Organization
And this is the resulting plot:
As you can see, there is a strong correlation between the two variables. Why does Nate Silver’s plot show the opposite? Obviously we did not plot the same data (although, in principle, both of us went to the same source). Anyway, to be honest, I prefer my plot because it shows what we all know: the more calories you eat, the more weight you will see on your bathroom scale. Some final thoughts on the plot:
- I would like to be Japanese: they don’t gain weight!
- Why are people in the US fatter than Austrians?
- What happens in Samoa?
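To put a number on the relationship instead of judging it by eye, a rank correlation is one option. This is only a minimal sketch: the `toy` data frame and its values are made up for illustration, and it assumes column names `Kcal` and `Obesity` like those used in the script in this post.

```r
# Made-up values for illustration only (NOT real FAO/WHO figures).
# Swap `toy` for a data frame with one row per country and the
# columns Kcal and Obesity.
toy <- data.frame(
  Kcal    = c(2000, 2400, 2800, 3200, 3600),
  Obesity = c(4, 9, 15, 22, 30)
)

# Spearman's rho measures monotonic association and is less sensitive
# to outlying countries than Pearson's r.
result <- cor.test(toy$Kcal, toy$Obesity, method = "spearman")
print(result$estimate)
```

With real data, the confidence one can place in the estimate also depends on how many countries survive the joins between the three sources.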
Here is the code to produce the plot:
    library(xlsx)
    library(dplyr)
    library(ggplot2)
    library(scales)

    setwd("YOUR WORKING DIRECTORY HERE")

    # Dietary energy consumption (FAO).
    # mode = "wb" writes a fresh binary file; "ab" would append on every re-run and corrupt it.
    url_calories = "http://www.fao.org/fileadmin/templates/ess/documents/food_security_statistics/FoodConsumptionNutrients_en.xls"
    download.file(url_calories, method="internal", destfile = "FoodConsumptionNutrients_en.xls", mode = "wb")
    calories = read.xlsx(file="FoodConsumptionNutrients_en.xls", startRow = 4, colIndex = c(2,6),
                         colClasses = c("character", "numeric"),
                         sheetName="Dietary Energy Cons. Countries", stringsAsFactors=FALSE)
    colnames(calories)=c("Country", "Kcal")

    # Total population (United Nations)
    url_population = "http://esa.un.org/unpd/wpp/DVD/Files/1_Excel%20(Standard)/EXCEL_FILES/1_Population/WPP2015_POP_F01_1_TOTAL_POPULATION_BOTH_SEXES.XLS"
    download.file(url_population, method="internal", destfile = "Population.xls", mode = "wb")
    population = read.xlsx(file="Population.xls", startRow = 17, colIndex = c(3,71),
                           colClasses = c("character", "numeric"),
                           sheetName="ESTIMATES", stringsAsFactors=FALSE)
    colnames(population)=c("Country", "Population")

    # Prevalence of obesity (WHO)
    # http://apps.who.int/gho/data/node.main.A900A?lang=en
    url_obesity = "http://apps.who.int/gho/athena/data/xmart.csv?target=GHO/NCD_BMI_30A&profile=crosstable&filter=AGEGROUP:*;COUNTRY:*;SEX:*&x-sideaxis=COUNTRY&x-topaxis=GHO;YEAR;AGEGROUP;SEX&x-collapse=true"
    obesity = read.csv(file=url_obesity, stringsAsFactors=FALSE)
    obesity %>% select(matches("Country|2014.*Both")) -> obesity
    colnames(obesity)=c("Country", "Obesity")
    obesity %>% filter(Obesity!="No data") -> obesity
    # Keep the number before the opening bracket; fixed = TRUE matches "[" literally
    # (a bare "[" is not a valid regular expression).
    obesity %>% mutate(Obesity=as.numeric(substr(Obesity, 1, regexpr("[", Obesity, fixed = TRUE)-1))) -> obesity

    # Keep only the countries present in all three sources
    population %>% inner_join(calories, by = "Country") %>% inner_join(obesity, by = "Country") -> data

    opts=theme(
      panel.background = element_rect(fill="gray98"),
      panel.border = element_rect(colour="black", fill=NA),
      axis.line = element_line(size = 0.5, colour = "black"),
      axis.ticks = element_line(colour="black"),
      panel.grid.major = element_line(colour="gray75", linetype = 2),
      panel.grid.minor = element_blank(),
      axis.text = element_text(colour="gray25", size=15),
      axis.title = element_text(size=18, colour="gray10"),
      legend.key = element_blank(),
      legend.position = "none",
      legend.background = element_blank(),
      plot.title = element_text(size = 40, colour="gray10"))

    # Bubble plot; labels are drawn only for a few extreme countries
    ggplot(data, aes(x=Kcal, y=Obesity/100, size=log(Population), label=Country))+
      geom_point(colour="white", fill="sandybrown", shape=21, alpha=.55)+
      scale_size_continuous(range=c(2,40))+
      scale_x_continuous(limits=c(1500,4100))+
      scale_y_continuous(labels = percent)+
      labs(title="The World We Live In #5: Calories And Kilograms",
           x="Dietary Energy Consumption (kcal/person/day)",
           y="% population with body mass index >= 30 kg/m2")+
      geom_text(data=subset(data, Obesity>35|Kcal>3700), size=5.5, colour="gray25", hjust=0, vjust=0)+
      geom_text(data=subset(data, Kcal<2000), size=5.5, colour="gray25", hjust=0, vjust=0)+
      geom_text(data=subset(data, Obesity<10 & Kcal>2600), size=5.5, colour="gray25", hjust=0, vjust=0)+
      geom_text(aes(3100, .01), colour="gray25", hjust=0,
                label="Source: United Nations (size of bubble depending on population)", size=4.5)+
      opts
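The least obvious step in the script is the one that turns the obesity column into numbers: the values arrive as text, and only the part before the opening bracket is kept. Here is a small standalone sketch of that step. The sample strings are hypothetical, written in the "estimate [lower-upper]" layout that the parsing logic implies; they are not real WHO figures.

```r
# Hypothetical sample strings (NOT real WHO figures) in the
# "estimate [lower-upper]" layout that the parsing step assumes.
raw <- c("20.1 [15.2-25.3]", "4.5 [3.0-6.2]", "No data")

# Drop entries without a value, as the script does with filter().
raw <- raw[raw != "No data"]

# Find the position of the literal "[" in each string. fixed = TRUE
# makes regexpr() treat "[" as plain text, not as regex syntax.
bracket_pos <- regexpr("[", raw, fixed = TRUE)

# Keep the text before the bracket and convert it to a number;
# as.numeric() tolerates the trailing space.
estimate <- as.numeric(substr(raw, 1, bracket_pos - 1))
print(estimate)
```

If a string had no bracket at all, `regexpr()` would return -1 and the result would be `NA`, so it is worth checking for missing values after this step.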
16 thoughts on “The World We Live In #5: Calories And Kilograms”
Regarding the Samoa question:
This article in German
1) A hectic lifestyle is frowned upon in Samoa
2) Being fat is associated with prosperity
3) People rarely walk
4) There was significant wealth from having been a German colony, and the natives got lazy.
Thanks! Very good explanation!
Another explanation for Samoa (not exclusive; Andrej is correct): genetics. I once read that island populations have a propensity for obesity because they formerly endured periods of scarcity separated by periods of abundance. Those who were able to accumulate more fat in periods of abundance (lower basal metabolism) had a better chance of surviving (and therefore of propagating their genes). When they changed their lifestyle, their genetic background stabbed them in the back. Moreover, dietary changes (USA-like, see my comment about Austria below) could enhance it.
Japanese are known for eating lots of vegetables and fish. Good lifestyle and maybe good genetics.
Calories are not the only important thing (Austria vs. USA). Food quality is important. Fat quality is important. Maybe Austrians eat less glucids (carbs) or higher quality glucids (less sugar, sugar drinks, etc.; more fiber or vegetables), and higher quality fats (unsaturated, saturated but adequate fatty acid proportion, less trans fats, etc.).
I believe this may apply to more than small island populations. I have heard or read somewhere, though I have not seen any hard science, that the reason there is an epidemic of obesity, and hence Type II diabetes, among First Nations populations of North America is that the hunter-gatherer lifestyle of a bygone era forced people through repeated periods of abundance and scarcity. Those who could efficiently store fat in times of plenty and efficiently use stored fat in times of want would be more likely to survive, have children, and hence pass on these efficient genes, leading to populations that are obese in these times of continued surplus.
Is there any US data available on this type of activity?
A thought-provoking plot! To get more understanding, I think you need to look at the distributions of consumption and obesity, instead of just the means. For example, a resource-poor country can have an overfed, obese elite, so it will have low average consumption and relatively high obesity. It will appear at the same obesity level as a country with a well fed population that has neither starving masses nor an obese elite (e.g. Eritrea and Japan).
All plots of this type seem provoking, since there is always some outlier that needs to be explained. I think all of them show that we live in a complex world with many differences, where a simple plot is not enough. I agree with you that skewness matters. Thank you for your comment.
When I tried to load the FAO file into the workspace (after successfully downloading it) I kept getting this error. It wasn’t until I opened the directory that I discovered that the filename was spelt differently: FoodConsumptionNutrietns_en.xls (instead of …Nutrients). Just wanted to point that out!
Thanks for sharing this information.
Thanks for your comment!
When I tried to load the file “Population.xls”, R returned this error:
What went wrong?
Have you checked if the file “Population.xls” is in your working directory?
Yes, it’s there. When I tried opening it directly, it gave a warning. Maybe it’s corrupted. What do you think?
Not sure. Could you send me the exact text of the error you are getting?
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
OK. This is what I’d do. Restart the RStudio session (ctrl+shift+F10), run rm(list = ls()) and gc(), and try again. If it doesn’t work, update the xlsx package and try again. If it still doesn’t work, try to read the population from Wikipedia using the rvest package (it is easy). If that doesn’t work either, try to read the xls file using another package. Tell me what happens!
Okay. Thanks a lot. Will work on it and let you know.
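The error quoted in this thread means that the first bytes of the file match neither the OLE2 signature used by .xls workbooks nor the ZIP signature used by .xlsx ones. That can happen, for example, when the download saved an HTML error page or when the file was truncated or appended to. A quick way to check is to look at those bytes directly. This is a rough sketch: `excel_signature()` is a hypothetical helper written for this example and is not part of the xlsx package.

```r
# Hypothetical helper (not part of any package): report whether a file
# starts with the OLE2 (.xls) or ZIP/OOXML (.xlsx) signature.
excel_signature <- function(path) {
  header <- readBin(path, what = "raw", n = 8)
  ole2 <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  zip  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))
  if (length(header) >= 8 && identical(header, ole2)) {
    "OLE2 (.xls)"
  } else if (length(header) >= 4 && identical(header[1:4], zip)) {
    "ZIP/OOXML (.xlsx)"
  } else {
    "unknown (possibly an HTML error page or a damaged download)"
  }
}

# Demonstration on a temporary file that holds plain text, not a workbook.
tmp <- tempfile(fileext = ".xls")
writeLines("<html>not a spreadsheet</html>", tmp)
print(excel_signature(tmp))
```

Calling `excel_signature("Population.xls")` in the working directory would show whether the downloaded file is a real workbook before handing it to `read.xlsx()`.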