Up in the sky you just feel fine, there is no running out of time and you never cross a line (Up In The Sky, 77 Bombay Street)
In this post I analyze two datasets from Enigma:
- Migration flows: Every 10 years, since 1960, the World Bank estimates migrations worldwide (267,960 rows)
- World population: Values and percentages of populations for each nation examined beginning in year 1960, by the World Bank’s Health, Nutrition and Population project (4,168,185 rows)
Since the second dataset is very large, I load it into R using the fread function of the data.table package, which is extremely fast. To filter datasets, I also use dplyr and the pipes of the magrittr package (my life changed since I discovered it).
To build a comparable indicator across countries, I divide migration flows (from and to each country) by the mean population in each decade. I do this because migration flows are aggregated for each decade since 1960. For example, during the first decade of the 21st century, Argentina received 1,537,850 immigrants, which represents 3.99% of the mean population of the country in that decade. In the same period, immigration to Burundi represented only 0.67% of its mean population.
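As a rough illustration of the indicator described above, here is a minimal sketch in base R. The country name and all figures are made up for the example, and the column names are hypothetical rather than those of the Enigma files; only the decade bucketing (year minus year modulo 10) follows the approach used in this post.

```r
# Toy yearly population for one hypothetical country (made-up numbers)
population <- data.frame(
  country = "Examplestan",
  year    = 2000:2009,
  value   = seq(1000000, 1090000, by = 10000)
)

# Bucket each year into its decade, e.g. 2003 -> 2000
population$decade <- population$year - population$year %% 10

# Mean population per country and decade
avg_pop <- aggregate(value ~ country + decade, data = population, FUN = mean)
names(avg_pop)[names(avg_pop) == "value"] <- "avg_pop"

# Hypothetical decade total of immigrants for the same country
flows <- data.frame(country = "Examplestan", decade = 2000, immigrants = 52250)

# Indicator: immigrants of the decade divided by the mean population of the decade
indicator <- merge(flows, avg_pop, by = c("country", "decade"))
indicator$p_immigrants <- indicator$immigrants / indicator$avg_pop
print(indicator)
```

Dividing by the decade mean rather than by a single year's population keeps the denominator on the same time scale as the flows, which are decade totals.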
What happened in the whole world in that decade? Around 166 million people moved to other countries, which represents 2.58% of the mean population of the world. I use this figure to divide countries into four groups:
- Isolated: countries with both % of immigrants and % of emigrants under 2.58%
- Emitter: countries with % of immigrants under 2.58% and % of emigrants over 2.58%
- Receiver: countries with % of immigrants over 2.58% and % of emigrants under 2.58%
- Transit: countries with both % of immigrants and % of emigrants over 2.58%
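The threshold and the four groups can be sketched in base R as follows. The four countries and their figures are made up, and classify is a hypothetical helper written for this illustration, not a function from any package. In this sketch a value exactly equal to the threshold counts as "under", which is an assumption of the example.

```r
# Hypothetical decade-level figures for four made-up countries
flows <- data.frame(
  country    = c("A", "B", "C", "D"),
  avg_pop    = c(1000, 2000, 3000, 4000),
  emigrants  = c(10, 200, 30, 400),
  immigrants = c(20, 20, 300, 300)
)

# Shares of the mean population leaving and arriving
flows$p_emigrants  <- flows$emigrants / flows$avg_pop
flows$p_immigrants <- flows$immigrants / flows$avg_pop

# World-level threshold: total immigrants over total mean population
threshold <- sum(flows$immigrants) / sum(flows$avg_pop)

# Hypothetical helper: assign one of the four groups to each country
classify <- function(p_in, p_out, threshold) {
  ifelse(p_in > threshold & p_out > threshold, "Transit",
  ifelse(p_in > threshold, "Receiver",
  ifelse(p_out > threshold, "Emitter", "Isolated")))
}

flows$group <- classify(flows$p_immigrants, flows$p_emigrants, threshold)
print(flows[, c("country", "p_immigrants", "p_emigrants", "group")])
```

With these toy numbers the threshold is 6.4%, and each of the four countries falls into a different group.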
To create the map I use the leaflet package, as I did in my previous post. A shapefile of the world can be downloaded here. This is what the world looks like according to this segmentation:
Some conclusions:
- There are just sixteen receiver countries: United Arab Emirates, Argentina, Australia, Bhutan, Botswana, Costa Rica, Djibouti, Spain, Gabon, The Gambia, Libya, Qatar, Rwanda, Saudi Arabia, United States and Venezuela
- China and India (the two most populous countries in the world) are isolated
- Transit countries are concentrated in the northern hemisphere and most of them are located in cold latitudes
- There are six emitter countries with more than 30% of emigrants between 2000 and 2009: Guyana, Tonga, Tuvalu, Jamaica, Bosnia and Herzegovina and Albania
This is the code you need to reproduce the map:
library(data.table)
library(dplyr)
library(leaflet)
library(rgdal)
library(RColorBrewer)

setwd("YOUR WORKING DIRECTORY HERE")

populflows = read.csv(file="enigma-org.worldbank.migration-remittances.migrants.migration-flow-c57405e33412118c8757b1052e8a1490.csv", stringsAsFactors=FALSE)
population = fread("enigma-org.worldbank.hnp.data-eaa31d1a34fadb52da9d809cc3bec954.csv")

# Population: mean total population by country and decade
population %>%
  filter(indicator_name=="Population, total") %>%
  as.data.frame %>%
  mutate(decade=(year-year%%10)) %>%
  group_by(country_name, country_code, decade) %>%
  summarise(avg_pop=mean(value)) -> population2

# Inmigrants by country
populflows %>%
  filter(!is.na(total_migrants)) %>%
  group_by(migration_year, destination_country) %>%
  summarise(inmigrants = sum(total_migrants)) %>%
  merge(population2, by.x = c("destination_country", "migration_year"), by.y = c("country_name", "decade")) %>%
  mutate(p_inmigrants=inmigrants/avg_pop) -> inmigrants

# Migrants by country
populflows %>%
  filter(!is.na(total_migrants)) %>%
  group_by(migration_year, country_of_origin) %>%
  summarise(migrants = sum(total_migrants)) %>%
  merge(population2, by.x = c("country_of_origin", "migration_year"), by.y = c("country_name", "decade")) %>%
  mutate(p_migrants=migrants/avg_pop) -> migrants

# Join of data sets (decade starting in 2000 only)
migrants %>%
  merge(inmigrants, by.x = c("country_code", "migration_year"), by.y = c("country_code", "migration_year")) %>%
  filter(migration_year==2000) %>%
  select(country_of_origin, country_code, avg_pop.x, migrants, p_migrants, inmigrants, p_inmigrants) %>%
  plyr::rename(., c("country_of_origin"="Country",
                    "country_code"="Country.code",
                    "avg_pop.x"="Population.mean",
                    "migrants"="Total.migrants",
                    "p_migrants"="p.of.migrants",
                    "inmigrants"="Total.inmigrants",
                    "p_inmigrants"="p.of.inmigrants")) -> populflows2000

# Threshold to create groups: world inmigrants over world mean population
populflows2000 %>%
  summarise(x=sum(Total.migrants), y=sum(Total.inmigrants), z=sum(Population.mean)) %>%
  mutate(m=y/z) %>%
  select(m) %>%
  as.numeric -> avg

# Segmentation ("Receiver" is the default for rows not matched below)
populflows2000$Group="Receiver"
populflows2000[populflows2000$p.of.migrants>avg & populflows2000$p.of.inmigrants>avg, "Group"]="Transit"
populflows2000[populflows2000$p.of.migrants<avg & populflows2000$p.of.inmigrants<avg, "Group"]="Isolated"
populflows2000[populflows2000$p.of.migrants>avg & populflows2000$p.of.inmigrants<avg, "Group"]="Emitter"

# Loading shapefile from http://data.okfn.org/data/datasets/geo-boundaries-world-110m
countries=readOGR("json/countries.geojson", "OGRGeoJSON")

# Join shapefile and enigma information
joined=merge(countries, populflows2000, by.x="wb_a3", by.y="Country.code", all=FALSE, sort = FALSE)
joined$Group=as.factor(joined$Group)

# To define one color by segment
factpal=colorFactor(brewer.pal(4, "Dark2"), joined$Group)

leaflet(joined) %>%
  addPolygons(stroke = TRUE, color="white", weight=1, smoothFactor = 0.2, fillOpacity = .8, fillColor = ~factpal(Group)) %>%
  addTiles() %>%
  addLegend(pal = factpal, values=c("Emitter", "Isolated", "Receiver", "Transit"))
Thanks a lot for an interesting map.
About the code: most of the last lines are commented out, from just after =”Emitter” up to the addPolygons line. There’s also a comment just before leaflet(joined).
I’m new to the R world. Thanks again for this interesting map
Thanks! I think it is already fixed.
Because of the large populations of China and India, wouldn’t it be more difficult for either emigrants or immigrants to be over 2.58%? Perhaps a ratio of emigrant to immigrant would also be quite enlightening?
Perhaps the reason China and India are isolated is because it is much harder for a large population country to achieve over 2.58% in either category? A ratio of emigrant to immigrant would be enlightening, I would think.