Every inch of sky’s got a star
Every inch of skin’s got a scar
(Everything Now, Arcade Fire)
I think that a very good way to start with R is to build an interactive visualization of some open data, because it trains many important skills of a data scientist: loading, cleaning, transforming and combining data, and choosing a suitable visualization. Making it interactive also shows the power of R, because you will realise that you can indirectly drive other programming languages such as JavaScript.
That’s precisely what I’ve done today. I combined two interesting datasets:
- The probability of computerisation of 702 detailed occupations, obtained by Carl Benedikt Frey and Michael A. Osborne from the University of Oxford, using a Gaussian process classifier and published in this paper in 2013.
- Job statistics (employment, median annual wage and typical education needed for entry) from the US Bureau of Labor Statistics, available here.
Apart from using dplyr to manipulate the data and highcharter to build the visualization, I used the tabulizer package to extract the table of probabilities of computerisation from the PDF: it makes this task extremely easy.
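To make the combination step concrete, here is a minimal sketch in plain Python (not the author's R code) of the inner join that dplyr's `inner_join()` performs. It assumes, hypothetically, that both tables share an occupation code column named `soc`; the codes and figures are made up for illustration, and the PDF extraction done with tabulizer is not reproduced.

```python
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only rows whose `key` appears in both tables, merging their fields."""
    # Index the right-hand table by key; assumes `key` is unique there.
    index = {row[key]: row for row in right}
    joined: List[Row] = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            joined.append({**row, **match})  # fields from `right` win on clashes
    return joined


# Hypothetical toy rows (invented codes and values, not the real datasets).
probabilities = [
    {"soc": "00-0001", "probability": 0.95},
    {"soc": "00-0002", "probability": 0.01},
    {"soc": "00-0003", "probability": 0.50},  # no match on the BLS side
]
bls = [
    {"soc": "00-0001", "employment": 300, "median_wage": 30000},
    {"soc": "00-0002", "employment": 400, "median_wage": 90000},
]

combined = inner_join(probabilities, bls, key="soc")
for row in combined:
    print(row)
```

Occupations present in only one table are dropped, which is why it pays to check how many rows survive the join before plotting.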
This is the resulting plot:
If you want to examine it in depth, a full-size version is available here.
These are some of my insights (the corresponding figures are obtained directly from the dataset):
- There is a moderate negative correlation between wages and probability of computerisation.
- Around 45% of US jobs are threatened by machines (they have a computerisation probability higher than 80%); half of them do not require formal education for entry.
- In fact, 78% of jobs that require no formal education for entry are threatened by machines, while none of those requiring a master’s degree are.
- Teachers are absolutely irreplaceable (0% are threatened by machines), but they earn 2.2% less than the average wage (unfortunately, I’m afraid this phenomenon occurs in many other countries as well).
- Don’t study to become a librarian or archivist: it seems a bad way to invest your time.
- Mathematicians will survive the machines.
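The figures above boil down to a few simple computations. The following sketch (plain Python, not the author's code) assumes the joined table has hypothetical fields `probability`, `employment`, `median_wage` and `education`, weights the "threatened" shares by employment, uses the 80% threshold from the text, and computes an unweighted Pearson correlation between wage and probability. The rows are invented toy values, not the real data.

```python
from math import sqrt
from typing import Dict, List

Row = Dict[str, object]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Unweighted Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def threatened_share(rows: List[Row], threshold: float = 0.8) -> float:
    """Share of employment in occupations whose probability exceeds `threshold`."""
    total = sum(r["employment"] for r in rows)
    at_risk = sum(r["employment"] for r in rows if r["probability"] > threshold)
    return at_risk / total if total else 0.0


def threatened_share_by(rows: List[Row], field: str, threshold: float = 0.8) -> Dict[object, float]:
    """Employment-weighted threatened share within each group of `field`."""
    groups: Dict[object, List[Row]] = {}
    for r in rows:
        groups.setdefault(r[field], []).append(r)
    return {k: threatened_share(v, threshold) for k, v in groups.items()}


# Hypothetical toy rows (invented values, not the real datasets).
rows = [
    {"probability": 0.95, "employment": 300, "median_wage": 30000, "education": "None"},
    {"probability": 0.90, "employment": 100, "median_wage": 35000, "education": "None"},
    {"probability": 0.40, "employment": 200, "median_wage": 60000, "education": "Bachelor's"},
    {"probability": 0.01, "employment": 400, "median_wage": 90000, "education": "Master's"},
]

r = pearson([row["median_wage"] for row in rows], [row["probability"] for row in rows])
print(f"correlation wage vs probability: {r:.2f}")
print("threatened share:", threatened_share(rows))
print("by education:", threatened_share_by(rows, "education"))
```

With these toy rows, 400 of the 1,000 jobs exceed the threshold, so the overall share is 0.4. Whether the original figures are weighted by employment or counted per occupation is an assumption here.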
The code for this experiment is available here.
The main problem with any analysis of education/income/substitution is the issue of where BA+ graduates are being channeled. In the past, it was largely true that a BA+ led to direct production employment. Over the last few decades we have seen the finance sector, aka FIRE, take more of the BA+ graduates, and most of FIRE is non-production, aka overhead. One can find lots of graphs and data at FRED. This is not a pleasant trend.
Fantastic post and insight! Looks like my occupation-equivalent is safe for now.
Thank you!
Great post, Antonio
Thank you very much, Ángel 🙂