Every inch of sky’s got a star
Every inch of skin’s got a scar
(Everything Now, Arcade Fire)
I think that a very good way to start with R is to build an interactive visualization of some open data, because you will train many important skills of a data scientist: loading, cleaning, transforming and combining data, and producing a suitable visualization. Making it interactive will also give you an idea of the power of R, because you will realise that you can indirectly drive other programming languages such as JavaScript.
That’s precisely what I’ve done today. I combined two interesting datasets:
- The probability of computerisation of 702 detailed occupations, obtained by Carl Benedikt Frey and Michael A. Osborne from the University of Oxford, using a Gaussian process classifier and published in this paper in 2013.
- Job statistics (employment, median annual wages and typical education needed for entry) from the US Bureau of Labor Statistics, available here.
Apart from using dplyr to manipulate the data and highcharter to build the visualization, I used the tabulizer package to extract the table of computerisation probabilities from the PDF: it makes this task extremely easy.
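To give an idea of how the combining and plotting steps fit together, here is a minimal sketch. It is not the code of the experiment: the column names (`soc_code`, `probability`, `median_wage` and so on), the join key and every value in the two toy tables are invented placeholders, and the real tables would come from the PDF and the jobs file instead.

```r
library(dplyr)

# Toy stand-ins for the two sources. Every value below is an invented
# placeholder, not a figure from the paper or from the jobs statistics.
probabilities <- data.frame(
  soc_code    = c("00-0001", "00-0002", "00-0003"),
  probability = c(0.05, 0.50, 0.95)
)

jobs <- data.frame(
  soc_code    = c("00-0001", "00-0002", "00-0003"),
  occupation  = c("Occupation A", "Occupation B", "Occupation C"),
  employment  = c(1000, 2000, 3000),
  median_wage = c(90000, 50000, 25000),
  education   = c("Master's degree", "Bachelor's degree",
                  "No formal educational credential")
)

# Combine both tables on the shared occupation code (assumed key).
combined <- inner_join(jobs, probabilities, by = "soc_code")

# Build the interactive chart only if highcharter is installed:
# probability on the x axis, wage on the y axis, bubble size by
# employment and one series per education level.
if (requireNamespace("highcharter", quietly = TRUE)) {
  chart <- highcharter::hchart(
    combined, "scatter",
    highcharter::hcaes(x = probability, y = median_wage,
                       size = employment, group = education)
  )
}
```

In an interactive session, typing `chart` renders the plot in the viewer. With the real data, the first table would be built from the output of tabulizer's `extract_tables()` rather than typed by hand.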
This is the resulting plot:
If you want to examine it in depth, here is a full-size version.
These are some of my insights (the corresponding figures are obtained directly from the dataset):
- There is a moderate negative correlation between wages and probability of computerisation.
- Around 45% of US jobs are threatened by machines (they have a computerisation probability higher than 80%): half of them do not require formal education for entry.
- In fact, 78% of jobs which do not require formal education for entry are threatened by machines, while 0% of those which require a master’s degree are.
- Teachers are absolutely irreplaceable (0% are threatened by machines) but they earn 2.2% less than the average wage (unfortunately, I’m afraid this phenomenon occurs in many other countries as well).
- Don’t study to become a librarian or archivist: it seems a bad way to invest your time.
- Mathematicians will survive the machines.
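Figures like the ones above can be derived with a few lines of dplyr. The following is only a sketch under stated assumptions: the `combined` table and its column names are hypothetical, its values are invented placeholders (so the results will not match the percentages quoted), and I assume here that shares are weighted by employment, with "threatened" meaning a probability above 0.8.

```r
library(dplyr)

# Invented placeholder data with the assumed column layout.
combined <- data.frame(
  occupation  = c("A", "B", "C", "D"),
  employment  = c(1000, 2000, 3000, 4000),
  median_wage = c(90000, 60000, 30000, 25000),
  probability = c(0.05, 0.40, 0.85, 0.95),
  education   = c("Master's degree", "Bachelor's degree",
                  "No formal educational credential",
                  "No formal educational credential")
)

# Correlation between wages and probability of computerisation.
wage_prob_cor <- cor(combined$median_wage, combined$probability)

# Flag occupations above the 80% threshold.
combined <- combined %>%
  mutate(threatened = probability > 0.8)

# Share of total employment that sits in threatened occupations.
share_threatened <- with(combined,
                         sum(employment[threatened]) / sum(employment))

# The same share, computed separately for each education level.
by_education <- combined %>%
  group_by(education) %>%
  summarise(share_threatened = sum(employment[threatened]) / sum(employment))
```

Weighting by employment rather than counting occupations matters: a single large occupation can move the overall share much more than several small ones.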
The code of this experiment is available here.
