Every inch of sky’s got a star

Every inch of skin’s got a scar

(Everything Now, Arcade Fire)

I think a very good way to start with R is to build an interactive visualization of some open data, because it trains many important skills of a data scientist: loading, cleaning, transforming and combining data, and producing a suitable visualization. Making it interactive will also give you an idea of the power of R, because you will realise that you can indirectly *handle* other programming languages such as JavaScript.

That’s precisely what I’ve done today. I combined two interesting datasets:

- The probability of computerisation of 702 detailed occupations, obtained by Carl Benedikt Frey and Michael A. Osborne from the University of Oxford, using a Gaussian process classifier and published in this paper in 2013.
- Job statistics (employment, median annual wages and typical education needed for entry) from the US Bureau of Labor Statistics, available here.
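Combining the two sources can be sketched as a join on a shared occupation code. This is only a minimal illustration: the column names (`soc_code`, `probability`, `median_wage` and so on) and every value in the toy tables are made up for the example, and the real files may need extra cleaning before the codes match.

```r
library(dplyr)

# Toy stand-ins for the two datasets; names and values are illustrative only.
probabilities <- tibble(
  soc_code    = c("00-0001", "00-0002", "00-0003"),
  probability = c(0.02, 0.45, 0.97)
)

job_stats <- tibble(
  soc_code    = c("00-0001", "00-0002", "00-0003", "00-0004"),
  occupation  = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  employment  = c(300, 1400, 200, 50),
  median_wage = c(90000, 55000, 30000, 40000),
  education   = c("Master's degree", "Bachelor's degree",
                  "No formal educational credential", "Bachelor's degree")
)

# Keep only the occupations present in both tables.
combined <- inner_join(probabilities, job_stats, by = "soc_code")
print(combined)
```

An `inner_join()` silently drops occupations that appear in only one source (here `00-0004`), so it is worth checking how many rows are lost, for instance with `anti_join()`.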

Apart from using `dplyr` to manipulate data and `highcharter` to do the visualization, I used the `tabulizer` package to extract the table of probabilities of computerisation from the `pdf`: it makes this task extremely easy.
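A rough sketch of that extraction step could look like the code below. `tabulizer::extract_tables()` returns one character matrix per detected table by default; the helper names, the three-column layout and the toy matrices are assumptions made for illustration, not the layout of the actual paper, and the page range has to be supplied by the reader.

```r
library(dplyr)

# Hypothetical helper: pull the raw tables out of a PDF.
# Needs the tabulizer package (and Java) only when it is actually called.
extract_raw_tables <- function(pdf_path, pages) {
  tabulizer::extract_tables(pdf_path, pages = pages)
}

# Hypothetical helper: stack the raw character matrices into one data frame.
# Assumes every matrix has three columns: probability, SOC code, occupation.
tidy_probability_tables <- function(raw_tables) {
  raw_tables %>%
    lapply(function(m) as.data.frame(m, stringsAsFactors = FALSE)) %>%
    bind_rows() %>%
    setNames(c("probability", "soc_code", "occupation")) %>%
    mutate(probability = as.numeric(probability))
}

# Toy matrices mimicking the shape of the extract_tables() output.
toy_tables <- list(
  matrix(c("0.03", "00-0001", "Occupation A"), nrow = 1),
  matrix(c("0.97", "00-0003", "Occupation C"), nrow = 1)
)

tidy_table <- tidy_probability_tables(toy_tables)
print(tidy_table)
```

In practice the extracted matrices often contain header rows or split cells, so some filtering is usually needed before converting the columns.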

This is the resulting plot:

If you want to examine it in depth, here you have a full size version.
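For readers who want to try something similar, here is a minimal `highcharter` sketch. It is not the code behind the plot: the chart type, the axis choices, the column names and the toy values are all assumptions for illustration.

```r
library(dplyr)
library(highcharter)

# Toy data; names and values are illustrative only.
jobs <- tibble(
  occupation  = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  probability = c(0.02, 0.45, 0.97, 0.85),
  median_wage = c(90000, 55000, 30000, 28000),
  education   = c("Master's degree", "Bachelor's degree",
                  "No formal educational credential",
                  "No formal educational credential")
)

# One point per occupation, coloured by education level.
chart <- hchart(
  jobs, "scatter",
  hcaes(x = probability, y = median_wage, group = education)
) %>%
  hc_title(text = "Wages vs. probability of computerisation (toy data)") %>%
  hc_xAxis(title = list(text = "Probability of computerisation")) %>%
  hc_yAxis(title = list(text = "Median annual wage (USD)"))

chart
```

The object is an htmlwidget, so printing it in RStudio or an R Markdown document renders the interactive Highcharts plot. This is where R ends up driving JavaScript without you writing any.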

These are some of my insights (the corresponding figures are obtained directly from the dataset):

- There is a *moderate* negative correlation between wages and probability of computerisation.
- Around 45% of US employment is *threatened* by machines (has a computerisation probability higher than 80%): half of those jobs do not require formal education for entry.
- In fact, 78% of jobs which do not require formal education for entry are *threatened* by machines, while 0% of those which require a master’s degree are.
- Teachers are absolutely irreplaceable (0% are *threatened* by machines) but they earn 2.2% less than the average wage (unfortunately, I’m afraid this phenomenon occurs in many other countries as well).
- Don’t study to be a librarian or archivist: it seems a bad way to invest your time.
- Mathematicians will survive the machines.
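Figures of this kind can be derived with a few `dplyr` summaries. The sketch below runs on made-up toy data, so its numbers have nothing to do with the ones above. It also assumes that shares are weighted by employment, which may not be how the original figures were computed; the 80% threshold is the one used in the list.

```r
library(dplyr)

# Toy data; names and values are illustrative only.
jobs <- tibble(
  occupation  = c("A", "B", "C", "D"),
  probability = c(0.95, 0.90, 0.10, 0.02),
  employment  = c(400, 100, 300, 200),
  median_wage = c(25000, 30000, 60000, 90000),
  education   = c("No formal educational credential",
                  "No formal educational credential",
                  "Bachelor's degree", "Master's degree")
) %>%
  mutate(threatened = probability > 0.8)  # threshold taken from the text

# Correlation between wages and probability of computerisation.
wage_correlation <- cor(jobs$median_wage, jobs$probability)

# Share of total employment in threatened occupations (assumed weighting).
threatened_share <- sum(jobs$employment[jobs$threatened]) / sum(jobs$employment)

# Same share, broken down by education level.
share_by_education <- jobs %>%
  group_by(education) %>%
  summarise(threatened_share = sum(employment * threatened) / sum(employment))

print(wage_correlation)
print(threatened_share)  # 0.5 for this toy data
print(share_by_education)
```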

The code of this experiment is available here.