Every inch of sky’s got a star

Every inch of skin’s got a scar

(Everything Now, Arcade Fire)

I think a very good way to start with R is to build an interactive visualization of some open data, because you will train many important skills of a data scientist: loading, cleaning, transforming and combining data, and producing a suitable visualization. Making it interactive will also give you an idea of the power of R, because you will realise that you can indirectly *handle* other programming languages such as JavaScript.

That’s precisely what I’ve done today. I combined two interesting datasets:

- The probability of computerisation of 702 detailed occupations, obtained by Carl Benedikt Frey and Michael A. Osborne from the University of Oxford, using a Gaussian process classifier and published in this paper in 2013.
- Job statistics (employment, median annual wages and typical education needed for entry) from the US Bureau of Labor Statistics, available here.
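Combining the two datasets comes down to a join on a shared occupation identifier. The following is only a minimal sketch with made-up rows: the column names (`soc_code`, `probability`, `employment`, `median_wage`, `education`) and all values are assumptions for illustration, not the real layout of either source.

```r
library(dplyr)

# Toy stand-in for the table of computerisation probabilities
# (hypothetical codes and values, not taken from the paper)
computerisation <- tibble(
  soc_code    = c("11-1111", "22-2222", "33-3333"),
  occupation  = c("Occupation A", "Occupation B", "Occupation C"),
  probability = c(0.95, 0.40, 0.05)
)

# Toy stand-in for the job statistics (hypothetical values)
job_stats <- tibble(
  soc_code    = c("11-1111", "22-2222", "44-4444"),
  employment  = c(100000, 250000, 50000),
  median_wage = c(25000, 48000, 61000),
  education   = c("No formal educational credential",
                  "Bachelor's degree",
                  "Master's degree")
)

# Keep only occupations present in both tables
jobs <- inner_join(computerisation, job_stats, by = "soc_code")

# Occupations with a probability but no matching job statistics,
# worth inspecting before trusting any aggregate figure
unmatched <- anti_join(computerisation, job_stats, by = "soc_code")
```

Checking `unmatched` before going further is a cheap way to see how many occupations are lost in the join.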

Apart from using `dplyr` to manipulate data and `highcharter` to do the visualization, I used the `tabulizer` package to extract the table of probabilities of computerisation from the `pdf`: it makes this task extremely easy.

This is the resulting plot:

If you want to examine it in depth, here you have a full size version.
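A chart along these lines can be sketched with `highcharter` in a few calls. This is an illustrative sketch only, not the code behind the plot above: the data frame, its column names and its values are all made up, and a plain scatter plot grouped by education is an assumed design.

```r
library(highcharter)

# Made-up data: one row per occupation (hypothetical names and values)
jobs <- data.frame(
  occupation  = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  education   = c("No formal educational credential",
                  "No formal educational credential",
                  "Bachelor's degree",
                  "Master's degree"),
  median_wage = c(25000, 30000, 60000, 70000),
  probability = c(0.95, 0.60, 0.85, 0.05)
)

# Scatter plot: probability of computerisation vs median wage,
# one series (and colour) per education level
chart <- hchart(
  jobs, "scatter",
  hcaes(x = probability, y = median_wage, group = education)
)

# Titles are added with the hc_* helpers
chart <- hc_title(chart, text = "Wages vs probability of computerisation (toy data)")
chart <- hc_xAxis(chart, title = list(text = "Probability of computerisation"))
chart <- hc_yAxis(chart, title = list(text = "Median annual wage"))

# Printing `chart` in an interactive session renders the JavaScript widget
```

The R code never touches JavaScript directly: `highcharter` builds the Highcharts configuration and the widget is rendered in the browser.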

These are some of my insights (the corresponding figures are obtained directly from the dataset):

- There is a *moderate* negative correlation between wages and probability of computerisation.
- Around 45% of US employment is *threatened* by machines (has a computerisation probability higher than 80%): half of these jobs do not require formal education for entry.
- In fact, 78% of jobs which do not require formal education for entry are *threatened* by machines: 0% of those which require a master’s degree are.
- Teachers are absolutely irreplaceable (0% are *threatened* by machines) but they earn 2.2% less than the average wage (unfortunately, I’m afraid this phenomenon occurs in many other countries as well).
- Don’t study to be a librarian or archivist: it seems a bad way to invest your time.
- Mathematicians will survive the machines.
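Figures of this kind can be derived with a threshold flag, an employment-weighted share and a correlation. The sketch below runs on made-up rows, so its numbers have nothing to do with the real ones. The column names are hypothetical, and weighting the shares by employment is an assumption about how the percentages are defined.

```r
library(dplyr)

# Made-up data: one row per occupation (hypothetical names and values)
jobs <- tibble(
  occupation  = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  education   = c("No formal educational credential",
                  "No formal educational credential",
                  "Bachelor's degree",
                  "Master's degree"),
  employment  = c(100, 100, 100, 100),
  median_wage = c(25000, 30000, 60000, 70000),
  probability = c(0.95, 0.60, 0.85, 0.05)
)

# A job counts as threatened when its probability exceeds 80%
jobs <- jobs %>% mutate(threatened = probability > 0.8)

# Share of total employment in threatened occupations (0.5 with this toy data)
share_threatened <- sum(jobs$employment[jobs$threatened]) / sum(jobs$employment)

# The same share computed within each education level
by_education <- jobs %>%
  group_by(education) %>%
  summarise(
    share_threatened = sum(employment[threatened]) / sum(employment),
    .groups = "drop"
  )

# Pearson correlation between wages and probability (negative for this toy data)
wage_prob_cor <- cor(jobs$median_wage, jobs$probability)
```

The 80% cut-off is the one stated in the list above.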

The code of this experiment is available here.

the main problem with any analysis of education/income/substitution is the issue of where BA+ is channeling. in the past, it was largely true that BA+ led to direct production employment. over the last few decades we have seen the finance sector, aka FIRE, take more of the BA+ graduates. and most of FIRE is non-production, aka overhead. one can find lots of graphs/data at FRED. this is not a pleasant trend.

Fantastic post and insight! Looks like my occupation-equivalent is safe for now.

Thank you!

Great post, Antonio

Thank you very much, Ángel 🙂