This government is committed to introducing posthumous pardons for people with certain historical sexual offence convictions who would be innocent of any crime now (British Government Spokesperson, September 2016)
Last September, the British government announced its intention to pursue what has become known as the Alan Turing law, offering exoneration to the tens of thousands of gay men convicted of historic charges. The law was finally unveiled on 20 October 2016.
This plot shows the daily views of Alan Turing's Wikipedia page during the last 365 days:
There are three huge peaks, on May 27th, July 30th, and October 29th, that can be easily detected using the AnomalyDetection package:
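For intuition about what "detecting a peak" means, here is a minimal base R sketch on made-up numbers. It is not the algorithm that AnomalyDetection implements; it swaps in a much simpler median/MAD rule, and the 3.5 cut-off is an assumed rule of thumb rather than a package default.

```r
# Toy daily view counts (made-up numbers, not the real Wikipedia data)
views <- c(100, 105, 98, 5000, 102, 99, 101)

# Robust z-score: distance from the median, scaled by the MAD
# (mad() applies the usual 1.4826 consistency constant by default)
robust_z <- abs(views - median(views)) / mad(views)

# Hypothetical cut-off, chosen only for this illustration
threshold <- 3.5

# Positions of the days flagged as anomalous
anomalies <- which(robust_z > threshold)
print(anomalies)
```

Only the fourth day is flagged here, because the median and MAD are barely affected by a single extreme value.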
After replacing these anomalies with values obtained by simple linear interpolation, it is clear that the time series has shifted significantly since the last days of September:
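As a rough illustration of what linear imputation does to a flagged day, here is a base R sketch on made-up numbers. It uses `approx()` instead of the imputeTS function from the script, and the values and the anomaly flag are assumptions for the example.

```r
# Toy daily view counts (made-up numbers, not the real Wikipedia data)
views <- c(100, 110, 5000, 130, 140)

# Assume the third day was flagged as an anomaly
anomalous <- c(FALSE, FALSE, TRUE, FALSE, FALSE)
views[anomalous] <- NA

# Fill each gap by drawing a straight line between its known neighbours
idx <- seq_along(views)
known <- !is.na(views)
cleaned <- approx(x = idx[known], y = views[known], xout = idx)$y
print(cleaned)
```

The spike is replaced by the midpoint of its two neighbours, so the cleaned series keeps its overall level without the outlier.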
To estimate the number of incremental views since September 28th (the date I have chosen as the starting point), I use the CausalImpact package:
The last plot shows the cumulative effect. After 141 days, there have been around 1 million incremental views of Alan Turing's Wikipedia page (more than 7,000 per day), and the effect does not seem ephemeral.
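To make the arithmetic concrete, here is a small base R sketch. The observed and counterfactual values are made up for illustration (the real counterfactual comes from the fitted model); only the 1 million and 141 days figures come from the text above.

```r
# Toy numbers (made up): observed views and a hypothetical counterfactual,
# i.e. what the model would predict had nothing happened
observed       <- c(12000, 15000, 11000)
counterfactual <- c(5000, 5000, 5000)

# Daily incremental views, and their running total (the cumulative effect)
pointwise  <- observed - counterfactual
cumulative <- cumsum(pointwise)
print(cumulative)

# Daily average implied by the figures quoted in the post
per_day <- 1e6 / 141
print(round(per_day))
```

The cumulative effect is just the running sum of the daily gaps between what was observed and what was expected, and 1 million views spread over 141 days works out to roughly 7,100 per day.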
Alan Turing has won another battle, this time posthumously. Thanks to it, many people have discovered his amazing legacy: long live Alan Turing.
This is the code I wrote to do the experiment:
library(httr)
library(jsonlite)
library(stringr)
library(xts)
library(highcharter)
library(AnomalyDetection)
library(imputeTS)
library(CausalImpact)
library(dplyr)

# Views last 365 days
(Sys.Date()-365) %>% str_replace_all("[[:punct:]]", "") %>% substr(1,8) -> date_ini
Sys.time() %>% str_replace_all("[[:punct:]]", "") %>% substr(1,8) -> date_fin
url="https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Alan%20Turing/daily"
paste(url, date_ini, date_fin, sep="/") %>%
  GET %>%
  content("text") %>%
  fromJSON %>%
  .[[1]] -> wikistats

# To prepare dataset for highcharter
wikistats %>%
  mutate(day=str_sub(timestamp, start = 1, end = 8)) %>%
  mutate(day=as.POSIXct(day, format="%Y%m%d", tz="UTC")) -> wikistats

# Highcharts viz
rownames(wikistats)=wikistats$day
wikistats %>% select(views) %>% as.xts %>% hchart

# Anomaly detection
wikistats %>% select(day, views) -> tsdf
tsdf %>% AnomalyDetectionTs(max_anoms=0.01, direction='both', plot=TRUE)->res
res$plot

# Imputation of anomalies
tsdf[tsdf$day %in% as.POSIXct(res$anoms$timestamp, format="%Y-%m-%d", tz="UTC"),"views"]<-NA
ts(tsdf$views, frequency = 365) %>%
  na.interpolation() %>%
  xts(order.by=wikistats$day) -> tscleaned
tscleaned %>% hchart

# Causal Impact from September 28th
x=sum(index(tscleaned)<"2016-09-28 UTC")
impact <- CausalImpact(data = tscleaned %>% as.numeric,
                       pre.period = c(1,x),
                       post.period = c(x+1,length(tscleaned)),
                       model.args = list(niter = 5000, nseasons = 7),
                       alpha = 0.05)
plot(impact)
Interesting! These packages make the experiment so simple.
I don't know how many people use AnomalyDetection and CausalImpact, but it would be fun to build `hchart` support for these _objects_ 😀
Regards, Antonio!
Hi Joshua! Yes, they are very easy to use. I think both packages were made by people at Twitter. I think they are going to be successful. If you feel like building something with highcharter, I will definitely use it. All the best!
Great article! Simple and highly effective EDA.
The package authors are:
CausalImpact by Google
AnomalyDetection by Twitter.
Thank you!