Who is Alan Turing?

February 21, 2017The World We Live In, Time SeriesAlan Turing, anomaly detection, causal impact, R, Rstats, wikipedia@aschinchon

This government is committed to introducing posthumous pardons for people with certain historical sexual offence convictions who would be innocent of any crime now (British Government Spokesperson, September 2016)

Last September, the British government announced its intention to pursue what has become known as the Alan Turing law, offering exoneration to the tens of thousands of gay men convicted of historic charges. The law was finally unveiled on 20 October 2016.

This plot shows the daily views of the Alan Turing’s wikipedia page during the last 365 days:

There are three huge peaks in May 27th, July 30th, and October 29th that can be easily detected using AnomalyDetection package:

After substituting these anomalies by a simple linear imputation, it is clear that the time series has suffered a significant impact since the last days of September:

To estimate the amount of incremental views since September 28th (this is the date I have chosen as starting point) I use CausalImpact package:

Last plot shows the accumulated effect. After 141 days, there have been around 1 million of incremental views to the Alan Turing’s wikipedia page (more than 7.000 per day) and it does not seem ephemeral.

Alan Turing has won another battle, this time posthumous. And thanks to it, there is a lot of people that have discovered his amazing legacy: long life to Alan Turing.

This is the code I wrote to do the experiment:

library(httr)
library(jsonlite)
library(stringr)
library(xts)
library(highcharter)
library(AnomalyDetection)
library(imputeTS)
library(CausalImpact)
library(dplyr)

# Views last 365 days
(Sys.Date()-365) %>% str_replace_all("[[:punct:]]", "") %>% substr(1,8) -> date_ini
Sys.time()       %>% str_replace_all("[[:punct:]]", "") %>% substr(1,8) -> date_fin
url="https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Alan%20Turing/daily"
paste(url, date_ini, date_fin, sep="/") %>% 
  GET  %>% 
  content("text") %>% 
  fromJSON %>% 
  .[[1]] -> wikistats

# To prepare dataset for highcharter
wikistats %>% 
  mutate(day=str_sub(timestamp, start = 1, end = 8)) %>% 
  mutate(day=as.POSIXct(day, format="%Y%m%d", tz="UTC")) -> wikistats

# Highcharts viz
rownames(wikistats)=wikistats$day
wikistats %>% select(views) %>% as.xts  %>% hchart

# Anomaly detection
wikistats %>% select(day, views) -> tsdf
tsdf %>%  
  AnomalyDetectionTs(max_anoms=0.01, direction='both', plot=TRUE)->res
res$plot

# Imputation of anomalies
tsdf[tsdf$day %in% as.POSIXct(res$anoms$timestamp, format="%Y-%m-%d", tz="UTC"),"views"]<-NA 
ts(tsdf$views, frequency = 365) %>% 
  na.interpolation() %>% 
  xts(order.by=wikistats$day) -> tscleaned
tscleaned %>% hchart

# Causal Impact from September 28th
x=sum(index(tscleaned)<"2016-09-28 UTC")
impact <- CausalImpact(data = tscleaned %>% as.numeric, 
                       pre.period = c(1,x),
                       post.period = c(x+1,length(tscleaned)), 
                       model.args = list(niter = 5000, nseasons = 7),
                       alpha = 0.05)
plot(impact)

How to Find Equidistant Coordinates Between Two Locations on Earth

February 13, 2017Maps, The World We Live InEarth, leaflet, maps, R, Rstats@aschinchon

Here’s to the ones who dream
foolish, as they may seem
(The Fools Who Dream, ‘La La Land’ OST)

One of the key points of The Meeting Point Locator is to obtain an orthogonal great circle to the bearing defined by any two given locations on Earth. A great circle is the intersection of the sphere and a plane that passes through the center point of the sphere. In other words, a great circle is a false meridian. The orthogonal great circle to the direction defined by any two given locations is the one which passes by all equidistant points to both of them (at least this is what I call orthogonal great circle). This was my first approach to obtain it:

Get the midpoint between the initial locations, let’s call it p₁
Calculate the direction (bearing angle) between the initial locations, let’s call it α
Obtain a very close point to p₁ (only 1 meter away) with bearing α+90, let’s call it p₂
Calculate the great circle which passes through p₁ and p₂

This is the code I used in this first approach:

library(dplyr)
library(ggmap)
library(geosphere)
library(leaflet)
library(ggplot2)
library(scales)
library(extrafont)
windowsFonts(Garamond=windowsFont("Garamond"))

#Starting places
place1="Madrid, Spain"
place2="Toledo, Spain"

# Call to Google Maps API to obtain coordinates of Starting places
p1=geocode(place1, output = "latlon")
p2=geocode(place2, output = "latlon")

#Midpoint of p1 and p2
mid=midPoint(p1, p2)

#Direction between p1 and p2
bea=bearingRhumb(p1, p2)

# Great circle between midpoint and 1-meter separated point with bearing bea+90
points=greatCircle(destPoint(p=mid, b=bea+90, d=1), mid, n=100)

# Arrange the points dependning on the distance to the input locations
data.frame(dist2p1=apply(points, 1, function (x) distGeo(p1, x)),
           dist2p2=apply(points, 1, function (x) distGeo(p2, x))) %>% 
  cbind(points) -> points

opts=theme(
  panel.background = element_rect(fill="gray90"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="white", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text = element_text(colour="gray25", size=6, family = "Garamond"),
  axis.title = element_text(size=10, colour="gray10", family = "Garamond"),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 14, colour="gray10", family = "Garamond"),
  plot.subtitle = element_text(size = 10, colour="gray20", family = "Garamond"))

ggplot(points, aes(x=dist2p1, y=dist2p2), guide=FALSE)+
  geom_abline(intercept = 0, slope = 1, colour = "red", alpha=.25)+
  geom_point(colour="blue", fill="blue", shape=21, alpha=.8, size=1)+
  scale_x_continuous(label=scientific_format())+
  scale_y_continuous(label=scientific_format())+
  labs(title=paste(place1,"and" ,place2, sep=" "),
       subtitle="Equidistant points (2nd approach)",
       x=paste("Distance to" ,place1, "(Km)", sep=" "),
       y=paste("Distance to" ,place2, "(Km)", sep=" "))+opts

#Map
points %>% 
  leaflet() %>% 
  addTiles(urlTemplate = "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") %>% 
  addCircleMarkers(
    lng=points$lon, lat=points$lat,
    radius = 6,
    color = "blue",
    stroke = FALSE, fillOpacity = 0.5) %>% 
  addCircleMarkers(
    lng=c(p1$lon, p2$lon), lat=c(p1$lat, p2$lat),
    radius = 6,
    color = "red",
    stroke = FALSE, fillOpacity = 0.5)

I was pretty sure that all points of this last great circle must be equidistant to the initial locations but I was wrong. When the starting points are enough close, everything goes well. This is an example with Madrid and Toledo (separated only by 67 kilometers) as starting points. The following plot shows the distance to Madrid and Toledo of 100 points on the great circle obtained as I described before:

This map shows also these 100 points (in blue) as well as the starting ones (in red):

Quite convincent. But this is what happens when I choose Tokyo and New York (10.873 kms. away) as the starting points:

And the map:

To be honest, I do not know why this happens but, based on the success obtained using close starting points, the final solution was simple: bring the starting points closer preserving the original midpoint. This was my second (and definitive) try:

And the map:

Mission accomplished. The final code:

library(dplyr)
library(ggmap)
library(geosphere)
library(leaflet)
library(ggplot2)
library(scales)
library(extrafont)
windowsFonts(Garamond=windowsFont("Garamond"))

# Starting places
place1="Tokyo, Japan"
place2="New York, USA"

# Call to Google Maps API to obtain coordinates of Starting places
p1=geocode(place1, output = "latlon")
p2=geocode(place2, output = "latlon")

# Midpoint of p1 and p2
mid=midPoint(p1, p2)
# Distance between p1 and p2
dist=distGeo(p1, p2)
# A simple piece of code to bring the starting points closer preserving the original midpoint
 x=p1
 y=p2
 while(dist>1000000)
 {
   x=midPoint(mid, x)
   y=midPoint(mid, y)
   dist=distGeo(x, y)
}
# Direction between resulting (close) points
bea=bearingRhumb(x, y)
# Great circle between midpoint and 1-meter separated point with bearing bea+90
points=greatCircle(destPoint(p=mid, b=bea+90, d=1), mid, n=100)

# Arrange the points dependning on the distance to the input locations
data.frame(dist2p1=apply(points, 1, function (x) distGeo(p1, x)),
           dist2p2=apply(points, 1, function (x) distGeo(p2, x))) %>% 
  cbind(points) -> points

opts=theme(
  panel.background = element_rect(fill="gray90"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="white", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text = element_text(colour="gray25", size=6, family = "Garamond"),
  axis.title = element_text(size=10, colour="gray10", family = "Garamond"),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 14, colour="gray10", family = "Garamond"),
  plot.subtitle = element_text(size = 10, colour="gray20", family = "Garamond"))

ggplot(points, aes(x=dist2p1, y=dist2p2), guide=FALSE)+
  geom_abline(intercept = 0, slope = 1, colour = "red", alpha=.25)+
  geom_point(colour="blue", fill="blue", shape=21, alpha=.8, size=1)+
  scale_x_continuous(label=scientific_format())+
  scale_y_continuous(label=scientific_format())+
  labs(title=paste(place1,"and" ,place2, sep=" "),
       subtitle="Equidistant points (2nd approach)",
       x=paste("Distance to" ,place1, "(Km)", sep=" "),
       y=paste("Distance to" ,place2, "(Km)", sep=" "))+opts

points %>% 
  leaflet() %>% 
  addTiles(urlTemplate = "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png") %>% 
  addCircleMarkers(
    lng=points$lon, lat=points$lat,
    radius = 6,
    color = "blue",
    stroke = FALSE, fillOpacity = 0.5) %>% 
  addCircleMarkers(
    lng=c(p1$lon, p2$lon), lat=c(p1$lat, p2$lat),
    radius = 6,
    color = "red",
    stroke = FALSE, fillOpacity = 0.5)

The Ex Libris Generator

February 1, 2017Drawings, Shinyexlibris, ggplot2, Harmonograph, R, Rstats@aschinchon

Go ahead stomp your feet on the floorboards
Clap your hands if that’s really what you came here for
(Heaven, The Milk Carton Kids)

Inspired by curves created by the harmonograph, I have done a Shiny app to generate random images that you can personalize and use as an Exlibris. You can try the App here. For me, an exlibris (also known as bookplates) can be a nice, original and useful present for book-lovers. This is an example:
exlibris8
More examples:

I always put the code at the end of my posts. Since I always have doubts about how many people are interested in what I do, today will be different. I will share the code with those who ask it to me in any of the following ways:

Sending me a direct message on Twitter
Droping me an email

Cheers!

The Meeting Point Locator

January 16, 2017Going Bananas, Maps, The World We Live Inleaflet, R, Rstats@aschinchon

Hi Hillary, It’s Donald, would you like to have a beer with me in La Cabra Brewing, in Berwyn, Pensilvania? (Hypothetical utilization of The Meeting Point Locator)

Finding a place to have a drink with someone may become a difficult task. It is quite common that one of them does not want to move to the other’s territory. I am sure you have faced to this situation many times. With The Meeting Point Locator this will be no longer an issue, since it will give you a list of equidistant bars and coffees to any two given locations. Let’s see an example.

I do not know if Hillary Clinton and Donald Trump have met each other after the recent elections in United States, but the will probably do. Let’s suppose Hillary doesn’t want to go to The White House and that Donald prefers another place instead Hillary’s home. No problem at all. According to this, Hillary lives in Chappaqua, New York and Donald will live in The White House, Washington (although he supposedly won’t do full time as he announced recently). These two locations are the only input that The Meeting Point Locator needs to purpose equidistant places where having a drink. This is how it works:

Generates a number of coordinates on the great circle which passes through the midpoint of the original locations and is orthogonal to the rhumb defined by them; the number of points depends on the distance between the original locations.
Arranges these coordinates according to the distance to the original locations, from the nearest to the most distant.
Depending also on the distance of the original locations, defines a radius to search around each point generated on the great circle (once calculated, this radius is constant for all searches).
Starting from the nearest point, looks for a number of places (20 by default) to have a drink using the radius calculated previously. To do this, it calls to the Google Places API. Once the number of locations is reached, the proccess stops.

This map shows the places purposed for Hillary and Donald (blue points) as well as the original locations (red ones). You can make zoom in for details:

These are the 20 closest places to both of them:

listHillaryTrumpDT

Some other examples of the utility of The Meeting Point Locator:

Pau Gasol (who lives in San Antonio, Texas) and Marc Gasol (in Memphis, Tennessee) can meet in The Draft Sports Bar, in Leesville (Louisiana) to have a beer while watching a NBA match. It is 537 kilometers far from both of them.
Bob Dylan (who lives in Malibu, California) and The Swedish Academy (placed in Stockholm, Sweden) can smooth things over drinking a caipirinha in Bar São João, in Tremedal (Brasil), only 9.810 kilometers far from both of them.
Spiderman (placed in New York City) and Doraemon (in Tokio, Japan) can meet in Andreyevskaya, in Stroitel (Russia) to have a have a hot drink. Since they are superheroes, they will cover the 9.810 kilometers of separation in no time at all.

I faced with two challenges to do this experiment: how to generate the orthogonal great circle from two given locations and how to define radius and number of points over this circle to do searchings. I will try to explain in depth both things in the future in another post.

You will find the code below. To make it work, do not forget to get your own key for Google Places API Web Service here. I hope this tool will be helpful for someone; if yes, do not hesitate to tell it to me.

library(httr)
library(jsonlite)
library(dplyr)
library(ggmap)
library(geosphere)
library(DT)
library(leaflet)

# Write both addresses here (input)
place1="Chappaqua, New York, United States of America"
place2="The White House, Washington DC, United States of America"

# Call to Google Maps API to obtain coordinates of previous addresses
p1=geocode(place1, output = "latlon")
p2=geocode(place2, output = "latlon")

# To do searchings I need a radius
radius=ifelse(distGeo(p1, p2)>1000000, 10000,
              ifelse(distGeo(p1, p2)>100000, 2500, 1000))

# And a number of points
npoints=ifelse(distGeo(p1, p2)>1000000, 2002,
               ifelse(distGeo(p1, p2)>100000, 7991, 19744))

# Place here the Google Places API Key
key="PLACE_YOUR_OWN_KEY_HERE"

# Build the url to look for bars and cafes with the previous key
url1="https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=lat,lon&radius="
url2="&types=cafe|bar&key="
url=paste0(url1,radius,url2,key)

# This is to obtain the great circle orthogonal to the rhumb defined by input locations
# and which passes over the midpoint. I will explain this step in the future
mid=midPoint(p1, p2)
dist=distGeo(p1, p2)
x=p1
y=p2
while(dist>1000000)
{
  x=midPoint(mid, x)
  y=midPoint(mid, y)
  dist=distGeo(x, y)
}

bea=bearingRhumb(x, y)
points=greatCircle(destPoint(p=mid, b=bea+90, d=1), mid, n=npoints)

# Arrange the points dependning on the distance to the input locations
data.frame(dist2p1=apply(points, 1, function (x) distGeo(p1, x)),
           dist2p2=apply(points, 1, function (x) distGeo(p2, x))) %>% 
  mutate(order=apply(., 1, function(x) {max(x)})) %>% 
  cbind(points) %>% 
  arrange(order) -> points

# Start searchings
nlocs=0 # locations counter (by default stops when 20 is reached)
niter=1 # iterations counter (if greater than number of points on the great circle, stops)
results=data.frame()
while(!(nlocs>=20 | niter>npoints)) {
  print(niter)
  url %>% 
    gsub("lat", points[niter, 'lat'], .) %>% 
    gsub("lon", points[niter, 'lon'], .) %>% 
    GET %>% 
    content("text") %>% 
    fromJSON -> retrieve
  
  df=data.frame(lat=retrieve$results$geometry$location$lat,
                lng=retrieve$results$geometry$location$lng,
                name=retrieve$results$name, 
                address=retrieve$results$vicinity)
  results %>% rbind(df)->results
  
  nlocs=nlocs+nrow(df)
  niter=niter+1 
}

# I prepare results to do a Data Table
data.frame(dist2p1=apply(results, 1, function (x) round(distGeo(p1, c(as.numeric(x[2]), as.numeric(x[1])))/1000, digits=1)),
           dist2p2=apply(results, 1, function (x) round(distGeo(p2, c(as.numeric(x[2]), as.numeric(x[1])))/1000, digits=1))) %>% 
  mutate(mx=apply(., 1, function(x) {max(x)})) %>% 
  cbind(results) %>% 
  arrange(mx) %>% 
  mutate(rank=row_number()) %>% 
  select(-mx)-> resultsDT

# This is the Data table
datatable(resultsDT, 
          class = 'cell-border stripe',
          rownames = FALSE,
          options = list(pageLength = 5),
          colnames = c('Distance to A (Km)', 
                       'Distance to B (Km)', 
                       'Latitude', 
                       'Longitude',
                       'Name', 
                       'Address', 
                       'Rank'))

# Map with the locations using leaflet
resultsDT %>% 
  leaflet() %>% 
  addTiles() %>% 
  addCircleMarkers(
    lng=resultsDT$lng, lat=resultsDT$lat,
    radius = 8,
    color = "blue",
    stroke = FALSE, fillOpacity = 0.5,
    popup=paste(paste0("<b>", resultsDT$name, "</b>"), resultsDT$address, sep="
")
  ) %>% 
  addCircleMarkers(
    lng=p1$lon, lat=p1$lat,
    radius = 10,
    color = "red",
    stroke = FALSE, fillOpacity = 0.5,
    popup=paste("<b>Place 1</b>", place1, sep="
")
  )%>% 
  addCircleMarkers(
    lng=p2$lon, lat=p2$lat,
    radius = 10,
    color = "red",
    stroke = FALSE, fillOpacity = 0.5,
    popup=paste("<b>Place 2</b>", place2, sep="
")
  )

Three Shiny Apps to Celebrate the Beauty of Maths

December 12, 2016ShinyCoaster, ggplot2, Harmonograph, R, Rstats, Shiny, Wool@aschinchon

Mathematics knows no races or geographic boundaries; for mathematics, the cultural world is one country (David Hilbert)

One of the best decisions I took this year related with this blog was to move it to my own self-hosted domain using WordPress.org. It allows to me, for example, to embed dynamic JavaScript visualizations like this one. Another thing I can do now is to upload my Shiny Apps to share them with my readers. In this post I have gathered three Apps I made some time ago; you can play with them as well as get the code I wrote for each one:

The Harmonograph: This App simulates harmonograph drawings. An harmonograph is a mechanism which draws trajectories by means of two pendulums: one moves a pencil and the other one moves a platform with a piece of paper on it. Click here to try it.
Shiny Wool Skeins: This App, inspired by this post, creates a plot consisting of chords inside a circle . You can change colors as well as the number and quality of the chords. Click here to try it.
The Coaster Maker: With this App you can create your own coasters using hypocicloids. Click here to try it.

I want to thank to my friend Jorge, without whom I would not have been able to make Shiny work in my server.

The Somnambulist and Pi

December 1, 2016Collaborations, Drawingscirclize, music, PI, R, Rstats, The Somnambulist@aschinchon

How wary we are of something warm and unborn. Something calmly by zero will divide (Unbegotten, The Somnambulist)

Some time ago, I assumed the mission to draw a plot for the cover of the new album of The Somnambulist, a music band from Berlin. They wanted a circlization of Pi, which is a graphic where numbers are represented in a circular layout. The idea is connecting each digit of Pi to its successive digit with links to the position of the numerically corresponding external sectors. I used a color palette composed by 10 nuances of the visible spectrum as a tribute for Planck, as Marco (the vocalist) requested me. After a number of attempts:

somnambulist

The album is named Unbegotten, a german word which means archaic. As Marco told me, in theology it also means kind of eternal because of being never born and so never dying. I like how π is integrated into the title to substitute the string “tt” in the middle. Pi is also eternal so the association is genuine.

The music of The Somnambulist is intense, dark and powerful and is waiting for you here to listen it. My favorite song is the one that gives name to the album.

If you want to know more about circlizong numbers, you can visit this post, where you also can see the code I used as starting point to do this plot.

Bayesian Blood

November 17, 2016Curiosities, Highcharts, The World We Live InBayes, Blood, highcharts, OpenData, R, Rstats@aschinchon

The fourth, the fifth, the minor fall and the major lift (Hallelujah, Leonard Cohen)

Next problem is extracted from MacKay’s Information Theory, Inference and Learning Algorithms:

Two people have left traces of their own blood at the scene of a crime. A suspect, Oliver, is tested and found to have type ‘O’ blood. The blood groups of the two traces are found to be of type ‘O’ (a common type in the local population, having frequency 60%) and of type ‘AB’ (a rare type, with frequency 1%). Do these data give evidence in favor of the proposition that Oliver was one of the people who left blood at the scene?

To answer the question, let’s first remember the probability form of Bayes theorem:

$p(H|D)=\dfrac{p(H)p(D|H)}{p(D)}$

where:

p(H) is the probability of the hypothesis H before we see the data, called the prior
p(H|D) is the probablity of the hyothesis after we see the data, called the posterior
p(D|H) is the probability of the data under the hypothesis, called the likelihood
p(D)is the probability of the data under any hypothesis, called the normalizing constant

If we have two hypothesis, A and B, we can write the ratio of posterior probabilities like this:

$\dfrac{p(A|D)}{p(B|D)}=\dfrac{p(A)p(D|A)}{p(B)p(D|B)}$

If p(A)=1-p(B) (what means that A and B are mutually exclusive and collective exhaustive), then we can rewrite the ratio of the priors and the ratio of the posteriors as odds. Writing o(A) for odds in favor of A, we get the odds form of Bayes theorem:

$o(A|D)=o(A)\dfrac{p(D|A)}{p(D|B)}$

Dividing through by o(A) we have:

$\dfrac{o(A|D)}{o(A)}=\dfrac{p(D|A)}{p(D|B)}$

The term on the left is the ratio of the posteriors and prior odds. The term on the right is the likelihood ratio, also called the Bayes factor. If it is greater than 1, that means that the data were more likely under A than under B. And since the odds ratio is also greater than 1, that means that the odds are greater, in light of the data, than they were before. If the Bayes factor is less than 1, that means the data were less likely under A than under B, so th odds in favor of A go down.

Let’s go back to our initial problem. If Oliver left his blood at the crime scene, the probability of the data is just the probability that a random member of the population has type ‘AB’ blood, which is 1%. If Oliver did not leave blood at the scene, what is the the chance of finding two people, one with type ‘O’ and one with type ‘AB’? There are two ways it might happen: the first person we choose might have type ‘O’ and the second ‘AB’, or the other way around. So the probability in this case is 2(0.6)(0.01)=1.2%. Dividing probabilities of both scenarios we obtain a Bayes factor of 0.83, and we conclude that the blood data is evidence against Oliver’s guilt.

Once I read this example, I decided to replicate it using real data of blood type distribution by country from here. After cleaning data, I have this nice data set to work with:

For each country, I get the most common blood type (the one which the suspect has) and the least common and replicate the previous calculations. For example, in the case of Spain, the most common type is ‘O+’ with 36% and the least one is ‘AB-‘ with 0.5%. The Bayes factor is 0.005/(2(0.36)(0.005))=1.39 so data support the hypothesis of guilt in this case. Next chart shows Bayes factor accross countries:

Just some comments:

Sometimes data consistent with a hypothesis are not necessarily in favor of the hypothesis
How different is the distribution of blood types between countries!
If you are a estonian ‘A+’ murderer, choose carefully your accomplice

This is the code of the experiment:

library(rvest)
library(dplyr)
library(stringr)
library(DT)
library(highcharter)

# Webscapring of the table with the distribution of blood types
url <- "http://www.rhesusnegative.net/themission/bloodtypefrequencies/"
blood <- url %>%
   read_html() %>%
   html_node(xpath='/html/body/center/table') %>%
   html_table(fill=TRUE)

# Some data cleansing
blood %>% slice(-c(66:68)) -> blood

blood[,-c(1:2)] %>% 
  sapply(gsub, pattern=",", replacement=".") %>% 
  as.data.frame %>% 
  sapply(gsub, pattern=".79.2", replacement=".79") %>% 
  as.data.frame-> blood[,-c(1:2)]

blood %>% 
  sapply(gsub, pattern="%|,", replacement="") %>% 
  as.data.frame -> blood

blood[,-1] = apply(blood[,-1], 2, function(x) as.numeric(as.character(x)))


blood[,-c(1:2)] %>% mutate_all(funs( . / 100)) -> blood[,-c(1:2)]

# And finally, we have a nice data set
datatable(blood, 
          rownames = FALSE,
          options = list(
          searching = FALSE,
          pageLength = 10)) %>% 
  formatPercentage(3:10, 2)

# Calculate the Bayes factor
blood %>% 
  mutate(factor=apply(blood[,-c(1,2)], 1, function(x) {min(x)/(2*min(x)*max(x))})) %>% 
  arrange(factor)-> blood

# Data Visualization
highchart() %>% 
     hc_chart(type = "column") %>% 
     hc_title(text = "Bayesian Blood") %>%
     hc_subtitle(text = "An experiment about the Bayes Factor") %>%  
     hc_xAxis(categories = blood$Country, 
             labels = list(rotation=-90, style = list(fontSize = "12px")))  %>% 
     hc_yAxis(plotBands = list(list(from = 0, to = 1, color = "rgba(255,215,0, 0.8)"))) %>% 
     hc_add_series(data = blood$factor,
                   color = "rgba(255, 0, 0, 0.5)",
                   name = "Bayes Factor")%>% 
  hc_yAxis(min=0.5) %>% 
  hc_tooltip(pointFormat = "{point.y:.2f}") %>% 
  hc_legend(enabled = FALSE) %>% 
  hc_exporting(enabled = TRUE) %>%
  hc_chart(zoomType = "xy")

Visualizing the Daily Variability of Bitcoin with Quandl and Highcharts

October 4, 2016HighchartsBitcoin, DataViz, highcharts, Quandl, R, Rstats@aschinchon

Lay your dreams, little darling, in a flower bed; let that sunshine in your hair (Where the skies are blue, The Lumineers)

I discovered this nice visualization some days ago. The author is also the creator of Highcharter, an incredible R wrapper for Highcharts javascript libray and its modules. I am a big fan of him.

Inspired by his radial plot, I did a visualization of the daily evolution of Daily Bitcoin exchange rate (BTC vs. EUR) on Localbtc. Data is sourced from here and I used Quandl to obtain the data frame. Quandl is a marketplace for financial and economic data delivered in modern formats for today’s analysts. There is a package called Quandl to interact directly with the Quandl API to download data in a number of formats usable in R. You only need to locate the data you want in the Quandl site. In my case data are here.

After loading data, I do the folowing steps:

Filtering data to obtain last 12 complete months
Create a new variable with the difference between closing and opening price of Bitcoin (in Euros)
Create a color variable to distinguish between positive and negative differences
Create the graph using Fivethirtyeight theme for highcharts

This is the result:

Apart of its appealing, I think is a good way to to have a quick overview of the evolution of a stock price. This is the code to do the experiment:

library(Quandl)
library(dplyr)
library(highcharter)
library(lubridate)
bitcoin=Quandl("BCHARTS/LOCALBTCEUR")
bitcoin %>% 
  arrange(Date) %>% 
  mutate(tmstmp = datetime_to_timestamp(Date)) -> bitcoin
last_date=max(bitcoin$Date)
if (day(last_date+1)==1) date_to=last_date else 
  date_to=ymd(paste(year(last_date), month(last_date),1, sep="-"))-1
date_from=ymd(paste(year(date_to)-1, month(date_to)+1,1, sep="-"))
bitcoin %>% filter(Date>=date_from, Date<=date_to) -> bitcoin
var_bitcoin <- bitcoin %>% 
  mutate(Variation = Close - Open,
         color = ifelse(Variation>=0, "green", "red"),
         y = Variation) %>% 
  select(x = tmstmp,
         y,
         variation = Variation,
         name = Date,
         color,
         open = Open,
         close = Close) %>% 
  list.parse3()
x <- c("Open", "Close", "Variation")
y <- sprintf("{point.%s}", tolower(x))
tltip <- tooltip_table(x, y)
hc <- highchart() %>% 
  hc_title(text = "Bitcoin Exchange Rate (BTC vs. EUR)") %>% 
  hc_subtitle(text = "Daily Variation on Localbtc. Last 12 months")%>% 
  hc_chart(
    type = "column",
    polar = TRUE) %>%
  hc_plotOptions(
    series = list(
      stacking = "normal",
      showInLegend = FALSE)) %>% 
  hc_xAxis(
    gridLineWidth = 0.5,
    type = "datetime",
    tickInterval = 30 * 24 * 3600 * 1000,
    labels = list(format = "{value: %b}")) %>% 
  hc_yAxis(showFirstLabel = FALSE) %>% 
  hc_add_series(data = var_bitcoin) %>% 
  hc_add_theme(hc_theme_538()) %>% 
  hc_tooltip(useHTML = TRUE,
    headerFormat = as.character(tags$small("{point.x:%d %B, %Y}")),
    pointFormat = tltip)
hc

Chaotic Galaxies

September 6, 2016Chaos, Drawingschaos, ggplot, R, Rstats@aschinchon

Tell me, which side of the earth does this nose come from? Ha! (ALF)

Reading about strange attractors I came across with this book, where I discovered a way to generate two dimensional chaotic maps. The generic equation is pretty simple:

$x_{n+1}= a_{1}+a_{2}x_{n}+a_{3}x_{n}^{2}+a_{4}x_{n}y_{n}+a_{5}y_{n}+a_{6}y_{n}^{2}$
$y_{n+1}= a_{7}+a_{8}x_{n}+a_{9}x_{n}^{2}+a_{10}x_{n}y_{n}+a_{11}y_{n}+a_{12}y_{n}^{2}$

I used it to generate these chaotic galaxies:

Changing the vector of parameters you can obtain other galaxies. Do you want to try?

library(ggplot2)
library(dplyr)
#Generic function
attractor = function(x, y, z)
{
  c(z[1]+z[2]*x+z[3]*x^2+ z[4]*x*y+ z[5]*y+ z[6]*y^2, 
    z[7]+z[8]*x+z[9]*x^2+z[10]*x*y+z[11]*y+z[12]*y^2)
}
#Function to iterate the generic function over the initial point c(0,0)
galaxy= function(iter, z)
{
  df=data.frame(x=0,y=0)
  for (i in 2:iter) df[i,]=attractor(df[i-1, 1], df[i-1, 2], z)
  df %>% rbind(data.frame(x=runif(iter/10, min(df$x), max(df$x)), 
                          y=runif(iter/10, min(df$y), max(df$y))))-> df
  return(df)
}
opt=theme(legend.position="none",
          panel.background = element_rect(fill="#00000c"),
          plot.background = element_rect(fill="#00000c"),
          panel.grid=element_blank(),
          axis.ticks=element_blank(),
          axis.title=element_blank(),
          axis.text=element_blank(),
          plot.margin=unit(c(-0.1,-0.1,-0.1,-0.1), "cm"))
#First galaxy
z1=c(1.0, -0.1, -0.2,  1.0,  0.3,  0.6,  0.0,  0.2, -0.6, -0.4, -0.6,  0.6)
galaxy1=galaxy(iter=2400, z=z1) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt
#Second galaxy
z2=c(-1.1, -1.0,  0.4, -1.2, -0.7,  0.0, -0.7,  0.9,  0.3,  1.1, -0.2,  0.4)
galaxy2=galaxy(iter=2400, z=z2) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt
#Third galaxy
z3=c(-0.3,  0.7,  0.7,  0.6,  0.0, -1.1,  0.2, -0.6, -0.1, -0.1,  0.4, -0.7)
galaxy3=galaxy(iter=2400, z=z3) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt
#Fourth galaxy
z4=c(-1.2, -0.6, -0.5,  0.1, -0.7,  0.2, -0.9,  0.9,  0.1, -0.3, -0.9,  0.3)
galaxy4=galaxy(iter=2400, z=z4) %>% ggplot(aes(x,y))+
  geom_point(shape= 8, size=jitter(12, factor=4), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=16, size= jitter(4, factor=2), color="#ffff99", alpha=jitter(.05, factor=2))+
  geom_point(shape=46, size= 0, color="#ffff00")+opt

The Breathtaking 1-Matrix

August 22, 2016Drawingscirclize, Matrix, R, Rstats, Silky@aschinchon

La luna sale a caminar siguiendo tus pupilas (Ojos color sol, Calle 13)

This is a 5×5 1-matrix:

$\begin{bmatrix} 1 &1 &1 &1 &1 \\ 1 &1 &1 &1 &1 \\ 1 &1 &1 &1 &1 \\ 1 &1 &1 &1 &1 \\ 1 &1 &1 &1 &1 \end{bmatrix}$

And this is a 20×20 1-matrix visualized:

Rplot001

Maybe in some other galaxy, aliens represent matrix in this way.

par(mar = c(1, 1, 1, 1), bg="violetred4")
circlize::chordDiagram(matrix(1, 20, 20),
                       col="white",
                       symmetric = TRUE,
                       transparency = 0.85,
                       annotationTrack = NULL)

Fronkonstin

Experiments in R