Tag Archives: Rstats

Drawing 10 Million Points With ggplot: Clifford Attractors

For me, mathematics cultivates a perpetual state of wonder about the nature of mind, the limits of thoughts, and our place in this vast cosmos (Clifford A. Pickover – The Math Book: From Pythagoras to the 57th Dimension, 250 Milestones in the History of Mathematics)

I am a big fan of Clifford Pickover and I often find inspiration in his books. Thanks to him, I discovered the harmonograph and Parrondo’s paradox, among many other mathematical treasures. Apart from being a great teacher, he also invented a family of strange attractors bearing his name. Clifford attractors are defined by these equations:

x_{n+1} = \sin(a\, y_{n}) + c\, \cos(a\, x_{n}) \\
y_{n+1} = \sin(b\, x_{n}) + d\, \cos(b\, y_{n})

There are infinite attractors, since a, b, c and d are parameters. Given four values (one for each parameter) and a starting point (x0, y0), the previous equations define the exact location of the point at step n, which depends only on its location at step n-1; an attractor can be thought of as the trajectory described by a particle. This plot shows the evolution of a particle starting at (x0, y0)=(0, 0) with parameters a=-1.24458046630025, b=-1.25191834103316, c=-1.81590817030519 and d=-1.90866735205054 over 10 million steps:
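
Before going to the fast version, here is a plain-R sketch of that iteration (my own illustration, not part of the original code); it is fine for a few thousand points but far too slow for 10 million, which is why the Rcpp version below is used:

n <- 5000
x <- y <- numeric(n)   # starting point is (x[1], y[1]) = (0, 0)
a <- -1.24458046630025; b <- -1.25191834103316
c <- -1.81590817030519; d <- -1.90866735205054
# Each point depends only on the previous one, so the loop must run sequentially
for (i in 2:n) {
  x[i] <- sin(a * y[i - 1]) + c * cos(a * x[i - 1])
  y[i] <- sin(b * x[i - 1]) + d * cos(b * y[i - 1])
}
plot(x, y, pch = ".")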

Changing parameters is really entertaining. Drawings have a sandy appearance:

From a technical point of view, the challenge is creating a data frame with all locations, since it must have 10 million rows and must be populated sequentially. A very fast way to do it is using the Rcpp package. To render the plot I use ggplot, which works quite well. Here is the code if you want to play with Clifford attractors:

library(Rcpp)
library(ggplot2)
library(dplyr)

opt = theme(legend.position  = "none",
            panel.background = element_rect(fill="white"),
            axis.ticks       = element_blank(),
            panel.grid       = element_blank(),
            axis.title       = element_blank(),
            axis.text        = element_blank())

cppFunction('DataFrame createTrajectory(int n, double x0, double y0, 
            double a, double b, double c, double d) {
            // create the columns
            NumericVector x(n);
            NumericVector y(n);
            x[0]=x0;
            y[0]=y0;
            for(int i = 1; i < n; ++i) {
            x[i] = sin(a*y[i-1])+c*cos(a*x[i-1]);
            y[i] = sin(b*x[i-1])+d*cos(b*y[i-1]);
            }
            // return a new data frame
            return DataFrame::create(_["x"]= x, _["y"]= y);
            }
            ')

a=-1.24458046630025
b=-1.25191834103316 
c=-1.81590817030519 
d=-1.90866735205054

df=createTrajectory(10000000, 0, 0, a, b, c, d)

png("Clifford.png", units="px", width=1600, height=1600, res=300)
ggplot(df, aes(x, y)) + geom_point(color="black", shape=46, alpha=.01) + opt
dev.off()

A Shiny App to Create Sentimental Tweets Based on Project Gutenberg Books

There was something about them that made me uneasy, some longing and at the same time some deadly fear – Dracula (Stoker, Bram)

Twitter is a very good source of inspiration. A few days ago I came across this:

The tweet refers to a presentation (in Spanish) available here, which is a very concise and well illustrated document about the state-of-the-art of text mining in R. I discovered there several libraries that I will try to use in the future. In this experiment I have used one of them: the syuzhet package. As can be read in the documentation:

this package extracts sentiment and sentiment-derived plot arcs from text using three sentiment dictionaries conveniently packaged for consumption by R users. Implemented dictionaries include syuzhet (default) developed in the Nebraska Literary Lab, afinn developed by Finn Arup Nielsen, bing developed by Minqing Hu and Bing Liu, and nrc developed by Mohammad, Saif M. and Turney, Peter D.

You can find a complete explanation of the package in its vignette. A very interesting application of these techniques is the Sentiment Graph of a book, which represents how sentiment changes over time. This is the Sentiment Graph of Romeo and Juliet, by William Shakespeare, taken from Project Alexandria:

The darkest sentiments can be seen at the end of the book, where the tragedy reaches its highest level. It is also nice to see how sentiments are cyclical. These graphs can be very useful for people who only want to read books with happy endings (my sister is one of those).
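
As a quick illustration of how such a graph can be built (a minimal sketch of mine, not the code behind the plot above; the Project Gutenberg id 1513 for Romeo and Juliet is an assumption), syuzhet can produce a smoothed sentiment arc in a few lines:

library(gutenbergr)
library(syuzhet)
library(ggplot2)

book      <- gutenberg_download(1513)  # Romeo and Juliet (id assumed)
sentences <- get_sentences(paste(book$text, collapse = " "))
sentiment <- get_sentiment(sentences, method = "syuzhet")
arc       <- get_dct_transform(sentiment, x_reverse_len = 100)  # smoothed arc along the book

ggplot(data.frame(time = 1:100, sentiment = arc), aes(time, sentiment)) +
  geom_line() +
  labs(x = "Narrative time (%)", y = "Smoothed sentiment")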

Inspired by this analysis, I have done another experiment in which I download a book from Project Gutenberg and measure the sentiment of all its sentences. Based on this measurement, I filter the top 5% of sentences (most positive or most negative sentiment) to build tweets. I have done a Shiny app where all these steps are explained. The app is available here.

From a technical point of view, I used the selectize JavaScript library to filter books in a flexible way. I also customized the appearance with a Bootstrap CSS theme from Bootswatch, as explained here.

This is the code of the experiment.

UI.R:

library(shiny)

fluidPage(theme = "bootstrap.css",

  titlePanel(h1("Sentimental Tweets from Project Gutenberg Books", align="center"),
             windowTitle="Tweets from Project Gutenberg"),
  sidebarLayout(
      sidebarPanel(

        selectInput(
          'book', 'Choose a book:', 
          multiple=FALSE,
          selectize = TRUE,
          choices=c("Enter some words of title or author" = "", gutenberg_works$searchstr)
          ),
        
        radioButtons(inputId = "sent",
             label = "Choose sentiment:",
             choices = c("Dark"="1", "Bright"="20"),
             selected="1",
             inline=TRUE),
        
        radioButtons(inputId = "meth",
                    label = "Choose a method to measure sentiment:",
                    choices = c("syuzhet", "bing", "afinn", "nrc"),
                    selected="syuzhet",
                    inline=TRUE),
        
        radioButtons(inputId = "char",
                     label = "Number of characters (max):",
                     choices = list("140", "280"),
                     inline=TRUE),

        checkboxInput(inputId = "auth",
                      label = "Add author",
                      value=FALSE),
        
        checkboxInput(inputId = "titl",
                      label = "Add title",
                      value=FALSE),
        
        checkboxInput(inputId = "post",
                      label="Add link to post (thanks!)",
                      value=TRUE),
        
        textInput(inputId = "adds",
                  label="Something else?",
                  placeholder="Maybe a #hastag?"),
        
        actionButton('do','Go!', 
                     class="btn btn-success action-button", 
                     css.class="btn btn-success")
  ),
  

  
  mainPanel(
     tags$br(),
     p("First of all, choose a book entering some keywords of its 
        title or author and doing dropdown navigation. Books are 
        downloaded from Project Gutenberg. You can browse the complete 
        catalog", tags$a(href = "https://www.gutenberg.org/catalog/", "here.")),

     p("After that, choose the sentiment of tweets you want to generate. 
        There are four possible methods than can return slightly different results. 
        All of them assess the sentiment of each word of a sentence and sum up the 
        result to give a scoring for it. The more negative is this scoring, 
        the", em("darker") ,"is the sentiment. The more positive, the ", em("brighter."), 
        " You can find a nice explanation of these techniques",
        tags$a(href = "http://www.matthewjockers.net/2017/01/12/resurrecting/", "here.")),
        
        p("Next parameters are easy: you can add the title and author of the book where 
          sentence is extracted as well as a link to my blog and any other string you want. 
          Clicking on the lower button you will get after some seconds a tweet below. 
          Click as many times you want until you like the result."),
     
     p("Finally, copy, paste and tweet. ",strong("Enjoy it!")),
     tags$br(),
     tags$blockquote(textOutput("tweet1")),
     tags$br()

)))

Server.R:

library(shiny)

function(input, output) {
  
  values <- reactiveValues(default = 0)
  
  observeEvent(input$do,{
    values$default <- 1
  })

  book <- eventReactive(input$do, {
    GetTweet(input$book, input$meth, input$sent, input$char,
             input$auth, input$titl, input$post, input$adds)
  })
  
  output$tweet1 <- renderText({
    if(values$default == 0){
      "Your tweet will appear here ..."
    }
    else{
      book()
    }
  })
}

Global.R:

library(gutenbergr)
library(dplyr)
library(stringr)
library(syuzhet)

x <- tempdir()

# Read the Project Gutenberg catalog and filter English works. I also create a column
# with title and author to make searches easier
gutenberg_metadata %>%
  filter(has_text, language=="en", gutenberg_id>0, !is.na(author)) %>%
  mutate(searchstr=ifelse(is.na(author), title, paste(title, author, sep= " - "))) %>%
  mutate(searchstr=str_replace_all(searchstr, "[\r\n]" , "")) %>%
  group_by(searchstr) %>%
  summarize(gutenberg_id=min(gutenberg_id)) %>%
  ungroup() %>%
  na.omit() %>%
  filter(str_length(searchstr)<100)-> gutenberg_works

# This function generates a tweet according to the UI settings (book, method, sentiment and
# number of characters). It also appends some optional strings at the end
GetTweet = function (string, method, sentim, characters,
                     author, title, link, hastag)
 {
  # Obtain gutenberg_id from book 
  gutenberg_works %>%
     filter(searchstr == string) %>%
     select(gutenberg_id) %>% .$gutenberg_id -> result
  
  # Download text, divide into sentences and score sentiment. Save results to do it once and
  # optimize performance
  if(!file.exists(paste0(x,"/","book",result,"_",method,".RDS")))
  {
    book=gutenberg_download(result)
    book[,2] %>% 
      as.data.frame() %>% 
      .$text %>% 
      paste(collapse=" ") -> text
    
    sentences_v <- get_sentences(text)
    sentiment_v <- get_sentiment(sentences_v, method=method)
    
    data.frame(sentence=sentences_v, sentiment=sentiment_v) %>% 
      mutate(length=str_length(sentence)) -> results
    saveRDS(results, paste0(x,"/","book",result,"_",method,".RDS"))
  }
   
  results=readRDS(paste0(x,"/","book",result,"_",method,".RDS"))
  book_info=gutenberg_metadata %>% filter(gutenberg_id==result)
  
  # Paste optional strings to append at the end
  post=""
  if (title)  post=paste("-", book_info[,"title"], post, sep=" ")
  if (author) post=paste0(post, " (", str_trim(book_info[,"author"]), ")")
  if (link)   post=paste(post, "https://wp.me/p7VZWY-16S", sep=" ")
  post=paste(post, hastag, sep=" ")
  length_post=nchar(post)

  # Calculate 5% quantiles
  results %>% 
    filter(length<=(as.numeric(characters)-length_post)) %>%
     mutate(sentiment=jitter(sentiment)) %>% 
     mutate(group = cut(sentiment, 
                        include.lowest = FALSE,
                        labels = FALSE,
                        breaks = quantile(sentiment, probs = seq(0, 1, 0.05)))) -> results
   
  # Obtain a sample sentence according to sentiment and append the optional strings to create the tweet
  results %>% 
     filter(group==as.numeric(sentim)) %>% 
     sample_n(1) %>% 
     select(sentence) %>% 
     .$sentence %>% 
     as.character() %>% 
     str_replace_all("[.]", "") %>% 
    paste(post, sep=" ") -> tweet
  
  return(tweet)

 }

Visualizing the Spanish Contribution to The Metropolitan Museum of Art

Well I walk upon the river like it’s easier than land
(Love is All, The Tallest Man on Earth)


The Metropolitan Museum of Art provides here a dataset with information on more than 450.000 artworks in its collection. You can do anything you want with these data: there are no restrictions on use. Each record contains information about the author, title, type of work, dimensions, date, culture and geography of a particular piece.

I can imagine a bunch of things to do with these data but, since I am a big fan of highcharter, I have done a treemap, which is an artistic (as well as efficient) way to visualize hierarchical data. A treemap is useful to visualize frequencies. It can handle levels, allowing you to navigate into any category for more detail. Here you can find a good example of a treemap.

To read the data I use the fread function from the data.table package. I also use this package to do some data wrangling operations on the data set. After that, I filter it looking for the word SPANISH in the columns Artist Nationality and Culture, and for the word SPAIN in the column Country. For me, any piece created by a Spanish artist (like this one), coming from Spanish culture (like this one) or from Spain (like this one) is Spanish (this is my very own definition and may not match any academic one). Once this is done, it is easy to extract some interesting figures:

  • There are 5.294 Spanish pieces in The Met, which means 1,16% of the collection
  • This percentage varies significantly between departments: it rises to 9,01% in The Cloisters and to 4,83% in The Robert Lehman Collection; on the other hand, it falls to 0,52% in The Libraries and to 0,24% in Photographs.
  • The Met is home to 1.895 highlights and 44 of them (2,32%) are Spanish; this means Spanish art is twice as prominent as could be expected (remember it represents 1,16% of the entire collection)

My treemap represents the distribution of Spanish artworks by department (column Department) and type of work (column Classification). There are two important things to know before doing a treemap with highcharter:

  • You have to use the treemap function from the treemap package to create a list from your data frame that will serve as input for the hctreemap function
  • hctreemap fails if some category name is the same as any of its subcategories. To avoid this, make sure that all names are distinct.

This is the treemap:

Here you can see a full size version of it.

Several things can be seen at a glance: most of the pieces belong to drawings and prints and to European sculpture and decorative arts (specifically, prints and textiles), there is also a big number of costumes, and arms and armor is a very fragmented department… I think the treemap is a good way to see what kind of works The Met owns.

My favorite Spanish piece in The Met is the stunning Portrait of Juan de Pareja by Velázquez, which illustrates this post: how nice it would be to see it next to El Primo in El Museo del Prado!

Feel free to use my code to do your own experiments:

library(data.table)
library(dplyr)
library(stringr)
library(highcharter)
library(treemap)

file="MetObjects.csv"
# Download data
if (!file.exists(file)) download.file(paste0("https://media.githubusercontent.com/media/metmuseum/openaccess/master/", file), 
                                      destfile=file,
                                      mode='wb')
# Read data
data=fread(file, sep=",", encoding="UTF-8")

# Modify column names to remove blanks
colnames(data)=gsub(" ", ".", colnames(data))

# Clean columns to prepare for searching
data[,`:=`(Artist.Nationality_aux=toupper(Artist.Nationality) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Culture_aux=toupper(Culture) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Country_aux=toupper(Country) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'))]

# Look for Spanish artworks
data[Artist.Nationality_aux %like% "SPANISH" | 
       Culture_aux %like% "SPANISH" | 
       Country_aux %like% "SPAIN"] -> data_spain

# Count artworks by Department and Classification
data_spain %>% 
  mutate(Classification=ifelse(Classification=='', "miscellaneous", Classification)) %>% 
  mutate(Department=tolower(Department),
         Classification1=str_match(Classification, "(\\w+)(-|,|\\|)")[,2],
         Classification=ifelse(!is.na(Classification1), 
                               tolower(Classification1), 
                               tolower(Classification))) %>% 
  group_by(Department, Classification) %>% 
  summarize(Objects=n()) %>% 
  ungroup %>% 
  mutate(Classification=ifelse(Department==Classification, paste0(Classification, "#"), 
                               Classification)) %>% 
  as.data.frame() -> dfspain

# Do treemap without drawing
tm_dfspain <- treemap(dfspain, index = c("Department", "Classification"),
                      draw=F,
                      vSize = "Objects", 
                      vColor = "Objects",
                      type = "index")

# Do highcharter treemap 
hctreemap(
  tm_dfspain,
  allowDrillToNode = TRUE,
  allowPointSelect = T,
  levelIsConstant = F,
  levels = list(
    list(
      level = 1,
      dataLabels = list (enabled = T, color = '#f7f5ed', style = list("fontSize" = "1em")),
      borderWidth = 1
    ),
    list(
      level = 2,
      dataLabels = list (enabled = F,  align = 'right', verticalAlign = 'top', 
                         style = list("textShadow" = F, "fontWeight" = 'light', "fontSize" = "1em")),
      borderWidth = 0.7
    ) 
  )) %>% 
  hc_title(text = "Spanish Artworks in The Met") %>% 
  hc_subtitle(text = "Distribution by Department") -> plot

plot

The Cycling Accident Map of Madrid City

Far away, this ship has taken me far away (Starlight, Muse)

Madrid City has an Open Data platform where around 300 data sets on a number of topics can be found. One of these sets is the one I used for this experiment. It contains information about cycling accidents that happened in the city from January to July 2017. I have done a map to locate where the accidents took place. This experiment shows how easy R makes it to create professional maps with Leaflet (in this case I use Carto basemaps).

To locate the accidents, the data set only contains the address where they happened, so the first thing I did was to obtain their geographical coordinates using the geocode function from the ggmap package. There were 431 accidents during the first 7 months of 2017 (such a big number!) and I got coordinates for 407 of them, so I can locate 94% of the accidents.

Obviously, the number of accidents in a place depends on how many cyclists ride there as well as on its infrastructure. None of these things can be seen in the map: it only shows the number of accidents.

The categorization of accidents is:

  • Double collision (Colisión doble): Traffic accident occurred between two moving vehicles.
  • Multiple collision (Colisión múltiple): Traffic accident occurred between more than two moving vehicles.
  • Fixed object collision (Choque con objeto fijo): Accident occurred between a moving vehicle with a driver and an immovable object occupying the road or a separated area of it, whether a parked vehicle, a tree, a street lamp, etc.
  • Pedestrian hit (Atropello): Accident occurred between a vehicle and a pedestrian occupying the road or travelling on sidewalks, refuge islands, walkways or other areas of the public road not intended for vehicle traffic.
  • Overturn (Vuelco): Accident suffered by a vehicle with more than two wheels which by some circumstance loses contact with the road and ends supported on one side or on its roof.
  • Motorcycle fall (Caída motocicleta): Accident suffered by a motorcycle, which at some moment loses balance, because of the driver or due to the conditions of the road.
  • Moped fall (Caída ciclomotor): Accident suffered by a moped, which at some moment loses balance, because of the driver or due to the conditions of the road.
  • Bicycle fall (Caída bicicleta): Accident suffered by a bicycle, which at some moment loses balance, because of the driver or due to the conditions of the road.

These categories are redundant (e.g. Double and Multiple collision), difficult to understand (e.g. Overturn) or both things at the same time (e.g. Motorcycle fall and Moped fall). This categorization also ignores the personal injuries caused by the accident.

With all these things in mind, this is the map:

Here is a full-screen version of the map.

My suggestions to the city council of Madrid are:

  1. Add geographical coordinates to the data (I guess many analyses will need them)
  2. Rethink the categorization to make it clearer and more informative
  3. Add more cycling data sets to the platform (details of bikeways, traffic…) to understand accidents better
  4. Attending just to the number of accidents, put the focus around Parque del Retiro, especially on its western surroundings, from Plaza de Cibeles to Plaza de Carlos V: more warning signs, more (or better) bikeways…

I add the code below to update the map (if someone asks me to, I can do it myself regularly):

library(dplyr)
library(stringr)
library(ggmap)
library(leaflet)
# First, getting the data (the file name is assumed from the destination file used below)
file="300110-0-accidentes-bicicleta.csv"
download.file(paste0("http://datos.madrid.es/egob/catalogo/", file), 
              destfile=file)

data=read.csv("300110-0-accidentes-bicicleta.csv", sep=";", skip=1)

# Prepare data for geolocation
data %>% 
  mutate(direccion=paste(str_trim(Lugar), str_trim(Numero), "MADRID, SPAIN", sep=", ") %>% 
           str_replace("NA, ", "") %>% 
           str_replace(" - ", " CON ")) -> data

# Geolocation (takes some time ...)
coords=c()
for (i in 1:nrow(data)) 
{
  coords %>% rbind(geocode(data[i,"direccion"])) -> coords
  Sys.sleep(0.5)
}
  
# Save data, just in case
data %>% cbind(coords) %>% saveRDS(file="bicicletas.RDS")

data=readRDS(file="bicicletas.RDS")

# Remove non-successful geolocations
data %>% 
  filter(!is.na(lon)) %>% 
  droplevels()-> data

# Build the date-time field and the popup text for each accident
data %>% mutate(Fecha=paste0(as.Date(data$Fecha, "%d/%m/%Y"), " ", TRAMO.HORARIO),
                popup=paste0("<b>Dónde:</b>",
                             direccion,
                             "<b>Cuándo:</b>",
                             Fecha,
                             "<b>Qué pasó:</b>",
                             Tipo.Accidente)) -> data

# Do the map
data %>% split(data$Tipo.Accidente) -> data.df

l <- leaflet() %>% addProviderTiles(providers$CartoDB.Positron)

names(data.df) %>%
  purrr::walk( function(df) {
    l <<- l %>%
      addCircleMarkers(data=data.df[[df]],
                 lng=~lon, lat=~lat,
                 popup=~popup,
                 color="red",
                 stroke=FALSE,
                 fillOpacity = 0.8,
                 group = df,
                 clusterOptions = markerClusterOptions(removeOutsideVisibleBounds = F))
  })

l %>%
  addLayersControl(
    overlayGroups = names(data.df),
    options = layersControlOptions(collapsed = FALSE)
  )

Plants

Blue dragonflies dart to and fro
I tie my life to your balloon and let it go
(Warm Foothills, Alt-J)

In my last post I did some drawings based on L-Systems. These drawings are done sequentially. At any step, the state of the drawing can be described by the position (coordinates) and the orientation of the pencil. In that case I only used two kinds of operators: drawing a straight line and turning a constant angle. Today I used two more symbols to do stack operations:

  • “[“ Push the current state (position and orientation) of the pencil onto a pushdown
    operations stack
  • “]” Pop a state from the stack and make it the current state of the pencil (no line is drawn)

These operators make it possible to return to a previous state and continue drawing from there. Using them you can draw plants like these:

Each image corresponds to a different set of axiom, rules, angle and depth parameters; I described these terms in my previous post. If you want to reproduce them, you can find the code below. Change colors, add noise to angles, try your own plants… I am sure you will find nice images:


library(gsubfn)
library(stringr)
library(dplyr)
library(ggplot2)

#Plant 1
axiom="F"
rules=list("F"="FF-[-F+F+F]+[+F-F-F]")
angle=22.5
depth=4

#Plant 2
axiom="X"
rules=list("X"="F[+X][-X]FX", "F"="FF")
angle=25.7
depth=7

#Plant 3
axiom="X"
rules=list("X"="F[+X]F[-X]+X", "F"="FF")
angle=20
depth=7

#Plant 4
axiom="X"
rules=list("X"="F-[[X]+X]+F[+FX]-X", "F"="FF")
angle=22.5
depth=5

#Plant 5
axiom="F"
rules=list("F"="F[+F]F[-F]F")
angle=25.7
depth=5

#Plant 6
axiom="F"
rules=list("F"="F[+F]F[-F][F]")
angle=20
depth=5


for (i in 1:depth) axiom=gsubfn(".", rules, axiom)

actions=str_extract_all(axiom, "\\d*\\+|\\d*\\-|F|L|R|\\[|\\]|\\|") %>% unlist

status=data.frame(x=numeric(0), y=numeric(0), alfa=numeric(0))
points=data.frame(x1 = 0, y1 = 0, x2 = NA, y2 = NA, alfa=90, depth=1)


for (action in actions) 
{
  if (action=="F")
  {
    x=points[1, "x1"]+cos(points[1, "alfa"]*(pi/180))
    y=points[1, "y1"]+sin(points[1, "alfa"]*(pi/180))
    points[1,"x2"]=x
    points[1,"y2"]=y
    data.frame(x1 = x, y1 = y, x2 = NA, y2 = NA, 
               alfa=points[1, "alfa"],
               depth=points[1,"depth"]) %>% rbind(points)->points
  }
  if (action %in% c("+", "-")){
    alfa=points[1, "alfa"]
    points[1, "alfa"]=eval(parse(text=paste0("alfa",action, angle)))
  }
  if(action=="["){ 
    data.frame(x=points[1, "x1"], y=points[1, "y1"], alfa=points[1, "alfa"]) %>% 
      rbind(status) -> status
    points[1, "depth"]=points[1, "depth"]+1
  }
  
  if(action=="]"){ 
    depth=points[1, "depth"]
    points[-1,]->points
    data.frame(x1=status[1, "x"], y1=status[1, "y"], x2=NA, y2=NA, 
               alfa=status[1, "alfa"],
               depth=depth-1) %>% 
      rbind(points) -> points
    status[-1,]->status
  }
}

ggplot() + 
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), 
               lineend = "round", 
               colour="white",
               data=na.omit(points)) + 
  coord_fixed(ratio = 1) +
  theme(legend.position="none",
        panel.background = element_rect(fill="black"),
        panel.grid=element_blank(),
        axis.ticks=element_blank(),
        axis.title=element_blank(),
        axis.text=element_blank())

A Shiny App to Draw Curves Based on L-System

Don’t worry about a thing ’cause every little thing gonna be alright (Three Little Birds, Bob Marley)

One of my favourite books is The Computational Beauty of Nature by Gary William Flake, where there is a fantastic chapter about fractals in which I discovered L-Systems.

L-Systems were conceived in 1968 by Aristid Lindenmayer, a Hungarian biologist, as a mathematical description of plant growth. Apart from Wikipedia, there are many places on the Internet where you can read about them. If you are interested, don’t miss The Algorithmic Beauty of Plants, an awesome book by Przemysław Prusinkiewicz that you can obtain here for free.

Roughly speaking, an L-System is a very efficient way to make drawings. In its simplest form it consists of two different actions: drawing a straight line and changing the angle. This is just what you need, for example, to draw a square: draw a straight line of any length, turn 90 degrees (without drawing), draw another straight line of the same length, turn 90 degrees in the same direction, draw, turn and draw again. Denoting F as the action of drawing a line of length d and + as turning 90 degrees right, the whole process to draw a square can be represented as F+F+F+F.
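
To make the idea concrete, here is a minimal sketch (mine, not taken from the app below) that interprets the string F+F+F+F and draws the resulting square with ggplot:

library(ggplot2)

actions <- unlist(strsplit("F+F+F+F", ""))
points  <- data.frame(x = 0, y = 0, alfa = 90)  # start at the origin, heading up
for (action in actions) {
  last <- nrow(points)
  if (action == "F") {  # draw: advance one unit in the current direction
    points <- rbind(points,
                    data.frame(x = points[last, "x"] + cos(points[last, "alfa"] * pi / 180),
                               y = points[last, "y"] + sin(points[last, "alfa"] * pi / 180),
                               alfa = points[last, "alfa"]))
  }
  if (action == "+") {  # turn 90 degrees to the right (no line drawn)
    points[last, "alfa"] <- points[last, "alfa"] - 90
  }
}
ggplot(points, aes(x, y)) + geom_path() + coord_fixed()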

L-Systems are quite simple to program in R. You only need to substitute the rules iteratively into the axiom (I use the gsubfn function to do it) and split the resulting string into individual actions with str_extract_all, for example. The result is a set of very simple actions (draw or turn) that can be visualized with ggplot and its path geometry. There are four important parameters in L-Systems:

  • The seed of the drawing, called axiom
  • The substitutions to be applied iteratively, called rules
  • How many times to apply substitutions, called depth
  • Angle of each turning

For example, let’s define the next L-System:

  • Axiom: F-F-F-F
  • Rule: F → F−F+F+FF−F−F+F

The rule means that every F must be replaced by F−F+F+FF−F−F+F while + means right turning and - left one. After one iteration, the axiom is replaced by F-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F and iterating again, the new string is F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F+F-F+F+FF-F-F+FF-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F+F-F+F+FF-F-F+FF-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F+F-F+F+FF-F-F+FF-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F+F-F+F+FF-F-F+FF-F+F+FF-F-F+F-F-F+F+FF-F-F+F-F-F+F+FF-F-F+F+F-F+F+FF-F-F+F. As you can see, the length of the string grows exponentially. Converting last string into actions, produces this drawing, called Koch Island:
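
The expansion step itself is tiny. This is a sketch (mine) of how the string above can be generated with gsubfn, assuming the Koch Island axiom and rule just defined:

library(gsubfn)

axiom <- "F-F-F-F"
rules <- list("F" = "F-F+F+FF-F-F+F")
for (i in 1:2) axiom <- gsubfn(".", rules, axiom)  # apply the rule twice (depth = 2)
nchar(axiom)  # the string length grows exponentially with depth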

It is funny how different axioms and rules produce very different drawings. I have done a Shiny App to play with L-Systems. Although it is quite simple, it has two interesting features I would like to underline:

  • Delay reactions with eventReactive to allow setting depth and angle values before refreshing the plot
  • Build a dynamic UI that reacts to user input depending on the curve chosen

There are twelve curves in the application: Koch Island (and 6 variations), quadratic snowflake, Sierpinski triangle, hexagonal Gosper curve, quadratic Gosper curve and Dragon curve. These are their plots:

The definition of all these curves (axiom and rules) can be found in the first chapter of Prusinkiewicz’s book. The magic comes when you modify angles and colors. These are some examples among the infinite number of possibilities that can be created:

I enjoyed building and playing with the app a lot. You can try it here. If you do a nice drawing, please let me know on Twitter or by dropping me an email. This is the code of the App:

ui.R:

library(shiny)

shinyUI(fluidPage(
  titlePanel("Curves based on L-systems"),
  
  sidebarLayout(
    sidebarPanel(
      selectInput("cur", "Choose a curve:",
                  c("","Koch Island",
                    "Cuadratic Snowflake",
                    "Koch Variation 1",
                    "Koch Variation 2",
                    "Koch Variation 3",
                    "Koch Variation 4",
                    "Koch Variation 5",
                    "Koch Variation 6",
                    "Sierpinsky Triangle",
                    "Dragon Curve",
                    "Hexagonal Gosper Curve",
                    "Quadratic Gosper Curve"),
                  selected = ""),
      
      conditionalPanel(
        condition = "input.cur != \"\"",
        uiOutput("Iterations")),
      
      conditionalPanel(
        condition = "input.cur != \"\"",
        uiOutput("Angle")),
      
      conditionalPanel(
        condition = "input.cur != \"\"",
        selectInput("lic", label = "Line color:", choices = colors(), selected = "black")),
      
      
      conditionalPanel(
        condition = "input.cur != \"\"",
        selectInput("bac", label = "Background color:", choices = colors(), selected = "white")),
      
      conditionalPanel(
        condition = "input.cur != \"\"",
        actionButton(inputId = "go", label = "Go!", 
                     style="color: #fff; background-color: #337ab7; border-color: #2e6da4"))
      
      
      
      
    ), 
    mainPanel(plotOutput("curve", height="550px", width = "100%"))
  )
  
))

server.R:

library(shiny)
library(gsubfn)
library(stringr)
library(dplyr)
library(ggplot2)
library(rlist)

shinyServer(function(input, output) {
   
  curves=list(
    list(name="Koch Island",
         axiom="F-F-F-F",
         rules=list("F"="F-F+F+FF-F-F+F"),
         angle=90,
         n=2,
         alfa0=90),
    list(name="Cuadratic Snowflake",
         axiom="-F",
         rules=list("F"="F+F-F-F+F"),
         angle=90,
         n=4,
         alfa0=90),
    list(name="Koch Variation 1",
         axiom="F-F-F-F",
         rules=list("F"="FF-F-F-F-F-F+F"),
         angle=90,
         n=3,
         alfa0=90),
    list(name="Koch Variation 2",
         axiom="F-F-F-F",
         rules=list("F"="FF-F-F-F-FF"),
         angle=90,
         n=4,
         alfa0=90),
    list(name="Koch Variation 3",
         axiom="F-F-F-F",
         rules=list("F"="FF-F+F-F-FF"),
         angle=90,
         n=3,
         alfa0=90),
    list(name="Koch Variation 4",
         axiom="F-F-F-F",
         rules=list("F"="FF-F--F-F"),
         angle=90,
         n=4,
         alfa0=90),
    list(name="Koch Variation 5",
         axiom="F-F-F-F",
         rules=list("F"="F-FF--F-F"),
         angle=90,
         n=5,
         alfa0=90),
    list(name="Koch Variation 6",
         axiom="F-F-F-F",
         rules=list("F"="F-F+F-F-F"),
         angle=90,
         n=4,
         alfa0=90),
    list(name="Sierpinsky Triangle",
         axiom="R",
         rules=list("L"="R+L+R", "R"="L-R-L"),
         angle=60,
         n=6,
         alfa0=0),
    list(name="Dragon Curve",
         axiom="L",
         rules=list("L"="L+R+", "R"="-L-R"),
         angle=90,
         n=10,
         alfa0=90),
    list(name="Hexagonal Gosper Curve",
         axiom="L",
         rules=list("L"="L+R++R-L--LL-R+", "R"="-L+RR++R+L--L-R"),
         angle=60,
         n=4,
         alfa0=60),
    list(name="Quadratic Gosper Curve",
         axiom="-R",
         rules=list("L"="LL-R-R+L+L-R-RL+R+LLR-L+R+LL+R-LR-R-L+L+RR-", 
                    "R"="+LL-R-R+L+LR+L-RR-L-R+LRR-L-RL+L+R-R-L+L+RR"),
         angle=90,
         n=2,
         alfa0=90))
  
  output$Iterations <- renderUI({
    if (input$cur!="") curve=list.filter(curves, name==input$cur) else curve=list.filter(curves, name=="Koch Island")
    iterations=list.select(curve, n) %>% unlist
    numericInput("ite", "Depth:", iterations, min = 1, max = (iterations+2))
  })
  
  output$Angle <- renderUI({
    curve=list.filter(curves, name==input$cur)
    angle=list.select(curve, angle) %>% unlist
    numericInput("ang", "Angle:", angle, min = 0, max = 360)
  })
  
  data <- eventReactive(input$go, {
    curve=list.filter(curves, name==input$cur)
    axiom=list.select(curve, axiom) %>% unlist
    rules=list.select(curve, rules)[[1]]$rules
    alfa0=list.select(curve, alfa0) %>% unlist
    
    for (i in 1:input$ite) axiom=gsubfn(".", rules, axiom)
    actions=str_extract_all(axiom, "\\d*\\+|\\d*\\-|F|L|R|\\[|\\]|\\|") %>% unlist
    
    points=data.frame(x=0, y=0, alfa=alfa0)
    for (i in 1:length(actions)) 
    {
      if (actions[i]=="F"|actions[i]=="L"|actions[i]=="R")
      {
        x=points[nrow(points), "x"]+cos(points[nrow(points), "alfa"]*(pi/180))
        y=points[nrow(points), "y"]+sin(points[nrow(points), "alfa"]*(pi/180))
        alfa=points[nrow(points), "alfa"]
        points %>% rbind(data.frame(x=x, y=y, alfa=alfa)) -> points
      }
      else{
        alfa=points[nrow(points), "alfa"]
        points[nrow(points), "alfa"]=eval(parse(text=paste0("alfa",actions[i], input$ang)))
      }
    }
    return(points)
  })
  
  output$curve <- renderPlot({    
    ggplot(data(), aes(x, y)) + 
      geom_path(color=input$lic) + 
      coord_fixed(ratio = 1) +
      theme(legend.position="none",
            panel.background = element_rect(fill=input$bac),
            panel.grid=element_blank(),
            axis.ticks=element_blank(),
            axis.title=element_blank(),
            axis.text=element_blank())
  })
    
})

Sunflowers for COLOURlovers

Andar, lo que es andar, anduve encima siempre de las nubes (Del tiempo perdido, Robe)

If colours matter to you, maybe you already know COLOURlovers. As can be read on their website, COLOURlovers is a creative community where people from around the world create and share colors, palettes and patterns, discuss the latest trends and explore colorful articles… All in the spirit of love.

There is an R package called colourlovers which provides access to the COLOURlovers API. It makes it very easy to choose nice colours for your graphics. I used the clpalettes function to search for the top palettes of the website. Their names are pretty suggestive as well: Giant Goldfish, Thought Provoking, Adrift in Dreams, let them eat cake… Inspired by this post, I have done a Shiny app to create colored flowers using those palettes. Seeds are arranged according to the golden angle. One example:

Some others:

You can play with the app here.
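
The seed arrangement itself only needs the golden angle: seed i is placed at radius sqrt(i) and angle i times the golden angle. This minimal sketch (a simplification of mine; the app below adds the COLOURlovers palettes) shows the idea:

library(ggplot2)

golden_angle <- pi * (3 - sqrt(5))  # about 2.4 radians (137.5 degrees)
n <- 400
seeds <- data.frame(r = sqrt(1:n), t = (1:n) * golden_angle)

ggplot(seeds, aes(r * cos(t), r * sin(t))) +
  geom_point() +
  coord_fixed() +
  theme_void()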

If you want to do your own sunflowers, here you have the code. This is the ui.R file:

library(shiny)
library(colourlovers)
library(rlist)

top=clpalettes('top')
sapply(1:length(top), function(x) list.extract(top, x)$title)->titles

fluidPage(
  titlePanel("Sunflowers for COLOURlovers"),
  fluidRow(
    column(3,
           wellPanel(
             selectInput("pal", label = "Palette:", choices = titles),
             sliderInput("nob", label = "Number of points:", min = 200, max = 500, value = 400, step = 50)
           )
    ),
    mainPanel(
      plotOutput("Flower")
    )
  )
  )

And this is the server.R one:

library(shiny)
library(ggplot2)
library(colourlovers)
library(rlist)
library(dplyr)

top=clpalettes('top')
sapply(1:length(top), function(x) list.extract(top, x)$title)->titles

CreatePlot = function (ang=pi*(3-sqrt(5)), nob=150, siz=15, sha=21, pal="LoversInJapan") {
  
  list.extract(top, which(titles==pal))$colors %>% 
    unlist %>% 
    as.vector() %>% 
    paste0("#", .) -> all_colors
  
  colors=data.frame(hex=all_colors, darkness=colSums(col2rgb(all_colors)))
  colors %>% arrange(-darkness)->colors
  
  background=colors[1,"hex"] %>% as.character

  colors %>% filter(hex!=background) %>% .[,1] %>% as.vector()->colors

  ggplot(data.frame(r=sqrt(1:nob), t=(1:nob)*ang*pi/180), aes(x=r*cos(t), y=r*sin(t)))+
    geom_point(colour=sample(colors, nob, replace=TRUE, prob=exp(1:length(colors))), aes(size=(nob-r)), shape=16)+
    scale_x_continuous(expand=c(0,0), limits=c(-sqrt(nob)*1.4, sqrt(nob)*1.4))+
    scale_y_continuous(expand=c(0,0), limits=c(-sqrt(nob)*1.4, sqrt(nob)*1.4))+
    theme(legend.position="none",
          panel.background = element_rect(fill=background),
          panel.grid=element_blank(),
          axis.ticks=element_blank(),
          axis.title=element_blank(),
          axis.text=element_blank())}

function(input, output) {
 output$Flower=renderPlot({
    CreatePlot(ang=180*(3-sqrt(5)), nob=input$nob, siz=input$siz, sha=as.numeric(input$sha), pal=input$pal)
  }, height = 550, width = 550 )}

Genetic Music: From Schoenberg to Bach

Bach, the epitome of a musician who strove all life long and finally acquired the ‘Habit of Perfection’, was a thoroughly imperfect human being (John Eliot Gardiner, Bach: Music in the Castle of Heaven)

Sometimes I daydream and imagine I am a famous musician. I fantasize about being Paco de Lucía playing Mi niño Curro alone on the stage, Thom Yorke singing Fake plastic trees at Glastonbury or Noel Gallagher singing Don’t look back in anger for a devoted crowd.

My parents gave me the opportunity to learn music, and this has been one of the best gifts I have ever received. I played the cello intensively until I had children, but I still have enough skill to play some pieces. One of them is the Prelude from Cello Suite No. 1 by J. S. Bach. It is very close to the limit of my abilities but I love it. It is timeless, thrilling, provocative and elegant: an absolute masterpiece. I also often imagine myself playing it as well as my admired Yo-Yo Ma does.

The aim of this experiment is to discern the first 4 beats of the prelude using a genetic algorithm. First of all, let’s listen to our goal melody, created with the tuneR package (sorry for the sound, Mr. Bach):

The frequency range of the cello goes from 65.41 Hz to 987.77 Hz. Using the basic formula for the frequency of notes, this means that a cello can produce 48 different notes. I generated the following codification for the 48 notes of the cello:

frequency (Hz) note code
65.41 C2 a
69.30 C#2/Db2 b
73.42 D2 c
77.78 D#2/Eb2 d
82.41 E2 e
87.31 F2 f
92.50 F#2/Gb2 g
98.00 G2 h
103.83 G#2/Ab2 i
110.00 A2 j
116.54 A#2/Bb2 k
123.47 B2 l
130.81 C3 m
138.59 C#3/Db3 n
146.83 D3 o
155.56 D#3/Eb3 p
164.81 E3 q
174.61 F3 r
185.00 F#3/Gb3 s
196.00 G3 t
207.65 G#3/Ab3 u
220.00 A3 v
233.08 A#3/Bb3 w
246.94 B3 x
261.63 C4 y
277.18 C#4/Db4 z
293.66 D4 A
311.13 D#4/Eb4 B
329.63 E4 C
349.23 F4 D
369.99 F#4/Gb4 E
392.00 G4 F
415.30 G#4/Ab4 G
440.00 A4 H
466.16 A#4/Bb4 I
493.88 B4 J
523.25 C5 K
554.37 C#5/Db5 L
587.33 D5 M
622.25 D#5/Eb5 N
659.26 E5 O
698.46 F5 P
739.99 F#5/Gb5 Q
783.99 G5 R
830.61 G#5/Ab5 S
880.00 A5 T
932.33 A#5/Bb5 U
987.77 B5 V

So our goal melody is codified like this:

tAJHJAJAtAJHJAJAtCKJKCKCtCKJKCKCtEKJKEKEtEKJKEKEtFJHJFJFtFJHJFJF

I start with a population of 500 random melodies. All of them have 64 notes, the same length as the goal melody has. Given a melody, the algorithm compares it with the goal melody to calculate its fitness, with the following formula:

fitness = 2^{\text{number of correct notes}}

For example, a melody with 5 correct notes has a fitness of 32. Being correct means being the right note in the right place. After measuring the fitness of all melodies, I select 250 couples of individuals depending on their fitness (the higher the fitness, the higher the probability of being selected). Each couple generates two children for the next generation with a certain probability, called the crossover rate, so the crossover operation is not always applied. Once two parents are selected, a random crossover point is chosen: at that point in both strings, the genetic material from the left side of one parent is spliced to the material from the right side of the other parent. The next figure illustrates the idea:
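
In code, the idea is just string splicing at a random point. This small sketch (mine, mirroring the operation used in the full code below) shows one crossover of two 8-note melodies:

p1  <- "aaaaaaaa"
p2  <- "bbbbbbbb"
mix <- sample(1:(nchar(p1) - 1), 1)  # random crossover point
child1 <- paste0(substr(p1, 1, mix), substr(p2, mix + 1, nchar(p2)))
child2 <- paste0(substr(p2, 1, mix), substr(p1, mix + 1, nchar(p1)))
c(child1, child2)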

So two parents give birth to two children for the next generation. The last thing to do is to mutate the children. Once again, mutation is not always applied, since it depends on a rate that is usually small. Mutation introduces some new notes (new genetic material) into the next population. It increases convergence speed and reduces the probability of getting stuck in a local optimum.

How many 32-length melodies can be written with 48 notes? The answer is 48^32, which is this extremely big number:

630.550.095.814.788.844.406.620.626.462.420.008.802.064.662.402.084.486

To understand how enormous it is, let’s suppose we could work with Sunway TaihuLight, currently the fastest supercomputer in the world. This monster can do 93.000.000.000.000.000 floating-point operations per second, so it would spend more than 214.995.831.974.513.789.322.026.202.008 years calculating the fitness of all possible melodies: brute force is not an option.

A genetic algorithm does the job in just a few iterations. The best melodies introduce innovations that increase the average fitness of the whole population as well as its maximum fitness. The next table shows the evolution of one execution of the algorithm for a crossover rate of 75% and a mutation rate of 1% (not exhaustive):

iteration best melody correct notes
1 OStxSTSbHwdsJAfTcRpoiNTRtRUxKhuRuKMcVNcBjRJNhENrVeFsPiegUpJHvRHw 7
5 tdbxSTSbHwdsJAfTcRpoiNTRtRITopoCPORzDdiFkEKrhEKtMHytiffzttJHvRHw 12
20 tAGHwdtUHzdMJATVACjJKVnetRQxKCKCtBKjqwiFkEKKhEKEMHyQiFfztUJHlRHF 25
35 tAGHwAQUjAdsJAGAcUjJKCLCtRQxKCKCtEKAqwKEzEKJhEKEMHytIFfFtUJHJRHF 35
50 tAJHwAJGjAJHJAJAtUCJKCkCtRUxKCKCtEKJKwKEtEKyhEKEMHyHrFfFtUJHJFHF 45
65 tAJHJAJGjAJHJAJAtUKJKCLCtCKxKCKCtEKJKwKEtEKyhEKEMHJHNFJFtFJHJFOF 52
80 tAJHJAJmtAJHJAJAtUKJKCLCtCKJKCKCtEKJKEKEtEKyMEKEMHJHJFJFtFJHJFOF 56
95 tAJHJAJjtAJHJAJAtUKJKCLCtCKJKCKCtEKJKEKEtEKJhEKEtFJHJFJFtFJHJFRF 59
110 tAJHJAJktAJHJAJAtUKJKCvCtCKJKCKCtEKJKEKEtEKJKEKEtFJHJFJFtFJHJFJF 61
125 tAJHJAJAtAJHJAJAtCKJKCKCtCKJKCKCtEKJKEKEtEKJKEKEtFJHJFJFtFJHJFJF 64

The optimum is reached in just 125 iterations. It is funny to merge the best melodies of some iterations. This sample blends four of them. The first one comes from the initial population (the Schoenberg-flavored one) and the last one is our goal melody. The other two were randomly picked from the remaining iterations. It is nice to hear how the genetic algorithm turns randomness into Bach’s wonderful melody:

This experiment was inspired by The Computational Beauty of Nature, a splendid book by Gary William Flake that I strongly recommend.

This is the code of the experiment:

library(tuneR)
library(stringdist)
library(dplyr)
#Function to calculate frequency
freq=function(n) 440*(2^(1/12))^n
#cello notes
notes=c("C2",
        "C#2/Db2",
        "D2",
        "D#2/Eb2",
        "E2",
        "F2",
        "F#2/Gb2",
        "G2",
        "G#2/Ab2",
        "A2",
        "A#2/Bb2",
        "B2",
        "C3",
        "C#3/Db3",
        "D3",
        "D#3/Eb3",
        "E3",
        "F3",
        "F#3/Gb3",
        "G3",
        "G#3/Ab3",
        "A3",
        "A#3/Bb3",
        "B3",
        "C4",
        "C#4/Db4",
        "D4",
        "D#4/Eb4",
        "E4",
        "F4",
        "F#4/Gb4",
        "G4",
        "G#4/Ab4",
        "A4",
        "A#4/Bb4",
        "B4",
        "C5",
        "C#5/Db5",
        "D5",
        "D#5/Eb5",
        "E5",
        "F5",
        "F#5/Gb5",
        "G5",
        "G#5/Ab5",
        "A5",
        "A#5/Bb5",
        "B5")
#Table of frequencies
frequencies=data.frame(n=-33:14) %>% 
  mutate(frequency=round(freq(n),4),
         note=notes,
         code=c(letters, toupper(letters))[1:48])
#Codification of the goal melody
prelude="tAJHJAJAtAJHJAJAtCKJKCKCtCKJKCKCtEKJKEKEtEKJKEKEtFJHJFJFtFJHJFJF"
#Sample wav
if (exists("all_wave")) rm(all_wave)
frequencies %>% 
  filter(code==substr(prelude,1,1)) %>% 
  select(frequency) %>% 
  as.numeric %>% 
  sine(duration = 10000)->all_wave
for (i in 2:nchar(prelude)) 
  frequencies %>% 
  filter(code==substr(prelude,i,i)) %>% 
  select(frequency) %>% 
  as.numeric %>% 
  sine(duration = 10000) %>% bind(all_wave, .)->all_wave  
play(all_wave)
writeWave(all_wave, 'PreludeSample.wav')

popsize=500 #Population size
length=nchar(prelude)
genes=frequencies$code
maxfitness=2^(1-(stringdist(prelude, prelude, method="hamming")-length))
maxiter=200 #Max number of iterations
iter=1
mutrate=0.01
#Initial population
replicate(popsize, sample(genes, length, replace = TRUE)) %>%
  apply(2, function(x) paste(x,collapse="")) -> population
#Fitness evaluation
fitness=sapply(population, function(x) 2^(1-(stringdist(x, prelude, method="hamming")-length)), USE.NAMES=FALSE)
#Maximum fitness
maxfitenss_iter=max(fitness)
#Best melody
which((fitness)==max(fitness)) %>% min %>% population[.] ->bestfit
results=data.frame(iteration=iter, best_melody=bestfit, correct_notes=log(maxfitenss_iter, base = 2)-1)
#Execution of the algorithm
while(maxfitenss_iter<maxfitness & iter<maxiter)
{
  population2=c()
  for (i in 1:(popsize/2))
  {
    parents=sample(1:popsize, size=2, prob=fitness/sum(fitness), replace=FALSE) 
    mix=sample(1:(length-1), 1)
    
    if (runif(1)>.25)
    {
      p1=paste0(substr(population[parents[1]],1,mix), substr(population[parents[2]],mix+1,length))
      p2=paste0(substr(population[parents[2]],1,mix), substr(population[parents[1]],mix+1,length))
    }
    else
    {
      p1=population[parents[1]]
      p2=population[parents[2]]
    }
    for (j in 1:length) if(runif(1)<mutrate) substr(p1,j,j)=sample(genes,1)
    for (j in 1:length) if(runif(1)<mutrate) substr(p2,j,j)=sample(genes,1)
    c(p1, p2) %>% c(population2)->population2
  }
  #New population
  population=population2
  fitness=sapply(population, function(x) 2^(1-(stringdist(x, prelude, method="hamming")-length)), USE.NAMES=FALSE)
  which((fitness)==max(fitness)) %>% min %>% population[.] ->bestfit
  print(paste0("Iteration ",iter, ": ", bestfit))
  maxfitenss_iter=max(fitness)
  iter=iter+1
  data.frame(iteration=iter, best_melody=bestfit, correct_notes=log(maxfitenss_iter, base = 2)-1) %>% rbind(results) -> results
}

Frankenstein

Remember me, remember me, but ah! forget my fate (Dido’s Lament, Henry Purcell)

A Voronoi diagram divides a plane based on a set of original points. Each polygon, or Voronoi cell, contains an original point and all the points that are closer to it than to any other original point.

This is a nice example of a Voronoi tessellation. You can find good explanations of Voronoi diagrams and Delaunay triangulations here (in English) or here (in Spanish).

A grayscale image is simply a matrix where the darkness of the pixel located at coordinates (i, j) is represented by the value of the corresponding element of the matrix: a grayscale image is a dataset. This is a Voronoi diagram of Frankenstein:

To do it I followed the next steps:

  1. Read this image
  2. Convert it to gray scale
  3. Turn it into a pure black and white image
  4. Obtain a random sample of black pixels (the previous image corresponds to a sample of 6.000 points)
  5. Compute the Voronoi tessellation

Steps 1 to 3 were done with imager, a very appealing package to process and analyze images. Step 5 was done with deldir, also a convenient package, which computes Delaunay triangulations and Dirichlet or Voronoi tessellations.

The next grid shows tessellations for sample sizes from 500 to 12.000 points in steps of 500:

I gathered all the previous images in this gif, created with magick, another amazing R package I discovered recently:

This is the code:

library(imager)
library(dplyr)
library(deldir)
library(ggplot2)
library(scales)

# Download the image
file="http://ereaderbackgrounds.com/movies/bw/Frankenstein.jpg"
download.file(file, destfile = "frankenstein.jpg", mode = 'wb')

# Read and convert to grayscale
load.image("frankenstein.jpg") %>% grayscale() -> x

# This is just to define frame limits
x %>% 
  as.data.frame() %>% 
  group_by() %>% 
  summarize(xmin=min(x), xmax=max(x), ymin=min(y), ymax=max(y)) %>% 
  as.vector()->rw

# Filter image to convert it to bw
x %>%
  threshold("45%") %>% 
  as.cimg() %>% 
  as.data.frame() -> df

# Function to compute and plot Voronoi tesselation depending on sample size
doPlot = function(n)
{
  #Voronoi tesselation
  df %>% 
  sample_n(n, weight=(1-value)) %>% 
  select(x,y) %>% 
  deldir(rw=rw, sort=TRUE) %>% 
  .$dirsgs -> data

  # This is just to add some alpha to lines depending on its longitude
  data %>% 
    mutate(long=sqrt((x1-x2)^2+(y1-y2)^2),
         alpha=findInterval(long, quantile(long, probs = seq(0, 1, length.out = 20)))/21)-> data

  # A little bit of ggplot to plot results
  data %>% 
    ggplot(aes(alpha=(1-alpha))) +
    geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), color="black", lwd=1) +
    scale_x_continuous(expand=c(0,0))+
    scale_y_continuous(expand=c(0,0), trans=reverse_trans())+
    theme(legend.position  = "none",
            panel.background = element_rect(fill="white"),
            axis.ticks       = element_blank(),
            panel.grid       = element_blank(),
            axis.title       = element_blank(),
            axis.text        = element_blank())->plot

return(plot)
}

# I call the previous function and store resulting plot in jpeg format
i=5000
name=paste0("frankie",i,".jpeg")
jpeg(name, width = 600, height = 800, units = "px", quality = 100)
doPlot(i)
dev.off()
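
# (A sketch of mine, not in the original post) The grid of tessellations mentioned above can be
# reproduced by looping over the sample sizes; print() is needed inside the loop for ggplot
for (i in seq(500, 12000, by = 500))
{
  name=paste0("frankie",i,".jpeg")
  jpeg(name, width = 600, height = 800, units = "px", quality = 100)
  print(doPlot(i))
  dev.off()
}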

# Once all images are stored I can create gif
library(magick)
frames=c()
images=list.files(pattern="jpeg")

for (i in length(images):1)
{
  x=image_read(images[i])
  x=image_scale(x, "300")
  c(x, frames) -> frames
}
animation=image_animate(frames, fps = 2)
image_write(animation, "Frankenstein.gif")

Who is Alan Turing?

This government is committed to introducing posthumous pardons for people with certain historical sexual offence convictions who would be innocent of any crime now (British Government Spokesperson, September 2016)

Last September, the British government announced its intention to pursue what has become known as the Alan Turing law, offering exoneration to the tens of thousands of gay men convicted of historic charges.  The law was finally unveiled on 20 October 2016.

This plot shows the daily views of Alan Turing’s Wikipedia page during the last 365 days:

There are three huge peaks on May 27th, July 30th, and October 29th that can be easily detected using the AnomalyDetection package:


After replacing these anomalies with a simple linear imputation, it is clear that the time series has experienced a significant impact since the last days of September:

To estimate the amount of incremental views since September 28th (the date I have chosen as the starting point) I use the CausalImpact package:


The last plot shows the accumulated effect. After 141 days, there have been around 1 million incremental views to Alan Turing’s Wikipedia page (more than 7.000 per day), and the effect does not seem ephemeral.

Alan Turing has won another battle, this time posthumously. And thanks to it, a lot of people have discovered his amazing legacy: long live Alan Turing.

This is the code I wrote to do the experiment:

library(httr)
library(jsonlite)
library(stringr)
library(xts)
library(highcharter)
library(AnomalyDetection)
library(imputeTS)
library(CausalImpact)
library(dplyr)

# Views last 365 days
(Sys.Date()-365) %>% str_replace_all("[[:punct:]]", "") %>% substr(1,8) -> date_ini
Sys.time()       %>% str_replace_all("[[:punct:]]", "") %>% substr(1,8) -> date_fin
url="https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Alan%20Turing/daily"
paste(url, date_ini, date_fin, sep="/") %>% 
  GET  %>% 
  content("text") %>% 
  fromJSON %>% 
  .[[1]] -> wikistats

# To prepare dataset for highcharter
wikistats %>% 
  mutate(day=str_sub(timestamp, start = 1, end = 8)) %>% 
  mutate(day=as.POSIXct(day, format="%Y%m%d", tz="UTC")) -> wikistats

# Highcharts viz
rownames(wikistats)=wikistats$day
wikistats %>% select(views) %>% as.xts  %>% hchart

# Anomaly detection
wikistats %>% select(day, views) -> tsdf
tsdf %>%  
  AnomalyDetectionTs(max_anoms=0.01, direction='both', plot=TRUE)->res
res$plot

# Imputation of anomalies
tsdf[tsdf$day %in% as.POSIXct(res$anoms$timestamp, format="%Y-%m-%d", tz="UTC"),"views"]<-NA 
ts(tsdf$views, frequency = 365) %>% 
  na.interpolation() %>% 
  xts(order.by=wikistats$day) -> tscleaned
tscleaned %>% hchart

# Causal Impact from September 28th
x=sum(index(tscleaned)<"2016-09-28 UTC")
impact <- CausalImpact(data = tscleaned %>% as.numeric, 
                       pre.period = c(1,x),
                       post.period = c(x+1,length(tscleaned)), 
                       model.args = list(niter = 5000, nseasons = 7),
                       alpha = 0.05)
plot(impact)