Tag Archives: ggplot2

How Do We Draw a Line?

She dreams in colour, she dreams in red, can’t find a better man (Better Man, Pearl Jam)

Today I bring another experiment based on The Quick Draw! Data from Google, one of my most fortunate discoveries of the last times. The Quick Draw! is a web game developed by Google, that can be played on a computer, tablet or mobile phone, in which you are asked to draw something (for example, a bird). Then you have just 20 seconds to do it. You win if a machine, trained with a neural network, deduces what are you drawing. The best way to understand how it works is playing to it here. Google published data of about 50 million drawings across 345 categories, contributed by players of the game from all over the world. Datasets are in ndjson format (newline delimited JSON). In my previous post I analyzed one of these datasets, and showed a way to parse and represent the drawings in ggplot.

In this occasion I analyze the simplest drawing that Google can ask you: a line. The dataset, which is called lines.ndjson, can be found here and contains more than 143.000 lines drawn by people from about 170 countries. Most of these drawings come from The United States (45.4%), United Kingdom (7.5%), Canada (3.6%), Germany (3.5%) and Russian Federation (2.3%).

Let’s try to understand how humans draw lines. Concretely, in which direction do we draw them: horizontally? toward right o left? vertically? toward up or down? This analysis is inspired in two great articles I read recently:

There are some technical details around this experiment I would highlight:

  • I parse the dataset using fromJSON function from rjson package.
  • I use purrr package to apply a linear regression to the points defining the line for each drawing.
  • I easily convert the summary of the linear regression into a data frame using tidy function from broom package.
  • I use the slope of the regression to obtain the angle which describes the line (depending on where it is started I add pi to de arctangent of the slope)
  • I represent the frequence of angles using polar coordinates dividing circle in sections of 30 degrees in the following way: 345°- 15°, 15°- 45°, 45°-75°, 75°-105°, …, 315°-345° so for example, horizontal lines from left to right will fall into 345º- 15º category.

This is how do we draw lines analysing the entire dataset, without doing any distinction by country:

The fact seems clear: an average human who plays to the Quick Draw! game, draws a line horizontally from left to right with a probability of 59%. I have to admite that I expected a majority of horizontal-left-to-right lines, but not as crushingly as the plot shows. Maybe my a priori is far from the reality because I am lefty and I would draw it in another way. Remember as well that this mean human will probably come from The United States.

Are there differences by country? Yes, and they are very interesting. I removed all that countries with less then 150 drawings. Taking this into account, these are the four countries where more people draw vertical bottom-up lines:

And these are where more people draw horizontal right-left lines:

We’ve seen that on average, 59% of lines are drawn from left to right. This figure reaches more than 75% in the following countries:

And where do people draw more oblique lines? Here:

Surprisingly, a very small amount of lines are drawn toward down, which seems me quite intriguing.

Some thoughts (let me know yours):

  • Humans prefer doing horizontal lines from left-to-right everywhere
  • In case of drawing vertical, we clearly prefer bottom-up movement rather than the opposite; maybe the device configuration or the arrangement of the application motivates this behaviour.
  • Arab and hebrew are written from right-to-left: this fact seems to have a significant influence on the way that people draw lines.

You can find the code of this experiment here.

Exploring The Quick, Draw! Dataset With R: The Mona Lisa

All that noise, and all that sound, all those places I have found (Speed of Sound, Coldplay)

Some days ago, my friend Jorge showed me one of the coolest datasets I’ve ever seen: the Google quick draw dataset. In its Github website you can see a detailed description of the data. Briefly, it contains  around 50 million of drawings of people around the world in .ndjson format. In this experiment, I used the simplified version of drawings where strokes are simplified and resampled with a 1 pixel spacing. Drawings are also aligned to top-left corner and scaled to have a maximum value of 255. All these things make data easier to manage and to represent into a plot.

Since .ndjson files may be very large, I used LaF package to access randon lines of the file rather than reading it completely. I wrote a script to explore The Mona Lisa.ndjson file, which contains more than 120.000 drawings that the TensorFlow engine from Google recognized as being The Mona Lisa. It is quite funny to see them. Whit this script you can:

  • Reproduce a random single drawing
  • Create a 9×9 mosaic of random drawings
  • Create an animation simulating the way the drawing was created

I use ggplot2 package to render drawings and gganimate package of David Robinson to create animations.

This is an example of a single drawing:

This is an example of a 3×3 mosaic:

This is an example of animation:

If you want to try by yourself, you can find the code here.

Note: to work with gganimate, I downloaded the portable version and pointed to it with Sys.setenv command as explained here.

Coloring Sudokus

Someday you will find me
caught beneath the landslide
(Champagne Supernova, Oasis)

I recently read a book called Snowflake Seashell Star: Colouring Adventures in Numberland by Alex Bellos and Edmund Harris which is full of mathematical patterns to be coloured. All images are truly appealing and cause attraction to anyone who look at them, independently of their age, gender, education or political orientation. This book demonstrates how maths are an astonishing way to reach beauty.

One of my favourite patterns are tridokus, a sophisticated colored version of sudokus. Coloring a sudoku is simple: once that is solved it is enough to assign a color to each number (from 1 to 9).  If you superimpose three colored sudokus with no cells at the same position sharing the same color, and using again nine colors, the resulting image is a tridoku:

There is something attractive in a tridoku due to the balance of colors but also they seem a quite messy: they are a charmingly unbalanced.  I wrote a script to generalize the concept to n-dokus. The idea is the same: superimpose n sudokus without cells sharing color and position (I call them disjoint sudokus) using just nine different colors. I did’n’t prove it, but I think the maximum amount of sudokus can be overimposed with these constrains is 9. This is a complete series from 1-doku to 9-doku (click on any image to enlarge):

I am a big fan of colourlovers package. These tridokus are colored with some of my favourite palettes from there:

Just two technical things to highlight:

  • There is a package called sudoku that generates sudokus (of course!). I use it to obtain the first solved sudoku which forms the base.
  • Subsequent sudokus are obtained from this one doing two operations: interchanging groups of columns first (there are three groups: columns 1 to 3, 4 to 6 and 7 to 9) and interchanging columns within each group then.

You can find the code here: do you own colored n-dokus!

Pencil Scribbles

Con las bombas que tiran los fanfarrones, se hacen las gaditanas tirabuzones (Palma y corona, Carmen Linares)

This time I draw Franky again using an algorithm to solve the Travelling Salesman Problem as I did in my last post. On this occasion, instead of doing just one single line drawing, I overlap many of them (250 concretely), each of them sampling 400 points on the original image (in my previous post I sampled 8.000 points). Last difference is that I don’t convert the image to pure black and white with threshold function: now I use the gray scale number of each pixel to weight the sample.

Once again, I use ggplot2 package, and its magical geom_path, to generate the image. The pencil effect is obtained giving a very high transparency to the lines. This is the result:

I love when someone else experiment with my experiments as Mara Averick did:

Or Erik-Jan van Kesteren:

You can do it as well with this one, since you will find the code here. Please, let me know your own creations if you do. You can find me on twitter or by email.

P.S.: Although it may seems otherwise, I’m not obsessed with Frankenstein 🙂

The Travelling Salesman Portrait

I have noticed even people who claim everything is predestined, and that we can do nothing to change it, look before they cross the road (Stephen Hawking)

Imagine a salesman and a set of cities. The salesman has to visit each one of the cities starting from a certain one and returning to the same city. The challenge is finding the route which minimizes the total length of the trip. This is the Travelling Salesman Problem (TSP): one of the most profoundly studied questions in computational mathematics. Since you can find a huge amount of articles about the TSP in the Internet, I will not give more details about it here.

In this experiment I apply an heuristic algorithm to solve the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo
  • Convert it to black and white
  • Choose a sample of black points
  • Solve the TSP to calculate a route among the points
  • Plot the route

The result is a single line drawing of the image that you loaded. To solve the TSP I used the arbitrary insertion heuristic algorithm (Rosenkrantz et al. 1977), which is quite efficient.

To illustrate the idea, I have used again this image of Frankenstein (I used it before in this other experiment). This is the result:

You can find the code here.

Mandalas Colored

Apriétame bien la mano, que un lucero se me escapa entre los dedos (Coda Flamenca, Extremoduro)

I have the privilege of being teacher at ESTALMAT, a project run by Spanish Royal Academy of Sciences that tries to detect, guide and stimulate in a continuous way, along two courses, the exceptional mathematical talent of students of 12-13 years old. Some weeks ago I gave a class there about the importance of programming. I tried to convince them that learning R or Python is a good investment that always pays off; It will make them enjoy more of mathematics as well as to see things with their own eyes. The main part of my class was a workshop about Voronoi tesselations in R. We started drawing points on a circle and we finished drawing mandalas like these ones. You can find the details of the workshop here (in Spanish). It was a wonderful experience to see the faces of the students while generating their own mandalas.

In that case all mandalas were empty, ready to be printed and coloured as my 7 years old daughter does. In this experiment I colour them. These are the changes I have done to my  previous code:

  • Remove external segments which intersects the boundary of the enclosing
  • Convert the tesselation into a list of polygons with tile.list function
  • Use colourlovers package to fill the polygons with beautiful colour palettes

This is an example of the result:

Changing three simple parameters (iter, points and radius) you can obtain completely different images (clicking on any image you can see its full size version):

You can find details of these parameters in my previous post. I cannot resist to place more examples:

You can find the code here. Enjoy.


Mathematics is a place where you can do things which you can’t do in the real world (Marcus Du Sautoy, mathematician)

From time to time I have a look to some of my previous posts: it’s like seeing them through another’s eyes. One of my first posts was this one, where I draw fractals using the Multiple Reduction Copy Machine (MRCM) algorithm. That time I was not clever enough to write an efficient code able generate deep fractals. Now I am pretty sure I could do it using ggplot and I started to do it when I come across with the idea of mixing this kind of fractal patterns with Voronoi tessellations, that I have explored in some of my previous posts, like this one. Mixing both techniques, the mandalas appeared.

I will not explain in depth the mathematics behind this patterns. I will just give a brief explanation:

  • I start obtaining n equidistant points in a unit circle centered in (0,0)
  • I repeat the process with all these points, obtaining again n points around each of them; the radius is scaled by a factor
  • I discard the previous (parent) n points

I repeat these steps iteratively. If I start with n points and iterate k times, at the end I obtain nk points. After that, I calculate the Voronoi tesselation of them, which I represent with ggplot.

This is an example:

Some others:

You can find the code here. Enjoy it.

Tiny Art in Less Than 280 Characters

Now that Twitter allows 280 characters, the code of some drawings I have made can fit in a tweet. In this post I have compiled a few of them.

The first one is a cardioid inspired in string art (more info here):

ggplot(df,aes(x,y,xend=x2,yend=y2)) +

This other is based on Fermat’s spiral (more info here):

t=seq(from=0, to=100*pi, length.out=500*100)
data.frame(x= t^(1/2)*cos(t), y= t^(1/2)*sin(t))%>%
rbind(-.)%>%ggplot(aes(x, y))+geom_polygon()+theme_void()

A recurrence plot of Gauss error function (more info here):

seq(-5*pi,5*pi,by=.1)%>%expand.grid(x=., y=.)%>%
ggplot(aes(x=x, y=y, fill=erf(sec(x)-sec(y))))+geom_tile()+

A x-y scatter plot of a trigonometric function on R2 (more info here):

seq(from=-10, to=10, by = 0.05) %>%
expand.grid(x=., y=.) %>%
ggplot(aes(x=(x+pi*sin(y)), y=(y+pi*sin(x)))) +
geom_point(alpha=.1, shape=20, size=1, color="black")+

A turtle graphic (more info here):

  for (i in 1:150) {

A curve generated by a simulated harmonograph (more info here):

t=seq(1, 100, by=.001)
type="l", axes=FALSE, xlab="", ylab="")

A chord diagram of a 20×20 1-matrix (more info here):

chordDiagram(matrix(1, 20, 20), symmetric = TRUE, 
col="black", transparency = 0.85, annotationTrack = NULL)

Most of them are made with ggplot2 package. I love R and the sense of wonder of how just one or two lines of code can create beautiful and unexpected patterns.

I recently did this project for DataCamp to show how easy is to do art with R and ggplot. Starting from a extremely simple plot, and following a well guided path, you can end making beautiful images like this one:

Furthermore, you can learn also ggplot2 while you do art.

I have done the project together with Rasmus Bååth, instructor at DataCamp and the perfect mate to work with. He is looking for people to build more projects so if you are interested, here you can find more information. Do not hesitate to ask him for details.

All the best for 2018.

Merry Christmas.

Drawing 10 Million Points With ggplot: Clifford Attractors

For me, mathematics cultivates a perpetual state of wonder about the nature of mind, the limits of thoughts, and our place in this vast cosmos (Clifford A. Pickover – The Math Book: From Pythagoras to the 57th Dimension, 250 Milestones in the History of Mathematics)

I am a big fan of Clifford Pickover and I find inspiration in his books very often. Thanks to him, I discovered the harmonograph and the Parrondo’s paradox, among many other mathematical treasures. Apart of being a great teacher, he also invented a family of strange attractors wearing his name. Clifford attractors are defined by these equations:

x_{n+1}\, =\, sin(a\, y_{n})\, +\, c\, cos(a\, x_{n})  \\  y_{n+1}\, =\, sin(b\, x_{n})\, +\, d\, cos(b\, y_{n})  \\

There are infinite attractors, since a, b, c and d are parameters. Given four values (one for each parameter) and a starting point (x0, y0), the previous equation defines the exact location of the point at step n, which is defined just by its location at n-1; an attractor can be thought as the trajectory described by a particle. This plot shows the evolution of a particle starting at (x0, y0)=(0, 0) with parameters a=-1.24458046630025, b=-1.25191834103316, c=-1.81590817030519 and d=-1.90866735205054 along 10 million of steps:

Changing parameters is really entertaining. Drawings have a sandy appearance:

From a technical point of view, the challenge is creating a data frame with all locations, since it must have 10 milion rows and must be populated sequentially. A very fast way to do it is using Rcpp package. To render the plot I use ggplot, which works quite well. Here you have the code to play with Clifford Attractors if you want:


opt = theme(legend.position  = "none",
            panel.background = element_rect(fill="white"),
            axis.ticks       = element_blank(),
            panel.grid       = element_blank(),
            axis.title       = element_blank(),
            axis.text        = element_blank())

cppFunction('DataFrame createTrajectory(int n, double x0, double y0, 
            double a, double b, double c, double d) {
            // create the columns
            NumericVector x(n);
            NumericVector y(n);
            for(int i = 1; i < n; ++i) {
            x[i] = sin(a*y[i-1])+c*cos(a*x[i-1]);
            y[i] = sin(b*x[i-1])+d*cos(b*y[i-1]);
            // return a new data frame
            return DataFrame::create(_["x"]= x, _["y"]= y);


df=createTrajectory(10000000, 0, 0, a, b, c, d)

png("Clifford.png", units="px", width=1600, height=1600, res=300)
ggplot(df, aes(x, y)) + geom_point(color="black", shape=46, alpha=.01) + opt


Blue dragonflies dart to and fro
I tie my life to your balloon and let it go
(Warm Foothills, Alt-J)

In my last post I did some drawings based on L-Systems. These drawings are done sequentially. At any step, the state of the drawing can be described by the position (coordinates) and the orientation of the pencil. In that case I only used two kind of operators: drawing a straight line and turning a constant angle. Today I used two more symbols to do stack operations:

  • “[“ Push the current state (position and orientation) of the pencil onto a pushdown
    operations stack
  • “]” Pop a state from the stack and make it the current state of the pencil (no line is drawn)

These operators allow to return to a previous state to continue drawing from there. Using them you can draw plants like these:

Each image corresponds to a different axiom, rules, angle and depth. I described these terms in my previous post. If you want to reproduce them you can find the code below (each image corresponds to a different set of axiom, rules, angle and depth parameters). Change colors, add noise to angles, try your own plants … I am sure you will find nice images:


#Plant 1

#Plant 2
rules=list("X"="F[+X][-X]FX", "F"="FF")

#Plant 3
rules=list("X"="F[+X]F[-X]+X", "F"="FF")

#Plant 4
rules=list("X"="F-[[X]+X]+F[+FX]-X", "F"="FF")

#Plant 5

#Plant 6

for (i in 1:depth) axiom=gsubfn(".", rules, axiom)

actions=str_extract_all(axiom, "\\d*\\+|\\d*\\-|F|L|R|\\[|\\]|\\|") %>% unlist

status=data.frame(x=numeric(0), y=numeric(0), alfa=numeric(0))
points=data.frame(x1 = 0, y1 = 0, x2 = NA, y2 = NA, alfa=90, depth=1)

for (action in actions) 
  if (action=="F")
    x=points[1, "x1"]+cos(points[1, "alfa"]*(pi/180))
    y=points[1, "y1"]+sin(points[1, "alfa"]*(pi/180))
    data.frame(x1 = x, y1 = y, x2 = NA, y2 = NA, 
               alfa=points[1, "alfa"],
               depth=points[1,"depth"]) %>% rbind(points)->points
  if (action %in% c("+", "-")){
    alfa=points[1, "alfa"]
    points[1, "alfa"]=eval(parse(text=paste0("alfa",action, angle)))
    data.frame(x=points[1, "x1"], y=points[1, "y1"], alfa=points[1, "alfa"]) %>% 
      rbind(status) -> status
    points[1, "depth"]=points[1, "depth"]+1
    depth=points[1, "depth"]
    data.frame(x1=status[1, "x"], y1=status[1, "y"], x2=NA, y2=NA, 
               alfa=status[1, "alfa"],
               depth=depth-1) %>% 
      rbind(points) -> points

ggplot() + 
  geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), 
               lineend = "round", 
               data=na.omit(points)) + 
  coord_fixed(ratio = 1) +
        panel.background = element_rect(fill="black"),