The Goldbach’s Comet

Every even integer greater than 2 can be expressed as the sum of two primes (Christian Goldbach, 1742)

The point cloud known as Goldbach’s Comet represents the amount of different ways (y axis) an even number (x axis) can be writen as sum of two prime numbers. In this plot, x axis is between 2 and 50.000 (stars are just ornaments):

Goldbach

Mathematicians are still waiting for a proof of this conjecture. This is the code for drawing this plot:

require(schoolmath)
require(utils)
library(plyr)
n=50000
data(primlist)
primes=as.data.frame(t(combn(primlist[primlist>2 & primlist<n-2], 2)))
primes$V3=primes$V1+primes$V2
primes2=count(primes, "V3")
primes2=primes2[primes2$V3<=n,]
stars=cbind(runif(50, min=-n*0.05, max=n), runif(200, min=-n*0.001, max=max(primes2$freq)))
plot.new()
par(mai = rep(0, 4), bg = "gray12")
plot(NA,type="n", xlim=c(-n*0.05,n), xaxs="i", ylim=c(-n*0.001,max(primes2$freq)))
points(stars, col = "blue4", cex=.7, pch=16)
points(stars, col = "blue", cex=.3, pch=16)
points(stars, col = "gray75", cex=.1, pch=16)
apply(primes2, 1, function(x) points(x=x[1],y=x[2], col = if (runif(1)&gt;x[1]/n) {"white"} else {sample(colours(),1)}, cex=.1, pch=16))

The Ikeda’s Galaxy

Chaos is the score upon which reality is written (Henry Miller)

Nonlinear dynamical systems are an enormous seam of amazing images. The Ikeda Map is an example of strange attractor which represents the movement of particles under the rules of certain differential equations.

I have drawn the trajectories followed by of 200 particles under the 2D-Ikeda Map with the same tecnique I used in this previous post, resulting this nice galaxy:

ikeda0.918

Wold you like to create your own galaxies? Here you have the code:

u=0.918 #Parameter between 0 and 1
n=200 #Number of particles
m=40 #Number of iterations
ikeda=data.frame(it=1,x1=runif(n, min = -40, max = 40), y1=runif(n, min = -40, max = 40))
ikeda$x2=1+u*(ikeda$x1*cos(0.4-6/(1+ikeda$x1^2+ikeda$y1^2))-ikeda$y1*sin(0.4-6/(1+ikeda$x1^2+ikeda$y1^2)))
ikeda$y2=  u*(ikeda$x1*sin(0.4-6/(1+ikeda$x1^2+ikeda$y1^2))+ikeda$y1*cos(0.4-6/(1+ikeda$x1^2+ikeda$y1^2)))
for (k in 1:m)
{
df=as.data.frame(cbind(rep(k+1,n),
ikeda[ikeda$it==k,]$x2,
ikeda[ikeda$it==k,]$y2,
1+u*(ikeda[ikeda$it==k,]$x2*cos(0.4-6/(1+ikeda[ikeda$it==k,]$x2^2+ikeda[ikeda$it==k,]$y2^2))-ikeda[ikeda$it==k,]$y2*sin(0.4-6/(1+ikeda[ikeda$it==k,]$x2^2+ikeda[ikeda$it==k,]$y2^2))),
u*(ikeda[ikeda$it==k,]$x2*sin(0.4-6/(1+ikeda[ikeda$it==k,]$x2^2+ikeda[ikeda$it==k,]$y2^2))+ikeda[ikeda$it==k,]$y2*cos(0.4-6/(1+ikeda[ikeda$it==k,]$x2^2+ikeda[ikeda$it==k,]$y2^2)))))
names(df)=names(ikeda)
ikeda=rbind(df, ikeda)
}
plot.new()
par(mai = rep(0, 4), bg = "gray12")
plot(c(0,0),type="n", xlim=c(-35, 35), ylim=c(-35,35))
apply(ikeda, 1, function(x) lines(x=c(x[2],x[4]), y=c(x[3],x[5]), col = paste("gray", as.character(min(round(jitter(x[1]*80/(m-1)+(20*m-100)/(m-1), amount=5)), 100)), sep = ""), lwd=0.1))

The Gilbreath’s Conjecture

317 is a prime, not because we think so, or because our minds are shaped in one way rather than another, but because it is so, because mathematical reality is built that way (G.H. Hardy)

In 1958, the mathematician and magician Norman L. Gilbreath presented a disconcerting hypothesis conceived in the back of a napkin. Gilbreath wrote first prime numbers in a row. In the next rows, he wrote the difference  in absolute value of consecutive values of previous row. Beginning with first 20 primes in the first row, he obtained something like this:Gilbreath01

The conjecture is easy: except for the first one, all elements of the first column are 1. So far, no one has demonstrated this hypothesis. In fact, according to mathematician Richard Guy, it seems unlikely that we will see a demonstration of Gilbreath’s conjecture in the near future, though probably this conjecture is true.

In the previous chart, I coloured zeros in white, ones in violet and rest of numbers in gold. The conjecture says that except for the first element, the first column is entirely violet. Following you can see the coloured chart for first 20, 40, 60 and 80 prime numbers:

Gilbreath02

It is nice how zeros create triangular patterns similar to patterns created by cellular automata. This is the chart for first 200 primes:Gilbreath03

How much time will it take to demonstrate this simple conjecture? Who knows. Meanwhile, you can draw triangles with this code:

library(ggplot2)
create.gilbreath=function(n)
{
require(reshape)
require(schoolmath)
data(primlist)
gilbreath=t(matrix(primlist[2:(n+1)], n, n))
for (i in 2:n) {gilbreath[i,]=c(eval(parse(text=paste(c(paste("abs(diff(",collapse=""), "gilbreath[i-1,]", paste("))",collapse="")),collapse=""))),rep(NA,1))}
na.omit(melt(t(gilbreath)))
}
opt=theme(legend.position="none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
axis.text =element_blank())
gilbreath=create.gilbreath(20)
gilbreath$value1=cut(gilbreath$value, breaks=c(-Inf,1:2,Inf), right = FALSE)
ggplot(gilbreath, aes(x=X1, y=X2)) +
geom_tile(aes(fill = value1), colour="grey") +
scale_fill_manual(values = c("white", "darkviolet", "gold"))+
geom_text(label=gilbreath$value, size=8)+
scale_y_reverse()+opt

The Three Little Pigs

Jesse, you asked me if I was in the meth business or the money business. Neither. I’m in the empire business (Walter White in Breaking Bad)

The game of pig has simple rules but complex strategies. It was described for the first time in 1945  by a magician called John Scarne. Playing the pig game is easy: each turn, a player repeatedly rolls a die until either a 1 is rolled or the player decides to hold:

  • If the player rolls a 1, they score nothing and it becomes the next player’s turn
  • If the player rolls any other number, it is added to their turn total and the player’s turn continues
  • If a player chooses to hold, their turn total is added to their score, and it becomes the next player’s turn

The first player who reach at least 100 points is the winner. For example: you obtain a 3 and then decide to roll again, obtaining a 1. Your score is zero in this turn. Next player gets the sequence 3-4-6 and decides to hold, obtaining a score of 13 points in this turn.

Despite its simplicity, the pig game has a very complex and against-intuition optimal strategy. It was calculated in 2004 by Todd W. Neller and Clifton Presser from Gettysburg College of Pennsilvania with the help of computers.

To illustrate the game, I simulated three players (pigs) playing the pig game with three different strategies:

  • The Coward pig, who only rolls the die a small number of times in every turn
  • The Risky pig, who rolls the die a more times than the coward one
  • The Ambitious pig, who tries to obtain in every turn more points than two others

I simulated several scenarios.

  • Some favorable scenarios for Coward pig:

pigs2Coward pigs4Coward

In first scenario, the Coward pig rolls the die between 1 and 5 times each round and wins if the Risky pig asumes an excessive level of risk (rolling each time between 10 and 15 times). Trying to obtain more than the Coward is a bad option for the Ambitious pig. Simulating this scenario 100 times gives victory to Coward a 51% of times (25% to Risky and 24% to Ambitious).

Second scenario puts closer Coward and Risky pigs (first one rolls the die between 4 and 7 times  each round and second one between 6 and 9 times). Coward wins 54% of times (34% Risky and only 12% Ambitious).

Being coward seems to be a good strategy when you play against a reckless or when you are just a bit more conservative than a Risky one.

  • Some favorable scenarios for Risky pig:

pigs5RiskyPigs1Risky

Rolling the die between 4 and 6 times each round seems to be a good option, even more when you are playing against a extremely conservative player who rolls no more than 3 times each time. Simulating 100 times these previous scenarios gives victory to Risky pig a 58% of times in first the case in which Coward rolls allways 1 and Risky 6 times each round (0% for Coward and only 42% form Ambitious) and 66% of times in the second one (only 5% to Coward and 29% to Ambitious).

Being Risky is a good strategy when you play against a chicken.

  • Some favorable scenarios for Ambitious pig:

pigs3Ambitious pigs6ambitious

The Ambitious pig wins when two others turn into extremely coward and risky pigs as can be seen in the first scenario in which Ambitious wins 65% of the times (31% for Coward and 4% for Risky). Ambitious pig also wins when two others get closer and hit the die a small number of times (2 rolls the Coward and 4 rolls the Risky). In this scenario the Ambitious wins 58% of times (5% for Coward and 37% for Risky). By the way, these two scenarios sound very unreal.

Being ambitious seems to be dangerous but works well when you play against a crazy and a chicken or against very conservative players.

From my point of view, this is a good example to experiment with simulations, game strategies and xkcd style graphics.

The code:

require(ggplot2)
require(extrafont)
#Number of hits for Coward
CowardLower=2
CowardUpper=2
#Number of hits for Risky
RiskyLower=4
RiskyUpper=4
game=data.frame(ROUND=0, part.p1=0, part.p2=0, part.p3=0, Coward=0, Risky=0, Ambitious=0)
while(max(game$Coward)<100 & max(game$Risky)<100 & max(game$Ambitious)<100)
{
#Coward Little Pig
p1=sample(1:6,sample(CowardLower:CowardUpper,1), replace=TRUE)
s1=min(min(p1-1),1)*sum(p1)
#Risky Little Pig
p2=sample(1:6,sample(RiskyLower:RiskyUpper,1), replace=TRUE)
s2=min(min(p2-1),1)*sum(p2)
#Ambitious Little Pig
s3=0
repeat {
p3=sample(1:6,1)
s3=(p3+s3)*min(min(p3-1),1)
if (p3==1|s3>max(s1,s2)) break
}
game[nrow(game)+1,]=c(max(game$ROUND)+1,s1,s2,s3,max(game$Coward)+s1,max(game$Risky)+s2,max(game$Ambitious)+s3)
}
opts=theme(
panel.background = element_rect(fill="darkolivegreen1"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid.major = element_line(colour="white", linetype = 1),
panel.grid.minor = element_blank(),
axis.text.y = element_text(colour="black"),
axis.text.x = element_text(colour="black"),
text = element_text(size=25, family="xkcd"),
legend.key = element_blank(),
legend.position = c(.2,.75),
legend.background = element_blank(),
plot.title = element_text(size = 50)
)
ggplot(game, mapping=aes(x=game$ROUND, y=game$Coward)) +
geom_line(color="red", size=1.5) +
geom_line(aes(x=game$ROUND, y=game$Risky), color="blue", size=1.5) +
geom_line(aes(x=game$ROUND, y=game$Ambitious), color="green4", size=1.5) +
geom_point(aes(x=game$ROUND, y=game$Coward, colour="c1"), size=5.5) +
geom_point(aes(x=game$ROUND, y=game$Risky, colour="c2"), size=5.5) +
geom_point(aes(x=game$ROUND, y=game$Ambitious, colour="c3"), size=5.5) +
ggtitle("THE THREE LITTLE PIGS") +
xlab("ROUND") + ylab("SCORING") +
geom_text(aes(max(game$ROUND), max(max(game$Coward, game$Risky, game$Ambitious)), hjust=1.2, family="xkcd", label="WINNER!"), size=10)+
geom_hline(yintercept=100, linetype=2, size=1)+
scale_y_continuous(breaks=seq(0, max(max(game$Coward, game$Risky, game$Ambitious))+10, 10))+
scale_x_continuous(breaks=seq(0, max(game$ROUND), 1))+
scale_colour_manual("",
labels = c(paste("Coward: ", CowardLower, "-", CowardUpper, " hits", sep = ""), paste("Risky: ", RiskyLower, "-", RiskyUpper, " hits", sep = ""), "Ambitious"),
breaks = c("c1", "c2", "c3"),
values = c("red", "blue", "green4"))+ opts

floweR

It is the time you have wasted for your rose that makes your rose so important (Antoine de Saint-Exupéry, The Little Prince)

Yesterday I found a package called circular and I could not suppress to do this:

rose

Make your own floweRs:

require(circular)
for (i in 1:8)
{
  rose.diag(circular(runif(5000, 0, 2*pi)), 
            bins = (25-round(abs(jitter(0, amount=2*i)))), 
            axes=FALSE,
            border=rgb(255,50+204*((i-1)/8),50+204*((i-1)/8), alpha=255, max=255),
            ticks = FALSE,
            col=rgb(250,0+204*((i-1)/8),0+204*((i-1)/8), alpha=255, max=255), 
            control.circle=circle.control(lty=0), 
            shrink=0.22+(i-1)*(2-0.22)/8)
  par(new=TRUE)  
}
for (i in 1:150)
  {
  q = runif(1)*pi*2
  r = sqrt(runif(1))
  x = (0.18*r)* cos(q)
  y = (0.18*r)* sin(q)
  points(x, y, 
         col = rgb(255 ,sample(80:120, 1), 0, alpha= 95, max=255),
         bg = rgb(255 ,sample(100:200, 1), 0, alpha= 95, max=255), pch=21, cex=2)
  }

How Do Cities Feel?

If you are lost and feel alone, circumnavigate the globe (For You, Coldplay)

You can not consider yourself a R-blogger until you do an analysis of Twitter using twitteR package. Everybody knows it. So here I go.

Inspired by the fabulous work of Jonathan Harris I decided to compare human emotions of people living (or twittering in this case) in different cities. My plan was analysing tweets generated in different locations of USA and UK with one thing in common: all of them must contain the string “I FEEL”. These are the main steps I followed:

  • Locate cities I want to analyze using world cities database of maps package
  • Download tweets around these locations using searchTwitter function of twitteR package.
  • Cross tweets with positive and negative lists of words and calculate a simple scoring for each tweet as number of positive words – number of negative words
  • Calculate how many tweets have non-zero scoring; since these tweets put into words some emotion I call them sentimental tweets
  • Represent cities in a bubble chart where x-axis is percentage of sentimental tweets, y-axis is average scoring and size of bubble is population

This is the result of my experiment:HowDoCitiesFeel3

These are my conclusions (please, do not take it seriously):

  • USA cities seem to have better vibrations and are more sentimental than UK ones
  • Capital city is the happiest one for both countries
  • San Francisco (USA) is the most sentimental city of the analysis; on the other hand, Liverpool (UK) is the coldest one
  • The more sentimental, the better vibrations

From my point of view, this analysis has some important limitations:

  • It strongly depends on particular events (i.e. local football team wins the championship)
  • I have no idea of what kind of people is behind tweets
  • According to my experience, searchTwitter only works well for a small number of searches (no more than 300); for larger number of tweets to return, it use to give malformed JSON response error from server

Anyway, I hope it will serve as starting point of some other analysis in the future. At least, I learned interesting things about R doing it.

Here you have the code:

library(twitteR)
library(RCurl)
library(maps)
library(plyr)
library(stringr)
library(bitops)
library(scales)
#Register
if (!file.exists('cacert.perm'))
{
  download.file(url = 'http://curl.haxx.se/ca/cacert.pem', destfile='cacert.perm')
}
requestURL="https://api.twitter.com/oauth/request_token"
accessURL="https://api.twitter.com/oauth/access_token"
authURL="https://api.twitter.com/oauth/authorize"
consumerKey = "YOUR CONSUMER KEY HERE"
consumerSecret = "YOUR CONSUMER SECRET HERE"
Cred <- OAuthFactory$new(consumerKey=consumerKey,
                         consumerSecret=consumerSecret,
                         requestURL=requestURL,
                         accessURL=accessURL,
                         authURL=authURL)
Cred$handshake(cainfo=system.file("CurlSSL", "cacert.pem", package="RCurl"))
#Save credentials
save(Cred, file="twitter authentification.Rdata")
load("twitter authentification.Rdata")
registerTwitterOAuth(Cred)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
#Cities to analyze
cities=data.frame(
  CITY=c('Edinburgh', 'London', 'Glasgow', 'Birmingham', 'Liverpool', 'Manchester',
         'New York', 'Washington', 'Las Vegas', 'San Francisco', 'Chicago','Los Angeles'),
  COUNTRY=c("UK", "UK", "UK", "UK", "UK", "UK", "USA", "USA", "USA", "USA", "USA", "USA"))
data(world.cities)
cities2=world.cities[which(!is.na(match(
str_trim(paste(world.cities$name, world.cities$country.etc, sep=",")),
str_trim(paste(cities$CITY, cities$COUNTRY, sep=","))
))),]
cities2$SEARCH=paste(cities2$lat, cities2$long, "10mi", sep = ",")
cities2$CITY=cities2$name
#Download tweets
tweets=data.frame()
for (i in 1:nrow(cities2))
{
  tw=searchTwitter("I FEEL", n=400, geocode=cities2[i,]$SEARCH)
  tweets=rbind(merge(cities[i,], twListToDF(tw),all=TRUE), tweets)
}
#Save tweets
write.csv(tweets, file="tweets.csv", row.names=FALSE)
#Import csv file
city.tweets=read.csv("tweets.csv")
#Download lexicon from http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
hu.liu.pos = scan('lexicon/positive-words.txt',  what='character', comment.char=';')
hu.liu.neg = scan('lexicon/negative-words.txt',  what='character', comment.char=';')
#Function to clean and score tweets
score.sentiment=function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)
  require(stringr)
  scores=laply(sentences, function(sentence, pos.word, neg.words) {
    sentence=gsub('[[:punct:]]','',sentence)
    sentence=gsub('[[:cntrl:]]','',sentence)
    sentence=gsub('\\d+','',sentence)
    sentence=tolower(sentence)
    word.list=str_split(sentence, '\\s+')
    words=unlist(word.list)
    pos.matches=match(words, pos.words)
    neg.matches=match(words, neg.words)
    pos.matches=!is.na(pos.matches)
    neg.matches=!is.na(neg.matches)
    score=sum(pos.matches) - sum(neg.matches)
    return(score)
  }, pos.words, neg.words, .progress=.progress)
  scores.df=data.frame(score=scores, text=sentences)
  return(scores.df)
}
cities.scores=score.sentiment(city.tweets[1:nrow(city.tweets),], hu.liu.pos, hu.liu.neg, .progress='text')
cities.scores$pos2=apply(cities.scores, 1, function(x) regexpr(",",x[2])[1]-1)
cities.scores$CITY=apply(cities.scores, 1, function(x) substr(x[2], 1, x[3]))
cities.scores=merge(x=cities.scores, y=cities, by='CITY')
df1=aggregate(cities.scores["score"], by=cities.scores[c("CITY")], FUN=length)
names(df1)=c("CITY", "TWEETS")
cities.scores2=cities.scores[abs(cities.scores$score)>0,]
df2=aggregate(cities.scores2["score"], by=cities.scores2[c("CITY")], FUN=length)
names(df2)=c("CITY", "TWEETS.SENT")
df3=aggregate(cities.scores2["score"], by=cities.scores2[c("CITY")], FUN=mean)
names(df3)=c("CITY", "TWEETS.SENT.SCORING")
#Data frame with results
df.result=join_all(list(df1,df2,df3,cities2), by = 'CITY', type='full')
#Plot results
radius <- sqrt(df.result$pop/pi)
symbols(100*df.result$TWEETS.SENT/df.result$TWEETS, df.result$TWEETS.SENT.SCORING, circles=radius,
        inches=0.85, fg="white", bg="gold", xlab="Sentimental Tweets", ylab="Scoring Of Sentimental Tweets (Average)",
        main="How Do Cities Feel?")
text(100*df.result$TWEETS.SENT/df.result$TWEETS, df.result$TWEETS.SENT.SCORING, paste(df.result$CITY, df.result$country.etc, sep="-"), cex=1, col="gray50")

Shakespeare Is More Monkey-Friendly Than Cervantes

Ford, there is an infinite number of monkeys outside who want to talk to us about this script for Hamlet they have worked out (from Episode 2 of The Hitchhiker’s Guide to the Galaxy by Douglas Adams)

Some days ago I was talking with a friend about the infinite monkey theorem which is a funny interpretation of what thinking-in-infinite can produce. The same day, in my weekly English class, my teacher said that Anglo-saxon words do tend to be short, very often monosyllabic such as function words such as to, of, from etc and everyday words such as go, see run, eat, etc.

Both things made me think that a monkey could have easier to type a Shakespeare text rather than a Cervantes one. I cannot imagine a definitive way to demonstrate this but this experiment support my hypothesis. After simulating random words of 2, 3, 4 and 5 characters I look for them in English(1) and Spanish(2) dictionaries, which I previously downloaded from here. Result: I find more random words in the English one. These are the results of my experiment:monkey_typewriter

For example, around 38% of two-chars words match with English dictionary and only 9% with Spanish one. This is why I think that, in the infinite, I would be easier for a monkey to replicate a Shakespeare text than a Cervantes one.

Here you have the code:

library(ggplot2)
library(scales)
esp.dic=data.frame(LANG="ESP", WORD=readLines("ES.dic"))
eng.dic=data.frame(LANG="ENG", WORD=readLines("UK.dic"))
df.lang=do.call("rbind", list(esp.dic, eng.dic))
df.lang$WORD=tolower(iconv(df.lang$WORD, to="ASCII//TRANSLIT"))
df.lang=unique(df.lang)
results=data.frame(LANG=character(0), OCCURRENCES=numeric(0), SIZE=numeric(0), LENGTH=numeric(0))
for (i in 2:5)
{
df.monkey=data.frame(WORD=replicate(20000, paste(sample(c(letters), i, replace = TRUE), collapse='')))
results=rbind(results, data.frame(setNames(aggregate(WORD ~ ., data = merge(df.lang, df.monkey, by="WORD"), FUN=length), c("LANG","OCCURRENCES")), SIZE=20000, LENGTH=i))
}
opt=theme(panel.background = element_rect(fill="gray92"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(color="white", size=1.5),
plot.title = element_text(size = 35),
axis.title = element_text(size = 20, color="gray35"),
axis.text = element_text(size=16),
axis.ticks = element_blank(),
axis.line = element_line(colour = "white"))
ggplot(data=results, aes(x=LENGTH, y=OCCURRENCES/SIZE, colour=LANG))+
geom_line(size = 2)+
scale_colour_discrete(guide = FALSE) +
geom_point(aes(fill=LANG),size=10, colour="gray92",pch=21)+
scale_x_continuous("word length", labels=c("two chars", "three chars", "four chars", "five chars"))+
scale_y_continuous("probability of existence", limits=c(0, 0.4), labels = percent)+
labs(title = "What if you put a monkey in front of a typewriter?")+
opt + scale_fill_discrete(name="Dictionary", breaks=c("ESP", "ENG"), labels=c("Spanish", "English"))

(1) The English dictionary was originally compiled from public domain sources
for the amSpell spell-checker by Erik Frambach e-mail: e.h.m.frambach@eco.rug.nl
(2) The Spanish dictionary has been elaborated by Juan L. Varona, Dpto. de Matematicas y Computacion, Universidad de La Rioja, Calle Luis de Ulloa s/n, 26004 SPAIN e-mail: jvarona@siur.unirioja.es

The mnemoneitoR

AND I HAVE A GREAT REJOICING DAY (mnemonic rule generated by mnemoneitoR for first 7 digits of Pi according to The Wonderful Wizard Of Oz)

Is there some number impossible to memorize? Do not worry, here comes mnemoneitoR: the tool that you was always looking for! With mnemoneitoR you can translate any number into an easy-to-remember phrase inspired by your favorite book. It is very easy: choose a book, enter the number and mnemoneitoR will show you as many possibilities as you want. Just choose the one you like most!

There are many webs about mnemonics in the Internet, like this one. One of my favourite menmonic devices for Pi is:

HOW I WANT A DRINK, ALCOHOLIC OF COURSE, AFTER THE HEAVY LECTURES INVOLVING QUANTUM MECHANICS

The number of letters in each word gives the respective number in the sequence (i.e., 3.14159265358979).

For professional purposes, I am learning how to manage texts in R and I discovered a very useful package called stringr. This is the only one I need for this experiment. The process is simple: I download a book from Project Gutenberg site, clean and split the text and do simulations on the fly of a Markov Chain generated from the words of the book. Step by step:

  • Downloading the book is quite simple. You search the one you want, copy the url in the code (after line “CHOOSE YOUR FAVORITE BOOK HERE”) and no more.
  • After loading the text, some easy tasks are needed: remove header and footer lines, split text into words, turn them into uppercase, remove non-text characters … typical things working with texts.
  • After reading the number you want to translate, I choose a word sampling along all words with the same number of letters as the first digit with probability equal to the number of appearances. This is how I initialize the phrase. Next word are chose among the set of words which are preceded by the first one and have the same number of letters as the second digit with probability equal to number of appearances, and so on. This is a simulation on the fly of Markov Chain because I do not have to calculate the chain explicitly.
  • I always translate Zero with the same word you choose. I like using “OZ” instead Zero.

Most of the phrases do not have any sense but are quite funny. Few of them have some sense and maybe with a small tweak, can change into full of meaning sentences. Here you have some samples of the output of mnemoneitoR:

mnemoneitoR

I like how the phrases smell like the original book. I will try to improve mnemoneitoR in the future but I can imagine some uses of this current version: message generator for fortune cookies,  a cool way to translate your telephone number into a sentence …

Here you have the code. If you discover nice outputs in your experiments, please let me know:

library(stringr)
# CHOOSE YOUR FAVORITE BOOK HERE (Currently "The Wonderful Wizard of Oz")
TEXTFILE = "data/pg55.txt"
if (!file.exists(TEXTFILE)) {download.file("http://www.gutenberg.org/cache/epub/55/pg55.txt", destfile = TEXTFILE)}
textfile <- readLines(TEXTFILE)
# Remove header and footer, concatenate all of the lines, remove non-text and double spaces chars and to upper
textfile = textfile[(grep('START OF THIS PROJECT', textfile, value=FALSE)+1:grep('END OF THIS PROJECT', textfile, value=FALSE)-1)]
textfile <- paste(textfile, collapse = " ")
textfile <- gsub("[^a-zA-Z ]","", textfile)
textfile <- toupper(textfile)
textfile <- gsub("^ *|(?<= ) | *$", "", textfile, perl=T)
# Split file into words
textfile.words <- strsplit(textfile," ")
textfile.words.freq <- as.data.frame(table(textfile.words));
names(textfile.words.freq) <- c("word", "freq")
textfile.words.freq$length <- apply(data.frame(textfile.words.freq[,c("word")]), 1, function(x) nchar(x))
# ENTER YOUR NUMBER HERE!!!!!!
number <- 3.1415926
number <- gsub("[^0-9]","", as.character(number))
# Define the word representing Zero
zero.word = "OZ"
fg <- as.integer(substr(number, 1, 1))
df <- textfile.words.freq[textfile.words.freq$length==fg,]
wd <- sample(df$word, size=1, prob=df$freq)
phrase <- c(as.character(wd))
for (j in 2:nchar(number))
{
fg <- as.integer(substr(number, j, j)) if (fg>0)
{
lc <- as.data.frame(str_locate_all(textfile, as.vector(paste(wd, " ", sep = ""))))
lc$char <- apply(lc, 1, function(x) substr(textfile, as.integer(x[2])+1+fg, as.integer(x[2])+1+fg))
fq <- as.data.frame(table(apply(lc[lc$char==" ",], 1, function(x) substr(textfile, as.integer(x[2])+1, as.integer(x[2])+fg))))
if (nrow(fq)==0) fq <- data.frame(word= character(0), freq= integer(0))
names(fq) <- c("word", "freq")
fq$length <- apply(fq, 1, function(x) nchar(gsub(" ","", x[1])))
fq <- fq[fq$length==fg,]
wd <- if(nrow(fq)>0) sample(fq$word, size=1, prob=fq$freq)
else
{
df <- textfile.words.freq[textfile.words.freq$length==fg,]
wd <- sample(df$word, size=1, prob=df$freq)
}
}
else wd <- zero.word
phrase <- c(phrase, as.character(wd))
}
print(paste(phrase, collapse = " "))

What If You Dig A Hole Through The Earth?

It suddenly struck me that that tiny pea, pretty and blue, was the Earth. I put up my thumb and shut one eye, and my thumb blotted out the planet Earth. I didn’t feel like a giant. I felt very, very small (Neil Armstrong)

Where would you come out if you dig a hole straight downward from where you live through the center of the Earth? Supposing you survive to the extremely high pressure and temperature of the nucleus, would you find water or land at the other side? It maybe sound a ridiculous question (I will not refute that) but in this post I will estimate how many people would die drowned and how many will find land in the antipode of where they live. At least, knowledge does not take up any space.

I found a database of the United Nations with very useful information for my experiment: longitude, latitude and population of all capital cities of the world in 2011(1). I assumed that capital cities are a good sample of where people live. Maybe is not the best one since some very big countries are represented by only a city but is a good way to obtain a quick estimation. On the other hand, capital cities represent approximately 7% of the world population so in this sense is a very good sample.

Process is simple: loading the xls file, calculating the antipode of each point and checking where it is. Google provides information about country where a coordinate belongs. For coordinates on the sea, no information is returned. Once you have this, is easy to calculate proportion of people that will find water. My estimation is around 77% of people will find water on the other side. Taking into account that all people leave from land and approximately 70% of the Earth’s surface is water, this figure seems to be small but since both poles are symmetrical and are uninhabited, the estimation makes sense. Here you have a map with the result of the experimet. Points are capital cities and size is related with population. In blue, capitals with antipode on the sea and in brown, capitals with antipode in land:
WorldMapR
By the way, I am one of the 23% of lucky people that would find land in the other side. I live in Madrid, Spain:
madrid
An if some rainy afternoon having little to do I dig a hole through the Earth, I will appear in a place called Weber, in New Zealand:
weber
This estimation can be very silly but the physics involved in the experiment are very interesting as you can see here. By the way, there is a film called Total Recall (2012) where the only way to travel between last two cities in the world is using an elevator through the Earth. Here you have the code:

library(xlsx)
library(ggmap)
library(mapdata)
library(ggplot2)
#The xls file is in http://esa.un.org/unpd/wup/CD-ROM/WUP2011-F13-Capital_Cities.xls
CapitalCities <- read.xlsx("WUP2011-F13-Capital_Cities.xls", sheetName="Capital_Cities", startRow=13,header=TRUE)
names(CapitalCities) = gsub("\\.", "", names(CapitalCities))
#Obtain symmetric coordinates for each capital
CapitalCities$LatitudeSym <- -CapitalCities$Latitude
CapitalCities$LongitudeSym <- -sign(CapitalCities$Longitude)*(180-abs(CapitalCities$Longitude))
CapitalCities$DigResult <- apply(CapitalCities, 1, function(x) {unlist(revgeocode(c(as.numeric(x[11]),as.numeric(x[10]))))})
CapitalCities$Drowned <- is.na(CapitalCities$DigResult)*1
#Percentage of population saved
sum(CapitalCities$Drowned*CapitalCities$Populationthousands)/sum(CapitalCities$Populationthousands)
world <- map_data("world")
opt <- theme(legend.position="none",
axis.ticks=element_blank(),
axis.title=element_blank(),
axis.text =element_blank(),
plot.title = element_text(size = 35),
panel.background = element_rect(fill="turquoise1"))
p <- ggplot()
p <- p + geom_polygon(data=world, aes(x=long, y=lat, group = group),colour="white", fill="lightgoldenrod2" )
p <- p + geom_point(data=CapitalCities, aes(x=Longitude, y=Latitude, color=Drowned, size = Populationthousands)) + scale_size(range = c(2, 20), name="Population (thousands)")
p <- p + labs(title = "What if you dig a hole through the Earth?")
p <- p + scale_colour_gradient(low = "brown", high = "blue")
p <- p + annotate("rect", xmin = -135, xmax = -105, ymin = -70, ymax = -45, fill = "white")
p <- p + annotate("text", label = "Drowned", x = -120, y = -60, size = 6, colour = "blue")
p <- p + annotate("text", label = "Saved", x = -120, y = -50, size = 6, colour = "brown")
p <- p + geom_point(aes(x = -120, y = -65), size=8, colour="blue")
p <- p + geom_point(aes(x = -120, y = -55), size=8, colour = "brown")
p + opt
# Get a map of Spain, centered and signed in Madrid
madrid <- geocode('Madrid, Spain')
map.madrid <- get_map( location = as.numeric(madrid), color = "color", maptype = "roadmap", scale = 2, zoom = 6)
ggmap(map.madrid) + geom_point(aes(x = lon, y = lat), data = madrid, colour = 'red', size = 4)
# Get a map of New Zealand, centered and signed in Weber (the antipode of Madrid)
weber <- geocode('Weber, New Zealand')
map.weber <- get_map( location = as.numeric(weber), color = "color", maptype = "roadmap", scale = 2, zoom = 6)
ggmap(map.weber) + geom_point(aes(x = lon, y = lat), data = weber, colour = 'red', size = 4)

(1) United Nations, Department of Economic and Social Affairs, Population Division (2012). World Urbanization Prospects: The 2011 Revision, CD-ROM Edition.

The Pythagorean Tree Is In Bloom

There is geometry in the humming of the strings, there is music in the spacing of the spheres (Pythagoras)

Spring is here and I will be on holiday next week. I cannot be more happy! It is time to celebrate so I have drawn another fractal. It is called the Pythagorean Tree:

PythagoreanTree

Here you have the code. See you soon:

library("grid")
l=0.15 #Length of the square
grid.newpage()
gr <- rectGrob(width=l, height=l, name="gr") #Basic Square
pts <- data.frame(level=1, x=0.5, y=0.1, alfa=0) #Centers of the squares
for (i in 2:10) #10=Deep of the fractal. Feel free to change it
{
  df<-pts[pts$level==i-1,]
  for (j in 1:nrow(df))
  {
    pts <- rbind(pts, 
                 c(i, 
                   df[j,]$x-2*l*((1/sqrt(2))^(i-1))*sin(df[j,]$alfa+pi/4)-0.5*l*((1/sqrt(2))^(i-2))*sin(df[j,]$alfa+pi/4-3*pi/4), 
                   df[j,]$y+2*l*((1/sqrt(2))^(i-1))*cos(df[j,]$alfa+pi/4)+0.5*l*((1/sqrt(2))^(i-2))*cos(df[j,]$alfa+pi/4-3*pi/4),                  
                   df[j,]$alfa+pi/4))
    pts <- rbind(pts, 
                 c(i, 
                   df[j,]$x-2*l*((1/sqrt(2))^(i-1))*sin(df[j,]$alfa-pi/4)-0.5*l*((1/sqrt(2))^(i-2))*sin(df[j,]$alfa-pi/4+3*pi/4), 
                   df[j,]$y+2*l*((1/sqrt(2))^(i-1))*cos(df[j,]$alfa-pi/4)+0.5*l*((1/sqrt(2))^(i-2))*cos(df[j,]$alfa-pi/4+3*pi/4),                  
                   df[j,]$alfa-pi/4))
  }
}
for (i in 1:nrow(pts))
{
  grid.draw(editGrob(gr, vp=viewport(x=pts[i,]$x, y=pts[i,]$y, w=((1/sqrt(2))^(pts[i,]$level-1)), h=((1/sqrt(2))^(pts[i,]$level-1)), angle=pts[i,]$alfa*180/pi), 
            gp=gpar(col=0, lty="solid", fill=rgb(139*(nrow(pts)-i)/(nrow(pts)-1), 
                                                 (186*i+69*nrow(pts)-255)/(nrow(pts)-1), 
                                                  19*(nrow(pts)-i)/(nrow(pts)-1), 
                                                  alpha= (-110*i+200*nrow(pts)-90)/(nrow(pts)-1), max=255))))
}