Tag Archives: ggplot2

The Awesome Parrondo’s Paradox

December 16, 2014 | Curiosities, Games | ggplot, ggplot2, paradox, Parrondo, R, Rstats, simulation | @aschinchon

A technique succeeds in mathematical physics, not by a clever trick, or a happy accident, but because it expresses some aspect of physical truth (O. G. Sutton)

Imagine three unbalanced coins:

Coin 1: probability of heads = 0.495 and probability of tails = 0.505
Coin 2: probability of heads = 0.745 and probability of tails = 0.255
Coin 3: probability of heads = 0.095 and probability of tails = 0.905

Now let’s define two games using these coins:

Game A: You toss coin 1; if it comes up heads you win 1€, otherwise you lose 1€
Game B: If your present capital is a multiple of 3, you toss coin 3; if not, you toss coin 2. In both cases, you win 1€ if the coin comes up heads and lose 1€ if not.

Played separately, both games are quite unfavorable. Now let’s define Game A+B, in which you toss a balanced coin: if it comes up heads you play Game A, otherwise you play Game B. In other words, in Game A+B you choose between Game A and Game B at random.

Starting with 0€, it is easy to simulate the three games over 500 plays. This is an example of one of these simulations:

After 500 plays, the profit of Game A+B is +52€, versus -9€ for Game A and -3€ for Game B. Let’s run some more simulations (I removed legends and titles, but the colors of the games are the same):

As you can see, Game A+B is the most profitable in almost all the previous simulations. Coincidence? Not at all. This is a consequence of the stunning Parrondo’s Paradox which states that two losing games can combine into a winning one.

If you still don’t believe in this brain-crashing paradox, below you can see the empirical distributions of the final profits of the three games after 1,000 plays:

After 1,000 plays, the mean profit is -13€ for Game A, -7€ for Game B and +17€ for Game A+B.
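The simulated means can also be cross-checked analytically. The sketch below assumes the standard Markov-chain analysis of Parrondo's games: only the capital modulo 3 matters, so the capital mod 3 behaves as a three-state chain, and the long-run gain per play is the stationary-weighted average of 2p - 1. Following the PlayGameB function in the code, the 0.095 coin is used when the capital is a multiple of 3 and the 0.745 coin otherwise; in Game A+B each state's win probability is the average of the Game A and Game B probabilities. ExpectedGain is a hypothetical helper name, not part of the original code.

```r
# Long-run expected gain per play when the win probability depends only on
# the capital modulo 3: p0 if it is a multiple of 3, p12 otherwise.
# Hypothetical helper, not part of the original post.
ExpectedGain = function(p0, p12) {
  # Transition matrix of capital mod 3 (rows: from state 0, 1, 2)
  P = matrix(c(0,       p0,      1 - p0,    # state 0: win -> 1, lose -> 2
               1 - p12, 0,       p12,       # state 1: win -> 2, lose -> 0
               p12,     1 - p12, 0),        # state 2: win -> 0, lose -> 1
             nrow = 3, byrow = TRUE)
  # Stationary distribution: solve pi P = pi, replacing one equation by sum(pi) = 1
  A = t(P) - diag(3)
  A[3, ] = 1
  stationary = solve(A, c(0, 0, 1))
  # Expected gain: +1 with probability p, -1 otherwise, weighted by state
  sum(stationary * (2 * c(p0, p12, p12) - 1))
}

gainA  = 2 * 0.495 - 1                            # Game A ignores the capital
gainB  = ExpectedGain(0.095, 0.745)
gainAB = ExpectedGain(0.5 * 0.495 + 0.5 * 0.095,  # balanced coin picks A or B
                      0.5 * 0.495 + 0.5 * 0.745)
# Approximate long-run profit over 1,000 plays
round(1000 * c(A = gainA, B = gainB, AB = gainAB), 1)
```

If this analysis is right, 1,000 plays give about -10€ for Game A, about -9€ for Game B and about +16€ for Game A+B, close to the simulated means quoted above: Game B alone spends too much time in the "multiple of 3" state, and mixing in Game A breaks that pattern.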

This paradox was discovered in the late nineties by the Spanish physicist Juan Parrondo and can help explain, among other things, why investing in losing shares can result in big profits. Amazing:

require(ggplot2)
require(scales)
library(gridExtra)
opts=theme(
  legend.position = "bottom",
  legend.background = element_rect(colour = "black"),
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="gray75", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(size=20),
  plot.title = element_text(size = 35))
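# One round: win 1 with probability c-x, otherwise lose 1
PlayGameA = function(profit, x, c) {if (runif(1) < c-x) profit+1 else profit-1}
# Game B: coin c1-x1 if profit is not a multiple of 3, coin c2-x2 if it is
# (with the arguments used below, 0.745 and 0.095 respectively)
PlayGameB = function(profit, x1, c1, x2, c2) {if (profit%%3>0) PlayGameA(profit, x=x1, c=c1) else PlayGameA(profit, x=x2, c=c2)}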
####################################################################
#EVOLUTION
####################################################################
noplays=500
alpha=0.005
profit0=0
results=data.frame(Play=0, ProfitA=profit0, ProfitB=profit0, ProfitAB=profit0)
for (i in 1:noplays) {results=rbind(results, c(i,
    PlayGameA(profit=results[results$Play==(i-1),2], x =alpha, c =0.5),
    PlayGameB(profit=results[results$Play==(i-1),3], x1=alpha, c1=0.75, x2=alpha, c2=0.1),
    if (runif(1)<0.5) PlayGameA(profit=results[results$Play==(i-1),4], x =alpha, c =0.5) else PlayGameB(profit=results[results$Play==(i-1),4], x1=alpha, c1=0.75, x2=alpha, c2=0.1)
    ))}
results=rbind(data.frame(Play=results$Play, Game="A",   Profit=results$ProfitA),
              data.frame(Play=results$Play, Game="B",   Profit=results$ProfitB),
              data.frame(Play=results$Play, Game="A+B", Profit=results$ProfitAB))
ggplot(results, aes(x=Play, y=Profit, color = Game)) +
  scale_x_continuous(limits=c(0,noplays), "Plays")+
  scale_y_continuous(limits=c(-75,75), expand = c(0, 0), "Profit")+
  labs(title="Evolution of profit games along 500 plays")+
  geom_line(size=3)+opts
####################################################################
#DISTRIBUTION
####################################################################
noplays=1000
alpha=0.005
profit0=0
results2=data.frame(Play=numeric(0), ProfitA=numeric(0), ProfitB=numeric(0), ProfitAB=numeric(0))
for (j in 1:100) {
  # One run of noplays plays for the three games; keep only the final profits
  results=data.frame(Play=0, ProfitA=profit0, ProfitB=profit0, ProfitAB=profit0)
  for (i in 1:noplays) {results=rbind(results, c(i,
      PlayGameA(profit=results[results$Play==(i-1),2], x =alpha, c =0.5),
      PlayGameB(profit=results[results$Play==(i-1),3], x1=alpha, c1=0.75, x2=alpha, c2=0.1),
      if (runif(1)<0.5) PlayGameA(profit=results[results$Play==(i-1),4], x =alpha, c =0.5)
      else PlayGameB(profit=results[results$Play==(i-1),4], x1=alpha, c1=0.75, x2=alpha, c2=0.1)))}
  results2=rbind(results2, results[results$Play==noplays, ])
}
results2=rbind(data.frame(Game="A", Profit=results2$ProfitA),
data.frame(Game="B", Profit=results2$ProfitB),
data.frame(Game="A+B", Profit=results2$ProfitAB))
ggplot(results2, aes(Profit, fill = Game)) +
  scale_x_continuous(limits=c(-150,150), "Profit")+
  scale_y_continuous(limits=c(0,0.02), expand = c(0, 0), "Density", labels = percent)+
  labs(title=paste("Parrondo's Paradox (",as.character(noplays)," plays)",sep=""))+
  geom_density(alpha=.75)+opts

The World We Live In #3: Breastfeeding

December 2, 2014 | The World We Live In | Breastfeeding, Fertility, ggplot, ggplot2, R, Rstats, UNdata | @aschinchon

Facts are stubborn, but statistics are more pliable (Mark Twain)

According to the World Health Organization, exclusive breastfeeding is recommended up to 6 months of age, with continued breastfeeding along with appropriate complementary foods up to two years of age or beyond. Thus, the defining characteristic of continued breastfeeding is that the infant between 6 months and 2 years of age receives at least some breast milk, regardless of the quantity or the presence of other foods or liquids in the diet.

On the other hand, as defined in The World Factbook of the Central Intelligence Agency, the Total Fertility Rate (TFR) is the average number of children that would be born to a woman over her lifetime if she were to experience the exact current age-specific fertility rates through her lifetime and she were to survive from birth through the end of her reproductive life. It is obtained by summing the single-year age-specific rates at a given time.
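The summation in that definition is simple arithmetic. A minimal sketch in R, using made-up age-specific fertility rates for women aged 15 to 49 (the values in the hypothetical asfr_groups vector are purely illustrative, not UN data):

```r
# Hypothetical age-specific fertility rates (births per woman per year),
# given for 5-year age groups and repeated for each single year of age
ages = 15:49
asfr_groups = c(0.02, 0.10, 0.12, 0.08, 0.04, 0.01, 0.002)  # 15-19, ..., 45-49
asfr = rep(asfr_groups, each = 5)
names(asfr) = ages

# TFR: sum of the single-year age-specific rates at a given time
tfr = sum(asfr)
tfr
```

With these illustrative rates the TFR is 1.86 children per woman. When rates are published for 5-year age groups, the same number is obtained by multiplying their sum by 5, which is what repeating each group rate for five single years does here.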

This is how the world is arranged according to these two rates:

There are many differences between countries. Both rates are very low in some Eastern European countries like Ukraine, Bosnia, Belarus and Moldova. On the other hand, both are very high in Benin, Rwanda, Burkina Faso and Malawi, all of them African. Angola, Nigeria and Somalia, also African, have a very high fertility rate but a low continued breastfeeding rate (Timor-Leste, in Asia, belongs to this segment as well). Finally, women in Nepal, Bangladesh, Sri Lanka and India breastfeed their moderate number of children.

We live in a complex and beautiful world that cannot be measured with averages and standard deviations alone:

#Continued breastfeeding rate: http://data.un.org/Data.aspx?d=SOWC&f=inID%3a89
#Total fertility rate (TFR): http://data.un.org/Data.aspx?d=SOWC&f=inID%3a127
#Population: http://data.un.org/Data.aspx?d=SOWC&f=inID%3a105
require("sqldf")
require("ggplot2")
require("scales")
breastfeeding=read.csv("UNdata_Export_20141122_122134175.csv", nrows=124, header=T, row.names=NULL)
fertility=read.csv("UNdata_Export_20141122_122330581.csv", nrows=570, header=T, row.names=NULL)
population=read.csv("UNdata_Export_20141122_142359579.csv", nrows=999, header=T, row.names=NULL)
colnames(breastfeeding)[1]="Country"
colnames(fertility)[1]="Country"
colnames(population)[1]="Country"
data=sqldf("SELECT a.Country, a.Value as Pop, b.Value as Fertility, c.Value as Breastfeeding
           FROM population a inner join fertility b
           on (a.Country=b.Country) INNER JOIN breastfeeding c
           on (a.Country=c.Country)
           where a.Subgroup = 'Total' AND b.Year = 2011
           AND a.Country NOT IN ('World', 'South Asia',
           'Sub-Saharan Africa', 'Least Developed Countries/Territories', 'Eastern and Southern Africa',
           'East Asia and Pacific')")
opts=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="gray75", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(size=20),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 45))
ggplot(data, aes(x=Fertility, y=Breastfeeding/100, size=log(Pop), label=Country), guide=FALSE)+
  geom_point(colour="white", fill="darkorchid2", shape=21, alpha=.55)+
  scale_size_continuous(range=c(2,40))+
  scale_x_continuous(limits=c(1,7))+
  scale_y_continuous(limits=c(0,1), labels = percent)+
  labs(title="The World We Live In #3: Breastfeeding",
       x="Total fertility rate (TFR)",
       y="Continued breastfeeding rate")+
  geom_text(data=subset(data, Fertility>5 & (Breastfeeding>75|Breastfeeding<40)), size=5.5, colour="gray25", hjust=0, vjust=0)+
  geom_text(data=subset(data, Fertility<3 & Breastfeeding>75), size=5.5, colour="gray25", hjust=0, vjust=0)+
  geom_text(data=subset(data, Fertility<2 & Breastfeeding<12), size=5.5, colour="gray25", hjust=0, vjust=0)+
  annotate("text", x=5, y=0, colour="gray25", hjust=0, label="Source: United Nations (size of bubble depending on population)", size=4)+opts

A Little Present For Coldplay

November 4, 2014 | Drawings | Butterfly Curve, Coldplay, ggplot, ggplot2, polar coordinates, R | @aschinchon

Gravity, release me, and don’t ever hold me down, now my feet won’t touch the ground (Coldplay, Life In Technicolor II)

Inspired by this nice post and by this cover of a Coldplay album:

I have dared to do this using ggplot, polar coordinates and Google Fonts:

Coldplay: feel free to use it for some future album.

library(ggplot2)
library(extrafont)
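# windowsFonts() is only available on Windows: it maps the family name "Monoton"
# to the Monoton font (a Google Font), which must be installed on the system
windowsFonts(Monoton=windowsFont("Monoton"))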
butterfly=function(x) 8-sin(x)+2*sin(3*x)+2*sin(5*x)-sin(7*x)+3*cos(2*x)-2*cos(4*x)
opt=theme(legend.position="none",
          panel.background = element_rect(fill="black"),
          panel.grid = element_blank(),
          axis.ticks=element_blank(),
          axis.title=element_blank(),
          axis.text =element_blank())
ggplot(data.frame(x = c(0, 2*pi)), aes(x)) +
  stat_function(fun=butterfly, geom="density", fill="#FC0C54", colour="#FC0C54") +
  coord_polar(start=-pi)+
  geom_text(x=.5, y=-14, colour="turquoise2", family="Monoton", label="COLDPLAY", size=12)+
  geom_text(x=1.5, y=14, colour="turquoise2", family="Monoton", angle=90, label="Up Down Up Down Up Down", size=6)+
  opt

Size Doesn’t Matter

October 27, 2014 | Curiosities, Simulation | ggplot, ggplot2, gridExtra, matching, probability, R, scales, small multiples, xkcd | @aschinchon

An invisible red thread connects those destined to meet, regardless of time, place or circumstances. The thread may stretch or tangle, but never break (Ancient Chinese Legend)

Once a year I play Secret Santa with my friends (in Spain we call it Amigo Invisible). As you can read in Wikipedia:

Secret Santa is a Western Christmas tradition in which members of a group or community are randomly assigned a person to whom they anonymously give a gift. Often practiced in workplaces or amongst large families, participation in it is usually voluntary. It offers a way for many people to give and receive a gift at low cost, since the alternative gift tradition is for each person to buy gifts for every other person. In this way, the Secret Santa tradition also encourages gift exchange groups whose members are not close enough to participate in the alternative tradition of giving presents to everyone else.

The way we decide who gives to whom is the same every year: one of us puts small papers with the names of the participants in a bag (one name per paper). Then each of us picks one paper and looks at the name privately. If no one picks their own name, the assignment is valid; otherwise, we have to start over. Every year we have to repeat the process several times until we obtain a valid assignment. Why? Because we are victims of the Matching Problem.

Following the spirit of this talk, I ran 16 simulations of the matching problem (for 10, 20, 30, … up to 160 items). For each number of items n, I generate 5,000 random vectors by sampling the natural numbers from 1 to n without replacement. Comparing each random vector with the ordered one (1, 2, …, n), I count the number of matches, that is, the number of positions i where the i-th element of the random vector equals i. This is the result of the experiment:

Although each plot corresponds to a different number of items, all of them are extremely similar. All of them say that the probability of no matches at all is around 37% (look at the first bar of each plot). More precisely, this probability tends to 1/e (≈36.8%) as n increases, and it gets there very quickly.

This result is shocking. It means that if some day the 7 billion people of the world agree to play Secret Santa all together (how nice it would be!), the probability that at least one person picks his/her own name is about 1 - 1/e ≈ 63%, roughly two out of three. Absolutely amazing.
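The limit 1/e comes from the classical inclusion-exclusion formula for derangements (permutations with no fixed points): the probability that a random permutation of n items has no match is the sum of (-1)ᵏ/k! for k = 0, …, n. A small sketch in R (p_no_match is a hypothetical helper name) compares it with 1/e, checks it against a quick Monte Carlo run, and computes the expected number of Secret Santa draws until a valid one, assuming each redraw is an independent random permutation:

```r
# Exact probability that a random permutation of n items has no fixed point
# (inclusion-exclusion): sum over k = 0..n of (-1)^k / k!
p_no_match = function(n) sum((-1)^(0:n) / factorial(0:n))

n = c(3, 5, 10, 20)
cbind(n = n, exact = sapply(n, p_no_match), one_over_e = exp(-1))

# Monte Carlo check with 5,000 random permutations of 10 items
mean(replicate(5000, all(sample(10) != 1:10)))

# Probability that at least one person draws their own name (large n)
1 - p_no_match(20)

# Redrawing until nobody gets their own name is a geometric process,
# so the expected number of draws is 1 / p, which approaches e
1 / p_no_match(10)
```

The last line explains the yearly ritual: on average a group needs close to e ≈ 2.72 draws before obtaining a valid assignment, almost regardless of its size.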

This is the code (note: all lines except two are for plotting):

library(ggplot2)
library(scales)
library(RColorBrewer)
library(gridExtra)
library(extrafont)
results=data.frame(size=numeric(0), x=numeric(0))
# For n = 10, 20, ..., 160: number of fixed points in each of 5,000 random permutations of 1:n
for (i in seq(10, by=10, length.out = 16)){results=rbind(results, data.frame(size=i, x=replicate(5000, sum(sample(i) == seq_len(i)))))}
opts=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major.y = element_line(colour="gray80"),
  panel.grid.major.x = element_blank(),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(family="Humor Sans", size=15, colour="gray25"),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 18))
sizes=unique(results$size)
# One bar chart per number of items, kept in a list instead of g1 ... g16
plots=lapply(sizes, function(n) {
  data=subset(results, size==n)
  ggplot(data, aes(x=as.factor(x), weight=1/nrow(data)))+
    geom_bar(fill=sample(brewer.pal(9,"Set1"), 1), alpha=.85, colour="gray50")+
    scale_y_continuous(limits=c(0,.4), expand = c(0, 0), "Probability", labels = percent)+
    scale_x_discrete(limits=as.character(0:8), expand = c(0, 0), "Number of matches")+
    labs(title = paste("Matching", n, "items ..."))+
    opts
})
do.call(grid.arrange, c(plots, ncol=4))

The World We Live In #2: To Study Or To Work

October 20, 2014 | The World We Live In | Children, ggplot2, Malala, Nobel Prize, Peace, R, UNdata | @aschinchon

I was getting ready for school and about to wear my uniform when I remembered that our principal had told us not to wear uniforms. So I decided to wear my favorite pink dress (Malala Yousafzai)

After reading the diary of a Pakistani schoolgirl and Malala’s story, there is no doubt that we are in front of a brave girl. A girl who will fight against the monsters who deprive children of their childhood. A girl who knows that one book, one pen, one child and one teacher can change this unfair world. A girl who learned she had won the Nobel Peace Prize during her chemistry lesson and finished the school day before making her first statement. A girl for whom the prize is just the beginning: a girl who gives us hope. Long live Malala:

To know where to obtain data for this plot, check out this post. This is the code:

require("sqldf")
require("plyr")
require("stringdist")
childlabour=read.csv("UNdata_Export_20141013_ChildLabour.csv", nrows=335, header=T, row.names=NULL)
education=read.csv("UNdata_Export_20141013_Education.csv", nrows=2994, header=T, row.names=NULL)
population =read.csv("UNdata_Export_20140930_Population.csv",  nrows=12846, header=T, row.names=NULL)
population=rename(population, replace = c("Country.or.Area" = "Country"))
education=rename(education, replace = c("Reference.Area" = "Country"))
education=rename(education, replace = c("Time.Period" = "Year"))
childlabour=rename(childlabour, replace = c("Country.or.Area" = "Country"))
population=sqldf("SELECT a.Country, a.Year, a.Value as Pop
FROM population a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM population GROUP BY 1) b
ON (a.Country=b.Country AND a.Year=b.Year)
WHERE (a.Country NOT LIKE '%INCOME%')
AND (a.Country NOT LIKE '%WORLD%')
AND (a.Country NOT LIKE '%developing%')
AND (a.Country NOT LIKE '%OECD%')
AND (a.Country NOT LIKE '%countries%')
AND (a.Country NOT LIKE '%South Asia%')
AND (a.Country NOT LIKE '%Small states%')
AND (a.Country NOT LIKE '%Euro area%')
AND (a.Country NOT LIKE '%European Union%')
AND (a.Country NOT LIKE '%North America%')")
childlabour=sqldf("SELECT * FROM childlabour WHERE Subgroup='Total 5-14 yr'")
education=sqldf("SELECT a.* FROM education a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM education GROUP BY 1) b
ON (a.Country=b.Country AND a.Year=b.Year)")
data=sqldf("SELECT a.Country, a.Pop, b.Value as ChildLabour, c.Observation_Value as Education
FROM
population a INNER JOIN childlabour b
ON (a.Country=b.Country) INNER JOIN education c
ON (a.Country=c.Country)")
require(ggplot2)
require(scales)
opts=theme(
panel.background = element_rect(fill="gray98"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid.major = element_line(colour="gray75", linetype = 2),
panel.grid.minor = element_blank(),
axis.text.y = element_text(colour="gray25", size=15),
axis.text.x = element_text(colour="gray25", size=15),
text = element_text(size=20),
legend.key = element_blank(),
legend.position = "none",
legend.background = element_blank(),
plot.title = element_text(size = 45)
)
ggplot(data, aes(x=ChildLabour/100, y=Education/100, size=log(Pop), label=Country), guide=FALSE)+
geom_point(colour="white", fill="red", shape=21, alpha=.55)+
scale_size_continuous(range=c(2,40))+
scale_x_continuous(limits=c(0,.5), labels = percent)+
scale_y_continuous(limits=c(0,.12), labels = percent)+
labs(title="The World We Live In #2: To Study Or To Work",
x="% of Child Workers between 5-14 years old",
y="Public Expenditure on Education as % of GNI")+
geom_text(data=subset(data, ChildLabour/100>.3 | Education/100>.07 | Education/100<.022), size=5.5, colour="gray25", hjust=0, vjust=0)+
annotate("text", x=.2, y=0, colour="gray25", hjust=0, label="Countries of the world (Source: United Nations Statistics Division). Size of bubble depending on population", size=5)+
opts

The World We Live In #1: Obesity And Cells

October 7, 2014 | The World We Live In | cellular, ggplot2, obesity, R, UNdata | @aschinchon

Lesson learned, and the wheels keep turning (The Killers – The world we live in)

I discovered this site with a huge amount of data waiting to be analyzed. The first thing I’ve done is this simple graph, where you can see the relationship between cellular subscribers and obesity. Bubbles are countries and their size depends on the population:

Some quick conclusions:

The more cellular subscribers, the more obese people
Pacific islands such as Kiribati, Palau and Tonga are full of happy people
Singaporeans are thinner than they should be
How do Saudi Arabians and Panamanians manage two cellular subscriptions each?

This is the world we live in.

cellular  =read.csv("UNdata_Export_20140930_cellular.csv",   nrows=193,   header=T, row.names=NULL)
obese     =read.csv("UNdata_Export_20140930_obese.csv",      nrows=567,   header=T, row.names=NULL)
population=read.csv("UNdata_Export_20140930_population.csv", nrows=12846, header=T, row.names=NULL)
require("sqldf")
require("plyr")
population=rename(population, replace = c("Country.or.Area" = "Country"))
population=sqldf("SELECT a.Country, a.Year, a.Value as Population
FROM population a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM population GROUP BY 1) b
      ON (a.Country=b.Country AND a.Year=b.Year)")
cellular=rename(cellular, replace = c("Country.or.Area" = "Country"))
cellular=rename(cellular, replace = c("Value" = "Cellular"))
obese=rename(obese, replace = c("Country.or.Area" = "Country"))
obese=rename(obese, replace = c("Year.s." = "Year"))
# Keep the text before ' [' in Value; sqldf uses SQLite by default, which has INSTR but not CHARINDEX
obese=sqldf("SELECT a.Country, a.Year, SUBSTR(TRIM(Value), 1, INSTR(TRIM(Value), ' [')) as Obeses
FROM obese a INNER JOIN (SELECT Country, MAX(Year) AS Year FROM obese WHERE GENDER='Both sexes' GROUP BY 1) b
ON (a.Country=b.Country AND a.Year=b.Year AND a.GENDER='Both sexes')")
obese$Obeses=as.numeric(obese$Obeses)
data=sqldf("SELECT a.Country, a.Cellular, c.Obeses, b.Population FROM cellular a inner join population b on a.Country = b.Country
      inner join obese c on (a.Country = c.Country) WHERE a.Country NOT IN ('World', 'South Asia')")
require(ggplot2)
require(scales)
opts=theme(
  panel.background = element_rect(fill="gray98"),
  panel.border = element_rect(colour="black", fill=NA),
  axis.line = element_line(size = 0.5, colour = "black"),
  axis.ticks = element_line(colour="black"),
  panel.grid.major = element_line(colour="gray75", linetype = 2),
  panel.grid.minor = element_blank(),
  axis.text.y = element_text(colour="gray25", size=15),
  axis.text.x = element_text(colour="gray25", size=15),
  text = element_text(size=20),
  legend.key = element_blank(),
  legend.position = "none",
  legend.background = element_blank(),
  plot.title = element_text(size = 45)
    )
ggplot(data, aes(x=Cellular/100, y=Obeses/100, size=Population, label=Country), guide=FALSE)+
  geom_point(colour="white", fill="red", shape=21, alpha=.65)+
  scale_size_continuous(range=c(3,35))+
  scale_x_continuous(limits=c(0,2.1), labels = percent)+
  scale_y_continuous(limits=c(0,.6), labels = percent)+
  labs(title="The World We Live In #1: Obesity And Cells",
       x="Cellular Subscribers (per 100 population)",
       y="Adults aged >= 20 years who are obese (%)")+
  geom_text(data=subset(data, Cellular/100 > 1.9 | Obeses/100 > .4 | (Cellular/100 > 1.4 & Obeses/100 < .15)), size=5, colour="gray25", hjust=0, vjust=0)+
  annotate("text", x=.9, y=0, colour="blue", hjust=0, label="World's Countries (Source: United Nations Statistics Division). Size of bubble depending on population", size=4)+
  opts

Complex Domain Coloring

October 1, 2014 | Drawings | complex numbers, ggplot2, hsv, R | @aschinchon

Why don’t you stop doodling and start writing serious posts in your blog? (Cecilia, my beautiful wife)

Choose a function, apply it to a set of complex numbers, paint the result using the HSV technique and be ready to be impressed, because the images can be absolutely amazing. You only need the ggplot2 package and your imagination. This is what happens when the function is f(x)=(1+i)log(sin((x³-1)/x)):

To learn more about complex domain coloring, you can go here. If you want to try your own functions, you can find the code below. I will try to write a serious post next time but meanwhile, long live doodles!

require(ggplot2)
f = function(x) (1+1i)*log(sin((x^3-1)/x))
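# Grid of complex numbers x + iy with x, y in [-5, 5] (step 0.01)
z=as.vector(outer(seq(-5, 5, by =.01),1i*seq(-5, 5, by =.01),'+'))
# Hue from the argument of f(z), shifted from (-pi, pi] onto [0, 1);
# saturation and value oscillate with log(1 + |f(z)|)
z=data.frame(x=Re(z),
  y=Im(z),
  h=(Arg(f(z))<0)*1+Arg(f(z))/(2*pi),
  s=(1+sin(2*pi*log(1+Mod(f(z)))))/2,
  v=(1+cos(2*pi*log(1+Mod(f(z)))))/2)
# Drop points where any component is not finite (NaN or Inf produced by f)
z=z[is.finite(apply(z,1,sum)),]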
opt=theme(legend.position="none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
axis.text =element_blank())
ggplot(data=z, aes(x=x, y=y)) + geom_tile(fill=hsv(z$h,z$s,z$v))+ opt
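The colouring step packs a few conventions into three formulas. In the sketch below, hsv_components is a hypothetical helper that just repeats the post's formulas. Applied to four points on the unit circle, it shows how the hue follows the argument: Arg() returns angles in (-π, π], and adding 1 to the negative ones maps them onto [0, 1), so 1, i, -1 and -i get hues 0, 0.25, 0.5 and 0.75. Saturation and value depend only on the modulus, so these four points share them.

```r
# Hue, saturation and value for complex values w, using the post's formulas
hsv_components = function(w) {
  data.frame(
    h = (Arg(w) < 0) * 1 + Arg(w) / (2 * pi),   # (-pi, pi] mapped onto [0, 1)
    s = (1 + sin(2 * pi * log(1 + Mod(w)))) / 2,
    v = (1 + cos(2 * pi * log(1 + Mod(w)))) / 2)
}

w = c(1, 1i, -1, -1i)
comp = hsv_components(w)
comp$h                          # 0.00 0.25 0.50 0.75
hsv(comp$h, comp$s, comp$v)     # the colours geom_tile would use
```

Because saturation and value go through sin and cos of 2π·log(1 + |f|), they repeat every time log(1 + |f|) grows by 1, which produces the concentric bands in the images.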

Space Invaders

September 18, 2014 | Drawings | aliens, ggplot, ggplot2, invaders, R, reshape | @aschinchon

I burned through all of my extra lives in a matter of minutes, and my two least-favorite words appeared on the screen: GAME OVER (Ernest Cline, Ready Player One)

Inspired by the book I read this summer and by this previous post, I decided to draw these aliens:

Don’t miss this indispensable document for choosing your favorite colors:

require("ggplot2")
require("reshape")
mars1=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,0,1,0,0,0,0,1,0,1,0,
0,1,0,0,1,0,0,1,0,0,1,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
mars2=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0,
0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,0,1,1,0,1,0,0,0,
0,0,1,0,1,0,0,1,0,1,0,0,
0,1,0,1,0,0,0,0,1,0,1,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
mars3=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,0,1,0,0,0,0,1,0,1,0,
0,1,0,0,1,1,1,1,0,0,1,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
mars4=matrix(c(0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0,
0,0,0,0,1,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,1,1,0,0,0,
0,0,1,1,0,1,1,0,1,1,0,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,1,1,1,1,1,1,1,1,1,0,
0,1,0,0,1,0,0,1,0,0,1,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,1,0,0,0,0,
0,0,0,1,0,0,0,0,1,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0), nrow=12, byrow = TRUE)
opt=theme(legend.position="none",
panel.background = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
axis.text = element_blank())
p1=ggplot(melt(mars1), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("chartreuse", "navy"))+scale_y_reverse()+opt
p2=ggplot(melt(mars2), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("olivedrab1", "magenta4"))+scale_y_reverse()+opt
p3=ggplot(melt(mars3), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("violetred4", "yellow"))+scale_y_reverse()+opt
p4=ggplot(melt(mars4), aes(x=X2, y=X1))+geom_tile(aes(fill=jitter(value, amount=.1)), colour="gray65", lwd=.025)+
scale_fill_gradientn(colours = c("tomato4", "lawngreen"))+scale_y_reverse()+opt
# Display the four aliens in a 2x2 grid
require("gridExtra")
grid.arrange(p1, p2, p3, p4, ncol=2)

Princess Jasmine’s Trick

September 12, 2014 | Curiosities | Aladdin, ggplot, ggplot2, Jasmine, R, sqldf, xkcd | @aschinchon

I’m history! No, I’m mythology! Nah, I don’t care what I am; I’m free hee! (Genie, when he is released from the magical oil lamp by Aladdin)

A long time ago, in a kingdom far away, lived a beautiful princess named Jasmine. There also lived a very rich and evil wizard named Jafar, who was in love with the princess. In order to marry Jasmine, Jafar bought her father’s will with treasures, but the princess was harder to convince. One day Jafar told the princess: Ask me for whatever you want, and if I am able to bring it to you, you will become my wife. The princess, tired of Jafar’s insistence, answered: I only want a gold chain, but I want you to give it to me as follows: the first day I should have just one link of the chain, the second day two links, the third day three … and so on. When you have given me all the links of the chain, I will marry you. Jafar, intrigued, asked: But how many links should the chain have? And Jasmine replied: I want the longest chain that allows you to pay me by breaking only 30 links. Jafar began to laugh out loud as he walked away and said to the princess: Tomorrow I’ll bring you such a chain! But as he went to his palace, his happiness turned into anger: he realized that there was not enough gold in the world to build the chain Jasmine had asked for.

This is my own version of one of my favorite anti-common-sense mathematical curiosities. To explain it, let me start with an example. Imagine a simple chain with 7 links. If you open the 3rd link, you split the chain into 3 pieces: a single link (the one you opened), a piece of 2 links and another one of 4 links. You could pay Jasmine for seven days by combining these 3 pieces:

Day 1: Give her the single link
Day 2: Give her the 2-link piece and take back the single link, leaving her with 2 links
Day 3: Give her the single link again, leaving her with 3 links
Day 4: Give her the 4-link piece and take back all the pieces she has, leaving her with 4 links
Day 5: Give her the single link again, leaving her with 5 links
Day 6: Give her the 2-link piece and take back the single link, leaving her with 6 links
Day 7: Give her the single link, leaving her with all 7 links

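It is easy to see that, with a chain of 63 links, you could pay Jasmine by breaking only 3 links (the 5th, 14th and 31st). The general case follows the same idea: n opened links let you pay any amount from 1 to n on their own, so the unbroken pieces can have lengths n+1, 2(n+1), 4(n+1), …, 2ⁿ(n+1), and any day d can be paid as a multiple of n+1 built from those pieces in binary, plus at most n single links. Adding everything up, the longest chain you can manage by breaking only n links has (2ⁿ⁺¹-1)(n+1)+n links.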
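The construction can be checked in a few lines of R. The helpers piece_lengths, chain_length, break_positions and pay are hypothetical names for this sketch; chain_length(n) computes the same quantity as CalculateLength in the code below, and pay() writes day d as d = (n+1)q + r, paying r with single links and q in binary with the pieces:

```r
# Lengths of the unbroken pieces when n links are opened: (n+1) * 2^k, k = 0..n
piece_lengths = function(n) (n + 1) * 2^(0:n)

# Total chain length: the pieces plus the n opened single links
chain_length = function(n) n + sum(piece_lengths(n))

# Positions of the opened links: each one sits right after a piece
break_positions = function(n) cumsum(piece_lengths(n)[1:n]) + seq_len(n)

# Payment for day d: r single links plus the pieces encoding q in binary
pay = function(d, n) {
  q = d %/% (n + 1)
  r = d %% (n + 1)
  list(singles = r, pieces = piece_lengths(n)[bitwAnd(q, 2^(0:n)) > 0])
}

chain_length(3)      # 63
break_positions(3)   # 5 14 31
pay(6, 1)            # day 6 of the 7-link example: no singles, pieces 2 and 4
formatC(chain_length(30), format = "f", digits = 0, big.mark = ",")
```

The binary step is why the chain grows exponentially: every extra break doubles the number of pieces' combinations while also enlarging each piece.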

The next plot represents the minimum number of breaks needed to pay Jasmine daily for a given chain length. I call it Jasmine’s Staircase:

Some curiosities around chains:

Jasmine asked Jafar for a chain of 66,571,993,087 links
Supposing one link weighs 4 grams, Jasmine’s chain would weigh around 266,000 tons, while there are estimated to be around 171,000 tons of gold in the world
If you spend 1 second to climb the first step of the staircase, you will spend 302 years to climb step number 100

Jafar was right. Jasmine was clever:

library(sqldf)
library(ggplot2)
library(extrafont)
max.breaks=5
CalculateLength = function(n) {n+sum(sapply(0:n, function(x) 2^x*(n+1)))}
results=data.frame(breaks=1:max.breaks, length=sapply(1:max.breaks, CalculateLength))
links=data.frame(links=2:CalculateLength(max.breaks))
results=sqldf("SELECT links.links, min(results.breaks) as minbreaks FROM links, results WHERE links.links <= results.length GROUP BY 1")
opts=theme(
panel.background = element_rect(fill="mistyrose"),
panel.border = element_rect(colour="black", fill=NA),
axis.line = element_line(size = 0.5, colour = "black"),
axis.ticks = element_line(colour="black"),
panel.grid = element_line(colour="white", linetype = 2),
axis.text.y = element_text(colour="black"),
axis.text.x = element_text(colour="black"),
text = element_text(size=20, family="Humor Sans"),
plot.title = element_text(size = 40)
)
ggplot(results, aes(links,minbreaks))+
geom_area(fill="violet", alpha=.4)+
geom_step(color="violetred", lwd=1.5)+
labs(x="Chain's Length", y="Minimum Number of Breaks", title="Princess Jasmine's Staircase")+
scale_x_continuous(expand = c(0, 0), breaks = sapply(1:max.breaks, CalculateLength))+
opts

The Andrica’s Conjecture

August 28, 2014 | Conjectures | Andrica, conjectures, ggplot, ggplot2, matlab package, primes, R | @aschinchon

Things should be as simple as possible, but not simpler (Albert Einstein)

Continuing with conjectures about primes, it is time for Andrica’s conjecture. The great mathematician Leonhard Euler (1707-1783) remarked: “Mathematicians have tried with no success to find some kind of order in the sequence of prime numbers and today we have reasons to believe that this is a mystery that human mind will never understand”.

In 1985, the Romanian mathematician Dorin Andrica published his conjecture, still unproved, about the gaps between consecutive prime numbers. Specifically, it states that the difference between the square roots of two consecutive primes is always less than 1, that is, √(pₙ₊₁) - √(pₙ) < 1 for every n. The largest difference found so far is 0.67087, between p₄=7 and p₅=11.
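The largest gap is easy to confirm without any package. The sketch below uses a plain sieve of Eratosthenes in base R (sieve is a hypothetical helper, not a function from the matlab package) with the same upper bound of 2800 used in the code below:

```r
# Sieve of Eratosthenes: all primes up to n (n >= 4)
sieve = function(n) {
  is_prime = rep(TRUE, n)
  is_prime[1] = FALSE
  for (i in 2:floor(sqrt(n))) {
    if (is_prime[i]) is_prime[seq(i * i, n, by = i)] = FALSE
  }
  which(is_prime)
}

p = sieve(2800)
A = diff(sqrt(p))            # A_n = sqrt(p[n+1]) - sqrt(p[n])
which.max(A)                 # 4, i.e. the gap between p4 = 7 and p5 = 11
max(A)
all(A < 1)                   # Andrica's inequality holds for these primes
```

Of course, checking a finite range only illustrates the conjecture; it proves nothing about larger primes.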

Below you can find the plot of these differences for the first 400 prime numbers:

It is very interesting how the dots form hyperbolic patterns. Doesn’t it look similar, in some sense, to the Ulam spiral? Primes: how challenging you are!

Two more comments:

It is better to find primes with the matlab package than with the schoolmath one. The reason is simple: according to the schoolmath package, 133 (= 7 × 19) is prime!
Why did Andrica formulate his conjecture as √(pₙ₊₁) - √(pₙ) < 1 instead of √(pₙ₊₁) - √(pₙ) < 3/4? Since the largest known difference is 0.67087, the second bound is tighter and still agrees with all the evidence. Maybe the charisma of the number 1 is hard to avoid.

This is the code. I learned how to insert mathematical expressions inside a ggplot chart:

library(matlab)
library(ggplot2)
ubound=2800
primes=primes(ubound)
andrica=data.frame(X=seq(1:(length(primes)-1)), Y=diff(sqrt(primes)))
opt=theme(panel.background = element_rect(fill="gray92"),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(color="white", size=1.5),
plot.title = element_text(size = 45),
axis.title = element_text(size = 28, color="gray35"),
axis.text = element_text(size=16),
axis.ticks = element_blank(),
axis.line = element_line(colour = "white"))
ggplot(andrica, aes(X, Y, colour=Y))+geom_point(size=5, alpha=.75)+
scale_colour_continuous(guide = "none")+
scale_x_continuous("n", limits=c(0, length(primes)-1), breaks = seq(0,length(primes)-1,50))+
scale_y_continuous(expression(A[n]==sqrt(p[n+1])-sqrt(p[n])), limits=c(0, .75), breaks = seq(0,.75,.05))+
labs(title = "The Andrica's Conjecture")+
opt

Fronkonstin

Experiments in R