Tag Archives: highcharter

Visualizing the Spanish Contribution to The Metropolitan Museum of Art

Well I walk upon the river like it’s easier than land
(Love is All, The Tallest Man on Earth)


The Metropolitan Museum of Art provides here a dataset with information on more than 450.000 artworks in its collection. You can do anything you want with these data: there are no restrictions of use. Each record contains information about the author, title, type of work, dimensions, date, culture and  geography of a particular piece.

I can imagine a bunch of things to do with these data but since I am a big fan of highcharter,  I have done a treemap, which is an artistic (as well as efficient) way to visualize hierarchical data. A treemap is useful to visualize frequencies. They can handle levels, allowing to navigate to go into detail about any category. Here you can find a good example of treemap.

To read data I use fread function from data.table package. I also use this package to do some data wrangling operations on the data set. After them, I filter it looking for the word SPANISH in the columns Artist Nationality and Culture and looking for the word SPAIN in the column Country. For me, any piece created by an Spanish artist (like this one), coming from Spanish culture (like this one) or from Spain (like this one) is Spanish (this is my very own definition and may do not match with any academical one). Once it is done, it is easy to extract some interesting figures:

  • There are 5.294 Spanish pieces in The Met, which means a 1,16% of the collection
  • This percentage varies significantly between departments: it raises to 9,01% in The Cloisters and to 4,83% in The Robert Lehman Collection; on the other hand, it falls to 0.52% in The Libraries and to 0,24% in Photographs.
  • The Met is home to 1.895 highlights and 44 of them (2,32%) are Spanish; It means that Spanish art is twice as important as could be expected (remember that represents a 1,16% of the entire collection)

My treemap represents the distribution of Spanish artworks by department (column Department) and type of work (column Classification). There are two important things to know before doing a treemap with highcharter:

  • You have to use treemap function from treemap package to create a list with your data frame that will serve as input for hctreemap function
  • hctreemap fails if some category name is the same as any of its subcategories. To avoid this, make sure that all names are distinct.

This is the treemap:

Here you can see a full size version of it.

There can be seen several things at a glance: most of the pieces are drawings and prints and european sculpture and decorative arts (in concrete, prints and textiles), there is also big number of costumes, arms and armor is a very fragmented department … I think treemap is a good way to see what kind of works owns The Met.

Mi favorite spanish piece in The Met is the stunning Portrait of Juan de Pareja by Velázquez, which illustrates this post: how nice would be to see it next to El Primo in El Museo del Prado!

Feel free to use my code to do your own experiments:

library(data.table)
library(dplyr)
library(stringr)
library(highcharter)
library(treemap)

file="MetObjects.csv"
# Download data
if (!file.exists(file)) download.file(paste0("https://media.githubusercontent.com/media/metmuseum/openaccess/master/", file), 
                                      destfile=file,
                                      mode='wb')
# Read data
data=fread(file, sep=",", encoding="UTF-8")

# Modify column names to remove blanks
colnames(data)=gsub(" ", ".", colnames(data))

# Clean columns to prepare for searching
data[,`:=`(Artist.Nationality_aux=toupper(Artist.Nationality) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Culture_aux=toupper(Culture) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Country_aux=toupper(Country) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'))]

# Look for Spanish artworks
data[Artist.Nationality_aux %like% "SPANISH" | 
       Culture_aux %like% "SPANISH" | 
       Country_aux %like% "SPAIN"] -> data_spain

# Count artworks by Department and Classification
data_spain %>% 
  mutate(Classification=ifelse(Classification=='', "miscellaneous", Classification)) %>% 
  mutate(Department=tolower(Department),
         Classification1=str_match(Classification, "(\\w+)(-|,|\\|)")[,2],
         Classification=ifelse(!is.na(Classification1), 
                               tolower(Classification1), 
                               tolower(Classification))) %>% 
  group_by(Department, Classification) %>% 
  summarize(Objects=n()) %>% 
  ungroup %>% 
  mutate(Classification=ifelse(Department==Classification, paste0(Classification, "#"), 
                               Classification)) %>% 
  as.data.frame() -> dfspain

# Do treemap without drawing
tm_dfspain <- treemap(dfspain, index = c("Department", "Classification"),
                      draw=F,
                      vSize = "Objects", 
                      vColor = "Objects",
                      type = "index")

# Do highcharter treemap 
hctreemap(
  tm_dfspain,
  allowDrillToNode = TRUE,
  allowPointSelect = T,
  levelIsConstant = F,
  levels = list(
    list(
      level = 1,
      dataLabels = list (enabled = T, color = '#f7f5ed', style = list("fontSize" = "1em")),
      borderWidth = 1
    ),
    list(
      level = 2,
      dataLabels = list (enabled = F,  align = 'right', verticalAlign = 'top', 
                         style = list("textShadow" = F, "fontWeight" = 'light', "fontSize" = "1em")),
      borderWidth = 0.7
    ) 
  )) %>% 
  hc_title(text = "Spanish Artworks in The Met") %>% 
  hc_subtitle(text = "Distribution by Department") -> plot

plot

Visualizing the Gender of US Senators With R and Highmaps

I wake up every morning in a house that was built by slaves (Michelle Obama)

Some days ago I was invited by the people of Highcharts to write a post in their blog. What I have done is a simple but revealing map of women senators of the United States of America. Briefly, this is what I’ve done to generate it:

  • read from the US senate website a XML file with senators info
  • clean and obtain gender of senators from their first names
  • summarize results by state
  • join data with a US geojson dataset to create the highmap

You can find details and R code here.

It is easy creating a highcharts using highcharter, an amazing library as genderizeR, the one I use to obtain gender names. I like them a lot.

Visualizing Stirling’s Approximation With Highcharts

I said, “Wait a minute, Chester, you know I’m a peaceful man”, He said, “That’s okay, boy, won’t you feed him when you can” (The Weight, The Band)

It is quite easy to calculate the probability of obtaining the same number of heads and tails when tossing a coin N times, and N is even. There are 2^{N} possible outcomes and only C_{N/2}^{N} are favorable so the exact probability is the quotient of these numbers (# of favorable divided by # of possible).

There is another way to approximate this number incredibly well: to use the Stirling’s formula, which is 1/\sqrt{\pi\cdot N/2}

The next plot represents both calculations for N from 2 to 200. Although for small values of N, Stirling’s approximation tends to overestimate probability, you can see hoy is extremely precise as N becomes bigger:

James Stirling published this amazing formula in 1730. It simplifies the calculus to the extreme and also gives a quick way to obtain the answer to a very interesting question: how many tosses are needed to be sure that the probability of obtaining the same number of heads and tails is under any given threshold? Just solve the formula for N and you will obtain the answer. And, also, the formula is another example of the presence of pi in the most unexpected places, as happens here.

Just another thing: the more I use highcharter package the more I like it.

This is the code:

library(highcharter)
library(dplyr)
data.frame(N=seq(from=2, by=2, length.out = 100)) %>%
  mutate(Exact=choose(N,N/2)/2**N, Stirling=1/sqrt(pi*N/2))->data
hc <- highchart() %>% 
  hc_title(text = "Stirling's Approximation") %>% 
  hc_subtitle(text = "How likely is getting 50% heads and 50% tails tossing a coin N times?") %>% 
  hc_xAxis(title = list(text = "N: Number of tosses"), categories = data$N) %>% 
  hc_yAxis(title = list(text = "Probability"), labels = list(format = "{value}%", useHTML = TRUE)) %>% 
  hc_add_series(name = "Stirling", data = data$Stirling*100,  marker = list(enabled = FALSE), color="blue") %>% 
  hc_add_series(name = "Exact", data = data$Exact*100,  marker = list(enabled = FALSE), color="lightblue") %>% 
  hc_tooltip(formatter = JS("function(){return ('<b>Number of tosses: </b>'+this.x+'
<b>Probability: </b>'+Highcharts.numberFormat(this.y, 2)+'%')}")) %>%
  hc_exporting(enabled = TRUE) %>%
  hc_chart(zoomType = "xy")
hc