Tag: football

library(rvest)
library(stringr)
library(BradleyTerry2)
library(dplyr)
library(reshape)
library(rCharts)
nseasons=20
results=data.frame()
# Scrape the results table of each round (jornada) from marca.com
for (i in 1:nseasons)
{
  webpage=paste0("http://www.marca.com/estadisticas/futbol/primera/2015_16/jornada_",i,"/")
  read_html(webpage) %>% # read_html() replaces html(), which newer rvest versions no longer provide
    html_nodes("table") %>%
    .[[1]] %>%
    html_table(header=FALSE, fill=TRUE) %>%
    mutate(X4=i) %>%
    rbind(results)->results
}
colnames(results)=c("home", "score", "visiting", "season")
# Clean team names, split the score and assign 3/1/0 points to each side
results %>%
  mutate(home = iconv(home, from="UTF8",to="ASCII//TRANSLIT"),
         visiting = iconv(visiting, from="UTF8",to="ASCII//TRANSLIT")) %>%
  #filter(grepl("-", score)) %>%
  mutate(score=replace(score, score=="18:30 - 17/02/2016", "0-2")) %>% # fake result for Barcelona
  mutate(score_home = as.numeric(str_split_fixed(score, "-", 2)[,1])) %>%
  mutate(score_visiting = as.numeric(str_split_fixed(score, "-", 2)[,2])) %>%
  mutate(points_home =ifelse(score_home > score_visiting, 3, ifelse(score_home < score_visiting, 0, 1))) %>%
  mutate(points_visiting =ifelse(score_home > score_visiting, 0, ifelse(score_home < score_visiting, 3, 1))) -> data
prob_BT=function(x, y) {exp(x-y) / (1 + exp(x-y))}
# Fit one Bradley-Terry model per round, using all matches played so far
BTabilities=data.frame()
for (i in 13:nseasons)
{
  data %>%
    filter(season<=i) %>%
    BTm(cbind(points_home, points_visiting), home, visiting, data=.) -> footballBTModel
  BTabilities(footballBTModel) %>%
    as.data.frame() -> tmp
  cbind(tmp, as.character(rownames(tmp)), i) %>%
    mutate(ability=round(ability, digits = 2)) %>%
    rbind(BTabilities) -> BTabilities
}
colnames(BTabilities)=c("ability", "s.e.", "team", "season")
sort(unique(BTabilities[,"team"])) -> teams
# Turn the abilities of each round into a matrix of pairwise probabilities
BTprobabilities=data.frame()
for (i in 13:nseasons)
{
  BTabilities[BTabilities$season==i,1] %>%
    outer( ., ., prob_BT) -> tmp
  colnames(tmp)=teams
  rownames(tmp)=teams
  cbind(melt(tmp),i) %>%
    rbind(BTprobabilities) -> BTprobabilities
}
colnames(BTprobabilities)=c("team1", "team2", "probability", "season")
BTprobabilities %>%
  filter(team1=="Villarreal") %>%
  mutate(probability=round(probability, digits = 2)) %>%
  filter(team2 %in% c("R. Madrid", "Barcelona", "Atletico")) -> BTVillareal
BTprobabilities %>%
  filter(team2=="Barcelona") %>%
  mutate(probability=round(probability, digits = 2)) %>%
  filter(team1 %in% c("R. Madrid", "Villarreal", "Atletico")) -> BTBarcelona
AbilityPlot <- nPlot(
  ability ~ season,
  data = BTabilities,
  group = "team",
  type = "lineChart")
AbilityPlot$yAxis(axisLabel = "Estimated Ability", width = 62)
AbilityPlot$xAxis(axisLabel = "Season")
VillarealPlot <- nPlot(
  probability ~ season,
  data = BTVillareal,
  group = "team2",
  type = "lineChart")
VillarealPlot$yAxis(axisLabel = "Probability of beating", width = 62)
VillarealPlot$xAxis(axisLabel = "Season")
BarcelonaPlot <- nPlot(
  probability ~ season,
  data = BTBarcelona,
  group = "team1",
  type = "lineChart")
BarcelonaPlot$yAxis(axisLabel = "Probability of being beaten", width = 62)
BarcelonaPlot$xAxis(axisLabel = "Season")

Prediction is difficult, especially of the future (Mark Twain)

Let me start with two important premises. First of all, I am not into football so I do not support any team. Second, this post is just an opinion based on mathematics but football, as all of you know, is not an exact science. Football is football.

This is a good moment to analyse the Spanish football Liga. F. C. Barcelona and Atletico de Madrid share first place in the championship, followed closely by Real Madrid. But analysing results over time can give us an interesting insight into the capabilities of the top three teams.

I have run a Bradley-Terry model for pairwise comparisons. The Bradley-Terry model deals with a situation in which n individuals or items are compared to one another in paired contests. In my case, the model takes the matches and their results as input. The Bradley-Terry model (Bradley and Terry 1952) assumes that in a contest between any two players, say player i and player j, the odds that i beats j are x_i/x_j, where x_i and x_j are positive-valued parameters which might be thought of as representing ability.
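To make the odds formula concrete, here is a minimal sketch in R. The ability values are made up for illustration and are not estimates from any league. Odds of x_i/x_j mean that the probability that i beats j is x_i/(x_i + x_j); on the log scale this is the inverse logit of the difference of log-abilities, which is the form of the prob_BT function used in the code of this post.

```r
# Hypothetical positive-valued abilities x_i (made-up numbers, not estimates)
x <- c(A = 2, B = 1, C = 0.5)

# Odds that i beats j are x_i / x_j, so P(i beats j) = x_i / (x_i + x_j)
bt_prob <- function(x_i, x_j) x_i / (x_i + x_j)

# Same probability on the log scale: inverse logit of log(x_i) - log(x_j)
inv_logit <- function(p) exp(p) / (1 + exp(p))
lambda <- log(x)

p_direct <- bt_prob(x[["A"]], x[["B"]])
p_logit  <- inv_logit(lambda[["A"]] - lambda[["B"]])

print(c(direct = p_direct, logit = p_logit))
```

Both numbers should agree, since the two expressions are algebraically the same.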

Time plays a key role in my analysis. This is what happens when you estimate the abilities of the top three teams over time:

After 20 rounds, Atletico de Madrid and Barcelona have the same estimated ability, but while Barcelona has been continuously losing ability since the beginning, Atletico de Madrid shows a steady or even growing trend. Of course, this depends on how both teams began the championship: the higher you start, the more you can lose. Still, looking at this graph, I cannot help feeling that Atletico de Madrid are keeping their morale higher than Barcelona.

Another interesting output of the Bradley-Terry model is the estimated probability of each team beating another. Since these probabilities depend on the estimated abilities, Barcelona and Atletico de Madrid currently have the same chance of winning a hypothetical match. But once again, the evolution of these probabilities can change our perception:
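As a rough illustration of how abilities turn into beating probabilities, the sketch below builds the full matrix of pairwise probabilities with outer(). The team names and ability values are hypothetical, chosen so that two teams share the same ability; prob_BT follows the same formula as in the code of this post.

```r
# Hypothetical log-abilities (made-up values, not fitted by any model)
abilities <- c("Team A" = 1.2, "Team B" = 1.2, "Team C" = 0.4)

# Probability that a team with ability_1 beats a team with ability_2
prob_BT <- function(ability_1, ability_2) {
  exp(ability_1 - ability_2) / (1 + exp(ability_1 - ability_2))
}

# Rows are the team doing the beating, columns are the adversary
probs <- outer(abilities, abilities, prob_BT)

print(round(probs, 2))
```

Two teams with equal ability get a probability of 0.5 against each other, and the entries for (i, j) and (j, i) always add up to 1.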

As you can see, Atletico de Madrid has increased its probability of beating Barcelona from 0.25 to 0.50 in just one round, and Barcelona has lost more than that over the same period. Once again, it seems that Atletico de Madrid is growing in confidence round by round. And confidence is important in this game. Luckily, football is unpredictable, but after taking time into account I dare to say that Atletico de Madrid will win the championship. I am pretty sure.

Here you have the code I wrote for the analysis. Maybe you would like to make your own predictions:

library("BradleyTerry2")
library("xlsx")
library("ggplot2")
library("reshape")
football <-read.xlsx("CalendarioLiga2013-14 2.xls", sheetName= "results", header=TRUE)
inv_logit <- function(p) {exp(p) / (1 + exp(p))}
prob_BT   <- function(ability_1, ability_2) {inv_logit(ability_1 - ability_2)}
rounds <- sort(unique(football$round))
# Initialization
football.pts.ev <- as.data.frame(c())
football.abl.ev <- as.data.frame(c())
football.prb.ev <- as.data.frame(c())
# Points evolution: football.pts.ev
for (i in 1:length(rounds))
{
  football.home <-aggregate( home.wins ~ home.team, data=football[football$round<=rounds[i],], FUN=sum)
  colnames(football.home) <- c('Team', 'Points')
  football.away <-aggregate( away.wins ~ away.team, data=football[football$round<=rounds[i],], FUN=sum)
  colnames(football.away) <- c('Team', 'Points')
  football.all <-rbind(football.home,football.away)
  football.points <-aggregate( Points ~ Team, data=football.all, FUN=sum)
  football.points$round<-rounds[i]
  football.pts.ev <- rbind(football.points, football.pts.ev)
}
# BT Models 
# Abilities and probabilities evolution: football.abl.ev and football.prb.ev
# We start from the 6th round so that the model has enough matches to work with
for (i in 6:length(rounds))
{
  footballBTModel      <- BTm(cbind(home.wins, away.wins), home.team, away.team, data = football[football$round<=rounds[i],], id = "team")
  team_abilities       <- data.frame(BTabilities(footballBTModel))$ability 
  names(team_abilities) <-unlist(attr(BTabilities(footballBTModel), "dimnames")[1][1])
  team_probs           <- outer(team_abilities, team_abilities, prob_BT) 
  diag(team_probs)     <- 0 
  team_probs           <- melt(team_probs)
  colnames(team_probs) <- c('team', 'adversary', 'probability')
  team_probs$round<-rounds[i]
  football.prb.ev <- rbind(team_probs, football.prb.ev)
  football.abl.ev.df <- data.frame(rownames(data.frame(BTabilities(footballBTModel))),BTabilities(footballBTModel))
  football.abl.ev.df$round<-rounds[i]
  colnames(football.abl.ev.df) <- c('team', 'ability', 's.e.', 'round')
  football.abl.ev <- rbind(football.abl.ev.df, football.abl.ev)
}
# Probabilities of top 3 teams
football.prb.ev.3 <- football.prb.ev[
    ((football.prb.ev$team == "At. Madrid" & football.prb.ev$adversary == "R. Madrid")|
     (football.prb.ev$team == "At. Madrid" & football.prb.ev$adversary == "Barcelona")|
     (football.prb.ev$team == "Barcelona"  & football.prb.ev$adversary == "R. Madrid"))&
      football.prb.ev$round>=10, ]
football.prb.ev.3$teambyadver <- interaction(football.prb.ev.3$team, football.prb.ev.3$adversary, sep = " Beating ")
# Abilities of top 3 teams
football.abl.ev.3 <- football.abl.ev[(football.abl.ev$team == "At. Madrid" | 
                                     football.abl.ev$team == "R. Madrid"  | 
                                     football.abl.ev$team == "Barcelona")&
                                     football.abl.ev$round>=10, ]
ggplot(data = football.prb.ev.3, aes(x = round, y = probability, colour = teambyadver)) +  
  stat_smooth(method = "loess", formula = y ~ x, size = 1, alpha = 0.25)+
  geom_point(size = 4) +
  theme(legend.position = c(.75, .15))+
  labs(x = "Round", y = "Probability")+ # named arguments; labs(list(...)) is not handled by recent ggplot2
  labs(colour = "Probability of ...")+
  ggtitle("Evolution Of Beating Probabilities \nAmong Top 3 First-Team") + 
  theme(plot.title = element_text(size=25, face="bold"))+
  scale_x_continuous(breaks = c(10,11,12,13,14,15,16,17,18,19,20))
ggplot(data = football.abl.ev.3, aes(x = round, y = ability, colour = team)) +  
  stat_smooth(method = "loess", formula = y ~ x, size = 1, alpha = 0.25)+
  geom_point(size = 4) +
  theme(legend.position = c(.75, .75))+
  labs(x = "Round", y = "Ability")+ # named arguments; labs(list(...)) is not handled by recent ggplot2
  labs(colour = "Ability of ...")+
  ggtitle("Evolution Of Abilities \nOf Top 3 First-Team") + 
  theme(plot.title = element_text(size=25, face="bold"))+
  scale_x_continuous(breaks = c(10,11,12,13,14,15,16,17,18,19,20))
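The spreadsheet read by this code is not included here, so the following is only a sketch of the shape the code seems to expect. The column names (round, home.team, away.team, home.wins, away.wins) come from the code above; the teams, the values and the 1/0 encoding of wins are assumptions made for illustration.

```r
# Hypothetical toy results; only the column names come from the code above
football <- data.frame(
  round     = c(1, 1, 2, 2),
  home.team = c("Team A", "Team C", "Team B", "Team A"),
  away.team = c("Team B", "Team D", "Team C", "Team D"),
  home.wins = c(1, 0, 1, 1),  # assumed encoding: 1 if the home team won
  away.wins = c(0, 1, 0, 0),  # assumed encoding: 1 if the away team won
  stringsAsFactors = FALSE
)

# Same aggregation as the points loop above, over all rounds at once
home <- aggregate(home.wins ~ home.team, data = football, FUN = sum)
away <- aggregate(away.wins ~ away.team, data = football, FUN = sum)
colnames(home) <- c("Team", "Points")
colnames(away) <- c("Team", "Points")
totals <- aggregate(Points ~ Team, data = rbind(home, away), FUN = sum)

print(totals)
```

Before passing data like this to BTm, the two team columns would need to be factors sharing the same set of levels.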

Fronkonstin

Experiments in R

A Checkpoint Of Spanish Football League

Why I Think Atletico De Madrid Will Win 2013/14 Spanish Liga Of Football