Far away, this ship has taken me far away (Starlight, Muse)
Madrid has an Open Data platform that hosts around 300 data sets on a range of topics. One of them is the set I used for this experiment: it contains information about cycling accidents that happened in the city from January to July 2017. I made a map to show where the accidents took place. This experiment shows how easy R makes it to create professional maps with Leaflet (in this case I use Carto basemaps).
To locate the accidents, the data set only contains the address where each one happened, so the first thing I did was to obtain their geographical coordinates using the geocode function from the ggmap package. There were 431 accidents during the first 7 months of 2017 (such a big number!) and I got coordinates for 407 of them, so I can locate 94% of the accidents.
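The 94% figure can be checked with a few lines of base R. This is only a sketch: the `coords` data frame below is a hypothetical stand-in for the geocoding output (the coordinates are placeholders), built so that it has the 431 rows and 407 successful lookups reported above.

```r
# Hypothetical stand-in for the geocoding result: one row per accident,
# with NA coordinates where the lookup failed. Only the counts
# (431 accidents, 407 located) come from the text; the values are placeholders.
n_total   <- 431
n_located <- 407
coords <- data.frame(
  lon = c(rep(-3.70, n_located), rep(NA, n_total - n_located)),
  lat = c(rep(40.42, n_located), rep(NA, n_total - n_located))
)

# A row counts as located only if both coordinates are present
located <- !is.na(coords$lon) & !is.na(coords$lat)
success_rate <- mean(located)

cat(sprintf("Located %d of %d accidents (%.1f%%)\n",
            sum(located), nrow(coords), 100 * success_rate))
```

The same `!is.na(lon)` test is what later drops the accidents that could not be placed on the map.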
Obviously, the number of accidents in a given place depends on how many cyclists ride there as well as on its infrastructure. Neither of these things can be seen in the map: it only shows the number of accidents.
The categorization of accidents is:
- Double collision (Colisión doble): Traffic accident between two moving vehicles.
- Multiple collision (Colisión múltiple): Traffic accident between more than two moving vehicles.
- Fixed object collision (Choque con objeto fijo): Accident between a moving vehicle with a driver and an immovable object on the road or in an area separated from it, such as a parked vehicle, tree or street lamp.
- Pedestrian collision (Atropello): Accident between a vehicle and a pedestrian who is on the road or on sidewalks, refuges, walks or other areas of the public road not intended for vehicle traffic.
- Overturn (Vuelco): Accident suffered by a vehicle with more than two wheels which, for some reason, loses contact with the road and ends up resting on one side or on its roof.
- Motorcycle fall (Caída motocicleta): Accident suffered by a motorcycle which at some moment loses balance, because of the rider or due to the conditions of the road.
- Moped fall (Caída ciclomotor): Accident suffered by a moped which at some moment loses balance, because of the rider or due to the conditions of the road.
- Bicycle fall (Caída bicicleta): Accident suffered by a bicycle which at some moment loses balance, because of the rider or due to the conditions of the road.
These categories are redundant (e.g. Double and Multiple collision), difficult to understand (e.g. Overturn) or both at the same time (e.g. Motorcycle fall and Moped fall). The categorization also leaves out the injuries caused by each accident.
With all this in mind, this is the map:
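As an illustration of how the redundancy could be reduced, the sketch below collapses the eight categories into five broader groups with a named lookup vector. The group names and the grouping itself are my own hypothetical choice, not an official scheme, and the category labels are written as they appear in the list above, which may not match the exact spelling used in the data file.

```r
# Lookup from the published categories to broader groups.
# ASSUMPTION: both the grouping and the group names are hypothetical.
category_map <- c(
  "Colisión doble"        = "Collision between vehicles",
  "Colisión múltiple"     = "Collision between vehicles",
  "Choque con objeto fijo" = "Collision with fixed object",
  "Atropello"             = "Pedestrian collision",
  "Vuelco"                = "Overturn",
  "Caída motocicleta"     = "Fall",
  "Caída ciclomotor"      = "Fall",
  "Caída bicicleta"       = "Fall"
)

# Hypothetical sample of accident types
tipo <- c("Colisión doble", "Caída bicicleta", "Colisión múltiple",
          "Caída ciclomotor", "Atropello")

# Indexing the named vector by label recodes every element at once
grouped <- unname(category_map[tipo])
print(table(grouped))
```

Any label missing from `category_map` would come back as `NA`, which makes unexpected categories easy to spot.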
Here is a full-screen version of the map.
My suggestions to the city council of Madrid are:
- Add geographical coordinates to the data (I guess many analyses will need them)
- Rethink the categorization to make it clearer and more informative
- Add more cycling data sets to the platform (details of bikeways, traffic …) to understand accidents better
- Judging just by the number of accidents, focus on the area around Parque del Retiro, especially its western surroundings, from Plaza de Cibeles to Plaza de Carlos V: more warning signs, more (or better) bikeways …
I add the code below so the map can be updated (if someone asks me, I can do it myself regularly):
```r
library(dplyr)
library(stringr)
library(ggmap)
library(leaflet)

# First, get the data
# `file` was never defined in the script; it is assumed here to be
# the same name as the local copy
file <- "300110-0-accidentes-bicicleta.csv"
download.file(paste0("http://datos.madrid.es/egob/catalogo/", file),
              destfile = file)
data <- read.csv(file, sep = ";", skip = 1)

# Prepare the addresses for geolocation
data %>%
  mutate(direccion = paste(str_trim(Lugar), str_trim(Numero), "MADRID, SPAIN", sep = ", ") %>%
           str_replace("NA, ", "") %>%
           str_replace(" - ", " CON ")) -> data

# Geolocation (takes some time ...)
# Note: recent ggmap versions need a Google API key, set with register_google()
coords <- c()
for (i in 1:nrow(data)) {
  coords %>% rbind(geocode(data[i, "direccion"])) -> coords
  Sys.sleep(0.5)
}

# Save data, just in case
data %>% cbind(coords) %>% saveRDS(file = "bicicletas.RDS")
data <- readRDS(file = "bicicletas.RDS")

# Remove unsuccessful geolocations
data %>% filter(!is.na(lon)) %>% droplevels() -> data

# Build the date label and the popup text
# (<br/> tags added so each field appears on its own line)
data %>%
  mutate(Fecha = paste0(as.Date(Fecha, "%d/%m/%Y"), " ", TRAMO.HORARIO),
         popup = paste0("<b>Dónde:</b> ", direccion,
                        "<br/><b>Cuándo:</b> ", Fecha,
                        "<br/><b>Qué pasó:</b> ", Tipo.Accidente)) -> data

# Do the map: one layer of clustered markers per accident type
data %>% split(data$Tipo.Accidente) -> data.df

l <- leaflet() %>% addProviderTiles(providers$CartoDB.Positron)

names(data.df) %>%
  purrr::walk(function(df) {
    l <<- l %>%
      addCircleMarkers(data = data.df[[df]],
                       lng = ~lon, lat = ~lat,
                       popup = ~popup,
                       color = "red",
                       stroke = FALSE,
                       fillOpacity = 0.8,
                       group = df,
                       clusterOptions = markerClusterOptions(removeOutsideVisibleBounds = FALSE))
  })

# Add a control to switch each accident type on and off
l %>%
  addLayersControl(
    overlayGroups = names(data.df),
    options = layersControlOptions(collapsed = FALSE)
  )
```
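The Fecha column could also be used to separate weekday and weekend accidents. The following is a minimal sketch in base R on a few hypothetical sample dates, assuming only the dd/mm/YYYY format that the script parses. The `weekday_num` and `day_type` columns are names introduced here for illustration, and public holidays are not handled.

```r
# Hypothetical sample rows in the dd/mm/YYYY format used by the Fecha column
accidents <- data.frame(
  Fecha = c("02/01/2017", "07/01/2017", "08/01/2017", "11/01/2017"),
  stringsAsFactors = FALSE
)

dates <- as.Date(accidents$Fecha, format = "%d/%m/%Y")

# "%u" is the ISO weekday number (1 = Monday ... 7 = Sunday),
# which does not depend on the system locale
accidents$weekday_num <- as.integer(format(dates, "%u"))

# Saturday (6) and Sunday (7) are labelled as weekend
accidents$day_type <- ifelse(accidents$weekday_num >= 6, "weekend", "weekday")

print(table(accidents$day_type))
```

A column like `day_type` could then be used to split the data and passed as the `group` of `addCircleMarkers`, in the same way the script does with Tipo.Accidente. A rush-hour split would depend on the format of TRAMO.HORARIO, which is not shown here.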
I'm curious to know at what hours of the day and on which days of the week most accidents happen. Can you add a filter on weekdays/holidays and on rush hours vs. quiet hours?
Thank you for your clear article and good advice!! People must be careful when they ride their bikes.