Five Minute Football Trivia - Trans-Europe Express

generally as I have less and less time to waste on meaningless football stats I get halfway through a post and abandon it. To remedy this, I want to start pushing out posts that give a reasonable half-guess at an answer within an hour or so without needing to really check my working or write good prose. This is the third of these

For this weeks question, I’m stealing straight from the source of most of my posts, The Knowledge column at The Guardian: I’m going to turn it on it’s head a bit, and find the longest distance campaigns, mostly because I find it more interesting, but also because it reminded me of this tweet from a few years ago

And generally I love weird quirks of geography that lead to commutes of 13 hours like this.

As always, first load the libraries we need. Having looked around, the most organised dataset seemed to be at worldfootball.net.

#scrape
library(rvest)
#using data from worldfootbal.net
base_url <- "https://www.worldfootball.net"

#tidy
library(tidyverse)
library(magrittr)
#map
library(sf)
library(rnaturalearth)
library(ggthemes)

To find the location of every team, we need a data.frame of every team to have competed in the Champions League (and Qualifying) since it’s inception. We can get that by sprintf’ing a list of urls and scraping the links to each team page from there. For this, and most of the scraping jobs below, I saved the data from the first time I scrape so that I don’t have to continually re-stress the worldfootball server. The datasets can be found in the static folder of my website GitHub.

#the years each competition took place
years <- 1955:2018
qual_years <- c(1966, 1969:1971, 1978:1982, 1992:2018)

#sprintf the correct urls together
all_urls <- c(sprintf("/champions-league-%d-%d", years, years+1),
      sprintf("/champions-league-qualifikation-%d-%d", qual_years, qual_years+1))

#some exceptions
all_urls[grepl("champions-league-2010-2011", all_urls)] %<>% paste0(., "_3")
all_urls[grepl("champions-league-2008-2009", all_urls)] %<>% paste0(., "_2")
all_urls[grepl("qualifikation-2008-2009", all_urls)] %<>% gsub("qualifikation", "qf", .)
all_urls[grepl("qualifikation", all_urls) & as.numeric(gsub(".*-","",all_urls)>2009)]  %<>%
  gsub("qualifikation", "qual", .)

#scrape the list of each team's links
teams <- map_df(all_urls, function(competition_link) {
  #read once and scrape from there
  read <- read_html(paste0(base_url, "/players", competition_link))
  #get the useful info
  competition <- read %>% html_nodes("h1") %>% html_text() %>% gsub(" » .*", "", .)
  team_name <- read %>% html_nodes("td:nth-child(2) a") %>% html_text()
  team_info <- read %>% html_nodes("td:nth-child(4) a") %>% html_attr("href")
  #compile into a df to return
  df <- data.frame(competition, team_name, team_info)
  return(df)
})

we can then take a look at what we have on our hands

head(teams)
##                  competition              team_name
## 1 Champions League 1955/1956      1. FC Saarbrücken
## 2 Champions League 1955/1956              Aarhus GF
## 3 Champions League 1955/1956               AC Milan
## 4 Champions League 1955/1956 Budapesti Vörös Lobogó
## 5 Champions League 1955/1956         Djurgårdens IF
## 6 Champions League 1955/1956       Gwardia Warszawa
##                     team_info
## 1 /teams/1-fc-saarbruecken/1/
## 2         /teams/aarhus-gf/1/
## 3          /teams/ac-milan/1/
## 4      /teams/mtk-budapest/1/
## 5    /teams/djurgardens-if/1/
## 6  /teams/gwardia-warszawa/1/

As a short aside, one of the things I really enjoy about posts like this one is it exposes you to lots of history from the ‘early’ days of organised football and the teams (some of which remain, some do not) that were present then.

After this, we then want to scrape the data on every match played in the Champions League in a similar manner:

match_data <- map_df(all_urls, function(competition_link) {
  #read once
  read <- read_html(paste0(base_url, "/all_matches", competition_link))
  #get the competition/season id from the url
  season <- gsub("(^.*-)([0-9]{4}-[0-9]{4})(.*)", "\\2", competition_link)
  competition <- ifelse(grepl("-qual|-qf", competition_link), "ucl-quals", "ucl")
  
  #scrape the links to each match- we'll need some of these later
  match_link <- read %>%
    html_nodes("td:nth-child(6) a") %>%
    html_attr("href")
  
  #save the champions league matches into a df
  matches_df <- read %>% 
    html_nodes("#site > div.white > div.content > div > div.box > div > table") %>% 
    html_table(fill = TRUE, header = FALSE) %>%
    as.data.frame() %>%
    #rename
    select(date = X1, round = X4, home = X3, away = X5, result = X6) %>%
    #mutate the correct round to matches
    mutate(round = case_when(
      round != "-" ~ round
    )) %>%
    mutate(date = case_when(
      date != "" ~ date
    )) %>%
    mutate(round = zoo::na.locf(round)) %>%
    #filter out valid matches
    filter(grepl("^[0-9]*:[0-9]*|abor.", result)) %>%
    mutate(date = zoo::na.locf(date)) %>%
    #few exceptions of matches that wern't played
    filter(!(grepl("dec.", result) & date == "01/12/1965")) %>%
    filter(!(home == "FK Partizani" & date == "30/09/1987")) %>%
    mutate(match_link, season, competition)
  return(matches_df)
})

Which we can glimpse to see that there are 7206 matches listed across the competition proper and qualification rounds since the 1950s. This resolves down to 2875 unique team-seasons (from ~561 unique teams) who have been involved in either competition.

head(match_data)
##         date    round                   home             away    result
## 1 04/09/1955 1. Round            Sporting CP         Partizan 3:3 (1:1)
## 2 07/09/1955 1. Round Budapesti Vörös Lobogó   RSC Anderlecht 6:3 (3:2)
## 3 08/09/1955 1. Round        Servette Genève      Real Madrid 0:2 (0:0)
## 4 14/09/1955 1. Round        Rot-Weiss Essen     Hibernian FC 0:4 (0:2)
## 5 20/09/1955 1. Round         Djurgårdens IF Gwardia Warszawa 0:0 (0:0)
## 6 21/09/1955 1. Round              Aarhus GF      Stade Reims 0:2 (0:1)
##                                                                    match_link
## 1            /report/champions-league-1955-1956-1-runde-sporting-cp-partizan/
## 2     /report/champions-league-1955-1956-1-runde-mtk-budapest-rsc-anderlecht/
## 3     /report/champions-league-1955-1956-1-runde-servette-geneve-real-madrid/
## 4    /report/champions-league-1955-1956-1-runde-rot-weiss-essen-hibernian-fc/
## 5 /report/champions-league-1955-1956-1-runde-djurgardens-if-gwardia-warszawa/
## 6           /report/champions-league-1955-1956-1-runde-aarhus-gf-stade-reims/
##      season competition
## 1 1955-1956         ucl
## 2 1955-1956         ucl
## 3 1955-1956         ucl
## 4 1955-1956         ucl
## 5 1955-1956         ucl
## 6 1955-1956         ucl

To work out the distances travelled, we then need to find the locations of each of these matches. The easiest way would be to run through each of those match links and scrape the location data, but that would put a lot of load on the worldfootball servers, so we can be smarter than that.

Matches generally take place at the home location (or in some exceptions, very close to) of every team (e.g. Arsenal’s home matches take place in North London). However, in the history of the competition, 2-legged matches that ended as a draw used to go to a third leg at a neutral location (for example Leeds United vs. VFB Stuttgart in 1992 took place at the Nou Camp). Also, each final is played at a pre-selected venue that is independent of the eventual finalists.

We can find the data for these matches and scrape the exact location from the match link, while taking the rest from the location of the home team in the tie.

#split data by neutral venue or not
match_locations <- match_data %>%
  split(f = (.$round == "Final" | 
               duplicated(paste(.$home, .$away, .$round, .$season))))

#function for scraping the location of the neutral matches
#uses a link to a specific match
get_neutral_location <- function(link) {
  full_url <- paste0(base_url, link)
  
  #get and munge the location
  node <- ".standard_tabelle tr:nth-child(1) .dunkel~ .dunkel+ .dunkel"
  read <- read_html(full_url)
  location <- read %>% html_nodes(node) %>% html_text() %>%
    gsub("\\(|\\)|\\/", "", .)
  return(location)
}

#run through this function to locate all neutral matches
neutral_matches <- match_locations[[2]] %>%
  mutate(location = unlist(lapply(match_link, get_neutral_location))) %>%
  mutate(type = "neutral") %>%
  select(-match_link)

We can see we’ve gathered a few extra matches that wern’t actually neutral, but given we get their correct location anyway, it’s not big deal.

We then have to use the information on each team to get the location of thier home ground. For larger teams we can get this to within an exct postcode if we so wish, but many (e.g. former Polish champions Szombierki Bytom) all we can get from their page is the country. This is fine because we’ll combine this with the team name to use a google search to get more exact locations later. (in any case it’s probably fine because the proportion of teams with poor geographic data probably gets lost in noise overall).

#scrape the information on the teams location from their
#worldfootball profile page
get_team_location <- function(link) {
  read <- read_html(paste0(base_url, link))
  
  stadium_link <- read %>%
    html_nodes(".yellow tr:nth-child(5) a") %>%
    html_attr("href")
  
  #if the link contains a link to a stadium scrape from there
  if(length(stadium_link) > 0) {
    stadium_link <- paste0(base_url, stadium_link)
    location <- read_html(stadium_link) %>%
      html_nodes(".yellow tr:nth-child(1) td , .yellow tr:nth-child(2) td") %>%
      html_text() %>%
      .[c(2,4)] %>%
      gsub("\\r|\\t|\\n", "", .) %>%
      paste0(collapse  = " ")
    return(location)
  #otherwise get a best approximation
  } else {
    country <- read %>%
      html_nodes(".portfolio tr:nth-child(3) .hell+ .hell") %>%
      html_text() %>%
      gsub("\\r|\\t|\\n", "", .)
    return(country)
  }
}

#run the function over each team
team_info <- teams %>%
  filter(!duplicated(team_name)) %>%
  mutate(location = unlist(lapply(team_info, get_team_location)))
head(team_info)
##                  competition              team_name
## 1 Champions League 1955/1956      1. FC Saarbrücken
## 2 Champions League 1955/1956              Aarhus GF
## 3 Champions League 1955/1956               AC Milan
## 4 Champions League 1955/1956 Budapesti Vörös Lobogó
## 5 Champions League 1955/1956         Djurgårdens IF
## 6 Champions League 1955/1956       Gwardia Warszawa
##                     team_info            location
## 1 /teams/1-fc-saarbruecken/1/ Saarbrücken Germany
## 2         /teams/aarhus-gf/1/      Aarhus Denmark
## 3          /teams/ac-milan/1/        Milano Italy
## 4      /teams/mtk-budapest/1/    Budapest Hungary
## 5    /teams/djurgardens-if/1/    Stockholm Sweden
## 6  /teams/gwardia-warszawa/1/         Poland 0000

Now we have a rough location for each team we can join everything back together to get a complete list of matches and where (to a best approximation sometimes) they took place.

#join the team location into the non-neutral matches
nonneutral_matches <- match_locations[[1]] %>%
  left_join(., select(team_info, -competition), by = c("home" = "team_name")) %>%
  mutate(type = "normal") %>%
  select(names(neutral_matches)) 

#join neutral and non neutral matches back together
all_matches <- rbind(neutral_matches, nonneutral_matches) %>%
  mutate(match_location = case_when(
    type == "normal" ~ paste(home, "football club", location),
    type == "neutral" ~ location
  ))

Now we have the locations for each match, but not in a quantative form. For that, we’ll use the googleway package that provides access to a variety of Google APIs to access the map geolocation feature of Google Mapes. Obviously, I haven’t included my unique key for this below, but you can get one for free using this link.

For each location we’ll return a latitude and longitude that will allow us to calculate exactly the distances between a teams home location and each match they played.

#fake key
google_key <- "myGooGLeKEy1234567"

#function to get lat/lon data from Google Maps
googleway_geocode <- function(location, key){
  data <- google_geocode(location, key = key)
  latlon <- data$results$geometry$location[1,]
  
  if(length(latlon) == 0) {
    return(data.frame(lat = NA, lng = NA, location))
  } else {
    return(latlon %>% mutate(location))
  }
}

#run the function over each unique location
locations <- unique(all_matches$match_location) %>%
  map_df(., googleway_geocode, key = google_key)

This gets us 99% of the way there, though the API does miss a few smaller/less well formatted clubs (e.g. Monaco is not ‘in’ France per se, but an enclave in the French territory, which fucks Google Maps up)

locations %>%
  filter(is.na(lat))
##    lat lng                                           location
## 1   NA  NA              AS Monaco football club Monaco France
## 2   NA  NA St Patrick's Athletic football club Dublin Ireland
## 3   NA  NA  FK Sloga Jugomagnat football club North Macedonia
## 4   NA  NA        Tsement Ararat football club Ararat Armenia
## 5   NA  NA        NK Brotnjo football club Bosnia-Herzegovina
## 6   NA  NA             Dunaferr SE football club Hungary 0000
## 7   NA  NA          Araks Ararat football club Ararat Armenia
## 8   NA  NA                FK Gomel football club Belarus 1959
## 9   NA  NA                Sioni Bolnisi football club Georgia
## 10  NA  NA                 SS Murata football club San Marino
## 11  NA  NA      KF Shkëndija 79 football club North Macedonia
## 12  NA  NA              SP Tre Penne football club San Marino
## 13  NA  NA                   Ulisses FC football club Armenia
## 14  NA  NA             SP La Fiorita football club San Marino
## 15  NA  NA      Lincoln Red Imps football club Gibraltar 0000
## 16  NA  NA                    Dila Gori football club Georgia
## 17  NA  NA                  KF Trepça'89 football club Kosovo
## 18  NA  NA                  Europa FC football club Gibraltar
## 19  NA  NA                   FK Spartaks football club Latvia
## 20  NA  NA                    FK Kukësi football club Albania

To solve this, the best way sometimes is just the stupidest, so here are the manually found locations of these clubs

#manually enter lat lon for the missing locations
missing_locs <- data.frame(
  lat = c(43.73, 53.34, 42.02, 39.86, 43.2, 46.96, 39.86, 52.44, 41.44, 43.93, 42.01, 43.93, 40.17, 43.93, 36.14, 41.98, 42.88, 36.14, 56.94, 42.07),
  lng = c(7.41, -6.27, 21.44, 44.69, 17.7, 18.94, 44.69, 31.01, 44.53, 12.44, 20.97, 12.44, 44.52, 12.44, -5.35, 44.10, 20.86, -5.35, 23.61, 20.42),
  location = locations$location[is.na(locations$lat)]
)

#bind everything together
all_locations <- locations %>%
  filter(!is.na(lat)) %>%
  rbind(., missing_locs) %>%
  #convert to an sf object with worldwide projection
  st_as_sf(coords = c("lng", "lat"), crs = st_crs("+init=epsg:4326"))

At the end, I also cast the object to an simple features (sf) data.frame to allow for easier manipulation of geographic data and add the reference for Earth’s lat/lon coordinate system (epsg:4326).

We can then merge the geographic data into our dataframe of every match and see the location of every club to have played in (some stage) of the Champions League over the last ~60 years

#join in the geographic information
all_matches %<>% left_join(., all_locations, by = c("match_location" = "location"))

#plot the home locations of all teams
p1 <- all_matches %>%
  filter(type == "normal") %>%
  filter(!duplicated(home)) %>%
  ggplot(.) +
  geom_sf(data = st_as_sf(ne_countries(scale=110), st_crs("+init:epsg=4326")),
          colour = NA) +
  geom_sf_text(aes(label = home, geometry = geometry), alpha = 0.5) +
  #taken from st_bbox(all_matches$geometry)
  coord_sf(xlim = c(-24, 78), ylim = c(30, 67)) +
  ggtitle("Home location of every Champions League team",
          subtitle = "1955-2019, includes qualifying rounds") +
  theme_map()

#plot 
p1

It’s quite nice to see the distribution- hubs around large cities with competitive leagues (e.g. Denmark, Czech Republic, The Rhine), with extremes in the north in Iceland/Faroe Islands, to the south in Israel, and the far far East with the Central Asian UEFA countries.

The first thing to then work out is the matches per team, which can be done via a simple gather. (in theory you’d want to use pivot_long which has deprecated gather but afaik it doesnt play well with geometry data yet). We also mutate in 2 variables for the home and away teams to keep the matches for data presentation purposes.

Once we have that, we have each match played by each team, each season. A nice little result is we can see which teams have had the longest campaigns (in terms of number of matches), which it turns out are the Valencia and Bayer Leverkusen teams that qualified and got to the finals of the Champions League during the longer two-group-stage format at the turn of the century.

#melt the mach data by team
team_campaigns <- all_matches %>%
  select(season, date, competition, round, home, away, result, geometry) %>%
  #keep the home and away columns for later
  mutate(home_keep = home, away_keep = away) %>%
  gather("location", "team_name",
         -season, -competition, -round, -result, -geometry, -date,
         -home_keep, -away_keep)

#get the longest campaigns in terms of n matches
longest_campaigns <- team_campaigns %>%
  group_by(season, team_name) %>%
  summarise(matches = n()) %>%
  arrange(-matches)

head(longest_campaigns)
## # A tibble: 6 x 3
## # Groups:   season [4]
##   season    team_name        matches
##   <chr>     <chr>              <int>
## 1 1999-2000 Valencia CF           19
## 2 2000-2001 Valencia CF           19
## 3 2001-2002 Bayer Leverkusen      19
## 4 2002-2003 AC Milan              19
## 5 2000-2001 Leeds United          18
## 6 2001-2002 FC Barcelona          18

But we want to work out the distance to each match, not the number. To do this, first we want to work backwards and get the lat/lon of each clubs home ground. We can then merge this with the match location data and find the difference between these two locations (in metres). I.e. for every home game, a team will travel 0m to the game, whereas the away club will travel probably many kilometres.

#work backwards and get the home location of each team
team_locations <- all_matches %>%
  filter(type == "normal") %>%
  filter(!duplicated(home)) %>%
  select(team_name = home, location = match_location) %>%
  left_join(., all_locations, by = "location")

#merge this in
#for each team match have location of match and home location of team
match_travel <- team_campaigns %>%
  left_join(., select(team_locations, team_name, geometry), by = "team_name") %>%
  #calculate the distance between each teams home location the match
  mutate(distance = st_distance(geometry.x, geometry.y, by_element = TRUE))

head(match_travel)
##      season       date competition       round             result
## 1 1955-1956 13/06/1956         ucl       Final          4:3 (2:2)
## 2 1956-1957 16/09/1956         ucl    1. Round          7:0 (4:0)
## 3 1956-1957 28/11/1956         ucl Round of 16          3:1 (1:0)
## 4 1956-1957 12/12/1956         ucl Round of 16          2:0 (2:0)
## 5 1956-1957 30/05/1957         ucl       Final          2:0 (0:0)
## 6 1957-1958 15/10/1957         ucl    1. Round 1:1 (0:1, 1:1) aet
##                   geometry.x                 home_keep        away_keep
## 1  POINT (2.253049 48.84144)               Real Madrid      Stade Reims
## 2  POINT (7.453112 51.49276)         Borussia Dortmund  Spora Luxemburg
## 3  POINT (2.253049 48.84144)                  OGC Nice       Rangers FC
## 4 POINT (-3.688344 40.45305)               Real Madrid       Rapid Wien
## 5 POINT (-3.688344 40.45305)               Real Madrid   ACF Fiorentina
## 6  POINT (13.40849 52.54356) SC Wismut Karl-Marx-Stadt Gwardia Warszawa
##   location                 team_name                 geometry.y
## 1     home               Real Madrid POINT (-3.688344 40.45305)
## 2     home         Borussia Dortmund  POINT (7.450945 51.49807)
## 3     home                  OGC Nice  POINT (7.195828 43.68232)
## 4     home               Real Madrid POINT (-3.688344 40.45305)
## 5     home               Real Madrid POINT (-3.688344 40.45305)
## 6     home SC Wismut Karl-Marx-Stadt  POINT (12.69902 50.58733)
##           distance
## 1 1043745.9567 [m]
## 2     609.3633 [m]
## 3  688197.6337 [m]
## 4       0.0000 [m]
## 5       0.0000 [m]
## 6  223136.3518 [m]

Then all we need to do is group by each team and season and calculate the total distance travelled by that team. I then printed the top 10 total distances (in km) that team had to travel to complete all of their matches

longest_distance_campaigns <- match_travel %>%
  group_by(season, team_name) %>%
  mutate(total_travel = sum(distance), 
         date = as.Date(gsub("\\/", "-", date), "%d-%m-%Y")) %>%
  select(season, date, competition, round, team = team_name,
         home = home_keep, away = away_keep, result, distance, total_travel) %>%
  arrange(-total_travel, date) 

longest_distance_campaigns %>%
  filter(!duplicated(paste(season, team))) %>%
  select(season, team, total_travel) %>%
  mutate(total_travel = total_travel / 1000) %>%
  head(., n = 10)
## # A tibble: 10 x 3
## # Groups:   season, team [10]
##    season    team             total_travel
##    <chr>     <chr>                   <dbl>
##  1 2015-2016 FK Astana              25874.
##  2 2011-2012 APOEL Nikosia          19112.
##  3 2009-2010 APOEL Nikosia          18649.
##  4 2011-2012 SL Benfica             17817.
##  5 2009-2010 Maccabi Haifa          17808.
##  6 2000-2001 Galatasaray            17371.
##  7 2010-2011 Hapoel Tel Aviv        17350.
##  8 2017-2018 Qarabag FK             17286.
##  9 2015-2016 Maccabi Tel Aviv       17041.
## 10 2002-2003 Lokomotiv Moskva       16732.

Perhaps unsurprisingly FK Astana from the capital of Kazakhstan come out top (by far), having worked through the qualifying round and making it to the group stages (where they were unbeaten at home). After that, succesful teams from the far corners of Europe (Benfica, APOEL, Hapoel Tel Aviv) come out on top. I was surprised that Lokomotiv are the only Russian team in the list, and as far back as 2002-2003. Also that all of these seasons are from this century (perhaps due to the ever increasing number of fixtures in the Champions League).

I select the matches FK Astana played in their record-breaking 2015-2016 below:

longest_distance_campaigns %>%
  filter(season == "2015-2016" & team == "FK Astana") %>%
  select(-team)
## # A tibble: 12 x 10
## # Groups:   season, team [1]
##    team  season date       competition round home  away  result distance
##    <chr> <chr>  <date>     <chr>       <chr> <chr> <chr> <chr>       [m]
##  1 FK A… 2015-… 2015-07-14 ucl-quals   2. R… NK M… FK A… 1:0 (…  4025459
##  2 FK A… 2015-… 2015-07-22 ucl-quals   2. R… FK A… NK M… 3:1 (…        0
##  3 FK A… 2015-… 2015-07-29 ucl-quals   3. R… HJK … FK A… 0:0 (…  3022119
##  4 FK A… 2015-… 2015-08-05 ucl-quals   3. R… FK A… HJK … 4:3 (…        0
##  5 FK A… 2015-… 2015-08-18 ucl-quals   Play… FK A… APOE… 1:0 (…        0
##  6 FK A… 2015-… 2015-08-26 ucl-quals   Play… APOE… FK A… 1:1 (…  3510897
##  7 FK A… 2015-… 2015-09-15 ucl         Grou… SL B… FK A… 2:0 (…  6180579
##  8 FK A… 2015-… 2015-09-30 ucl         Grou… FK A… Gala… 2:2 (…        0
##  9 FK A… 2015-… 2015-10-21 ucl         Grou… Atlé… FK A… 4:0 (…  5713668
## 10 FK A… 2015-… 2015-11-03 ucl         Grou… FK A… Atlé… 0:0 (…        0
## 11 FK A… 2015-… 2015-11-25 ucl         Grou… FK A… SL B… 2:2 (…        0
## 12 FK A… 2015-… 2015-12-08 ucl         Grou… Gala… FK A… 1:1 (…  3421522
## # … with 1 more variable: total_travel <dbl>

Finally, one of the real niche joys in my love is making maps and what better oppurtunity than to map these long distance Champions League campaigns. It’s a bit of a munge to get the lines from point data but sf does at least make it possible.

#get the top ten longest campaigns
data <- filter(longest_distance_campaigns,
               !duplicated(paste(season, team)))[1:10,] %>%
  ungroup() %>%
  select(season, team_name = team, total_travel) %>%
  left_join(., match_travel) %>%
  #munge the geometry
  filter(st_geometry(.$geometry.x) != st_geometry(.$geometry.y)) %>%
    mutate(versus = case_when(
        location == "home" ~ away_keep,
        location == "away" ~ home_keep
    )) %>%
    select(season, team_name, versus, round, total_travel,
           geometry.x, geometry.y) %>%
  split(f = rownames(.)) %>%
  #calculate lines from points
  lapply(., function(row) {
    coords1 <- st_coordinates(row$geometry.x) %>%
      split(f = rownames(.))
    coords2 <- st_coordinates(row$geometry.y) %>%
      split(f = rownames(.))
    
    lines <- map2(coords1, coords2, ~st_linestring(rbind(.x, .y)))
    row$lines <- st_as_sfc(lines, crs = st_crs("+init=epsg:4326"))
    return(row)
  }) %>%
  do.call(rbind, .) %>%
  mutate(title = paste(season, team_name, "=", round(total_travel/1000), "km"))

#plot the travel of each team
p2 <- ggplot() +
  geom_sf(data = st_as_sf(ne_countries(scale=110), st_crs("+init:epsg=4326")), 
          colour = NA) +
  geom_sf(data = data, aes(geometry = lines),
          colour = "red", size = 2) +
  geom_sf_text(data = data, aes(geometry = geometry.x, label = versus),
               size = 4, nudge_y = 2) +
  #again taken from st_bbox
  coord_sf(xlim = c(-23, 77), ylim = c(66, 30)) +
  theme_map() +
  theme(
    strip.background = element_rect(fill = "white"),
    strip.text = element_text(size = 10)
  ) +
  facet_wrap(~title)

plot of the longest UCL campaigns

Click on the image for a higher-res version :)

Two quick finishing pieces:

Firstly, what is the single longest journey in the history of the Champions League? Unsurprisingly it involves the 2015-2016 FK Astana season travelling to Benfica on the coast of Portugal (and of course the return fixture).

match_travel[which.max(match_travel$distance),]
##          season       date competition   round    result
## 12000 2015-2016 15/09/2015         ucl Group C 2:0 (0:0)
##                       geometry.x  home_keep away_keep location team_name
## 12000 POINT (-9.184503 38.75253) SL Benfica FK Astana     away FK Astana
##                      geometry.y    distance
## 12000 POINT (71.40261 51.10822) 6180579 [m]

And secondly, answering the original question- what the shortest average commute for a winning side?

#get UCL champions
winners <- match_travel %>%
  filter(round == "Final") %>%
  mutate(result = gsub(" .*", "", result)) %>%
  separate(result, into = c("h_goal", "a_goal"), sep = ":") %>%
  filter((location == "home" & h_goal > a_goal) | (location == "away" & a_goal > h_goal)) %>%
  select(season, team_name)

#find the matches played by champions
winners_matches <- left_join(winners, match_travel, by = c("season", "team_name")) %>%
  group_by(season, team_name) %>%
  mutate(matches = n(), total_travel = sum(distance/1000)) %>%
  ungroup() %>%
  #calculate average travel per game
  mutate(average_travel = total_travel / matches,
         date = as.Date(gsub("\\/", "-", date), "%d-%m-%Y")) %>%
  select(season, date, round, home = home_keep, away = away_keep, result, distance, average_travel) %>%
  arrange(average_travel, date)

#print the 3 campaigns with the lowest average travel
head(winners_matches, n = 27)
## # A tibble: 27 x 8
##    season  date       round   home    away   result distance average_travel
##    <chr>   <date>     <chr>   <chr>   <chr>  <chr>       [m]          <dbl>
##  1 1963-1… 1963-09-18 1. Rou… Everto… Inter  0:0 (… 1247112…           410.
##  2 1963-1… 1963-09-25 1. Rou… Inter   Evert… 1:0 (…       0…           410.
##  3 1963-1… 1963-11-27 Round … Inter   AS Mo… 1:0 (…       0…           410.
##  4 1963-1… 1963-12-04 Round … AS Mon… Inter  1:3 (…  236127…           410.
##  5 1963-1… 1964-02-26 Quarte… Partiz… Inter  0:2 (…  891298…           410.
##  6 1963-1… 1964-03-04 Quarte… Inter   Parti… 2:1 (…       0…           410.
##  7 1963-1… 1964-04-15 Semi-f… Boruss… Inter  2:2 (…  684493…           410.
##  8 1963-1… 1964-04-29 Semi-f… Inter   Borus… 2:0 (…       0…           410.
##  9 1963-1… 1964-05-27 Final   Inter   Real … 3:1 (…  632859…           410.
## 10 1971-1… 1971-09-15 1. Rou… AFC Aj… Dynam… 2:0 (…       0…           433.
## # … with 17 more rows

Where the top three are Inter’s 1963-1964, Ajax’s 1971-1972, and Bayern Munich’s 1973-1974 seasons all of which have an average travel of just over 400km per game. It’s fairly striking how many more Central European teams there are further in the competitions in these seasons comapred to today.

And that’s all for now! Thanks for reading and I’ll try and put out another post soon :)

Related