Advent Calendar of Football Trivia Analyses

One of the most consistent fonts of posts on this blog is The Guardian’s football trivia page The Knowledge. A particular reason for this is that the small contained questions lend themselves to small blogposts that I can turn around in an hour or two, as opposed to being endlessly redrafted until I lose interest. However, I still sometimes don’t quite get round to finishing some of these posts, or have trouble justifying a blog post on a very small and ‘trivial’ answer to a question.

Could Yorkshire Win the World Cup

In 2018, after watching the CONIFA World Cup final live, I wondered if an independent Yorkshire could win the FIFA World Cup. This resulted in a few blogposts that were turned into an article in CityMetric magazine.

Guardian: The Knowledge

In my free time I enjoy programmatically answering football trivia from The Guardian’s The Knowledge blog.

Statsbomb Conference

In summer 2019, I won the chance to explore a hypothesis in football analytics using data from Statsbomb. My final project looked at Markov chain models of possession value in football, and considered how to incorporate defensive risk into such models.

Scraping Dynamic Websites with PhantomJS

For a recent blogpost, I required data on the Elo ratings of national football teams over time. Such a list exists online, and so in theory this was just a simple task for rvest: read the HTML pages on that site and then fish out the data I wanted. However, while this works for the static websites which make up the vast majority of sites containing tables of data, it struggles with websites that use JavaScript to dynamically generate pages.

The Guardian Knowledge June 2019

Most Wednesdays I enjoy reading The Knowledge blog on the Guardian’s website and the football trivia therein. When time (and questions) allow, I like to answer some of the questions posed, examples of which are here, here, and here.

League of Nations

The first question comes from

Which player had the nationality with the lowest FIFA World Ranking at the time of him winning the Premier League? — The Tin Boonie (@TheTinBoonie) June 18, 2019

a similar question is also answered in this week’s column:

An Introduction to Modelling Soccer Matches in R (part 1)

For anyone watching football, being able to predict matches is a key aspect of the hobby. Whether explicitly (e.g. when betting on matches, or deciding on recruitment for an upcoming season), or more implicitly when discussing favourites to win the league in the pub, almost all discussion of the sport on some level requires predictions about some set of upcoming games. The first step of prediction is some form of quantification of ability.

The Knowledge 7th February 2019

In what is becoming a repeated series, I enjoy answering trivia questions from The Guardian’s The Knowledge football trivia column. There are a few questions that have built up that seemed amenable to coding answers, so I’ve taken a stab at them here.

    #munging
    library(tidyverse)
    library(data.table)
    library(zoo)
    #english football data
    library(engsoccerdata)
    #web data scraping
    library(rvest)
    #plotting
    library(openair)

Calendar Boys

The first question this week concerns players scoring on (or nearest to) every day of the year.

Which English County Has Won the Most Points

Every so often a question on The Guardian’s The Knowledge football trivia section piques my interest and is amenable to analysis using R. Previously, I looked at club name suffixes and young World Cup winners last August. This week (give or take), a question posed on Twitter caught my attention:

@TheKnowledge_GU was just chatting to some colleagues in the kitchen at work about why Essex doesn’t have many big football clubs and it got me thinking.

Predicting the 2018-19 Women's Super League Using xG and Dixon-Coles

In the few years since I started coding, I’ve been interested in how data science could help predict football results, identify footballing talent, and just generally ‘solve’ football. One of the major problems with analysing football has been the availability of data. Though there’s a lot of great published work freely available to read, much of the cutting-edge work revolves around advanced metrics, such as expected goals, for which the data is hard to get.