In what is becoming a repeated series, I enjoy answering trivia questions from The Guardian’s The Knowledge football trivia column.
A few questions have built up that seemed amenable to coding answers, so I’ve taken a stab at them here.
#munging
library(tidyverse)
library(data.table)
library(zoo)
#english football data
library(engsoccerdata)
#web data scraping
library(rvest)
#plotting
library(openair)

Calendar Boys

The first question this week concerns players scoring on (or nearest to) every day of the year
Riddler Classic

In my spare time I enjoy solving 538’s The Riddler column. This week I had a few spare hours while waiting for the Super Bowl to start, and decided to code up a solution to the latest problem to keep me busy.
The question revolves around a card game in which whatever choice a player makes, they are likely to lose to a con artist. Formally this is phrased as:
Every so often a question on The Guardian’s The Knowledge football trivia section piques my interest and is amenable to analysis using R. Previously, I looked at club name suffixes and young World Cup winners last August. This week (give or take), a question posed on Twitter caught my attention:
@TheKnowledge_GU was just chatting to some colleagues in the kitchen at work about why Essex doesn't have many big football clubs and it got me thinking.
Over the last few years since I started coding I’d always been interested in how data science could help predict football results, identify footballing talent, and just generally ‘solve’ football.
One of the major problems with analysing football had been the availability of data. Though there’s a lot of great published stuff freely available to read, a lot of the cutting edge work revolves around advanced metrics, such as expected goals, for which the data is hard to get.
Given it’s the new year, I decided to try and get back onto more regular posting on this blog (mostly just to build up a portfolio of work).
A quick way to get something to work with that can be published unpolished is #TidyTuesday on Twitter, which (as far as I know/can tell) is organised by Thomas Mock from RStudio.
This week, the data comes in the form of a massive corpus of every tweet using the #rstats hashtag, curated by rtweet package creator Mike Kearney.