Scraping EPL Salary Data
EPL Scraping
In a previous post, I scraped some NFL data and learned the structure of Spotrac. Now I want to scrape the available data on the EPL. The EPL data is organized in a few distinct but potentially linked tables, and the basic structure is built around team folders. Let me begin by isolating those URLs.
library(rvest)
library(tidyverse)
base_url <- "http://www.spotrac.com/epl/"
read.base <- read_html(base_url)
team.URL <- read.base %>% html_nodes(".team-name") %>% html_attr('href')
team.URL
## [1] "https://www.spotrac.com/epl/arsenal-fc/"
## [2] "https://www.spotrac.com/epl/aston-villa-fc/"
## [3] "https://www.spotrac.com/epl/brighton-hove-albion/"
## [4] "https://www.spotrac.com/epl/burnley-fc/"
## [5] "https://www.spotrac.com/epl/chelsea-fc/"
## [6] "https://www.spotrac.com/epl/crystal-palace/"
## [7] "https://www.spotrac.com/epl/everton-fc/"
## [8] "https://www.spotrac.com/epl/fulham-fc/"
## [9] "https://www.spotrac.com/epl/leeds-united-fc/"
## [10] "https://www.spotrac.com/epl/leicester-city/"
## [11] "https://www.spotrac.com/epl/liverpool-fc/"
## [12] "https://www.spotrac.com/epl/manchester-city-fc/"
## [13] "https://www.spotrac.com/epl/manchester-united-fc/"
## [14] "https://www.spotrac.com/epl/newcastle-united-fc/"
## [15] "https://www.spotrac.com/epl/sheffield-united-fc/"
## [16] "https://www.spotrac.com/epl/southampton-fc/"
## [17] "https://www.spotrac.com/epl/tottenham-hotspur-fc/"
## [18] "https://www.spotrac.com/epl/west-bromwich-albion-fc/"
## [19] "https://www.spotrac.com/epl/west-ham-united-fc/"
## [20] "https://www.spotrac.com/epl/wolverhampton-wanderers-fc/"
# Clean up the URLs to get the team names by themselves.
team.names <- gsub(base_url, "", team.URL)
team.names <- gsub("-f.c", " FC", team.names)
team.names <- gsub("afc", "AFC", team.names)
team.names <- gsub("a.f.c", "AFC", team.names)
# Dashes and slashes need to be removed.
team.names <- gsub("-", " ", team.names)
team.names <- gsub("/", "", team.names)
# Helper to capitalise the first letter of each word
simpleCap <- function(x) {
s <- strsplit(x, " ")[[1]]
paste(toupper(substring(s, 1,1)), substring(s, 2), sep="", collapse=" ")
}
# Capitalise and trim white space
team.names <- sapply(team.names, simpleCap)
#team.names <- sapply(team.names, trimws)
names(team.names) <- NULL
# Now I have a vector of 20 names.
short.names <- gsub(" FC","", team.names)
short.names <- gsub(" AFC","", short.names)
EPL.names <- data.frame(team.names,short.names,team.URL)
EPL.names
## team.names
## 1 Https:www.spotrac.comeplarsenal Fc
## 2 Https:www.spotrac.comeplaston Villa Fc
## 3 Https:www.spotrac.comeplbrighton Hove Albion
## 4 Https:www.spotrac.comeplburnley Fc
## 5 Https:www.spotrac.comeplchelsea Fc
## 6 Https:www.spotrac.comeplcrystal Palace
## 7 Https:www.spotrac.comepleverton Fc
## 8 Https:www.spotrac.comeplfulham Fc
## 9 Https:www.spotrac.comeplleeds United Fc
## 10 Https:www.spotrac.comeplleicester City
## 11 Https:www.spotrac.comeplliverpool Fc
## 12 Https:www.spotrac.comeplmanchester City Fc
## 13 Https:www.spotrac.comeplmanchester United Fc
## 14 Https:www.spotrac.comeplnewcastle United Fc
## 15 Https:www.spotrac.comeplsheffield United Fc
## 16 Https:www.spotrac.comeplsouthampton Fc
## 17 Https:www.spotrac.comepltottenham Hotspur Fc
## 18 Https:www.spotrac.comeplwest Bromwich Albion Fc
## 19 Https:www.spotrac.comeplwest Ham United Fc
## 20 Https:www.spotrac.comeplwolverhampton Wanderers Fc
## short.names
## 1 Https:www.spotrac.comeplarsenal Fc
## 2 Https:www.spotrac.comeplaston Villa Fc
## 3 Https:www.spotrac.comeplbrighton Hove Albion
## 4 Https:www.spotrac.comeplburnley Fc
## 5 Https:www.spotrac.comeplchelsea Fc
## 6 Https:www.spotrac.comeplcrystal Palace
## 7 Https:www.spotrac.comepleverton Fc
## 8 Https:www.spotrac.comeplfulham Fc
## 9 Https:www.spotrac.comeplleeds United Fc
## 10 Https:www.spotrac.comeplleicester City
## 11 Https:www.spotrac.comeplliverpool Fc
## 12 Https:www.spotrac.comeplmanchester City Fc
## 13 Https:www.spotrac.comeplmanchester United Fc
## 14 Https:www.spotrac.comeplnewcastle United Fc
## 15 Https:www.spotrac.comeplsheffield United Fc
## 16 Https:www.spotrac.comeplsouthampton Fc
## 17 Https:www.spotrac.comepltottenham Hotspur Fc
## 18 Https:www.spotrac.comeplwest Bromwich Albion Fc
## 19 Https:www.spotrac.comeplwest Ham United Fc
## 20 Https:www.spotrac.comeplwolverhampton Wanderers Fc
## team.URL
## 1 https://www.spotrac.com/epl/arsenal-fc/
## 2 https://www.spotrac.com/epl/aston-villa-fc/
## 3 https://www.spotrac.com/epl/brighton-hove-albion/
## 4 https://www.spotrac.com/epl/burnley-fc/
## 5 https://www.spotrac.com/epl/chelsea-fc/
## 6 https://www.spotrac.com/epl/crystal-palace/
## 7 https://www.spotrac.com/epl/everton-fc/
## 8 https://www.spotrac.com/epl/fulham-fc/
## 9 https://www.spotrac.com/epl/leeds-united-fc/
## 10 https://www.spotrac.com/epl/leicester-city/
## 11 https://www.spotrac.com/epl/liverpool-fc/
## 12 https://www.spotrac.com/epl/manchester-city-fc/
## 13 https://www.spotrac.com/epl/manchester-united-fc/
## 14 https://www.spotrac.com/epl/newcastle-united-fc/
## 15 https://www.spotrac.com/epl/sheffield-united-fc/
## 16 https://www.spotrac.com/epl/southampton-fc/
## 17 https://www.spotrac.com/epl/tottenham-hotspur-fc/
## 18 https://www.spotrac.com/epl/west-bromwich-albion-fc/
## 19 https://www.spotrac.com/epl/west-ham-united-fc/
## 20 https://www.spotrac.com/epl/wolverhampton-wanderers-fc/
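The names printed above still carry the URL prefix: base_url uses http while the scraped links use https, so the first gsub matches nothing, and the "-f.c" pattern does not match "-fc". A minimal sketch of a cleanup that handles both, assuming a hypothetical helper clean.team.name and using two of the URLs above:

```r
# Hypothetical helper: turn a Spotrac team URL into a display name.
clean.team.name <- function(url) {
  # Strip everything through "/epl/", whether the scheme is http or https.
  slug <- sub("^https?://[^/]+/epl/", "", url)
  # Drop the trailing slash and turn dashes into spaces.
  slug <- gsub("-", " ", sub("/$", "", slug))
  # Capitalise the first letter of each word.
  words <- strsplit(slug, " ")[[1]]
  name <- paste(paste0(toupper(substring(words, 1, 1)), substring(words, 2)),
                collapse = " ")
  # Restore the club suffix in upper case.
  name <- sub(" Fc$", " FC", name)
  sub(" Afc$", " AFC", name)
}

urls <- c("https://www.spotrac.com/epl/arsenal-fc/",
          "https://www.spotrac.com/epl/crystal-palace/")
full.names <- vapply(urls, clean.team.name, character(1), USE.NAMES = FALSE)
# Short names drop the FC / AFC suffix.
short.names <- sub(" A?FC$", "", full.names)
```

Matching on the path rather than the full base URL means the cleanup does not depend on which scheme the site returns.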
With clean names, I can take each of the scraping tasks in order.
Payroll Data
The teams have payroll information that is broken down into active players, reserves, and loanees. The workflow is first to create the relevant URLs to scrape the payroll data.
team_links <- paste0(team.URL, "payroll/")
With the URLs in hand, I can set about the task. SelectorGadget and a glimpse of the documents suggest an easy solution: isolate the table nodes and keep the tables. First, a function to apply to each URL.
data.creator <- function(link) {
read_html(link) %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE)
}
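When a function like this is applied across twenty team links, a single failed request aborts the whole run. A minimal sketch of a defensive wrapper, assuming a hypothetical helper scrape.safely and a pause between requests (neither is part of the original workflow); the stand-in scraper exists only to make the example runnable offline:

```r
# Hypothetical wrapper: apply a scraping function to each link,
# pausing between requests and returning NULL for any link that fails.
scrape.safely <- function(links, scraper, pause = 1) {
  out <- lapply(links, function(link) {
    Sys.sleep(pause)
    tryCatch(scraper(link), error = function(e) NULL)
  })
  names(out) <- links
  out
}

# Stand-in scraper for illustration only: fails on one made-up link.
fake.scraper <- function(link) {
  if (grepl("broken", link)) stop("request failed")
  toupper(link)
}

res <- scrape.safely(c("a/payroll/", "broken/payroll/"), fake.scraper, pause = 0)
```

In this post's setting, scraper would be data.creator and links would be team_links; the NULL entries mark the teams that need a retry.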
Now I want to apply the scraping function to the URLs. Then I want to name the list items, assess the size of each active roster, and clean up the relevant data.
EPL.salary <- sapply(team_links, function(x) {data.creator(x)})
names(EPL.salary) <- EPL.names$short.names
team.len <- sapply(seq(1,20), function(x) { dim(EPL.salary[[x]][[1]])[[1]]})
Team <- rep(EPL.names$short.names, team.len)
Players <- sapply(seq(1,20), function(x) { str_split(EPL.salary[[x]][[1]][,1], "\t", simplify=TRUE)[,31]})
Position <- sapply(seq(1,20), function(x) { EPL.salary[[x]][[1]][,2]})
Base.Salary <- sapply(seq(1,20), function(x) { Res <- gsub("£", "", EPL.salary[[x]][[1]][,3]); gsub(",","",Res)})
EPL.Result <- data.frame(Players=unlist(Players), Team=Team, Position=unlist(Position), Base.Salary=unlist(Base.Salary))
EPL.Result$Base.Salary <- str_replace(EPL.Result$Base.Salary, "-", NA_character_)
EPL.Result$Base.Num <- as.numeric(EPL.Result$Base.Salary)
EPL.Result %>% group_by(Position) %>% summarise(Mean.Base.Salary=mean(Base.Num, na.rm=TRUE),sdBS=sd(Base.Num, na.rm = TRUE))
## # A tibble: 4 x 3
## Position Mean.Base.Salary sdBS
## * <chr> <dbl> <dbl>
## 1 D 25 5.27
## 2 F 24.5 4.61
## 3 GK 28.2 4.68
## 4 M 24.8 4.95
EPL.Result %>% group_by(Position,Team) %>% summarise(Mean.Base.Salary=mean(Base.Num, na.rm=TRUE),sdBS=sd(Base.Num, na.rm = TRUE))
## # A tibble: 80 x 4
## # Groups: Position [4]
## Position Team Mean.Base.Salary sdBS
## <chr> <chr> <dbl> <dbl>
## 1 D Https:www.spotrac.comeplarsenal Fc 24.7 3.77
## 2 D Https:www.spotrac.comeplaston Villa Fc 25.6 3.62
## 3 D Https:www.spotrac.comeplbrighton Hove Albion 23.9 4.22
## 4 D Https:www.spotrac.comeplburnley Fc 28.6 3.62
## 5 D Https:www.spotrac.comeplchelsea Fc 26.2 4.49
## 6 D Https:www.spotrac.comeplcrystal Palace 28 4.81
## 7 D Https:www.spotrac.comepleverton Fc 24.7 3.86
## 8 D Https:www.spotrac.comeplfulham Fc 25.5 3.57
## 9 D Https:www.spotrac.comeplleeds United Fc 19.4 11.6
## 10 D Https:www.spotrac.comeplleicester City 25.7 6.11
## # … with 70 more rows
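Currency strings such as £9,466,673, with a lone dash marking a missing value, recur throughout these tables. A small sketch of a reusable converter, assuming a hypothetical helper named parse.gbp; it treats only an exact dash or an empty string as missing, which is slightly stricter than the str_replace call above:

```r
# Hypothetical helper: convert strings like "£9,466,673" to numbers.
parse.gbp <- function(x) {
  x <- trimws(x)
  # A lone dash or an empty cell means the value is missing.
  x[x %in% c("-", "")] <- NA_character_
  # Remove the pound sign (\u00a3) and thousands separators, then convert.
  as.numeric(gsub("[\u00a3,]", "", x))
}

salaries <- parse.gbp(c("\u00a39,466,673", "\u00a3520,000", "-"))
```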
Finally, a little picture to describe spending on the active roster.
fplot <- ggplot(EPL.Result, aes(Base.Num,Team))
gpl <- fplot + geom_jitter(height=0.25, width=0) + facet_wrap(~Position) + labs(x="Base Salary")
gpl
Contracts
The contracts are stored in a different URL structure that is accessible via contracts in the html tree by team. First, I want to paste the names together with links to explore.
team_links <- paste0(team.URL, "contracts/")
Now I have all the links that I need and can turn to processing the data, which is something of a mess. To showcase the problem, I will first grab the HTML files.
Base.Contracts <- lapply(team_links, read_html)
Processing them is a bit more difficult. What does the basic table look like?
Base.Contracts[[1]] %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE)
## [[1]]
## Player (30) Pos. Age
## 1 LacazetteAlexandre Lacazette F 29
## 2 AubameyangPierre-Emerick Aubameyang F 31
## 3 ParteyThomas Partey M 27
## 4 PepeNicolas Pepe F 25
## 5 da SilvaWillian da Silva M 32
## 6 BellerinHector Bellerin D 25
## 7 XhakaGranit Xhaka M 28
## 8 LenoBernd Leno GK 28
## 9 MartinelliGabriel Martinelli F 19
## 10 TierneyKieran Tierney D 23
## 11 KolasinacSead Kolasinac D 27
## 12 TorreiraLucas Torreira M 24
## 13 MaríPablo Marí D 27
## 14 SoaresCedric Soares D 29
## 15 LuizDavid Luiz D 33
## 16 MagalhãesGabriel Magalhães D 23
## 17 NketiahEdward Nketiah F 21
## 18 ElnenyMohamed Elneny M 28
## 19 ChambersCalum Chambers D 26
## 20 GuendouziMatteo Guendouzi M 21
## 21 SalibaWilliam Saliba D 19
## 22 Maitland-NilesAinsley Maitland-Niles M NA
## 23 RúnarssonRúnar Alex Rúnarsson GK 25
## 24 MavropanosKonstantinos Mavropanos D 23
## 25 HoldingRob Holding D 25
## 26 Smith RoweEmile Smith Rowe M 20
## 27 WillockJoe Willock M 21
## 28 CeballosDani Ceballos M 24
## 29 SakaBukayo Saka F 19
## 30 ØdegaardMartin Ødegaard M 22
## Contract Terms
## 1 47333365\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£47,333,365
## 2 39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 3 39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 4 36400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£36,400,000
## 5 30000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£30,000,000
## 6 28600000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£28,600,000
## 7 26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 8 26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 9 23400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£23,400,000
## 10 21000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£21,000,000
## 11 20800000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£20,800,000
## 12 19500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£19,500,000
## 13 17680000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£17,680,000
## 14 15600000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£15,600,000
## 15 13070000\n\t\t\t\t\t\t\t\t\t\t\t2 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,070,000
## 16 13000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,000,000
## 17 11700000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£11,700,000
## 18 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 19 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 20 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 21 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 22 9100000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,100,000
## 23 8320000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£8,320,000
## 24 7150000\n\t\t\t\t\t\t\t\t\t\t\t6 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,150,000
## 25 6500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£6,500,000
## 26 5200000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,200,000
## 27 4160000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,160,000
## 28 2700000\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,700,000
## 29 2080000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,080,000
## 30 0\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## Avg. Salary Transfer Fee Expires
## 1 £9,466,673 £45,050,000 2022
## 2 £13,000,000 £57,380,000 2023
## 3 £13,000,000 £55,000,000 2023
## 4 £7,280,000 £91,200,000 2024
## 5 £10,000,000 - 2023
## 6 £5,720,000 - 2023
## 7 £5,200,000 £36,000,000 2023
## 8 £5,200,000 £19,200,000 2023
## 9 £4,680,000 £7,640,000 2025
## 10 £4,200,000 £30,780,000 2024
## 11 £5,200,000 - 2022
## 12 £3,900,000 £26,000,000 2023
## 13 £4,420,000 £8,800,000 2024
## 14 £3,900,000 - 2024
## 15 £6,535,000 £9,920,000 2021
## 16 £2,600,000 £28,600,000 2025
## 17 £2,340,000 - 2022
## 18 £2,600,000 £10,630,000 2022
## 19 £2,600,000 £17,700,000 2022
## 20 £2,080,000 £7,000,000 2023
## 21 £2,080,000 £34,200,000 2024
## 22 £1,820,000 - 2023
## 23 £2,080,000 £2,200,000 2024
## 24 £1,191,667 £1,890,000 2023
## 25 £1,300,000 £2,600,000 2025
## 26 £1,040,000 - 2023
## 27 £1,040,000 - 2023
## 28 £2,700,000 - 2021
## 29 £520,000 - 2024
## 30 - - -
##
## [[2]]
## Player (30) Pos. Age Contract Terms Avg. Salary
## 1 Mathew Ryan GK 27 1 yr\n\t\t\t\t\t\t\t\t\t\t\t£1,820,000 -
## Transfer Fee Expires
## 1 - -
The names and the contract years and terms are going to require parsing. I have chosen the first html, which corresponds to Arsenal; other teams are worse because loan players sit in a second table. That may affect the wage bill, depending on the arrangement in the loan, but the player's contract details do not have that team as signatory. This is easy enough to fix: there are two embedded tables and I can select the first one. When it comes to the names, there is no easy separation for the first column; I will grab them from nodes in the html.
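To make the parsing problem concrete, here is a sketch of splitting a single Contract Terms cell into its three parts. The cell is copied from the first row of the table above; the variable names are illustrative only.

```r
# One Contract Terms cell: raw value, contract length, formatted value,
# separated by newlines and padded with tabs.
cell <- "47333365\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t\u00a347,333,365"

# Split on the newlines, then trim the tab padding from each piece.
parts <- trimws(strsplit(cell, "\n", fixed = TRUE)[[1]])

terms <- list(
  value = as.numeric(parts[1]),                   # total contract value
  years = as.integer(sub(" yr$", "", parts[2])),  # contract length in years
  value.pds = parts[3]                            # value as displayed, with pound sign
)
```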
data.creator <- function(data) {
  # Parse every table on the page; the first holds the club's own contracts.
  data %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE) -> ret.tab
  nrowsm <- dim(ret.tab[[1]])[[1]]
  # Contract Terms packs three newline-separated fields into one cell.
  split.me <- ret.tab[[1]][,4]
  tempdf <- data.frame(matrix(data=gsub("\t|-","",unlist(strsplit(split.me, "\\n"))), nrow=nrowsm, byrow=TRUE))
  names(tempdf) <- c("value","years","value.pds")
  # Full names, profile links, and last names come from the .player nodes;
  # keep only the first nrowsm entries, which belong to the first table.
  data %>% html_nodes(".player") %>% html_nodes("a") %>% html_text() -> Player.Names
  Player.Names <- Player.Names[c(1:nrowsm)]
  data %>% html_nodes(".player") %>% html_nodes("a") %>% html_attr("href") -> Player.links
  Player.links <- Player.links[c(1:nrowsm)]
  data %>% html_nodes(".player") %>% html_nodes("span") %>% html_text() -> Last.Name
  Last.Name <- Last.Name[c(1:nrowsm)]
  # Rename the first two columns so every team shares the same headers for rbind.
  names(ret.tab[[1]])[c(1:2)] <- c("Player","Position")
  return(data.frame(ret.tab[[1]],tempdf,Player.Names,Player.links,Last.Name))
}
EPL.Contracts <- lapply(Base.Contracts, data.creator)
names(EPL.Contracts) <- EPL.names$short.names
EPL.Contracts[[1]]
## Player Position Age
## 1 LacazetteAlexandre Lacazette F 29
## 2 AubameyangPierre-Emerick Aubameyang F 31
## 3 ParteyThomas Partey M 27
## 4 PepeNicolas Pepe F 25
## 5 da SilvaWillian da Silva M 32
## 6 BellerinHector Bellerin D 25
## 7 XhakaGranit Xhaka M 28
## 8 LenoBernd Leno GK 28
## 9 MartinelliGabriel Martinelli F 19
## 10 TierneyKieran Tierney D 23
## 11 KolasinacSead Kolasinac D 27
## 12 TorreiraLucas Torreira M 24
## 13 MaríPablo Marí D 27
## 14 SoaresCedric Soares D 29
## 15 LuizDavid Luiz D 33
## 16 MagalhãesGabriel Magalhães D 23
## 17 NketiahEdward Nketiah F 21
## 18 ElnenyMohamed Elneny M 28
## 19 ChambersCalum Chambers D 26
## 20 GuendouziMatteo Guendouzi M 21
## 21 SalibaWilliam Saliba D 19
## 22 Maitland-NilesAinsley Maitland-Niles M NA
## 23 RúnarssonRúnar Alex Rúnarsson GK 25
## 24 MavropanosKonstantinos Mavropanos D 23
## 25 HoldingRob Holding D 25
## 26 Smith RoweEmile Smith Rowe M 20
## 27 WillockJoe Willock M 21
## 28 CeballosDani Ceballos M 24
## 29 SakaBukayo Saka F 19
## 30 ØdegaardMartin Ødegaard M 22
## Contract.Terms
## 1 47333365\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£47,333,365
## 2 39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 3 39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 4 36400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£36,400,000
## 5 30000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£30,000,000
## 6 28600000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£28,600,000
## 7 26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 8 26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 9 23400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£23,400,000
## 10 21000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£21,000,000
## 11 20800000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£20,800,000
## 12 19500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£19,500,000
## 13 17680000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£17,680,000
## 14 15600000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£15,600,000
## 15 13070000\n\t\t\t\t\t\t\t\t\t\t\t2 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,070,000
## 16 13000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,000,000
## 17 11700000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£11,700,000
## 18 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 19 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 20 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 21 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 22 9100000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,100,000
## 23 8320000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£8,320,000
## 24 7150000\n\t\t\t\t\t\t\t\t\t\t\t6 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,150,000
## 25 6500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£6,500,000
## 26 5200000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,200,000
## 27 4160000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,160,000
## 28 2700000\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,700,000
## 29 2080000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,080,000
## 30 0\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t-
## Avg..Salary Transfer.Fee Expires value years value.pds
## 1 £9,466,673 £45,050,000 2022 47333365 5 yr £47,333,365
## 2 £13,000,000 £57,380,000 2023 39000000 3 yr £39,000,000
## 3 £13,000,000 £55,000,000 2023 39000000 3 yr £39,000,000
## 4 £7,280,000 £91,200,000 2024 36400000 5 yr £36,400,000
## 5 £10,000,000 - 2023 30000000 3 yr £30,000,000
## 6 £5,720,000 - 2023 28600000 5 yr £28,600,000
## 7 £5,200,000 £36,000,000 2023 26000000 5 yr £26,000,000
## 8 £5,200,000 £19,200,000 2023 26000000 5 yr £26,000,000
## 9 £4,680,000 £7,640,000 2025 23400000 5 yr £23,400,000
## 10 £4,200,000 £30,780,000 2024 21000000 5 yr £21,000,000
## 11 £5,200,000 - 2022 20800000 4 yr £20,800,000
## 12 £3,900,000 £26,000,000 2023 19500000 5 yr £19,500,000
## 13 £4,420,000 £8,800,000 2024 17680000 4 yr £17,680,000
## 14 £3,900,000 - 2024 15600000 4 yr £15,600,000
## 15 £6,535,000 £9,920,000 2021 13070000 2 yr £13,070,000
## 16 £2,600,000 £28,600,000 2025 13000000 5 yr £13,000,000
## 17 £2,340,000 - 2022 11700000 5 yr £11,700,000
## 18 £2,600,000 £10,630,000 2022 10400000 4 yr £10,400,000
## 19 £2,600,000 £17,700,000 2022 10400000 4 yr £10,400,000
## 20 £2,080,000 £7,000,000 2023 10400000 5 yr £10,400,000
## 21 £2,080,000 £34,200,000 2024 10400000 5 yr £10,400,000
## 22 £1,820,000 - 2023 9100000 5 yr £9,100,000
## 23 £2,080,000 £2,200,000 2024 8320000 4 yr £8,320,000
## 24 £1,191,667 £1,890,000 2023 7150000 6 yr £7,150,000
## 25 £1,300,000 £2,600,000 2025 6500000 5 yr £6,500,000
## 26 £1,040,000 - 2023 5200000 5 yr £5,200,000
## 27 £1,040,000 - 2023 4160000 4 yr £4,160,000
## 28 £2,700,000 - 2021 2700000 1 yr £2,700,000
## 29 £520,000 - 2024 2080000 4 yr £2,080,000
## 30 - - - 0 1 yr
## Player.Names Player.links
## 1 Alexandre Lacazette https://www.spotrac.com/redirect/player/24059/
## 2 Pierre-Emerick Aubameyang https://www.spotrac.com/redirect/player/24963/
## 3 Thomas Partey https://www.spotrac.com/redirect/player/62853/
## 4 Nicolas Pepe https://www.spotrac.com/redirect/player/32697/
## 5 Willian da Silva https://www.spotrac.com/redirect/player/22635/
## 6 Hector Bellerin https://www.spotrac.com/redirect/player/11978/
## 7 Granit Xhaka https://www.spotrac.com/redirect/player/22653/
## 8 Bernd Leno https://www.spotrac.com/redirect/player/26727/
## 9 Gabriel Martinelli https://www.spotrac.com/redirect/player/32041/
## 10 Kieran Tierney https://www.spotrac.com/redirect/player/32750/
## 11 Sead Kolasinac https://www.spotrac.com/redirect/player/23720/
## 12 Lucas Torreira https://www.spotrac.com/redirect/player/27685/
## 13 Pablo Marí https://www.spotrac.com/redirect/player/48798/
## 14 Cedric Soares https://www.spotrac.com/redirect/player/22941/
## 15 David Luiz https://www.spotrac.com/redirect/player/22629/
## 16 Gabriel Magalhães https://www.spotrac.com/redirect/player/50105/
## 17 Edward Nketiah https://www.spotrac.com/redirect/player/32751/
## 18 Mohamed Elneny https://www.spotrac.com/redirect/player/22656/
## 19 Calum Chambers https://www.spotrac.com/redirect/player/24015/
## 20 Matteo Guendouzi https://www.spotrac.com/redirect/player/27686/
## 21 William Saliba https://www.spotrac.com/redirect/player/32466/
## 22 Ainsley Maitland-Niles https://www.spotrac.com/redirect/player/24373/
## 23 Rúnar Alex Rúnarsson https://www.spotrac.com/redirect/player/62649/
## 24 Konstantinos Mavropanos https://www.spotrac.com/redirect/player/24730/
## 25 Rob Holding https://www.spotrac.com/redirect/player/22643/
## 26 Emile Smith Rowe https://www.spotrac.com/redirect/player/50184/
## 27 Joe Willock https://www.spotrac.com/redirect/player/24375/
## 28 Dani Ceballos https://www.spotrac.com/redirect/player/32467/
## 29 Bukayo Saka https://www.spotrac.com/redirect/player/48801/
## 30 Martin Ødegaard https://www.spotrac.com/redirect/player/71609/
## Last.Name
## 1 Lacazette
## 2 Aubameyang
## 3 Partey
## 4 Pepe
## 5 da Silva
## 6 Bellerin
## 7 Xhaka
## 8 Leno
## 9 Martinelli
## 10 Tierney
## 11 Kolasinac
## 12 Torreira
## 13 Marí
## 14 Soares
## 15 Luiz
## 16 Magalhães
## 17 Nketiah
## 18 Elneny
## 19 Chambers
## 20 Guendouzi
## 21 Saliba
## 22 Maitland-Niles
## 23 Rúnarsson
## 24 Mavropanos
## 25 Holding
## 26 Smith Rowe
## 27 Willock
## 28 Ceballos
## 29 Saka
## 30 Ødegaard
The data now have some junk alongside workable versions of the variables of interest. It is worth noting that the header of the contracts data, Player (30), allows us to verify the size of the table as we picked it up [though I do rename the first two columns to allow the rbind to work]. This also suggests a strategy for picking up the number of rows that differs from the method above, which uses the dimension of the html table: perhaps I should just gsub the header to recover the integer number of players. To tidy the data, the team tables need to be stacked. A simple do.call and row bind will probably work.
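The header idea can be sketched in a line or two. This assumes the first column header always has the form seen above, a label followed by the player count in parentheses; the variable names are illustrative.

```r
# First column header as scraped for the first team.
header <- "Player (30)"

# Drop every non-digit character and convert what remains to an integer.
n.players <- as.integer(gsub("[^0-9]", "", header))
```

Comparing n.players with the number of rows in the scraped table gives a quick check that no players were lost along the way.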
Team.Base <- sapply(EPL.Contracts, dim)[1,]
Team <- rep(as.character(names(Team.Base)),Team.Base)
EPL.Contracts.df <- do.call("rbind",EPL.Contracts)
rownames(EPL.Contracts.df) <- NULL
EPL.Contracts.df$Team <- Team
EPL.Contracts.df$value <- as.numeric(as.character(EPL.Contracts.df$value))
EPL.Contracts.df %>% group_by(Team) %>% summarise(Team.Mean=mean(value, na.rm=TRUE)/1e3, Team.SD=sd(value, na.rm=TRUE)) -> Team.mean
pp <- Team.mean %>% arrange(Team.Mean)
pp$Team <- factor(pp$Team, levels = pp$Team)
pp %>% ggplot(aes(Team.Mean,Team, size=Team.SD)) + geom_point() + labs(x="Avg. Contract (1000s)") -> cplot
cplot
EPL.Contracts.df %>% group_by(Team) %>% summarise(Age.Mean=mean(Age, na.rm=TRUE), Age.SD=sd(Age, na.rm=TRUE)) -> Team.mean
Team.mean %>% ungroup() %>% arrange(., Age.Mean) -> pp
pp$Team <- factor(pp$Team, levels = pp$Team)
pp %>% ggplot(aes(Age.Mean,Team,size=Age.SD)) + geom_point() + labs(x="Age") -> cplot
cplot