April 8, 2018

Scraping EPL Salary Data

EPL Scraping

In a previous post, I scraped some NFL data and learned the structure of Spotrac. Now I want to scrape the available data on the EPL. The EPL data is organized in a few distinct but potentially linked tables, arranged around one folder per team. Let me begin by isolating those URLs.

library(rvest)
library(tidyverse)
base_url <- "http://www.spotrac.com/epl/"
read.base <- read_html(base_url)
team.URL <- read.base %>% html_nodes(".team-name") %>% html_attr('href')
team.URL
##  [1] "https://www.spotrac.com/epl/arsenal-fc/"                
##  [2] "https://www.spotrac.com/epl/aston-villa-fc/"            
##  [3] "https://www.spotrac.com/epl/brighton-hove-albion/"      
##  [4] "https://www.spotrac.com/epl/burnley-fc/"                
##  [5] "https://www.spotrac.com/epl/chelsea-fc/"                
##  [6] "https://www.spotrac.com/epl/crystal-palace/"            
##  [7] "https://www.spotrac.com/epl/everton-fc/"                
##  [8] "https://www.spotrac.com/epl/fulham-fc/"                 
##  [9] "https://www.spotrac.com/epl/leeds-united-fc/"           
## [10] "https://www.spotrac.com/epl/leicester-city/"            
## [11] "https://www.spotrac.com/epl/liverpool-fc/"              
## [12] "https://www.spotrac.com/epl/manchester-city-fc/"        
## [13] "https://www.spotrac.com/epl/manchester-united-fc/"      
## [14] "https://www.spotrac.com/epl/newcastle-united-fc/"       
## [15] "https://www.spotrac.com/epl/sheffield-united-fc/"       
## [16] "https://www.spotrac.com/epl/southampton-fc/"            
## [17] "https://www.spotrac.com/epl/tottenham-hotspur-fc/"      
## [18] "https://www.spotrac.com/epl/west-bromwich-albion-fc/"   
## [19] "https://www.spotrac.com/epl/west-ham-united-fc/"        
## [20] "https://www.spotrac.com/epl/wolverhampton-wanderers-fc/"
# Clean up the URLs to get the team names by themselves.
# The scraped links use https while base_url uses http, so strip the prefix with a pattern that accepts either scheme.
team.names <- sub("^https?://www\\.spotrac\\.com/epl/", "", team.URL)
# Dashes and slashes need to be removed.
team.names <- gsub("-", " ", team.names)
team.names <- gsub("/", "", team.names)
simpleCap <- function(x) {
  s <- strsplit(x, " ")[[1]]
  paste(toupper(substring(s, 1,1)), substring(s, 2), sep="", collapse=" ")
  }
# Capitalise each word
team.names <- sapply(team.names, simpleCap)
names(team.names) <- NULL
# Fix FC and AFC, which simpleCap turns into Fc and Afc
team.names <- gsub(" Fc$", " FC", team.names)
team.names <- gsub(" Afc$", " AFC", team.names)
# Now I have a vector of 20 names.
short.names <- gsub(" FC","", team.names)
short.names <- gsub(" AFC","", short.names)
EPL.names <- data.frame(team.names,short.names,team.URL)
EPL.names
##                    team.names             short.names
## 1                  Arsenal FC                 Arsenal
## 2              Aston Villa FC             Aston Villa
## 3        Brighton Hove Albion    Brighton Hove Albion
## 4                  Burnley FC                 Burnley
## 5                  Chelsea FC                 Chelsea
## 6              Crystal Palace          Crystal Palace
## 7                  Everton FC                 Everton
## 8                   Fulham FC                  Fulham
## 9             Leeds United FC            Leeds United
## 10             Leicester City          Leicester City
## 11               Liverpool FC               Liverpool
## 12         Manchester City FC         Manchester City
## 13       Manchester United FC       Manchester United
## 14        Newcastle United FC        Newcastle United
## 15        Sheffield United FC        Sheffield United
## 16             Southampton FC             Southampton
## 17       Tottenham Hotspur FC       Tottenham Hotspur
## 18    West Bromwich Albion FC    West Bromwich Albion
## 19         West Ham United FC         West Ham United
## 20 Wolverhampton Wanderers FC Wolverhampton Wanderers
##                                                   team.URL
## 1                  https://www.spotrac.com/epl/arsenal-fc/
## 2              https://www.spotrac.com/epl/aston-villa-fc/
## 3        https://www.spotrac.com/epl/brighton-hove-albion/
## 4                  https://www.spotrac.com/epl/burnley-fc/
## 5                  https://www.spotrac.com/epl/chelsea-fc/
## 6              https://www.spotrac.com/epl/crystal-palace/
## 7                  https://www.spotrac.com/epl/everton-fc/
## 8                   https://www.spotrac.com/epl/fulham-fc/
## 9             https://www.spotrac.com/epl/leeds-united-fc/
## 10             https://www.spotrac.com/epl/leicester-city/
## 11               https://www.spotrac.com/epl/liverpool-fc/
## 12         https://www.spotrac.com/epl/manchester-city-fc/
## 13       https://www.spotrac.com/epl/manchester-united-fc/
## 14        https://www.spotrac.com/epl/newcastle-united-fc/
## 15        https://www.spotrac.com/epl/sheffield-united-fc/
## 16             https://www.spotrac.com/epl/southampton-fc/
## 17       https://www.spotrac.com/epl/tottenham-hotspur-fc/
## 18    https://www.spotrac.com/epl/west-bromwich-albion-fc/
## 19         https://www.spotrac.com/epl/west-ham-united-fc/
## 20 https://www.spotrac.com/epl/wolverhampton-wanderers-fc/
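
A team name can also be recovered from the last path component of each URL, which avoids depending on the exact prefix of the link. The sketch below is one way to do that in base R; `slug_to_name` is a hypothetical helper introduced here for illustration, and the two URLs are copied from the listing above.

```r
# Hypothetical helper (not part of the original workflow): build a display
# name from the last path component of a team URL.
slug_to_name <- function(url) {
  slug  <- basename(url)                        # ".../arsenal-fc/" -> "arsenal-fc"
  words <- strsplit(slug, "-", fixed = TRUE)    # one character vector per URL
  vapply(words, function(w) {
    # Capitalise the first letter of every word.
    w <- paste0(toupper(substring(w, 1, 1)), substring(w, 2))
    # Keep club abbreviations fully upper case.
    is_abbrev <- toupper(w) %in% c("FC", "AFC")
    w[is_abbrev] <- toupper(w[is_abbrev])
    paste(w, collapse = " ")
  }, character(1))
}

urls <- c("https://www.spotrac.com/epl/arsenal-fc/",
          "https://www.spotrac.com/epl/crystal-palace/")
slug_to_name(urls)
```

Because `basename()` drops the trailing slash and everything before the last path component, the helper does not care whether the links start with http or https.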

With clean names, I can take each of the scraping tasks in order.

Payroll Data

The teams have payroll information that is broken down into active players, reserves, and loanees. The workflow is first to create the relevant URLs to scrape the payroll data.

team_links <- paste0(team.URL,"payroll/")

With the URLs in hand, I can set forth on the task. SelectorGadget and a glimpse of the documents suggest an easy solution: isolate the table nodes and keep the tables. First, a function for the URLs.

data.creator <- function(link) {
read_html(link) %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE)
}
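
To see what this function hands back without touching the site, the same pipeline can be run on a small inline document. This is only a sketch: the table below is made up for illustration and is not Spotrac's markup, and the player names and figures are placeholders.

```r
library(rvest)

# A made-up two-row table standing in for a payroll page (illustrative only).
toy_page <- read_html('
  <table>
    <tr><th>Player</th><th>Pos.</th><th>Base Salary</th></tr>
    <tr><td>Player A</td><td>F</td><td>1,000,000</td></tr>
    <tr><td>Player B</td><td>GK</td><td>-</td></tr>
  </table>')

# html_nodes() returns every <table> node; html_table() converts each one,
# so the result is a list with one element per table on the page.
toy_tables <- toy_page %>% html_nodes("table") %>% html_table(header = TRUE)
length(toy_tables)
toy_tables[[1]]
```

The list structure is why the later code indexes with `[[x]][[1]]`: the first index picks the team and the second picks the first table on that team's page.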

Now I want to apply the data-scraping function to the URLs. Then I want to name the list items, assess the size of the active roster, and clean up the relevant data.

EPL.salary <- sapply(team_links, function(x) {data.creator(x)})
names(EPL.salary) <- EPL.names$short.names
team.len <- sapply(seq(1,20), function(x) { dim(EPL.salary[[x]][[1]])[[1]]})
Team <- rep(EPL.names$short.names, team.len)
Players <- sapply(seq(1,20), function(x) { str_split(EPL.salary[[x]][[1]][,1], "\t", simplify=TRUE)[,31]})
Position <- sapply(seq(1,20), function(x) { EPL.salary[[x]][[1]][,2]})
Base.Salary <- sapply(seq(1,20), function(x) { Res <- gsub("£", "", EPL.salary[[x]][[1]][,3]); gsub(",","",Res)})
EPL.Result <- data.frame(Players=unlist(Players), Team=Team, Position=unlist(Position), Base.Salary=unlist(Base.Salary))
EPL.Result$Base.Salary <- str_replace(EPL.Result$Base.Salary, "-", NA_character_)
EPL.Result$Base.Num <- as.numeric(EPL.Result$Base.Salary)
EPL.Result %>% group_by(Position) %>% summarise(Mean.Base.Salary=mean(Base.Num, na.rm=TRUE),sdBS=sd(Base.Num, na.rm = TRUE))
## # A tibble: 4 x 3
##   Position Mean.Base.Salary  sdBS
## * <chr>               <dbl> <dbl>
## 1 D                    25    5.27
## 2 F                    24.5  4.61
## 3 GK                   28.2  4.68
## 4 M                    24.8  4.95
EPL.Result %>% group_by(Position,Team) %>% summarise(Mean.Base.Salary=mean(Base.Num, na.rm=TRUE),sdBS=sd(Base.Num, na.rm = TRUE))
## # A tibble: 80 x 4
## # Groups:   Position [4]
##    Position Team                 Mean.Base.Salary  sdBS
##    <chr>    <chr>                           <dbl> <dbl>
##  1 D        Arsenal                          24.7  3.77
##  2 D        Aston Villa                      25.6  3.62
##  3 D        Brighton Hove Albion             23.9  4.22
##  4 D        Burnley                          28.6  3.62
##  5 D        Chelsea                          26.2  4.49
##  6 D        Crystal Palace                   28    4.81
##  7 D        Everton                          24.7  3.86
##  8 D        Fulham                           25.5  3.57
##  9 D        Leeds United                     19.4 11.6 
## 10 D        Leicester City                   25.7  6.11
## # … with 70 more rows
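
The currency clean-up used above (drop the pound sign and commas, treat a lone dash as missing) can be collected into one small function. This is a sketch in base R; `parse_gbp` is a hypothetical name, and the sample strings are in the format shown in the Spotrac tables.

```r
# Hypothetical helper: convert strings such as "£9,466,673" to numbers.
parse_gbp <- function(x) {
  x <- trimws(x)
  x <- gsub("[£,]", "", x)       # drop the currency symbol and thousands separators
  x[x %in% c("-", "")] <- NA     # a lone dash (or nothing) means no figure is listed
  as.numeric(x)
}

parse_gbp(c("£9,466,673", "£520,000", "-"))
```

Setting the dash to `NA` before calling `as.numeric()` avoids the coercion warning that would otherwise appear for every missing figure.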

Finally, a little picture to describe spending on the active roster.

fplot <- ggplot(EPL.Result, aes(Base.Num,Team))
gpl <- fplot + geom_jitter(height=0.25, width=0) + facet_wrap(~Position) + labs(x="Base Salary")
gpl

Contracts

The contracts are stored in a different URL structure that is accessible via contracts in the HTML tree by team. First, I want to paste the names together with links to explore.

team_links <- paste0(team.URL,"contracts/")

Now I have all the links that I need and can turn to processing the data. This is something of a mess. Let me first grab some data to showcase the problem. In what follows, first I will grab the HTML files.

Base.Contracts <- lapply(team_links, read_html)
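
Twenty requests in a row can fail partway through, and one failure inside `lapply` stops the whole run. A more defensive variant, sketched below, pauses between requests and records a `NULL` for any page that errors. `fetch_pages` and its `fetch` and `pause` arguments are hypothetical names introduced here, and the stub at the end stands in for the network so the behaviour can be checked offline.

```r
# Hypothetical wrapper: fetch each URL, pausing between requests and
# returning NULL (with a message) for any URL that fails.
fetch_pages <- function(urls, fetch = rvest::read_html, pause = 1) {
  pages <- lapply(urls, function(u) {
    Sys.sleep(pause)                       # be polite to the server
    tryCatch(
      fetch(u),
      error = function(e) {
        message("Failed to read ", u, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(pages) <- urls
  pages
}

# Offline check with a stub in place of read_html (illustrative only).
fake_fetch <- function(u) if (grepl("bad", u)) stop("no such page") else toupper(u)
res <- fetch_pages(c("good", "bad"), fetch = fake_fetch, pause = 0)
str(res)
```

With the default `fetch`, a call such as `fetch_pages(team_links)` would play the role of the `lapply` above; that use is an assumption and has not been run against the site here.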

Processing them is a bit more difficult. What does the basic table look like?

Base.Contracts[[1]] %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE)
## [[1]]
##                             Player (30) Pos. Age
## 1          LacazetteAlexandre Lacazette    F  29
## 2   AubameyangPierre-Emerick Aubameyang    F  31
## 3                   ParteyThomas Partey    M  27
## 4                      PepeNicolas Pepe    F  25
## 5              da SilvaWillian da Silva    M  32
## 6               BellerinHector Bellerin    D  25
## 7                     XhakaGranit Xhaka    M  28
## 8                        LenoBernd Leno   GK  28
## 9          MartinelliGabriel Martinelli    F  19
## 10                TierneyKieran Tierney    D  23
## 11              KolasinacSead Kolasinac    D  27
## 12               TorreiraLucas Torreira    M  24
## 13                       MaríPablo Marí    D  27
## 14                  SoaresCedric Soares    D  29
## 15                       LuizDavid Luiz    D  33
## 16           MagalhãesGabriel Magalhães    D  23
## 17                NketiahEdward Nketiah    F  21
## 18                 ElnenyMohamed Elneny    M  28
## 19               ChambersCalum Chambers    D  26
## 20            GuendouziMatteo Guendouzi    M  21
## 21                 SalibaWilliam Saliba    D  19
## 22 Maitland-NilesAinsley Maitland-Niles    M  NA
## 23        RúnarssonRúnar Alex Rúnarsson   GK  25
## 24    MavropanosKonstantinos Mavropanos    D  23
## 25                   HoldingRob Holding    D  25
## 26           Smith RoweEmile Smith Rowe    M  20
## 27                   WillockJoe Willock    M  21
## 28                CeballosDani Ceballos    M  24
## 29                      SakaBukayo Saka    F  19
## 30              ØdegaardMartin Ødegaard    M  22
##                                                             Contract Terms
## 1  47333365\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£47,333,365
## 2  39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 3  39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 4  36400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£36,400,000
## 5  30000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£30,000,000
## 6  28600000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£28,600,000
## 7  26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 8  26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 9  23400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£23,400,000
## 10 21000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£21,000,000
## 11 20800000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£20,800,000
## 12 19500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£19,500,000
## 13 17680000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£17,680,000
## 14 15600000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£15,600,000
## 15 13070000\n\t\t\t\t\t\t\t\t\t\t\t2 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,070,000
## 16 13000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,000,000
## 17 11700000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£11,700,000
## 18 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 19 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 20 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 21 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 22   9100000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,100,000
## 23   8320000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£8,320,000
## 24   7150000\n\t\t\t\t\t\t\t\t\t\t\t6 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,150,000
## 25   6500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£6,500,000
## 26   5200000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,200,000
## 27   4160000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,160,000
## 28   2700000\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,700,000
## 29   2080000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,080,000
## 30                  0\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t-
##    Avg. Salary Transfer Fee Expires
## 1   £9,466,673  £45,050,000    2022
## 2  £13,000,000  £57,380,000    2023
## 3  £13,000,000  £55,000,000    2023
## 4   £7,280,000  £91,200,000    2024
## 5  £10,000,000            -    2023
## 6   £5,720,000            -    2023
## 7   £5,200,000  £36,000,000    2023
## 8   £5,200,000  £19,200,000    2023
## 9   £4,680,000   £7,640,000    2025
## 10  £4,200,000  £30,780,000    2024
## 11  £5,200,000            -    2022
## 12  £3,900,000  £26,000,000    2023
## 13  £4,420,000   £8,800,000    2024
## 14  £3,900,000            -    2024
## 15  £6,535,000   £9,920,000    2021
## 16  £2,600,000  £28,600,000    2025
## 17  £2,340,000            -    2022
## 18  £2,600,000  £10,630,000    2022
## 19  £2,600,000  £17,700,000    2022
## 20  £2,080,000   £7,000,000    2023
## 21  £2,080,000  £34,200,000    2024
## 22  £1,820,000            -    2023
## 23  £2,080,000   £2,200,000    2024
## 24  £1,191,667   £1,890,000    2023
## 25  £1,300,000   £2,600,000    2025
## 26  £1,040,000            -    2023
## 27  £1,040,000            -    2023
## 28  £2,700,000            -    2021
## 29    £520,000            -    2024
## 30           -            -       -
## 
## [[2]]
##   Player (30) Pos. Age                         Contract Terms Avg. Salary
## 1 Mathew Ryan   GK  27 1 yr\n\t\t\t\t\t\t\t\t\t\t\t£1,820,000           -
##   Transfer Fee Expires
## 1            -       -

The names and the contract years and terms are going to require parsing. I have chosen the first HTML file, which here corresponds to Arsenal; other teams are worse because loan players are in a second table. That affects the wage bill, perhaps, depending on the arrangement in the loan, but the contract details for the player do not have that team as signatory. This is easy enough to fix: there are two embedded tables and I can select the first one. When it comes to the names, there is no easy separation for the first column; I will grab them from nodes in the HTML.

data.creator <- function(data) { 
  data %>% html_nodes("table") %>% html_table(header=TRUE, fill=TRUE) -> ret.tab
  nrowsm <- dim(ret.tab[[1]])[[1]]
  split.me <- ret.tab[[1]][,4]
  tempdf <- data.frame(matrix(data=gsub("\t|-","",unlist(strsplit(split.me, "\\n"))), nrow=nrowsm, byrow=TRUE))
  names(tempdf) <- c("value","years","value.pds")
  data %>% html_nodes(".player") %>% html_nodes("a") %>% html_text() -> Player.Names
  Player.Names <- Player.Names[c(1:nrowsm)]
  data %>% html_nodes(".player") %>% html_nodes("a") %>% html_attr("href") -> Player.Links
  Player.links <- Player.Links[c(1:nrowsm)]
  data %>% html_nodes(".player") %>% html_nodes("span") %>% html_text() -> Last.Name
  Last.Name <- Last.Name[c(1:nrowsm)]
  names(ret.tab[1][[1]])[c(1:2)] <- c("Player","Position")
#  data.frame(ret.tab[,c(5,6,7)]) 
  return(data.frame(ret.tab[1][[1]],tempdf,Player.Names,Player.links,Last.Name))
}
EPL.Contracts <- lapply(Base.Contracts, data.creator)
names(EPL.Contracts) <- EPL.names$short.names
EPL.Contracts[[1]]
##                                  Player Position Age
## 1          LacazetteAlexandre Lacazette        F  29
## 2   AubameyangPierre-Emerick Aubameyang        F  31
## 3                   ParteyThomas Partey        M  27
## 4                      PepeNicolas Pepe        F  25
## 5              da SilvaWillian da Silva        M  32
## 6               BellerinHector Bellerin        D  25
## 7                     XhakaGranit Xhaka        M  28
## 8                        LenoBernd Leno       GK  28
## 9          MartinelliGabriel Martinelli        F  19
## 10                TierneyKieran Tierney        D  23
## 11              KolasinacSead Kolasinac        D  27
## 12               TorreiraLucas Torreira        M  24
## 13                       MaríPablo Marí        D  27
## 14                  SoaresCedric Soares        D  29
## 15                       LuizDavid Luiz        D  33
## 16           MagalhãesGabriel Magalhães        D  23
## 17                NketiahEdward Nketiah        F  21
## 18                 ElnenyMohamed Elneny        M  28
## 19               ChambersCalum Chambers        D  26
## 20            GuendouziMatteo Guendouzi        M  21
## 21                 SalibaWilliam Saliba        D  19
## 22 Maitland-NilesAinsley Maitland-Niles        M  NA
## 23        RúnarssonRúnar Alex Rúnarsson       GK  25
## 24    MavropanosKonstantinos Mavropanos        D  23
## 25                   HoldingRob Holding        D  25
## 26           Smith RoweEmile Smith Rowe        M  20
## 27                   WillockJoe Willock        M  21
## 28                CeballosDani Ceballos        M  24
## 29                      SakaBukayo Saka        F  19
## 30              ØdegaardMartin Ødegaard        M  22
##                                                             Contract.Terms
## 1  47333365\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£47,333,365
## 2  39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 3  39000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£39,000,000
## 4  36400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£36,400,000
## 5  30000000\n\t\t\t\t\t\t\t\t\t\t\t3 yr\n\t\t\t\t\t\t\t\t\t\t\t£30,000,000
## 6  28600000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£28,600,000
## 7  26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 8  26000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£26,000,000
## 9  23400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£23,400,000
## 10 21000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£21,000,000
## 11 20800000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£20,800,000
## 12 19500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£19,500,000
## 13 17680000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£17,680,000
## 14 15600000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£15,600,000
## 15 13070000\n\t\t\t\t\t\t\t\t\t\t\t2 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,070,000
## 16 13000000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£13,000,000
## 17 11700000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£11,700,000
## 18 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 19 10400000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 20 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 21 10400000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£10,400,000
## 22   9100000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£9,100,000
## 23   8320000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£8,320,000
## 24   7150000\n\t\t\t\t\t\t\t\t\t\t\t6 yr\n\t\t\t\t\t\t\t\t\t\t\t£7,150,000
## 25   6500000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£6,500,000
## 26   5200000\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£5,200,000
## 27   4160000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£4,160,000
## 28   2700000\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,700,000
## 29   2080000\n\t\t\t\t\t\t\t\t\t\t\t4 yr\n\t\t\t\t\t\t\t\t\t\t\t£2,080,000
## 30                  0\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t-
##    Avg..Salary Transfer.Fee Expires    value years   value.pds
## 1   £9,466,673  £45,050,000    2022 47333365  5 yr £47,333,365
## 2  £13,000,000  £57,380,000    2023 39000000  3 yr £39,000,000
## 3  £13,000,000  £55,000,000    2023 39000000  3 yr £39,000,000
## 4   £7,280,000  £91,200,000    2024 36400000  5 yr £36,400,000
## 5  £10,000,000            -    2023 30000000  3 yr £30,000,000
## 6   £5,720,000            -    2023 28600000  5 yr £28,600,000
## 7   £5,200,000  £36,000,000    2023 26000000  5 yr £26,000,000
## 8   £5,200,000  £19,200,000    2023 26000000  5 yr £26,000,000
## 9   £4,680,000   £7,640,000    2025 23400000  5 yr £23,400,000
## 10  £4,200,000  £30,780,000    2024 21000000  5 yr £21,000,000
## 11  £5,200,000            -    2022 20800000  4 yr £20,800,000
## 12  £3,900,000  £26,000,000    2023 19500000  5 yr £19,500,000
## 13  £4,420,000   £8,800,000    2024 17680000  4 yr £17,680,000
## 14  £3,900,000            -    2024 15600000  4 yr £15,600,000
## 15  £6,535,000   £9,920,000    2021 13070000  2 yr £13,070,000
## 16  £2,600,000  £28,600,000    2025 13000000  5 yr £13,000,000
## 17  £2,340,000            -    2022 11700000  5 yr £11,700,000
## 18  £2,600,000  £10,630,000    2022 10400000  4 yr £10,400,000
## 19  £2,600,000  £17,700,000    2022 10400000  4 yr £10,400,000
## 20  £2,080,000   £7,000,000    2023 10400000  5 yr £10,400,000
## 21  £2,080,000  £34,200,000    2024 10400000  5 yr £10,400,000
## 22  £1,820,000            -    2023  9100000  5 yr  £9,100,000
## 23  £2,080,000   £2,200,000    2024  8320000  4 yr  £8,320,000
## 24  £1,191,667   £1,890,000    2023  7150000  6 yr  £7,150,000
## 25  £1,300,000   £2,600,000    2025  6500000  5 yr  £6,500,000
## 26  £1,040,000            -    2023  5200000  5 yr  £5,200,000
## 27  £1,040,000            -    2023  4160000  4 yr  £4,160,000
## 28  £2,700,000            -    2021  2700000  1 yr  £2,700,000
## 29    £520,000            -    2024  2080000  4 yr  £2,080,000
## 30           -            -       -        0  1 yr            
##                 Player.Names                                   Player.links
## 1        Alexandre Lacazette https://www.spotrac.com/redirect/player/24059/
## 2  Pierre-Emerick Aubameyang https://www.spotrac.com/redirect/player/24963/
## 3              Thomas Partey https://www.spotrac.com/redirect/player/62853/
## 4               Nicolas Pepe https://www.spotrac.com/redirect/player/32697/
## 5           Willian da Silva https://www.spotrac.com/redirect/player/22635/
## 6            Hector Bellerin https://www.spotrac.com/redirect/player/11978/
## 7               Granit Xhaka https://www.spotrac.com/redirect/player/22653/
## 8                 Bernd Leno https://www.spotrac.com/redirect/player/26727/
## 9         Gabriel Martinelli https://www.spotrac.com/redirect/player/32041/
## 10            Kieran Tierney https://www.spotrac.com/redirect/player/32750/
## 11            Sead Kolasinac https://www.spotrac.com/redirect/player/23720/
## 12            Lucas Torreira https://www.spotrac.com/redirect/player/27685/
## 13                Pablo Marí https://www.spotrac.com/redirect/player/48798/
## 14             Cedric Soares https://www.spotrac.com/redirect/player/22941/
## 15                David Luiz https://www.spotrac.com/redirect/player/22629/
## 16         Gabriel Magalhães https://www.spotrac.com/redirect/player/50105/
## 17            Edward Nketiah https://www.spotrac.com/redirect/player/32751/
## 18            Mohamed Elneny https://www.spotrac.com/redirect/player/22656/
## 19            Calum Chambers https://www.spotrac.com/redirect/player/24015/
## 20          Matteo Guendouzi https://www.spotrac.com/redirect/player/27686/
## 21            William Saliba https://www.spotrac.com/redirect/player/32466/
## 22    Ainsley Maitland-Niles https://www.spotrac.com/redirect/player/24373/
## 23      Rúnar Alex Rúnarsson https://www.spotrac.com/redirect/player/62649/
## 24   Konstantinos Mavropanos https://www.spotrac.com/redirect/player/24730/
## 25               Rob Holding https://www.spotrac.com/redirect/player/22643/
## 26          Emile Smith Rowe https://www.spotrac.com/redirect/player/50184/
## 27               Joe Willock https://www.spotrac.com/redirect/player/24375/
## 28             Dani Ceballos https://www.spotrac.com/redirect/player/32467/
## 29               Bukayo Saka https://www.spotrac.com/redirect/player/48801/
## 30           Martin Ødegaard https://www.spotrac.com/redirect/player/71609/
##         Last.Name
## 1       Lacazette
## 2      Aubameyang
## 3          Partey
## 4            Pepe
## 5        da Silva
## 6        Bellerin
## 7           Xhaka
## 8            Leno
## 9      Martinelli
## 10        Tierney
## 11      Kolasinac
## 12       Torreira
## 13           Marí
## 14         Soares
## 15           Luiz
## 16      Magalhães
## 17        Nketiah
## 18         Elneny
## 19       Chambers
## 20      Guendouzi
## 21         Saliba
## 22 Maitland-Niles
## 23      Rúnarsson
## 24     Mavropanos
## 25        Holding
## 26     Smith Rowe
## 27        Willock
## 28       Ceballos
## 29           Saka
## 30       Ødegaard
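
The splitting of the Contract Terms column is easier to follow in isolation. The sketch below takes two raw strings in the format shown in the table (the first and last Arsenal rows), splits on the newline, and trims the tabs. It assumes every entry has exactly three pieces, which holds for the first table but not for the loan table, where the leading value is absent.

```r
# Two raw entries copied from the Contract Terms column above.
raw_terms <- c(
  "47333365\n\t\t\t\t\t\t\t\t\t\t\t5 yr\n\t\t\t\t\t\t\t\t\t\t\t£47,333,365",
  "0\n\t\t\t\t\t\t\t\t\t\t\t1 yr\n\t\t\t\t\t\t\t\t\t\t\t-"
)

# Split on newlines, then strip the leading tabs from each piece.
parts <- lapply(strsplit(raw_terms, "\n", fixed = TRUE), trimws)

# Guard against rows with a different layout (e.g. the loan table).
stopifnot(all(lengths(parts) == 3))

terms <- data.frame(
  value = as.numeric(vapply(parts, `[`, character(1), 1)),
  years = as.integer(sub(" yr", "", vapply(parts, `[`, character(1), 2), fixed = TRUE)),
  stringsAsFactors = FALSE
)
terms
```

Unlike the `gsub("\t|-", ...)` step in the function above, this version leaves hyphens alone and converts the length of the contract to an integer directly.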

The data now have some junk alongside workable versions of the variables of interest. It is worth noting that the header of the contracts data allows us to verify the size of the table as we picked it up [though I do rename them to allow the rbind to work]. This also suggests a strategy for picking up the number of rows that is different from the above method, which uses the dimension of the HTML table. Perhaps I should just gsub the header to recover the integer number of players. To tidy the data, they need to be stacked. A simple do.call and row bind will probably work.

Team.Base <- sapply(EPL.Contracts, dim)[1,]
Team <- rep(as.character(names(Team.Base)),Team.Base)
EPL.Contracts.df <- do.call("rbind",EPL.Contracts)
rownames(EPL.Contracts.df) <- NULL
EPL.Contracts.df$Team <- Team
EPL.Contracts.df$value <- as.numeric(as.character(EPL.Contracts.df$value))
EPL.Contracts.df %>% group_by(Team) %>% summarise(Team.Mean=mean(value, na.rm=TRUE)/1e3, Team.SD=sd(value, na.rm=TRUE)) -> Team.mean
pp <- Team.mean %>% arrange(Team.Mean)
pp$Team <- factor(pp$Team, levels = pp$Team)
pp %>% ggplot(aes(Team.Mean,Team, size=Team.SD)) + geom_point() + labs(x="Avg. Contract (1000s)") -> cplot
cplot
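
The idea of reading the roster size from the table header, mentioned before the stacking code, takes one line. A minimal sketch, assuming the header always has the form seen for Arsenal, "Player (30)":

```r
# Header text as it appears in the first contracts table.
header <- "Player (30)"

# Remove everything that is not a digit and convert what is left.
n_players <- as.integer(gsub("[^0-9]", "", header))
n_players
```

That number could then be compared with the row count of the scraped table as a check that nothing was dropped.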

EPL.Contracts.df %>% group_by(Team) %>% summarise(Age.Mean=mean(Age, na.rm=TRUE), Age.SD=sd(Age, na.rm=TRUE)) -> Team.mean
Team.mean %>% ungroup() %>% arrange(., Age.Mean) -> pp
pp$Team <- factor(pp$Team, levels = pp$Team)
pp %>% ggplot(aes(Age.Mean,Team,size=Age.SD)) + geom_point() + labs(x="Age") -> cplot
cplot
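
An alternative to building the Team vector with `rep` and attaching it after the row bind is to add the label to each data frame before stacking, so the label cannot drift out of step with the rows. The sketch below uses placeholder teams and players rather than scraped data.

```r
# Placeholder list standing in for EPL.Contracts (made-up values).
team_list <- list(
  "Team A" = data.frame(Player = c("P1", "P2"), Age = c(24, 31)),
  "Team B" = data.frame(Player = "P3", Age = 19)
)

# Attach the list name to every row of its data frame, then stack.
labelled <- Map(function(df, team) {
  df$Team <- team
  df
}, team_list, names(team_list))

stacked <- do.call(rbind, labelled)
rownames(stacked) <- NULL
stacked
```

This requires every element to have the same column names, which is the same condition the `do.call("rbind", ...)` call above relies on.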