Visualising COVID-19 in Oregon
Oregon COVID data
I now have a few days of data. These data are current as of March 24, 2020. I will present the first version of these visualizations here and then move the auto-update to a different location. A messy first version of the scraping exercise is at the bottom of this post.
paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData")
## [1] "https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID2020-03-24.RData"
load(url(paste0("https://github.com/robertwwalker/rww-science/raw/master/content/R/COVID/data/OregonCOVID",Sys.Date(),".RData")))
A base map
Load the tigris
library then grab the map as an sf
object; there is a geom_sf
that makes them easy to work with. Finally, I join the map to the data.
library(tigris); library(rgdal); library(htmltools); library(viridis); library(sf)
counties.t <- counties(state = "41", resolution = "500k", class="sf")
Map.Me <- left_join(counties.t,Oregon.COVID, by=c("NAME" = "County"))
The Whole Thing: An Animation
I will use a ggplot to build Oregon’s map and fill it with case numbers. I have to repel the labels to prevent print overlap [using ggrepel
], and scheme the colors with viridis. The last step is to animate it by time.
library(ggrepel); library(ggthemes); library(gganimate)
Anim1 <- Map.Me %>%
ggplot(., aes(geometry=geometry, fill=Number.of.cases, label=NAME, group=NAME)) +
geom_sf() +
geom_label_repel(stat = "sf_coordinates",
min.segment.length = 0,
colour = "white",
segment.colour = "white",
size = 3,
box.padding = unit(0.05, "lines")) + scale_fill_continuous_tableau("Red") + theme_minimal() + labs(title="COVID-19 in Oregon", subtitle="{frame_time}", x="", y="", caption="Made with R, ggplot2, and ggrepel by @PieRatio \n data: https://govstatus.egov.com/OR-OHA-COVID-19") + transition_time(date)
ResA <- animate(Anim1, end_pause=100, nframes=200)
ResA
Some Data on Testing
OR.Testing <- Oregon.Tests %>% group_by(date) %>% summarise(Total = sum(Outcome))
Test1 <- Oregon.Tests %>% ggplot(.) + aes(x=date, y=Outcome, fill=Category, label=Outcome) + geom_col() + geom_label(col="magenta") + scale_fill_viridis_d() + labs(title="COVID-19 Testing in Oregon", y="Tests", subtitle="Total shown as label", caption = "data: https://govstatus.egov.com/OR-OHA-COVID-19") + geom_label(data=OR.Testing, aes(x=date, y=Total+100, label=Total), fill="white", color="black", inherit.aes = FALSE)
Test1
Hospitalization Data
Hos1 <- OR.Hosp %>% filter(Hospitalized.!="Total") %>% ggplot(., aes(x=date, y=Number.of.cases,fill=Hospitalized.)) + geom_col() + scale_fill_viridis_d() + labs(x="Date", y="Number of COVID Positives", fill="Hospitalized?", title="Hospitalization Status of COVID-19 Positives in Oregon")
Hos1
Age Data
Age1 <- OR.Ages %>% ggplot(., aes(x=date, y=Number.of.cases, fill=Age.group)) + geom_col() + scale_fill_viridis_d() + labs(x="Date", y="Number of COVID-19 Positives")
Age1
A Mapping Function
To save time later, I turned the plots into a function so that I can use to reproduce each frame. Two inputs to the function, the dataset that is always the merged data from above and the date that I want to plot located in date1
in a YYYY-MM-DD
format.
Plot.COVID <- function(date1, data) {
data %>% filter(date==date1) %>%
ggplot(., aes(geometry=geometry, fill=Number.of.cases, label=NAME)) +
geom_sf() +
geom_label_repel(stat = "sf_coordinates",
min.segment.length = 0,
colour = "white",
segment.colour = "white",
size = 3,
box.padding = unit(0.05, "lines")) + scale_fill_continuous_tableau("Red") +
theme_minimal() +
labs(title="COVID-19 in Oregon",
subtitle=date1, x="", y="",
caption="Made with R, ggplot2, and ggrepel by @PieRatio \n data: https://govstatus.egov.com/OR-OHA-COVID-19")
}
The Result for 03/18/2020
This is the first date for which I can find data. The waybackmachine was crucial.
Plot.COVID("2020-03-18", Map.Me)
The Result for 03/19/2020
Plot.COVID("2020-03-19", Map.Me)
The Result for 03/20/2020
Plot.COVID("2020-03-20", Map.Me)
The Result for 03/21/2020
Plot.COVID("2020-03-21", Map.Me)
The Result for 03/22/2020
Plot.COVID("2020-03-22", Map.Me)
The Result for 03/23/2020
Plot.COVID("2020-03-23", Map.Me)
The Result for 03/24/2020
Plot.COVID("2020-03-24", Map.Me)
A Scraper
I have written some code here to grab the updates as they post them and add them to the dataset. Now I need to set a cron job to evaluate this script daily. It is also worth nothing that the script has required changing over time. Because I am particularly interested in the county map here, I will only remove a single table.
library(rvest)
load(paste0("OregonCOVID",Sys.Date()-1,".RData"))
webpage <- read_html("https://govstatus.egov.com/OR-OHA-COVID-19")
tbls <- html_nodes(webpage, "table")
tbls
COVID.New <- webpage %>%
html_nodes("table") %>%
.[2] %>%
html_table(fill = TRUE) %>% data.frame()
# Oregon.Total <- COVID.New %>% mutate(date=as.Date(Sys.Date())) %>% filter(County=="Total") %>% bind_rows(.,Oregon.Total)
Oregon.COVID <- COVID.New %>% mutate(date=as.Date(Sys.Date())) %>% filter(County!="Total") %>% bind_rows(.,Oregon.COVID)
# save(Oregon.COVID, file=paste0("OregonCOVID",Sys.Date(),".RData"))
Adding Historical Data Together
comma.rm.to.numeric <- function(variable) {as.numeric(str_remove_all( {{variable}}, ",")) }
# A function to parse the tables
OHA.Corona <- function(website, date) {
webpage <- read_html(website)
COVID.Head <- webpage %>%
html_nodes("table") %>%
.[1] %>%
html_table(fill = TRUE) %>% data.frame() # %>% filter(!startsWith(.[[1]],"Total"))
Scraped.date <- names(COVID.Head)[1] %>% str_remove(.,"X.Oregon.Test.Results.as.of.") %>% str_remove(., "..8.00.a.m..Updated.daily.")
names(COVID.Head) <- c("Category","Outcome")
COVID.Head <- COVID.Head %>% mutate(Outcome = comma.rm.to.numeric(Outcome), date=as.Date(date), Scraped.date = Scraped.date) # %>% filter(!startsWith(.[[1]],"Total"))
COVID.County <- webpage %>%
html_nodes("table") %>%
.[2] %>%
html_table(fill = TRUE) %>%
data.frame() %>%
mutate(date=as.Date(date),
Scraped.date = Scraped.date,
Negative.test.results = comma.rm.to.numeric(Negative.test.results))
COVID.Age <- webpage %>%
html_nodes("table") %>%
.[3] %>%
html_table(fill = TRUE) %>% data.frame() %>%
mutate(date=as.Date(date), Scraped.date = Scraped.date)
COVID.Hospitalized <- webpage %>%
html_nodes("table") %>%
.[4] %>%
html_table(fill = TRUE) %>%
data.frame() %>%
mutate(date=as.Date(date), Scraped.date = Scraped.date)
return(list(Header=COVID.Head, Counties = COVID.County, Ages = COVID.Age, Hospitalized = COVID.Hospitalized))
}
OHA.Corona.3 <- function(website, date) {
webpage <- read_html(website)
COVID.Head <- webpage %>%
html_nodes("table") %>%
.[1] %>%
html_table(fill = TRUE) %>% data.frame() %>% filter(!startsWith(.[[1]],"Total"))
Scraped.date <- names(COVID.Head)[1] %>% str_remove(.,"X.Oregon.Test.Results.as.of.") %>% str_remove(., "..8.00.a.m..Updated.daily.")
names(COVID.Head) <- c("Category","Outcome")
COVID.Head <- COVID.Head %>% mutate(Outcome = as.numeric(str_remove(Outcome,",")))
COVID.Head <- COVID.Head %>% mutate(date=as.Date(date), Scraped.date = Scraped.date)
COVID.County <- webpage %>%
html_nodes("table") %>%
.[2] %>%
html_table(fill = TRUE) %>% data.frame() %>% mutate(date=as.Date(date), Scraped.date = Scraped.date, Negative.test.results = as.numeric(str_remove_all(Negative.test.results, ",")))
COVID.Age <- webpage %>%
html_nodes("table") %>%
.[3] %>%
html_table(fill = TRUE) %>% data.frame() %>% mutate(date=as.Date(date), Scraped.date = Scraped.date)
return(list(Header=COVID.Head, Counties = COVID.County, Ages = COVID.Age))
}
OHA.Corona.2 <- function(website, date) {
webpage <- read_html(website)
COVID.Head <- webpage %>%
html_nodes("table") %>%
.[1] %>%
html_table(fill = TRUE) %>%
data.frame()
Scraped.date <- names(COVID.Head)[1] %>% str_remove(.,"X.Oregon.Test.Results.as.of.") %>% str_remove(., "..8.00.a.m..Updated.daily.")
names(COVID.Head) <- c("Category","Outcome")
COVID.Head <- COVID.Head %>% mutate(Outcome = as.numeric(str_remove(Outcome,",")))
COVID.Head <- COVID.Head %>% mutate(date=as.Date(date), Scraped.date = Scraped.date)
COVID.County <- webpage %>%
html_nodes("table") %>%
.[2] %>%
html_table(fill = TRUE) %>% data.frame() %>% mutate(date=as.Date(date), Scraped.date = Scraped.date)
COVID.Age <- webpage %>%
html_nodes("table") %>%
.[3] %>%
html_table(fill = TRUE) %>% data.frame() %>% filter(!startsWith(.[[1]],"Total")) %>% mutate(date=as.Date(date), Scraped.date = Scraped.date)
return(list(Header=COVID.Head, Counties = COVID.County, Ages = COVID.Age))
}
# March 18 Update
Mar18 <- OHA.Corona.2(website="https://web.archive.org/web/20200319144434/https://govstatus.egov.com/OR-OHA-COVID-19", date="2020-03-18")
# March 19 Update
Mar19 <- OHA.Corona.2(website="https://web.archive.org/web/20200320152224/https://govstatus.egov.com/OR-OHA-COVID-19", date="2020-03-19")
# March 20 Update
Mar20 <- OHA.Corona.2(website="https://web.archive.org/web/20200320202955/https://govstatus.egov.com/OR-OHA-COVID-19", date="2020-03-20")
# March 21 Update
Mar21 <- OHA.Corona.3(website="https://web.archive.org/web/20200321211741/https://govstatus.egov.com/OR-OHA-COVID-19", date="2020-03-21")
# March 22 Update
Mar22 <- OHA.Corona.3("https://web.archive.org/web/20200322205228/https://govstatus.egov.com/OR-OHA-COVID-19", date="2020-03-22")
# March 23 Update
Mar23 <- OHA.Corona(website="https://web.archive.org/web/20200323192410/https://govstatus.egov.com/OR-OHA-COVID-19", date="2020-03-23")
# March 24 Update
Mar24 <- OHA.Corona(website="https://govstatus.egov.com/OR-OHA-COVID-19", date=as.character(Sys.Date()))
Oregon.Tests.All <- bind_rows(Mar24$Header,Mar23$Header,Mar22$Header,Mar21$Header,Mar20$Header,Mar19$Header,Mar18$Header)
Oregon.COVID.All <- bind_rows(Mar24$Counties,Mar23$Counties,Mar22$Counties,Mar21$Counties,Mar20$Counties,Mar19$Counties,Mar18$Counties)
Oregon.COVID.Total <- Oregon.COVID.All %>% filter(County=="Total")
Oregon.COVID <- Oregon.COVID.All %>% filter(County!="Total")
Oregon.Tests <- Oregon.Tests.All[!str_detect(Oregon.Tests.All$Category, "Total"),]
# save.image("~/Sandbox/awful/content/post/2020-03-22-covid-19-county-maps-for-oregon/data/OregonCOVID2020-03-24.RData")