tabulizer Rocks
Voter Turnout in Oregon
Oregon’s voter turnout data is published by the Oregon Secretary of State’s office. You can find a direct link to the .pdf here. How hard is it to recover a table from a .pdf? Let’s see. I am going to work with tabulizer.
library(tabulizer)
library(dplyr)
The key function for this will be extract_tables; knowing that, let’s see if it just automagically works.
library(kableExtra)
location <- 'https://sos.oregon.gov/elections/Documents/Voter_Turnout_History_General_Election.pdf'
out <- extract_tables(location)
head(out) %>% kable() %>% scroll_box(height="300px")
So far so good. Now a bit of wrangling in two steps. First, I need to get rid of the first row, the header row whose first cell reads Year. Second, I need to get rid of the percent signs and commas so that the columns can be converted to numbers.
library(stringr)
Cleaned <- out %>% data.frame() %>%
  filter(X1 != "Year") %>%
  transmute(year = as.numeric(X1),
            RegVoters = as.numeric(str_remove_all(X2, ",")),
            Votes = as.numeric(str_remove_all(X3, ",")),
            Vote.Percent = as.numeric(str_remove(X4, "%")))
Cleaned %>% head() %>% kable() %>% scroll_box(height="300px")
year | RegVoters | Votes | Vote.Percent |
---|---|---|---|
2020 | 2951428 | 2317965 | 78.50 |
2018 | 2763105 | 1873891 | 67.80 |
2016 | 2553808 | 2051452 | 80.33 |
2014 | 2174763 | 1541782 | 70.90 |
2012 | 2199360 | 1820507 | 82.80 |
2010 | 2068798 | 1487210 | 71.89 |
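The two cleaning steps can also be checked on a toy input without touching the PDF. The sketch below is only an illustration: the matrix `raw`, its header labels, and the helper `clean_number` are hypothetical, and the cell formatting (commas and percent signs) is an assumption about how the raw cells look. The numbers themselves come from the first two rows of the table above. It uses base R `gsub` in place of stringr, so it is a swapped-in equivalent rather than the code used above.

```r
# Hypothetical raw character matrix shaped like one element of `out`.
# Header labels and cell formatting are assumptions; the numbers are from the table above.
raw <- matrix(
  c("Year", "Registered Voters", "Votes Cast", "Turnout",
    "2020", "2,951,428", "2,317,965", "78.50%",
    "2018", "2,763,105", "1,873,891", "67.80%"),
  ncol = 4, byrow = TRUE
)

# A matrix without column names becomes a data frame with columns X1..X4
df <- data.frame(raw, stringsAsFactors = FALSE)

# Step 1: drop the header row
df <- df[df$X1 != "Year", ]

# Step 2: strip commas and percent signs, then convert to numeric
clean_number <- function(x) as.numeric(gsub("[,%]", "", x))

toy <- data.frame(
  year         = as.numeric(df$X1),
  RegVoters    = clean_number(df$X2),
  Votes        = clean_number(df$X3),
  Vote.Percent = clean_number(df$X4)
)
toy
```

If a cell contained anything other than digits, commas, a decimal point, or a percent sign, `as.numeric` would return `NA` with a warning, which is a handy signal that the extraction needs a closer look.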
One more relevant feature is that midterm and presidential years are a bit different, so let me denote this with an indicator. The test I will use: does year minus 1960 leave no remainder when divided by four [TRUE], or does it leave a remainder [FALSE]?
library(fpp3); library(magrittr); library(hrbrthemes)
Cleaned %<>% mutate(President = as.factor(((year - 1960) %% 4) == 0))
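As a quick sanity check of the remainder rule, here is a minimal sketch applied to the six years shown in the table above. The vector names `years` and `president` are made up for this illustration.

```r
# Years taken from the table above
years <- c(2020, 2018, 2016, 2014, 2012, 2010)

# TRUE when (year - 1960) is an exact multiple of four, i.e. a presidential year
president <- ((years - 1960) %% 4) == 0

data.frame(year = years, President = president)
# 2020, 2016, and 2012 come out TRUE; 2018, 2014, and 2010 come out FALSE
```

Because 1960 is itself a multiple of four, `years %% 4 == 0` gives the same answer; subtracting 1960 just makes the anchor year explicit.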
Now I have exactly the dataset that I want. What does the plot look like?
Cleaned %>%
  ggplot(.) +
  aes(x = year, y = Vote.Percent, color = President, group = President) +
  geom_point() +
  geom_line() +
  scale_color_ipsum() +
  labs(title = "Voter Turnout in Oregon since 1960", x = "Year", y = "Turnout (%)", color = "Presidential Election?") +
  theme_ipsum_rc()