November 25, 2020

Socrata is amazingly handy for open data

The Socrata package makes it easy to access API calls built around SODA for open data access. If you try to skip the Socrata part, you usually only get a fraction of the available data. Socrata is intended to make open access data easier to manage and many government entities in the US use it as the portal to public data access. The R package makes interfacing with it much easier. First, how can we install it? It is on CRAN.

install.packages("RSocrata")
library(RSocrata)
SchoolSpend <- read.socrata("https://data.oregon.gov/resource/c7av-ntdz.csv")

The first bit of data that I found details various bits about spending and students in Oregon school districts. I want to look at a few basics of this. There is a lot more to plot but this is enough for now.

The Data

I found this on Oregon’s open data portal. What do I have?

library(skimr)
skim(SchoolSpend) %>% kable() %>% scroll_box(width="100%")
skim_type skim_variable n_missing complete_rate character.min character.max character.empty character.n_unique character.whitespace numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
character county_name 0 1 4 10 0 36 0 NA NA NA NA NA NA NA NA
character district_number 0 1 9 33 0 375 0 NA NA NA NA NA NA NA NA
character school_year 0 1 10 10 0 13 0 NA NA NA NA NA NA NA NA
numeric district_id 0 1 NA NA NA NA NA 2094.940 189.4799 1894.0 1999.00 2085.00 2190.00 4131.00 ▇▁▁▁▁
numeric operating_cost_per_student 0 1 NA NA NA NA NA 10337.758 7762.8027 0.0 7252.19 8596.27 10797.39 170210.60 ▇▁▁▁▁
numeric student_count 0 1 NA NA NA NA NA 2900.144 6105.8332 0.0 234.00 872.00 2934.00 55321.00 ▇▁▁▁▁
numeric state_student_count 0 1 NA NA NA NA NA 552475.185 5910.9535 539105.0 549169.00 552161.00 558366.00 561354.00 ▂▂▇▃▆
numeric operating_cost 0 1 NA NA NA NA NA 22756620.867 49527514.3895 9881.0 2458166.78 7089257.97 22617257.11 513891919.14 ▇▁▁▁▁
numeric state_operating_cost 0 1 NA NA NA NA NA 4337006498.569 697307559.8432 948447366.8 3889552066.13 4198534676.94 5144636555.37 5248233458.10 ▁▁▁▇▆
numeric state_operating_cost_per_student 0 1 NA NA NA NA NA 7839.076 1195.0422 1759.3 7044.24 7636.57 9164.69 9384.06 ▁▁▁▇▆

How many school districts per county?

library(magrittr); library(hrbrthemes)
SchoolSpend %>% group_by(county_name, school_year) %>% tally() %>% mutate(school_year = as.Date(school_year, format = "%m/%d/%Y")) %>% filter(school_year == max(school_year)) %>% ggplot() + aes(x=fct_reorder(county_name, n), y=n, fill=county_name) + geom_col() + coord_flip() + guides(fill=FALSE) + labs(x= "County", y="Number of School Districts") + theme_minimal()

By Students?

SchoolSpend %>% group_by(county_name) %>% mutate(school_year = as.Date(school_year, format = "%m/%d/%Y")) %>% filter(school_year == max(school_year)) %>% summarise(Students = sum(student_count), Year = mean(school_year), County = as.factor(county_name)) %>% unique() -> Dat
ggplot(Dat) + aes(x=fct_reorder(County, -Students), y=Students, fill=county_name) + geom_col() + coord_flip() + guides(fill=FALSE) + labs(x= "County", y="Students") + theme_minimal()

There are a number of other bits of data organized by year and district. There is certainly more to examine, but then I found this.

Voter Registration Data

The database of Voter Registrations in Oregon is also available and easily accessible.

VoterReg <- read.socrata("https://data.oregon.gov/resource/6a4f-ecbi.csv")
VoterReg %>% filter(sysdate == "2020-11-03") %>% group_by(county) %>% summarise(Voters = sum(count_v_id)) %>% ggplot(., aes(x=fct_reorder(county, Voters), y=Voters, label=Voters)) + geom_col(fill="white", color="skyblue") + geom_text(size=2.2) + coord_flip() + labs(x="County", y="Registered Voters") + theme_minimal() -> Plot1
Plot1

library(plotly); library(widgetframe)
ggp1 <- ggplotly(Plot1)
frameWidget(ggp1)

The Balance of Registrations

CurrVR <- VoterReg %>% filter(sysdate == "2020-11-03")
CurrVR$DRE <- "Other"
CurrVR$DRE[CurrVR$party=="Democrat"] <- "Democrat"
CurrVR$DRE[CurrVR$party=="Republican"] <- "Republican"
CurrVR %>% group_by(county) %>% mutate(Voters = sum(count_v_id)) %>% ggplot(., aes(x=fct_reorder(county, Voters), y=Voters, label=Voters)) + geom_col() + geom_text(size=2.2) + coord_flip() + labs(x="County", y="Registered Voters") + theme_minimal()

The Plot by Party

Now let me split these up by grouping and plot them.

CurrVR %>% group_by(county, DRE) %>% summarise(Voters = sum(count_v_id)) %>%
ggplot(.) +
 aes(x = fct_reorder(county, Voters), y=Voters, fill = DRE) +
 geom_col() + scale_fill_viridis_d() +
 coord_flip() +
 theme_minimal() + labs(x="County")