Socrata is amazingly handy for open data
The Socrata package makes it easy to access API calls built around SODA for open data access. If you try to skip the Socrata part, you usually only get a fraction of the available data. Socrata is intended to make open access data easier to manage and many government entities in the US use it as the portal to public data access. The R package makes interfacing with it much easier. First, how can we install it? It is on CRAN.
install.packages("RSocrata")
library(RSocrata)
SchoolSpend <- read.socrata("https://data.oregon.gov/resource/c7av-ntdz.csv")
The first bit of data that I found details various bits about spending and students in Oregon school districts. I want to look at a few basics of this. There is a lot more to plot but this is enough for now.
The Data
I found this on Oregon’s open data portal. What do I have?
library(skimr)
skim(SchoolSpend) %>% kable() %>% scroll_box(width="100%")
skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
character | county_name | 0 | 1 | 4 | 10 | 0 | 36 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
character | district_number | 0 | 1 | 9 | 33 | 0 | 375 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
character | school_year | 0 | 1 | 10 | 10 | 0 | 13 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
numeric | district_id | 0 | 1 | NA | NA | NA | NA | NA | 2094.940 | 189.4799 | 1894.0 | 1999.00 | 2085.00 | 2190.00 | 4131.00 | ▇▁▁▁▁ |
numeric | operating_cost_per_student | 0 | 1 | NA | NA | NA | NA | NA | 10337.758 | 7762.8027 | 0.0 | 7252.19 | 8596.27 | 10797.39 | 170210.60 | ▇▁▁▁▁ |
numeric | student_count | 0 | 1 | NA | NA | NA | NA | NA | 2900.144 | 6105.8332 | 0.0 | 234.00 | 872.00 | 2934.00 | 55321.00 | ▇▁▁▁▁ |
numeric | state_student_count | 0 | 1 | NA | NA | NA | NA | NA | 552475.185 | 5910.9535 | 539105.0 | 549169.00 | 552161.00 | 558366.00 | 561354.00 | ▂▂▇▃▆ |
numeric | operating_cost | 0 | 1 | NA | NA | NA | NA | NA | 22756620.867 | 49527514.3895 | 9881.0 | 2458166.78 | 7089257.97 | 22617257.11 | 513891919.14 | ▇▁▁▁▁ |
numeric | state_operating_cost | 0 | 1 | NA | NA | NA | NA | NA | 4337006498.569 | 697307559.8432 | 948447366.8 | 3889552066.13 | 4198534676.94 | 5144636555.37 | 5248233458.10 | ▁▁▁▇▆ |
numeric | state_operating_cost_per_student | 0 | 1 | NA | NA | NA | NA | NA | 7839.076 | 1195.0422 | 1759.3 | 7044.24 | 7636.57 | 9164.69 | 9384.06 | ▁▁▁▇▆ |
How many school districts per county?
library(magrittr); library(hrbrthemes)
SchoolSpend %>% group_by(county_name, school_year) %>% tally() %>% mutate(school_year = as.Date(school_year, format = "%m/%d/%Y")) %>% filter(school_year == max(school_year)) %>% ggplot() + aes(x=fct_reorder(county_name, n), y=n, fill=county_name) + geom_col() + coord_flip() + guides(fill=FALSE) + labs(x= "County", y="Number of School Districts") + theme_minimal()
By Students?
SchoolSpend %>% group_by(county_name) %>% mutate(school_year = as.Date(school_year, format = "%m/%d/%Y")) %>% filter(school_year == max(school_year)) %>% summarise(Students = sum(student_count), Year = mean(school_year), County = as.factor(county_name)) %>% unique() -> Dat
ggplot(Dat) + aes(x=fct_reorder(County, -Students), y=Students, fill=county_name) + geom_col() + coord_flip() + guides(fill=FALSE) + labs(x= "County", y="Students") + theme_minimal()
There are a number of other bits of data organized by year and district. There is certainly more to examine, but then I found this.
Voter Registration Data
The database of Voter Registrations in Oregon is also available and easily accessible.
VoterReg <- read.socrata("https://data.oregon.gov/resource/6a4f-ecbi.csv")
VoterReg %>% filter(sysdate == "2020-11-03") %>% group_by(county) %>% summarise(Voters = sum(count_v_id)) %>% ggplot(., aes(x=fct_reorder(county, Voters), y=Voters, label=Voters)) + geom_col(fill="white", color="skyblue") + geom_text(size=2.2) + coord_flip() + labs(x="County", y="Registered Voters") + theme_minimal() -> Plot1
Plot1
library(plotly); library(widgetframe)
ggp1 <- ggplotly(Plot1)
frameWidget(ggp1)
The Balance of Registrations
CurrVR <- VoterReg %>% filter(sysdate == "2020-11-03")
CurrVR$DRE <- "Other"
CurrVR$DRE[CurrVR$party=="Democrat"] <- "Democrat"
CurrVR$DRE[CurrVR$party=="Republican"] <- "Republican"
CurrVR %>% group_by(county) %>% mutate(Voters = sum(count_v_id)) %>% ggplot(., aes(x=fct_reorder(county, Voters), y=Voters, label=Voters)) + geom_col() + geom_text(size=2.2) + coord_flip() + labs(x="County", y="Registered Voters") + theme_minimal()
The Plot by Party
Now let me split these up by grouping and plot them.
CurrVR %>% group_by(county, DRE) %>% summarise(Voters = sum(count_v_id)) %>%
ggplot(.) +
aes(x = fct_reorder(county, Voters), y=Voters, fill = DRE) +
geom_col() + scale_fill_viridis_d() +
coord_flip() +
theme_minimal() + labs(x="County")