November 9, 2022
Last update: November 15, 2022
In preparation for the dumpster fire that is Oregon election reporting, I previously posted on importing a directory of .csv files. At present, a directory of separate .csv files is all I can find to build this from. What does the interface look like?
library(magick)

# Screenshot of the download interface.
Img <- image_read("./img/SShot.png")
image_ggplot(Img)

This is terrible: there is a JavaScript button to download each file separately. Nevertheless, here we go.
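The import itself is just reading every file in a directory and stacking the results. A minimal sketch, assuming the downloaded files share a common layout and sit in a local ./data directory (that path is a placeholder):

library(readr)
library(purrr)

# Collect every .csv in the (placeholder) ./data directory.
files <- list.files("./data", pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)

# Read each file and stack the rows, keeping the source file name
# so every record can be traced back to its report.
results <- map_dfr(files, read_csv, .id = "source_file")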
Spending on Kids
The Urban Institute has a collection of data on government spending on children. The tidyTuesday page links to some of their commentary and to an article from Governing on the subject. The data are rich and interesting and are conveniently packaged into the tidykids package for R. My goal is to combine geofaceting with animation to show education spending over time across US states and territories.
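A minimal sketch of the idea, assuming the package exposes a tidykids data frame with state, variable, year, and inf_adj_perchild columns (the names used in the tidyTuesday release) and that PK12ed flags PK-12 education spending; whether gganimate cooperates fully with facet_geo is something to verify:

library(tidykids)
library(ggplot2)
library(geofacet)
library(gganimate)

# Keep PK-12 education spending, inflation-adjusted dollars per child.
pk12 <- subset(tidykids, variable == "PK12ed")

# us_state_grid2 covers the states and DC; a grid that includes the
# territories could be swapped in.
p <- ggplot(pk12, aes(x = year, y = inf_adj_perchild)) +
  geom_line(color = "steelblue") +
  facet_geo(~ state, grid = "us_state_grid2") +
  labs(x = NULL, y = "Inflation-adjusted $ per child",
       title = "PK-12 education spending") +
  theme_minimal(base_size = 7)

# Reveal each state's series year by year.
animate(p + transition_reveal(year), fps = 10, width = 900, height = 650)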
This week’s tidyTuesday contains data on cocktails, with recipes drawn from two sources. Because one of the datasets comes from Mr. Boston, it is not exactly neutral with respect to alcohols, and I am not a particular fan of gin. That said, the data should provide an interesting playground for looking at some frequencies and learning some things about cocktail recipes and ingredients. With that in mind, let’s turn to the data.
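First, a quick look at the two files; the date-stamped paths are my guess at where the tidyTuesday repository keeps this week's data, and the ingredient column name in the Mr. Boston file is assumed:

cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/cocktails.csv')
boston_cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/boston_cocktails.csv')

# Which ingredients show up most often in the Mr. Boston recipes?
dplyr::count(boston_cocktails, ingredient, sort = TRUE)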
Socrata: The Open Data Portal
I did not previously know much about precisely how open data portals had evolved. Oregon’s is quite nice, and I will take the opportunity to map and summarise non-profits throughout the state. I have posted elsewhere about other aspects of Socrata; it is a very neat tool for accessing open data portals. The non-profit data are not extraordinarily rich, though there is quite a bit that can be extracted.
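Pulling a dataset down is a one-liner with the RSocrata package. A minimal sketch, where the data.oregon.gov resource identifier is a placeholder rather than the real non-profit dataset id:

library(RSocrata)

# The four-by-four id in this URL is a placeholder; the real one comes
# from the dataset's API endpoint on data.oregon.gov.
nonprofits <- read.socrata("https://data.oregon.gov/resource/xxxx-xxxx.json")

# A quick look at what comes back.
str(nonprofits)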
R Markdown
I love this intro photo from the tidyTuesday page.
This week’s tidyTuesday data cover violations of the GDPR (General Data Protection Regulation), the European Union’s regulatory regime for data privacy. The Wikipedia entry on the GDPR is fairly extensive. The dataset is large and suggests some interesting regulatory arbitrage. Let’s have a look at the data. First, let’s load them.
gdpr_violations <- readr::read_tsv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-04-21/gdpr_violations.tsv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## id = col_double(),
## picture = col_character(),
## name = col_character(),
## price = col_double(),
## authority = col_character(),
## date = col_character(),
## controller = col_character(),
## article_violated = col_character(),
## type = col_character(),
## source = col_character(),
## summary = col_character()
## )
gdpr_text <- readr::read_tsv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-04-21/gdpr_text.tsv')
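With the column specification above in hand, a first cut at the arbitrage question is simply the total fines by country. This assumes name is the issuing country and price the fine in euros, which is how I read the codebook:

# Total fines (summing price) levied under each country's name, largest first.
dplyr::count(gdpr_violations, name, wt = price, sort = TRUE)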
In previous work with Skip Krueger, we conceptualized bond ratings as a multiple-rater problem and extracted a measure of state-level creditworthiness. I had always had it on my list to do something like this, and I recently ran across a package called geofacet that makes it simple to do. The end result should separate out state-level credit risk and showcase the time series of credit risk for each of the states.
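For the display itself, facet_geo is nearly a drop-in replacement for facet_wrap. A minimal sketch, where credit_risk is a hypothetical data frame holding one row per state and year with the extracted measure in a risk column:

library(ggplot2)
library(geofacet)

# credit_risk is hypothetical: state, year, and the extracted risk measure.
ggplot(credit_risk, aes(x = year, y = risk)) +
  geom_line() +
  facet_geo(~ state, grid = "us_state_grid2") +
  labs(x = NULL, y = "Estimated credit risk") +
  theme_minimal(base_size = 7)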
Beer Distribution
The #tidyTuesday for March 31, 2020 is on beer. The essential elements and a method for pulling the data are described below.
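To pull the cleaned data directly, the usual tidyTuesday pattern works; the date-stamped paths and file names below are my assumption for the March 31, 2020 release:

brewing_materials <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/brewing_materials.csv')
beer_states <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-31/beer_states.csv')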
A Comment on Scraping .pdf
The Tweet
The details on how the data were obtained are a nice overview of scraping .pdf files. The code for doing it is at the bottom of the page. @thomasmock has done a great job commenting his way through it.
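The rough shape of that approach, as I understand it, is pdftools plus some string cleanup. This is a sketch of the general pattern, not @thomasmock's actual code, and the file name is a placeholder for one of the TTB statistical releases:

library(pdftools)
library(stringr)

# Placeholder file name for a downloaded TTB statistical release.
pages <- pdf_text("ttb_statistical_release.pdf")  # one string per page

# Break the first page into lines and collapse runs of whitespace,
# leaving something close to a parseable fixed-width table.
lines <- str_squish(str_split(pages[1], "\n")[[1]])
head(lines)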