rww-science: the blog

November 9, 2022

Oregon County Support for Retaining Slavery in the OR Constitution

Last update: November 15, 2022. In preparation for the dumpster fire that is Oregon election reporting, I previously posted on importing a directory of .csv files. At present, that is what I can find to build this. What does the interface look like? library(magick) Img <- image_read("./img/SShot.png") image_ggplot(Img) This is terrible; there is a JavaScript button to download each file separately. Nevertheless, here we go.
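
For readers who have not seen the earlier post, here is a minimal base R sketch of reading a directory of .csv files into one data frame. It is not the code from that post. The directory name, file names, and county tallies are made up so that the example runs on its own.

```r
# Build a throwaway directory with two small CSV files (made-up data).
csv_dir <- file.path(tempdir(), "county_csvs")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(county = c("Baker", "Benton"), yes = c(10, 20)),
          file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(county = c("Clackamas", "Clatsop"), yes = c(30, 40)),
          file.path(csv_dir, "b.csv"), row.names = FALSE)

# List every .csv in the directory with its full path.
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and record which file every row came from.
tables <- lapply(csv_files, function(f) {
  transform(read.csv(f), source_file = basename(f))
})

# Stack the per-file tables into a single data frame.
combined <- do.call(rbind, tables)
print(combined)
```

Keeping the file name in a column is handy when each download covers one county or one contest, since the name is often the only place that information lives.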

Read More...

February 22, 2021

tt: Employment and Earnings

As a continuation of the #DuBoisChallenge, this week’s tidyTuesday presents employment by industry, sex, race, and occupation. There is also some scraped data from the self-service tool that generates weekly and hourly earnings data from the CPS. Let’s see what we have. library(tidyverse) library(fpp3) library(magrittr) employed <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-02-23/employed.csv') earn <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-02-23/earn.csv') employed %<>% as_tsibble(index=year, key=c(industry,major_occupation,minor_occupation,race_gender)) Let me try to plot something. employed %>% filter(race_gender=="TOTAL") %>% autoplot(employ_n) + guides(color="none") To be continued….

Read More...

February 8, 2021

TT: Wealth and Income

tidyTuesday for the week of February 8, 2021 brings data from the US Census and the Urban Institute together to think about income, wealth, and racial inequality in these and other important economic indicators. There is a lot of data that they make available to accompany the nine charts about wealth inequality that they reported here. There is considerable variation in the scope and coverage of the various datasets; I will start by loading the ten datasets.

Read More...

November 30, 2020

Analyzing the Trump Campaign's Solicitations

tl;dr In September of 2018, I began to track email solicitations by the Trump Campaign. I have noticed a striking pattern of increasing fundraising activity that started just after the July 4 weekend, but I wanted to verify this over the span of the data. In short, something is up. The Data I will use the wonderful gmailr package to access my Gmail. You need a key and an ID; the package vignette gives guidance on both.

Read More...

November 25, 2020

Socrata is amazingly handy for open data

The Socrata package makes it easy to access API calls built around SODA for open data access. If you try to skip the Socrata part, you usually only get a fraction of the available data. Socrata is intended to make open access data easier to manage and many government entities in the US use it as the portal to public data access. The R package makes interfacing with it much easier.
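
One likely reason a direct request returns only a fraction of the data is that SODA endpoints page their results with the `$limit` and `$offset` query parameters. Below is a rough sketch of what paging by hand looks like; the domain `data.example.gov`, the dataset id `abcd-1234`, and the helper `soda_page_url()` are all hypothetical, and the page size of 1000 is an assumption rather than a documented default for any particular portal.

```r
# Hypothetical helper: build the URL for one page of a SODA dataset.
# `domain` and `dataset_id` are placeholders, not a real portal.
soda_page_url <- function(domain, dataset_id, limit = 1000, offset = 0) {
  paste0(
    "https://", domain, "/resource/", dataset_id, ".csv",
    "?$limit=", format(limit, scientific = FALSE),
    "&$offset=", format(offset, scientific = FALSE)
  )
}

# URLs for the first three pages of 1000 rows each.
offsets <- c(0, 1000, 2000)
page_urls <- vapply(
  offsets,
  function(off) soda_page_url("data.example.gov", "abcd-1234", offset = off),
  character(1)
)
print(page_urls)
```

Each URL would then be read and the pages stacked. The appeal of the R package is that a single call such as `RSocrata::read.socrata()` is meant to take care of this bookkeeping.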

Read More...

October 19, 2020

Datasaurus Dozen

The datasaurus dozen is a fantastic teaching resource for examining the importance of data visualization. Let’s have a look. The basic idea is that all thirteen datasets (the datasaurus plus 12 others) have nearly identical means and standard deviations, though they do differ once five-number summaries are compared. The scatterplots derived from data with similar x-y summaries are a useful reminder that data science is about patterns, not just statistics.
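
The same point can be sketched without installing anything. The example below does not use the datasaurus data; it swaps in Anscombe's quartet, which ships with base R as `anscombe`, as a smaller stand-in for the same idea: means and standard deviations that nearly match while five-number summaries do not.

```r
# Reshape the built-in anscombe data into a list of four x-y data frames.
sets <- lapply(1:4, function(i) {
  data.frame(x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]])
})
names(sets) <- paste0("set", 1:4)

# Means and standard deviations: one column per dataset.
moments <- sapply(sets, function(d) {
  c(mean_x = mean(d$x), sd_x = sd(d$x),
    mean_y = mean(d$y), sd_y = sd(d$y))
})
print(round(moments, 2))

# Five-number summaries of x: min, lower hinge, median, upper hinge, max.
five_num_x <- sapply(sets, function(d) fivenum(d$x))
print(five_num_x)
```

The columns of `moments` should be nearly indistinguishable, while the five-number summary for the fourth set looks nothing like the first. Plotting each set, as the post goes on to do with the datasaurus data, makes the differences obvious.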

Read More...

October 19, 2020

TT: Beyoncé and Taylor Swift Lyrics

tidyTuesday for the final week of September 2020 is based on the music of Beyoncé and Taylor Swift. To be honest, I do not know either artist well, so I will pick Beyoncé and look at her lyrics. The raw data are organized as a rather typical text file, though there is some underlying tidiness to the rows and songs as embedded data to work with.

Read More...