May 9, 2018

tidyTuesday: coffee chains

The tidyTuesday for this week is coffee chain locations

For this week: 1. The basic link to the #tidyTuesday shows an original article for Week 6.

First, let’s import the data; it is a single Excel spreadsheet. The page notes that starbucks, Tim Horton, and Dunkin Donuts have raw data available.

library(readxl)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(geofacet)
library(ggbeeswarm)
library(ggrepel)
# A great function appears below that I grabbed from Stack Overflow a while back.....
# URL functionality with read_excel is tricky so I turned them in to rds
read_excel_allsheets <- function(filename, tibble = TRUE) {
    sheets <- readxl::excel_sheets(filename)
    x <- lapply(sheets, function(X) readxl::read_excel(filename, sheet = X))
    if(!tibble) x <- lapply(x, as.data.frame)
    names(x) <- sheets
    x
}
download.file("https://github.com/rfordatascience/tidytuesday/raw/master/data/2018/2018-05-07/week6_coffee_chains.xlsx", "coffee.xlsx")
coffee.xl <- read_excel_allsheets("coffee.xlsx")
Starbucks <- readRDS(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/week6SB.rds"))
Dunkin.Donuts <- readRDS(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/week6DD.rds"))
Tim.Hortons <- readRDS(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/week6TH.rds"))

What do the data look like?

library(skimr)
skim(Starbucks)

Table 1: Data summary
Name	Starbucks
Number of rows	25600
Number of columns	13
_______________________
Column type frequency:
character	11
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Brand	0	1.00	7	21	4
Store Number	0	1.00	5	12	25599
Store Name	0	1.00	2	60	25364
Ownership Type	0	1.00	8	13	4
Street Address	2	1.00	1	234	25353
City	14	1.00	2	29	5470
State/Province	0	1.00	1	3	338
Country	0	1.00	2	2	73
Postcode	1521	0.94	1	9	18888
Phone Number	6861	0.73	1	18	18559
Timezone	0	1.00	18	30	101

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Longitude	1	1	-27.87	96.84	-159.46	-104.66	-79.35	100.63	176.92	▇▇▂▂▅
Latitude	1	1	34.79	13.34	-46.41	31.24	36.75	41.57	64.85	▁▁▁▇▂

skim(Dunkin.Donuts)

Table 1: Data summary
Name	Dunkin.Donuts
Number of rows	4898
Number of columns	22
_______________________
Column type frequency:
character	13
numeric	9
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
biz_name	0	1.00	8	38	33
e_address	0	1.00	6	61	4864
e_city	0	1.00	2	27	1770
e_state	0	1.00	2	2	41
e_zip_full	0	1.00	10	10	545
e_country	0	1.00	3	3	1
loc_county	0	1.00	3	21	395
loc_PMSA	0	1.00	2	4	53
loc_TZ	0	1.00	3	5	5
loc_DST	0	1.00	1	1	3
web_url	0	1.00	20	175	22
biz_info	4091	0.16	14	18	709
biz_phone	0	1.00	14	14	4562

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	1	2459.46	1420.28	1.00	1231.25	2458.50	3686.75	4920.00	▇▇▇▇▇
e_postal	1	21528.34	20311.57	1001.00	6080.00	13334.50	32810.75	98112.00	▇▃▁▁▁
loc_area_code	1	590.09	229.26	201.00	401.00	610.00	781.00	989.00	▇▅▇▇▆
loc_FIPS	1	27911.17	12470.14	1069.00	17031.00	26125.00	36111.00	55111.00	▂▅▆▇▂
loc_MSA	1	4284.65	2849.57	160.00	1520.00	3800.00	6880.00	9320.00	▇▃▂▆▅
loc_LAT_centroid	1	39.62	4.33	21.42	39.39	41.22	42.11	47.63	▁▁▁▇▂
loc_LAT_poly	1	39.62	4.32	21.39	39.38	41.20	42.09	47.64	▁▁▁▇▂
loc_LONG_centroid	1	-77.55	7.31	-157.93	-81.44	-75.08	-72.66	-67.23	▁▁▁▁▇
loc_LONG_poly	1	-77.55	7.31	-157.96	-81.44	-75.08	-72.66	-67.28	▁▁▁▁▇

skim(Tim.Hortons)

Table 1: Data summary
Name	Tim.Hortons
Number of rows	4955
Number of columns	6
_______________________
Column type frequency:
character	6
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
country	1	2	2	2
address	1	6	51	4803
city	1	3	38	1206
postal_code	1	4	7	4328
state	1	2	2	27
store_name	1	2	63	3167

A basic plot of the global Starbucks data.

library(ggmap)

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.

## Please cite ggmap if you use it! See citation("ggmap") for details.

mapWorld <- borders("world", colour="gray50", fill="gray50") # create a layer of borders
mp <- ggplot() +   mapWorld
mp <- mp + geom_point(aes(x=Starbucks$Longitude, y=Starbucks$Latitude) ,color="dark green", size=0.5) + xlab("") + ylab("")
mp <- mp + geom_point(aes(x=Dunkin.Donuts$loc_LONG_centroid, y=Dunkin.Donuts$loc_LAT_centroid) ,color="orange", size=0.5) + xlab("") + ylab("")
mp

## Warning: Removed 1 rows containing missing values (geom_point).

Starbucks and Dunkin

Google Maps interface changed and I have not updated this part. Shame.