March 19, 2022

# National Weather Service Portland, Part II

I first will try to load it without any intervention to see what it looks like. As we will see, it is quite messy in a few easy ways and a few that are a bit more tricky.

NWS <- read.csv(url("https://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv"))
kable_styling() %>%
scroll_box(width = "100%", height = "500px")
Daily.Temperature.and.Precipitation.Data X X.1 X.2 X.3 X.4 X.5 X.6 X.7 X.8 X.9 X.10 X.11 X.12 X.13 X.14 X.15 X.16 X.17 X.18 X.19 X.20 X.21 X.22 X.23 X.24 X.25 X.26 X.27 X.28 X.29 X.30 X.31 X.32 X.33
Portland, Oregon Airport
Data Starts: 13 Oct 1940 END Year: 2019
TX is Maximum Temperature (deg F), TX is Minimum Temperature (deg F), PR is Precipitation (inches), SN is Snowfall (inches)
Example: High Temperature 23 October 1940 is 58 while low was 53 deg.
YR MO 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 AVG or Total
1940 10 TX M M M M M M M M M M M M 75 70 64 72 72 78 78 64 63 61 58 57 57 57 56 53 59 59 52 M
1940 10 TN M M M M M M M M M M M M 57 53 52 50 58 58 59 54 48 41 53 48 41 38 37 45 48 50 46 M
1940 10 PR M M M M M M M M M M M M 0.01 T T 0 0.13 0 T 0.14 0.05 0 0.63 1.03 0 0 T 0.18 0.58 0.5 0.25 M
1940 10 SN M M M M M M M M M M M M 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The column names are stored in the seventh row; to properly import this. In addition, there are two missing value codes: M and - that will have to be accounted for. I will use skip to skip the first 6 rows and declare two distinct values to be encoded as missing. Let’s see what we get.

NWS <- read.csv(url("https://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv"), skip=6, na.strings = c("M","-"))
head(NWS, 10)
##      YR MO  X   X1   X2   X3   X4   X5   X6   X7   X8   X9  X10  X11  X12  X13
## 1  1940 10 TX <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>   75
## 2  1940 10 TN <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>   57
## 3  1940 10 PR <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 0.01
## 4  1940 10 SN <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>    0
## 5  1940 11 TX   52   53   47   55   51   58   56   50   48   47   46   45   45
## 6  1940 11 TN   40   38   36   32   42   46   46   42   35   34   35   33   34
## 7  1940 11 PR 0.17 0.02    T    0 0.07 0.28 0.85 0.29 0.02 0.01 0.01    0    0
## 8  1940 11 SN    0    0    0    0    0    0    0    0    0    0    0    0    0
## 9  1940 12 TX   51   53   52   51   56   54   50   51   48   50   46   45   43
## 10 1940 12 TN   42   40   42   42   44   37   34   35   32   26   34   28   27
##    X14 X15 X16  X17  X18 X19  X20  X21 X22  X23  X24 X25 X26  X27  X28  X29 X30
## 1   70  64  72   72   78  78   64   63  61   58   57  57  57   56   53   59  59
## 2   53  52  50   58   58  59   54   48  41   53   48  41  38   37   45   48  50
## 3    T   T   0 0.13    0   T 0.14 0.05   0 0.63 1.03   0   0    T 0.18 0.58 0.5
## 4    0   0   0    0    0   0    0    0   0    0    0   0   0    0    0    0   0
## 5   47  53  49   46   49  46   49   50  44   42   44  51  44   45   59   57  45
## 6   33  28  27   36   30  29   36   33  28   37   35  37  36   38   43   40  39
## 7    0   0   0 0.29 0.01   0 0.37    T   0 0.12 0.62   0   0 0.51 0.89    T   T
## 8    0   0   0    0    0   0    0    0   0    0    0   0   0    0    0    0   0
## 9   40  39  39   41   41  45   46   62  60   56   53  54  45   50   51   43  44
## 10  25  29  33   35   34  35   41   39  39   42   42  42  40   38   36   35  37
##     X31 AVG.or.Total
## 1    52         <NA>
## 2    46         <NA>
## 3  0.25         <NA>
## 4     0            0
## 5  <NA>         49.1
## 6  <NA>         35.9
## 7  <NA>         4.53
## 8  <NA>            0
## 9    45         48.5
## 10   32           36

Two other things are of note. The first one is that R really doesn’t like columns to be named as numbers so we have an X in front of the numeric days. The second is that the column denoting which variable the rows represent is now X. Let me rename X to be Variable.

NWS <- read.csv(url("https://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv"), skip=6, na.strings = c("M","-")) %>%
rename(Variable = X)
str(NWS)
## 'data.frame':    3808 obs. of  35 variables:
##  $YR : int 1940 1940 1940 1940 1940 1940 1940 1940 1940 1940 ... ##$ MO          : int  10 10 10 10 11 11 11 11 12 12 ...
##  $Variable : chr "TX" "TN" "PR" "SN" ... ##$ X1          : chr  NA NA NA NA ...
##  $X2 : chr NA NA NA NA ... ##$ X3          : chr  NA NA NA NA ...
##  $X4 : chr NA NA NA NA ... ##$ X5          : chr  NA NA NA NA ...
##  $X6 : chr NA NA NA NA ... ##$ X7          : chr  NA NA NA NA ...
##  $X8 : chr NA NA NA NA ... ##$ X9          : chr  NA NA NA NA ...
##  $X10 : chr NA NA NA NA ... ##$ X11         : chr  NA NA NA NA ...
##  $X12 : chr NA NA NA NA ... ##$ X13         : chr  "75" "57" "0.01" "0" ...
##  $X14 : chr "70" "53" "T" "0" ... ##$ X15         : chr  "64" "52" "T" "0" ...
##  $X16 : chr "72" "50" "0" "0" ... ##$ X17         : chr  "72" "58" "0.13" "0" ...
##  $X18 : chr "78" "58" "0" "0" ... ##$ X19         : chr  "78" "59" "T" "0" ...
##  $X20 : chr "64" "54" "0.14" "0" ... ##$ X21         : chr  "63" "48" "0.05" "0" ...
##  $X22 : chr "61" "41" "0" "0" ... ##$ X23         : chr  "58" "53" "0.63" "0" ...
##  $X24 : chr "57" "48" "1.03" "0" ... ##$ X25         : chr  "57" "41" "0" "0" ...
##  $X26 : chr "57" "38" "0" "0" ... ##$ X27         : chr  "56" "37" "T" "0" ...
##  $X28 : chr "53" "45" "0.18" "0" ... ##$ X29         : chr  "59" "48" "0.58" "0" ...
##  $X30 : chr "59" "50" "0.5" "0" ... ##$ X31         : chr  "52" "46" "0.25" "0" ...
##  \$ AVG.or.Total: chr  NA NA NA "0" ...

It is disappointing that everything is stored as character type. That will prove advantageous in one respect because there is some /A garbage embedded in two of the variables (SN and PR). Here, I will ask R to find all columns that are stored as character and ask it to remove the string.

NWS <- NWS %>% mutate(across(where(is.character), ~str_remove(.x, "/A")))

Now, we will have to fix the values T in the precipitation and snow variables [which are currently stored in repeated rows]. Nevertheless, this should give me what I need to create the monthly data.

### Daily Data

The daily data will necessarily not involve the column of Totals/Averages that we used for the monthly data so let us eliminate it.

NWS.Daily <- NWS %>% select(-AVG.or.Total)

Now I want to rename the columns with names X1, X2, …, X31 to Day.1 to Day.31 for clarity. It is largely inconsequential, it would work on the X’s but I prefer nicely labeled intermediate steps.

names(NWS.Daily) <- c("YR","MO","Variable",paste0("Day.",1:31))

The next step is to tidy the data. First, let me use pivot_longer on every column that starts with Day. putting the variable names in Day and variable values in value.

NWS.Daily.Base <- NWS.Daily %>%
pivot_longer(., cols=starts_with("Day."), names_to = "Day", values_to = "value")
head(NWS.Daily.Base)
## # A tibble: 6 × 5
##      YR    MO Variable Day   value
##   <int> <int> <chr>    <chr> <chr>
## 1  1940    10 TX       Day.1 <NA>
## 2  1940    10 TX       Day.2 <NA>
## 3  1940    10 TX       Day.3 <NA>
## 4  1940    10 TX       Day.4 <NA>
## 5  1940    10 TX       Day.5 <NA>
## 6  1940    10 TX       Day.6 <NA>

Now I want to turn the days into numbers [they are character above] and then use pivot_wider to get the four variables into unique columns, recode trace [T] where they exist to numbers that are half the size of the smallest values, turn them into numbers, and create a date.

NWS.Daily.Base %<>%  mutate(Day = str_remove(Day, "Day.")) %>%
pivot_wider(., names_from = "Variable", values_from = "value")
NWS.Daily <- NWS.Daily.Base %>% mutate(PR = recode(PR, T = "O.005"), SN = recode(SN, T = "O.005")) %>%
mutate(TX = as.numeric(TX), TN = as.numeric(TN), PR = as.numeric(PR), SN = as.numeric(SN),
date = as.Date(paste(MO,Day,YR,sep="-"), format="%m-%d-%Y")
)
NWS.Daily.Clean <- NWS.Daily %>% filter(!is.na(date))
head(NWS.Daily.Clean)
## # A tibble: 6 × 8
##      YR    MO Day      TX    TN    PR    SN date
##   <int> <int> <chr> <dbl> <dbl> <dbl> <dbl> <date>
## 1  1940    10 1        NA    NA    NA    NA 1940-10-01
## 2  1940    10 2        NA    NA    NA    NA 1940-10-02
## 3  1940    10 3        NA    NA    NA    NA 1940-10-03
## 4  1940    10 4        NA    NA    NA    NA 1940-10-04
## 5  1940    10 5        NA    NA    NA    NA 1940-10-05
## 6  1940    10 6        NA    NA    NA    NA 1940-10-06

This is exactly what I needed.

# By Day?

NWS.Daily.Clean %>% ggplot() + aes(x=date, y=PR) + geom_point(alpha=0.1) + theme_ipsum_rc() + labs(title="Daily Precipitation")

This is really pretty terrible. This is going to need a good bit of work. My first go is going to be to create a moving average that can smooth out the look. I will use a 14-day moving average.

NWS.Daily.Clean %>%
arrange(date) %>%
mutate(Rolling.Average = zoo::rollmean(PR, 7, fill=NA)) %>%
ggplot(., aes(x=date, y=PR)) + geom_point(alpha=0.05, size=0.5) + geom_line(aes(x=date, y=Rolling.Average), inherit.aes=FALSE, color="red") + theme_ipsum_rc() + labs(title="Daily Precipitation")

## Time Series Plots

### Daily

NWS.Daily.Clean <- NWS.Daily.Clean %>% as_tsibble(., index=date)
NWS.Daily.Clean %>% filter(date > "2010-01-01") %>% autoplot(TX) + labs(title="Daily High Temperatures in Portland, Oregon", caption = "Data from NWS", x = "Date", y="High Temperature (deg F)")

## Seasonal Plots

### Daily

NWS.Daily.Clean %>% filter(date > "2015-01-01") %>% gg_season(TX, labels = "both")

## Subseries Plots

### Daily

Daily subseries plots are a mess because there are 31 and I am not sure that there would be much instructive anyway as daily variation in temperature is quite noisy.

NWS.Daily.Clean %>% gg_subseries(TX, period="weeks")

# Autocorrelation

To what degree are observations separated by k time periods correlated.

$r_{k} = \frac{\sum\limits_{t=k+1}^T (y_{t}-\bar{y})(y_{t-k}-\bar{y})} {\sum\limits_{t=1}^T (y_{t}-\bar{y})^2}$ where $$T$$ is the length of the time series. The autocorrelation coefficients make up the autocorrelation function or ACF.

The autocorrelation coefficients for the monthly high temperatures can be computed using the ACF() function.

### Daily: ACF

NWS.Daily.Clean %>% ACF(TX) %>% autoplot()