October 19, 2020

TT: Beyoncé and Taylor Swift Lyrics

tidyTuesday: Beyoncé and Taylor Swift Lyrics

tidytuesday header photo of beyonce and taylor swift

tidyTuesday for the final week of September 2020 is based on the music of Beyoncé and Taylor Swift. To be honest, I do not know either artist well so I will pick Beyoncé and look at her lyrics. The raw data are organized as a rather typical text file though there is some underlying tidyness to the rows and songs as embedded data to work with. I will not work much with it but it is there to exploit. First, I load the data.

beyonce_lyrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/beyonce_lyrics.csv')

## 
## -- Column specification --------------------------------------------------------
## cols(
##   line = col_character(),
##   song_id = col_double(),
##   song_name = col_character(),
##   artist_id = col_double(),
##   artist_name = col_character(),
##   song_line = col_double()
## )

taylor_swift_lyrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/taylor_swift_lyrics.csv')

## 
## -- Column specification --------------------------------------------------------
## cols(
##   Artist = col_character(),
##   Album = col_character(),
##   Title = col_character(),
##   Lyrics = col_character()
## )

sales <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/sales.csv')

## 
## -- Column specification --------------------------------------------------------
## cols(
##   artist = col_character(),
##   title = col_character(),
##   country = col_character(),
##   sales = col_double(),
##   released = col_character(),
##   re_release = col_character(),
##   label = col_character(),
##   formats = col_character()
## )

charts <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/charts.csv')

## 
## -- Column specification --------------------------------------------------------
## cols(
##   artist = col_character(),
##   title = col_character(),
##   released = col_character(),
##   re_release = col_character(),
##   label = col_character(),
##   formats = col_character(),
##   chart = col_character(),
##   chart_position = col_character()
## )

str(beyonce_lyrics)

## spec_tbl_df[,6] [22,616 x 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ line       : chr [1:22616] "If I ain't got nothing, I got you" "If I ain't got something, I don't give a damn" "'Cause I got it with you" "I don't know much about algebra, but I know 1+1 equals 2" ...
##  $ song_id    : num [1:22616] 50396 50396 50396 50396 50396 ...
##  $ song_name  : chr [1:22616] "1+1" "1+1" "1+1" "1+1" ...
##  $ artist_id  : num [1:22616] 498 498 498 498 498 498 498 498 498 498 ...
##  $ artist_name: chr [1:22616] "Beyoncé" "Beyoncé" "Beyoncé" "Beyoncé" ...
##  $ song_line  : num [1:22616] 1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   line = col_character(),
##   ..   song_id = col_double(),
##   ..   song_name = col_character(),
##   ..   artist_id = col_double(),
##   ..   artist_name = col_character(),
##   ..   song_line = col_double()
##   .. )

Beyonce’s Favorite Lyric Words

What words does Beyoncé use the most? I will deploy tidytext tools to unnest the word tokens, get rid of stop words, and create a barplot from the table of frequencies.

beyonce_lyrics %>% 
  unnest_tokens(word, line) %>%   # Parse the lyric lines to tidy: one word per row.
  anti_join(., stop_words) %>%    # Remove the stop words
  group_by(word) %>%              # Group them by word
  summarise(Count = n()) %>%      # How common is the word?
  top_n(25) %>%                   # Keep the top 25
  ggplot() + 
  aes(x=fct_reorder(word, Count), y=Count) + 
  geom_col() + 
  coord_flip() + 
  labs(x="Word", title="Beyoncé's Top Words in Lyrics")

## Joining, by = "word"

## Selecting by Count

Beyoncé’s Cloud

library(wordcloud2)
beyonce_lyrics %>% 
  unnest_tokens(word, line) %>%   # Parse the lyric lines to tidy: one word per row.
  anti_join(., stop_words) %>%    # Remove the stop words
  group_by(word) %>%              # Group them by word
  summarise(Count = n()) %>%      # How common is the word?
  wordcloud2::wordcloud2(size = 0.7, shuffle = TRUE) -> WC111

## Joining, by = "word"

# htmlwidgets::saveWidget(widgetframe::frameWidget(WC111), file='widgets/wcbey.html')
# MyWC
# widgetframe::frameWidget(WC111)
WC111

I will finish with the sentence rendition of the graphic.

library(wordcloud2)
MyDat <- beyonce_lyrics %>% 
  unnest_tokens(word, line) %>%   # Parse the lyric lines to tidy: one word per row.
  anti_join(., stop_words) %>%    # Remove the stop words
  group_by(word) %>%              # Group them by word
  summarise(freq = n()) %>%      # How common is the word?
  arrange(desc(freq))

## Joining, by = "word"

MyWC <- wordcloud2::letterCloud(MyDat, word="Most Popular \n Words in \n Beyonce \n Lyrics", wordSize = 1.75, backgroundColor="lightblue", color='random-dark', size=0.8)
# htmlwidgets::saveWidget(widgetframe::frameWidget(MyWC), file='widgets/beyonce.html', selfcontained = TRUE)
# MyWC
widgetframe::frameWidget(MyWC)

Something about this does not seem to work now as I redesign the website. It did at one time.