TT: Beyoncé and Taylor Swift Lyrics
tidyTuesday: Beyoncé and Taylor Swift Lyrics
tidyTuesday
for the final week of September 2020 is based on the music of Beyoncé and Taylor Swift. To be honest, I do not know either artist well so I will pick Beyoncé and look at her lyrics. The raw data are organized as a rather typical text file though there is some underlying tidyness to the rows and songs as embedded data to work with. I will not work much with it but it is there to exploit. First, I load the data.
beyonce_lyrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/beyonce_lyrics.csv')
##
## -- Column specification --------------------------------------------------------
## cols(
## line = col_character(),
## song_id = col_double(),
## song_name = col_character(),
## artist_id = col_double(),
## artist_name = col_character(),
## song_line = col_double()
## )
taylor_swift_lyrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/taylor_swift_lyrics.csv')
##
## -- Column specification --------------------------------------------------------
## cols(
## Artist = col_character(),
## Album = col_character(),
## Title = col_character(),
## Lyrics = col_character()
## )
sales <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/sales.csv')
##
## -- Column specification --------------------------------------------------------
## cols(
## artist = col_character(),
## title = col_character(),
## country = col_character(),
## sales = col_double(),
## released = col_character(),
## re_release = col_character(),
## label = col_character(),
## formats = col_character()
## )
charts <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-29/charts.csv')
##
## -- Column specification --------------------------------------------------------
## cols(
## artist = col_character(),
## title = col_character(),
## released = col_character(),
## re_release = col_character(),
## label = col_character(),
## formats = col_character(),
## chart = col_character(),
## chart_position = col_character()
## )
str(beyonce_lyrics)
## spec_tbl_df[,6] [22,616 x 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ line : chr [1:22616] "If I ain't got nothing, I got you" "If I ain't got something, I don't give a damn" "'Cause I got it with you" "I don't know much about algebra, but I know 1+1 equals 2" ...
## $ song_id : num [1:22616] 50396 50396 50396 50396 50396 ...
## $ song_name : chr [1:22616] "1+1" "1+1" "1+1" "1+1" ...
## $ artist_id : num [1:22616] 498 498 498 498 498 498 498 498 498 498 ...
## $ artist_name: chr [1:22616] "Beyoncé" "Beyoncé" "Beyoncé" "Beyoncé" ...
## $ song_line : num [1:22616] 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, "spec")=
## .. cols(
## .. line = col_character(),
## .. song_id = col_double(),
## .. song_name = col_character(),
## .. artist_id = col_double(),
## .. artist_name = col_character(),
## .. song_line = col_double()
## .. )
Beyonce’s Favorite Lyric Words
What words does Beyoncé use the most? I will deploy tidytext
tools to unnest the word tokens, get rid of stop words, and create a barplot from the table of frequencies.
beyonce_lyrics %>%
unnest_tokens(word, line) %>% # Parse the lyric lines to tidy: one word per row.
anti_join(., stop_words) %>% # Remove the stop words
group_by(word) %>% # Group them by word
summarise(Count = n()) %>% # How common is the word?
top_n(25) %>% # Keep the top 25
ggplot() +
aes(x=fct_reorder(word, Count), y=Count) +
geom_col() +
coord_flip() +
labs(x="Word", title="Beyoncé's Top Words in Lyrics")
## Joining, by = "word"
## Selecting by Count
Beyoncé’s Cloud
library(wordcloud2)
beyonce_lyrics %>%
unnest_tokens(word, line) %>% # Parse the lyric lines to tidy: one word per row.
anti_join(., stop_words) %>% # Remove the stop words
group_by(word) %>% # Group them by word
summarise(Count = n()) %>% # How common is the word?
wordcloud2::wordcloud2(size = 0.7, shuffle = TRUE) -> WC111
## Joining, by = "word"
# htmlwidgets::saveWidget(widgetframe::frameWidget(WC111), file='widgets/wcbey.html')
# MyWC
# widgetframe::frameWidget(WC111)
WC111
I will finish with the sentence rendition of the graphic.
library(wordcloud2)
MyDat <- beyonce_lyrics %>%
unnest_tokens(word, line) %>% # Parse the lyric lines to tidy: one word per row.
anti_join(., stop_words) %>% # Remove the stop words
group_by(word) %>% # Group them by word
summarise(freq = n()) %>% # How common is the word?
arrange(desc(freq))
## Joining, by = "word"
MyWC <- wordcloud2::letterCloud(MyDat, word="Most Popular \n Words in \n Beyonce \n Lyrics", wordSize = 1.75, backgroundColor="lightblue", color='random-dark', size=0.8)
# htmlwidgets::saveWidget(widgetframe::frameWidget(MyWC), file='widgets/beyonce.html', selfcontained = TRUE)
# MyWC
widgetframe::frameWidget(MyWC)
Something about this does not seem to work now as I redesign the website. It did at one time.