---
title: 'SOCI424: Semantic networks'
author:
- (your name and role here)
- (your name and role here)
output:
  html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# load the 'igraph' package
library(igraph)
```
# In Pairs:
## Part 1: Choose your artists
In this lab, you will be working with the lexical semantics of musical artists. Using your suggestions, I have pre-compiled semantic networks for the 26 artists below. To build these networks, I downloaded lyrics from at most 50 songs from each artist (sorted by popularity). After removing stopwords, relations were built between terms that occurred within a distance of 10 words of one another within a song. (Note: the language models I used did not cover every language used in the artists' songs. Because of this, the results for Korean-language songs will be less reliable.) The networks were pruned to the top 500 terms using pointwise mutual information for the ranking.
1. [ABBA](https://soci424.netlify.com/data/lyrics/ABBA 20 top.csv)
2. [Adele](https://soci424.netlify.com/data/lyrics/Adele 20 top.csv)
3. [Ali Gatie](https://soci424.netlify.com/data/lyrics/Ali Gatie 20 top.csv)
4. [Arctic Monkeys](https://soci424.netlify.com/data/lyrics/Arctic Monkeys 20 top.csv)
5. [Backstreet Boys](https://soci424.netlify.com/data/lyrics/Backstreet Boys 20 top.csv)
6. [CL](https://soci424.netlify.com/data/lyrics/CL 20 top.csv)
7. [Childish Gambino](https://soci424.netlify.com/data/lyrics/Childish Gambino 20 top.csv)
8. [Chita](https://soci424.netlify.com/data/lyrics/Chita 20 top.csv)
9. [Clay and Friends](https://soci424.netlify.com/data/lyrics/Clay and Friends 20 top.csv)
10. [Coeur de Pirate](https://soci424.netlify.com/data/lyrics/Coeur de Pirate 20 top.csv)
11. [Del Water Gap](https://soci424.netlify.com/data/lyrics/Del Water Gap 20 top.csv)
12. [Dire Straits](https://soci424.netlify.com/data/lyrics/Dire Straits 20 top.csv)
13. [Glass Animals](https://soci424.netlify.com/data/lyrics/Glass Animals 20 top.csv)
14. [Juice WRLD](https://soci424.netlify.com/data/lyrics/Juice WRLD 20 top.csv)
15. [Kendrick Lamar](https://soci424.netlify.com/data/lyrics/Kendrick Lamar 20 top.csv)
16. [Kid Cudi](https://soci424.netlify.com/data/lyrics/Kid Cudi 20 top.csv)
17. [Lana Del Rey](https://soci424.netlify.com/data/lyrics/Lana Del Rey 20 top.csv)
18. [Little Mix](https://soci424.netlify.com/data/lyrics/Little Mix 20 top.csv)
19. [Mac Miller](https://soci424.netlify.com/data/lyrics/Mac Miller 20 top.csv)
20. [Maverick Sabre](https://soci424.netlify.com/data/lyrics/Maverick Sabre 20 top.csv)
21. [Nathy Peluso](https://soci424.netlify.com/data/lyrics/Nathy Peluso 20 top.csv)
22. [Peter McPoland](https://soci424.netlify.com/data/lyrics/Peter McPoland 20 top.csv)
23. [Ramones](https://soci424.netlify.com/data/lyrics/Ramones 20 top.csv)
24. [Red Hot Chili Peppers](https://soci424.netlify.com/data/lyrics/Red Hot Chili Peppers 20 top.csv)
25. [Taylor Swift](https://soci424.netlify.com/data/lyrics/Taylor Swift 20 top.csv)
26. [The Weeknd](https://soci424.netlify.com/data/lyrics/The Weeknd 20 top.csv)
### Choose your fighters
Choose **two** of the artists listed above to analyze (one chosen by each member of the group, if possible). It will help if you're a bit familiar with the artist's work!
- *Use R's `read.csv()` to load the two artists' data, saving them to appropriately named variables. What do the resulting data frames look like? What does each row represent?*
```{r 1_1}
# (your code here)
```
- *Use `graph_from_data_frame()` to convert each of these into a network object, naming the variable appropriately. How many vertices does each network have? How many edges? Do the networks have any vertex or edge attributes? What do those attributes represent?*
```{r 1_2}
# (your code here)
```
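As a rough sketch of the mechanics, here is the same conversion on a made-up toy edge list rather than on an artist's data. The column names `term1`, `term2`, and `count` are hypothetical and may not match the columns in your files, and treating the network as undirected is an assumption based on co-occurrence being symmetric.

```r
library(igraph)

# Toy edge list: each row is one relation between two terms.
# Column names and values are invented for illustration only.
toy_edges <- data.frame(
  term1 = c("love", "love", "heart", "night"),
  term2 = c("heart", "night", "night", "dance"),
  count = c(5, 2, 3, 1)
)

# The first two columns are read as the edge endpoints;
# any remaining columns become edge attributes.
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)

vcount(toy_net)             # number of vertices
ecount(toy_net)             # number of edges
vertex_attr_names(toy_net)  # names of the vertex attributes
edge_attr_names(toy_net)    # names of the edge attributes
```

Printing the network object itself (just `toy_net`) gives a compact summary of the same information.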
## Part 2: Visualization
- *Adjust the visual attributes of the nodes and edges in the network to make a (relatively) clear visualization. You should use a "Fruchterman-Reingold" layout (`layout_with_fr()`), which helps spread the vertices apart from one another a bit better than the default "Kamada-Kawai" layout. Likely, you will want to set the vertex size to 0. You might also consider setting the edge color to something with translucency (e.g. `"#00000033"`). (Note: you are unlikely to be able to read the labels clearly in RStudio -- you will export the figure to a PDF in the next step to aid in interpreting the network.)*
```{r 2_1}
# (your code here)
```
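A minimal sketch of these plotting options, using a small random graph as a stand-in for a real semantic network. The graph, the label names, and the specific sizes are placeholders to adjust, not recommended values.

```r
library(igraph)

# Stand-in network: 30 vertices, random edges (not real lyric data).
set.seed(424)  # layout_with_fr() starts from random positions
toy_net <- sample_gnp(30, 0.1)
V(toy_net)$name <- paste0("term", seq_len(vcount(toy_net)))

# Compute the layout once so it can be reused across plots.
toy_layout <- layout_with_fr(toy_net)

plot(
  toy_net,
  layout             = toy_layout,
  vertex.size        = 0,            # hide the vertex circles, keep the labels
  vertex.label.cex   = 0.6,          # shrink the label text
  vertex.label.color = "black",
  edge.color         = "#00000033"   # black with roughly 20% opacity
)
```

Saving the layout matrix in a variable means later plots (for example, ones colored differently) keep every term in the same position.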
Saving your visualizations as PDFs in R is a bit odd but relatively straightforward, and doing so can help with reading dense and large networks. Creating a PDF is a three-step process: (1) create an 'empty' PDF file using the `pdf()` function; (2) 'draw' whatever you want using various plotting functions; (3) finally, you must 'finalize' the file with `dev.off()`. For example:
```
# first make an 8-inch by 5-inch PDF file
pdf(file='my_beautiful_figure.pdf',width=8,height=5)
# now plot something to it
plot(1:20,sin(1:20/10),pch=1:20,cex=2)
# and 'finalize' the file by closing the graphics device
dev.off()
```
- *Follow the format listed above to plot each of your two networks in PDF. You can put each one in a separate PDF file, or you can put them both in the same PDF file (in which case each visualization will be one page in the PDF document).*
```{r 2_2}
# (your code here)
```
- *Examine the files in detail in your favorite PDF viewing software, zooming in on the vertices to look at denser regions (you may want to go back and change some of the visual attributes -- e.g. the `label.cex` vertex attribute to change the text size). Talk about what kind of structure you see in each network (e.g. prominent terms, major clusters), and compare the two networks to one another. Thinking about the semantic structure of the artists' lyrical output, what do you think you can say about the artists' music?*
- *(your response here)*
- *Consider the co-occurrence relation that defines these semantic networks. Are they more similar to the projections from affiliation (bipartite) networks from last week or the correspondence analysis? How so? What would it mean to think of these relationships in terms of field theory?*
- *(your response here)*
## Part 3: Clusters
- *In this section you will be running community-detection algorithms on the semantic networks. But first, it will be important to let igraph know that not all of the relations between terms are equally strong. Set the `weight` edge attribute of each of your networks to the count of co-occurrence represented by each edge.*
```{r 3_1}
# (your code here)
```
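For illustration only, this is how an existing edge attribute can be copied into `weight` on a toy network. The attribute name `count` is an assumption here, so check `edge_attr_names()` on your own networks to see what the co-occurrence column is actually called.

```r
library(igraph)

# Toy network with a made-up co-occurrence column named `count`.
toy_edges <- data.frame(
  from  = c("love", "love", "heart", "night"),
  to    = c("heart", "night", "night", "dance"),
  count = c(5, 2, 3, 1)
)
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)

# Copy the counts into the specially named `weight` edge attribute.
E(toy_net)$weight <- E(toy_net)$count

is_weighted(toy_net)  # TRUE once a `weight` edge attribute exists
```

Many igraph functions look for an edge attribute named exactly `weight` and use it automatically when no weights are passed explicitly.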
- *Use `cluster_louvain()` on each of your networks, saving the resulting "`communities`" objects into appropriately named variables (e.g. `clusters_abba`). How many clusters were identified for each artist (you can use the `length()` function on the `communities` objects)? What are their modularities (these can be found using the `modularity()` function on the objects)? What do these numbers tell you about the artists' oeuvres?*
```{r 3_2}
# (your code here)
```
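The calls involved can be sketched on a toy network of two tight triangles joined by one weak edge. All names and counts below are invented, and the graph is built as undirected because `cluster_louvain()` only accepts undirected graphs.

```r
library(igraph)

# Toy network: triangles a-b-c and d-e-f, linked by a weak c-d edge.
toy_edges <- data.frame(
  from  = c("a", "b", "a", "d", "e", "d", "c"),
  to    = c("b", "c", "c", "e", "f", "f", "d"),
  count = c(5, 5, 5, 5, 5, 5, 1)
)
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)
E(toy_net)$weight <- E(toy_net)$count

# Louvain visits vertices in random order, so fix the seed for repeatability.
set.seed(424)
toy_clusters <- cluster_louvain(toy_net)  # uses the `weight` attribute by default

length(toy_clusters)      # number of clusters found
modularity(toy_clusters)  # modularity of this partition
toy_clusters[[1]]         # terms belonging to the first cluster
membership(toy_clusters)  # cluster id for every vertex
```

Because of the random ordering, results on a large network can differ slightly between runs unless the seed is fixed.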
- *Spend a moment looking at the individual terms in the clusters (you can look at, e.g., the 3rd cluster's membership using list indexing: `clusters_abba[[3]]`). Given what you know about the artists, what might these clusters represent? Songs? Themes? Collaborations? (You don't need to show me the code for all of the clusters you look at -- try simply exploring them in the console.)*
- *(your response here)*
- ***Bonus question for more experienced R users:** Recreate the PDFs from part 2, but color the terms according to cluster membership.*
```{r 3_4}
# (your code here)
```
## Part 4: Centrality
- *For each of the two artists, calculate the weighted degree centrality and the betweenness centrality for the terms in the network. (The weighted degree centrality of a vertex -- also called its 'strength' -- is just the sum of the weights of the edges connected to that vertex, and it can be calculated with the `strength()` function.) Remember to use the inverse of the co-occurrence counts as the weights when calculating betweenness (see the slides and lab from Sept 30 for details on this)! Save these centrality calculations either as vertex attributes on the networks or as stand-alone variables.*
```{r 4_1}
# (your code here)
```
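A small sketch of the two calculations on a toy network (made-up terms and counts, with a hypothetical `count` column). Saving the results as vertex attributes is just one of the two options mentioned above.

```r
library(igraph)

# Toy weighted network; terms and counts are invented.
toy_edges <- data.frame(
  from  = c("love", "love", "heart", "night"),
  to    = c("heart", "night", "night", "dance"),
  count = c(5, 2, 3, 1)
)
toy_net <- graph_from_data_frame(toy_edges, directed = FALSE)
E(toy_net)$weight <- E(toy_net)$count

# Strength: sum of the weights on each vertex's edges
# (uses the `weight` attribute by default).
toy_strength <- strength(toy_net)

# Betweenness treats weights as distances, so pass the inverse counts:
# frequent co-occurrence then means a *short* path.
toy_between <- betweenness(toy_net, weights = 1 / E(toy_net)$weight)

# Optionally store both as vertex attributes on the network.
V(toy_net)$strength    <- toy_strength
V(toy_net)$betweenness <- toy_between
```

Passing `weights` explicitly to `betweenness()` matters here: left alone, it would use the raw counts in `weight` as distances, so strongly related terms would be treated as far apart.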
- *Use `sort()` to find the most central terms in each network according to the different measures. How should you interpret the different kinds of centrality? Can you find the central terms in the PDFs you created in part 2? Do these terms make sense, given what you know about the artists?*
```{r 4_2}
# (your code here)
```
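As a quick illustration of `sort()` on a named vector, assuming the scores are stored as a stand-alone variable (the values here are invented):

```r
# Made-up centrality scores, named by term.
toy_scores <- c(love = 7, heart = 8, night = 6, dance = 1)

# sort() orders from low to high by default; reverse it to rank the top terms.
toy_sorted <- sort(toy_scores, decreasing = TRUE)

head(toy_sorted, 3)    # the three highest-scoring terms and their scores
names(toy_sorted)[1:3] # just the term names
```

The names stay attached to the values during sorting, which is what makes it possible to read off the terms afterwards.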
- ***Bonus question for more experienced R users:** Recreate the PDFs again, but only include labels for the ten terms with the highest betweenness centrality or weighted degree.*
```{r 4_3}
# (your code here)
```