---
title: 'SOCI424: Measuring and Theorizing Relations (***KEY***)'
author:
- 'Peter McMahan'
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# In Pairs:
### *Preamble*: installing `igraph`
This worksheet will introduce you to the idea of network data representations in R, and start to get you familiar with working with networks programmatically. Although much of what you will work on below will use R's base functionality, we will also introduce a network-specific add-on package called `igraph`. This package adds a host of network-analytic methods and visualizations to R. But to use `igraph` you will first need to install it (if you haven't already) using one of the following methods:
- Use "Tools \> Install Packages ..." in RStudio, and type "`igraph`" into the search bar
**OR**
- Type `install.packages('igraph')` into the R console
Once you've done that, the following code chunk will load `igraph` and get it ready to use below. (Note: this will bring up a few messages about "masked objects", which can be safely ignored.)
```{r packages}
library(igraph)
```
## Part 1: Reading the data
### Southern American Society
The data we will use for this lab comes from the book *Deep South: A Social Anthropological Study of Caste and Class* (Davis, Gardner, and Gardner [1941] 2009). Among the book's detailed accounts of early--20th century social systems in the Southern United States, it contains data on the participation of a sample of 18 upper-class white women in a series of 14 social events in 1936. The authors provide the following figure to illustrate the data:
![Figure 3 from Davis, Gardner, and Gardner ([1941] 2009)](https://soci424.netlify.app/worksheets/figures/02-davis_1941_fig3.png){width="100%"}
### Task 1A: Inputting the data
One way to represent data of this type in R is as a matrix with values `1` (representing participation) and `0` (representing non-participation). The following snippet creates such a matrix, with 18 rows and 14 columns. However, it is missing data for Miss Charlotte McDowd, Miss Ruth Desand, and Mrs. Flora Price, who are erroneously indicated to have not participated in any events.
- *Using the figure above, update the code below in your own R Markdown file to include the events that these three women participated in.*
```{r 1a}
# this code uses R's `matrix()` function to take a sequence of numbers
# and convert them to a 2-dimensional matrix. For this to work, it needs
# to know how many rows to expect (`nrow=18`), and whether to fill the
# matrix column-by-column, or row-by-row (`byrow=TRUE`)
events <- matrix(c(
1,1,1,1,1,1,0,1,1,0,0,0,0,0,
1,1,1,0,1,1,1,1,0,0,0,0,0,0,
0,1,1,1,1,1,1,1,1,0,0,0,0,0,
1,0,1,1,1,1,1,1,0,0,0,0,0,0,
0,0,1,1,1,0,1,0,0,0,0,0,0,0,
0,0,1,0,1,1,0,1,0,0,0,0,0,0,
0,0,0,0,1,1,1,1,0,0,0,0,0,0,
0,0,0,0,0,1,0,1,1,0,0,0,0,0,
0,0,0,0,1,0,1,1,1,0,0,0,0,0,
0,0,0,0,0,0,1,1,1,0,0,1,0,0,
0,0,0,0,0,0,0,1,1,1,0,1,0,0,
0,0,0,0,0,0,0,1,1,1,0,1,1,1,
0,0,0,0,0,0,1,1,1,1,0,1,1,1,
0,0,0,0,0,1,1,0,1,1,1,1,1,1,
0,0,0,0,0,0,1,1,0,1,1,1,1,1,
0,0,0,0,0,0,0,1,1,1,0,1,0,0,
0,0,0,0,0,0,0,0,1,0,1,0,0,0,
0,0,0,0,0,0,0,0,1,0,1,0,0,0
),nrow=18,byrow=TRUE)
```
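To see what `byrow=TRUE` changes, here is a minimal sketch on a toy 2-by-3 matrix. The values and the names `toy_vals`, `toy_by_row`, and `toy_by_col` are made up for illustration and are not part of the Davis data.

```{r sketch_byrow}
# Toy values for illustration only (not from the Davis data)
toy_vals <- c(1, 0, 1,
              0, 1, 1)

# Fill row-by-row: the first three numbers become the first row
toy_by_row <- matrix(toy_vals, nrow = 2, byrow = TRUE)
toy_by_row

# Default (byrow = FALSE): the numbers fill down each column instead
toy_by_col <- matrix(toy_vals, nrow = 2)
toy_by_col

# dim() reports the number of rows, then the number of columns
dim(toy_by_row)
```

Because the numbers in `events` are typed one woman per line, `byrow=TRUE` is what makes each typed line end up as one row of the matrix.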
### Task 1B: Learning about people and events
Now that the table is represented in the matrix `events`, we can use R to tell us a bit more about it. The functions `rowSums(events)` and `colSums(events)` will add up all of the numbers in each row and in each column of `events`, respectively. Using these functions, answer the following questions (*show your code for each*):
- *Which event was the most popular (had the most attendees)?*
```{r 1b_1}
colSums(events)
```
The event with the most attendees is event number `r which.max(colSums(events))`.
- *How many women attended exactly four events?*
```{r 1b_2}
rowSums(events)
```
There were `r sum(rowSums(events)==4)` women who attended exactly four events.
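One caveat about `which.max()`: it reports only the first position of the maximum, so it would hide a tie. The sketch below uses a hypothetical vector of attendance counts (`toy_counts`, not the Davis data) to show one way of listing every tied position, and how `sum()` counts matches.

```{r sketch_ties}
# Hypothetical attendance counts for five events (not the Davis data)
toy_counts <- c(3, 7, 5, 7, 2)

# which.max() reports only the first position of the maximum
which.max(toy_counts)

# which() with a comparison reports every position tied for the maximum
which(toy_counts == max(toy_counts))

# sum() of a logical vector counts the TRUE values,
# here the number of events with exactly 7 attendees
sum(toy_counts == 7)
```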
## Part 2: Building a person-to-person network
The matrix you created in Part 1 describes the relationships between individual people and events (this is called a *bipartite network*, which we'll talk about later in the semester). For the current analysis, we want a network that describes the relationships between the 18 women in the dataset. One way to infer such relationships is to ask how frequently the women attended the same event. Presumably, women who have a closer relationship will also attend more events together.
### Task 2A: Making a co-attendance matrix
Fortunately for us, there is a very basic technique from linear algebra that will turn an *affiliation matrix* like the one you created above into a *co-occurrence* matrix that will measure co-attendance at events. To do so, we will use [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication). You don't need to know anything about linear algebra for this to work, and for the time being, you can just treat it like a magic wand (we will cover this more when we discuss bipartite networks later on).
- *Use the following command to make a co-occurrence matrix for event attendance (this task doesn't actually require you to do anything 😄)*
```{r 2a}
# `%*%` is the R command for matrix multiplication,
# and the `t()` function transposes a matrix
co_attend <- events %*% t(events)
```
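For a peek behind the magic wand, the following sketch works through the same multiplication on a toy affiliation matrix. The matrix `toy_events` (3 people, 4 events) is invented for illustration. Each entry of the product is the number of events where both rows have a `1`.

```{r sketch_coattend}
# Toy affiliation matrix: 3 people (rows) by 4 events (columns)
toy_events <- matrix(c(
  1, 1, 0, 1,
  1, 0, 0, 1,
  0, 1, 1, 0
), nrow = 3, byrow = TRUE)

# Same operation as above: one row and one column per person
toy_co <- toy_events %*% t(toy_events)
toy_co

# Entry [1, 2] counts the events that person 1 and person 2 share:
# multiplying their rows elementwise leaves a 1 only for shared events
sum(toy_events[1, ] * toy_events[2, ])
toy_co[1, 2]
```

Persons 1 and 2 both attended the first and fourth toy events, so both of the last two lines return 2.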
### Task 2B: Basic description and interpretation
Answer the following questions about your new `co_attend` matrix:
(*Hint*: you will probably need to use the R functions `nrow()` and `ncol()`)
(*Another hint*: you can access specific elements in a matrix in R using "square brackets" to index--`x[1,2]` will return the number in the first row and second column of a matrix `x`)
- *How many rows and columns does `co_attend` have? What do they represent?*
```{r 2b_a}
nrow(co_attend)
ncol(co_attend)
```
The `co_attend` matrix has `r nrow(co_attend)` rows and `r ncol(co_attend)` columns. Each row and each column represents one of the 18 women described in the table.
- *How many times did Mrs. Flora Price attend the same event with Miss Laura Mandeville?*
```{r 2b_b}
# Flora Price is number 18, Laura Mandeville is number 2
co_attend[2,18]
```
Mrs. Flora Price attended `r co_attend[2,18]` events with Miss Laura Mandeville.
- *How many events did Miss Brenda Rogers attend in total? (You should retrieve this information directly from the `co_attend` matrix)*
```{r 2b_c}
# Brenda Rogers is number 4
co_attend[4,4]
# double check this against the `events` matrix
rowSums(events)[4]
```
The diagonal of the matrix captures individuals' attendance counts. From this we can see that Miss Brenda Rogers attended `r co_attend[4,4]` events total.
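The claim about the diagonal can be checked for every person at once. The sketch below rebuilds a small invented affiliation matrix (`toy_events`, not the Davis data) and compares the diagonal of its co-occurrence matrix with the row sums. It also confirms that the co-occurrence matrix is symmetric, which is why `[2,18]` and `[18,2]` give the same answer.

```{r sketch_diag}
# Toy affiliation matrix: 3 people (rows) by 4 events (columns)
toy_events <- matrix(c(
  1, 1, 0, 1,
  1, 0, 0, 1,
  0, 1, 1, 0
), nrow = 3, byrow = TRUE)
toy_co <- toy_events %*% t(toy_events)

# diag() extracts the diagonal: each person's co-attendance with themselves
diag(toy_co)

# ... which matches each person's total number of events attended
rowSums(toy_events)
all(diag(toy_co) == rowSums(toy_events))

# Co-attendance is mutual, so the matrix equals its own transpose
isSymmetric(toy_co)
```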
### Task 2C: Simplifying the valued data
Many network methods assume that relations are *binary* (either exist or do not exist). The `co_attend` matrix you built above is *valued* (relations between the women can take different values). In this task you will simplify the valued network to create a binary network.
- *Make a new matrix called `co_attend_3plus` that has the same dimensions (number of rows and columns) as `co_attend`, but whose values are either `TRUE` or `FALSE`, indicating for each pair of women whether they have attended at least 3 events together.* (Note: This one may be a little tricky for new R users. The operator `>=` will tell you whether the value on the left is at least as big as the value on the right.)
```{r 2c_a}
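# `>=` compares every element to 3, giving TRUE where a pair of women
# attended at least 3 events together and FALSE otherwise
co_attend_3plus <- co_attend >= 3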
```
- *Did Miss Katherine Rogers attend at least three events with Mrs. Nora Fayette?*
```{r 2c_b}
# Miss Katherine Rogers: 12
# Mrs. Nora Fayette: 14
co_attend_3plus[12,14]
```
Miss Katherine Rogers **`r ifelse(co_attend_3plus[12,14],'did','did not')`** attend at least three events with Mrs. Nora Fayette.
- *Did Miss Katherine Rogers attend at least three events with Miss Eleanor Nye?*
```{r 2c_c}
# Miss Katherine Rogers: 12
# Miss Eleanor Nye: 7
co_attend_3plus[12,7]
```
Miss Katherine Rogers **`r ifelse(co_attend_3plus[12,7],'did','did not')`** attend at least three events with Miss Eleanor Nye.
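The thresholding step can be seen in miniature on a toy valued matrix. In the sketch below, `toy_co` is an invented 3-by-3 co-occurrence matrix and the cutoff of 2 is chosen only to suit the toy values.

```{r sketch_threshold}
# Toy co-occurrence matrix for 3 people (invented values)
toy_co <- matrix(c(
  3, 2, 1,
  2, 2, 0,
  1, 0, 2
), nrow = 3, byrow = TRUE)

# Comparing a matrix to a number compares every element,
# so the result is a logical matrix of the same dimensions
toy_2plus <- toy_co >= 2
toy_2plus

# Count the pairs that meet the threshold: upper.tri() skips the diagonal
# and keeps each pair only once
sum(toy_2plus[upper.tri(toy_2plus)])

# Multiplying by 1 converts TRUE/FALSE to 1/0 if a numeric matrix is needed
toy_2plus * 1
```

Note that the diagonal of the logical matrix compares each person's own attendance count with the cutoff, so it says nothing about relations between people.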
## Part 3: Visualizing the network
Now we will use the `igraph` package to build special network representations of our data, and to visualize those networks.
### Task 3A: Visualizing the valued network
The function `graph_from_adjacency_matrix()` from the `igraph` package converts an *adjacency matrix* like the co-attendance matrix you created above into a special `graph` object that can represent a lot more than a simple matrix can. The function automatically figures out how many nodes there are in the network by looking at the number of rows/columns in the matrix, and constructs edges between those nodes based on the data.
- *Use the following command to create an undirected, weighted network from the `co_attend` matrix you already made. Notice that the single command is split up across three lines of code! (Note: this question and the following one also don't require you to do anything.)*
```{r 3a_a}
# The `diag=FALSE` argument tells R to ignore the *diagonal* of the matrix
# (the 1st row of the 1st column, 2nd row of 2nd column, etc), which would
# create edges from each node back to itself. We will talk more about such
# `loop` edges later.
event_net <- graph_from_adjacency_matrix(
  co_attend, mode='undirected',
  weighted=TRUE, diag=FALSE)
```
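To see what ends up inside a `graph` object, the sketch below builds one from a small invented co-occurrence matrix (`toy_co`) and inspects it with `vcount()`, `ecount()`, and `E()`. It assumes `igraph` is installed as described in the preamble.

```{r sketch_graph}
library(igraph)

# Toy co-occurrence matrix for 3 people (invented values)
toy_co <- matrix(c(
  3, 2, 1,
  2, 2, 0,
  1, 0, 2
), nrow = 3, byrow = TRUE)

toy_net <- graph_from_adjacency_matrix(
  toy_co, mode = 'undirected',
  weighted = TRUE, diag = FALSE)

# One node per row/column of the matrix
vcount(toy_net)

# One edge per nonzero off-diagonal pair; the zero between
# persons 2 and 3 produces no edge
ecount(toy_net)

# With weighted=TRUE, the matrix values are stored as edge weights
E(toy_net)$weight
```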
We will talk about plotting networks in detail later in the course, but for the moment we can mostly use the default options built into `igraph`.
- *Use the following two commands to (a) tell `igraph` that we want the weight of the edges to be reflected in the width of the lines in the plot, and (b) plot `event_net` using the default parameters. (Don't worry about how these commands work yet.)*
```{r 3a_b}
E(event_net)$width <- E(event_net)$weight
plot(event_net)
```
### Task 3B: Visualizing the binary network
We want to compare the valued network to the binary network you constructed above. This time, you will need to provide the R code to build the `graph` object and to plot it.
- *Use `graph_from_adjacency_matrix()` to make a new object in R called `event_net_3plus`. (This command will look very similar to the one used to make `event_net` above)*
```{r 3b_a}
event_net_3plus <- graph_from_adjacency_matrix(
  co_attend_3plus, mode='undirected',
  diag=FALSE)
```
- *Plot `event_net_3plus`. (Since this is a binary network, you will not need to specify the edge width like we did in the previous task.)*
```{r 3b_b}
plot(event_net_3plus)
```
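Besides plotting, the two kinds of network can be compared numerically. The sketch below does this for an invented co-occurrence matrix (`toy_co`) with a toy cutoff of 2. Converting the logical matrix to 0/1 with `* 1` is a precaution taken in this sketch, not something the worksheet requires.

```{r sketch_compare}
library(igraph)

# Toy co-occurrence matrix for 3 people (invented values)
toy_co <- matrix(c(
  3, 2, 1,
  2, 2, 0,
  1, 0, 2
), nrow = 3, byrow = TRUE)

# Valued network: every nonzero pair becomes a weighted edge
toy_net <- graph_from_adjacency_matrix(
  toy_co, mode = 'undirected',
  weighted = TRUE, diag = FALSE)

# Binary network: only pairs with at least 2 shared events become edges
toy_net_2plus <- graph_from_adjacency_matrix(
  (toy_co >= 2) * 1, mode = 'undirected',
  diag = FALSE)

# Thresholding can only remove edges, never add them
ecount(toy_net)
ecount(toy_net_2plus)

# The binary network carries no edge weights
is_weighted(toy_net_2plus)
```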
# References
Davis, Allison, Burleigh Bradford Gardner, and Mary R. Gardner. (1941) 2009. *Deep South: A Social Anthropological Study of Caste and Class*. University of South Carolina Press.