---
title: 'SOCI424/624: Measuring and Theorizing Relations'
author: '(anonymous for peer review)'
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


# *Preamble*: installing `igraph`

This worksheet will introduce you to the idea of network data representations in R, and start to get you familiar with working with networks programmatically. Although much of what you will work on below will use R's base functionality, we will also introduce a network-specific add-on package called `igraph`. This package adds a host of network-analytic methods and visualizations to R. But to use `igraph` you will first need to install it (if you haven't already) using one of the following methods:

-   Use "Tools \> Install Packages ..." in RStudio, and type "`igraph`" into the search bar

**OR**

-   Type `install.packages('igraph')` into the R console

Once you've done that, the following code chunk will load `igraph` and get it ready to use below. (Note: this will bring up a few warnings about "masked objects" which can be safely ignored.)

```{r packages}
library(igraph)
```

# Part 1: Reading the data

## Southern American Society

The data we will use for this lab comes from the book *Deep South: A Social Anthropological Study of Caste and Class* (Davis, Gardner, and Gardner [1941] 2009). Among the book's detailed accounts of early--20th century social systems in the Southern United States, it contains data on the participation of a sample of 18 upper-class white women in a series of 14 social events in 1936. The authors provide the following figure to illustrate the data:

![Figure 3 from Davis, Gardner, and Gardner ([1941] 2009)](https://soci424.netlify.app/worksheets/figures/02-davis_1941_fig3.png){width="100%"}

## Task 1A: Inputting the data

One way to represent data of this type in R is as a matrix with values `1` (representing participation) and `0` (representing non-participation). The following snippet creates such a matrix, with 18 rows and 14 columns. However, it is missing data for Miss Charlotte McDowd, Miss Ruth DeSand, and Mrs. Flora Price, who are erroneously indicated to have not participated in any events.

-   *Using the figure above, update the code below to include the events that these three women participated in.*

```{r 1a}
# This code uses R's `matrix()` function to take a sequence of numbers
# and convert them to a 2-dimensional matrix. For this to work, it needs
# to know how many rows to expect (`nrow=18`), and whether to fill the
# matrix column-by-column or row-by-row (`byrow=TRUE`).
events <- matrix(c(
  1,1,1,1,1,1,0,1,1,0,0,0,0,0,
  1,1,1,0,1,1,1,1,0,0,0,0,0,0,
  0,1,1,1,1,1,1,1,1,0,0,0,0,0,
  1,0,1,1,1,1,1,1,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,1,0,1,1,0,1,0,0,0,0,0,0,
  0,0,0,0,1,1,1,1,0,0,0,0,0,0,
  0,0,0,0,0,1,0,1,1,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,1,1,1,0,0,1,0,0,
  0,0,0,0,0,0,0,1,1,1,0,1,0,0,
  0,0,0,0,0,0,0,1,1,1,0,1,1,1,
  0,0,0,0,0,0,1,1,1,1,0,1,1,1,
  0,0,0,0,0,1,1,0,1,1,1,1,1,1,
  0,0,0,0,0,0,1,1,0,1,1,1,1,1,
  0,0,0,0,0,0,0,1,1,1,0,1,0,0,
  0,0,0,0,0,0,0,0,1,0,1,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0
),nrow=18,byrow=TRUE)

# if we want the nodes representing people in the (eventual) network to be
# labeled, we can add row names to this matrix.

row.names(events) <- c(
  "Mrs Evelyn Jefferson",
  "Miss Laura Mandeville",
  "Miss Theresa Anderson",
  "Miss Brenda Rogers",
  "Miss Charlotte McDowd",
  "Miss Frances Anderson",
  "Miss Eleanor Nye",
  "Miss Pearl Oglethorpe",
  "Miss Ruth DeSand",
  "Miss Verne Sanderson",
  "Miss Myra Liddell",
  "Miss Katherine Rogers",
  "Mrs Sylvia Avondale",
  "Mrs Nora Fayette",
  "Mrs Helen Lloyd",
  "Mrs Dorothy Murchison",
  "Mrs Olivia Carleton",
  "Mrs Flora Price"
)
```
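
To see what `byrow=TRUE` changes, here is a minimal sketch on a hypothetical six-number sequence. The names `by_row` and `by_col` are for illustration only and are not used elsewhere in this worksheet.

```{r ex_byrow}
# Fill a 2-row matrix row-by-row: the first three numbers form row 1
by_row <- matrix(1:6, nrow=2, byrow=TRUE)

# R's default is to fill column-by-column: the first two numbers form column 1
by_col <- matrix(1:6, nrow=2)

print(by_row)
print(by_col)
```

Because the `events` data above is typed in one woman per line, filling row-by-row is what keeps each line of numbers together as one row of the matrix.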

## Task 1B: Learning about people and events

Now that the table is represented in the matrix `events`, we can use R to tell us a bit more about it. The functions `rowSums(events)` and `colSums(events)` will add up all of the numbers in each row and in each column of `events`, respectively. Using these functions, answer the following questions. You should use R to calculate the results, rather than 'hard coding' the correct answer directly into the code.
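
As a minimal illustration of what these two functions return, the sketch below applies them to a hypothetical toy matrix with two people and three events. The matrix `toy_events` and its labels are made up for this example and are not part of the worksheet data.

```{r ex_sums}
# A made-up 2 x 3 attendance matrix (rows = people, columns = events)
toy_events <- matrix(c(
  1,1,1,
  1,0,0
), nrow=2, byrow=TRUE)
rownames(toy_events) <- c("A", "B")
colnames(toy_events) <- c("E1", "E2", "E3")

# one total per row: how many events each person attended
print(rowSums(toy_events))

# one total per column: how many people attended each event
print(colSums(toy_events))
```

Both results are ordinary (named) numeric vectors, so they can be passed on to other R functions.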

-   *Which event was the most popular (had the most attendees)? Store this value to the variable `most_popular`.*

```{r 1b_1}
# your code here
```

-   *How many women attended exactly four events? Store this number into the variable `num_exactly_four`.*

```{r 1b_2}
# your code here
```

# Part 2: Building a person-to-person network

The matrix you created in Part 1 describes the relationships between individual people and events (this is called a *bipartite network*, which we'll talk about later in the semester). For the current analysis, we want a network that describes the relationships between the 18 women in the dataset. One way to infer such relationships is to ask how frequently the women attended the same event. Presumably, women who have a closer relationship will also attend more events together.

## Task 2A: Making a co-attendance matrix

Fortunately for us, there is a very basic technique from linear algebra that will turn an *affiliation matrix* like the one you created above into a *co-occurrence* matrix that will measure co-attendance at events. To do so, we will use [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication). You don't need to know anything about linear algebra for this to work, and for the time being, you can just treat it like a magic wand (we will cover this more when we discuss bipartite networks later on).
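
If you would like to peek behind the magic wand, the following sketch multiplies a hypothetical toy affiliation matrix by its own transpose. The objects `toy_aff` and `toy_co`, and the labels `P`, `Q`, and `R`, are assumptions made for this illustration only.

```{r ex_cooccur}
# A made-up affiliation matrix: 3 people (rows) by 3 events (columns)
toy_aff <- matrix(c(
  1,1,0,
  1,1,1,
  0,0,1
), nrow=3, byrow=TRUE)
rownames(toy_aff) <- c("P", "Q", "R")

# multiplying by the transpose gives a 3 x 3 person-by-person matrix
toy_co <- toy_aff %*% t(toy_aff)
print(toy_co)

# P and Q share the first two events; P and R share none
print(toy_co["P", "Q"])
print(toy_co["P", "R"])
```

Note that the row names of the affiliation matrix carry over to both the rows and the columns of the result.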

-   *Use the following command to make a co-occurrence matrix for event attendance (this task doesn't actually require you to do anything 😄)*

```{r 2a}
# `%*%` is the R command for matrix multiplication,
# and the `t()` function transposes a matrix
co_attend <- events %*% t(events)
```

## Task 2B: Basic description and interpretation

Answer the following questions about your new `co_attend` matrix. Again, you should calculate the results with R rather than hard-coding them.

(*Hint*: you will probably need to use the R functions `nrow()` and `ncol()`, or the function `dim()`)

(*Another hint*: you can access specific elements in a matrix in R using "square brackets" to index--`x[1,2]` will return the number in the first row and second column of a matrix `x`)
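
The hints above can be tried out on a small hypothetical matrix first. In this sketch, `toy_x` and its row and column labels are made up for illustration; when a matrix has row and column names, those names can also be used inside the square brackets.

```{r ex_index}
# A made-up 2 x 3 matrix with named rows and columns
toy_x <- matrix(c(
  10,20,30,
  40,50,60
), nrow=2, byrow=TRUE,
dimnames=list(c("a", "b"), c("x", "y", "z")))

print(dim(toy_x))   # number of rows, then number of columns
print(nrow(toy_x))
print(ncol(toy_x))

# index by position: first row, second column
print(toy_x[1, 2])

# index by name: row "b", column "z"
print(toy_x["b", "z"])
```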

-   *How many rows and columns does `co_attend` have (print the values with the `print()` function)? What do they represent?*

```{r 2b_a}
# your code here
```

-   *How many times did Mrs. Flora Price attend the same event with Miss Laura Mandeville? (print the value)*

```{r 2b_b}
# your code here
```

-   *How many events did Miss Brenda Rogers attend in total? (You should retrieve this information directly from the `co_attend` matrix)*

```{r 2b_c}
# your code here
```

## Task 2C: Simplifying the valued data

Many network methods assume that relations are *binary* (either exist or do not exist). The `co_attend` matrix you built above is *valued* (relations between the women can take different values). In this task you will simplify the valued network to create a binary network.
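
One common way to turn numbers into yes/no values in R is a comparison, which R applies to every element at once. The sketch below shows this on a hypothetical vector, `toy_counts`, invented for this example. It is only an illustration of how comparisons behave, not a solution to the task.

```{r ex_compare}
# made-up counts
toy_counts <- c(1, 3, 5)

# the comparison is evaluated separately for each element
at_least_3 <- toy_counts >= 3
print(at_least_3)
```

The result has the same length as the input, with one `TRUE` or `FALSE` per element.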

-   *Make a new matrix called `co_attend_3plus` that has the same dimensions (number of rows and columns) as `co_attend`, but whose values are either `TRUE` or `FALSE`, indicating for each pair of women whether they have attended at least 3 events together.* (Note: This one may be a little tricky for new R users. The operator `>=` will tell you whether the value on the left is at least as big as the value on the right.)

```{r 2c_a}
# your code here
```

-   *Did Miss Katherine Rogers attend at least three events with Mrs. Nora Fayette?*

```{r 2c_b}
# your code here
```

-   *Did Miss Katherine Rogers attend at least three events with Miss Eleanor Nye?*

```{r 2c_c}
# your code here
```

# Part 3: Visualizing the network

Now we will use the `igraph` package to build special network representations of our data, and to visualize those networks.

## Task 3A: Visualizing the valued network

The function `graph_from_adjacency_matrix()` from the `igraph` package converts an *adjacency matrix* like the co-attendance matrix you created above into a special `graph` object that can represent a lot more than a simple matrix can. The function automatically figures out how many nodes there are in the network by looking at the number of rows/columns in the matrix, and constructs edges between those nodes based on the data.

-   *Use the following command to create an undirected, weighted network from the `co_attend` matrix you already made. A weighted network is just a network where the edges have 'weights' assigned to them. Notice that the single command is split up across three lines of code! (Note: this question and the following one also don't require you to do anything)*

```{r 3a_a}
# The `diag=FALSE` argument tells `igraph` to ignore the *diagonal* of the
# matrix (row 1 of column 1, row 2 of column 2, etc.), which would otherwise
# create edges from each node back to itself. We will talk more about such
# `loop` edges later.
event_net <- graph_from_adjacency_matrix(
  co_attend, mode='undirected', 
  weighted=TRUE, diag=FALSE)
```
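
To get a feel for what ends up inside a `graph` object, here is a minimal sketch that converts a hypothetical 3-by-3 valued matrix and then inspects the result. The matrix `toy_adj` and the object `toy_net` are made up for this example; `vcount()`, `ecount()`, and `E()` are `igraph` functions for counting nodes, counting edges, and accessing the edges.

```{r ex_toy_graph}
library(igraph)

# A made-up symmetric valued matrix for three nodes
toy_adj <- matrix(c(
  2,1,0,
  1,3,2,
  0,2,2
), nrow=3, byrow=TRUE)

toy_net <- graph_from_adjacency_matrix(
  toy_adj, mode='undirected',
  weighted=TRUE, diag=FALSE)

print(vcount(toy_net))    # number of nodes (one per row of the matrix)
print(ecount(toy_net))    # number of edges (non-zero cells off the diagonal)
print(E(toy_net)$weight)  # the matrix values stored as edge weights
```

The zero cell between the first and third node produces no edge, and the diagonal values are dropped because of `diag=FALSE`.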

We will talk about plotting networks in detail later in the course, but for the moment we can mostly use the default options built into `igraph`.

-   *Use the following two commands to (a) tell `igraph` that we want the weight of the edges to be reflected in the width of the lines in the plot, and (b) plot `event_net` using the default parameters. (Don't worry about how these commands work yet.)*

```{r 3a_b}
# create a new edge attribute called "width", which
# igraph will automatically use when visualizing
E(event_net)$width <- E(event_net)$weight

# use the generic `plot()` function
plot(event_net)
```

## Task 3B: Visualizing the binary network

We want to compare the valued network to the binary network you constructed above. This time, you will need to provide the R code to build the `graph` object and to plot it.

-   *Use `graph_from_adjacency_matrix()` to make a new object in R called `event_net_3plus`. (This command will look very similar to the one used to make `event_net` above, with some important differences)*

```{r 3b_a}
# your code here
```

-   *Plot `event_net_3plus`. (Does it make sense to vary the edge widths as you did above?)*

```{r 3b_b}
# your code here
```

## Task 3C: Discuss the visualizations

- Examine the two network visualizations you produced above (tasks 3A and 3B). Do you see any obvious structure? Do some women seem more "central" to the network than others, and what would that mean? Are any clusters apparent? Do the two networks seem to display similar structure?


# Part 4: What kind of relation?

Consider the relations you defined in the networks `event_net` and `event_net_3plus` above.

-   Describe these relations in terms of their formal representation. Are they directed or undirected? Valued or binary? Dynamic or static?

    -   (your response here)

-   Now consider the significance of these relations in a real social setting. How do you think the women in the network would understand their relationship with someone they see a lot at social events versus someone they see rarely? What aspects of their social context could we infer from these network ties?

    -   (your response here)

-   Do the ties in the valued network (`event_net`) and those in the binary network (`event_net_3plus`) represent the same kind of relationship?

    -   (your response here)

# Part 5: Pipes or prisms?

Consider the "bond" and "flow" theoretical perspectives from Borgatti and Halgin (2011).

-   Discuss the co-attendance networks from the "flow" perspective. What kinds of things might flow through these relations? How would the numerical value of the relation (number of events) affect such flow?

    -   (your response here)

-   Now discuss the networks from the "bond" perspective. What kinds of social roles for the different women might you infer from the networks? Do you think this network could reveal aspects of how these women understand themselves in their social context?

    -   (your response here)

# References

Davis, Allison, Burleigh Bradford Gardner, and Mary R. Gardner. (1941) 2009. *Deep South: A Social Anthropological Study of Caste and Class*. University of South Carolina Press.