Lab 1: From observations to networks

Peter McMahan for SOCI 424

Due Monday, Sept. 28 by the start of class (8:35am, Eastern Daylight Time)

This lab will introduce you to network data representations in R. To complete the lab, you should create an R script that will address all of the questions below. Responses should use plain, descriptive language to address the questions. The text of your responses can be included in the file as comments (putting a # character before a line of text will tell R to ignore that line). The following illustrates a good format:

###
# Question 1a
###
x <- 1:10

# The code above creates a vector of numbers from 1 to 10.
# These are some of my favorite numbers.

Southern culture

The data we will use for this lab come from the book Deep South: A Social Anthropological Study of Caste and Class (Davis, Gardner, and Gardner [1941] 2009). Among the book’s detailed accounts of early-20th-century social systems in the Southern United States, it contains data on the participation of a sample of 18 upper-class white women in a series of 14 social events in 1936. The authors provide the following figure to illustrate the data:

Figure 3 from Davis, Gardner, and Gardner ([1941] 2009)

1. Reading the data

We will use the data presented by Davis, Gardner, and Gardner ([1941] 2009) to examine the social structure among the 18 women in the figure.

1a.

It is always a good idea to visually inspect a source of data before you start analyzing it in R. According to the figure above, what are the smallest and largest numbers of social events that any one woman participated in? Which women were most active (i.e., participated in the most events)?

1b.

Is there anything you can tell informally about the social structure among these women just by looking at the events table? Do you think the women are separated into different cliques? Do you think some of the people might be more central to the social life than others?

1c.

One way to represent data of this type in R is as a matrix with values 1 (representing participation) and 0 (representing non-participation). The following snippet creates such a matrix, with 18 rows (one per woman) and 14 columns (one per event). However, it is missing data for Miss Charlotte McDowd, Miss Ruth Desand, and Mrs. Flora Price, who are erroneously shown as not having participated in any events. Using the figure above, update the code below to include participation for these three women. (You can copy and paste the code into your own file in RStudio and edit it there.)

# This code uses R's `matrix()` function to take a sequence of numbers
# and convert them to a 2-dimensional matrix. For this to work, it needs
# to know how many rows to expect (`nrow=18`), and whether to fill the
# matrix column-by-column or row-by-row (`byrow=TRUE`).
events <- matrix(c(
  1,1,1,1,1,1,0,1,1,0,0,0,0,0,
  1,1,1,0,1,1,1,1,0,0,0,0,0,0,
  0,1,1,1,1,1,1,1,1,0,0,0,0,0,
  1,0,1,1,1,1,1,1,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,1,0,1,1,0,1,0,0,0,0,0,0,
  0,0,0,0,1,1,1,1,0,0,0,0,0,0,
  0,0,0,0,0,1,0,1,1,0,0,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,1,1,1,0,0,1,0,0,
  0,0,0,0,0,0,0,1,1,1,0,1,0,0,
  0,0,0,0,0,0,0,1,1,1,0,1,1,1,
  0,0,0,0,0,0,1,1,1,1,0,1,1,1,
  0,0,0,0,0,1,1,0,1,1,1,1,1,1,
  0,0,0,0,0,0,1,1,0,1,1,1,1,1,
  0,0,0,0,0,0,0,1,1,1,0,1,0,0,
  0,0,0,0,0,0,0,0,1,0,1,0,0,0,
  0,0,0,0,0,0,0,0,0,0,0,0,0,0
),nrow=18,byrow=TRUE)
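
To see what `byrow=TRUE` changes, here is a minimal sketch on a made-up set of six values (not the lab data). The names `vals`, `by_row`, and `by_col` are hypothetical and chosen only for illustration.

```r
# Toy example (not the lab data): six values typed as two rows of three.
vals <- c(1, 0, 1,
          0, 1, 1)

# byrow = TRUE fills across each row first, matching how the values are typed.
by_row <- matrix(vals, nrow = 2, byrow = TRUE)

# The default (byrow = FALSE) fills down each column first instead.
by_col <- matrix(vals, nrow = 2)

print(by_row)
print(by_col)

# dim() reports the number of rows, then the number of columns.
dim(by_row)
```

Comparing the two printed matrices shows why the lab code needs `byrow=TRUE`: each line of numbers in the snippet above is meant to be one woman's row.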

1d.

Now that the table is represented in the matrix events, we can use R to tell us a bit more about it. The functions rowSums(events) and colSums(events) will add up all of the numbers in each row and in each column of events, respectively. Using these functions, answer the following questions (show your code for each):
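
As a quick illustration of what these two functions return, the sketch below applies them to a small made-up matrix with 4 people and 3 events. The matrix `toy_events` and the other names are hypothetical and are not part of the lab data.

```r
# Toy affiliation matrix (not the lab data): 4 people (rows), 3 events (columns).
toy_events <- matrix(c(
  1, 1, 0,
  0, 1, 1,
  1, 1, 1,
  0, 0, 1
), nrow = 4, byrow = TRUE)

# One total per row: how many events each person attended.
person_totals <- rowSums(toy_events)

# One total per column: how many people attended each event.
event_totals <- colSums(toy_events)

print(person_totals)
print(event_totals)

# Position(s) of the largest row total, i.e. the most active person(s).
which(person_totals == max(person_totals))
```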

2. Person–person relations

One way to infer the relationships among these 18 women is to ask how frequently they attended the same event. Presumably, women who have a closer relationship will also attend more events together.

2a.

Fortunately for us, there is a very basic technique from linear algebra that will turn an affiliation matrix like the one you created above into a co-occurrence matrix that will measure co-attendance at events. To do so, we will use matrix multiplication. You don’t need to know anything about linear algebra for this to work, and for the time being, you can just treat it like a magic wand (we will cover this more in a later class). Use the following command to make a co-occurrence matrix for event attendance:

# `%*%` is the R command for matrix multiplication,
# and the `t()` function transposes a matrix
co_attend <- events %*% t(events)
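
To make the "magic wand" a little less mysterious, here is a minimal sketch of the same multiplication on a made-up matrix with 4 people and 3 events. The names `toy_events` and `toy_co` are hypothetical and are not part of the lab data.

```r
# Toy affiliation matrix (not the lab data): 4 people (rows), 3 events (columns).
toy_events <- matrix(c(
  1, 1, 0,
  0, 1, 1,
  1, 1, 1,
  0, 0, 1
), nrow = 4, byrow = TRUE)

# Multiplying by the transpose gives a 4 x 4 person-by-person matrix.
toy_co <- toy_events %*% t(toy_events)
print(toy_co)

# Entry [i, j] counts the events that person i and person j both attended.
# People 1 and 3 share the first two events, so this entry is 2.
toy_co[1, 3]

# The diagonal pairs each person with themselves, so it equals their own
# total number of events.
diag(toy_co)
rowSums(toy_events)
```

The result is symmetric: the count for persons i and j is the same as the count for persons j and i.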

2b.

Visually inspect this new matrix co_attend, either by typing it into the R console on its own or by clicking on it in the RStudio “environment” pane.

2c.

Use the max() and min() functions to determine the maximum and minimum number of events that any pair of women in this sample attended together.

2d.

Make a new matrix, called co_attend_3plus, that has the same shape as co_attend but whose values are either TRUE or FALSE, indicating for each pair of women whether they attended at least 3 events together. (Note: This one may be a little tricky for new R users. The operator >= will tell you whether the value on the left is at least as big as the value on the right.)
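
As a hint about how comparison operators behave on matrices, the sketch below applies >= to a small made-up matrix with a threshold of 2. The names `toy_counts` and `at_least_2` are hypothetical and are not part of the lab data.

```r
# Toy matrix of counts (not the lab data).
toy_counts <- matrix(c(
  4, 1, 3,
  1, 2, 0,
  3, 0, 5
), nrow = 3, byrow = TRUE)

# A comparison is applied to every entry at once, so the result is a
# logical (TRUE/FALSE) matrix with the same shape as the original.
at_least_2 <- toy_counts >= 2

print(at_least_2)
dim(at_least_2)
```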

3. Networks

You will now convert this data into network representations in order to visualize it for further analysis. I will gloss over some of the details here—we will cover these methods in detail later on.

Note: The following questions require you to have the igraph add-on package installed. This does not come with R by default, so you will have to install it yourself (if you haven’t already). To install igraph you can either:

After you have installed the package, you will then need to load it into your R session with the command library(igraph).

3a.

The function graph_from_adjacency_matrix() from the igraph package converts an adjacency matrix like the co-attendance matrix you created above into a graph object. It automatically figures out how many nodes there are in the network by looking at the number of rows/columns in the matrix, and constructs edges between those nodes based on the data. Use the following command to create an undirected, weighted network from the co_attend matrix you already made:

# The `diag=FALSE` argument tells R to ignore the *diagonal* of the matrix
# (the 1st row of the 1st column, 2nd row of 2nd column, etc), which would 
# create edges from each node back to itself. We will talk more about such
# `loop` edges later.
event_net <- graph_from_adjacency_matrix(
  co_attend, mode='undirected', 
  weighted=TRUE, diag=FALSE)
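
To get a feel for what this function builds, here is a minimal sketch on a made-up 3 x 3 co-occurrence matrix. It assumes the igraph package is installed; the names `toy_co` and `toy_net` are hypothetical and are not part of the lab data.

```r
library(igraph)

# Toy symmetric co-occurrence matrix (not the lab data) for 3 people.
toy_co <- matrix(c(
  2, 1, 0,
  1, 3, 2,
  0, 2, 2
), nrow = 3, byrow = TRUE)

# Same arguments as the lab command, applied to the toy matrix.
toy_net <- graph_from_adjacency_matrix(
  toy_co, mode = "undirected",
  weighted = TRUE, diag = FALSE)

# One node per row/column of the matrix.
vcount(toy_net)

# One edge per nonzero off-diagonal pair; the zero between persons 1 and 3
# produces no edge, and the diagonal is ignored because diag = FALSE.
ecount(toy_net)

# The matrix values are stored as the `weight` attribute of the edges.
E(toy_net)$weight
```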

3b.

We will talk about plotting networks in detail later in the course, but for the moment we can mostly use the default options built into igraph. Use the following two commands to (a) tell igraph that we want the weight of each edge to be reflected in the width of its line in the plot, and (b) plot event_net using the default parameters. Don’t worry about how these commands work yet.

E(event_net)$width <- E(event_net)$weight
plot(event_net)

3c.

Look at the plot that is created. What kinds of patterns can you see? Who is central and who is peripheral to the social structure this network represents?

3d.

We are now going to create and plot an unweighted network from the matrix co_attend_3plus you created earlier. This network is based on a different measure of “relationship” between the women: an edge is present only if two women attended at least 3 events together.

event_net_3plus <- graph_from_adjacency_matrix(
  co_attend_3plus, mode='undirected', diag=FALSE)
plot(event_net_3plus)

Execute the commands above and inspect this new network visually, then answer the following questions:

References

Davis, Allison, Burleigh Bradford Gardner, and Mary R. Gardner. (1941) 2009. Deep South: A Social Anthropological Study of Caste and Class. University of South Carolina Press.