Lab 2: Working with networks in igraph

Peter McMahan for SOCI 424

Due Friday, Oct 9 by 5pm, Eastern Daylight Time

To complete the lab, you should create an R script that will address all of the questions below. Responses should use plain, descriptive language to address the questions. The text of your responses can be included in the file as comments (putting a # character before a line of text will tell R to ignore that line). The following illustrates a good format:

###
# Question 1a
###
x <- 1:10

# The code above creates a vector of numbers from 1 to 10.
# These are some of my favorite numbers.

Face-to-face contact

This lab will introduce you to the basic functionality of the igraph package in R, using a subset of the data from a 2016 study of Kenyan households (Kiti et al. 2016). Participants in this study wore electronic sensors for three days, allowing researchers to record when they were within about 1.5 meters of one another. Sensor-based measurements like these are becoming more common in network research, as they allow for the ‘raw’ measurement of behavior and interaction.

Figure 1 from Kiti et al. (2016)

1. Load and inspect the network

Since the focus of this lab is on working with igraph networks, I have created the network for you and stored it as a GraphML file online.

1a.

Load the data directly from the internet using the following commands:

# load the igraph library
library(igraph)

# use the `read_graph()` function to load the data at the given web address.
# (note: the `format` argument is necessary to tell igraph to expect a GraphML file)
kenya <- read_graph('https://soci424.netlify.app/data/kenya.graphml',format='graphml')

1b.

Many objects in R have pretty good default visualizations, and igraph graph objects are no exception. Try plotting the kenya network using the plot() function. What kind of structure can you see? Are there distinct clusters in the network? Do some nodes seem more central than others? (Note: each time you use the plot() function the network will look a little bit different. Sometimes plotting multiple times will emphasize different aspects of the network!)
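Because the default layout is randomized, it can help to store one layout and reuse it when you want to compare plots. The following is a minimal sketch on a small made-up graph; the object `toy` and the seed value are illustrative assumptions, not part of the lab data.

```r
library(igraph)

# a small made-up graph: two triangles joined by a single tie
toy <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)

# fixing the random seed makes the randomized layout repeatable
set.seed(424)
coords <- layout_with_fr(toy)

# passing the stored coordinates draws the same picture every time
plot(toy, layout = coords)
```

Calling `plot(toy)` without the `layout` argument goes back to a fresh layout on each call.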

1c.

Visualization is all well and good, but frequently we can tell more about a network by looking at some measures.
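As an illustration of what such measures look like in igraph, here is a short sketch of a few common summaries computed on a made-up graph (the object `toy` is hypothetical). Which measures are most informative for the kenya network is left to your judgment.

```r
library(igraph)

# a small made-up graph: two triangles joined by a single tie
toy <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)

vcount(toy)        # number of vertices (nodes)
ecount(toy)        # number of edges (ties)
edge_density(toy)  # share of all possible ties that are present
degree(toy)        # number of ties attached to each vertex
```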

1d.

The network plot above is somewhat deceptive, in that it shows all of the ties in the network as identical. In fact, it is showing a tie whenever two people have spent at least one minute within close proximity of each other. But many pairs spent much more time than that. This information is stored in an edge attribute that, in this case, is called “weight”. Use the command E(kenya)$weight to examine the number of minutes represented by each edge in the network.

Aside: The syntax you just used to access the number of minutes of interaction uses the general mechanism in igraph for accessing individual edges (ties) and vertices (nodes). E(kenya) tells R that you want to look at the entire edge sequence of kenya, and appending $weight to the command indicates that you want to retrieve the weight attribute of each of those edges. Try entering just E(kenya) on its own to see what R gives you.

While the function E() accesses the edge sequence of a graph, the similar function V() accesses the vertex sequence of a graph. This is what you will use in the next question.
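The two accessors can be tried side by side on a tiny made-up network. In this sketch the data frame `ties` and the graph `toy` are hypothetical; only the `weight` attribute name mirrors the kenya data.

```r
library(igraph)

# hypothetical edge list; the extra column becomes an edge attribute
ties <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(12, 1, 40)   # made-up minutes of contact
)
toy <- graph_from_data_frame(ties, directed = FALSE)

E(toy)          # the edge sequence
E(toy)$weight   # the weight attribute of every edge
V(toy)          # the vertex sequence
V(toy)$name     # the name attribute of every vertex
```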

1e.

This network also has vertex attributes, which represent information about the individual people in the network. You can see what vertex attributes a network has recorded using the command vertex_attr_names(kenya).
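Once you know an attribute's name, `V()` and `$` retrieve its values in the same way `E()` did for edges. A minimal sketch on a made-up graph follows; the `age_cat` values assigned here are invented for illustration.

```r
library(igraph)

# a made-up four-person ring with an invented age category for each person
toy <- make_ring(4)
V(toy)$age_cat <- c(0, 3, 3, 1)

vertex_attr_names(toy)   # lists the attributes stored on the vertices
V(toy)$age_cat           # the value of one attribute for every vertex
table(V(toy)$age_cat)    # how many vertices fall in each category
```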

2. More sophisticated visualization

Now we will repeat the visualization of the network from 1b, but add some of the information from the edge and vertex attributes.

2a.

First, we will visualize the strength of the relations between actors (measured by the amount of time they spend around each other) by adjusting the edge width. Unless you tell it otherwise, igraph draws the edges with a width of 1. Use the following command to set the width attribute of the edges in kenya to be equal to the weight attribute you looked at in 1d:

E(kenya)$width <- E(kenya)$weight
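Raw minute counts can be much larger than a sensible line width, so you may want to rescale them before plotting. The sketch below shows one possible rescaling on a made-up graph; the square-root transformation is an assumption chosen for illustration, not a choice the lab prescribes.

```r
library(igraph)

# hypothetical edge list with made-up minutes of contact
ties <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(12, 1, 40)
)
toy <- graph_from_data_frame(ties, directed = FALSE)

# raw minutes as widths: the 40-minute tie gets a width of 40
E(toy)$width <- E(toy)$weight

# square roots compress large values while keeping their ordering
E(toy)$width <- sqrt(E(toy)$weight)

plot(toy)
```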

2b.

The default labels on the vertices (retrieved from the name attribute) are not very informative. Let’s label them with the household they live in instead.
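igraph reads vertex labels from the `label` attribute when one is set, so copying another attribute into `label` changes what is printed on each node. A minimal sketch on a made-up graph follows; the attribute name `hh` is hypothetical, so check vertex_attr_names(kenya) for the real name of the household attribute.

```r
library(igraph)

# a made-up four-person ring with an invented household attribute
toy <- make_ring(4)
V(toy)$hh <- c("H1", "H1", "H2", "H2")   # hypothetical attribute name

# copy the household values into the attribute that plot() uses for labels
V(toy)$label <- V(toy)$hh

plot(toy)
```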

2c.

Next, we will visualize the age of the actors in the network using the size of the vertex. By default, igraph uses a size of 15 for its vertices (for some reason…). Let’s start by representing the youngest people with smaller nodes. The age_cat attribute represents age categories:

Category 0: 0 to 5 years old

Category 1: 6 to 14 years old

Category 2: 15 to 19 years old

Category 3: 20 to 49 years old

Category 5: 50 years old and above

# first, we need to set a specific size for all of the vertices.
# We'll stick with the default of 15
V(kenya)$size <- 15
# Next, the square-bracket notation [] lets us set an attribute value
# for only a subset of the vertices.
# Let's set the youngest vertices (category 0) size to 5.
V(kenya)$size[V(kenya)$age_cat == 0] <- 5

2d.

Bonus question: Use the shape vertex attribute to represent the ‘M’ gender with circles and the ‘F’ gender with squares. (igraph recognizes the strings "circle" and "square" for setting vertex shapes)

3. Clusters

We will use the same strategy of modularity maximization that Shwed and Bearman (2010) used to find clusters in the network.

3a.

We will use the function cluster_optimal() to find maximum-modularity clusters.
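To see what the function returns before applying it to kenya, it can be run on a small made-up graph whose clusters are obvious. This is a sketch under the assumption that your igraph installation was built with GLPK support, which cluster_optimal() requires; the object `toy` is hypothetical.

```r
library(igraph)

# a small made-up graph: two triangles joined by a single tie
toy <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)

# find the partition of vertices with the highest modularity
clusters <- cluster_optimal(toy)

membership(clusters)   # which cluster each vertex belongs to
sizes(clusters)        # how many vertices are in each cluster
modularity(clusters)   # the modularity score of this partition
```

On this graph the two triangles come out as separate clusters, since splitting at the bridging tie gives a higher modularity than any other grouping.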

3b.

Now we will represent the clusters with colors in the network visualization. This will require a few steps.
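One way those steps could look is sketched below on a made-up graph: extract the cluster membership, build one color per cluster, and assign each vertex the color of its cluster. The object names and the use of `rainbow()` for the palette are assumptions for illustration; any vector of color names would work.

```r
library(igraph)

# a small made-up graph: two triangles joined by a single tie
toy <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)
clusters <- cluster_optimal(toy)

# step 1: one integer cluster id per vertex
member <- as.integer(membership(clusters))

# step 2: one color per cluster
pal <- rainbow(max(member))

# step 3: look up each vertex's color by its cluster id
V(toy)$color <- pal[member]

plot(toy)
```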

References

Kiti, Moses C., Michele Tizzoni, Timothy M. Kinyanjui, Dorothy C. Koech, Patrick K. Munywoki, Milosch Meriac, Luca Cappa, et al. 2016. “Quantifying Social Contacts in a Household Setting of Rural Kenya Using Wearable Proximity Sensors.” EPJ Data Science 5 (1): 1–21. https://doi.org/10.1140/epjds/s13688-016-0084-2.

Shwed, Uri, and Peter S. Bearman. 2010. “The Temporal Structure of Scientific Consensus Formation.” American Sociological Review 75 (6): 817–40. https://doi.org/10.1177/0003122410388488.