---
title: 'SOCI424: Blockmodels'
author:
- (your name and role here)
- (your name and role here)
output:
html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# load the 'igraph' package
library(igraph)
```
# In Pairs:
## Part 1: Get data and load packages
### Migration
Today's worksheet will examine the block structure of international migration. The data we will use is derived from the UN estimates of migrant stock for the years 1990, 1995, 2000, 2005, 2010, and 2015 (United Nations, Department of Economic and Social Affairs, Population Division, 2015), and were collected from . The data was reduced to only include edge for the top 15 immigration sources and top 15 immigration destinations for each country in the data set. We will therefore treat the resulting edges as binary, which can be though of as representing an inter-national relation of "top immigration partner."
### Task 1A: Loading the data
The data is available in the two-CSV format you've seen several times before. The information on the nodes (nations) is located at , and the migration relations (as well as metadata) are at .
- *Use R's `read.csv()` function to download the relational data directly from the URLs above. You can name the vertex data frame `nodes` and the edge data frame `edges`. Inspect these data frames visually. What vertex attributes are included? What edge attributes?*
```{r 1a_1}
# (your code here)
```
- *Use `graph_from_data_frame()` to make a network object named `imm` from the two data frames, being sure to provide the `directed` argument to the function. How many vertices (nations) are there in the network? How many edges (immigration relations)?*
```{r 1a_2}
# (your code here)
```
- *Create a 'legible' visualization of the network--something that lets you see the overall structure of the network. You will need to adjust various vertex attributes (e.g. `size`, `label`, `label.cex`) and edge attributes (e.g. `width`, `arrow.size`, and `color`). Do you see any obvious structural features in the network visualization?*
```{r 1a_3}
# (your code here)
```
### Task 1B: Install and load `randnet`
There are several add-on packages in R that can be used to estimate stochastic block models (e.g. [`blockmodels`](https://cran.r-project.org/package=blockmodels) and [`sbmr`](https://tbilab.github.io/sbmr/)). We'll be using functions from the `randnet` package to identify groups using spectral clustering.
- If you haven't done so already, install `randnet`, either using _Packages_>_Install_ in RStudio, or using the `install.packages()` function. Then, load the package in the block below.
```{r 1b_1}
# (your code here)
```
- *Create a simple visualization of the network to see if there is any obvious structure to it. Adjust the vertex attribute `size` to change the size of the vertices, and the vertex attribute `label.cex` to adjust the font size of the labels (values of `label.cex` less than one will make the labels smaller, and values greater than one will make the labels larger. You can even suppress vertex labels altogether by setting the `label` attribute to `NA`). What patterns do the clustering in the network suggest? Can you think of a possible cause for this kind of structure?*
```{r 1b_2}
# (your code here)
```
## Part 2: Stochastic block models
### Task 2A: A 2-community block model
The function we'll be using to estimate the stochastic block models (SBMs) and the resulting community membership is `reg.SSP()`. At a minimum, this function requires two arguments: a network (represented as an adjacency matrix), and the number of communities to identify. (For those with a math/CS background who are interested, the method it uses is a form of _spectral clustering_ which first considers each row of the adjacency matrix to be a point in a high-dimensional space, and then performs _K-means clustering_ on those points).
- *Since `reg.SSP()` expects an adjacency matrix, the first step is to create an adjacency-matrix representation of the graph `imm`. The function `as_adjacency_matrix()` will do this, but you will need to specify `sparse=FALSE` to get a standard matrix. Create a new object call `am` that is the adjacency matrix representation of `imm`*
```{r 2a_1}
# (your code here)
```
- *Next, use `reg.SSP()` to estimate a two-community SBM. This function yields a list with two items in it: (1) `cluster` is a vector describing community membership for each node in the network, and (2) `loss` is a number summarizing how well the clustering algorithm fits the data. In the block below, calculate the value of `reg.SPP()` on the adjacency matrix, and store the resulting `cluster` vector as a vertex attribute on `imm` named `membership` (i.e. once you're done `V(imm)$membership` should tell you whether each vertex is a member of community 1 or community 2)*
```{r 2a_2}
# (your code here)
```
- *In the code block below, list the names of the countries in block 1 and the names of the countries in block 2 separately. Do you notice any similarities between the countries in the same block? Given what you know about international migration, does the 2-community SBM discover meaningful structure in the network?*
```{r 2a_3}
# (your code here)
```
In the block below I've defined two functions for you to use in the next task:
i. `block_plot()` creates a visual representation of the blocked adjacency matrix analogous to table 1.11 from the Doreian (2005) reading. The rows an the columns of the adjacency matrix are rearranged to reflect the identified block structure, allowing you to visualize the density of relations within and between blocks.
ii. `block_density()` collapses this information into a K-by-K matrix of edge probabilities within each block (where K is the number of communities). The values in the matrix represent the probability (0.0 to 1.0) of an edge within blocks (on the diagonal) or between blocks (off-diagonal).
Each of these functions expects an `igraph` graph object, and expects there to be a `membership` vertex attribute. Run the following block as-is before continuing to the next task.
```{r block_functions}
block_plot <- function(g){
if(!('membership' %in% vertex_attr_names(g))){
stop("No 'membership' vertex attribute")
}
if(any(is.na(V(g)$membership))){
stop("Every vertex must belong to a membership")
}
# gather info
n <- vcount(g)
b <- V(g)$membership
K <- length(unique(b))
# get sorted adjacency matrix
am <- as_adjacency_matrix(g,sparse=FALSE)
am <- am[order(b),order(b)]
# empty plot
plot(NA,xlim=c(0,1),ylim=c(0,1),
xaxt='n',yaxt='n',bty='n',
xlab='',ylab='')
# add rectangles for positive entries in `am`
x <- (col(am)[am>0] - 1)/n
y <- (row(am)[am>0] - 1)/n
rect(x,1-y,x+1/n,1-(y+1/n),col='black')
# draw boundaries
borders <- (cumsum(table(b)))/n
for(bord in borders[1:(K-1)]){
lines(c(0,1),1-c(bord,bord),col='red')
lines(c(bord,bord),c(0,1),col='red')
}
# label clusters
axis(2,
at=1-(c(0,borders[1:(K-1)]) + borders)/2,
labels=names(borders),
tick=FALSE,cex=2,line=-1.5
)
}
block_density <- function(g){
if(!('membership' %in% vertex_attr_names(g))){
stop("No 'membership' vertex attribute")
}
if(any(is.na(V(g)$membership))){
stop("Every vertex must belong to a membership")
}
# gather info
n <- vcount(g)
b <- V(g)$membership
K <- length(unique(b))
# get sorted adjacency matrix
am <- as_adjacency_matrix(g,sparse=FALSE)
am <- am[order(b),order(b)]
b <- sort(b)
# make results matrix
res <- matrix(NA,nrow=K,ncol=K)
# fill it in
for(i in 1:K){
iidx <- b==i
for(j in 1:K){
jidx = b==j
res[i,j] <- mean(am[iidx,jidx])
}
}
return(res)
}
```
- *Use `block_plot()` and `block_density` to summarize the two-community structure you produced above. Does the migration network seem to display a 'coreâ€“periphery' structure, with a densely connected core and a sparsely connected periphery?*
```{r 2a_4}
# (your code here)
```
### Task 2B: More communities
The quality of the estimates from an SBM depends crucially on the number of communities in the model. In this task, you will consider different numbers of communities.
- *Repeat the tasks from 2A for a higher value of K (number of communities) and describe the results: calculate the SBM membership, examine the names of the countries in each community, and visualize the block structure. (You can try out different values of K to see how they look, but you only need to report the results of one of those values) Does the result paint a different picture of international migration than the two-community model did?*
```{r 2b_1}
# (your code here)
```
- *The `randnet` package provides a function called `BHMC.estimate()` that tries to heuristically and algorithmically find an "optimal" number of communities. In the code block below, use this function to find a value for K, and examine the resulting block model. You should again use `block_plot()` and `block_density()`, as well as look at the membership of the individual blocks. What is the structure of international migration describe by the results? Are there any particularly interesting diagonal blocks or off-diagonal blocks? Are there specific communities of nations that seem to play a distinctive role in the network?*
```{r 2b_2}
# (your code here)
```
# References
United Nations, Department of Economic and Social Affairs, Population Division (2015). "Trends in International Migrant Stock: The 2015 Revision." (United Nations database, POP/DB/MIG/Stock/Rev.2015),