---
title: 'SOCI424/624: Dyads, triads, and homophily'
author: 'Peter McMahan'
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# load the 'igraph' package
library(igraph)
```
# Analysis
## Part 1: Dyads and triads
### Preschooler ethology
Today, we'll be using data on the interactions of a group of preschool students from Strayer and Trudel's (1984) study. The researchers observed several groups of children, aged 1-6, in a Montreal daycare over the course of two years. They catalogued two types of behavior: 'agonistic' acts (things like biting, making faces, and stealing), and 'affiliative' acts (things like following, smiling, and holding hands). This was part of a larger body of literature from the 1970s and 1980s that used ethological methods of animal behavioral analysis to examine the behavior of children.
The relations for this data are stored as simple lists of edges. For example, the first few lines of the agonistic relations file look like this:
1, 2
1, 3
1, 4
1, 5
...
indicating that child 1 displayed agonistic behavior toward child 2, 3, 4, 5, etc. For the current analysis, I have simplified the data to represent unvalued (binary) relations. For the agonistic data an edge represents at least one observed agonistic act. Affinitive (friendly) behavior was much more common among the students, so an edge here represents at least *five* observed affinitive acts.
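As a rough illustration of this format, the sketch below parses a few edge-list rows from an inline string rather than a URL. The header row and the column names `from` and `to` are assumptions made for this example; the actual files may be laid out differently.

```r
# Hypothetical edge list in the same "source, target" layout.
# The header names (from, to) are assumed for this sketch only.
edge_text <- "from,to
1,2
1,3
1,4
1,5"

toy_edges <- read.csv(text = edge_text)

# Each row is one directed relation: `from` acted toward `to`
print(toy_edges)
print(nrow(toy_edges))  # number of relations in the toy data
```

Because each row is a single relation, counting rows is the same as counting relations.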
There are three files you will need to load for this worksheet:
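- The agonistic data is available at <https://soci424.netlify.app/data/preschool/agonism.csv>
- The affinitive data is available at <https://soci424.netlify.app/data/preschool/affinity.csv>
- The vertex attributes are at <https://soci424.netlify.app/data/preschool/children.csv>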
### Task 1A: Loading the data
- *Use R's `read.csv()` function to download all three of these files directly into data frames representing edges and vertices. Name the agonistic data frame `ag_edges`, the affinitive data frame `af_edges`, and the vertex attributes `children`.*
```{r 1a_1}
ag_edges <- read.csv("https://soci424.netlify.app/data/preschool/agonism.csv")
af_edges <- read.csv("https://soci424.netlify.app/data/preschool/affinity.csv")
children <- read.csv("https://soci424.netlify.app/data/preschool/children.csv")
```
- *Inspect the two relational data frames (either in the console or in RStudio's table viewer). In the space below, calculate how many agonistic relations and how many affinitive relations there are between these children.*
```{r 1a_2}
# agonistic
num_ag <- nrow(ag_edges)
print(num_ag)
# affinitive
num_af <- nrow(af_edges)
print(num_af)
```
*There are a total of `r num_ag` agonistic and `r num_af` affinitive relations recorded in the data.*
- *Use the `graph_from_data_frame()` function in R to create two separate network objects: `agnet` for the agonistic network and `afnet` for the affiliative network. You will want to provide two arguments: (1) a data frame with network relations, and (2) a data frame with vertex attributes (specified as `vertices`).*
```{r 1a_3}
agnet <- graph_from_data_frame(ag_edges, vertices = children)
afnet <- graph_from_data_frame(af_edges, vertices = children)
```
- *Igraph provides functions `vcount()` and `ecount()` to count the number of vertices (nodes) and edges (relations) in a network. Write code below to answer the following questions: How many children are in the network? (The answer should be the same for both networks.) How many relations does each network have? (The answer should be different for the two networks.)*
```{r 1a_4}
# number of children
print(vcount(agnet))
print(vcount(afnet))
# number of edges
print(ecount(agnet))
print(ecount(afnet))
```
*The networks have the same number of children: `r vcount(agnet)`. The affinitive network has `r ecount(afnet)` edges while the agonistic network has `r ecount(agnet)` edges.*
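To see why the `vertices` argument matters, here is a minimal sketch on made-up data. The data frames `toy_edges` and `toy_children` are hypothetical and are not part of the preschool data.

```r
library(igraph)

# Hypothetical toy data: three relations among four children
toy_edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 1))
toy_children <- data.frame(id = 1:4, gender = c("B", "G", "G", "B"))

# Child 4 has no relations, so it only appears in the vertex table
toy_net <- graph_from_data_frame(toy_edges, vertices = toy_children)

print(vcount(toy_net))  # 4 children, including the isolate
print(ecount(toy_net))  # 3 directed relations

# Without `vertices`, only children named in the edge list are kept
toy_net_no_isolates <- graph_from_data_frame(toy_edges)
print(vcount(toy_net_no_isolates))  # 3
```

Supplying the vertex table keeps children with no observed relations in the network, which is why both preschool networks should report the same vertex count.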
### Task 1B: Dyadic and triadic analysis
In this task you will learn about the dyads (pairs of children) in the network.
- *Use igraph's `dyad_census()` function to run a dyad census on each of the two networks. What do the numbers mean? Do you notice a difference in the prevalence of certain dyads between the two networks?*
```{r 1b_1}
# agonistic network
ag_dyads <- dyad_census(agnet)
print(ag_dyads)
# affinitive network
af_dyads <- dyad_census(afnet)
print(af_dyads)
```
*While both networks have similar numbers of null dyads (`r ag_dyads$null` and `r af_dyads$null`), the proportions of mutual versus asymmetric dyads differ markedly. The affinitive network has roughly equal numbers of mutual (`r af_dyads$mut`) and asymmetric (`r af_dyads$asym`) dyads. But in the agonistic network, the number of mutual dyads (`r ag_dyads$mut`) is barely a fifth of the number of asymmetric dyads (`r ag_dyads$asym`).*
Recall that the reciprocity of a network is the probability that any given directed edge in that network is reciprocated. That is, if the edge A→B exists, reciprocity is the probability that the edge B→A also exists.
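The link between the dyad census and reciprocity can be sketched on a small made-up network. The edge list below is hypothetical, and the by-hand formula assumes igraph's default reciprocity measure (the share of directed edges that are reciprocated).

```r
library(igraph)

# Hypothetical toy network with one mutual and two asymmetric dyads
toy_edges <- data.frame(
  from = c("A", "B", "A", "C"),
  to   = c("B", "A", "C", "D")
)
toy_net <- graph_from_data_frame(toy_edges)

toy_dyads <- dyad_census(toy_net)
print(toy_dyads)  # mutual, asymmetric, and null dyad counts

# Each mutual dyad holds two reciprocated edges;
# each asymmetric dyad holds one unreciprocated edge.
by_hand <- 2 * toy_dyads$mut / (2 * toy_dyads$mut + toy_dyads$asym)

print(by_hand)
print(reciprocity(toy_net))
```

Here two of the four edges (A→B and B→A) are reciprocated, so both calculations should come out to 0.5.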
- *Use igraph's `reciprocity()` function to calculate the reciprocity of the two networks. Is there a large difference? How should these numbers be interpreted?*
```{r 1b_2}
ag_rec <- reciprocity(agnet)
print(ag_rec)
af_rec <- reciprocity(afnet)
print(af_rec)
```
_There is a stark difference between the reciprocity in the two networks. In the affinitive network there is a `r round(af_rec,2) * 100`% chance that any given edge is reciprocated. But in the agonistic network, that probability is only `r round(ag_rec,2) * 100`%. Clearly, affinitive relationships are much more likely to be reciprocated than agonistic relationships._
- *Calculate the transitivity of each set of relations. Is there a large difference?*
```{r 1b_3}
ag_trans <- transitivity(agnet)
print(ag_trans)
af_trans <- transitivity(afnet)
print(af_trans)
```
_There is not a large difference in transitivity score between the agonistic (`r ag_trans`) and affinitive (`r af_trans`) networks._
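For intuition about what the transitivity score measures, the sketch below computes it by hand in base R on a hypothetical four-node network. It assumes the global definition (three times the number of triangles divided by the number of connected triples, ignoring edge direction), which is what `transitivity()` is understood to report by default.

```r
# Hypothetical toy network: a triangle (1, 2, 3) plus a pendant node 4
toy_edges <- data.frame(from = c(1, 2, 1, 3), to = c(2, 3, 3, 4))
n <- 4

# Build a symmetric adjacency matrix, discarding edge direction
adj <- matrix(0, n, n)
adj[cbind(toy_edges$from, toy_edges$to)] <- 1
adj <- pmax(adj, t(adj))

# Each triangle is counted six times on the diagonal of adj^3
triangles <- sum(diag(adj %*% adj %*% adj)) / 6

# Connected triples: pairs of neighbors around each node
degree_by_node <- rowSums(adj)
triples <- sum(choose(degree_by_node, 2))

toy_trans <- 3 * triangles / triples
print(toy_trans)  # 0.6
```

Because direction is discarded in the second step, two networks with very different patterns of who acts toward whom can end up with the same score.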
- ***Bonus question for more experienced R users**: Holland and Leinhardt (1971) characterized strongly hierarchical networks as those that lacked a set of intransitive triads (021C, 030C, 111D, 111U, 120C, 201, 210D), arguing that these seven triads would be rare in such networks. Using the `triad_census()` function (and the documentation to see which triad is which), characterize the prevalence of these intransitive triads in the agonistic and affinitive networks. By this metric, is one of these networks more hierarchical than the other?*
```{r 1b_4}
# get the triad census for each network
ag_triads <- triad_census(agnet)
af_triads <- triad_census(afnet)
# make these vectors easier to read by adding
# labels from the documentation
triad_labels <- c(
'003','012','102','021D','021U','021C',
'111D','111U','030T','030C','201',
'120D','120U','120C','210','300'
)
names(ag_triads) <- triad_labels
names(af_triads) <- triad_labels
# which are forbidden?
# NOTE: there is a typo in the prompt: 210D should just be 210
forbidden <- c('021C', '030C', '111D', '111U', '120C', '201', '210')
is_forbidden <- triad_labels %in% forbidden
# get proportions of forbidden triads
ag_prop_forbidden <- sum(ag_triads[is_forbidden]) / sum(ag_triads)
af_prop_forbidden <- sum(af_triads[is_forbidden]) / sum(af_triads)
print(ag_prop_forbidden)
print(af_prop_forbidden)
```
_In the affinitive network, almost half (`r round(af_prop_forbidden, 3) * 100`%) of the triads are of the non-hierarchical type. In the agonistic network, this number drops down to about a quarter (`r round(ag_prop_forbidden, 3) * 100`%). This suggests that the agonistic network might display a more hierarchical structure than the affinitive network._
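One way to check which triad label is which is to run the census on networks that contain exactly one triad. The sketch below does this for two hypothetical three-node networks, reusing the label order from the chunk above.

```r
library(igraph)

triad_labels <- c(
  "003", "012", "102", "021D", "021U", "021C",
  "111D", "111U", "030T", "030C", "201",
  "120D", "120U", "120C", "210", "300"
)

# A three-node cycle: A -> B -> C -> A
cycle_net <- graph_from_data_frame(
  data.frame(from = c("A", "B", "C"), to = c("B", "C", "A"))
)
cycle_triads <- setNames(triad_census(cycle_net), triad_labels)

# A transitive triple: A -> B, B -> C, A -> C
trans_net <- graph_from_data_frame(
  data.frame(from = c("A", "B", "A"), to = c("B", "C", "C"))
)
trans_triads <- setNames(triad_census(trans_net), triad_labels)

print(cycle_triads[cycle_triads > 0])  # the single triad is 030C
print(trans_triads[trans_triads > 0])  # the single triad is 030T
```

The cycle lands in the intransitive 030C category, while the transitive triple lands in 030T, which is not on the list of triads Holland and Leinhardt expected to be rare.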
## Part 2: Gender homophily
Strayer and Trudel also reported data on the gender of the children they studied. They provide no description of how the data on gender was collected, but, as was common in studies from the period, each child's gender was treated unproblematically as either "♂" or "♀". We will refer to these as "boy" and "girl", respectively.
Data on the children's gender was provided in the vertex attribute file you downloaded at the beginning of this worksheet, and was included in `agnet` and `afnet` when you constructed the graphs.
### Task 2A: Adding vertex attributes
The igraph function `V()` lets you interact with *vertex sequences* in a network. So, e.g., `V(afnet)` will list the 16 vertices (nodes) in the affinity network. The `V()` function also lets you read or alter vertex *attributes* — information (such as gender) about each of the vertices in a network. Igraph uses the "dollar sign" notation to access attributes of vertices, so `V(afnet)$name` would give you the "name" associated with each vertex in the network (by default, the vertex names are just numbers from 1 to the number of vertices). You can also use this syntax to create new attributes, or to alter existing ones: `V(afnet)$species <- 'human'` would create a new "species" vertex attribute, and set it to "human" for every child in the network.
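A minimal sketch of this syntax on a hypothetical three-child network (the object `toy_net` is made up for illustration):

```r
library(igraph)

toy_net <- graph_from_data_frame(
  data.frame(from = c(1, 2), to = c(2, 3)),
  vertices = data.frame(id = 1:3, gender = c("B", "G", "G"))
)

print(V(toy_net))         # the vertex sequence
print(V(toy_net)$name)    # names are stored as character strings
print(V(toy_net)$gender)  # attribute read from the vertex data frame

# Assigning a single value sets it for every vertex
V(toy_net)$species <- "human"
print(vertex_attr_names(toy_net))
```

The first column of the vertex data frame becomes the `name` attribute, and every other column becomes its own vertex attribute.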
- *Examine the `gender` vertex attribute for both networks. How many girls are there in each network?*
```{r 2a_2}
print( V(agnet)$gender )
print( V(afnet)$gender )
# count girls
print( sum(V(agnet)$gender == 'G'))
print( sum(V(afnet)$gender == 'G'))
```
_Each network has `r sum(V(afnet)$gender == 'G')` girls._
- *In the next task, we will also need a numeric representation of the gender categories. The following code will create a new attribute on both networks called `gender_cat` that takes a value of 1 for boys and 2 for girls. (You don't need to change this code, but you should look at it and make sure you understand what it is doing)*
```{r 2a_3}
# agonistic network
V(agnet)$gender_cat <- 1
V(agnet)$gender_cat[V(agnet)$gender=='G'] <- 2
# affinity network
V(afnet)$gender_cat <- 1
V(afnet)$gender_cat[V(afnet)$gender=='G'] <- 2
```
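The same recoding can be written in one step with `ifelse()`. The sketch below compares the two approaches on a hypothetical vector of gender codes rather than on the networks themselves.

```r
# Hypothetical gender codes, using the same 'B'/'G' labels as the worksheet
toy_gender <- c("B", "G", "G", "B", "G")

# Two-step version, mirroring the chunk above
two_step <- rep(1, length(toy_gender))
two_step[toy_gender == "G"] <- 2

# One-step version with ifelse()
one_step <- ifelse(toy_gender == "G", 2, 1)

print(one_step)
print(identical(two_step, one_step))
```

Applied to a network, the one-step form would look like `V(agnet)$gender_cat <- ifelse(V(agnet)$gender == 'G', 2, 1)`.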
### Task 2B: Characterizing homophily
- *As we discussed in class, one way to characterize the overall level of homophily in a network is with `assortativity`. Use the `assortativity_nominal()` function to calculate the gender assortativity of the agonism and affinity networks. (Note: this function requires two arguments---the network and the category vertex attribute. Use `V(agnet)$gender_cat` or `V(afnet)$gender_cat` for this second argument.) Is there a difference between the two networks' values?*
```{r 2b_1}
ag_assort <- assortativity_nominal(agnet, V(agnet)$gender_cat)
print(ag_assort)
af_assort <- assortativity_nominal(afnet, V(afnet)$gender_cat)
print(af_assort)
```
_There is a marked difference in assortativity between the two networks. The affinitive network's assortativity of `r af_assort` suggests positive gender homophily. However, the agonistic network is slightly disassortative (`r ag_assort`) -- agonistic relations are slightly more likely to cross gender boundaries._
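To make the assortativity coefficient less of a black box, the sketch below computes it by hand in base R for a hypothetical four-child directed network. It assumes the standard mixing-matrix definition (observed share of within-category edges, compared with the share expected by chance), which is the coefficient `assortativity_nominal()` is understood to implement.

```r
# Hypothetical toy data: children 1-2 are category 1, children 3-4 category 2
toy_cat <- c(1, 1, 2, 2)
toy_edges <- data.frame(from = c(1, 2, 3, 1), to = c(2, 1, 4, 3))

# Mixing matrix: share of edges running from each category to each category
cats <- sort(unique(toy_cat))
mixing <- table(
  factor(toy_cat[toy_edges$from], levels = cats),
  factor(toy_cat[toy_edges$to], levels = cats)
) / nrow(toy_edges)

# Share of edges that stay within a category
observed_same <- sum(diag(mixing))
# Share expected if senders and receivers were matched at random
expected_same <- sum(rowSums(mixing) * colSums(mixing))

toy_assort <- (observed_same - expected_same) / (1 - expected_same)
print(toy_assort)
```

A value of 1 would mean every edge stays within a category, 0 matches the chance expectation, and negative values mean edges cross categories more often than chance. In this toy case three of four edges are within-category, giving a coefficient of 0.5.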
Another common way to discuss homophily is to examine *homophilous* (within-category) and *heterophilous* (between-category) relations. The `E()` function in igraph is an analog to the `V()` function that will allow us to look at the sequence of *edges* in the network and to create subsets of those edges based on type. The next few prompts will walk you through using the `E()` function to count inter- and intra-gender relations.
- *We will first need to enumerate the vertices in each gender category. You can do this using the `V()` function and its special indexing syntax. For example `V(agnet)[gender=='B']` will return the set of vertices whose `gender` attribute is 'B'. Create two vertex sequences, `boys` and `girls`, that contain only the boys and girls, respectively. (You should only need to do this for one of the networks, since they describe the same set of children.)*
```{r 2b_2}
# NOTE: the prompt incorrectly says that you only
# need to create boy and girl sequences for one network.
# In fact, you need to make a separate vector for each network.
agboys <- V(agnet)[gender=='B']
aggirls <- V(agnet)[gender=='G']
afboys <- V(afnet)[gender=='B']
afgirls <- V(afnet)[gender=='G']
```
- *Now enumerate the within-gender (gender homophilous) and between-gender (gender heterophilous) ties in each of your networks. You can do this easily using the indexing syntax from the `E()` function. As an example: `E(agnet)[boys %--% boys]` will enumerate the edges that connect a boy to a boy in the agonistic network. How many edges are homophilous within each gender category in each of the two networks?*
```{r 2b_3}
# agonistic :
ag_bb <- E(agnet)[agboys %--% agboys]
ag_bg <- E(agnet)[agboys %--% aggirls]
ag_gg <- E(agnet)[aggirls %--% aggirls]
# Bonus (optional): make sure we're not missing any edges.
# The `stopifnot` function raises an error if the
# expression inside doesn't evaluate as TRUE
stopifnot(
ecount(agnet) == length(ag_bb) + length(ag_bg) + length(ag_gg)
)
# affinitive :
af_bb <- E(afnet)[afboys %--% afboys]
af_bg <- E(afnet)[afboys %--% afgirls]
af_gg <- E(afnet)[afgirls %--% afgirls]
```
_In the agonistic network (`r ecount(agnet)` edges total), `r length(ag_bb)` relations are among boys and `r length(ag_gg)` are among girls. In the affinitive network (`r ecount(afnet)` edges total), `r length(af_bb)` relations are among boys and `r length(af_gg)` are among girls._
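The edge-indexing operators can be tried out on a small made-up network. In the sketch below, `toy_net` and its gender labels are hypothetical; it also shows `%->%`, igraph's directed counterpart to `%--%`.

```r
library(igraph)

toy_net <- graph_from_data_frame(
  data.frame(from = c("a", "a", "c", "c"), to = c("b", "c", "a", "d")),
  vertices = data.frame(
    name = c("a", "b", "c", "d"),
    gender = c("B", "B", "G", "G")
  )
)

toy_boys  <- V(toy_net)[gender == "B"]
toy_girls <- V(toy_net)[gender == "G"]

# %--% matches edges between the two sets in either direction
toy_bb <- E(toy_net)[toy_boys %--% toy_boys]
toy_bg <- E(toy_net)[toy_boys %--% toy_girls]
toy_gg <- E(toy_net)[toy_girls %--% toy_girls]
print(c(bb = length(toy_bb), bg = length(toy_bg), gg = length(toy_gg)))

# %->% keeps only edges pointing from the first set to the second
toy_b_to_g <- E(toy_net)[toy_boys %->% toy_girls]
print(length(toy_b_to_g))
```

Because `%--%` ignores direction, the boy-girl count includes both a→c and c→a, and the three counts add up to the total number of edges.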
- ***Bonus question for more experienced R users**: Using the information you have on gender of the children, calculate the baseline homophily for boys and for girls (i.e. the expected homophily if relations were random). Then calculate the actual homophily for boys and girls within the agonistic and affinity networks. Are these networks more or less homophilous than the baseline level?*
```{r 2b_4}
# calculate the probability of boy-boy and girl-girl edges
# in a totally random network (not allowing loops)
# probability of a boy-boy edge is the probability
# of first picking a boy at random, and then picking
# a boy at random from the remaining students
prob_bb <-
(length(agboys) / vcount(agnet)) * # boy at random
((length(agboys) - 1) / (vcount(agnet) - 1)) # another boy
prob_bg <-
2 * # double this number to account for boy-girl and girl-boy
(length(agboys) / vcount(agnet)) * # boy at random
(length(aggirls) / (vcount(agnet) - 1)) # a girl
prob_gg <-
(length(aggirls) / vcount(agnet)) * # girl at random
((length(aggirls) - 1) / (vcount(agnet) - 1)) # another girl
# calculate the expected proportion of homophilous relations
homophily_baseline <- (prob_bb + prob_gg)
print(homophily_baseline)
# actual proportion for agnet
homophily_agnet <- (length(ag_bb) + length(ag_gg)) / ecount(agnet)
print(homophily_agnet)
# actual proportion for afnet
homophily_afnet <- (length(af_bb) + length(af_gg)) / ecount(afnet)
print(homophily_afnet)
```
_Baseline homophily for this network (defined as expected proportion of gender-homophilous edges in a random network) is `r homophily_baseline`. The agonistic network is less homophilous than baseline, suggesting it is heterophilous, while the affinitive network is considerably more homophilous than baseline._
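The baseline calculation can be checked with round numbers. The sketch below uses a hypothetical class of 6 boys and 4 girls (not the real counts from the data) and compares the sequential-draw calculation with an equivalent count of same-gender pairs.

```r
# Hypothetical class: 6 boys and 4 girls (not the worksheet's real counts)
n_boys <- 6
n_girls <- 4
n_all <- n_boys + n_girls

# Sequential-draw version, as in the chunk above
p_bb <- (n_boys / n_all) * ((n_boys - 1) / (n_all - 1))
p_gg <- (n_girls / n_all) * ((n_girls - 1) / (n_all - 1))
p_bg <- 2 * (n_boys / n_all) * (n_girls / (n_all - 1))

# The three cases cover every possible pair, so they sum to 1
print(p_bb + p_gg + p_bg)

# Equivalent counting version: same-gender pairs over all pairs
baseline_by_pairs <-
  (choose(n_boys, 2) + choose(n_girls, 2)) / choose(n_all, 2)

print(p_bb + p_gg)
print(baseline_by_pairs)
```

Both versions give 7/15 (about 0.47) for this hypothetical class: 21 of the 45 possible pairs are same-gender.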
## Part 3: Visualizing vertex attributes
### Task 3A: Using vertex attributes for visualization
- *Vertex attributes in igraph do a lot of work. They describe empirical characteristics of vertices (e.g. gender), but you can also use vertex attributes to describe the visualization of nodes in a network when plotting. Create a new vertex attribute called `color`, and set it to `"green"` for all of the vertices in both networks. Plot the networks to see what they look like.*
```{r 3a_1}
V(agnet)$color <- 'green'
plot(agnet)
```
```{r 3a_1_pt2}
V(afnet)$color <- 'green'
plot(afnet)
```
- *Now change the `color` attribute only for the boys, to make them visually distinct from the girls. (You can choose any color---the `colors()` command will list the ridiculously long list of named colors in R). Plot the networks again. Keeping in mind the caveats about network visualizations from class, what do the visualizations tell you? Do you see any gendered patterns? Are there any obvious differences in those gendered patterns between the two networks?*
```{r 3a_2}
V(agnet)[agboys]$color <- "orchid4"
plot(agnet)
```
```{r 3a_2_pt2}
V(afnet)[afboys]$color <- "orchid4"
plot(afnet)
```
_While both networks show what appears to be some sort of gender clustering, the pattern seems a bit different. In the affiliative network, the genders seem to be clustered somewhat separately from each other. In the agonistic network, the boys form a 'core' in the center of the network -- perhaps reflecting more cross-gender agonistic relations._
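An alternative to setting colors in two steps is to map each gender code to a color with a named lookup vector. The sketch below does this on a hypothetical four-child network; the object names are made up for illustration.

```r
library(igraph)

toy_net <- graph_from_data_frame(
  data.frame(from = c(1, 2, 3), to = c(2, 3, 1)),
  vertices = data.frame(id = 1:4, gender = c("B", "G", "G", "B"))
)

# Named lookup vector: one color per gender code
gender_colors <- c(B = "orchid4", G = "green")

# Index the lookup by each vertex's gender code
V(toy_net)$color <- unname(gender_colors[V(toy_net)$gender])

print(V(toy_net)$color)
```

Calling `plot(toy_net)` afterward would draw the network with these colors, since igraph's plotting uses the `color` vertex attribute when it is present.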
# Discussion:
- In part 1 above, you described features of the dyads (e.g. reciprocity) and triads (e.g. transitivity) of the two networks. Talk about the differences you observed between the agonistic and affinity networks in these structural measures. Considering the types of relations described (and what you know about preschool-aged children), are these patterns what you would expect? Do you think that transitivity (which ignores the direction of edges) describes the same kinds of structural patterns in the agonistic versus the affinity networks?
- (your response)
- In the example above, and in most of the examples from the reading, homophily was analyzed on *categorical* variables---gender, race, political affiliation, etc. Does the concept of homophily require categorical distinctions? What would homophily look like with less reductive conceptions of, say, race or gender that recognize complexity of the social distinctions they describe (e.g. multiracial or gender non-binary people)? How might you measure such homophily?
- (your response)
- It is often the case in network analysis that different network statistics are related to one another in surprising ways. Thinking about networks in general, how would a tendency toward *homophily* encourage *transitivity* in a network? How would a tendency toward *transitivity* encourage *homophily*? Are transitivity and homophily separate structuring "forces" in a network, or just different aspects of a single underlying force?
- (your response)
# References
Holland, Paul W., and Samuel Leinhardt. "Transitivity in Structural Models of Small Groups." *Small Group Research* 2, no. 2 (1971): 107--24.
Strayer, F. F., and M. Trudel. "Developmental Changes in the Nature and Function of Social Dominance among Young Children." *Ethology and Sociobiology*, The Study of Adaptiveness of Aggressive, Dominance, and Conflict Resolution Strategies in Humans and Nonhuman Primates, 5, no. 4 (January 1, 1984): 279--95. [https://doi.org/10.1016/0162-3095(84)90007-4](https://doi.org/10.1016/0162-3095(84)90007-4){.uri}.