Network Analysis Tutorial

With Applications in R

Bruce A. Desmarais

Pennsylvania State University

1 Introduction

Provide a broad introduction to many concepts/methods (nothing in depth)
Present technical intuition and some essential math (no derivations)
Code comments describe each function in general terms (see the help files for explanations of all options)

2 Introduction to R

2.1 Programming in R: First Steps

R is a command-line, interpreted programming language
Commands are executed sequentially by return (i.e., ‘enter’) or separated by ‘;’
Script files are plain text files (e.g. UTF-8) with extension “.R”
Comment heavily using ‘#’

# In R, functions are executed as '<function.name>(<input>)'
# <input> is a comma-separated list of arguments
# The exception is the 'print()' function, which can 
# be executed by typing the name of the object to 
# print and hitting enter
# Try
print(x='Hello World')

## [1] "Hello World"

# x is the only argument
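
A minimal sketch of the two ways to sequence commands noted above (one per line, or separated by ';'). The object names a, b, a_sq and b_sq are made up for illustration.

```r
# Two commands on separate lines, run one after the other
a <- 2
b <- 3

# Two commands on one line, separated by ';'
a_sq <- a^2; b_sq <- b^2

# Typing an expression or object name prints it, just like print()
a_sq + b_sq
```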

2.2 Objects: Vectors and Matrices

In R everything is an object.
R environment - collection of objects accessible to R in RAM
Vector - column of numbers, characters, or logicals (T/F)

# Vectors contain data of the same type
# Create a character vector
char_vec <- c('a','b','c')
# Look at it
char_vec

## [1] "a" "b" "c"

# Create a numeric vector
num_vec <- numeric(5)
num_vec

## [1] 0 0 0 0 0

# Change Values
num_vec[1] <- 4
num_vec[2:4] <- c(3,2,1) 
num_vec

## [1] 4 3 2 1 0

# Assigning a character value coerces the whole vector to character
num_vec[5] <- '5'
num_vec

## [1] "4" "3" "2" "1" "5"

# Reference all but the third element
num_vec[-3]

## [1] "4" "3" "1" "5"

num_vec

## [1] "4" "3" "2" "1" "5"
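
The quotation marks in the output above show that the numeric vector was converted to character. A small sketch of how one might check for this and undo it, using a made-up vector v rather than the tutorial's objects:

```r
# A vector holds one type, so mixing types forces a conversion
v <- c(4, 3, 2, 1, 0)
class(v)          # "numeric"

v[5] <- "5"       # assigning a string converts every element
class(v)          # "character"

# Convert back explicitly when the strings are really numbers
v_num <- as.numeric(v)
sum(v_num)        # 15
```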

Matrix

# Create a matrix
MyMat <- matrix(1:25,nrow=5,ncol=5)
MyMat

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25

# Access (or change) a cell
MyMat[1,3]

## [1] 11

MyMat[2,4] <- 200 
MyMat[2,4]

## [1] 200

# Rows then columns
MyMat[1,]

## [1]  1  6 11 16 21

MyMat[,3] <- c(1,1,1,1,1)
MyMat[,3]

## [1] 1 1 1 1 1

MyMat

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6    1   16   21
## [2,]    2    7    1  200   22
## [3,]    3    8    1   18   23
## [4,]    4    9    1   19   24
## [5,]    5   10    1   20   25

# Multiple rows/columns and negation
MyMat[1:3,-c(1:3)]

##      [,1] [,2]
## [1,]   16   21
## [2,]  200   22
## [3,]   18   23

# The whole matrix (for a network object, net[,] returns the adjacency matrix)
MyMat[,]

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6    1   16   21
## [2,]    2    7    1  200   22
## [3,]    3    8    1   18   23
## [4,]    4    9    1   19   24
## [5,]    5   10    1   20   25

2.3 Objects: Data Frames

Data Frames can hold columns of different types

# A Data Frame is the conventional object type for a dataset
## Create a data frame containing numbers and a character vector
## Construct a letter vector
let_vec <- c('a','b','c','d','e')

## Combine various objects into a data frame
dat <- data.frame(MyMat, num_vec,let_vec, stringsAsFactors=F)

## Create/override variable names
names(dat) <- c("mm1","mm2","mm3","mm4","mm5","nv","lv")

# Variables can be accessed with '$'
dat$lv

## [1] "a" "b" "c" "d" "e"

# Or with matrix-type column indexing
dat[,7]

## [1] "a" "b" "c" "d" "e"
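
To see that a data frame really does hold columns of different types, one can inspect the class of each column. This is a sketch with a hypothetical data frame df, not the dat object built above.

```r
# Hypothetical example data frame (names are illustrative only)
df <- data.frame(id = 1:3,
                 score = c(2.5, 3.0, 4.5),
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# One class per column
col_types <- sapply(df, class)
col_types

# The same column reached two ways: by name and by position
identical(df$label, df[, 3])
```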

2.4 R Packages

# Use install.packages() to install
# library() or require() to use the package
# install.packages('statnet') # - suite of great network analysis packages
# install.packages('igraph') # - other great network analysis package
library(statnet,quietly=T)

## network: Classes for Relational Data
## Version 1.13.0 created on 2015-08-31.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
##  For citation information, type citation("network").
##  Type help("network-package") to get started.

## 
## ergm: version 3.7.1, created on 2017-03-20
## Copyright (c) 2017, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Martina Morris, University of Washington
##                     with contributions from
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Skye Bender-deMoll, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm").

## NOTE: Versions before 3.6.1 had a bug in the implementation of the
## bd() constriant which distorted the sampled distribution somewhat.
## In addition, Sampson's Monks datasets had mislabeled verteces. See
## the NEWS and the documentation for more details.

## 
## networkDynamic: version 0.9.0, created on 2016-01-12
## Copyright (c) 2016, Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll, University of Washington
##                     with contributions from
##                     Zack Almquist, University of California -- Irvine
##                     David R. Hunter, Penn State University
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Steven M. Goodreau, University of Washington
##                     Jeffrey Horner
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("networkDynamic").

## 
## tergm: version 3.4.0, created on 2016-03-28
## Copyright (c) 2016, Pavel N. Krivitsky, University of Wollongong
##                     Mark S. Handcock, University of California -- Los Angeles
##                     with contributions from
##                     David R. Hunter, Penn State University
##                     Steven M. Goodreau, University of Washington
##                     Martina Morris, University of Washington
##                     Nicole Bohme Carnegie, New York University
##                     Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Skye Bender-deMoll
##                     Li Wang
##                     Kirk Li, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("tergm").

## 
## ergm.count: version 3.2.2, created on 2016-03-29
## Copyright (c) 2016, Pavel N. Krivitsky, University of Wollongong
##                     with contributions from
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm.count").

## NOTE: The form of the term 'CMP' has been changed in version 3.2
## of 'ergm.count'. See the news or help('CMP') for more information.

## sna: Tools for Social Network Analysis
## Version 2.4 created on 2016-07-23.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.

## 
## statnet: version 2016.9, created on 2016-08-29
## Copyright (c) 2016, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("statnet").

## unable to reach CRAN

# R is OPEN SOURCE,  SO LOOK AT THE CODE !!!
network.size

## function (x) 
## {
##     if (!is.network(x)) 
##         stop("network.size requires an argument of class network.\n")
##     else get.network.attribute(x, "n")
## }
## <environment: namespace:network>

# And give credit to the authors
citation('statnet')

## 
## `statnet` is part of the Statnet suite of packages.  If you are
## using the `statnet` package for research that will be published,
## we request that you acknowledge this by citing the following. For
## BibTeX format, use toBibtex(citation("statnet")).
## 
## Handcock M, Hunter D, Butts C, Goodreau S, Krivitsky P,
## Bender-deMoll S and Morris M (2016). _statnet: Software Tools for
## the Statistical Analysis of Network Data_. The Statnet Project
## (<URL: http://www.statnet.org>). R package version 2016.9, <URL:
## CRAN.R-project.org/package=statnet>.
## 
## Handcock M, Hunter D, Butts C, Goodreau S and Morris M (2008).
## "statnet: Software Tools for the Representation, Visualization,
## Analysis and Simulation of Network Data." _Journal of Statistical
## Software_, *24*(1), pp. 1-11. <URL:
## http://www.jstatsoft.org/v24/i01>.
## 
## We have invested a lot of time and effort in creating the Statnet
## suite of packages for use by other researchers. Please cite it in
## all papers where it is used. The package statnet is made
## distributed under the terms of the license: GPL-3 + file LICENSE

# BibTeX Users
toBibtex(citation("statnet"))

## @Manual{,
##   author = {Mark S. Handcock and David R. Hunter and Carter T. Butts and Steven M. Goodreau and Pavel N. Krivitsky and Skye Bender-deMoll and Martina Morris},
##   title = {statnet: Software Tools for the Statistical Analysis of Network Data},
##   organization = {The Statnet Project (\url{http://www.statnet.org})},
##   year = {2016},
##   note = {R package version 2016.9},
##   url = {CRAN.R-project.org/package=statnet},
## }
## 
## @Article{,
##   title = {statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data},
##   author = {Mark S. Handcock and David R. Hunter and Carter T. Butts and Steven M. Goodreau and Martina Morris},
##   journal = {Journal of Statistical Software},
##   year = {2008},
##   volume = {24},
##   number = {1},
##   pages = {1--11},
##   url = {http://www.jstatsoft.org/v24/i01},
## }

2.5 Interactions with the Hard Drive

# Saving dat as a .csv
write.csv(dat,'dat.csv', row.names=F)

# Loading it as such
dat2 <- read.csv('dat.csv',stringsAsFactors=F)

# Save to RData file and load
save(list=c("dat2","dat"),file="dat_and_dat2.RData")
load("dat_and_dat2.RData")
# Note that loaded objects overwrite any existing objects with the same names

# Save and load it all
save.image('ALL.RData')
load('ALL.RData')
# Beware: save.image() saves every object in the workspace, so files can grow large!!
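
The overwriting behavior of load() is easy to miss. A small sketch, assuming a hypothetical object called scores and a temporary file so that nothing in the working directory is touched:

```r
# Hypothetical object saved to a temporary file
scores <- data.frame(id = 1:3, value = c(10, 20, 30))
tmp <- tempfile(fileext = ".RData")
save(scores, file = tmp)

# Change the object in memory, then load the saved copy
scores$value <- c(0, 0, 0)
load(tmp)

# The saved version has silently replaced the modified one
scores$value

# Clean up the temporary file
unlink(tmp)
```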

2.6 R can help

# When you know the function name exactly
help("evcent")
# or
?evcent

# Find help files containing a word
help.search("eigenvector")

R help files contain

function usage and arguments
detailed description
values (objects) returned
links to related functions
references on the methods
error-free examples

2.7 Use R for graphics

# Initialize the plot
# first column of MyMat on x-axis
# second on y-axis
plot(MyMat[,1],MyMat[,2],pch=1:5,cex=3,col=1:5) 

# Draw a line
lines(MyMat[,1],MyMat[,2], lty =2,  col="grey45",lwd=3)

# Re-write the points
points(MyMat[,1],MyMat[,2],pch=1:5,cex=3, col=1:5)

# add a legend
legend("topleft", legend = c("Circle", "Triangle", "Plus", "Times", "Diamond"), col=1:5,pch=1:5)

# Check out all the other options
?par

# Save it this time (as a PDF)
pdf('myfirstplot.pdf')
plot(MyMat[,1],MyMat[,2],pch=1:5,cex=3,col=1:5)
lines(MyMat[,1],MyMat[,2], lty =2, col="grey45",lwd=3)
points(MyMat[,1],MyMat[,2],pch=1:5,cex=3,col=1:5)
legend("topleft", legend = c("Circle", "Triangle", "Plus", "Times", "Diamond"), col=1:5,pch=1:5)
dev.off()

## quartz_off_screen 
##                 2

3 Introduction to Networks

3.1 Network Terminology and the Basics

Units in the network: Nodes, actors, or vertices
Relationships between nodes: edges, links, or ties
Pairs of actors: Dyads
Direction: Directed vs. Undirected (digraph vs. graph)
Tie value: Dichotomous/Binary, Valued/Weighted
Ties to Self: Loops
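
These terms can be made concrete with a tiny adjacency matrix. The four-node network below is hypothetical and uses base R only.

```r
# Hypothetical 4-node directed network stored as an adjacency matrix
# Rows are senders, columns are receivers
A <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(letters[1:4], letters[1:4]))
A["a", "b"] <- 1   # a -> b
A["b", "a"] <- 1   # b -> a (a reciprocated dyad)
A["c", "d"] <- 1   # c -> d (not reciprocated)
A["d", "d"] <- 1   # d -> d is a loop (a tie to self)

# Number of directed edges, not counting loops
n_edges <- sum(A) - sum(diag(A))

# An undirected version: a tie exists if it is sent in either direction
U <- 1 * ((A + t(A)) > 0)
isSymmetric(U)

# Number of dyads (unordered pairs of distinct nodes)
n_dyads <- choose(nrow(A), 2)
```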

3.2 Network and Network Data Types

Multiple modes, with ties only between nodes of different modes: Bi/Multipartite
- Affiliation Networks
- Association/correlation Networks
- Beware of Collapsed Modes!
Many relations among nodes: Multiplex
Data types
- Have all the data: Network Census
- Have links data from a sample of nodes: Ego Network
- Sample along links starting with Ego: link tracing, snowball, respondent-driven

3.3 Network Data

Vertex-level Data: Vertex attributes (n rows and k columns)
Adjacency Matrix Data: one (n by n) matrix for each relation
Edgelist Data: one row per edge in an (e by p) matrix. p is typically two: sender & receiver
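
The adjacency matrix and edgelist formats carry the same information. A minimal base R sketch of converting between them, using a hypothetical three-node directed network:

```r
# Hypothetical 3-node directed network
A <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE)

# Adjacency matrix -> edgelist (one row per edge: sender, receiver)
el <- which(A == 1, arr.ind = TRUE)
colnames(el) <- c("sender", "receiver")
el

# Edgelist -> adjacency matrix
A2 <- matrix(0, nrow = 3, ncol = 3)
A2[el] <- 1
identical(A, A2)
```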

# Read in adjacency matrices
## read.csv creates a data frame object from a CSV file
## Need to indicate that there's no header row in the CSV
advice <- read.csv("http://brucedesmarais.com/Advice.csv", header=F)

reportsto <- read.csv("http://brucedesmarais.com/ReportsTo.csv", header = F)

# Read in vertex attribute data
attributes <- read.csv("http://brucedesmarais.com/KrackhardtVLD.csv")

3.4.1 Creating Network Objects: Managers in a “Hi-Tech” Firm

# Read in the library for network analysis
library(network,quietly=T)

# Use the advice network dataset to create network object
## as.matrix() ensures the data frame is read as an adjacency matrix, not an edgelist
adviceNet <- network(as.matrix(advice))

# Add the vertex attributes into the network
set.vertex.attribute(adviceNet,names(attributes),attributes)

# Add the organizational chart as a network variable
set.network.attribute(adviceNet,"reportsto",reportsto)

# Simple plot
## Set random number seed so the plot is replicable
set.seed(5)
## Plot the network
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Level"),vertex.cex=2,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col="lightblue")

# check out all the options with ?plot.network

3.4.2 Creating Network Objects: Defense Pacts (edgelist) (2000)

# Read in vertex dataset
allyV <- read.csv("http://brucedesmarais.com/allyVLD.csv",stringsAsFactors=F)

# Read in edgelist
allyEL <- read.csv("http://brucedesmarais.com/allyEL.csv", stringsAsFactors=F)

# Read in contiguity
contig <- read.csv("http://brucedesmarais.com/contiguity.csv",stringsAsFactors=F,row.names=1)

require(network)
# (1) Initialize network
# store number of vertices
n <- nrow(allyV)
AllyNet <- network.initialize(n,dir=F)

# (2) Set vertex labels
network.vertex.names(AllyNet)  <- allyV$stateabb

# (3) Add in the edges
# Note, edgelist must match vertex labels
AllyNet[as.matrix(allyEL)]  <- 1

# (4) Store country code attribute
set.vertex.attribute(x=AllyNet,             # Network in which to store
            "ccode",            # What to name the attribute
            allyV$ccode)            # Values to put in

# (5) Store year attribute
set.vertex.attribute(AllyNet,"created",allyV$styear)

# (6) Store network attribute
set.network.attribute(AllyNet,"contiguous",as.matrix(contig))

# Simple plot
plot(AllyNet,displaylabels=T,label.cex=.5,edge.col=rgb(150,150,150,100,maxColorValue=255))

# check out all the options with ?plot.network

4 The Individual Level: Actor Position Analysis

4.1 Connectedness: Degree Centrality

require(sna)
# (in-) Degree Centrality is the number of in-connections by node
dc <- degree(adviceNet, cmode="indegree")

# Store in vertex level data frame
attributes$dc <- dc

# Plot degree centrality against tenure
## Make a simple scatter plot
par(cex=2,las=1)
plot(attributes$Tenure,attributes$dc)
## Add a trend (i.e., regression) line
abline(lm(attributes$dc ~ attributes$Tenure))

# Plot network with node size proportional to Degree Centrality
## First normalize degree 
ndc <- dc/max(dc)
## Set random number seed so the plot is replicable
set.seed(5)
## Now plot
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Level"),vertex.cex=3*ndc,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col="lightblue")
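
In-degree is simply a column sum of the adjacency matrix, and out-degree a row sum. A sketch with a hypothetical four-node matrix (not the advice network), assuming rows are senders and columns are receivers:

```r
# Hypothetical directed adjacency matrix: A[i, j] = 1 means i sends a tie to j
A <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 0,
              0, 0, 0, 0,
              1, 1, 1, 0),
            nrow = 4, byrow = TRUE)

indeg  <- colSums(A)   # ties received
outdeg <- rowSums(A)   # ties sent

indeg
outdeg

# Normalize by the maximum, so the most central node gets a value of 1
indeg / max(indeg)
```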

4.2 Connectedness: Eigenvector Centrality

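$x = \lambda^{-1} A x$

where A is the adjacency matrix, x is the vector of centrality scores, and λ is the largest eigenvalue of A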
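
The equation says that the centrality vector is the leading eigenvector of the adjacency matrix. A sketch that checks this with base R's eigen() on a hypothetical undirected network. This is not how evcent() is implemented, only an illustration of the definition.

```r
# Hypothetical undirected network on 4 nodes (symmetric adjacency matrix)
A <- matrix(c(0, 1, 1, 1,
              1, 0, 1, 0,
              1, 1, 0, 0,
              1, 0, 0, 0),
            nrow = 4, byrow = TRUE)

# Leading eigenvalue and eigenvector of A
eig    <- eigen(A)
lambda <- eig$values[1]
x      <- abs(eig$vectors[, 1])   # sign is arbitrary, so take absolute values

# Check the defining equation: x = (1 / lambda) * A x
all.equal(x, as.vector(A %*% x) / lambda)

# Node 1 is tied to every other node and gets the highest score
which.max(x)
```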

# Eigenvector Centrality Recursively Considers Neighbors' Centrality
ec <- evcent(adviceNet)

# Store in vertex level data frame
attributes$ec <- ec

# Plot eigenvector centrality against tenure
## Make a simple scatter plot
par(cex=2,las=1)
plot(attributes$Tenure,attributes$ec)
## Add a trend (i.e., regression) line
abline(lm(attributes$ec ~ attributes$Tenure))

# Plot network with node size proportional to eigenvector centrality
## First normalize
nec <- ec/max(ec)
## Set random number seed so the plot is replicable
set.seed(5)
## Now plot
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Level"),vertex.cex=3*nec,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col="lightblue")

4.3 Connectedness: Betweenness Centrality

# Betweenness Centrality measures brokerage: how many shortest paths pass through a vertex
# rescale=T rescales the scores so that they sum to one
bc <- betweenness(adviceNet,rescale=T)

# Store in vertex level data frame
attributes$bc <- bc

# Plot betweenness centrality against tenure
## Make a simple scatter plot
par(cex=2,las=1)
plot(attributes$Tenure,attributes$bc)
## Add a trend (i.e., regression) line
abline(lm(attributes$bc ~ attributes$Tenure))

# Plot network with node size proportional to betweenness centrality
## First normalize
nbc <- bc/max(bc)
## Set random number seed so the plot is replicable
set.seed(5)
## Now plot
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Level"),vertex.cex=3*nbc,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col="lightblue")

4.4 Comparing Centrality Measures

# DC vs. EC
par(cex=2,las=1)
plot(dc,ec)

# DC vs. BC
par(cex=2,las=1)
plot(dc,bc)

# BC vs. EC
par(cex=2,las=1)
plot(bc,ec)

# Correlations among all of them
cor(cbind(ec,bc,dc))

##           ec        bc         dc
## ec  1.000000 0.4320920 -0.3164510
## bc  0.432092 1.0000000  0.5436799
## dc -0.316451 0.5436799  1.0000000

4.5 Embeddedness: Clustering Coefficient

The clustering coefficient is the proportion of potential ties among a node’s neighbors that exist.
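
The definition can be computed by hand for a small undirected network. The network and the helper local_cc() below are hypothetical and written in base R. Nodes with fewer than two neighbors have no potential ties among neighbors, so the sketch returns NaN for them.

```r
# Hypothetical undirected network on 5 nodes
A <- matrix(0, nrow = 5, ncol = 5)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(2, 3), c(4, 5))
A[edges] <- 1
A <- A + t(A)          # make the matrix symmetric

# Local clustering coefficient of a single node
local_cc <- function(A, i) {
  nbrs <- which(A[i, ] == 1)
  k <- length(nbrs)
  if (k < 2) return(NaN)            # undefined with fewer than 2 neighbors
  ties <- sum(A[nbrs, nbrs]) / 2    # existing ties among the neighbors
  ties / choose(k, 2)               # divided by the possible ties
}

cc_hand <- sapply(1:5, function(i) local_cc(A, i))
cc_hand
```

Node 1 has three neighbors with one tie among them (1/3), while nodes 2 and 3 each have two neighbors that are tied to each other (1).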

# Read in library for clustering coefficient
require(igraph,quietly=T)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:sna':
## 
##     betweenness, bonpow, closeness, components, degree,
##     dyad.census, evcent, hierarchy, is.connected, neighborhood,
##     triad.census

## The following objects are masked from 'package:network':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges,
##     delete.vertices, get.edge.attribute, get.edges,
##     get.vertex.attribute, is.bipartite, is.directed,
##     list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

# Compute local transitivity, i.e., the clustering coefficient
anet <- graph.adjacency(adviceNet[,])
cc <- transitivity(anet,type="local")

# Store in data frame
attributes$cc <- cc
attributes

##    Age Tenure Level Department dc         ec           bc        cc
## 1   33      9     3          4 13 0.13887187 0.0511034401 0.5263158
## 2   42     20     2          4 18 0.04122224 0.0220658524 0.5666667
## 3   40     13     3          2  5 0.28317710 0.0245530182 0.5105263
## 4   33      8     3          4  8 0.21911327 0.0509618222 0.3473684
## 5   32      3     3          2  5 0.29140287 0.0188794477 0.4210526
## 6   59     28     3          1 10 0.02466730 0.0000000000 0.6909091
## 7   55     30     1          0 13 0.11196954 0.1026936921 0.3047619
## 8   34     11     3          1 10 0.16751801 0.0147754765 0.4967320
## 9   62      5     3          2  4 0.21796754 0.0146987667 0.5514706
## 10  37      9     3          3  9 0.34421295 0.0680179383 0.3162055
## 11  46     27     3          3 11 0.03538648 0.0044550658 0.7252747
## 12  34      9     3          1  7 0.03823355 0.0009441199 0.4722222
## 13  48      0     3          2  4 0.14344220 0.0033191715 0.5555556
## 14  43     10     2          2 10 0.09198919 0.0021891780 0.4835165
## 15  40      8     3          2  4 0.41860823 0.0227975453 0.4528986
## 16  27      5     3          4  8 0.11228627 0.0026022305 0.4848485
## 17  30     12     3          1  9 0.08660158 0.0094116953 0.4945055
## 18  33      9     2          3 15 0.40245199 0.3305452292 0.2016129
## 19  32      5     3          2  4 0.28747761 0.0028028560 0.4571429
## 20  38     12     3          2  8 0.21341504 0.0296630672 0.4842105
## 21  36     13     2          1 15 0.20359254 0.2235203871 0.2430769

# Remove igraph before using statnet functions
detach(package:igraph)

# Plot network with node size proportional to clustering coefficient
## First normalize
ncc <- cc/max(cc)
## Set random number seed so the plot is replicable
set.seed(5)
## Now plot
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Level"),vertex.cex=3*ncc,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col="lightblue")

# Correlations among all of them
cor(cbind(ec,bc,dc,cc))

##            ec         bc         dc         cc
## ec  1.0000000  0.4320920 -0.3164510 -0.5872217
## bc  0.4320920  1.0000000  0.5436799 -0.7536546
## dc -0.3164510  0.5436799  1.0000000 -0.2204078
## cc -0.5872217 -0.7536546 -0.2204078  1.0000000

5 Group-Level Analysis: Communities and Clusters

5.1 Clustering by Structural Equivalence

# Blockmodeling is the Classical SNA Approach
# Goal is to group nodes based on structural equivalence

## Create clusters based on structural equivalence
eclusts <- equiv.clust(adviceNet)

## First check out a dendrogram to eyeball the number of clusters
plot(eclusts)

# Run a block model identifying six groups
adviceBlockM <- blockmodel(adviceNet, eclusts, k=6)

# Create block membership vector and colors
## Extract block memberships
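## block.membership is sorted by block, so order(order.vector) restores the original vertex order
bmems <- adviceBlockM$block.membership[order(adviceBlockM$order.vector)]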
## Create group colors
colVec <- c("black","white","red","blue","yellow","gray60")
## Assign colors to individual nodes based on block membership
bcols <- colVec[bmems]

set.seed(5)
## Now plot
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Level"),vertex.cex=2,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col=bcols)

5.2 Clustering Based on Modularity: Community Detection

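$\text{Modularity} = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \mathbf{1}(c_i = c_j)$

A_ij is the adjacency matrix entry for nodes i and j
k_i is the degree of node i
m is the number of edges in the network
c_i is the group index of node i
1(·) is the indicator function (i.e., are the groups of i and j equal)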
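
The formula can be evaluated directly for a small undirected network. The sketch below uses a hypothetical network of two triangles joined by one edge and a made-up helper, modularity_hand(). It is meant to illustrate the formula, not to replace igraph's routines.

```r
# Hypothetical undirected network: two triangles joined by one edge
A <- matrix(0, nrow = 6, ncol = 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
A[edges] <- 1
A <- A + t(A)

modularity_hand <- function(A, groups) {
  k <- rowSums(A)                       # degrees
  m <- sum(A) / 2                       # number of edges
  same <- outer(groups, groups, "==")   # indicator: 1(c_i = c_j)
  sum((A - outer(k, k) / (2 * m)) * same) / (2 * m)
}

# Partition that matches the two triangles
q_good <- modularity_hand(A, c(1, 1, 1, 2, 2, 2))
# Partition that cuts across them
q_bad  <- modularity_hand(A, c(1, 2, 1, 2, 1, 2))

c(good = q_good, bad = q_bad)
```

The partition that follows the triangles scores 5/14, while the one that cuts across them scores below zero.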

# Modularity-based community detection popular in physics
# Modularity = Dense within communities, sparse across 
library(igraph,quietly=T)

# Convert into a graph
anet <- graph.adjacency(adviceNet[,])

## Use the spin-glass algorithm (simulated annealing; results can vary across runs)
mem <- spinglass.community(anet)$membership

# Check number of communities
max(mem)

## [1] 2

# Detach igraph, then color nodes by community membership and plot
detach("package:igraph")
bcols <- c("lightblue","yellow")[mem]
set.seed(5)
## Now plot
plot(adviceNet,displaylabels=T,label=get.vertex.attribute(adviceNet,"Department"),vertex.cex=2,label.cex=1,edge.col=rgb(150,150,150,100,maxColorValue=255),label.pos=5,vertex.col=bcols)

6 Network-Level: Testing Structural Hypotheses

6.1 Introduction

Now assume the network is stochastic
We might have hypotheses about the stochastic process
- Nodes that are similar are likely to form ties: Homophily
- Directed edges are likely to be reciprocated: Reciprocity
- A friend of a friend is a friend: Transitivity

6.2 μ=0, the z-test

Suppose x is a large univariate sample of size n, and

$T(x) = \sqrt{n} \cdot \frac{1}{n} \sum_{i=1}^{n} x_i$

Our null is that X has finite positive variance and a mean of zero.
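
A sketch of a z-test of a zero mean on simulated data. It uses the usual statistic, the square root of n times the sample mean divided by the sample standard deviation. Dividing by the standard deviation is an assumption added here so that the standard normal can serve as the reference distribution.

```r
set.seed(1)                      # arbitrary seed, for reproducibility

# Hypothetical large sample drawn with a true mean of zero
n <- 1000
x <- rnorm(n, mean = 0, sd = 2)

# z statistic: sqrt(n) times the sample mean, scaled by the sample sd
z <- sqrt(n) * mean(x) / sd(x)

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(z))

c(z = z, p = p)
```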

6.3 H₀ for a network…

Maximum Entropy Null Distribution

Uniform/Equal Probability of Every Network
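
One way to sample from a null in which every network is equally likely: flip a fair coin for each possible directed tie. The function uniform_graph() is a hypothetical helper, and density is used only as an example statistic.

```r
set.seed(1)   # arbitrary seed, for reproducibility

# Draw one directed graph on n nodes uniformly from all graphs without loops:
# every off-diagonal cell is an independent fair coin flip
uniform_graph <- function(n) {
  A <- matrix(rbinom(n * n, size = 1, prob = 0.5), nrow = n)
  diag(A) <- 0
  A
}

# Null distribution of a statistic (here, density) over 1000 draws
n <- 10
dens <- replicate(1000, sum(uniform_graph(n)) / (n * (n - 1)))

# Under the uniform null, densities concentrate around 0.5
summary(dens)
```

Most observed networks are far sparser than this, which is one motivation for conditioning the null on features such as the number of edges.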

6.4 Conditional Uniform Graph Testing: Comparing Observed to Null

# Conditional Uniform Graph Tests
# "CUG" tests allow you to control for features of the observed network in the null
# We should test for transitivity (is there even a reason for community detection?)
# gtrans function in sna package measures graph transitivity
ctDens <- cug.test(adviceNet,gtrans,cmode=c("edges"),reps=500)

# Check results
ctDens

## 
## Univariate Conditional Uniform Graph Test
## 
## Conditioning Method: edges 
## Graph Type: digraph 
## Diagonal Used: FALSE 
## Replications: 500 
## 
## Observed Value: 0.6639785 
## Pr(X>=Obs): 0 
## Pr(X<=Obs): 1

# Now let's look at something else
# Do more experienced managers have higher in-degree centrality?

# function to estimate correlation between in-degree and node attribute
indegCor <-  function(net,attr){
    require(sna)
    cor(degree(net,cmode="indegree"),attr)
}

# See the additional argument in cmode, now controlling for dyad census
ctDens2 <- cug.test(adviceNet,indegCor,cmode=c("dyad.census"),reps=500, FUN.args=list(attr = attributes$Tenure))

# Check results
ctDens2

## 
## Univariate Conditional Uniform Graph Test
## 
## Conditioning Method: dyad.census 
## Graph Type: digraph 
## Diagonal Used: FALSE 
## Replications: 500 
## 
## Observed Value: 0.5493177 
## Pr(X>=Obs): 0.002 
## Pr(X<=Obs): 0.998

## Can we incorporate both? 
## Need ERGM for that!
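
The logic of a CUG test that conditions on the number of edges can be sketched in base R. This is an illustration of the idea only, not the cug.test() implementation. The observed network, the statistic (count of mutual dyads) and the helpers mutuals() and draw_edges() are all hypothetical.

```r
set.seed(1)   # arbitrary seed, for reproducibility

# Hypothetical observed directed network on 8 nodes in which every tie is returned
n <- 8
obs <- matrix(0, n, n)
pairs <- rbind(c(1, 2), c(3, 4), c(5, 6), c(7, 8), c(1, 3), c(2, 4))
obs[pairs] <- 1
obs[pairs[, 2:1]] <- 1

# Statistic: number of mutual (reciprocated) dyads
mutuals <- function(A) sum(A * t(A)) / 2

# Null draw: same number of edges, placed uniformly at random off the diagonal
draw_edges <- function(n, m) {
  A <- matrix(0, n, n)
  cells <- which(row(A) != col(A))
  A[sample(cells, m)] <- 1
  A
}

m <- sum(obs)
null_stats <- replicate(500, mutuals(draw_edges(n, m)))

# Share of null graphs with at least as many mutual dyads as observed
p_upper <- mean(null_stats >= mutuals(obs))
c(observed = mutuals(obs), p_upper = p_upper)
```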

7 Independent exercise with the alliance network

What are the 5 most central states, according to each centrality measure?
Do community detection and blockmodeling result in substantially different partitions?
Is the age of the state significantly correlated with the number of alliance ties?