Network Analysis Tutorial

With Applications in R  

Bruce A. Desmarais

Pennsylvania State University

1 Introduction

  • Provide a broad introduction to many concepts/methods (nothing in depth)

  • Present technical intuition and some essential math (no derivations)

  • Comments describing functions generally (see help for explanations of all options)

2 Introduction to R

2.1 Programming in R: First Steps

  • R is a command-line, interpreted programming language

  • Commands are executed sequentially when you press return (i.e., ‘enter’), or can be separated by ‘;’ on a single line

  • Script files are plain-text files (e.g., UTF-8 encoded) with the extension “.R”

  • Comment heavily using ‘#’

# In R, functions are executed as '<function.name>(<input>)'
# <input> is a comma-separated list of arguments
# The exception is the 'print()' function, which can 
# be executed by typing the name of the object to 
# print and hitting enter
# Try
print(x='Hello World')
## [1] "Hello World"
# x is the only argument
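
As a small sketch of the points above (the object names `a`, `b`, `pos` and `named` are invented for illustration), two commands can share a line when separated by ';', and arguments can be matched either by position or by name:

```r
# Two commands on one line, separated by ';'
a <- 2; b <- 3

# Arguments matched by position: substr(x, start, stop)
pos <- substr("network", 1, 3)

# Arguments matched by name, so their order does not matter
named <- substr(stop = 3, start = 1, x = "network")

# Typing an object's name prints it, just like print()
pos
identical(pos, named)
```

Naming arguments is more verbose, but it keeps calls readable when a function has many options.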

2.2 Objects: Vectors and Matrices

  • In R, everything is an object.

  • R environment - the collection of objects accessible to R in RAM

  • Vector - column of numbers, characters, or logicals (TRUE/FALSE)

# Vectors contain data of the same type
# Create a character vector
char_vec <- c('a','b','c')
# Look at it
char_vec
## [1] "a" "b" "c"
# Create a numeric vector
num_vec <- numeric(5)
num_vec
## [1] 0 0 0 0 0
# Change Values
num_vec[1] <- 4
num_vec[2:4] <- c(3,2,1) 
num_vec
## [1] 4 3 2 1 0
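# Assigning a character value coerces the whole vector to character
num_vec[5] <- '5'
num_vec
## [1] "4" "3" "2" "1" "5"
# Reference all but the third element (the vector itself is unchanged)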
num_vec[-3]
## [1] "4" "3" "1" "5"
num_vec
## [1] "4" "3" "2" "1" "5"
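
The type change seen above can be checked directly. The following is a minimal sketch (the names `v` and `v_num` are made up for this example) showing that a vector holds a single type, and how to convert back to numbers:

```r
# A vector holds one type; assigning a string to a numeric vector
# converts every element to character
v <- c(4, 3, 2, 1, 0)
class(v)
v[5] <- '5'
class(v)

# Convert back to numeric before doing arithmetic
v_num <- as.numeric(v)
sum(v_num)

# A negative index drops a position, not a value:
# this drops the third element
v_num[-3]
```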
  • Matrix - two-dimensional array of values of the same type
# Create a matrix
MyMat <- matrix(1:25,nrow=5,ncol=5)
MyMat
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25
# Access (or change) a cell
MyMat[1,3] 
## [1] 11
MyMat[2,4] <- 200 
MyMat[2,4]
## [1] 200
# Rows then columns
MyMat[1,]
## [1]  1  6 11 16 21
MyMat[,3] <- c(1,1,1,1,1)
MyMat[,3]
## [1] 1 1 1 1 1
MyMat
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6    1   16   21
## [2,]    2    7    1  200   22
## [3,]    3    8    1   18   23
## [4,]    4    9    1   19   24
## [5,]    5   10    1   20   25
# Multiple rows/columns and negation
MyMat[1:3,-c(1:3)]
##      [,1] [,2]
## [1,]   16   21
## [2,]  200   22
## [3,]   18   23
# The whole matrix: empty indices select everything
# (the same syntax extracts the adjacency matrix from a network object)
MyMat[,]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6    1   16   21
## [2,]    2    7    1  200   22
## [3,]    3    8    1   18   23
## [4,]    4    9    1   19   24
## [5,]    5   10    1   20   25
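
Two details of matrix handling are easy to miss, and the sketch below illustrates them on a small made-up matrix `M` (not part of the tutorial data): matrices are filled column by column unless `byrow=TRUE` is given, and selecting a single row returns a plain vector unless `drop=FALSE` is used.

```r
# Filled column by column by default
M <- matrix(1:6, nrow = 2, ncol = 3)
dim(M)

# Selecting one row returns a plain vector ...
row1 <- M[1, ]
# ... unless drop = FALSE keeps it as a 1 x 3 matrix
row1_mat <- M[1, , drop = FALSE]

# Fill row by row instead
M_byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
M_byrow
```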

2.3 Objects: Data Frames

  • Data Frames can hold columns of different types
# A Data Frame is the conventional object type for a dataset
## Create a data frame containing numbers and a character vector
## Construct a letter vector
let_vec <- c('a','b','c','d','e')

## Combine various objects into a data frame
dat <- data.frame(MyMat, num_vec,let_vec, stringsAsFactors=F)

## Create/override variable names
names(dat) <- c("mm1","mm2","mm3","mm4","mm5","nv","lv")

# Variables can be accessed with '$'
dat$lv
## [1] "a" "b" "c" "d" "e"
# Or with matrix-type column indexing
dat[,7]
## [1] "a" "b" "c" "d" "e"
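
To see that a data frame really holds columns of different types, one can inspect the class of each column. This is a minimal sketch on a small, invented data frame `df` (the column names are assumptions for illustration only):

```r
# A data frame with an integer column and a character column
df <- data.frame(id = 1:3, name = c('a', 'b', 'c'),
                 stringsAsFactors = FALSE)

# Class of each column
col_types <- sapply(df, class)
col_types

# Rows can be selected with a logical condition on a column
df_sub <- df[df$id > 1, ]
df_sub

# New columns are added with '$'
df$score <- c(0.5, 1.5, 2.5)
```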

2.4 R Packages

# Use install.packages() to install
# library() or require() to use the package
# install.packages('statnet') # - suite of great network analysis packages
# install.packages('igraph') # - other great network analysis package
library(statnet,quietly=T)
## network: Classes for Relational Data
## Version 1.13.0 created on 2015-08-31.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
##  For citation information, type citation("network").
##  Type help("network-package") to get started.
## 
## ergm: version 3.7.1, created on 2017-03-20
## Copyright (c) 2017, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Martina Morris, University of Washington
##                     with contributions from
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Skye Bender-deMoll, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm").
## NOTE: Versions before 3.6.1 had a bug in the implementation of the
## bd() constriant which distorted the sampled distribution somewhat.
## In addition, Sampson's Monks datasets had mislabeled verteces. See
## the NEWS and the documentation for more details.
## 
## networkDynamic: version 0.9.0, created on 2016-01-12
## Copyright (c) 2016, Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll, University of Washington
##                     with contributions from
##                     Zack Almquist, University of California -- Irvine
##                     David R. Hunter, Penn State University
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Steven M. Goodreau, University of Washington
##                     Jeffrey Horner
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("networkDynamic").
## 
## tergm: version 3.4.0, created on 2016-03-28
## Copyright (c) 2016, Pavel N. Krivitsky, University of Wollongong
##                     Mark S. Handcock, University of California -- Los Angeles
##                     with contributions from
##                     David R. Hunter, Penn State University
##                     Steven M. Goodreau, University of Washington
##                     Martina Morris, University of Washington
##                     Nicole Bohme Carnegie, New York University
##                     Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Skye Bender-deMoll
##                     Li Wang
##                     Kirk Li, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("tergm").
## 
## ergm.count: version 3.2.2, created on 2016-03-29
## Copyright (c) 2016, Pavel N. Krivitsky, University of Wollongong
##                     with contributions from
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm.count").
## NOTE: The form of the term 'CMP' has been changed in version 3.2
## of 'ergm.count'. See the news or help('CMP') for more information.
## sna: Tools for Social Network Analysis
## Version 2.4 created on 2016-07-23.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## 
## statnet: version 2016.9, created on 2016-08-29
## Copyright (c) 2016, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("statnet").
# R is open source, so look at the code!
# Typing a function's name without parentheses prints its source
network.size
## function (x) 
## {
##     if (!is.network(x)) 
##         stop("network.size requires an argument of class network.\n")
##     else get.network.attribute(x, "n")
## }
## <environment: namespace:network>
# And give credit to the authors
citation('statnet')
## 
## `statnet` is part of the Statnet suite of packages.  If you are
## using the `statnet` package for research that will be published,
## we request that you acknowledge this by citing the following. For
## BibTeX format, use toBibtex(citation("statnet")).
## 
## Handcock M, Hunter D, Butts C, Goodreau S, Krivitsky P,
## Bender-deMoll S and Morris M (2016). _statnet: Software Tools for
## the Statistical Analysis of Network Data_. The Statnet Project
## (<URL: http://www.statnet.org>). R package version 2016.9, <URL:
## CRAN.R-project.org/package=statnet>.
## 
## Handcock M, Hunter D, Butts C, Goodreau S and Morris M (2008).
## "statnet: Software Tools for the Representation, Visualization,
## Analysis and Simulation of Network Data." _Journal of Statistical
## Software_, *24*(1), pp. 1-11. <URL:
## http://www.jstatsoft.org/v24/i01>.
## 
## We have invested a lot of time and effort in creating the Statnet
## suite of packages for use by other researchers. Please cite it in
## all papers where it is used. The package statnet is made
## distributed under the terms of the license: GPL-3 + file LICENSE
# BibTeX Users
toBibtex(citation("statnet"))
## @Manual{,
##   author = {Mark S. Handcock and David R. Hunter and Carter T. Butts and Steven M. Goodreau and Pavel N. Krivitsky and Skye Bender-deMoll and Martina Morris},
##   title = {statnet: Software Tools for the Statistical Analysis of Network Data},
##   organization = {The Statnet Project (\url{http://www.statnet.org})},
##   year = {2016},
##   note = {R package version 2016.9},
##   url = {CRAN.R-project.org/package=statnet},
## }
## 
## @Article{,
##   title = {statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data},
##   author = {Mark S. Handcock and David R. Hunter and Carter T. Butts and Steven M. Goodreau and Martina Morris},
##   journal = {Journal of Statistical Software},
##   year = {2008},
##   volume = {24},
##   number = {1},
##   pages = {1--11},
##   url = {http://www.jstatsoft.org/v24/i01},
## }

2.5 Interactions with the Hard Drive

# Saving dat as a .csv
write.csv(dat,'dat.csv', row.names=F)

# Loading it as such
dat2 <- read.csv('dat.csv',stringsAsFactors=F)

# Save to RData file and load
save(list=c("dat2","dat"),file="dat_and_dat2.RData")
load("dat_and_dat2.RData")
# Note: load() restores objects under their saved names and overwrites
# any objects with the same names in the current environment

# Save and load it all
save.image('ALL.RData')
load('ALL.RData')
# Beware: save.image() saves every object in the workspace,
# so the file (and the memory used when reloading it) can grow large
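
A quick way to check that saving and loading behave as expected is a round trip through temporary files. The sketch below uses made-up objects (`d`, `d_back`, `e`) and `tempfile()` so that nothing in the working directory is touched. Loading into a separate environment is one way to avoid overwriting objects that are already in the workspace.

```r
# Write a small data frame to a temporary CSV and read it back
tmp_csv <- tempfile(fileext = ".csv")
d <- data.frame(x = 1:3, y = c('a', 'b', 'c'), stringsAsFactors = FALSE)
write.csv(d, tmp_csv, row.names = FALSE)
d_back <- read.csv(tmp_csv, stringsAsFactors = FALSE)
all.equal(d, d_back)

# Save to a temporary RData file, then load into a new environment
# so that existing objects are not overwritten
tmp_rdata <- tempfile(fileext = ".RData")
save(d, file = tmp_rdata)
e <- new.env()
loaded_names <- load(tmp_rdata, envir = e)
loaded_names
```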

2.6 R can help

# When you know the function name exactly
help("evcent")
# or
?evcent

# Find help files containing a word
help.search("eigenvector")

R help files contain

  • function usage and arguments

  • detailed description

  • values (objects) returned

  • links to related functions

  • references on the methods

  • error-free examples
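
Besides opening a help page, a few base R functions give quick answers from the console. The following sketch uses `matrix` and the pattern "csv" purely as examples:

```r
# Show a function's arguments and defaults without opening the help page
args(matrix)

# The argument names are also available programmatically
arg_names <- names(formals(matrix))
arg_names

# Find objects on the search path whose names contain a pattern
csv_funs <- apropos("csv")
csv_funs

# Run the examples from a help page (uncomment to try)
# example(matrix)
```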

2.7 Use R for graphics

# Initialize the plot
# first column of MyMat on x-axis
# second on y-axis
plot(MyMat[,1],MyMat[,2],pch=1:5,cex=3,col=1:5) 

# Draw a line
lines(MyMat[,1],MyMat[,2], lty =2,  col="grey45",lwd=3)

# Redraw the points on top of the line
points(MyMat[,1],MyMat[,2],pch=1:5,cex=3, col=1:5)

# add a legend
legend("topleft", legend = c("Circle", "Triangle", "Plus", "Times", "Diamond"), col=1:5,pch=1:5)

# Check out all the other options
?par

# Save it this time (as a PDF)
pdf('myfirstplot.pdf')
plot(MyMat[,1],MyMat[,2],pch=1:5,cex=3,col=1:5)
lines(MyMat[,1],MyMat[,2], lty=2, col="grey45",lwd=3)
points(MyMat[,1],MyMat[,2],pch=1:5,cex=3,col=1:5)
legend("topleft", legend = c("Circle", "Triangle", "Plus", "Times", "Diamond"), col=1:5,pch=1:5)
dev.off()
## quartz_off_screen 
##                 2
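
The same open-device, plot, close-device pattern works with other settings. As a minimal sketch (the file location, dimensions and labels are assumptions chosen for illustration), the following writes a labelled plot to a temporary PDF and confirms that the file was created:

```r
# Open a PDF device with an explicit size in inches
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)

# Axis labels and a title make the figure self-explanatory
plot(1:5, (1:5)^2, type = "b", pch = 16,
     xlab = "x", ylab = "x squared", main = "A labelled plot")

# Nothing is written to disk until the device is closed
dev.off()

file.exists(out_file)
```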

3 Introduction to Networks

3.1 Network Terminology and the Basics

  • Units in the network: Nodes, actors, or vertices

  • Relationships between nodes: edges, links, or ties

  • Pairs of actors: Dyads

  • Direction: Directed vs. Undirected (digraph vs. graph)

  • Tie value: Dichotomous/Binary, Valued/Weighted

  • Ties to Self: Loops
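
The terms above can be made concrete with a small adjacency matrix. This is an illustrative sketch on an invented four-node network, assuming the convention that `A[i, j] = 1` means node i sends a tie to node j:

```r
# Directed, binary network on 4 nodes
A <- matrix(0, nrow = 4, ncol = 4)
A[1, 2] <- 1   # tie from node 1 to node 2
A[2, 1] <- 1   # reciprocated tie from node 2 to node 1
A[3, 4] <- 1   # unreciprocated tie from node 3 to node 4
A[2, 2] <- 1   # a loop: node 2 is tied to itself

n_nodes <- nrow(A)
n_dyads <- choose(n_nodes, 2)   # number of unordered pairs of nodes
n_loops <- sum(diag(A))         # loops sit on the diagonal

# A symmetric matrix corresponds to an undirected network
is_undirected <- isSymmetric(A)

# Symmetrize: a tie exists if it is present in either direction
A_und <- 1 * ((A + t(A)) > 0)
```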

3.2 Network and Network Data Types

  • Multiple modes (node types), with no ties among nodes of the same mode: Bi/Multipartite
    • Affiliation Networks

    • Association/correlation Networks

    • Beware of Collapsed Modes!

  • Many relations among nodes: Multiplex

  • Data types
    • Have all the data: Network Census

    • Have link data for a sample of nodes: Ego Network

    • Sample along links starting with Ego: link tracing, snowball, respondent-driven
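
One reading of the warning about collapsed modes can be sketched with a tiny affiliation network. The people (`p1` to `p3`) and committees (`c1`, `c2`) below are invented for illustration. Multiplying the affiliation matrix by its transpose collapses the two modes into a one-mode network of shared memberships, and the collapsed network no longer records which committee produced each tie.

```r
# Rows are people, columns are committees; 1 = membership
aff <- matrix(c(1, 1, 0,    # committee c1: p1 and p2
                0, 1, 1),   # committee c2: p2 and p3
              nrow = 3, ncol = 2,
              dimnames = list(c("p1", "p2", "p3"), c("c1", "c2")))
aff

# Collapse to a person-by-person matrix of shared committees
co <- aff %*% t(aff)
co

# Off-diagonal cells count shared committees;
# the diagonal counts each person's memberships
```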

3.3 Network Data

  • Vertex-level Data: Vertex attributes (n rows and k columns)

  • Adjacency Matrix Data: one (n by n) matrix for each relation

  • Edgelist Data: one (e by p) matrix with a row for each edge; p is typically two (sender and receiver)

# Read in adjacency matrices
## read.csv creates a data frame object from a CSV file
## Need to indicate that there's no header row in the CSV
advice <- read.csv("http://brucedesmarais.com/Advice.csv", header=F)

reportsto <- read.csv("http://brucedesmarais.com/ReportsTo.csv", header = F)

# Read in vertex attribute data
attributes <- read.csv("http://brucedesmarais.com/KrackhardtVLD.csv")
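
Adjacency matrices and edgelists carry the same information, so one can be converted into the other. The following is a minimal base R sketch on an invented three-node network (it does not use the files read in above):

```r
# A 3 x 3 adjacency matrix with ties 1 -> 2, 2 -> 3 and 3 -> 1
adj <- matrix(0, nrow = 3, ncol = 3)
adj[1, 2] <- 1; adj[2, 3] <- 1; adj[3, 1] <- 1

# Adjacency matrix to edgelist: one row per edge
el <- which(adj == 1, arr.ind = TRUE)
colnames(el) <- c("sender", "receiver")
el <- el[order(el[, "sender"]), , drop = FALSE]
el

# Edgelist back to adjacency matrix:
# a two-column matrix indexes (row, column) cells directly
adj2 <- matrix(0, nrow = 3, ncol = 3)
adj2[el] <- 1
identical(adj, adj2)
```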

3.4.1 Creating Network Objects: Managers in a “Hi-Tech” Firm

# Read in the library for network analysis
library(network,quietly=T)

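# Use the advice network dataset to create network object
## Convert the data frame to a matrix so it is read as an adjacency matrix
adviceNet <- network(as.matrix(advice), matrix.type="adjacency")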

# Add the vertex attributes into the network
set.vertex.attribute(adviceNet,names(attributes),attributes)

# Add the organizational chart as a network variable
set.network.attribute(adviceNet,"reportsto",reportsto)

# Simple plot
## Set random number seed so the plot is replicable
set.seed(5)
## Plot the network
plot(adviceNet,
     displaylabels=TRUE,
     label=get.vertex.attribute(adviceNet,"Level"),
     vertex.cex=2,
     label.cex=1,
     edge.col=rgb(150,150,150,100,maxColorValue=255),
     label.pos=5,
     vertex.col="lightblue")
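
After building a network object, it is worth checking that it holds what was intended. The sketch below does this on a small invented network (`toy`, `toyNet`) rather than the advice data, so it runs without downloading anything. It assumes the network package is installed and uses the `%v%` operator, which is the package's shorthand for setting and getting vertex attributes.

```r
library(network, quietly = TRUE)

# A 3-node directed network with ties 1 -> 2 and 2 -> 3
toy <- matrix(0, nrow = 3, ncol = 3)
toy[1, 2] <- 1; toy[2, 3] <- 1
toyNet <- network(toy, matrix.type = "adjacency", directed = TRUE)

# Attach a vertex attribute (the values are made up)
toyNet %v% "Level" <- c(1, 2, 3)

# Basic checks on the object
n <- network.size(toyNet)          # number of vertices
m <- network.edgecount(toyNet)     # number of edges
lev <- get.vertex.attribute(toyNet, "Level")

# Recover the adjacency matrix from the network object
back <- as.matrix(toyNet, matrix.type = "adjacency")
back
```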