Visualising networks using Networkx, Pyvis and Community detection

In this article, we apply network analysis, which draws on graph theory from mathematics and computer science, to the world of “Game of Thrones.” Using Python libraries such as NetworkX, Pandas, and Pyvis, we will examine the intricate character relationships within the popular series. This analysis will provide insights into the dynamics of the narrative, identify key characters, and reveal the underlying connections that shape the fascinating world of Westeros.

In the course of this exploration, we will:

Analyse the Game of Thrones dataset to understand the character interactions and associations.
Visualise the networks using Networkx and experiment with different layouts for the most readable representation.
Use Pyvis for interactive network visualisations and compare its effectiveness with Networkx.
Perform community detection to identify clusters or communities within the network based on the patterns of connections between nodes.

Dataset

The datasets capture the character relationships in the Game of Thrones book series. There are five CSV files, one for each book. Each row represents a connection between two characters, and the “weight” column measures the relationship’s significance or intensity. By analysing these data, we can map the web of relationships, identify key characters, study how the network evolves across the books, and gain a deeper understanding of the series’ plot lines and character dynamics.

Columns:

Source: the first character in the pair.
Target: the second character in the pair.
Type: the nature of the connection. Every row is “Undirected”, so each relationship is mutual and the order of Source and Target carries no meaning.
weight: a numerical value measuring the relationship’s significance or intensity, i.e. how often the two characters interact.
book: the book number, which distinguishes relationships across the five books.

The datasets can be found on my GitHub.

Network Analysis Book 1

from google.colab import drive
drive.mount('/content/drive')


# Importing the required libraries
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Reading in the data of book 1
d1 = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Complex Networks (blog)/book1.csv')

# Displaying the DataFrame (Jupyter shows the first and last five rows)
d1

	Source	Target	Type	weight	book
0	Addam-Marbrand	Jaime-Lannister	Undirected	3	1
1	Addam-Marbrand	Tywin-Lannister	Undirected	6	1
2	Aegon-I-Targaryen	Daenerys-Targaryen	Undirected	5	1
3	Aegon-I-Targaryen	Eddard-Stark	Undirected	4	1
4	Aemon-Targaryen-(Maester-Aemon)	Alliser-Thorne	Undirected	4	1
...	...	...	...	...	...
679	Tyrion-Lannister	Willis-Wode	Undirected	4	1
680	Tyrion-Lannister	Yoren	Undirected	10	1
681	Tywin-Lannister	Varys	Undirected	4	1
682	Tywin-Lannister	Walder-Frey	Undirected	8	1
683	Waymar-Royce	Will-(prologue)	Undirected	18	1

684 rows × 5 columns

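We can count the number of unique characters. A character can appear in either the Source or the Target column, so both columns have to be combined.

# Printing out the number of unique characters (appearing in either column)
all_characters = pd.concat([d1['Source'], d1['Target']]).unique()
print("Number of unique characters: ", len(all_characters))

Number of unique characters:  187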

We can also calculate the average number of connections (distinct partners) per character, counting the rows in which each character is the Source.

# Number of connections per character, counting Source rows only
interactions_per_character = d1.groupby('Source')['Target'].count()
print("Average number of interactions per character: ", round(interactions_per_character.mean(), 3))

Average number of interactions per character:  4.921
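Because each pair of characters appears in only one row, grouping on Source alone misses every row in which a character appears as Target. Characters who mostly appear in that column are therefore undercounted. The following is a minimal sketch that counts both columns. It uses a small hypothetical edge list in place of d1 so it runs on its own. With the real data you would pass d1 instead.

```python
import pandas as pd


def connections_per_character(edges: pd.DataFrame) -> pd.Series:
    """Count the rows each character appears in, as either Source or Target.

    For an undirected edge list with one row per pair, this equals the node degree.
    """
    return pd.concat([edges["Source"], edges["Target"]]).value_counts()


# Hypothetical toy edge list with the same column names as the book CSVs
toy = pd.DataFrame({
    "Source": ["Arya-Stark", "Arya-Stark", "Bran-Stark"],
    "Target": ["Sansa-Stark", "Syrio-Forel", "Robb-Stark"],
    "weight": [10, 5, 8],
})

counts = connections_per_character(toy)
print(counts)
print("Average connections per character:", round(counts.mean(), 3))
```

On the book 1 data, the mean of this series should equal 2 × 684 / 187 ≈ 7.32, because every edge adds one to the count of each of its two endpoints.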

Let’s plot the distribution of the number of interactions per character.

# Distribution of Interactions per Character
sns.set_style("whitegrid")

plt.figure(figsize=(10,6))
sns.histplot(interactions_per_character, bins=30, color='skyblue', edgecolor='black', kde=True)
plt.title('Distribution of Interactions per Character', fontsize=15)
plt.xlabel('Number of Interactions', fontsize=12)
plt.ylabel('Number of Characters', fontsize=12)
plt.show()

The distribution is strongly right-skewed: most characters have only a few connections, while a handful of outliers on the right have many.

Ranking the top 10 characters by number of connections shows which characters are these outliers and sit at the centre of the network.

# top 10 characters by number of interactions
top_characters = interactions_per_character.sort_values(ascending=False).head(10)
print("Top 10 characters by number of interactions: \n", top_characters)

Top 10 characters by number of interactions: 
 Source
Eddard-Stark          51
Catelyn-Stark         39
Bran-Stark            30
Arya-Stark            27
Cersei-Lannister      23
Joffrey-Baratheon     19
Daenerys-Targaryen    18
Jaime-Lannister       18
Jon-Snow              17
Drogo                 15
Name: Target, dtype: int64

Analyse the edges of the network

We can also analyse the edges of the network rather than its nodes. The code below lists the pairs of characters (edges) with the most interactions (highest weights) and plots a histogram of the edge weights. Each edge weight is the sum of the ‘weight’ column for a pair of characters, which represents the number of interactions between them. In book 1 each pair appears in only one row (the 684 rows become 684 graph edges), so the sum simply equals that row’s weight.

# Create a DataFrame that counts the number of interactions (weights) between each pair of characters
edge_weights = d1.groupby(['Source', 'Target'])['weight'].sum().reset_index(name='weight')

# Sort the DataFrame by the weight and display the top edges
top_edges = edge_weights.sort_values(by='weight', ascending=False).head(10)
print("Top 10 edges by weight: \n", top_edges)

Top 10 edges by weight: 
                  Source            Target  weight
329        Eddard-Stark  Robert-Baratheon     291
134          Bran-Stark        Robb-Stark     112
62           Arya-Stark       Sansa-Stark     104
249  Daenerys-Targaryen             Drogo     101
479   Joffrey-Baratheon       Sansa-Stark      87
504            Jon-Snow     Samwell-Tarly      81
454        Jeor-Mormont          Jon-Snow      81
320        Eddard-Stark     Petyr-Baelish      81
257  Daenerys-Targaryen     Jorah-Mormont      75
225    Cersei-Lannister  Robert-Baratheon      72

The top edge is between Eddard Stark and Robert Baratheon, with a weight of 291, the strongest connection in book 1 by a wide margin. This fits the story: Ned (Eddard) Stark and Robert Baratheon were close friends from their youth. Their bond was cemented during Robert’s Rebellion, the war that overthrew the Mad King, Aerys II Targaryen, and placed Robert on the Iron Throne.

The second edge, between the brothers Bran and Robb Stark, has a weight of 112, indicating a strong connection between them.

The third edge, between the sisters Arya and Sansa Stark, has a weight of 104.

Let’s plot the distribution of edge weights.

# Plot a histogram of edge weights
plt.figure(figsize=(10,6))
sns.histplot(edge_weights['weight'], bins=30, color='skyblue', edgecolor='black', kde=False)
plt.title('Distribution of Edge Weights', fontsize=15)
plt.xlabel('Edge Weight', fontsize=12)
plt.ylabel('Number of Edges', fontsize=12)
plt.show()
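Every relationship is undirected, so a pair could in principle be listed as (A, B) in one row and (B, A) in another. Grouping on the raw Source and Target columns would then treat those rows as two different edges. This does not happen in book 1. However, it can be worth normalising each pair before grouping when the ordering is not guaranteed, for example after concatenating several books. Below is a minimal sketch using hypothetical toy rows.

```python
import numpy as np
import pandas as pd


def undirected_edge_weights(edges: pd.DataFrame) -> pd.DataFrame:
    """Sum weights per unordered character pair, so (A, B) and (B, A) are merged."""
    # Sort the two names in each row so both orderings produce the same key
    ordered = np.sort(edges[["Source", "Target"]].to_numpy(dtype=str), axis=1)
    pairs = pd.DataFrame(ordered, columns=["Source", "Target"])
    pairs["weight"] = edges["weight"].to_numpy()
    summed = pairs.groupby(["Source", "Target"], as_index=False)["weight"].sum()
    return summed.sort_values("weight", ascending=False, ignore_index=True)


# Hypothetical rows: the same pair appears in both orders
toy = pd.DataFrame({
    "Source": ["Eddard-Stark", "Robert-Baratheon", "Arya-Stark"],
    "Target": ["Robert-Baratheon", "Eddard-Stark", "Sansa-Stark"],
    "weight": [10, 5, 7],
})

result = undirected_edge_weights(toy)
print(result)
```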

Visualise the networks using Networkx

NetworkX is a Python library that provides tools for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It is widely used for tasks related to network analysis and graph theory.

We will be using it to make visualisations.

First, we create an empty graph object and iterate through the data frame to add edges.

# Creating an empty graph object
b1 = nx.Graph()

# Iterating through the DataFrame to add edges
for _, edge in d1.iterrows():
    b1.add_edge(edge['Source'], edge['Target'], weight=edge['weight'])

# Printing out the number of nodes and edges in the graph
print("Total number of nodes: ", int(b1.number_of_nodes()))
print("Total number of edges: ", int(b1.number_of_edges()))

Total number of nodes:  187
Total number of edges:  684
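The loop above works. NetworkX can also build the same undirected graph directly from a DataFrame with nx.from_pandas_edgelist, which avoids iterrows. Below is a minimal sketch on a hypothetical toy edge list.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy edge list with the same column names as the book CSVs
toy = pd.DataFrame({
    "Source": ["Eddard-Stark", "Eddard-Stark", "Arya-Stark"],
    "Target": ["Robert-Baratheon", "Arya-Stark", "Sansa-Stark"],
    "weight": [12, 5, 7],
})

# Build an undirected graph (nx.Graph is the default), keeping 'weight' as an edge attribute
G = nx.from_pandas_edgelist(toy, source="Source", target="Target", edge_attr="weight")

print("Total number of nodes:", G.number_of_nodes())
print("Total number of edges:", G.number_of_edges())
print("Eddard/Robert weight:", G["Eddard-Stark"]["Robert-Baratheon"]["weight"])
```

With the real data, nx.from_pandas_edgelist(d1, 'Source', 'Target', edge_attr='weight') should give the same 187 nodes and 684 edges as the loop.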

I will plot the network of Book 1 using three different layouts from the NetworkX library to see which one is the most readable.

nx.draw(): uses the spring layout by default
nx.draw_circular(): positions the nodes on a circle around the centre
nx.draw_kamada_kawai(): positions the nodes using the force-directed method of Kamada and Kawai

nx.draw()

plt.figure(figsize=(20, 20))
nx.draw(b1, with_labels=True)

nx.draw_circular()

plt.figure(figsize=(20, 20))
nx.draw_circular(b1, with_labels=True)

nx.draw_kamada_kawai()

# The Kamada-Kawai layout requires SciPy
plt.figure(figsize=(20, 20))
nx.draw_kamada_kawai(b1, with_labels=True)
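One way to make the static NetworkX plots easier to read is to compute the layout once with a fixed seed, scale node size by degree, and label only the best-connected characters. This is our own suggestion rather than part of the original analysis. In the minimal sketch below, the karate club graph bundled with NetworkX stands in for b1 so the snippet runs on its own. The top_n value and the size factor of 20 are arbitrary assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import networkx as nx


def draw_readable(G, top_n=10, seed=42, ax=None):
    """Draw G with degree-scaled nodes and labels only for the top_n highest-degree nodes."""
    if ax is None:
        ax = plt.gca()
    # A fixed seed makes the spring layout reproducible between runs
    pos = nx.spring_layout(G, weight="weight", seed=seed)
    degree = dict(G.degree())
    top = sorted(degree, key=degree.get, reverse=True)[:top_n]
    labels = {node: node for node in top}

    nx.draw_networkx_edges(G, pos, ax=ax, alpha=0.2)
    nx.draw_networkx_nodes(G, pos, ax=ax, node_size=[20 * degree[n] for n in G.nodes()])
    nx.draw_networkx_labels(G, pos, labels=labels, ax=ax, font_size=10)
    ax.set_axis_off()
    return pos, labels


G = nx.karate_club_graph()
plt.figure(figsize=(10, 10))
pos, labels = draw_readable(G, top_n=5)
plt.savefig("readable_network.png")  # in a notebook the figure is shown inline instead
```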

All of these layouts give some indication of the network’s structure, but I do not find any of them easy to read. Let us look for another visualisation library.

Visualise the networks using Pyvis

After some further research, I found the Pyvis library. Pyvis is a Python library for visualising networks that wraps the vis.js JavaScript library. It works inside Jupyter notebooks and can also generate standalone HTML files.

As I did not find the visualisation of the network using nx.draw() very readable, I will use Pyvis to visualise the network of Book 1.

from pyvis.network import Network

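# Interactive pyvis network; cdn_resources='in_line' embeds vis.js in the HTML file
net = Network(notebook=True, height='950px', width='95%', bgcolor='#222222', font_color='white', cdn_resources='in_line')
# Use the repulsion physics solver to spread the nodes apart
net.repulsion()

# Use each character's degree (number of connections) as its node size
node_degree = dict(b1.degree())
nx.set_node_attributes(b1, node_degree, 'size')

# Copy the NetworkX graph into pyvis and write it to an HTML file
net.from_nx(b1)
net.save_graph("GameofThrones.html")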

from IPython.display import HTML
HTML(filename="GameofThrones.html")


(Figure: Pyvis visualisation of the Book 1 network, with node size set by degree)

The HTML file can be viewed and downloaded here; the interactive version is well worth exploring.
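pyvis’s net.from_nx copies node attributes across, so readability tweaks can be made on the NetworkX graph before conversion. The minimal sketch below uses a hypothetical helper, add_pyvis_attributes, that rescales degree linearly between arbitrary min_size and max_size values. It also sets a 'title' attribute, which vis.js shows as a hover tooltip. A toy star graph stands in for b1.

```python
import networkx as nx


def add_pyvis_attributes(G, min_size=10, max_size=50):
    """Hypothetical helper: set 'size' (degree rescaled to [min_size, max_size])
    and 'title' (hover text) on every node of G."""
    degree = dict(G.degree())
    lo, hi = min(degree.values()), max(degree.values())
    span = (hi - lo) or 1  # avoid division by zero when all degrees are equal
    for node, d in degree.items():
        G.nodes[node]["size"] = min_size + (max_size - min_size) * (d - lo) / span
        G.nodes[node]["title"] = f"{node}: {d} connections"
    return G


G = nx.star_graph(4)  # toy graph: node 0 joined to nodes 1-4
add_pyvis_attributes(G)
print(G.nodes[0])
print(G.nodes[1])
```

With the real graph, you would call add_pyvis_attributes(b1) in place of the nx.set_node_attributes(b1, node_degree, 'size') line, then call net.from_nx(b1) as before.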

It is already a nice visualisation, but it is still difficult to read. We can try community detection to see whether grouping related characters makes it more readable.

Network community theory

Network community theory is a concept in social network analysis that examines how individuals or groups form and interact in communities, or clusters, within a larger network. It focuses on understanding the structure and dynamics of these communities and their implications for various social phenomena.

Community detection algorithms are used to identify clusters or communities within the network based on the patterns of connections between nodes.

The theory explores the notion that individuals within a community tend to be more closely connected to each other than to nodes outside the community. These communities can exhibit characteristics such as a higher density of connections, stronger ties between members, and shared interests or attributes.
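This intuition is commonly formalised as modularity, the quantity the Louvain method tries to maximise. For a given split of the nodes into communities, modularity compares the edge weight that falls inside communities with the weight expected if edges were rewired at random while every node kept its degree:

$$
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
$$

Here A_ij is the weight of the edge between characters i and j (0 if there is none), and k_i is the weighted degree of character i. m is the total edge weight of the network, and c_i is the community of character i. δ(c_i, c_j) is 1 if both characters are in the same community and 0 otherwise. A value near 0 means the communities are no denser than chance would predict; higher values indicate stronger community structure.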

We can run community detection with the Louvain method from the python-louvain package, which is imported as community.

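# Louvain community detection; best_partition uses the 'weight' edge attribute by default
from community import community_louvain
communities = community_louvain.best_partition(b1)
communities  # dict mapping each character to a community id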
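python-louvain visits the nodes in a random order, so the partition (and even the number of communities) can change between runs unless you pass random_state. Recent NetworkX versions also include their own Louvain implementation, nx.community.louvain_communities, which accepts a seed, and nx.community.modularity can score the result. Below is a minimal sketch on a toy graph of two triangles joined by a single edge.

```python
import networkx as nx

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

# NetworkX's Louvain implementation; the seed makes the result reproducible
communities = nx.community.louvain_communities(G, weight="weight", seed=42)

# Convert the list of sets into a {node: community_id} dict, the shape used for the 'group' attribute
partition = {node: i for i, members in enumerate(communities) for node in members}

# Modularity of the detected partition (higher means denser communities than chance)
q = nx.community.modularity(G, communities, weight="weight")
print(communities)
print(partition)
print(round(q, 3))
```

With the real data, nx.set_node_attributes(b1, partition, 'group') could then be used in the same way as the python-louvain result. The groups need not match those from best_partition, because both methods depend on node order.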

# Store each character's community as the 'group' attribute; pyvis colours nodes by group
nx.set_node_attributes(b1, communities, 'group')

net = Network(notebook=True, height='950px', width='95%', bgcolor='#222222', font_color='white', cdn_resources='in_line')
net.repulsion()

# Use each character's degree as its node size
node_degree = dict(b1.degree())
nx.set_node_attributes(b1, node_degree, 'size')

# Copy the graph into pyvis and write it to an HTML file
net.from_nx(b1)
net.save_graph("GameofThronesCommunities.html")

HTML(filename="GameofThronesCommunities.html")


The HTML file can be viewed and downloaded here; the interactive version is well worth exploring.

The detected communities are clearly visible. Jon Snow and Daenerys Targaryen each have their own network and social circle in Book 1. This can be explained by their geographical isolation from the other storylines. Jon finds himself on the Wall in the far north, while Daenerys starts in Pentos and then heads further east into Essos. Most of the remaining narrative takes place in the Seven Kingdoms south of the Wall.
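It helps to list each community's best-connected members, so that an interpretation like this can be checked against the data rather than only against the picture. The minimal sketch below uses a hypothetical helper, top_members, and assumes a {character: community_id} dict like the one best_partition returns. The toy graph and partition are hypothetical.

```python
from collections import defaultdict

import networkx as nx


def top_members(G, partition, n=3):
    """Hypothetical helper: return {community_id: up to n members, highest degree first}."""
    groups = defaultdict(list)
    for node, community_id in partition.items():
        groups[community_id].append(node)
    degree = dict(G.degree())
    # Sort by descending degree, breaking ties alphabetically
    return {
        cid: sorted(members, key=lambda m: (-degree[m], m))[:n]
        for cid, members in sorted(groups.items())
    }


# Hypothetical toy graph and partition
G = nx.Graph([
    ("Jon-Snow", "Samwell-Tarly"),
    ("Jon-Snow", "Jeor-Mormont"),
    ("Jon-Snow", "Alliser-Thorne"),
    ("Samwell-Tarly", "Jeor-Mormont"),
    ("Daenerys-Targaryen", "Drogo"),
    ("Daenerys-Targaryen", "Jorah-Mormont"),
])
partition = {
    "Jon-Snow": 0, "Samwell-Tarly": 0, "Jeor-Mormont": 0, "Alliser-Thorne": 0,
    "Daenerys-Targaryen": 1, "Drogo": 1, "Jorah-Mormont": 1,
}

result = top_members(G, partition, n=2)
for cid, members in result.items():
    print(cid, members)
```

With the real data, top_members(b1, communities) would show which characters anchor each community. Because best_partition is randomised, the ids and exact membership may differ between runs.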

End note

In conclusion, the article demonstrated how network analysis techniques can be applied to analyse the character relationships in “Game of Thrones.” By visualising the networks, identifying key characters, and performing community detection, valuable insights were gained into the narrative dynamics and the underlying connections that shape the world of Westeros.
