Game of Thrones ⚔️ 🐉: Visualising networks using Networkx, Pyvis and Community detection

LucaSantagata


In this article, we apply network analysis, a field that draws on mathematics and computer science, to the world of “Game of Thrones.” Using Python libraries such as NetworkX, Pandas, and Pyvis, we will examine the intricate character relationships within the popular series. This analysis will provide insights into the dynamics of the narrative, identify key characters, and reveal the underlying connections that shape the world of Westeros.

In the course of this exploration, we will:

  • Analyse the Game of Thrones dataset to understand the character interactions and associations.
  • Visualise the networks using NetworkX and experiment with different layouts to find the most readable representation.
  • Use Pyvis for interactive network visualisations and compare its effectiveness with NetworkX.
  • Perform community detection to identify clusters or communities within the network based on the patterns of connections between nodes.

Dataset

The datasets we will use offer a glimpse into the character relationships in the Game of Thrones book series. There are five CSV files, one per book. Each row represents a connection between two characters, and the “weight” column measures the significance or intensity of that relationship. By analysing this data, we can unravel the web of relationships, identify key characters, study how the network evolves across books, and gain a deeper understanding of the series’ plot lines and character dynamics.

Columns:

  • Source: This column identifies the character from which a relationship originates.
  • Target: This column designates the character at the receiving end of the relationship.
  • Type: The type column describes the nature of the connection, indicating that all relationships are undirected, implying mutual interactions.
  • weight: This column assigns a numerical value to each relationship, providing a measure of its significance or intensity.
  • book: This column specifies the book number, making it possible to tell apart relationships from different books.

The datasets can be found on my GitHub.

Network Analysis Book 1

from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
# Importing the required libraries
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Reading in the data of book 1
d1=pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Complex Networks (blog)/book1.csv')

# Displaying the data (the notebook shows only the first and last rows)
d1
     Source                           Target              Type        weight  book
0    Addam-Marbrand                   Jaime-Lannister     Undirected  3       1
1    Addam-Marbrand                   Tywin-Lannister     Undirected  6       1
2    Aegon-I-Targaryen                Daenerys-Targaryen  Undirected  5       1
3    Aegon-I-Targaryen                Eddard-Stark        Undirected  4       1
4    Aemon-Targaryen-(Maester-Aemon)  Alliser-Thorne      Undirected  4       1
...  ...                              ...                 ...         ...     ...
679  Tyrion-Lannister                 Willis-Wode         Undirected  4       1
680  Tyrion-Lannister                 Yoren               Undirected  10      1
681  Tywin-Lannister                  Varys               Undirected  4       1
682  Tywin-Lannister                  Walder-Frey         Undirected  8       1
683  Waymar-Royce                     Will-(prologue)     Undirected  18      1

684 rows × 5 columns

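We can count the number of unique characters that appear in the Source column. Characters that appear only as a Target are not included, so this undercounts the full cast (the graph we build later has 187 nodes).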

# Printing out the number of unique characters
print("Number of unique characters: ", len(d1['Source'].unique()))
Number of unique characters:  139
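
The count above looks only at the Source column. As a minimal sketch of how both columns could be taken into account, the snippet below uses a small made-up edge list; the toy rows and the variable names are assumptions for illustration, not part of the real dataset.

```python
import pandas as pd

# Toy edge list with the same column names as the book CSVs (illustrative data only)
toy = pd.DataFrame({
    "Source": ["A", "A", "B"],
    "Target": ["B", "C", "D"],
    "weight": [3, 5, 2],
})

# Characters that appear in the Source column only
source_only_count = toy["Source"].nunique()

# Characters that appear in either column: flatten both columns, then deduplicate
all_characters = pd.unique(toy[["Source", "Target"]].values.ravel())
all_count = len(all_characters)

print("Unique in Source:", source_only_count)
print("Unique in Source or Target:", all_count)
```

On the toy data, C and D never appear as a Source, so the two counts differ. The same gap would explain why 139 is smaller than the number of nodes in the graph.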

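We can also calculate the average number of interactions per character. Because the data is grouped by Source, each edge is credited only to the character listed first, so the figures below are a lower bound on each character's true number of connections.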

# Number of interactions per character
interactions_per_character = d1.groupby('Source')['Target'].count()
print("Average number of interactions per character: ", round(interactions_per_character.mean(),3))
Average number of interactions per character:  4.921
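
Since the relationships are undirected, one possible alternative is to credit every edge to both of its endpoints. The sketch below does this on a made-up edge list (the data and the names endpoints, degree and mean_degree are illustrative assumptions).

```python
import pandas as pd

# Toy undirected edge list (illustrative data only)
toy = pd.DataFrame({
    "Source": ["A", "A", "B"],
    "Target": ["B", "C", "D"],
})

# Stack both columns so that every edge contributes to both of its endpoints
endpoints = pd.concat([toy["Source"], toy["Target"]], ignore_index=True)
degree = endpoints.value_counts()

# The mean degree equals 2 * (number of edges) / (number of characters)
mean_degree = degree.mean()

print(degree.sort_index())
print("Average interactions per character:", round(mean_degree, 3))
```

Counted this way, each edge appears twice in the total, which is why the mean equals twice the number of edges divided by the number of characters.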

Let’s plot the distribution of the number of interactions per character.

# Distribution of Interactions per Character
sns.set_style("whitegrid")

plt.figure(figsize=(10,6))
sns.histplot(interactions_per_character, bins=30, color='skyblue', edgecolor='black', kde=True)
plt.title('Distribution of Interactions per Character', fontsize=15)
plt.xlabel('Number of Interactions', fontsize=12)
plt.ylabel('Number of Characters', fontsize=12)
plt.show()

The distribution is strongly right-skewed: most characters have only a few interactions, while a handful of outliers on the right have many.

Ranking the characters by number of interactions and taking the top 10 shows which characters are the outliers and sit at the centre of this network. As before, only edges where the character appears as Source are counted.

# top 10 characters by number of interactions
top_characters = interactions_per_character.sort_values(ascending=False).head(10)
print("Top 10 characters by number of interactions: \n", top_characters)
Top 10 characters by number of interactions: 
 Source
Eddard-Stark          51
Catelyn-Stark         39
Bran-Stark            30
Arya-Stark            27
Cersei-Lannister      23
Joffrey-Baratheon     19
Daenerys-Targaryen    18
Jaime-Lannister       18
Jon-Snow              17
Drogo                 15
Name: Target, dtype: int64
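
The ranking above counts edges. Another option, sketched here under the assumption that the weight column should matter too, is to rank characters by the total weight of all edges they take part in. The toy data and the names long_form and strength are made up for illustration.

```python
import pandas as pd

# Toy weighted edge list (illustrative data only)
toy = pd.DataFrame({
    "Source": ["A", "A", "B"],
    "Target": ["B", "C", "D"],
    "weight": [3, 5, 1],
})

# Repeat each edge weight once per endpoint, then sum the weights per character
long_form = pd.concat([
    toy[["Source", "weight"]].rename(columns={"Source": "character"}),
    toy[["Target", "weight"]].rename(columns={"Target": "character"}),
], ignore_index=True)

strength = long_form.groupby("character")["weight"].sum().sort_values(ascending=False)
top_by_strength = strength.head(2)

print(top_by_strength)
```

A character with few but very heavy edges can rank higher here than in the edge-count ranking, so the two views complement each other.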

Analyse the edges of the network

We can also analyse the edges rather than the nodes of a network. The code below lists the pairs of characters (edges) with the most interactions (highest weights) and plots a histogram of the edge weights. The edge weight is the sum of the ‘weight’ column for each pair of characters, which represents the number of interactions between them.

# Create a DataFrame that counts the number of interactions (weights) between each pair of characters
edge_weights = d1.groupby(['Source', 'Target'])['weight'].sum().reset_index(name='weight')

# Sort the DataFrame by the weight and display the top edges
top_edges = edge_weights.sort_values(by='weight', ascending=False).head(10)
print("Top 10 edges by weight: \n", top_edges)
Top 10 edges by weight: 
                  Source            Target  weight
329        Eddard-Stark  Robert-Baratheon     291
134          Bran-Stark        Robb-Stark     112
62           Arya-Stark       Sansa-Stark     104
249  Daenerys-Targaryen             Drogo     101
479   Joffrey-Baratheon       Sansa-Stark      87
504            Jon-Snow     Samwell-Tarly      81
454        Jeor-Mormont          Jon-Snow      81
320        Eddard-Stark     Petyr-Baelish      81
257  Daenerys-Targaryen     Jorah-Mormont      75
225    Cersei-Lannister  Robert-Baratheon      72

The top edge is between Eddard Stark and Robert Baratheon, with a weight of 291. In other words, Eddard (Ned) Stark and Robert Baratheon had 291 interactions, by far the strongest connection in the book. The two had a close relationship that dated back to their youth: they were longtime friends and trusted allies. Their bond was strengthened during Robert’s Rebellion, the war that overthrew the Mad King Aerys II Targaryen and placed Robert on the Iron Throne.

The second edge is between Bran Stark and Robb Stark, with a weight of 112, indicating a strong connection between these two Stark brothers.

The third edge is between Arya Stark and Sansa Stark, with a weight of 104, indicating a similarly strong connection between these two Stark sisters.
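
One caveat with grouping by Source and Target: in an undirected network, a pair listed as (A, B) in one row and (B, A) in another would end up in two separate groups. For Book 1 this does not seem to matter, since the file has 684 rows and the graph built from it has 684 edges. As a hedged sketch for data where it might, the pair can be put into a canonical order before grouping. The toy data and the column names node_a and node_b are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data where the same pair appears in both orders (illustrative only)
toy = pd.DataFrame({
    "Source": ["A", "B", "A"],
    "Target": ["B", "A", "C"],
    "weight": [3, 4, 5],
})

# Sort each pair alphabetically so that (A, B) and (B, A) share the same key
pairs = np.sort(toy[["Source", "Target"]].to_numpy(), axis=1)
canonical = toy.assign(node_a=pairs[:, 0], node_b=pairs[:, 1])

# Sum the weights per unordered pair
pair_weights = (
    canonical.groupby(["node_a", "node_b"])["weight"].sum().reset_index()
)

print(pair_weights)
```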

Let’s plot the distribution.

# Plot a histogram of edge weights
plt.figure(figsize=(10,6))
sns.histplot(edge_weights['weight'], bins=30, color='skyblue', edgecolor='black', kde=False)
plt.title('Distribution of Edge Weights', fontsize=15)
plt.xlabel('Edge Weight', fontsize=12)
plt.ylabel('Number of Edges', fontsize=12)
plt.show()

Visualise the networks using Networkx

NetworkX is a Python library that provides tools for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It is widely used for tasks related to network analysis and graph theory.

We will be using it to make visualisations.

First, we create an empty graph object and iterate through the data frame to add edges.

# Creating an empty graph object
b1 = nx.Graph()

# Iterating through the DataFrame to add edges
for _, edge in d1.iterrows():
    b1.add_edge(edge['Source'], edge['Target'], weight=edge['weight'])

# Printing out the number of nodes and edges in the graph
print("Total number of nodes: ", b1.number_of_nodes())
print("Total number of edges: ", b1.number_of_edges())
Total number of nodes:  187
Total number of edges:  684
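
As an alternative to looping over the rows, NetworkX also provides nx.from_pandas_edgelist, which builds the graph from a data frame in one call. The sketch below uses a made-up edge list with the same column names as the book files; the toy data and the variable name G are assumptions for illustration.

```python
import networkx as nx
import pandas as pd

# Toy edge list with the same column names as the book CSVs (illustrative data only)
toy = pd.DataFrame({
    "Source": ["A", "A", "B"],
    "Target": ["B", "C", "D"],
    "weight": [3, 5, 2],
})

# Build an undirected graph and keep the 'weight' column as an edge attribute
G = nx.from_pandas_edgelist(toy, source="Source", target="Target", edge_attr="weight")

print("Nodes:", G.number_of_nodes())
print("Edges:", G.number_of_edges())
print("Weight of A-C:", G["A"]["C"]["weight"])
```

For the real data, the same call with d1 in place of toy should give a graph equivalent to b1.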

I will plot the network of Book 1 using three different layouts from the NetworkX library to see which one is the most readable.

  • nx.draw(): Uses the default spring layout.
  • nx.draw_circular(): Positions nodes on a circle.
  • nx.draw_kamada_kawai(): Positions nodes using the force-directed method of Kamada and Kawai.
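
Each of these drawing functions computes node positions behind the scenes. As a rough sketch, the positions can also be computed explicitly with the matching layout functions and then passed to nx.draw. The example uses NetworkX's built-in karate club graph as a stand-in for the Book 1 network, and the seed value is an arbitrary choice.

```python
import networkx as nx

# Small built-in example graph standing in for the Book 1 network
G = nx.karate_club_graph()

# Fixing the seed makes the spring layout reproducible between runs
pos_spring = nx.spring_layout(G, seed=42)
pos_circular = nx.circular_layout(G)

# Each layout is a dict mapping node -> array of (x, y) coordinates
print(len(pos_spring), len(pos_circular))

# The positions can then be passed to the drawing function, for example:
# nx.draw(G, pos=pos_spring, with_labels=True)
# nx.kamada_kawai_layout(G) returns the same kind of dict (it depends on SciPy)
```

Computing the positions once and reusing them keeps the nodes in the same place across several plots, which makes different drawings easier to compare.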

nx.draw()

plt.figure(figsize=(20, 20))
nx.draw(b1, with_labels=True)
plt.show()