Data Science / Python Code Snippets
A quick start guide to visualizing a Pandas dataframe using networkx and matplotlib.
I just discovered, quite accidentally, how to export data from JIRA, so naturally I began to think of ways to visualize the information and potentially glean some insight from the dataset. I stumbled upon the concept of network graphs, and the idea quickly captured my imagination. I realized that I can use them to tell stories not only about the relationships between people but also between words! But NLP is a big topic, so how about we walk first and run later?
This is just a very gentle introduction so we won’t be using any fancy code here.
Network graphs “show interconnections between a set of entities”¹, where the entities are nodes and the connections between them are represented through links, or edges¹. In the graph below, the dots are the nodes and the lines are the edges.
In this post, I’ll share the code that will let us quickly visualize a Pandas dataframe using a popular network graph package: networkx.
First, let’s get our data and load it into a dataframe. You can download the sample dataset here.
import pandas as pd
df = pd.read_csv('jira_sample.csv')
Second, let’s trim the dataframe to only include the columns we want to examine. In this case, we only want the columns ‘Assignee’ and ‘Reporter’.
df1 = df[['Assignee', 'Reporter']]
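If you don't have the CSV handy, here is a minimal sketch of the same two steps using a small inline dataframe. The rows and the 'Issue key' column are made up for illustration (the real export has more columns), and the dropna() call is my own addition, on the assumption that some issues may have no assignee.

```python
import pandas as pd

# Hypothetical stand-in for jira_sample.csv; values are illustrative only.
df = pd.DataFrame({
    "Issue key": ["PRJ-1", "PRJ-2", "PRJ-3", "PRJ-4"],
    "Assignee": ["barbie.doll", "susan.lee", None, "joe.appleseed"],
    "Reporter": ["susan.lee", "joe.appleseed", "barbie.doll", "barbie.doll"],
})

# Keep only the two columns of interest, then drop rows where either
# value is missing so that an empty cell cannot end up as a node later.
df1 = df[["Assignee", "Reporter"]].dropna()

print(df1)
```

Whether to drop unassigned issues or keep them under a placeholder name is a judgment call that depends on the story you want the graph to tell.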
Third, it’s time to create the world in which the graph will exist. If you haven’t already, install the networkx package with a quick pip install networkx.
import networkx as nx
G = nx.Graph()
Then, let’s populate the graph with the 'Assignee' and 'Reporter' columns from the df1 dataframe.
G = nx.from_pandas_edgelist(df1, 'Assignee', 'Reporter')
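To see what this call actually builds, here is a small sketch on made-up rows (the edge data is hypothetical). Each row becomes one edge between an assignee and a reporter. Because the default graph type is undirected, a pair that appears more than once collapses into a single edge.

```python
import networkx as nx
import pandas as pd

# Hypothetical rows; the first and third row describe the same pair.
df1 = pd.DataFrame({
    "Assignee": ["barbie.doll", "susan.lee", "barbie.doll", "joe.appleseed"],
    "Reporter": ["susan.lee", "joe.appleseed", "susan.lee", "barbie.doll"],
})

# from_pandas_edgelist returns a brand-new graph built from the two columns.
G = nx.from_pandas_edgelist(df1, "Assignee", "Reporter")

print(sorted(G.nodes))
print(G.number_of_edges())
```

Note that the function returns a new graph, so the result simply replaces whatever G held before. If the direction of the relationship matters (who reported to whom), passing create_using=nx.DiGraph is one option worth exploring.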
Next, we’ll materialize the graph we created with the help of matplotlib for formatting.
from matplotlib.pyplot import figure
figure(figsize=(10, 8))
nx.draw_shell(G, with_labels=True)
The most important line in the block above is nx.draw_shell(G, with_labels=True). It tells the computer to draw the graph G using a shell layout, which arranges the nodes on concentric circles, with the labels for entities turned on.
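For the curious, here is a rough sketch of what the shell drawing amounts to: compute the shell positions first, then hand them to the general-purpose nx.draw. The edges and the output filename network_graph.png are placeholders of mine, and the Agg backend is only there so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical edges standing in for the JIRA data.
G = nx.Graph()
G.add_edges_from([
    ("barbie.doll", "susan.lee"),
    ("barbie.doll", "joe.appleseed"),
    ("susan.lee", "joe.appleseed"),
])

# shell_layout returns a dict mapping each node to its (x, y) position.
pos = nx.shell_layout(G)

plt.figure(figsize=(10, 8))
nx.draw(G, pos=pos, with_labels=True)
plt.savefig("network_graph.png")  # placeholder filename
plt.close()
```

Keeping the positions in a variable is handy if you later want to redraw the same layout with different colors or node sizes.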
Voilà! We got ourselves a network graph:
Right off the bat, we can tell that there’s a heavy concentration of lines originating from three major players, ‘barbie.doll’, ‘susan.lee’, and ‘joe.appleseed’. Of course, just to be sure, it’s always a good idea to confirm our ‘eyeballing’ with some hard numbers.
Bonus Round
Let’s check out ‘barbie.doll’.
G['barbie.doll']
To see how many connections ‘barbie.doll’ has, let’s use len():
len(G['barbie.doll'])
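Here is a quick sketch of what those two lookups return, using a few invented edges (‘ken.carson’ is a made-up name, not from the sample data). Indexing the graph by a node gives a view of that node's neighbors, so its length is the number of distinct people the node is connected to.

```python
import networkx as nx

# Hypothetical edges; only the shape of the result matters here.
G = nx.Graph()
G.add_edges_from([
    ("barbie.doll", "susan.lee"),
    ("barbie.doll", "joe.appleseed"),
    ("barbie.doll", "ken.carson"),
    ("susan.lee", "joe.appleseed"),
])

# G[node] behaves like a read-only mapping keyed by neighbor name.
neighbors = sorted(G["barbie.doll"])
count = len(G["barbie.doll"])

print(neighbors)
print(count)
```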
Next, let’s create another dataframe that shows the nodes and their number of connections.
leaderboard = {}
for x in G.nodes:
    leaderboard[x] = len(G[x])
s = pd.Series(leaderboard, name='connections')
df2 = s.to_frame().sort_values('connections', ascending=False)
In the code block above, we first initialized an empty dictionary called ‘leaderboard’ and then used a simple for-loop to populate it with names and their number of connections. Then, we created a series out of the dictionary. Finally, we turned that series into another dataframe using to_frame() and sorted it by number of connections in descending order.
To display the dataframe, we simply use df2.head() and we got ourselves a leaderboard!
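As an alternative sketch, the same leaderboard can be built without the explicit loop by leaning on networkx's built-in degree view. The edges are again made up. One caveat: if someone both reports and is assigned the same issue, the graph gets a self-loop, which the degree counts twice while the neighbor-counting approach counts it once.

```python
import networkx as nx
import pandas as pd

# Hypothetical edges standing in for the JIRA data.
G = nx.Graph()
G.add_edges_from([
    ("barbie.doll", "susan.lee"),
    ("barbie.doll", "joe.appleseed"),
    ("barbie.doll", "ken.carson"),
    ("susan.lee", "joe.appleseed"),
])

# dict(G.degree) maps every node to its number of connections.
df2 = (
    pd.Series(dict(G.degree), name="connections")
    .sort_values(ascending=False)
    .to_frame()
)

print(df2.head())
```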
And that’s it! With a few simple lines of code, we quickly made a network graph from a Pandas dataframe and even displayed a table with names and number of connections.
I hope you enjoyed this one. Network graph analysis is a big topic but I hope that this gentle introduction will encourage you to explore more and expand your repertoire.
In the next article, I’ll walk through Power BI’s custom visual called ‘Network Navigator’ to create a network graph with a few simple clicks of the mouse.
Stay tuned!
You can reach me on Twitter or LinkedIn.
[1]: Data-To-Viz. (May 15, 2020). Network Diagram. https://www.data-to-viz.com/graph/network.html