Convert dataframe to graph python

A quick start guide to visualizing a Pandas dataframe using networkx and matplotlib.

Photo by Alina Grubnyak on Unsplash

I just discovered, quite accidentally, how to export data from JIRA, so naturally I began to think of ways to visualize the information and glean some insight from the dataset. I stumbled upon the concept of network graphs, and the idea quickly captured my imagination. I realized that I could use them to tell stories not only about the relationships between people but about those between words as well! But NLP is a big topic, so how about we walk first and run later?

This is just a very gentle introduction so we won’t be using any fancy code here.

Network graphs “show interconnections between a set of entities”¹, where the entities are nodes and the connections between them are represented by links, or edges¹. In the graph below, the dots are the nodes and the lines are the edges.

Martin Grandjean / CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)
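
To make the vocabulary concrete, here is a minimal sketch of a graph built by hand. The names are made up for illustration and are not part of the sample dataset.

```python
import networkx as nx

# Hypothetical people, used only to illustrate nodes and edges.
toy = nx.Graph()
toy.add_edge("alice", "bob")    # adding an edge also creates its two nodes
toy.add_edge("bob", "carol")

print(sorted(toy.nodes))        # ['alice', 'bob', 'carol']
print(toy.number_of_edges())    # 2
```

Three dots and two lines: that is all a network graph is under the hood.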

In this post, I’ll share the code that will let us quickly visualize a Pandas dataframe using a popular network graph package: networkx.

First, let’s get our data and load it into a dataframe. You can download the sample dataset here.

import pandas as pd
df = pd.read_csv('jira_sample.csv')

Second, let’s trim the dataframe to only include the columns we want to examine. In this case, we only want the columns ‘Assignee’ and ‘Reporter’.

df1 = df[['Assignee', 'Reporter']]
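
If you don't have the sample file at hand, the same two steps can be tried on a small in-memory dataframe. This is only a sketch: the rows and the 'Issue key' column below are invented stand-ins, and a real JIRA export will have its own columns.

```python
import pandas as pd

# Hypothetical rows standing in for jira_sample.csv.
df = pd.DataFrame({
    "Issue key": ["PRJ-1", "PRJ-2", "PRJ-3"],
    "Assignee": ["barbie.doll", "susan.lee", "barbie.doll"],
    "Reporter": ["joe.appleseed", "barbie.doll", "susan.lee"],
})

# Selecting with a list of column names returns a new two-column dataframe.
df1 = df[["Assignee", "Reporter"]]

print(list(df1.columns))  # ['Assignee', 'Reporter']
print(df1.shape)          # (3, 2)
```

If your export contains issues with an empty Assignee cell, pandas reads it as a missing value; calling df1.dropna() before building the graph is one way to leave those rows out.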

Third, it’s time to create the world in which the graph will exist. If you haven’t already, install the networkx package by doing a quick pip install networkx.

import networkx as nx
G = nx.Graph()

Then, let’s populate the graph with the 'Assignee' and 'Reporter' columns from the df1 dataframe.

G = nx.from_pandas_edgelist(df1, 'Assignee', 'Reporter')
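
Here is a small sketch of what that call does, using invented assignee/reporter pairs rather than the real dataset. Each row becomes one edge, and because nx.Graph is undirected and keeps at most one edge per pair of nodes, repeated or reversed pairs collapse into a single edge.

```python
import pandas as pd
import networkx as nx

# Hypothetical assignee/reporter pairs, not the article's dataset.
pairs = pd.DataFrame({
    "Assignee": ["barbie.doll", "susan.lee", "barbie.doll", "barbie.doll"],
    "Reporter": ["joe.appleseed", "barbie.doll", "susan.lee", "joe.appleseed"],
})

# Each row links the value in the source column to the value in the target column.
H = nx.from_pandas_edgelist(pairs, source="Assignee", target="Reporter")

print(H.number_of_nodes())  # 3
print(H.number_of_edges())  # 2 (four rows, but only two distinct pairs)
```

So the graph records who has worked with whom, not how often. If the number of shared tickets matters to you, you would need to count the pairs in pandas first.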

Next, we’ll materialize the graph we created with the help of matplotlib for formatting.

from matplotlib.pyplot import figure
figure(figsize=(10, 8))
nx.draw_shell(G, with_labels=True)

The most important line in the block above is nx.draw_shell(G, with_labels=True). It tells the computer to draw the graph G using a shell layout, which arranges the nodes in concentric circles, with the labels for entities turned on.
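
If you are running this as a script rather than in a notebook, a sketch like the following saves the picture to a file instead. The edges, the styling options, and the file name network_graph.png are my own placeholders, not part of the original walkthrough.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical edges for illustration only.
demo = nx.Graph([
    ("barbie.doll", "joe.appleseed"),
    ("barbie.doll", "susan.lee"),
])

fig = plt.figure(figsize=(10, 8))
# Extra keyword arguments are passed on to networkx's drawing routine.
nx.draw_shell(demo, with_labels=True, node_color="lightblue", font_size=10)
fig.savefig("network_graph.png")  # placeholder output file name
```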

Voilà! We got ourselves a network graph:

Right off the bat, we can tell that there’s a heavy concentration of lines originating from three major players, ‘barbie.doll’, ‘susan.lee’, and ‘joe.appleseed’. Of course, just to be sure, it’s always a good idea to confirm our ‘eyeballing’ with some hard numbers.

Bonus Round

Let’s check out ‘barbie.doll’.

G['barbie.doll']

To see how many connections ‘barbie.doll’ has, let’s use len():

len(G['barbie.doll'])
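
As a rough sketch of what those two lines return, here is the same lookup on a tiny graph with invented edges. Indexing a graph by a node name gives a dictionary-like view of that node's neighbors, so len() counts distinct neighbors.

```python
import networkx as nx

# Hypothetical edges for illustration only.
demo = nx.Graph([
    ("barbie.doll", "joe.appleseed"),
    ("barbie.doll", "susan.lee"),
    ("susan.lee", "joe.appleseed"),
])

neighbors = demo["barbie.doll"]   # view keyed by neighbor name
print(sorted(neighbors))          # ['joe.appleseed', 'susan.lee']
print(len(neighbors))             # 2
```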

Next, let’s create another dataframe that shows the nodes and their number of connections.

leaderboard = {}
for x in G.nodes:
    leaderboard[x] = len(G[x])
s = pd.Series(leaderboard, name='connections')
df2 = s.to_frame().sort_values('connections', ascending=False)

In the code block above, we first initialized an empty dictionary called ‘leaderboard’ and then used a simple for-loop to populate the dictionary with names and number of connections. Then, we created a series out of the dictionary. Finally, we created another dataframe from the series that we created using to_frame().

To display the top five rows of the dataframe, we simply call df2.head(), and we’ve got ourselves a leaderboard!
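
A more compact variation, which is my own alternative rather than the approach used above, leans on the graph's built-in degree view. The sketch below uses invented edges. One caveat: degree counts a self-loop twice (for example, someone who reported and was assigned the same issue), whereas len(G[x]) counts it once, so the two can differ on real data.

```python
import pandas as pd
import networkx as nx

# Hypothetical edges for illustration only.
demo = nx.Graph([
    ("barbie.doll", "joe.appleseed"),
    ("barbie.doll", "susan.lee"),
    ("barbie.doll", "ken.carson"),
    ("susan.lee", "joe.appleseed"),
])

# dict(demo.degree) maps each node to its number of connections.
df2_alt = (
    pd.Series(dict(demo.degree), name="connections")
    .sort_values(ascending=False)
    .to_frame()
)

print(df2_alt.head())
```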

And that’s it! With a few simple lines of code, we quickly made a network graph from a Pandas dataframe and even displayed a table with names and number of connections.

I hope you enjoyed this one. Network graph analysis is a big topic but I hope that this gentle introduction will encourage you to explore more and expand your repertoire.

In the next article, I’ll walk through Power BI’s custom visual called ‘Network Navigator’ to create a network graph with a few simple clicks of the mouse.

Stay tuned!

You can reach me on Twitter or LinkedIn.

[1]: Data-To-Viz. (May 15, 2020). Network Diagram https://www.data-to-viz.com/graph/network.html
