How do you plot cross tables in Python?

In this article, we discuss how to create a bar plot from a pandas crosstab in Python. First, let us look at what a crosstab is: a simple cross-tabulation of two or more variables.

What is cross-tabulation?

A cross-tabulation is a table that counts how often each combination of values of two or more variables occurs, which helps us understand the relationship between those variables. It gives a clear picture of the data and makes analysis easier.

For example, take a data set on the handedness of people that includes each person's nationality, sex, age, and name. Suppose we want to analyze the relationship between nationality and handedness. A crosstab shows how many people of each nationality fall into each handedness category.
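
To make the idea concrete, the sketch below counts (nationality, handedness) pairs by hand and then builds the same counts with pd.crosstab. The five rows of data are invented for illustration; only the column names follow the example above.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chloe", "Dev", "Emma"],
    "Nationality": ["India", "USA", "USA", "India", "USA"],
    "Handedness": ["Right", "Left", "Right", "Right", "Right"],
})

# Manual cross-tabulation: count each (nationality, handedness) pair.
pair_counts = Counter(zip(df["Nationality"], df["Handedness"]))

# pandas arranges the same counts as a table and fills absent pairs with 0.
table = pd.crosstab(df["Nationality"], df["Handedness"])
print(table)
```

Each cell of the table is the number of rows that share that row label and column label, so the cell for India and Right matches pair_counts[("India", "Right")].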

Crosstab using pandas

Before creating the bar plot, we first build the cross-tabulation with pandas.

Syntax: pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

Code:

Python

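import pandas as pd

# Load the data set (Data.csv is expected to have Nationality and Handedness columns)
df = pd.read_csv('Data.csv')

# Count how often each Handedness value occurs for each Nationality
crosstb = pd.crosstab(df.Nationality, df.Handedness)
print(crosstb)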

Output: [Figure: crosstab of Nationality by Handedness]

Creating bar plots

Bar graphs are most often used to compare different groups or to track changes over time. Plotting a crosstab as a bar plot is an efficient way to summarize it and analyze it further.

Syntax: DataFrame.plot.bar(x=None, y=None, **kwargs)

Code:

Python3

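import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('Data.csv')
crosstb = pd.crosstab(df.Nationality, df.Handedness)

# One group of bars per nationality, one bar per handedness value
barplot = crosstb.plot.bar(rot=0)
plt.show()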

Output: [Figure: bar plot of the Nationality by Handedness crosstab]

Stacked barplot

Here we create a stacked bar plot from the DataFrame by passing stacked=True.

Syntax: DataFrame.plot(kind="bar", stacked=True, rot=0)

Code:

Python

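import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('Data.csv')
crosstb = pd.crosstab(df.Nationality, df.Handedness)

# stacked=True draws one bar per nationality, split by handedness
pl = crosstb.plot(kind="bar", stacked=True, rot=0)
plt.show()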

Output: [Figure: stacked bar plot of the Nationality by Handedness crosstab]
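
Because Data.csv is not included here, the following sketch draws the grouped and the stacked variant side by side from a small in-memory data set. The data values, the Agg backend choice, and the output file name crosstab_bars.png are assumptions made so the script can run on its own without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for Data.csv
df = pd.DataFrame({
    "Nationality": ["India", "USA", "USA", "India", "USA", "UK"],
    "Handedness": ["Right", "Left", "Right", "Right", "Right", "Left"],
})
crosstb = pd.crosstab(df["Nationality"], df["Handedness"])

# Draw both variants on shared figure axes for comparison
fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))
crosstb.plot(kind="bar", rot=0, ax=ax_grouped, title="Grouped")
crosstb.plot(kind="bar", stacked=True, rot=0, ax=ax_stacked, title="Stacked")

fig.tight_layout()
fig.savefig("crosstab_bars.png")
```

The grouped plot makes it easy to compare individual counts, while the stacked plot emphasizes the total per nationality.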

Creating bar plot using more than two variables from the crosstab

In the example above, we looked at the relationship between nationality and handedness. We can also create a crosstab from more than two variables by passing a list of columns, as the following example shows.

Python3

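import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('Data.csv')

# Rows are Sex; columns are every (Nationality, Handedness) combination
crosstb = pd.crosstab(df.Sex, [df.Nationality, df.Handedness])

a = crosstb.plot(kind='bar', rot=0)
# Place the legend outside the axes so it does not cover the bars
a.legend(title='Nationality, Handedness', bbox_to_anchor=(1, 1.02),
         loc='upper left')
plt.show()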

Output: [Figure: bar plot of Sex against Nationality and Handedness]


pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

Compute a simple cross tabulation of two (or more) factors.

By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed.

Parameters

index : array-like, Series, or list of arrays/Series
    Values to group by in the rows.

columns : array-like, Series, or list of arrays/Series
    Values to group by in the columns.

values : array-like, optional
    Array of values to aggregate according to the factors. Requires aggfunc be specified.

rownames : sequence, default None
    If passed, must match number of row arrays passed.

colnames : sequence, default None
    If passed, must match number of column arrays passed.

aggfunc : function, optional
    If specified, requires values be specified as well.

margins : bool, default False
    Add row/column margins (subtotals).

margins_name : str, default 'All'
    Name of the row/column that will contain the totals when margins is True.

dropna : bool, default True
    Do not include columns whose entries are all NaN.

normalize : bool, {'all', 'index', 'columns'}, or {0,1}, default False
    Normalize by dividing all values by the sum of values.

  • If passed 'all' or True, will normalize over all values.

  • If passed 'index' will normalize over each row.

  • If passed 'columns' will normalize over each column.

  • If margins is True, will also normalize margin values.

Returns

DataFrame
    Cross tabulation of the data.

Notes

Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified.

Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category.

In the event that there aren’t overlapping indexes an empty DataFrame will be returned.

Reference the user guide for more examples.

Examples

>>> a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
...               "bar", "bar", "foo", "foo", "foo"], dtype=object)
>>> b = np.array(["one", "one", "one", "two", "one", "one",
...               "one", "two", "two", "two", "one"], dtype=object)
>>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
...               "shiny", "dull", "shiny", "shiny", "shiny"],
...              dtype=object)
>>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
b   one        two
c   dull shiny dull shiny
a
bar    1     2    1     0
foo    2     2    1     2

Here ‘c’ and ‘f’ are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data.

>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> pd.crosstab(foo, bar)
col_0  d  e
row_0
a      1  0
b      0  1
>>> pd.crosstab(foo, bar, dropna=False)
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0
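
The examples above only show frequency counts. As a rough sketch of the margins, values, and aggfunc parameters described earlier, the snippet below adds totals to a count table and then computes a mean per cell. The data set, including the Age values, is invented for illustration.

```python
import pandas as pd

# Hypothetical data; the Age values are invented for illustration.
df = pd.DataFrame({
    "Nationality": ["India", "USA", "USA", "India", "USA"],
    "Handedness": ["Right", "Left", "Right", "Right", "Right"],
    "Age": [30, 40, 20, 50, 40],
})

# Frequency table with totals in an extra "All" row and column.
counts = pd.crosstab(df["Nationality"], df["Handedness"], margins=True)

# Mean age per cell instead of counts: values and aggfunc go together.
mean_age = pd.crosstab(df["Nationality"], df["Handedness"],
                       values=df["Age"], aggfunc="mean")

print(counts)
print(mean_age)
```

Combinations that never occur (here India and Left) have no values to aggregate, so that cell of mean_age is NaN rather than 0.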

How do you create a cross table in Python?

Step 1 - Import the library: import pandas as pd. ...
Step 2 - Set up the data. We have created a dataset by making a dictionary with features and passing it through the DataFrame function. ...
Step 3 - Make the crosstab table.
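
The three steps above can be sketched as follows. The feature names and values in the dictionary are hypothetical and stand in for whatever data set is being analyzed.

```python
# Step 1: import the library
import pandas as pd

# Step 2: set up the data from a dictionary (features and values are hypothetical)
data = {
    "Sex": ["F", "M", "F", "M", "F"],
    "Handedness": ["Right", "Right", "Left", "Right", "Right"],
}
df = pd.DataFrame(data)

# Step 3: make the crosstab table
table = pd.crosstab(df["Sex"], df["Handedness"])
print(table)
```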

What is cross table in Python?

The crosstab() function computes a simple cross-tabulation of two (or more) factors. By default, it computes a frequency table of the factors unless an array of values and an aggregation function are passed.

How do you show percentages in crosstab Python?

Pass normalize='index' to pd.crosstab, then round the result and multiply by 100 to display percentages:

pd.crosstab(df.A, df.B, normalize='index').round(4)*100

B          A      B      C
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33
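
As a self-contained sketch of the same idea, the snippet below computes row percentages and percentages of the grand total. The data values are invented for illustration.

```python
import pandas as pd

# Hypothetical data; the values are made up for illustration.
df = pd.DataFrame({
    "Nationality": ["India", "USA", "USA", "India", "USA", "USA"],
    "Handedness": ["Right", "Left", "Right", "Right", "Right", "Left"],
})

# Share of each handedness within a nationality: every row sums to 100.
row_pct = pd.crosstab(df["Nationality"], df["Handedness"],
                      normalize="index") * 100

# Share of the grand total: all cells together sum to 100.
total_pct = pd.crosstab(df["Nationality"], df["Handedness"],
                        normalize="all") * 100

print(row_pct.round(2))
print(total_pct.round(2))
```

Use normalize="columns" instead when each column, rather than each row, should sum to 100.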

What is the difference between crosstab and pivot table?

With a basic crosstab, you would have to go back to the program and create a separate crosstab to see information on individual products. Pivot tables let the user filter their data, add or remove custom fields, and change the appearance of the report.
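
The comparison above describes reporting tools in general. Within pandas itself the two are closely related, and the sketch below shows one way to reproduce a crosstab's counts with DataFrame.pivot_table. The data set is hypothetical, and counting the Name column is just one possible way to get a frequency per cell.

```python
import pandas as pd

# Hypothetical data; the values are made up for illustration.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chloe", "Dev", "Emma"],
    "Nationality": ["India", "USA", "USA", "India", "USA"],
    "Handedness": ["Right", "Left", "Right", "Right", "Right"],
})

# crosstab takes the columns directly and counts by default.
ct = pd.crosstab(df["Nationality"], df["Handedness"])

# pivot_table works on the DataFrame and needs an explicit aggregation;
# counting the non-null names per cell gives the same frequencies.
pt = df.pivot_table(index="Nationality", columns="Handedness",
                    values="Name", aggfunc="count", fill_value=0)

print(ct)
print(pt)
```

In this sketch crosstab is the shorter call for plain counts, while pivot_table is the more general tool when a specific column has to be aggregated.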