How do you show percentages in crosstab python?

Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D' : np.random.randn(24),
                   'E' : np.random.randn(24)})


pd.crosstab(df.A,df.B)


B       A    B    C
A               
one     4    4    4
three   2    2    2
two     2    2    2

Using the margins option in crosstab to compute row and column totals gets us close enough to think that it should be possible using an aggfunc or groupby, but my meager brain can't think it through. The output I'm after looks like this:

B       A     B    C
A               
one     .33  .33  .33
three   .33  .33  .33
two     .33  .33  .33
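
One possible approach, assuming a pandas version whose crosstab accepts the normalize argument: passing normalize='index' divides each row by its own total. The sketch below reuses only columns A and B from the question. The second variant divides the plain counts by the row sums and is shown as an equivalent fallback, not as the only way to do it.

```python
import pandas as pd

# Same A and B columns as in the question (C, D and E are not needed here).
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# normalize='index' divides every row by its row total.
row_pct = pd.crosstab(df.A, df.B, normalize='index')

# Equivalent fallback: divide the plain counts by each row's sum.
counts = pd.crosstab(df.A, df.B)
row_pct_manual = counts.div(counts.sum(axis=1), axis=0)

print(row_pct.round(2))
```

Multiplying the result by 100 gives percentages rather than fractions.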

In today’s tutorial we’ll learn how to quickly create and customize crosstabs. As part of our data analysis work in Pandas, we typically use crosstabs to count, plot and analyze how often values occur together across multiple columns.

Importing our data

For this example we’ll import a simple csv file that is located in the same working directory as our Python notebook or script:

# import pandas
import pandas as pd

sal_df = pd.read_csv('hr.csv')

print(sal_df)

Create a Simple Pandas crosstab

We’ll call the pd.crosstab function and render a very simple crosstab:

crosstb1 = pd.crosstab(index = sal_df['month'], columns = sal_df['language'])

crosstb1

Note that by default the crosstab displays the number of occurrences of each combination (not the number of unique / distinct values), so there is no need to specify an aggfunc parameter.
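
To illustrate the default counting behaviour, here is a minimal sketch. The small DataFrame is a hypothetical stand-in for hr.csv: its values are invented for illustration, and only the month and language column names come from the tutorial. The groupby version is there purely as a cross-check of what the crosstab counts.

```python
import pandas as pd

# Hypothetical stand-in for hr.csv; the values are invented for illustration.
sal_df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Mar'],
    'language': ['Python', 'R', 'Python', 'Python', 'R', 'R'],
})

# Default crosstab: one count per (month, language) combination.
counts = pd.crosstab(index=sal_df['month'], columns=sal_df['language'])

# The same counts built by hand with groupby, for comparison.
by_hand = sal_df.groupby(['month', 'language']).size().unstack(fill_value=0)

print(counts)
```

Combinations that never occur (here, Python in March) show up as 0 rather than being dropped.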

Create a cross tab with percentages

By default, the crosstab counts how often each combination of row and column values occurs in the DataFrame. We can do the same using a Pandas pivot table. What’s special about the crosstab function is that it also makes it easy to compute the relative frequency of values across columns. We do that by adding the normalize=True parameter, which divides every cell by the grand total; multiplying by 100 then turns the fractions into percentages.

crosstb2 = pd.crosstab(index = sal_df['month'],
                       columns = sal_df['language'],
                       normalize=True) * 100
crosstb2

Using a simple snippet we computed the relative frequency (percentage) of each month and language combination out of all the rows in our data.
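
Besides True, the normalize parameter also accepts 'all' (same as True), 'index' (each row sums to 100%) and 'columns' (each column sums to 100%). The sketch below compares the three modes on a hypothetical stand-in for hr.csv; the values are invented, and only the month and language column names come from the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for hr.csv; the values are invented for illustration.
sal_df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Mar'],
    'language': ['Python', 'R', 'Python', 'Python', 'R', 'R'],
})

rows, cols = sal_df['month'], sal_df['language']

# Share of the grand total: all cells together add up to 100.
total_pct = pd.crosstab(rows, cols, normalize='all') * 100

# Share within each month: every row adds up to 100.
row_pct = pd.crosstab(rows, cols, normalize='index') * 100

# Share within each language: every column adds up to 100.
col_pct = pd.crosstab(rows, cols, normalize='columns') * 100

print(row_pct.round(1))
```

Which mode is appropriate depends on the question being asked: 'index' answers "what is the language mix within each month?", while 'columns' answers "how is each language spread across months?".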

Adding a total column and row to the crosstab

Next, we’ll use the margins and margins_name parameters to add a summary column and row to the table.

crosstb3 = pd.crosstab(index = sal_df['month'],
                       columns = sal_df['language'],
                       normalize=True, margins = True,
                       margins_name= "Total") * 100
crosstb3

To improve the look and feel of our table, we’ll append percentage signs to each cell in the cross tab. In order to do so, we’ll first convert the crosstab contents to a Python string data type. We do that by using the astype DataFrame method. We then append the % character using a simple lambda function.

crosstb3.astype(str).apply(lambda x:x + '%')
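
Converting unrounded floats to strings can produce long values such as 33.33333333333333%. As a hedged variation on the step above, we could format each cell with a fixed number of decimals instead. The sketch uses a hypothetical stand-in for hr.csv (invented values) and the name labelled is ours, not part of the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for hr.csv; the values are invented for illustration.
sal_df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Mar'],
    'language': ['Python', 'R', 'Python', 'Python', 'R', 'R'],
})

pct = pd.crosstab(index=sal_df['month'], columns=sal_df['language'],
                  normalize=True, margins=True, margins_name='Total') * 100

# Format every cell with one decimal place and a trailing percent sign.
labelled = pct.apply(lambda col: col.map('{:.1f}%'.format))

print(labelled)
```

Note that the formatted table holds strings, so it is best kept for display only; any further arithmetic should use the numeric crosstab.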


Pandas crosstabs with multiple columns

Next, we would like to allow an easier drill down into the data by adding another level to our crosstab columns section. Note that instead of passing a single column we are passing a list containing the columns we want displayed.

crosstb4 = pd.crosstab(index = sal_df['month'],
                       columns = [sal_df['language'], sal_df['office']],
                       normalize=True, margins = True,
                       margins_name= "Total") * 100

# apply works column by column, as before; applymap is deprecated in recent pandas releases
crosstb4.astype(str).apply(lambda x: x + '%')

How do you find the percentage in a cross tab?

If there is only one measure in the crosstab, click the crosstab corner, click Show Value As, and then click the percentage values that you want to show. If you click Custom, provide the information that is required to calculate the percentage values.

How do you find the percentage in a series in Python?

You can calculate the percentage by using the DataFrame.groupby() method, which works in three steps:
Splitting the data into groups based on some criteria.
Applying a function to each group independently.
Combining the results into a data structure.
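
The three steps above can be sketched as follows. The DataFrame and its salary column are hypothetical and exist only to illustrate the pattern; they are not taken from hr.csv.

```python
import pandas as pd

# Hypothetical data; the salary column is invented for illustration.
sal_df = pd.DataFrame({
    'language': ['Python', 'R', 'Python', 'Python', 'R', 'R'],
    'salary': [100, 50, 150, 350, 200, 150],
})

# Split by language, apply sum to each group, combine into one Series.
totals = sal_df.groupby('language')['salary'].sum()

# Each group's share of the overall total, in percent.
share = totals / totals.sum() * 100

# Each row's share of its own group's total, in percent.
group_sum = sal_df.groupby('language')['salary'].transform('sum')
sal_df['pct_of_group'] = sal_df['salary'] / group_sum * 100

print(share)
```

Here transform('sum') returns a Series aligned with the original rows, which is what makes the row-by-row division possible.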

How do you show percentages in crosstabs in SPSS?

Using the Crosstabs Dialog Window:
Reopen the Crosstabs window (Analyze > Descriptive Statistics > Crosstabs).
In the Row box, replace variable Rank with RankUpperUnder.
Click Cells. In the Percentages area, check off Row, Column, and Total percentages. ...
Click OK to run.

How do you use crosstab in Python?

The crosstab() function is used to compute a simple cross tabulation of two (or more) factors. By default it computes a frequency table of the factors, unless an array of values and an aggregation function are passed. Its index argument holds the values to group by in the rows, and its columns argument holds the values to group by in the columns.
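
A minimal sketch of both behaviours, assuming a small hypothetical DataFrame (the salary column and all values are invented for illustration): the first call produces the default frequency table, and the second passes values together with aggfunc to aggregate another column instead of counting.

```python
import pandas as pd

# Hypothetical data; all values are invented for illustration.
sal_df = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'language': ['Python', 'R', 'Python', 'Python'],
    'salary': [100, 50, 150, 350],
})

# Default: frequency table of month against language.
freq = pd.crosstab(index=sal_df['month'], columns=sal_df['language'])

# With values and aggfunc: mean salary per (month, language) combination.
avg = pd.crosstab(index=sal_df['month'], columns=sal_df['language'],
                  values=sal_df['salary'], aggfunc='mean')

print(freq)
print(avg)
```

Combinations with no rows are 0 in the frequency table but NaN in the aggregated one, because there is nothing to average.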