How do you show quartiles in Python?

We have seen what quartiles are and how they can be useful for quickly presenting the main characteristics of a group of data.

Let’s see how to visualise them.

As an example I will use the ages of the Nobel Prize winners (a set of discrete values), taken from the official Nobel Prize site.

You can follow along with the code on GitHub.

The best way to chart a data set with its quartiles is to use a box plot:

Boxplot of a normally distributed population

a box that goes from the lower to the upper quartile, plus optional lines (the whiskers) extending from the box up to a specified multiple of the inter-quartile range (IQR = upper quartile - lower quartile). Any point beyond the whiskers is considered an outlier and is displayed as an individual point.
Inside the box the median and the mean can be displayed, as lines or points.
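To make the description above concrete, here is a minimal sketch that computes the ingredients of a box plot by hand with NumPy. The sample values are made up for illustration (they are not the Nobel data), and the multiplier 1.5 is simply the usual default.

```python
import numpy as np

# Made-up sample (not the Nobel data) with one obviously extreme value.
ages = np.array([25, 31, 42, 45, 47, 52, 54, 58, 61, 64, 66, 95])

# The box: lower quartile, median and upper quartile.
q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# The whiskers may reach at most whis * IQR beyond the box.
whis = 1.5
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Each whisker ends at the most extreme data point still inside its fence.
inside = ages[(ages >= lower_fence) & (ages <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as an individual outlier point.
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

With this sample the only value beyond the fences is 95, so it would be drawn as a single point above the upper whisker.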

The matplotlib function to draw a boxplot is appropriately called boxplot().

The function takes as input data an array or a list of vectors, for example:

agePhysics = [ 25, 31, 31, 31, ... ]  # goes on for almost 200 values

The basic plot would be:

import matplotlib.pyplot as plt
# basic plot
plt.boxplot(agePhysics)
plt.show()

The defaults for the other parameters of the boxplot function are:

notch = False: draw a rectangular box, not a notched one
vert = True: the box is vertical, not horizontal
sym = None: fliers (outliers) are drawn with the default marker; pass an empty string to hide them
whis = 1.5: the multiplier that sets the whisker length; the whiskers extend up to whis * IQR beyond the box
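As a quick illustration of how whis interacts with the fliers, the sketch below draws the same made-up sample twice: once with the defaults and once with whis given as a pair of percentiles. The sample values and variable names are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up sample (not the Nobel data) with one extreme value, 95.
sample = [25, 31, 42, 45, 47, 52, 54, 58, 61, 64, 66, 95]

fig, (left, right) = plt.subplots(1, 2, sharey=True)

# Defaults (whis=1.5): 95 lies beyond the upper whisker and is drawn as a flier.
default_box = left.boxplot(sample)

# A pair of percentiles puts the whiskers at the minimum and the maximum,
# so no point is left outside them.
full_range_box = right.boxplot(sample, whis=(0, 100))

# boxplot() returns a dict of the artists it drew; 'fliers' holds the outlier points.
default_fliers = default_box['fliers'][0].get_ydata()
full_range_fliers = full_range_box['fliers'][0].get_ydata()
print(len(default_fliers), len(full_range_fliers))
```

Replace the Agg line with a call to plt.show() at the end if you want to see the two charts side by side.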

Now let’s print the values of the quartiles and the mean before plotting, and display the mean in the chart with the parameter showmeans (default is False), by adding or changing these lines:

from datascience import stats

print(stats.summary(agePhysics))
print("range = ", stats.range(agePhysics))
plt.boxplot(agePhysics, showmeans=True, whis=99)

The printed output is:

Summary statistics
Min: 25
Lower Qu.: 45.0
Median: 54.0
Mean: 54.955
Upper Qu.: 64.0
Max: 88
That's all

And the lines above display a box-and-whiskers chart like this:

A simple box plot

As you can see, the box itself goes from the lower to the upper quartile (45 and 64 in this case), while the whiskers (the bars extending from the box) go from the minimum to the maximum (25 and 88 in this case) because whis is set to a very high number (99), which includes all the data points.

The red line is the median (54), while the mean (a similar value) is shown as a red square; its appearance can be changed through the parameter meanprops.
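For instance, a minimal sketch of restyling the mean marker through meanprops could look like this. The sample values and the chosen marker style are assumptions for illustration, not the original chart's settings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up sample (not the Nobel data).
sample = [25, 31, 42, 45, 47, 52, 54, 58, 61, 64, 66, 88]

fig, ax = plt.subplots()
box = ax.boxplot(
    sample,
    showmeans=True,
    whis=99,
    # meanprops is a dict of line/marker properties applied to the mean marker
    meanprops=dict(marker='D', markerfacecolor='blue',
                   markeredgecolor='black', markersize=8),
)

# The mean is returned as a one-point line in the 'means' entry.
mean_marker = box['means'][0]
print(mean_marker.get_marker(), mean_marker.get_ydata())
```

Here the mean is drawn as a blue diamond instead of the default marker.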

Now, to add a bit more fun, let’s add two more boxplots, for the Literature and the Economics winners respectively. Assuming we have the ages in two arrays called ageLiterature and ageEconomics, the first thing to do is to collect all the arrays in a list and pass it to the boxplot function:

ages = [agePhysics, ageLiterature, ageEconomics]
box = plt.boxplot(ages, showmeans=True, whis=99)

Each boxplot can have its own colours; these can be set through the pyplot function setp():

# add colours
# physics = green
plt.setp(box['boxes'][0], color='green')
# each box has two caps and two whiskers, so the first box owns items 0 and 1
plt.setp(box['caps'][0:2], color='green')
plt.setp(box['whiskers'][0:2], color='green')

and so on for the other boxplots …
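One possible way to do the same for every box is a short loop. In this sketch the ages are made up, and the colours for the second and third box are my own choice (only green for Physics comes from the code above).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up ages for three groups; in the article these are the three Nobel arrays.
ages = [
    [25, 31, 45, 54, 64, 88],
    [41, 57, 65, 69, 74, 88],
    [51, 60, 65, 67, 72, 90],
]
colours = ['green', 'blue', 'orange']  # blue and orange are arbitrary choices

fig, ax = plt.subplots()
box = ax.boxplot(ages, showmeans=True, whis=99)

for i, colour in enumerate(colours):
    plt.setp(box['boxes'][i], color=colour)
    # Box i owns whiskers and caps number 2*i and 2*i + 1 (lower and upper).
    plt.setp(box['whiskers'][2 * i:2 * i + 2], color=colour)
    plt.setp(box['caps'][2 * i:2 * i + 2], color=colour)
```

The slicing is needed because the whiskers and caps lists are flat: they hold two entries per box rather than one.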

As with other plots, you can add titles, labels and a grid:

plt.ylim([20, 95]) # y axis gets more space at the extremes
plt.grid(True, axis='y') # let's add a grid on y-axis
plt.title('Distribution of the Nobel Prize winner ages', fontsize=18) # chart title
plt.ylabel('Age (years) at winning time') # y axis title
plt.xticks([1,2,3], ['Physics','Literature','Economics']) # x axis labels

This is the final graph:

The Nobel Prize winners arranged by field and age

So, it seems that you have almost no chance of winning a Nobel in Literature before you are 40, and realistically not before you are 55. It is even worse for Economics: so far nobody has won it before the age of 50, and the mean and median are around 65 …

How do you define Q1 in Python?

The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set, the second quartile (Q2) is the median of the data set, and the third quartile (Q3) is the middle number between the median and the largest value of the data set.
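A minimal sketch of this "median of each half" definition, using only the standard library, might look like the following. The function name and the sample are hypothetical, and leaving the median itself out of both halves when the number of values is odd is one common convention, not the only one.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values before the median position
    upper = data[len(data) - half:]   # values after the median position
    return median(lower), median(data), median(upper)

# Made-up sample (not the Nobel data).
sample = [25, 31, 42, 45, 47, 52, 54, 58, 61, 64, 66]
q1, q2, q3 = quartiles_by_halves(sample)
print(q1, q2, q3)  # prints: 42 52 61
```

Libraries that interpolate between data points, such as NumPy with its default settings, can return slightly different quartile values for the same data.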

What is the quantile() function in Python?

In Python, NumPy's numpy.quantile() function takes an array and a number q between 0 and 1 and returns the value at the q-th quantile. For example, numpy.quantile(data, 0.25) returns the value at the first quartile of the dataset data.
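A short usage sketch, with made-up values standing in for a real data set:

```python
import numpy as np

# Made-up sample (not the Nobel data).
data = np.array([25, 31, 45, 54, 64, 70, 88])

# A single quantile: q = 0.25 is the first quartile.
first_quartile = np.quantile(data, 0.25)

# Several quantiles at once: the three quartiles.
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# np.percentile does the same calculation with q on a 0-100 scale.
same_q1 = np.percentile(data, 25)

print(first_quartile, q1, q2, q3, same_q1)
```

By default NumPy interpolates linearly when the requested quantile falls between two data points.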

How do you split data into quartiles in Python?

Quartiles.
Percentiles divide the whole population into 100 groups, whereas quartiles divide the population into 4 groups.
p = 25: first quartile or lower quartile (LQ).
p = 50: second quartile or median.
p = 75: third quartile or upper quartile (UQ).
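The three percentiles listed above can be used as cut points to assign every value to one of the four groups. This is only a sketch with made-up values; np.digitize is one of several ways to do the assignment.

```python
import numpy as np

# Made-up sample (not the Nobel data).
ages = np.array([25, 31, 45, 50, 54, 64, 70, 88])

# The three cut points: p = 25, 50 and 75.
cut_points = np.percentile(ages, [25, 50, 75])

# np.digitize returns 0..3 depending on which interval a value falls in;
# adding 1 labels the groups 1 (lowest quarter) to 4 (highest quarter).
groups = np.digitize(ages, cut_points) + 1

for g in range(1, 5):
    print("group", g, ages[groups == g])
```

A value exactly equal to a cut point is placed in the higher of the two neighbouring groups with these settings.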

How do you represent quartiles?

There are three quartile values (a lower quartile, the median, and an upper quartile) that divide the data set into four ranges, each containing 25% of the data points. The lower quartile, or first quartile, is denoted Q1 and is the middle number between the smallest value of the data set and the median.
