Statistics with python 3 random experiment

I got what I expected. Don't know which one is head: 4995 or 5005?

print(y[0])
print(y[1])

4995
5005

Here is more code to explain your tossing:

from scipy.stats import binom
data_binom = binom.rvs(n=1,p=0.5,size=10000)

heads = 0
tails = 0
edges = 0
count = 0

for coin in data_binom:
    count += 1
    if coin == 1:
        heads += 1
    elif coin == 0:
        tails += 1
    else:
        edges += 1

print("Observed " + str(count) + " of coin tossing with heads " + str(heads)
      + ", tails " + str(tails) + ", and edges " + str(edges))

Results of four tests:

$ python3.7 test.py
Observed 10000 of coin tossing with heads 4989, tails 5011, and edges 0
$ python3.7 test.py
Observed 10000 of coin tossing with heads 5109, tails 4891, and edges 0
$ python3.7 test.py 
Observed 10000 of coin tossing with heads 4968, tails 5032, and edges 0
$ python3.7 test.py 
Observed 10000 of coin tossing with heads 5046, tails 4954, and edges 0
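The spread across these four runs is roughly what the binomial distribution predicts. Here is a rough sketch of the arithmetic, assuming a fair coin (p = 0.5) and 10000 independent tosses as in the script above; the variable names are mine:

```python
import math

n = 10_000   # number of tosses, as in the script above
p = 0.5      # assumed probability of heads for a fair coin

expected_heads = n * p                    # mean of a binomial(n, p) count
std_heads = math.sqrt(n * p * (1 - p))    # standard deviation of that count

# Distance of each observed heads count from the mean,
# measured in standard deviations.
observed = [4989, 5109, 4968, 5046]
z_scores = [(h - expected_heads) / std_heads for h in observed]

print(expected_heads, std_heads)
print(z_scores)
```

Counts within about two standard deviations of the mean are unremarkable. The 5109 run sits a little over two standard deviations out, which is uncommon but not alarming over a handful of runs.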

All you need to know to generate random numbers in Python


Generating random numbers is one of the common tasks that you need to perform when writing applications. Random numbers serve many purposes — from cryptography to machine learning, random numbers play an extremely important role in making our applications work correctly.

As a Python programmer, you are spoilt for choice when it comes to generating random values as there are so many ways to do that. However, this flexibility comes with a price — it is often not very clear when to use what. And this is exactly what I intend to address in this article.

By the end of this article, you will have a much clearer picture of which functions to use to generate the random numbers you desire.

Generating Random Numbers in Python

To generate random numbers in Python, you can use the random module:

import random

To generate a floating-point random number, call the random() function:

random.random()   # e.g. 0.49543508709194095

The random() function generates a floating-point number in the half-open interval [0,1). This means that the number generated will be from 0 to 1 (where 1 is excluded).

If you want to generate an integer value from 0 to 10 (inclusive), use the randint() function:

random.randint(0,10)  # [0,10] - 0 to 10 (inclusive) e.g. 6

If you want a random floating-point number that is within a specific range (e.g. 1 to 5), use the uniform() function. The lower bound is included; the upper bound may or may not be, depending on floating-point rounding:

random.uniform(1,5)  # 1 <= N <= 5 e.g. 4.756596651114043

To generate a list of random integer values within a specific range (e.g. 0 to 32 (exclusive)) without repeating values, use the sample() function:

random.sample(range(0, 32), 5)    # result is a list of 5 values 
# from [0,32) with no repeats
# [12, 15, 26, 10, 7]

The sample() function is useful for cases such as lucky draws where you need to pick some winners from a list of values.
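As a small sketch of the lucky-draw case, the snippet below picks three winners from a list. The participant names are made up for illustration:

```python
import random

# Hypothetical list of participants; the names are invented for this example.
participants = ["Ann", "Ben", "Cho", "Dev", "Eve", "Fay", "Gus", "Hal"]

# Pick 3 distinct winners; sample() never picks the same position twice.
winners = random.sample(participants, 3)
print(winners)
```

Because sample() draws without replacement, nobody can win twice, provided the list itself has no duplicate entries.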

If you want a list of random floating-point values in the half-open interval, you can use the random() function via list comprehension:

[random.random() for _ in range(5)]        # [0, 1)
# [0.26800994395551214,
# 0.3322334781304659,
# 0.5058884832347348,
# 0.2552912262686192,
# 0.33885158106897195]

Likewise, if you need a list of random floating-point values in a specific range, you can do this:

[random.uniform(1,5) for _ in range(5)]    # 1 <= N <= 5
# [1.4556516495709206,
# 1.94075804553687,
# 4.775979596495107,
# 4.118159382173641,
# 3.860434558608088]

Finally, if you need to generate a list of random integers, use this:

[random.randint(0,10) for _ in range(5)]   # [0,10]
# [3, 9, 8, 7, 10]

Generating Random Numbers using NumPy

If you are using the NumPy module, you can also use it to generate random numbers. Its random module contains several functions that allow you to generate random numbers.

The uniform() function generates a floating-point number within a half-open interval:

import numpy as np
np.random.uniform()         # [0,1) e.g. 0.6603742810407641

You can also specify the range:

np.random.uniform(1,5)      # [1,5) e.g. 2.1809140016758803

And also the number of random values to generate:

np.random.uniform(1,5,8)    # [1,5) x 8
# array([3.15101237, 3.52431302, 2.43564056, 4.22373224,
# 1.82549706, 4.30782957, 2.1383488 , 3.71130947])

You can also specify the shape of the result that you desire:

np.random.uniform(1,5,(2,4))  # [1,5) - result is a 2D array of 
# 2 rows and 4 columns
# array([[4.85777402, 2.41464442, 3.47972032, 3.61706258],
# [1.39591689, 2.41386733, 3.34813041, 3.13411887]])

If you just want to generate numbers in the half-open interval, there is one more function you can use — rand():

np.random.rand()      # [0,1) e.g. 0.11705786929477491

The rand() function makes it easy to generate values in the half-open interval as arrays of various dimensions:

np.random.rand(5)     # [0,1) x 5
# array([0.52310231, 0.87305847, 0.03870784, 0.69239079, 0.47626848])
np.random.rand(2,3) # [0,1) in 2D
# array([[0.16926449, 0.06317189, 0.03222409],
# [0.24243086, 0.11270682, 0.40499002]])

The rand() function takes in additional arguments for the shape of the result to return while the uniform() function takes in three arguments — low, high, and size.

Another function that is similar to the rand() function is random(). It too generates numbers in the half-open interval. The key difference between the two is that the random() function takes in a single argument for the shape of the array you want to generate.

np.random.random(5)  
# array([0.90351056, 0.96734226, 0.06753921,
# 0.31758607, 0.69686297])
np.random.random((2,3)) # passed the dimension you want
# as a tuple
# array([[0.04207297, 0.92656545, 0.93526291],
# [0.8104269 , 0.18834308, 0.58731822]])

The difference between random() and rand() is that the random() function takes in a single argument. So if you want to generate a multi-dimensional array result, you need to wrap the shape as a tuple.
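To see the three calling conventions side by side, here is a minimal sketch that asks each function for the same 2 x 3 result. The variable names a, b, and c are mine:

```python
import numpy as np

a = np.random.rand(2, 3)               # dimensions passed as separate arguments
b = np.random.random((2, 3))           # shape passed as a single tuple
c = np.random.uniform(0, 1, (2, 3))    # low, high, then size

print(a.shape, b.shape, c.shape)
```

All three return a 2D array of the same shape; only the way the shape is passed differs.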

If you need a random integer value, use the randint() function:

np.random.randint(0,9)        # [0,9) e.g. 7

You can also generate arrays of integer values in one or more dimensions:

np.random.randint(0,9,5)      # [0,9) x 5
# array([3, 7, 3, 2, 8])
np.random.randint(0,9,(4,5)) # [0,9) in 2D array
# array([[5, 2, 4, 8, 0],
# [5, 2, 3, 7, 2],
# [6, 1, 2, 4, 7],
# [2, 3, 5, 8, 4]])
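One detail that is easy to miss: the upper bound is inclusive in Python's random.randint() but exclusive in NumPy's randint(). The sketch below draws many values from each and collects the distinct results. The seeds and the sample size of 10,000 are arbitrary choices of mine:

```python
import random
import numpy as np

random.seed(0)      # arbitrary seeds, only to make the run repeatable
np.random.seed(0)

# Collect the distinct values produced by each function.
py_values = {random.randint(0, 9) for _ in range(10_000)}
np_values = set(np.random.randint(0, 9, 10_000).tolist())

print(sorted(py_values))   # includes the upper bound 9
print(sorted(np_values))   # stops at 8
```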

Specifying the distribution of numbers

So far all the numbers that we have generated are uniformly distributed.

The uniform distribution is a continuous probability distribution and is concerned with events that are equally likely to occur.

This means that if you generate a large number of values, every value in the range is equally likely to be generated. You can see this by generating a million values using the random() function, dividing the range into 25 bins, and counting how many values fall into each bin:

import matplotlib.pyplot as plt
_ = plt.hist(np.random.random(1_000_000), bins = 25)

The above statement displays a histogram showing the distribution of the one million numbers:

[Figure: histogram of one million uniformly distributed values in 25 bins]
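The same check can be done numerically, without a plot. This is a small sketch using np.histogram() to count the values per bin; with one million values and 25 bins, each bin should hold close to 40,000 values:

```python
import numpy as np

values = np.random.random(1_000_000)

# Count how many values fall into each of 25 equal-width bins over [0, 1].
counts, edges = np.histogram(values, bins=25, range=(0, 1))

print(counts.min(), counts.max())   # both should be close to 40000
```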

If you want to generate a list of numbers that are normally distributed, you can use the randn() function:

np.random.randn(4,3)
# array([[-0.58617287, 0.99765344, 1.00439116],
# [-0.45170132, -0.01265149, 0.75739522],
# [ 0.70970036, -0.1740791 , 1.14584093],
# [ 1.2637344 , 0.77962903, -0.97546801]])

Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, with most data clustered near the mean. On a graph, a normal distribution appears as a bell curve.

The randn() function returns a sample of values from the standard normal distribution. In the above code snippet, the randn() function returns the result in a 2D array.

The standard normal distribution is a normal distribution with a mean of zero and a standard deviation of 1. For the standard normal distribution, 68.27% of the observations lie within 1 standard deviation of the mean; 95.45% lie within 2 standard deviations of the mean; and 99.73% lie within 3 standard deviations of the mean.

[Figure: standard deviation diagram for the normal distribution]

Source: https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Standard_deviation_diagram.svg
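You can check these percentages empirically. The following sketch draws one million values with randn() and measures the fraction that falls within 1, 2, and 3 standard deviations of the mean. The seed and the fractions dictionary are my own additions:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only to make the run repeatable
samples = np.random.randn(1_000_000)

# Fraction of samples whose distance from the mean (0) is below k.
fractions = {}
for k in (1, 2, 3):
    fractions[k] = np.mean(np.abs(samples) < k)
    print(f"within {k} standard deviation(s): {fractions[k]:.4f}")
```

The printed fractions should land close to 0.6827, 0.9545, and 0.9973, with small deviations due to sampling.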

The following example generates one million numbers that are normally distributed and then plots a histogram with the numbers split into 50 bins:

_ = plt.hist(np.random.randn(1_000_000), bins=50)

You should see something like the following:

[Figure: bell-shaped histogram of one million normally distributed values in 50 bins]

Seeding your Random Number Generator

The irony about random numbers is that they are not really random. The generators in Python and NumPy are pseudo-random: each one is a deterministic algorithm that produces its sequence from a starting value called the seed. If you do not supply a seed, one is taken from an unpredictable source (operating-system randomness, or the current time as a fallback), so the numbers differ on every run and you would think that they are truly random. But that’s not the issue most of us are concerned with. Instead, very often for reproducibility reasons we want to ensure that the random numbers generated are the same, so that we can always get the same result for our analysis.

If you generate your random numbers using Python's random module, call the seed() function and pass in an integer value:

random.seed(1)              # pass in an integer value as the seed
random.sample(range(0, 32), 5)
# [8, 4, 16, 7, 31]

The above code snippet will always generate the same list of random numbers.
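A quick way to convince yourself is to seed the generator twice with the same value and compare the results. This is a minimal sketch; the seed 42 is an arbitrary choice:

```python
import random

random.seed(42)   # arbitrary seed
first = [random.randint(0, 10) for _ in range(5)]

random.seed(42)   # reset the generator to the same starting state
second = [random.randint(0, 10) for _ in range(5)]

print(first == second)   # True
```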

If you use NumPy, use the random.seed() function:

np.random.seed(2)           # pass in an integer value as the seed
np.random.uniform(0, 10, 5)
# [4.35994902 0.25926232 5.49662478 4.35322393 4.20367802]

NumPy also comes with the RandomState class where you can create an instance of it using a random seed and then use it to generate different types of random values:

r = np.random.RandomState(1)   # pass in an integer value as the seed
print(r.uniform(0, 10, 5)) # [0,10)
# [4.17022005e+00 7.20324493e+00 1.14374817e-03 3.02332573e+00
# 1.46755891e+00]
print(r.rand(2,3)) # [0,1)
# [[0.09233859 0.18626021 0.34556073]
# [0.39676747 0.53881673 0.41919451]]
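One practical benefit of a RandomState instance is that it carries its own state, separate from the global np.random functions. The sketch below creates two instances with the same seed, disturbs the global generator in between, and checks that the two instances still agree. The names r1, r2, a, and b are mine:

```python
import numpy as np

r1 = np.random.RandomState(1)
r2 = np.random.RandomState(1)

# Reseeding and using the global generator does not touch r1 or r2.
np.random.seed(99)
_ = np.random.rand(3)

a = r1.uniform(0, 10, 5)
b = r2.uniform(0, 10, 5)
print(np.array_equal(a, b))   # True
```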

Generating Random Numbers using sklearn

Besides generating random numbers that are uniformly or normally distributed, it is sometimes necessary to generate numbers that are linearly distributed, or that cluster around specific centroids. For example, you might want to try out linear regression using a set of points, or you want to try out some clustering algorithms for unsupervised learning.

Generating random numbers that are linearly distributed

You can make use of the make_regression() function from the sklearn.datasets module to generate a set of points that are linearly distributed:

from sklearn.datasets import make_regression
import numpy as np
x, y = make_regression(n_samples=100, n_features=1, noise=12.3)

The n_samples parameter specifies how many samples (rows) to generate, n_features specifies the number of feature columns in x, and noise is the standard deviation of the Gaussian noise added to the output y (how much the points scatter around the underlying line). The above code snippet will produce an output that looks like this:

print(x)
# [[ 1.20630427]
# [-1.02041981]
# ...
# [-0.95098556]
# [ 0.09247152]]
print(y)
# [ 66.34055577 -52.39063718 51.46433162 -12.56089116
# 10.62491393 8.00035735 4.80360232 -28.99765946
# ...
# 12.75554229 9.75147261 2.67890648 -32.4981596
# -30.16046261 -4.56704054 -43.56250488 -9.30790306]
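If you want to inspect the result more closely, the sketch below prints the shapes of the returned arrays and, by passing coef=True, also returns the coefficient of the underlying linear model. The random_state value is an arbitrary choice of mine to make the run repeatable:

```python
from sklearn.datasets import make_regression

# coef=True makes the function also return the coefficient it used
# to build y before the noise was added.
x, y, coef = make_regression(n_samples=100, n_features=1, noise=12.3,
                             coef=True, random_state=0)

print(x.shape)   # (100, 1): one row per sample, one column per feature
print(y.shape)   # (100,)
print(coef)      # slope of the underlying line
```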

A better way to understand the numbers is to plot a scatter plot:

import matplotlib.pyplot as plt
_ = plt.scatter(x, y)

[Figure: scatter plot of x against y with noise=12.3]

If you modify the noise to a larger value:

x, y = make_regression(n_samples=100, n_features=1, noise=19)
_ = plt.scatter(x, y)

You will see that the values are now more dispersed:

[Figure: scatter plot of x against y with noise=19]
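The effect of noise can also be measured rather than eyeballed. The following sketch fits a straight line with np.polyfit() and reports the standard deviation of the residuals for both noise settings. The helper residual_std() and the fixed random_state are my own additions:

```python
import numpy as np
from sklearn.datasets import make_regression

def residual_std(noise):
    """Fit a line to the generated points and return the residual spread."""
    # random_state=0 is an arbitrary choice so both calls use the same points
    x, y = make_regression(n_samples=100, n_features=1,
                           noise=noise, random_state=0)
    slope, intercept = np.polyfit(x.ravel(), y, 1)
    residuals = y - (slope * x.ravel() + intercept)
    return residuals.std()

low = residual_std(12.3)
high = residual_std(19)
print(low, high)
```

The residual spread should come out close to the noise value passed in, so the second number is larger than the first.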

What if you change the n_features to 2? In this case, X will be a 2D array:

X, y = make_regression(n_samples=1000, n_features=2, noise=3)
print(X)
# [[-0.10171443 1.59563406]
# [ 0.39154137 -0.21477808]
# [ 0.00732151 0.24783439]
# ...
# [-0.62820116 0.16688806]
# [-0.35656323 -1.1761519 ]
# [ 0.04589981 0.59696238]]

A good way to visualize the set of random numbers generated is to plot a 3D scatter plot using the scatter3D() function:

from sklearn.datasets import make_regression
import numpy as np
import matplotlib.pyplot as plt
X, y = make_regression(n_samples=1000, n_features=2, noise=3)
fig = plt.figure(figsize=(13,13))
ax = plt.axes(projection='3d')
ax.scatter3D(X[:,0], X[:,1], y, c=y, cmap='Greens')
ax.set_xlabel('X[0]')
ax.set_ylabel('X[1]')
ax.set_zlabel('y')
plt.show()

You should save the above code snippet in a file named random_regression.py and run it in the command prompt. You will then be able to visualize the plot by rotating it around.

Here is what the plot looks like from various angles:

[Figures: the 3D scatter plot viewed from three different angles]

Interpolating the random numbers generated

The values generated by the make_regression() function may not be in the range that you desire. For example, suppose you want to generate a set of points showing the relationship between the height and weight of a group of people, with heights between 148cm and 185cm and weights between 44kg and 74kg. The following code snippet scales the x and y values using the interp() function from NumPy:

x, y = make_regression(n_samples=100, n_features=1, noise=2.6)
# scale x (e.g. height in cm) to 148..185 range
x = np.interp(x, (x.min(), x.max()), (148, 185))
# scale y (e.g. weight in kg) to 44..74 range
y = np.interp(y, (y.min(), y.max()), (44, 74))
plt.scatter(x, y)

The scatter plot confirms the interpolation performed:

[Figure: scatter plot of the scaled x (height) and y (weight) values]
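Used this way, interp() is a linear rescaling from the old minimum and maximum to the new ones. The sketch below checks the new range and compares the result with the rescaling formula written out by hand. The names x_scaled and x_manual and the random_state value are mine:

```python
import numpy as np
from sklearn.datasets import make_regression

# random_state=0 is an arbitrary choice to make the run repeatable
x, y = make_regression(n_samples=100, n_features=1, noise=2.6, random_state=0)

# Map the range [x.min(), x.max()] onto [148, 185].
x_scaled = np.interp(x, (x.min(), x.max()), (148, 185))

# The same mapping written out as a linear formula.
x_manual = 148 + (x - x.min()) * (185 - 148) / (x.max() - x.min())

print(x_scaled.min(), x_scaled.max())
print(np.allclose(x_scaled, x_manual))
```

Because the mapping is linear, the shape of the scatter plot is unchanged; only the axis values move.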

Generating random numbers that cluster around centroids

Very often when you do unsupervised learning, you need to generate random points that cluster around a few centroids. For this purpose, you can use the make_blobs() function from the sklearn.datasets module:

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples = 500,
                  centers = 3,
                  n_features = 2)

The above code snippet returns 500 pairs of random numbers (contained in X), while y contains the class (cluster label) of each point:

print(X)
# [[ -9.86754851 9.27779819]
# [-11.50057906 8.88609894]
# ...
# [ -5.96056302 -3.21866963]
# [-10.38173377 8.82254368]]
print(y)
# [2 2 0 1 2 2 0 1 2 1 1 1 0 2
# 1 1 2 1 1 1 2 2 0 1 1 1 1 1
# ...
# 2 0 0 0 2 0 1 0 2 2 1 2 1 2
# 2 1 2 2 1 1 1 0 0 0 2 1 2 1]
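Rather than reading through the raw output, you can summarize it. This small sketch prints the shape of X, the distinct labels in y, and how many points landed in each class:

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=500, centers=3, n_features=2)

print(X.shape)          # (500, 2): one row per point, one column per feature
print(np.unique(y))     # the distinct class labels
print(np.bincount(y))   # number of points in each class
```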

As usual, visualization always makes things much clearer:

rgb = np.array(['r', 'g', 'b'])
# plot the blobs using a scatter plot and use color coding
_ = plt.scatter(X[:, 0], X[:, 1], color=rgb[y])
plt.xlabel('X[0]')
plt.ylabel('X[1]')

[Figure: scatter plot of the three color-coded blobs]

How about 3D points? Sure, just set n_features to 3 and plot using the scatter3D() function:

from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt
X, y = make_blobs(n_samples = 1500,
centers = 3,
n_features = 3)
fig = plt.figure(figsize=(13,13))
ax = plt.axes(projection='3d')
ax.scatter3D(X[:,0], X[:,1], X[:,2], c = y, cmap = 'tab20b')
ax.set_xlabel('X[0]')
ax.set_ylabel('X[1]')
ax.set_zlabel('X[2]')
plt.show()

You should save the above code snippet in a file named random_blobs.py and run it in the command prompt. You will then be able to visualize the plot by rotating it around.

[Figure: 3D scatter plot of the three blobs]

For reproducibility, set the random_state parameter to a value:

X, y = make_blobs(n_samples = 1500, 
centers = 3,
n_features = 3,
random_state = 0)
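To confirm that random_state makes the output repeatable, the sketch below calls make_blobs() twice with the same value and compares the results. The names X1, y1, X2, and y2 are mine:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two calls with identical arguments, including random_state.
X1, y1 = make_blobs(n_samples=1500, centers=3, n_features=3, random_state=0)
X2, y2 = make_blobs(n_samples=1500, centers=3, n_features=3, random_state=0)

print(np.array_equal(X1, X2), np.array_equal(y1, y2))   # True True
```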

Summary

Phew, looks like there are quite a number of different ways to generate random numbers in Python. The best way to remember what to use is to refer to the following summary of the functions we have discussed in this article.

[Figure: summary of the functions discussed in this article]

Did I miss any important functions? Let me know in the comments!