Statistics with Python 3 random experiment
I got what I expected. Don't know which one is head: 4995 or 5005?
Here is more code to explain your tossing:
Results of four tests:

All you need to know to generate random numbers in Python

Generating random numbers is one of the most common tasks you need to perform when writing applications. From cryptography to machine learning, random numbers play an extremely important role in making our applications work correctly.

As a Python programmer, you are spoilt for choice when it comes to generating random values, as there are so many ways to do it. However, this flexibility comes at a price: it is often not clear when to use what. That is exactly what I intend to address in this article. By the end, you should have a much clearer picture of which functions to use to generate the random numbers you need.

Generating Random Numbers in Python

To generate random numbers in Python, you can use the random module:

    import random

To generate a floating-point random number, call the random() function:

    random.random()    # e.g. 0.49543508709194095

The random() function generates a floating-point number in the half-open interval [0,1). This means that the number generated will be from 0 up to, but not including, 1.

If you want to generate an integer value from 0 to 10 (inclusive), use the randint() function:

    random.randint(0,10)    # [0,10] - 0 to 10 (inclusive) e.g. 6

If you want a random floating-point number within a specific range (e.g. 1 to 5), use the uniform() function:

    random.uniform(1,5)    # 1 to 5, e.g. 4.756596651114043

Depending on floating-point rounding, the upper bound (5 here) may or may not be returned, so do not rely on it being excluded.

To generate a list of random integer values within a specific range (e.g. 0 to 32 (exclusive)) without repeating values, use the sample() function:

    random.sample(range(0, 32), 5)    # result is a list of 5 values
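To see the four functions above side by side, here is a minimal sketch. The seed value 42 is my own assumption, added only so that repeated runs print the same values; it is not part of the original snippets.

```python
import random

random.seed(42)  # assumed seed, only so that repeated runs match

f = random.random()                      # float in [0.0, 1.0)
i = random.randint(0, 10)                # int from 0 to 10, both ends included
u = random.uniform(1, 5)                 # float between 1 and 5
picks = random.sample(range(0, 32), 5)   # 5 distinct ints drawn from 0..31

print(f, i, u, picks)
```

Because sample() draws without replacement, the five values in picks never repeat, which is the main reason to prefer it over calling randint() in a loop.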
If you want a list of random floating-point values in the half-open interval, you can call the random() function in a list comprehension:

    [random.random() for _ in range(5)]    # [0, 1)

Likewise, if you need a list of random floating-point values in a specific range, you can do this:

    [random.uniform(1,5) for _ in range(5)]    # 1 to 5

Finally, if you need to generate a list of random integers, use this:

    [random.randint(0,10) for _ in range(5)]    # [0,10]

Generating Random Numbers using NumPy

If you are using the NumPy module, you can also use it to generate random numbers. Its random module contains several functions that allow you to generate random numbers.

The uniform() function generates a floating-point number within the half-open interval:

    import numpy as np
    np.random.uniform()    # [0,1) e.g. 0.6603742810407641

You can also specify the range:

    np.random.uniform(1,5)    # [1,5) e.g. 2.1809140016758803

And also the number of random values to generate:

    np.random.uniform(1,5,8)    # [1,5) x 8

You can also specify the shape of the result that you desire:

    np.random.uniform(1,5,(2,4))    # [1,5) - result is a 2D array

If you just want to generate numbers in the half-open interval [0,1), there is one more function you can use, rand():

    np.random.rand()    # [0,1) e.g. 0.11705786929477491

The rand() function makes it easy to generate values in the half-open interval in various dimensions:

    np.random.rand(5)    # [0,1) x 5
Another function that is similar to the rand() function is random(). It too generates numbers in the half-open interval [0,1). The key difference between the two is how you specify the shape: rand() takes each dimension as a separate argument, e.g. rand(2,4), whereas random() takes a single size argument, which is either an integer or a tuple such as (2,4).

    np.random.random(5)
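The difference in how uniform(), rand() and random() accept a shape is easiest to see by calling all three with the same target shape. This is a small sketch; the seed and the (2, 4) shape are assumptions chosen for illustration.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so that repeated runs match

a = np.random.uniform(1, 5, (2, 4))  # shape passed as a tuple in the third argument
b = np.random.rand(2, 4)             # each dimension passed as a separate argument
c = np.random.random((2, 4))         # shape passed as one tuple
d = np.random.random(5)              # a single integer gives a 1D array

print(a.shape, b.shape, c.shape, d.shape)
```

All of a, b and c come out as 2 x 4 arrays; only the calling convention differs.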
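If you need a random integer value, use the randint() function. Unlike random.randint() in the standard library, NumPy's version excludes the upper bound:

    np.random.randint(0,9)    # [0,9) e.g. 7

You can also generate an array of integer values by passing the number of values (or a shape tuple for more than one dimension) as the third argument:

    np.random.randint(0,9,5)    # [0,9) x 5

Specifying the distribution of numbers

So far, all the numbers that we have generated are uniformly distributed.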
This means that if you generate a large number of values, every value in the range has an equal chance of being generated. You can see this by generating a million values using the random() function, dividing the range of numbers into 25 bins, and counting how many values fall into each bin:

    import matplotlib.pyplot as plt

The resulting histogram shows the distribution of the one million numbers, with all bins at roughly the same height.

If you want to generate numbers that are normally distributed, you can use the randn() function:

    np.random.randn(4,3)
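The plotting snippet for the uniform histogram is cut off in the text above, so here is a sketch of the same experiment that counts the bins with np.histogram() instead of drawing them. The seed is an assumption; passing the same values and bins=25 to plt.hist() would draw these counts as bars.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so that repeated runs match

values = np.random.random(1_000_000)   # one million floats in [0, 1)

# Split [0, 1) into 25 equal-width bins and count the values in each one.
counts, edges = np.histogram(values, bins=25, range=(0.0, 1.0))

expected = len(values) / 25            # 40,000 per bin if perfectly uniform
print(counts.min(), counts.max(), expected)
```

The smallest and largest bin counts should both sit close to the expected 40,000, which is what a flat histogram looks like in numbers.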
The randn() function returns a sample of values from the standard normal distribution (mean 0, standard deviation 1). In the np.random.randn(4,3) call, the result is a 2D array with 4 rows and 3 columns.

Figure source: https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Standard_deviation_diagram.svg

The following example generates one million numbers that are normally distributed and plots a histogram with the numbers split into 50 bins:

    _ = plt.hist(np.random.randn(1_000_000), bins=50)

You should see a bell-shaped histogram centered on 0.

Seeding your Random Number Generator

The irony about random numbers is that they are not really random. The generators in Python are pseudo-random: they produce a deterministic sequence of numbers from a starting value called the seed. When you do not supply a seed, the random module takes one from the operating system's randomness source (or from the current time if none is available), so the numbers differ every time you run your code and you would think that they are truly random. But that's not the issue most of us are concerned with. Instead, very often for reproducibility reasons, we want to ensure that the random numbers generated are the same, so that we always get the same result for our analysis.

If you generate your random numbers with the random module, call the seed() function with an integer value:

    random.seed(1)    # pass in an integer value as the seed

After this call, the code will always generate the same sequence of random numbers. If you use NumPy, use the random.seed() function:

    np.random.seed(2)    # pass in an integer value as the seed

NumPy also comes with the RandomState class. You can create an instance of it using a seed and then use that instance to generate different types of random values:

    r = np.random.RandomState(1)    # pass in an integer value as the seed

Generating Random Numbers using sklearn

Besides generating random numbers that are uniformly or normally distributed, it is sometimes necessary to generate numbers that are linearly distributed (that is, points that follow a linear relationship), or that cluster around specific centroids.
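Looking back at the seeding section, the claim that a fixed seed reproduces the same numbers can be checked directly. This is a minimal sketch; the variable names (first, second, r1, r2) are mine, and the particular calls made on the RandomState instances are an assumption, since the original snippet does not show how r is used.

```python
import random
import numpy as np

# Standard library: reseeding with the same value restarts the same sequence.
random.seed(1)
first = [random.randint(0, 10) for _ in range(5)]
random.seed(1)
second = [random.randint(0, 10) for _ in range(5)]
print(first == second)  # True

# NumPy: two RandomState instances built from the same seed stay in step.
r1 = np.random.RandomState(1)
r2 = np.random.RandomState(1)
same = np.array_equal(r1.uniform(1, 5, 3), r2.uniform(1, 5, 3))
print(same)  # True
```

A RandomState instance carries its own state, so seeding it does not affect, and is not affected by, calls to np.random.seed() elsewhere in the program.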
For example, you might want to try out linear regression using a set of points, or you might want to try out some clustering algorithms for unsupervised learning.

Generating random numbers that are linearly distributed

You can make use of the make_regression() function from the sklearn.datasets module to generate a set of points that are linearly distributed:

    from sklearn.datasets import make_regression

The n_samples parameter specifies how many samples (rows) to generate, n_features specifies the number of columns to generate, and noise is the standard deviation of the Gaussian noise added to the output (how much the points are dispersed around the line). The above code snippet will produce an output that looks like this:

    print(x)

A better way to understand the numbers is to plot a scatter plot:

    import matplotlib.pyplot as plt

If you modify the noise to a larger value:

    x, y = make_regression(n_samples=100, n_features=1, noise=19)

You will see that the values are now more dispersed.

What if you change n_features to 2? In this case, X will have two columns:

    X, y = make_regression(n_samples=1000, n_features=2, noise=3)

A good way to visualize the set of random numbers generated is to plot a 3D scatter plot using the scatter3D() function:

    from sklearn.datasets import make_regression
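The role of the noise parameter described above can be checked numerically rather than by eye. The sketch below is not from the original article: it assumes random_state=0 for repeatability and asks make_regression() to also return the true coefficients (coef=True), so that the spread of the points around the underlying plane can be measured.

```python
import numpy as np
from sklearn.datasets import make_regression

# coef=True additionally returns the coefficients used to build y.
# random_state=0 is an assumed value, only so that repeated runs match.
X, y, coef = make_regression(n_samples=1000, n_features=2, noise=3,
                             coef=True, random_state=0)

print(X.shape, y.shape)      # X has one column per feature; y is 1D

# With the default bias of 0, y is X @ coef plus Gaussian noise, so the
# residuals should have a standard deviation close to the noise value.
residuals = y - X @ coef
print(residuals.std())
```

The printed standard deviation should land near 3, the value passed as noise. For plotting, the two columns of X and the y values are what you would pass to scatter3D().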
Here is how the plot looks from various angles:

Interpolating the random numbers generated

The values generated by make_regression() may not be in the range that you desire. For example, suppose you want to generate a set of points showing the relationship between the height and weight of a group of people. In this case, you want the height to be between 148cm and 185cm and the weight to be between 44kg and 74kg. The following code snippet scales the x and y values using the interp() function from NumPy:

    x, y = make_regression(n_samples=100, n_features=1, noise=2.6)
    # scale x (e.g. height in cm) to 148..185 range

The scatter plot confirms the interpolation performed.

Generating random numbers that cluster around centroids

Very often when you do unsupervised learning, you need to generate random points that cluster around a few centroids. For this purpose, you can use the make_blobs() function from the sklearn.datasets module:

    from sklearn.datasets import make_blobs
    X, y = make_blobs(n_samples = 500,

The above code snippet returns 500 pairs of random numbers (contained in X), and y contains the class that each point belongs to:

    print(X)

As usual, visualization always makes things much clearer:

    rgb = np.array(['r', 'g', 'b'])
    # plot the blobs using a scatter plot and use color coding

How about 3D points? Sure, just set n_features to 3 and plot using the scatter3D() function:

    from sklearn.datasets import make_blobs
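Several snippets in the two sections above are cut off, so here is a sketch of how the scaling and the blob generation might look. Treat the details as assumptions rather than the original code: the exact np.interp() call, the choice of centers=3 (suggested by the three colors in the rgb array), random_state=0, and the names height, weight, labels, X3 and labels3 are all mine.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_regression

# Part 1: rescale make_regression() output into chosen ranges with np.interp().
x, y = make_regression(n_samples=100, n_features=1, noise=2.6, random_state=0)
x = x.ravel()                                             # (100, 1) -> (100,)
height = np.interp(x, (x.min(), x.max()), (148, 185))     # height in cm
weight = np.interp(y, (y.min(), y.max()), (44, 74))       # weight in kg
print(height.min(), height.max(), weight.min(), weight.max())

# Part 2: points clustered around centroids, in 2D and in 3D.
X, labels = make_blobs(n_samples=500, n_features=2, centers=3, random_state=0)
X3, labels3 = make_blobs(n_samples=500, n_features=3, centers=3, random_state=0)
print(X.shape, X3.shape, np.unique(labels))
```

np.interp() maps the smallest input to the lower bound and the largest to the upper bound, with everything in between scaled linearly, so the shape of the relationship between height and weight is preserved. The labels array can be used to index a color array such as rgb when plotting.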
For reproducibility, set the random_state parameter to a value:

    X, y = make_blobs(n_samples = 1500,

Summary

Phew, looks like there are quite a number of different ways to generate random numbers in Python. The best way to remember what to use is to refer to the following summary of the functions we have discussed in this article. Did I miss any important functions? Let me know in the comments!