Difference between numpy array and list with example

Python Lists Are Sometimes Much Faster Than NumPy. Here's Proof.

Be careful which one you use.

Mohammed Ayar

Apr 11·6 min read


I have recently been working on a digital image processing project. Hyperparameter tuning took quite some time before I got the desired accuracy, all because of the overfitting parasite and my useless low-end hardware.

Each execution took my machine approximately 15 to 20 minutes: 20 minutes to process 20,000 entries. I imagine that if I had been working on a 1-million-record dataset, I would have had to wait for the earth to complete a full rotation before the training finished.

I was satisfied with the accuracy of the model. Yet I wanted to try many other Convolutional Neural Network (CNN) architectures before submitting my code, so I decided to look for room for optimization.

Because I was using the pre-built machine learning algorithms from the PyPI packages Scikit-Learn and TensorFlow, very few subroutines were left to optimize. One option was to speed up my code at the data structure level. I was storing data in lists, and since NumPy is super fast, I thought using it might be a viable option.

Guess what happened after converting my list code to NumPy array code?

Much to my surprise, the execution time didn't shrink. Rather, it soared.

That being said, in this post, I will walk you through the exact situation where lists ended up performing way better than NumPy arrays.

NumPy & Lists

To begin with, let us discuss the difference between NumPy arrays and lists.

NumPy is the de facto Python library for N-dimensional array manipulation and numerical computing. It is open-source, easy to use, memory friendly, and lightning-fast.

Descended from an earlier library called Numeric, NumPy lays the foundation for many data science libraries such as SciPy, Scikit-Learn, Pandas, and more.

While Python lists store an ordered, mutable collection of objects of any type, NumPy arrays store only a single type of object. So, in terms of what they can hold, NumPy arrays live under the lists umbrella: there is nothing a NumPy array can store that a list cannot.

However, NumPy as a whole covers not only array manipulation but also many other routines such as binary operations, linear algebra, mathematical functions, and more. I believe it covers more than one can possibly need.
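As a small illustration of the single-type point, the sketch below builds a mixed list and a few arrays. The variable names are made up for this example, and the behavior shown is NumPy's default type coercion in `np.array`.

```python
import numpy as np

# A list keeps each element as its own Python object, so types can be mixed.
mixed_list = [1, 2.5, "three"]
element_types = [type(x).__name__ for x in mixed_list]
print(element_types)

# A NumPy array has one dtype shared by every element.
int_array = np.array([1, 2, 3])
float_array = np.array([1, 2.5, 3])        # the integers are promoted to floats
coerced = np.array([1, 2.5, "three"])      # everything is converted to strings

print(int_array.dtype, float_array.dtype, coerced.dtype)
```

The list keeps three different types side by side, while each array settles on one dtype that can represent all of its elements.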
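To give a feel for that breadth, here is a minimal sketch that touches one routine from each of the areas mentioned. The input values are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

# Linear algebra: solve the system a @ x = b.
a = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(a, b)

# Mathematical functions: element-wise square root.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Binary operations: element-wise bitwise AND on integers.
masked = np.bitwise_and(np.array([12, 10]), np.array([10, 12]))

print(x, roots, masked)
```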

The next thing to consider is why we usually use NumPy arrays over lists.

The short answer, which I believe everybody reading this post knows, is: it is faster.

NumPy is indeed ridiculously fast, even though Python is known to be slow. This is because NumPy serves as a wrapper around routines written in C and Fortran, and needless to say, these two are fast.
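One way to see the effect of that compiled layer is to sum the same array twice: once with an explicit Python loop and once with NumPy's own `sum`. This is only a sketch. The helper `python_loop_sum` and the array size are assumptions made for the example, and the timings will vary from machine to machine.

```python
import numpy as np
from timeit import timeit

values = np.random.rand(100_000)

def python_loop_sum(arr):
    """Sum the elements one at a time in the Python interpreter."""
    total = 0.0
    for v in arr:
        total += v
    return total

# Time five runs of each approach.
loop_time = timeit(lambda: python_loop_sum(values), number=5)
vector_time = timeit(lambda: values.sum(), number=5)

print(f"Python loop:  {loop_time:.4f} s")
print(f"NumPy sum():  {vector_time:.4f} s")
```

Both approaches compute the same total. The difference is where the loop runs: in the interpreter for the first, in compiled code for the second.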

NumPy Arrays Are Faster Than Lists

Before we discuss a case where NumPy arrays become as slow as snails, it is worthwhile to verify the assumption that NumPy arrays are generally faster than lists.

To do that, we will calculate the mean of a randomly generated 1-million-element array using both NumPy and lists.

The following code is an example:

"""General comparison between NumPy and lists"""
import numpy as np
from time import time

# Random NumPy array and the same data as a list
numpy_array = np.random.rand(1000000)
list_conv = list(numpy_array)

# Start timing NumPy computation
start1 = time()
# Compute the mean using NumPy
numpy_mean = np.mean(numpy_array)
print(f"Computing the mean using NumPy: {numpy_mean}")
# End timing
end1 = time()
# Time taken
time1 = end1 - start1
print(f"Computation time: {time1}")

# Start timing list computation
start2 = time()
# Compute the mean using lists
list_mean = np.mean(list_conv)
print(f"Computing the mean using lists: {list_mean}")
# End timing
end2 = time()
# Time taken
time2 = end2 - start2
print(f"Computation time: {time2}")

# Check results are equal (the comparison was cut off in the source;
# the 1e-6 tolerance here is an assumed value)
assert abs(numpy_mean - list_mean) < 1e-6
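
Note that calling `np.mean` on a list makes NumPy convert the list to an array first, so the list timing above includes that conversion. The sketch below is one possible way to separate the cases. It is not part of the original benchmark: the helper `pure_python_mean`, the seed, and the repeat count are all assumptions made for illustration, and the numbers will differ from machine to machine.

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(seed=0)   # arbitrary seed, only for repeatability
numpy_array = rng.random(1_000_000)
plain_list = numpy_array.tolist()     # a list of native Python floats

def pure_python_mean(values):
    """Mean computed with built-ins only, no NumPy involved."""
    return sum(values) / len(values)

repeats = 10
timings = {
    "np.mean on array": timeit(lambda: np.mean(numpy_array), number=repeats),
    "np.mean on list": timeit(lambda: np.mean(plain_list), number=repeats),
    "sum/len on list": timeit(lambda: pure_python_mean(plain_list), number=repeats),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds / repeats:.6f} s per call")
```

Using `timeit` with several repeats keeps the `print` calls out of the measured region and smooths out one-off fluctuations.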
