Difference between numpy array and list with example

Python Lists Are Sometimes Much Faster Than NumPy. Here's Proof.

Photo by Braden Collum on Unsplash

I have recently been working on a digital image processing project. Hyperparameter tuning took quite some time before I got the desired accuracy, all because of the overfitting parasite and my useless low-end hardware.

Each execution took my machine approximately 15 to 20 minutes. Twenty minutes to process 20 000 entries. I imagine that if I had been working on a 1-million-record dataset, I would have had to wait for the earth to complete a full rotation before the end of the training.

I was satisfied with the accuracy of the model. Yet I wanted to try many other Convolutional Neural Network (CNN) architectures before sending in my code. Therefore, I decided to look for room for optimization in my code.

Because I was using the pre-built machine learning algorithms from the PyPI packages Scikit-Learn and TensorFlow, very few subroutines were left to optimize. One option was to speed up my code through its data structures. I was storing data in lists, and since NumPy is super fast, I thought using it might be a viable option.

Guess what happened after converting my list code to NumPy array code?

Much to my surprise, the execution time didn't shrink. Rather, it soared.

That being said, in this post, I will walk you through the exact situation where lists ended up performing way better than NumPy arrays.

NumPy & Lists

Let us discuss the difference between NumPy arrays and lists, to begin with.

NumPy is the de facto Python library for N-dimensional array manipulation and numerical computing. It is open-source, easy to use, memory friendly, and lightning-fast.

A successor to the earlier Numeric library, NumPy lays the foundation for many data science libraries like SciPy, Scikit-Learn, pandas, and more.

While Python lists store an ordered, mutable collection of arbitrary objects, NumPy arrays store elements of a single type. So, in terms of what they can hold, we can say that NumPy arrays live under the lists umbrella: there is nothing a NumPy array can store that a list cannot.
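
As a small illustration of that difference (the variable names are my own and not taken from the project described above), a list keeps elements of different types side by side, while a NumPy array coerces its elements to a single dtype:

```python
import numpy as np

# A list keeps each element as its own Python object, whatever its type.
mixed_list = [1, 2.5, "three"]
list_types = [type(item).__name__ for item in mixed_list]
print(list_types)  # ['int', 'float', 'str']

# A NumPy array stores a single dtype; mixing ints and floats upcasts to float64.
numeric_array = np.array([1, 2.5, 3])
print(numeric_array.dtype)  # float64
```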

NumPy as a whole, however, covers not only array manipulation but also many other routines such as binary operations, linear algebra, mathematical functions, and more. I believe it covers more than one can possibly need.

The next thing to consider is why we usually use NumPy arrays over lists.

The short answer, which I believe everybody reading this post knows, is: it is faster.

NumPy is indeed ridiculously fast, even though Python is known to be slow. This is because NumPy serves as a wrapper around compiled C and Fortran code, and needless to say, these two are fast.

NumPy Arrays Are Faster Than Lists

Before we discuss a case where NumPy arrays become slow like snails, it is worthwhile to verify the assumption that NumPy arrays are generally faster than lists.

To do that, we will calculate the mean of a randomly generated 1-million-element array using both NumPy and lists.

The following code is an example:

"""General comparison between NumPy and lists"""
import numpy as np
from time import time

# Random NumPy array and the same values as a list
numpy_array = np.random.rand(1000000)
list_conv = list(numpy_array)

# Start timing NumPy computation
start1 = time()
# Compute the mean using NumPy
numpy_mean = np.mean(numpy_array)
print(f"Computing the mean using NumPy: {numpy_mean}")
# End timing
end1 = time()
# Time taken
time1 = end1 - start1
print(f"Computation time: {time1}")

# Start timing list computation
start2 = time()
# Compute the mean using lists
list_mean = np.mean(list_conv)
print(f"Computing the mean using lists: {list_mean}")
# End timing
end2 = time()
# Time taken
time2 = end2 - start2
print(f"Computation time: {time2}")

# Check results are equal
assert abs(numpy_mean - list_mean) <= 10e-6, "Alert, means are not equal"

My machine output is as follows:

Computing the mean using NumPy: 0.4996098756973947
Computation time: 0.01397562026977539
Computing the mean using lists: 0.4996098756973947
Computation time: 0.17974257469177246

As predicted, NumPy arrays are significantly faster than lists here: on my machine, roughly 13 times faster (about 0.014 s versus 0.180 s).
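
One caveat about the script above: np.mean(list_conv) has to convert the list into an array before averaging, so part of the list timing is conversion cost. A minimal sketch of an alternative measurement follows. It assumes the standard library's timeit module and a pure-Python sum()/len() mean, which is my substitution and not what the original script used; the helper name list_mean and the smaller array size are also my own choices.

```python
import timeit

import numpy as np

# Smaller than the 1 million elements used above so the sketch runs quickly.
numpy_array = np.random.rand(100_000)
python_list = numpy_array.tolist()  # plain Python floats, not NumPy scalars


def list_mean(values):
    """Mean computed with built-ins only, so no NumPy conversion is involved."""
    return sum(values) / len(values)


# timeit calls each function several times, which smooths out one-off noise.
runs = 20
numpy_seconds = timeit.timeit(lambda: np.mean(numpy_array), number=runs)
list_seconds = timeit.timeit(lambda: list_mean(python_list), number=runs)

print(f"NumPy mean: {numpy_seconds / runs:.6f} s per call")
print(f"List mean:  {list_seconds / runs:.6f} s per call")
```

The absolute numbers will differ from machine to machine, so it is worth running this locally rather than relying on my figures.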

That said, can we generalize and say that NumPy arrays are always faster than lists?

It turns out that NumPy arrays do not always overtake lists. Lists, too, have tricks up their sleeves, which brings us to the next section.

NumPy Arrays Are NOT Always Faster Than Lists

If lists had been useless compared to NumPy arrays, they would have probably been dumped by the Python community.

An example where lists shine in comparison with NumPy arrays is the append() function. append() adds values to the end of both lists and NumPy arrays. It is a very commonly used function.

The script below compares the list's append() with NumPy's append(). The code simply adds the numbers from 0 to 99 999 to the end of a list and a NumPy array.

"""numpy.append() vs list.append()"""
import numpy as np
from time import time


def numpy_append():
    arr = np.empty((1, 0), int)
    for i in range(100000):
        arr = np.append(arr, np.array(i))
    return arr


def list_append():
    list_1 = []
    for i in range(100000):
        list_1.append(i)
    return list_1


def main():
    # Start timing numpy array
    start1 = time()
    new_np_arr = numpy_append()
    # End timing
    end1 = time()
    # Time taken
    print(f"Computation time of the numpy array : {end1 - start1}")
    # Start timing list
    start2 = time()
    new_list = list_append()
    # End timing
    end2 = time()
    # Time taken
    print(f"Computation time of the list: {end2 - start2}")
    # Testing
    assert list(new_np_arr) == new_list, "Arrays tested are not the same"


if __name__ == "__main__":
    main()

My machine produces the following output:

Computation time of the numpy array : 2.779465675354004
Computation time of the list: 0.010703325271606445

As we can see, in this example, lists performed far better than NumPy arrays. NumPy performed so poorly that it was roughly 260 times slower than the list (about 2.78 s versus 0.011 s).

This case demonstrates that NumPy should not be treated as the automatic go-to option whenever speed is involved. Rather, careful consideration is needed.

What Is Wrong with numpy.append()?

To unravel this mystery, we will visit NumPy's source code. The docstring of the append() function says the following:

"Append values to the end of an array.

Parameters
----------
arr : array_like
    Values are appended to a copy of this array.
values : array_like
    These values are appended to a copy of `arr`. It must be of
    the correct shape (the same shape as `arr`, excluding
    `axis`). If `axis` is not specified, `values` can be any
    shape and will be flattened before use.
axis : int, optional
    The axis along which `values` are appended. If `axis` is
    not given, both `arr` and `values` are flattened before use.

Returns
-------
append : ndarray
    A copy of `arr` with `values` appended to `axis`. Note that
    `append` does not occur in-place: a new array is allocated
    and filled. If `axis` is None, `out` is a flattened array."

After a thorough read of the docstring, we can see a note on what the function returns. It states that appending does not happen in place; rather, a new array is allocated and filled.

In lists, however, things are very different. The list filling process stays within the list itself, and no new lists are generated.
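
Both behaviours are easy to check directly. The snippet below is only a small sketch with illustrative variable names of my own:

```python
import numpy as np

original_array = np.array([1, 2, 3])
appended_array = np.append(original_array, 4)

# np.append returns a new array; the original is left untouched.
print(original_array)                    # [1 2 3]
print(appended_array)                    # [1 2 3 4]
print(appended_array is original_array)  # False

original_list = [1, 2, 3]
same_list = original_list  # second name bound to the same list object
result = original_list.append(4)

# list.append mutates the existing list in place and returns None.
print(same_list)  # [1, 2, 3, 4]
print(result)     # None
```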

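In summary, the copy-and-fill process of numpy.append() is what makes it so costly. Every call copies all of the existing elements into a new array, so building an n-element array one append at a time does an amount of work proportional to n squared, whereas list.append() takes amortized constant time per call.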
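
If the final result has to be a NumPy array, the repeated copying can usually be avoided. The sketch below shows two common patterns, neither of which comes from the project described earlier: collecting the values in a list and converting once at the end, and preallocating the array when the final size is known in advance. The function names build_via_list and build_preallocated are my own.

```python
import numpy as np


def build_via_list(count):
    """Append to a list, then convert to an array a single time."""
    values = []
    for i in range(count):
        values.append(i)
    return np.array(values)


def build_preallocated(count):
    """Allocate the full array up front and fill it by index."""
    arr = np.empty(count, dtype=int)
    for i in range(count):
        arr[i] = i
    return arr


from_list = build_via_list(100_000)
preallocated = build_preallocated(100_000)

# Both approaches produce the numbers 0 to 99 999, like the np.append loop.
print(np.array_equal(from_list, preallocated))  # True
```

Which of the two fits best depends on whether the final size is known before the loop starts; I have not benchmarked them here, so treat this as a starting point for your own measurements.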

Takeaways

There is no silver bullet in programming.

Our findings establish that NumPy arrays are not the cure for every performance issue, and one should not jump blindly into action before taking all options into account. Weighing the options also maximizes the chances of producing better-designed code.

Additionally, exploring many paths via experimentation ensures that one will not eventually regret picking one choice over the other. And more importantly, experimentation spots misinformation.

In conclusion, and as a bonus for sticking with me to this point, here is an insightful quote from A. Einstein that summarises the point from this post.

"No amount of experimentation can ever prove me right; a single experiment can prove me wrong." - Albert Einstein

Enjoy your programming day!