Python find similarity between two lists
Now you can compare these by entries or frequencies:
You can calculate their cosine similarity using:
The closer to 1 that value, the more similar the two lists are.
The cosine similarity is one score you can calculate. If you care about the length of the list, you can calculate another; if you keep that score between 0.0 and 1.0 as well you can multiply the two values for a final score between -1.0 and 1.0.
For example, to take relative lengths into account you could use:
and then combine into a function that takes the lists as inputs:
For your two example lists, that results in:
You can mix in other metrics as needed.
Sometimes, while working with Python list, we have a problem in which we need to find how much a list is similar to other list. The similarity quotient of both the list is what is required in many scenarios we might have. Let’s discuss a way in which this task can be performed.
Method : Using
The original list 1 is : [1, 4, 6, 8, 9, 10, 7] The original list 2 is : [7, 11, 12, 8, 9] Percentage similarity among lists is : 33.33333333333333
A while ago I wrote a guide on how to compare two dictionaries in Python 3, and how this task is not as simple as it might sound. It turns out comparing two lists in Python is just so tricky as comparing
The way we've been taught to compare two objects in Python is a bit misleading. Most books and tutorials teach object comparison by using
The list goes on and on, and for all of these use cases using
That's what we are going to see in this article. We’ll learn the best ways of comparing two lists
in Python for several use cases where the
Ready? Let's go!
Comparing if two lists are equal in python
The easiest way to compare two lists for equality is to use the
An example of a simple case would be a list of
Pretty simple, right? Unfortunately, the world is complex, and so is production grade code. In the real world, things get complicated really fast. As an illustration, consider the following cases.
Suppose you have a list of floating points that is built dynamically. You can add single elements, or
elements derived from a mathematical operation such as
Clearly, floating point arithmetic has its limitations, and sometimes we want to compare two lists but ignore precision errors, or even define some tolerance. For cases like this, the
Things can get more complicated if the lists have custom objects or objects from other libraries, such as
You might also like to compare the lists and return the matches. Or maybe compare the two lists and return the differences. Or perhaps you want to compare two lists ignoring the duplicates, or compare a list of dictionaries in Python.
In every single case, using
Comparing two lists of float numbers
In the previous section, we saw that floating point arithmetic can cause precision errors. If we have a list of floats and want to compare it with another list, chances are that the
Let's revisit the example from the previous section and see what is the best way of comparing two lists of floats.
As you see,
There are a few ways of doing approaching this task. One would be to create our own custom function, that iterates over the elements and compare it one by one using the
Fortunately we don't have to reinvent the wheel. As I showed in the
"how to compare two dicts" article, we can use a library called
The example below starts off by setting up the two lists we want to compare. We then pass it to the
Since we want to ignore the precision error, we can set the number of digits AFTER the decimal point to be used in the comparison.
The result is an empty dict, which means the lists are equal. If we try comparing a list with a float number that differs in more than 3 significant digits, the library will return that diff.
For reproducibility, in this article I used the latest version of
Comparing if two lists without order (unordered lists) are equal
Lists in Python are unordered by default. Sometimes we want to compare two lists but treat them as the same as long as they have the same elements—regardless of their order.
There are two ways of doing this:
These first two methods assume the elements can be safely compared using the
Sorting the lists and using the == operator
You can sort lists in Python in two different ways:
The first method sorts a list in place, and that means your list will be modified. It's a good idea to not modify a list in place as it can introduce bugs that are hard to detect.
Let's see how it works.
As a consequence, by sorting the lists first we ensure that both lists will have the same order, and thus can be compared using the
Converting the lists to a set
Contrary to lists, sets in Python don’t care about order. For example, a set
To do so, we convert each list into a set, then using the
Using the deepdiff library
This library also allows us to ignore the order in sequences such as
How to compare two lists and return matches
In this section, we'll see how we can compare two lists and find their intersection. In other words, we want to find the values that appear in both.
To do that, we can once more use a
How to compare two lists in python and return differences
We can the find difference between two lists in python in two different ways: