Binary search in dictionary python

A binary search would be an effective way to do something like this, but you're still going to have to move the data from text (just a bunch of bytes, after all) into some other data structure like a list. If you have a program with a short lifetime, or no long-term memory restrictions, it will (probably) be even faster to just load the whole thing into a Python dict at startup (or whenever is appropriate):

# This may not work exactly right for your file format, but you get the idea.
lookup = {}
for line in f:
    if line:
        value, key = line.trim().split():
        lookup[key] = value

Then you access it using Python's builtin dictionaries, which are nice and fast:

def get_value(word):
    return lookup.get(word)

EDIT

If your only option is to read in the whole file for each word, and you're searching for many words, then the time you save by implementing a clever search algorithm is probably going to be somewhat marginal compared to the time you spend opening and reading files over and over. What you really want is a database, which could actually handle this sort of thing quickly. That said, given these parameters, I'd probably do something like this if I had to use the filesystem:

import bisect

# Define searchable (word, value) tuples for every word in the file.
# I'm assuming your files are sorted, but if not, sort this list (SLOW!!)
words = [(w[1], w[0]) for w in (line.strip().split() for line in f if line)]

# Binary search for the word and return its associated value.
def get_value(word):
    idx = bisect.bisect_left(words, (word,None)) # Tuples compare element-wise
    if idx != len(words) and words[idx][0] == word:
        return words[idx][1]
    raise ValueError('word not found')

Finally, I notice you're using gzipped files, which is sensible if storage space is an issue, but it's going to slow your process down even more. Once again I have to suggest a database. In any case, I don't know if you're even having trouble here, but just in case, reading gzipped files is not really any "harder" than reading normal files. Just take a look at the gzip module. Basically, gzip files work just like regular files, so you can still write for line in file and so on.

Today, we will learn a very fast searching algorithm – the binary search algorithm in Python. We will see its logic, how to write it in Python and what makes it so fast.

There is one thing to note before starting, the algorithm requires that the given list should be sorted. This is because we can find if a number is after or before a certain another number in a list based on the list’s sorting.

Recall how we find words in a dictionary or page numbers in a book. We simply go to a point in the sequence and check if what we need to find is after or before that point, we make guesses like this until we have found the item.

Similarly, in Binary Search, we start by looking at the centre of the list. Either we will find the item there, in which case the algorithm is over, or we will know whether the item is after or before the middle item based on how the list is sorted.

After this, we will simply ignore the half which is not supposed to have the item we need. And we repeat this process by going to the middle of the other half.

Eventually, we will either find the item or there will be no more halves to eliminate, which will end the algorithm either successfully or unsuccessfully.

Notice that we are dividing the list into two halves and then eliminating one half, because of this behaviour of the algorithm, it is aptly named Binary Search.

Merriam-Webster Dictionary’s meaning of “Binary”: a division into two groups or classes that are considered diametrically opposite.

Recommended read: Binary search tree algorithm in Python

Theoretical Example of the Binary Search Algorithm

Let us take an example to understand it better:

Given List: 11, 23, 36, 47, 51, 66, 73, 83, 92
To find: 23

  • The list has 9 items, so the center one must be in position 5, which is 51.
  • 51 is not equal to 23, but it is more than 23. So if 23 is there in the list, it has to be before 51. So we eliminate 51 and all items after it.
  • Remaining List: 11, 23, 36, 47
  • Now we have 4 items in the list, and depending on how you calculate the center index, it will either tell us that 2 is the center position or 3 is the center position.
  • For simplicity, we will calculate the mean of the start and end positions to get the center.
  • Here, start = 1 and end = 4, so the mean is 2 (integer part of 2.5).
  • So, in position 2, we have 23, which is the item we needed to find. And the algorithm will end and give us the target’s position.

Now let us see how the binary search algorithm is coded in Python.

Binary Search in Python

def binary_search(lst, target):
    start = 0
    end = len(lst) - 1
    while(start <= end):
        mid = (start + end) // 2
        if(lst[mid] > target):
            end = mid - 1
        elif(lst[mid] < target):
            start = mid + 1
        else:
            return mid
    return None

Let us go through the algorithm,

  • We create a function that takes two arguments, the first one is the list and the second one is the target that we need to find.
  • We declare two variables start and end that point to the start (0) and end (length – 1) of the list respectively.
  • These two variables are responsible for eliminating items from the search because the algorithm won’t consider items outside this range.
  • The next loop will continue to find and eliminate items as long as the start is less than or equal to the end because the only way the start becomes greater than the end is if the item is not on the list.
  • Inside the loop, we find the integer value of the mean of start and end, and consider that as the middle item of the list.

Now, if the middle item is more than the target, it means that the target can only be present before the middle item. So we set the end of the list as the index before the middle, this way, all indexes after mid, including mid, are eliminated from consideration.

Similarly, if the middle item is less than the target, it means that the target can only be present after the middle item, and in order to eliminate the index mid and all indexes before mid, we set the start variable as the index after mid.

If none of the above two cases is true, i.e. if the item at the middle is neither greater nor smaller than the target, then it must be the target. So we simply return the index of this middle item and end the algorithm.

If the loop finishes, then that means that the target was not found, this means that the target was not in the list and the function simply returns None.

Let us see the code perform and check its output.

The Output

Binary search in dictionary python
Binary Search example

We can see that 23 was present in the list numbers, so the function returned its index, which is 2, but 70 was not present in the list, and so the function returned None.

What Makes Binary Search Fast?

Consider a simple searching algorithm like linear search where we have to go through each item till we find what we are looking for. This means that for larger input sizes, the time it takes to find an item increases by as much amount as the input size increases. Quantifiably, its time complexity is O(n).

Time complexity is a way of quantifying how fast or efficient an algorithm is. In the case of Binary Search, its time complexity is “O(log2n)“, which means that if we double the size of the input list, the algorithm will perform just one extra iteration.

Similarly, if the input size is multiplied by a thousand, then the loop will just have to run 10 more times.

Recall that in every iteration, half of the list is eliminated, so it doesn’t take very long to eliminate the entire list.

Conclusion

In this tutorial, we studied what Binary Search is, how it got its name, what it exactly does to find items, and how is it so fast. We discussed its efficiency in terms of time complexity, and we saw how to code it in Python.

Binary Search is one of many search algorithms, and it is one of the fastest. I hope you enjoyed learning about Binary Search and see you in the next tutorial.