N choose k recursive python

The recursion is based on a simple observation, for which I will give a combinatorial argument, as to why it is true, rather than a mathematical proof through formulae.

Whenever you choose k elements out of n, there are two cases:

  1. You choose element #n
  2. You don't choose element #n

Since these events are mutually exclusive, the total amount of combinations is given by the amount of combinations when choosing #n, and those when you don't choose #n.

Choosing element #n

Since we have already chosen one element, we need only choose another k-1 elements. Also, since we have decided upon one element – as to whether it is included or not – already, we only need to consider the remaining n-1 elements.

Thus, the amount of combinations for choosing element #n is given by

    subset[n - 1, k - 1]

Not choosing element #n

There are still k elements to choose, but since we have already made up our mind about element #n, there remain only n - 1 elements to choose from. Thus:

    subset[n - 1, k]

The base case

The recursion uses the fact, that we can usually differentiate between two situations, solutions where element n is part of that solution, and those where it is not.

However, such a distinction can not always be made:

  • When choosing all elements [corresponding to case n == k in code below]
  • or when choosing no elements at all [corresponding to case k == 0 in code below]

In these cases, there is only exactly one solution, hence

if k == 0:
    return 1
if n == k:
    return 1

Ensuring it works

To do that, we need to convince ourselves [or prove] that the base case is always hit at some point.

Let us assume, that n < k at some point. Since per our assumption, n was originally greater or equal to k, there must have been some point where n = k, because n and k decrease in unison or only n decreases by one, i.e. it follows

This implies, that there must have been a call to subset[n - 1, k] for it to happen, that n decreases below k. However, this is not possible since we have a base case on n = k where we return a constant 1.

We conclude that either n decreases at some point such that n = k, or decrease in unison exactly k times such that k = 0.

Thus, the base case works.

Combinations

A great many problems in math and computer science have recursive solutions. In this section of the lecture we will study a problem from mathematics that has an elegant recursive solution.

The combination symbol or "n choose k" has many applications in mathematics. In probability theory

counts the number of ways that we can get k heads in a sequence of n flips of a fair coin.

is computed via the formula

Two special cases are easy to compute.

One way to compute a combination is to compute all the factorials involved and then divide the denominator into the numerator. This is not practical for even moderately large value of n because the factorials grow very quickly as a function of n. Instead, the most common method for computing a combination makes use of this recursive relationship:

This recursive relationship coupled with the two special cases makes it possible to construct a recursive function definition in Python to compute a combination.

def C[n,k]:
  if k == 1:
    return n
  if k == n:
    return 1
  return C[n-1,k-1]+C[n-1,k]

This works, and does compute the correct value for a combination.

This approach does have one flaw. Consider what happens when we try to compute C[8,5].

C[8,5] calls C[7,4] and C[7,5]
C[7,4] calls C[6,3] and C[6,4]
C[7,5] calls C[6,4] and C[6,5]
C[6,3] calls C[5,2] and C[5,3]
C[6,4] calls C[5,3] and C[5,4]
C[6,4] calls C[5,3] and C[5,4]
C[6,5] calls C[5,4] and C[5,5]

What you start to notice after a while is that the same functions are getting called multiple times. This effect becomes so severe for large n and k that the simple recursive function shown above becomes immensely inefficient.

We can construct a program that illustrates just how bad this problem is. The trick is to declare a global two dimensional array

counts = np.zeros[[18,18],dtype=int]

that stores information about how many times we call C with each pair of parameter values n and k.

We then modify the function that computes the combination to record how many times it gets called with each pair of parameters.

def C[n,k]:
  counts[n][k] += 1;

     if k == 1:
    return n
  if k == n:
    return 1
  return C[n-1,k-1]+C[n-1,k]

At the end of the program we dump this count data to a file.

def saveCounts[]:
  with open['counts.txt','w'] as f:
    for row in counts:
      for col in row:
        f.write['{:5d}'.format[col]]
      f.write['\n']

The results are shocking. Here is the set of counts that results when we try to compute C[16,10]:

0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0 1287 1287    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0  495 1287  792    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0  165  495  792  462    0    0    0    0    0    0    0    0    0    0    0    0    0
    0   45  165  330  462  252    0    0    0    0    0    0    0    0    0    0    0    0
    0    9   45  120  210  252  126    0    0    0    0    0    0    0    0    0    0    0
    0    1    9   36   84  126  126   56    0    0    0    0    0    0    0    0    0    0
    0    0    1    8   28   56   70   56   21    0    0    0    0    0    0    0    0    0
    0    0    0    1    7   21   35   35   21    6    0    0    0    0    0    0    0    0
    0    0    0    0    1    6   15   20   15    6    1    0    0    0    0    0    0    0
    0    0    0    0    0    1    5   10   10    5    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    1    4    6    4    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    1    3    3    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    1    2    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

What we can read off from this table is that C[2,1], C[2,2], and C[3,2] each got called a total of 1287 times in the process of computing C[16,10]. This problem of redundant function calls gets exponentially worse as n and k get larger. By the time you get to n = 30 this simple recursive method for computing C[n,k] becomes completely impractical.

Remembering results

The simple fix needed to eliminate redundant calls to a recursive function is called memoization. The idea is to remember any values we compute, so that the next time we are asked to compute a value we can look it up instead of recomputing it. This works best for recursive functions with a moderate number of integer parameters. In that case, we can remember values we have computed earlier by storing the values in a table. Whenever we are asked to compute a value, we start by consulting the table.

In the case of the function to compute combinations, we proceed as follows. We start by making a global two dimensional array to hold the remembered function values.

c = np.zeros[[100,100],dtype=int]

We then rewrite the recursive function to use the table. Whenever we are asked to compute C[n,k] for a particular n and k, we start by checking the table to see if has already been computed. If it hasn't already been computed, we compute it and save the value in the table before returning with the result.

def C[n,k]:
  result = c[n][k]
  if result == 0:
    if k == 1:
      result = n
    elif k == n:
      result = 1
    else:
      result = C[n-1,k-1]+C[n-1,k]
    c[n][k] = result
  return result

Replacing recursion with a table

An interesting side effect of the memoized solution is that it ends up filling up a table c[n][k] with values for C[n,k].

A more direct approach to doing the combinations problem efficiently is to just write code that fills the table with the right values.

We can do this with a function

def initTable[]:
  # Fill the table with correct values
  # for c[n][k] for all values of n and k

Once the c array is properly filled with values, we can rewrite the function to compute combinations

def C[n,k]:
  if k 

Chủ Đề