Python generators – saving time and memory

Python version 2.2 introduced support for generator functions. A generator is a function that returns something kind of like a list. You can think of a generator as a function that gets to return more than once; each time it returns, it produces a new item in the list. Here is a simple example of a generator:

def mygen() :
  print "calling generator"
  for x in [1, 2, 3, 4] :
    print "yielding next value"
    yield x

for item in mygen() :
  print item

This example prints the following:

calling generator
yielding next value
yielding next value
yielding next value
yielding next value

Instead of return, we used the yield statement inside our generator. yield returns an item from the generator, but marks that point in the code so that we continue processing when we go back to the generator to fetch the next item in the pseudo-list. Notice how the generator is only called once, but the yield points are interleaved with the print statements in the calling code. Each time the generator needs to produce a new value, it picks up from the previous yield point. When the generator reaches the end of the function, no more values are produced. You cannot use return within a generator.

Let’s look at some code where the use of generators might help us. The following code instantiates a list of objects and then creates HTML to display them. We’ll assume the existence of a database API to execute queries and retrieve results:

def get_objects() :
  result = []
  query = db_execute("...")
  row = query.fetchrow()
  while not row is None :
    result.append(MyObject(row))  # Build object from DB, append to result
    row = query.fetchrow()
  return result
. . .
for object in get_objects() :
  print object.getHTML()

This code creates the entire list of objects before it can print any of them. There are two problems with this — first, there is a huge delay to create all of the objects before any progress is made in printing; this means that the user’s browser has no partial results to display. Second, all of the objects must be held in memory at the same time; if there are many objects, this can cause significant overhead.

Generators allow us to attack both of these problems. Since a generator produces items one at a time, on demand, it avoids both of these problems. We don’t have to wait to construct all of the objects in the list before we use the first one. And once we are done using an object, the Python garbage collector is now free to immediately clean it up. Here’s how we might rewrite the above code to use generators:

def get_objects() :
  query = db_execute("...")
  row = query.fetchrow()
  while not row is None :
    yield MyObject(row)    # Build object from DB, yield to caller
    row = query.fetchrow()
. . .
for object in get_objects() :
  print object.getHTML()

With a small change we have now significantly improved our code’s memory footprint — all of the objects do not need to be held in memory at the same time. And we are now producing the output for each object as we create it, without needing to wait for all the objects to be instantiated first. This is a significant improvement!

We can also build a chain of generators. Let’s assume that we need to modify the code above to optionally display only those objects that have an “active” flag set. Ordinarily, if we use Python’s filter function to accomplish this, it needs to create the entire list all at once. But if we use a generator to perform the filtering, then we can still keep our optimizations since we are only ever creating and filtering one object at a time. Here’s an example:

my_objs = get_objects()         # This returns a generator object

if display_active_only :
  def active_filter(objects) :  # A filtering generator
    for object in objects :
      if :
        yield object

  my_objs = active_filter(my_objs)

for object in my_objs :
  print object.getHTML()

A generator function doesn’t produce a real list; instead, it produces a generator object that behaves like something called an iterator. You can’t write either of the following statements for a generator or iterator:

print len(get_objects())
print get_objects()[3]

The for statement is smart enough to traverse a generator, and it will probably be sufficient for your needs. Perhaps you can get the count of objects by other means, such as executing an SQL COUNT request. If you absolutely need to access a generator as a list, you can coerce it to a list as follows:

objlist = list(get_objects())

But be aware that this removes all of the advantages that we’ve discussed here, since this causes all of the objects returned by the generator to be created at once and stored in the list. If you find yourself needing to do this, you should consider rewriting your code so that you don’t need to do so. Or perhaps generators aren’t the right solution for your particular problem.

Functional Python

A friend of mine is learning Python and was curious about functional programming, so I have written this brief tutorial. Python isn’t a full-fledged functional language, but it supports some very useful functional idioms.

It’s best to approach this tutorial by programming along at the Python interactive prompt. Try typing everything in to see what the results are.


Imagine you have a list of numbers and want to filter out all even numbers:

q = [1,2,5,6,8,12,15,17,20,23,24]

def is_even(x) :
  return x % 2 == 0

result = filter(is_even, q)

(Remember that the “%” operator is the modulus or remainder operator. If the remainder when a number is divided by two is zero, then the number must be even.)

If we only use is_even once, it’s kind of annoying that we have to define it. Wouldn’t it be nice if we could just define it inside of the call to filter? We want to do something like “q = filter(x % 2 == 0, q)“, but that won’t quite work, since the “x % 2 == 0” is evaluated before the call to filter. We want to pass a function to filter, not an expression.

We can do this using the lambda operator. lambda allows you to create an unnamed throw-away function. Try this:

q = [1,2,5,6,8,12,15,17,20,23,24]
result = filter(lambda x : x % 2 == 0, q)

lambda tells Python that a function follows. Before the colon (:) you list the parameters of the function (in this case, only one, x). After the colon you list what the function returns. This function doesn’t have a name, and Python disposes of it as soon as filter is done with it.

You can, however, assign the function to a variable to give it a name. The following two statements are identical:

def is_odd(x) :
  return x % 2 == 1
is_odd = lambda x : x % 2 == 1


Here’s another example. The map function applys a function to every item in a list, and returns the result. This example adds 1 to every element in the list:

result = map(lambda x : x + 1, [1,2,3,4,5])

(At this point, result holds [2,3,4,5,6].)

The map function can also process more than one list at a time, provided they are the same length. This allows you to do things like add all of the elements in a pair of lists. Note that our lambda here has two parameters:

result = map(lambda x,y : x+y, [1,2,3,4,5], [6,7,8,9,10])

(At this point, result holds [7, 9, 11, 13, 15].)


Python also provides the reduce function. This is a bit more complicated; you provide it a list and a function, and it reduces that list by applying the function to pairs of elements in the list. An example is the best way to understand this. Let’s say we want to find the sum of all the elements in a list:

sum = reduce(lambda x,y : x+y, [1,2,3,4,5])

reduce will first apply the function to 1,2, yielding 3. It will then apply the function to 3,3 (the first 3 is the result of adding 1+2), yielding 6. It will then apply the function to 6,4 (the 6 is the result of adding 3+3), yielding 10. Finally, it will apply the function to 10,5, yielding 15. The result stored in sum is 15.

Similarly, we can find the cumulative product of all items in a list. The following example stores 120 (=1*2*3*4*5) in product:

product = reduce(lambda x,y : x*y, [1,2,3,4,5])


Beginning in Python 2.5 you can even express conditions within your lambdas, using Python’s conditional expressions. So, for example:

reciprocals = map(lambda x : 1.0/x if x !=  0 else None, [0, 1, 2, 3, 4, 5])

At this point, reciprocals holds the list [None, 1.0, 0.5, 0.333..., 0.25, 0.2]. The expression “1/x if x != 0 else None” is the conditional expression, and it is used to prevent division by zero. If x is not 0, then the result is 1/x, but otherwise it is None.


One of the advantages of lambda-functions is that they allow you to write very concise code. A result of this, however, is that the code is dense with meaning, and it can be hard to read if you are not accustomed to functional programming. In our first example, filter(is_even, q) was fairly easy to understand (fortunately, we chose a descriptive function name), while filter(lambda x : x % 2 == 0, q) takes a little longer to comprehend.

If you are a masochistic mathematical geek, you’ll probably enjoy this (I do). If not, you might still prefer to break things into smaller pieces by defining all of your functions first and then using them by name. That’s ok! Functional programming isn’t for everyone.

Another advantage of functional programming is that it corresponds very closely to the mathematical notion of functions. Note that our lambda-functions didn’t store any results into variables, or have any other sort of side effect. Functions without side effects cause fewer bugs, because their behavior is deterministic (this is related to the dictum that you aren’t supposed to use global variables). It is also much easier to mathematically prove that such functions do what you think they are doing. (Note that it’s still possible to write a lambda function that has side effects, if the lambda function calls another function that has side effects; for example lambda x : sys.stdout.write(str(x)) has the side effect of printing its parameter to the screen.)