Python combinations

In September 2007, Brian Adkins posted a simple Logo program to print all possible combinations of lists of items, and asked for alternatives in other languages. He provided a Ruby alternative which would translate fairly easily into Python. For my own Python implementation I decided to try a completely different approach:

prod1   = lambda elem, vec : map(lambda x : elem+' '+x, vec)
xprod   = lambda vec1, vec2 : reduce(list.__add__, map(lambda elem : prod1(elem ,vec2), vec1))
choices = lambda x : 'n'.join(reduce(xprod, x))+'n'

q = [['small', 'medium', 'large'],
     ['vanilla', 'ultra chocolate', 'lychee', 'rum raisin', 'ginger'],
     ['cone', 'cup']]

print choices(q),

Blocking ads

For awhile now I’ve been using a tool to block advertisements in my web browsing. Here’s how to install it:

  1. Do you use “Firefox” to surf the web? If not, I recommend it; you can get it here. This ad blocking tool only works with Firefox (though I understand there are ad blockers for other browsers).
  2. Once you have Firefox installed, visit Adblock Plus to install the ad blocker:
    1. Click on the “Install Adblock Plus” link.
    2. A bar will pop up indicating that the install was blocked.
    3. Click the “Options” button on the bar and enable installations for this website.
    4. Click on the “Install Adblock Plus” link again, then click on the “Install” button.
  3. When it asks you which blocking list to subscribe to, choose the “EasyList” blocking list. Once a week your computer will check this list to see if there are any new known ads to be blocked.

After you do this you should rarely see ads in Firefox.

Resume keywords

I’ve had my resume posted online and from time to time I receive a solicitation for job openings. Over time I’ve paid attention to my referral logs to see how folks find my resume. Almost every search includes the following keywords, and almost always they are required keywords:

+objective +education +experience

A large majority of searches also exclude the following keywords, which was a little surprising to me. I imagine that these keywords are meant to exclude job listings. If you post your resume online you should take care not to use these words!

-job -career

I have also seen the following keywords and exclusions. They are less common than the above, but it is worth considering them:

  • +senior +programmer
  • +engineer developer
  • engineer developer
  • intitle:Resume OR inurl:Resume
  • “application programmer” OR “application engineer”
  • -sample -example
  • -your -we
  • -opportunities
  • -apply -submit
  • -tips -guidelines
  • -interview
  • -service

It’s possible that I’m missing other common searches because my own resume doesn’t meet the search criteria! If you are aware of other common search terms please let me know.

Functional Python

A friend of mine is learning Python and was curious about functional programming, so I have written this brief tutorial. Python isn’t a full-fledged functional language, but it supports some very useful functional idioms.

It’s best to approach this tutorial by programming along at the Python interactive prompt. Try typing everything in to see what the results are.

Introduction

Imagine you have a list of numbers and want to filter out all even numbers:

q = [1,2,5,6,8,12,15,17,20,23,24]

def is_even(x) :
  return x % 2 == 0

result = filter(is_even, q)

(Remember that the “%” operator is the modulus or remainder operator. If the remainder when a number is divided by two is zero, then the number must be even.)

If we only use is_even once, it’s kind of annoying that we have to define it. Wouldn’t it be nice if we could just define it inside of the call to filter? We want to do something like “q = filter(x % 2 == 0, q)“, but that won’t quite work, since the “x % 2 == 0” is evaluated before the call to filter. We want to pass a function to filter, not an expression.

We can do this using the lambda operator. lambda allows you to create an unnamed throw-away function. Try this:

q = [1,2,5,6,8,12,15,17,20,23,24]
result = filter(lambda x : x % 2 == 0, q)

lambda tells Python that a function follows. Before the colon (:) you list the parameters of the function (in this case, only one, x). After the colon you list what the function returns. This function doesn’t have a name, and Python disposes of it as soon as filter is done with it.

You can, however, assign the function to a variable to give it a name. The following two statements are identical:

def is_odd(x) :
  return x % 2 == 1
 
is_odd = lambda x : x % 2 == 1

map

Here’s another example. The map function applys a function to every item in a list, and returns the result. This example adds 1 to every element in the list:

result = map(lambda x : x + 1, [1,2,3,4,5])

(At this point, result holds [2,3,4,5,6].)

The map function can also process more than one list at a time, provided they are the same length. This allows you to do things like add all of the elements in a pair of lists. Note that our lambda here has two parameters:

result = map(lambda x,y : x+y, [1,2,3,4,5], [6,7,8,9,10])

(At this point, result holds [7, 9, 11, 13, 15].)

reduce

Python also provides the reduce function. This is a bit more complicated; you provide it a list and a function, and it reduces that list by applying the function to pairs of elements in the list. An example is the best way to understand this. Let’s say we want to find the sum of all the elements in a list:

sum = reduce(lambda x,y : x+y, [1,2,3,4,5])

reduce will first apply the function to 1,2, yielding 3. It will then apply the function to 3,3 (the first 3 is the result of adding 1+2), yielding 6. It will then apply the function to 6,4 (the 6 is the result of adding 3+3), yielding 10. Finally, it will apply the function to 10,5, yielding 15. The result stored in sum is 15.

Similarly, we can find the cumulative product of all items in a list. The following example stores 120 (=1*2*3*4*5) in product:

product = reduce(lambda x,y : x*y, [1,2,3,4,5])

Conditionals

Beginning in Python 2.5 you can even express conditions within your lambdas, using Python’s conditional expressions. So, for example:

reciprocals = map(lambda x : 1.0/x if x !=  0 else None, [0, 1, 2, 3, 4, 5])

At this point, reciprocals holds the list [None, 1.0, 0.5, 0.333..., 0.25, 0.2]. The expression “1/x if x != 0 else None” is the conditional expression, and it is used to prevent division by zero. If x is not 0, then the result is 1/x, but otherwise it is None.

Background

One of the advantages of lambda-functions is that they allow you to write very concise code. A result of this, however, is that the code is dense with meaning, and it can be hard to read if you are not accustomed to functional programming. In our first example, filter(is_even, q) was fairly easy to understand (fortunately, we chose a descriptive function name), while filter(lambda x : x % 2 == 0, q) takes a little longer to comprehend.

If you are a masochistic mathematical geek, you’ll probably enjoy this (I do). If not, you might still prefer to break things into smaller pieces by defining all of your functions first and then using them by name. That’s ok! Functional programming isn’t for everyone.

Another advantage of functional programming is that it corresponds very closely to the mathematical notion of functions. Note that our lambda-functions didn’t store any results into variables, or have any other sort of side effect. Functions without side effects cause fewer bugs, because their behavior is deterministic (this is related to the dictum that you aren’t supposed to use global variables). It is also much easier to mathematically prove that such functions do what you think they are doing. (Note that it’s still possible to write a lambda function that has side effects, if the lambda function calls another function that has side effects; for example lambda x : sys.stdout.write(str(x)) has the side effect of printing its parameter to the screen.)

Gödel, Escher, Bach: an Eternal Golden Braid

Hofstadter, Douglas. Gödel, Escher, Bach: an Eternal Golden Braid. New York: Basic Books, 1999.

This book is an excellent and fun, if lengthy, romp through art (visual, literary, and musical), mathematics, logic, provability and computability, linguistics, cognitive science and artificial intelligence, and more. Hofstadter cleverly explores a myriad of amazing connections between all these fields. And while he ends up drawing no substantive conclusions about his final hypothesis of emergent intelligence, the journey is no less exciting.

I disagree with Hofstadter’s opinion of the nature of intelligence. But I thoroughly enjoyed this book for a number of reasons. First, despite its imposing size, it is accessible; Hofstadter presents mathematical proofs in an easily understood way, using examples, analogies, and much explanatory prose. Second, despite its imposing size, it is delightfully fun; this book is brimming with humor, wit, cleverness, and exciting coincidences. Lastly, it is an excellent introduction to a broad variety of topics.

I first read this book in eighth grade and deeply enjoyed it. In part this book served as my introduction to the exciting fields of logic, mathematics, and computer science.

Python odd word problem

I crafted this compact solution to Dijkstra’s OddWordProblem:

from sys import stdin, stdout
def even(char) : stdout.write(char); return select(even, 0, "");
def odd(char)  : q = select(odd, 0, ""); stdout.write(char); return q
def select(fn, skipspace, prefix) :
  char = stdin.read(1)
  if char.isspace() and skipspace : return select(fn, 1, prefix)
  elif char.isspace()             : return 0
  elif char.isalpha()             : stdout.write(prefix); return fn(char)
  elif char == "."                : return 1
  else                            : raise "Invalid input"

if not select(even, 1, "") :
while not select(odd, 1, " ") and not select(even, 1, " ") : pass
stdout.write(".")

Python list partition

I came up with these simple solutions to RonJeffriesCodingProblem. The first uses Python 2.2’s generators, the second uses the functional idiom, and the third uses list comprehensions. The third is my favorite:

def split1(list, num) :
  for index in range(num) :
    yield list[index * len(list) / num:(index + 1) * len(list) / num]

def split2(list, num) :
  return map(lambda x: list[x * len(list) / num:(x+1) * len(list) / num], range(num))

def split3(list, num) :
  return [list[x*len(list)/num : (x+1)*len(list)/num] for x in range(num)]

Python Timestamp Increment

I composed this concise solution to ncm‘s “coding challenge” in November 2004. I’m not entirely satisfied with its elegance, but I’m pleased with its brevity.

def dateinc(s) :
  def carry(list, modulus) :
    if len(list) == 1 : return [list[0] // modulus[0], list[0] % modulus[0]]
    else              : q = carry(list[1:], modulus[1:]); return [(list[0] + q[0]) // modulus[0], (list[0] + q[0]) % modulus[0]] + q[1:]
  s = reduce(lambda x,y:x+y, map(lambda x : (x >= '0' and x <= '9') * x, s))
  result = map(lambda x,y:x+y, map(int,(s[0:4],s[4:6],s[6:8],s[8:10],s[10:12],s[12:14])),(0,-1,-1,0,0,1))
  modulus = [10000,12,[31,28,31,30,31,30,31,31,30,31,30,31][result[1]],24,60,60]
  if result[0] % 4 == 0 and (result[0] % 100 != 0 or result[0] % 400 == 0) :
    modulus[2] += 1                               # Account for leap day.
  result = carry(result, modulus)[1:]             # Carry the extra second.
  result[1:3] = map(lambda x:x+1, result[1:3])    # Readjust to 1-base.
  return reduce(lambda x,y:x+y, map(lambda x : (x < 10) * '0' + str(x), result))

I further reduced this as follows, turning the carry function into a one-liner that computes the carry in-place (and also using builtins wherever possible). I’m much more satisfied with this solution.

def dateinc(s) :
  s = reduce(lambda x,y:x.isdigit()*x+y.isdigit()*y, s)
  ans = map(int.__add__, map(int,(s[0:4],s[4:6],s[6:8],s[8:10],s[10:12],s[12:14])),(0,-1,-1,0,0,1))
  mod = [10000,12,[31,28,31,30,31,30,31,31,30,31,30,31][ans[1]],24,60,60]
  if ans[0] % 4 == 0 and (ans[0] % 100 != 0 or ans[0] % 400 == 0) : mod[2] += 1
  ans = map(lambda x:(ans[x] + reduce(int.__mul__, map(lambda y:ans[y]>=mod[y]-1,range(x+1,len(ans))),x!=len(ans)-1)) % mod[x], range(len(ans)))
  ans[1:3] = map(int.__add__,ans[1:3],[1,1])      # Readjust to 1-base.
  return reduce(str.__add__, map(lambda x : (x < 10) * '0' + str(x), ans))

Introduction to Pointers in C++

Originally written in February 1998. At least one person seems to have found it useful. It’s a little incomplete, because it doesn’t deal with malloc and free, nor with C++ references. If you find this useful, feel free to copy and pass along, with attribution. Thanks!

Basics

All data and code are stored in memory. The location in memory where they are stored is known as the address of that data or code. Usually they are accessed through variable names that represent them, such as counter, printf, etc. We can, however, also access data using its address, rather than a formal name. This is done using pointers, special variables which store the address of data. Following are several annotated examples of simple pointers at work.

int*   x;                             // Declare x, a pointer to an integer.
int    y;                             // Declare y, an integer.
float* r;                             // Declare r, a pointer to a float.
float  s;                             // Declare s, a float.

x = &y;                               // x gets y's address -- it points to y.
r = &s;                               // r gets s's address -- it points to s.

The next few are a tad trickier. We use the “*” to dereference the pointer. Basically, this means to access whatever it is the pointer is pointing to. You can think of it as counteracting the “*” used in the declaration of the pointer; they neutralize each other, making the result a regular variable.

*x = 15;                              // Set value pointed to by x -- y -- to 15.
cout << *r;                           // Print value pointed to by r: s.

Complications

If it were that simple, of course, then nobody would have trouble with pointers. The fact of the matter is, however, that there are a number of complications — extensions to the idea of pointer — that can be hard to keep track of.

Pointers as Lists

The first is the idea of pointers being equivalent to lists. This is a crucial idea in C and C++. Essentially, instead of thinking of a pointer as pointing to a single variable, you can think of it as pointing to the first variable in a list of variables. Likewise, a list can be accessed without any subscripts to find the pointer to the first element in the list. It works like this:

int* x;
int  y[15];
 
x = new int[8];                       // Allocate array of 8 integers.

*y   = 8;                             // Set first element of y-list to 8.
x[3] = 7;                             // Set fourth element of x-list to 7.

Note how pointer notation can be used for the list y, and how list notation can be used for the pointer x.

This brings up a similar topic: pointer arithmetic. Since a pointer is a memory address, you might think that adding 1 to a pointer would simply make it point to the next byte of memory. The C compiler, however, is smarter than that; it realizes that you’re adding something to a pointer, you probably want to make it point to the next element of whatever you’re pointing at. So instead of adding whatever you specified to the pointer, it adds that times the size of the object the pointer points to. For example:

int* x = new int[8];

x++;                                  // Add four to x pointer, to
                                      //   point at next integer.
x[0] = 5;                             // This was originally the second
                                      //   element in the array.
x--;                                  // Subtract 4 again from pointer.
*(x + 2) = 6;                         // Set third element (second after the
                                      //   first) to 6.

cout << x[1];                         // Will now print "5".
cout << x[2];                         // Prints "6".

Pointers to Pointers (aka “The Middleman”)

Another pointer curiosity that C throws our way is pointers that point to other pointers. This may seem like a needless feature, but it comes in very handy when you have multidimensional data whose size you don’t know before-hand. You can then use these pointers to pointers to set up an arbitrary-sized multidimensional array.

It works like this: you can think of a pointer to a pointer as being essentially a list of lists. It’s kind of like the words in the dictionary: the first pointer tells you where to find each of the lists for the letters of the alphabet. Each letter is then itself a pointer that forms a list (by pointing to the first element) of all of the words beginning with that letter. If you add another dimension (and make a pointer to a pointer to a pointer), you can have each word also be a list, pointing to the first out of several different meanings for the word.

Here’s how it works in C++. The following program reads in a table of numbers and finds the average for each row and column. The first two numbers it reads in tell how many rows and columns there are. The rest of the numbers are the ones in the table.

#include <iostream.h>

void main(void)
{
  int** table;                        // *Two*-dimensional pointer.
  int   rows, cols, i, j, sum;        // Dimensions of table.

  cin >> rows >> cols;                // Find out the number of rows & cols.

// What we'd really like to do here is say:
//   table = new int[rows][cols];
// Unfortunately, this doesn't work in C++, since it instead tries to set up
// a one-dimensional array like this:
//   table = new int[rows * cols];
// And a one-dimensional array (a pointer to integers) is absolutely
// incompatible with a two-dimensional array (a pointer to a pointer to
// integers), so our program will crash.  Note that the syntax above will
// work, however, in Java.

  table = new (int*)[rows];           // Allocate the rows.
  for(i = 0; i < rows; i++)
    table[i] = new int[cols];         // Allocate each row's columns.

  for(i = 0; i < rows; i++)           // Find row sums.
  {
    sum = 0;
    for(j = 0; j < cols; j++)
      sum += table[i][j];
    cout << "Row sum for row " << i << ": " << sum << endl;
  }

  for(j = 0; j < cols; j++)           // Find column sums.
  {
    sum = 0;
    for(i = 0; i < rows; i++)
      sum += table[i][j];
    cout << "Col sum for col " << j << ": " << sum << endl;
  }
}

Notice how we had to explicitly allocate both dimensions of the array, starting with the first dimension, the rows; and then for each row we allocated its columns. You may be wondering why the outer dimension is allocated using “new (int*)[n]”, while the inner dimension is allocated using “new int[n]”. This is because each dimension but the last is a pointer to the next dimension. The final array dimension is obviously simply a list of integers, so in the inner loop we merely allocate a list of integers. The next dimension up, however, is not a list of integers; it’s a list of lists of integers. As such, each entry in this list will itself be a pointer to a list of integers — the final dimension. Therefore, the code allocating the outside dimension must allocate a list of pointers to integers.

If we had additional dimensions, for each column in each row we would then have to allocate the list of cells, and so forth. Here’s how a three-dimensional array allocation might look like. Assume that x, y, and z are the size of each array dimension. Notice how the outer dimension is now a list of (int**)’s — a list of lists of lists — and the second dimension is now the one that is (int*) — a list of lists — while the final dimension is still a list of integers.

int    i, j;
int*** array3d;
 
array3d = new (int**)[x];
for(i = 0; i < x; i++)
{
  array3d[i] = new(int*)[y];
  for(j < 0; j < y; j++)
    array3d[i][j] = new int[z];
}

Python persistence

I’ve implemented a rudimentary persistent object store in Python. It is implemented as a Python extension module that implements a set of persistent types: strings, integers, floats, lists, and dictionaries. Each of these are backed by the persistent object store, which is implemented using a memory-mapped file.

In addition, using a specially crafted Python base class for Python objects, Python objects may be stored in the object store (as dictionaries) and instantiated out of the object store.

The result is an persistent object graph (the root of which is a persistent dictionary) whose objects and attributes may be manipulated in-place using native Python syntax. Rudimentary locking is provided so that multiple Python threads / processes may concurrently manipulate the object store.

Details

Some aspects of this system are:

  • It is a Python extension module written in C and C++.
  • It is tested on Linux. It will likely work on *BSD systems, though it is possible that the location of the mapped storage may need to be moved.
  • It is implemented in a hierarchical manner:
    • A page manager handles the allocation of 4kB pages within a memory-mapped file. It is multi-process safe. It is, in a sense, a glorified sbrk() for a memory-mapped file.
    • A heap manager abstracts the page manager’s services to manage the allocation and deallocation of arbitrary-sized storage segments within the memory-mapped file. It is essentially a malloc() and free() for a memory-mapped file. This is also multi-process safe.
    • An object manager manages five new base types (persistent int, float, string, list, and dictionary) backed by persistent storage, using the heap manager’s services. It also provides rudimentary locking facilities for concurrency-safeness.
    • The persist Python extension uses the object manager’s services to implement persistent types that mimic the equivalent Python types. Additionally, it has the ability to reinstantiate a Python object that was stored as a dictionary (using the appropriate Python base class). The object manager’s locking facilities are made available for application usage.
  • Only one file may be mapped at a time (because it is mapped to a fixed logical address).
  • It is available for use under the MIT license. Contact me if you are interested in using it.

Examples

Some examples of its use are a simple counter:

import persist

root = persist.root()
root.lockExcl()
try :    root['x'] += 1                # Increment counter
except : root['x'] = 1                 # First pass; initialize
print "Content-type: text/htmln"
print "<p>You are visitor " + str(root['x']) + " to visit this site!</p>"
root.unlock()

and rudimentary objects:

import persist
from pbase import pbase

class person (pbase) :
  def __init__(self, name = "", age = 0) :
    pbase.__init__(self)
    self.name = name
    self.age = age

  def printAge(self) :
    print "<p>" + self.name + " is " + str(self.age) + " years old</p>"

root = persist.root()
root.lockExcl()

if not root.has_key('Joe') :            # First time through
  root['Joe'] = person('Joe', 27)
if not root.has_key('John') :           # First time through
  root['John'] = person("John", 29)

# On subsequent passes we will retrieve the objects stored on the first pass.

print "Content-type: text/htmln"
root['Joe'].printAge()
root['John'].printAge()

root.unlock()