Using Eight Cores (incorrectly) with Python

One of my web apps, The Wub Machine, is very computationally expensive.
Audio decoding, processing, encoding, and streaming, all in Python. Naturally,
my first instinct was to turn to the multiprocessing module to spread the
CPU-bound work across multiple processes, thus avoiding Python’s global
interpreter lock

Remixing is hard work.

In theory, it’s simple enough, but I did run into a few very nasty problems
when dealing with multiprocessing in Python:

To find and fix these bugs took a lot of time, and a good debugging strategy.
The most valuable tool turned out, surprisingly, to be GDB. GDB 7 has
support for debugging Python runtimes
, complete with pseudo-stack traces.
Take a look at the following backtrace of a Python process provided by GDB and
formatted for clarity:

    [Thread debugging using libthread_db enabled]
    [New Thread 0xb0c2fb70 (LWP 12895)]
    0x006da405 in __kernel_vsyscall ()

    Thread 1 (Thread 0xaf23ab70 (LWP 12894)):
    #0  0x006da405 in __kernel_vsyscall ()
    #1  0x003a27d5 in sem_wait@@GLIBC_2.1 ()
                  from /lib/i386-linux-gnu/
    #2  0x080f2139 in PyThread_acquire_lock (...)
                  at ../Python/thread_pthread.h:309
    #3  0x080f2fd8 in lock_PyThread_acquire_lock (...)
                  at ../Modules/threadmodule.c:52
    #4  0x080da7d5 in call_function
                    (f=Frame 0x937b47c,
                      for file /usr/lib/python2.7/,
                      line 128,
                      in acquire
                      (self=<_RLock(...) at remote 0x9caabec>,
                      throwflag=0) at ../Python/ceval.c:4013
    (goes down 79 frames)

Obviously, this looks much more complicated than a normal Python stack trace,
but it’s a huge step up from zero debugability. If I proceed down a couple more
frames, I find:

    #7  0x080dac2a in fast_function
                      (f=Frame 0x9ca278c,
                      for file /usr/lib/python2.7/logging/,
                      line 693,
                      in acquire (self=<FileHandler(stream=...

…which is the first piece of familiar code. Line 693 of logging/ is
surrounded by a short function, and has a comment that brings the first bit of

    def acquire(self):
        Acquire the I/O thread lock.
        if self.lock:

Well, there you go. After fixing these race conditions and deadlocks, the Wub
Machine’s success rate immediately jumped from horrible to 95% under load.

Look'it dat 95\% success rate.

All it took was GDB and an understanding of fork() to solve these bugs. My only
advice: be very, very, very careful when using multiprocessing.


Now read this

The Ubiquitous Capture Device

Every so often, I find myself in a camera store, gawking at beautiful, expensive cameras and lenses. DSLRs have dropped in price, and mirrorless interchangeable lens cameras (also known as micro four thirds) now fill the gap between... Continue →