Using Eight Cores (incorrectly) with Python

One of my web apps, The Wub Machine, is very computationally expensive. Audio decoding, processing, encoding, and streaming, all in Python. Naturally, my first instinct was to turn to the multiprocessing module to spread the CPU-bound work across multiple processes, thus avoiding Python’s global interpreter lock.

Remixing is hard work.

In theory, it’s simple enough, but I did run into a few very nasty problems
when dealing with multiprocessing in Python:

To find and fix these bugs took a lot of time, and a good debugging strategy. The most valuable tool turned out, surprisingly, to be GDB. GDB 7 has support for debugging Python runtimes, complete with pseudo-stack traces. Take a look at the following backtrace of a Python process provided by GDB and formatted for clarity:

    [Thread debugging using libthread_db enabled]
    [New Thread 0xb0c2fb70 (LWP 12895)]
    0x006da405 in __kernel_vsyscall ()

    Thread 1 (Thread 0xaf23ab70 (LWP 12894)):
    #0  0x006da405 in __kernel_vsyscall ()
    #1  0x003a27d5 in sem_wait@@GLIBC_2.1 ()
                  from /lib/i386-linux-gnu/libpthread.so.0
    #2  0x080f2139 in PyThread_acquire_lock (...)
                  at ../Python/thread_pthread.h:309
    #3  0x080f2fd8 in lock_PyThread_acquire_lock (...)
                  at ../Modules/threadmodule.c:52
    #4  0x080da7d5 in call_function
                    (f=Frame 0x937b47c,
                      for file /usr/lib/python2.7/threading.py,
                      line 128,
                      in acquire
                      (self=<_RLock(...) at remote 0x9caabec>,
                        blocking=1,
                        me=-1356616848),
                      throwflag=0) at ../Python/ceval.c:4013
    ...
    (goes down 79 frames)

Obviously, this looks much more complicated than a normal Python stack trace, but it’s a huge step up from zero debugability. If I proceed down a couple more frames, I find:

    #7  0x080dac2a in fast_function
                      (f=Frame 0x9ca278c,
                      for file /usr/lib/python2.7/logging/__init__.py,
                      line 693,
                      in acquire (self=<FileHandler(stream=...

…which is the first piece of familiar code. Line 693 of logging/__init__.py is surrounded by a short function, and has a comment that brings the first bit of understanding:

    def acquire(self):
        """
        Acquire the I/O thread lock.
        """
        if self.lock:
            self.lock.acquire()

Well, there you go. After fixing these race conditions and deadlocks, the Wub Machine’s success rate immediately jumped from horrible to 95% under load.

Look'it dat 95\% success rate.

All it took was GDB and an understanding of fork() to solve these bugs. My only advice: be very, very, very careful when using multiprocessing.

 
170
Kudos
 
170
Kudos

Now read this

Shared State and Customer Confusion

Let’s go back to the good old days of writing web applications in PHP for a paragraph or two. When running PHP under Apache or nginx, every HTTP request resulted in a clean interpreter with completely new state. Developers had to... Continue →