<h1>Reviving an Analog Polysynth with an Arduino, Ghidra, and Python</h1>
<p>About a year ago, smack in the middle of the pandemic, I turned to the internet for some retail therapy. I’m a musician, so my usual retail therapist of choice is Reverb.com - a sort of fancy Craigslist or eBay just for musicians. Every day, I would see new listings pop up for instruments I might want, like a 6-string fan-fret bass, a nice electronic drum kit, or my “holy grail” - a super-rare, 22-year-old analog synthesizer: <a href="https://www.soundonsound.com/reviews/alesis-a6-andromeda">the Alesis Andromeda</a>.</p>
<p><a href="https://svbtleusercontent.com/3cu76eBhDJjHsZMcu2KB310xspap.jpg"><img src="https://svbtleusercontent.com/3cu76eBhDJjHsZMcu2KB310xspap_small.jpg" alt="2001-04-alesis-1-lYbHVu8tbC0bXK62woy5ayoQxdCJGIoS.jpg"></a></p>
<p>The Andromeda is a 16-voice polyphonic analogue synthesizer: basically, a keyboard that sounds very lush, human, and organic, and can play a lot of notes at once. (That combination is expensive, because each of those 16 voices needs its own copy of the analogue sound-generating circuitry.) Every function is controllable by a separate knob on the front panel, making it extremely interactive: every knob changes the sound in some way.</p>
<p>As a kid, I remember playing one of these at <a href="https://www.long-mcquade.com/location/Ontario/Burlington/">my local music store</a> in the early 2000s. It was the biggest, most expensive, and most intimidating thing in the shop. Here’s an extremely kitschy ‘90s demo video:</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/hXVB3g4Y1oA?start=214" title="YouTube video player"></iframe>
<p><a href="https://www.sweetwater.com/store/detail/MoogOne16--moog-one-16-voice-analog-synthesizer">Synthesizers with similar capabilities</a> cost over $9,000 today. The Andromeda was discontinued in 2010, and since then, prices for working units have shot up to around $6,000. There’s no way I could justify spending that kind of money.</p>
<p>Until one day, in October 2021, <a href="https://reverb.com/item/46101686-alesis-a6-andromeda-61-key-polyphonic-analog-synthesizer">a listing popped up on Reverb</a>. The seller was explicit: this unit was used, broken, and non-functioning.</p>
<p><a href="https://svbtleusercontent.com/gW9rJ9uXefyXmeKdWDuv3L0xspap.png"><img src="https://svbtleusercontent.com/gW9rJ9uXefyXmeKdWDuv3L0xspap_small.png" alt="F72BE355-D140-4940-9BD3-6F6E4B2A8D35.png"></a></p>
<blockquote>
<p><em>I’m selling this synth for parts. It turns on but hangs on the splash screen. It’s missing side trim pieces, pitch/mod assembly, several knobs and several screws. The casing has nicks and scratches. The metal sides are a bit bent. The cable that connects the analog board to the main board has a cracked tensioner so maybe that’s part of the issue? I don’t have the tools or knowledge to fix this one so I’m passing it on. I can’t get it to do any tests so I can’t tell if anything is working. No returns on this one.</em></p>
</blockquote>
<p>I was tempted. I looked up the service manual online, and found that there were many debugging steps one could take to try to fix problems like this. I’ve also had plenty of experience dealing with hardware. My computer hardware classes at university even dealt with the same CPU used by this synth - the Motorola Coldfire (which uses a variant of the M68k architecture) - and I had a small cache of tools that might be useful. Feeling bold, and desperately bored after 18 months of working from home, I sent an offer.</p>
<p><a href="https://svbtleusercontent.com/7HaWA39cfKgbHKrNR4QYhz0xspap.png"><img src="https://svbtleusercontent.com/7HaWA39cfKgbHKrNR4QYhz0xspap_small.png" alt="C2FEC189-ED9F-45E9-830C-70C32E5B357E.png"></a></p>
<p>After two weeks of eager waiting, the synth arrived at my apartment in New York from Portland in a massive box. As described, it was in bad shape. The service manual provided a list of debugging functions that could be accessed by holding down one of eight buttons on the front panel during boot:</p>
<p><a href="https://svbtleusercontent.com/pcxdEHxUk4esPFXKvwtJzv0xspap.png"><img src="https://svbtleusercontent.com/pcxdEHxUk4esPFXKvwtJzv0xspap_small.png" alt="9FEE8994-6CA1-43EC-8F18-269EC73C9326.png"></a></p>
<p>I had hoped that the seller just hadn’t discovered this information, but these functions didn’t work at all: no combination of buttons would do anything, nor would any other tips or tricks from the official manual. Time to dig deeper.</p>
<h1 id="rtfm-read-the-fancy-manual_1">RTFM (Read The Fancy Manual) <a class="head_anchor" href="#rtfm-read-the-fancy-manual_1">#</a>
</h1>
<p>Luckily, back in February of 2015, users on the popular forum Gearspace started <a href="https://gearspace.com/board/electronic-music-instruments-and-electronic-music-production/985035-troubleshooting-alesis-a6-andromeda-no-boot.html">a 15-page thread about how to debug a non-booting Andromeda</a>. This thread included links to <a href="https://archive.org/details/sm_Alesis_Andromeda_A6_Official_Service_Manual_BOM_PCB_files_PCB_Schematics">a confidential <em>service</em> manual</a> that contained more debugging tips, intended for distribution only to Alesis-approved service centers. This service manual also included full schematics for the entire synth, showing how all of the components are logically connected together.</p>
<p><a href="https://svbtleusercontent.com/hxcMyTHnzngrCAM4zHiwFC0xspap.png"><img src="https://svbtleusercontent.com/hxcMyTHnzngrCAM4zHiwFC0xspap_small.png" alt="B8A5E198-D97D-4F4F-897C-968691F0451D.png"></a></p>
<p>This service manual revealed a couple of important things: while this is an analogue synthesizer, meaning that sound is generated via non-digital, analogue circuitry, its brain is entirely digital. It uses a Coldfire CPU (an MCF5307 running at 90MHz), has 2MB of Flash memory to store its upgradeable operating system, 1MB of RAM for use at runtime, and 512kB of battery-backed RAM for persistent storage of user settings.</p>
<p>The great people in this thread also suggested a number of fixes to try:</p>
<ul>
<li><p><strong>Replacing the resonator on the LCD panel</strong>: the Andromeda will often fail to boot if the front panel controller can’t communicate with the LCD. The LCD panel uses a 3MHz ceramic resonator that can sometimes fail, and when the LCD’s clock is unstable, serial communication with it can fail too. Luckily, a new resonator costs about $1, and that part is very easy to replace. (Unluckily, that didn’t seem to help.)</p></li>
<li>
<p><strong>Adding a pull-up resistor to the SRAM chips</strong>: the Andromeda uses two external static RAM chips, providing a total of one megabyte of RAM. These external RAM chips are connected to the data and address busses of the CPU. At boot time, each SRAM chip may accidentally be enabled by default. If this happens, data placed on the bus by other devices (like the Flash memory chip that stores the operating system) will be corrupted by the data coming from each SRAM chip until the SRAMs are disabled. (Think: too many people talking at once.)</p>
<p>This problem is called <strong>bus contention</strong>. The solution to this problem is to “tie” the Chip Enable pin of each chip to its “off” value (+3.3 volts) by using a resistor. This resistor is called a <strong>pull-up</strong>, as it <em>pulls up</em> the voltage on the pin when no other devices are controlling the line. By using a resistor, other devices are still able to pull the pin high or low; the resistor essentially sets a default value.</p>
<p>This resistor is a cheap and plentiful part - a 4.7kΩ resistor costs only pennies. Adding the resistor requires very careful soldering, though, as the pins on the SRAM chip are extremely close together. More on that later.</p>
</li>
<li><p><strong>Replacing the Flash memory chip that stores the operating system</strong>. This logically made sense; Flash memory is reprogrammable, and it’s possible that the Flash may have been corrupted somehow, preventing the system from booting. Unfortunately, the Flash memory on the board is an extremely small part and is difficult to replace without advanced soldering skills or the proper equipment.</p></li>
<li><p><strong>Replacing the entire CPU</strong>. I was incredulous about this; I’d never heard of entire CPUs failing, but many people suggested that a whole-CPU replacement was necessary to get their synthesizers working again. This was a last-resort option, as the CPU has 208 extremely fine pins that would be difficult to solder.</p></li>
</ul>
<h1 id="breaking-out-the-soldering-iron_1">Breaking out the Soldering Iron <a class="head_anchor" href="#breaking-out-the-soldering-iron_1">#</a>
</h1>
<p>At this point, I thought the next step was to start making changes to the hardware to try to fix one or more broken parts. I’ve been using a soldering iron on and off since I was about 10 years old, so I thought I had the dexterity, patience, and steady hand required to solder a single resistor across two tiny, closely-spaced pins.</p>
<p>I did <strong><em>not</em></strong>.</p>
<p><a href="https://svbtleusercontent.com/fd47dMnVANFNUDJ4pnun8g0xspap.jpg"><img src="https://svbtleusercontent.com/fd47dMnVANFNUDJ4pnun8g0xspap_small.jpg" alt="damage_closeup.jpg"></a></p>
<p>In my effort to add a resistor between chip U12 and capacitor C50, I managed to short out multiple pins of U12. Then, when trying to fix my mistake by replacing the chip, I <strong>accidentally tore off at least 12 of the 44 solder pads</strong> that connect the chip to the circuit board.</p>
<p>If the synth wasn’t working before, it definitely wasn’t working now. I had to concede defeat and call in someone to help.</p>
<p>I started emailing local, NYC-area electronics repair shops - including the famed <a href="https://www.rossmanngroup.com">Rossmann Repair Group</a>, only blocks away - but none of them were able to fix a problem like this. After some more searching, I found <a href="https://videogamerepairs.ca/2019/03/25/not-just-video-game-consoles-alesis-andromeda-a6-vintage-synthesizer-repair/">a blog post from Edmonton-based VideogameRepairs.ca</a> showing that they had replaced the CPU and Flash chips of an Andromeda in the past, and sent them photos, asking if they could fix my self-inflicted soldering damage. To my surprise, they said they’d be able to repair the board and replace its CPU, although they’d have no means to test it.</p>
<p>Two months later, after buying a replacement CPU on eBay and shipping my main board from New York to Edmonton and back, I finally had a repaired board with a new SRAM chip and CPU. I had opted <strong>not</strong> to replace the Flash memory, as I didn’t know if it was good or bad, or how to go about reprogramming it. As part of the repair, <strong>a pull-up resistor was also added to the SRAM chips’ chip-select pin</strong>, just like the folks on Gearspace.com had suggested. (The repair job was amazing; huge thank you to <a href="https://videogamerepairs.ca/about/">Daniel Wynne at VideogameRepairs in Edmonton</a> for such intricate rework - and for only $300 USD, too!)<br>
<a href="https://svbtleusercontent.com/vavw1YycS9qvkehcp6Eoy30xspap.jpg"><img src="https://svbtleusercontent.com/vavw1YycS9qvkehcp6Eoy30xspap_small.jpg" alt="DSC03248.jpg"></a></p>
<p>…but the machine still <strong>refused to boot</strong>. Time to step things up a bit.</p>
<h1 id="breaking-out-the-debugger_1">Breaking out the Debugger <a class="head_anchor" href="#breaking-out-the-debugger_1">#</a>
</h1>
<p>As I was waiting for my repaired circuit board to arrive in the mail, I pored over the service manual carefully. Surely there had to be some way to get more insight into what was going wrong during the boot process. The design engineers at Alesis included many <a href="https://en.wikipedia.org/wiki/Test_point">test points</a> in the synth where it was possible to hook up an <a href="https://en.wikipedia.org/wiki/Oscilloscope">oscilloscope</a> or logic analyzer to ensure the system was behaving as expected.</p>
<p>Along with those test points, I noticed that the Coldfire CPU exposed a number of pins to a 26-pin header, conveniently labelled <code class="prettyprint">DEBUG PORT</code>.</p>
<p><a href="https://svbtleusercontent.com/ekiWSwhzZ7hBy5kZxUEoE50xspap.png"><img src="https://svbtleusercontent.com/ekiWSwhzZ7hBy5kZxUEoE50xspap_small.png" alt="AndromedaDebugPort.png"></a></p>
<p>Searching for some of the keywords on the circuit diagram - including <code class="prettyprint">DDATA</code> and <code class="prettyprint">PST0</code> - led me to discover that this was a proprietary (but well-documented) debugging interface specific to Coldfire processors. This is a form of debug interface known as <a href="https://en.wikipedia.org/wiki/Background_debug_mode_interface">Background Debug Mode, or BDM</a>, which provides much of the functionality required by today’s software debuggers, like <a href="https://en.wikipedia.org/wiki/GNU_Debugger">GDB</a> or <a href="https://lldb.llvm.org">LLDB</a>.</p>
<p>I spent a couple more days searching for existing software and hardware that could connect to this debug port. Unfortunately, each option was a pain, for different reasons:</p>
<ul>
<li>BDM interfaces exist on eBay for less than $20, but they target a slightly different BDM protocol than the one used by this CPU.</li>
<li>Open-source projects like <a href="https://usbdm.sourceforge.io/USBDM_V4.12/html/index.html">USBDM</a> exist, but require custom hardware interfaces that don’t seem to be for sale anywhere, only seem to work on Windows and Linux, and require proprietary IDEs like CodeWarrior.</li>
<li>
<a href="https://www.pemicro.com/index.cfm">PEMicro</a> sells debug probes that are pin-compatible with exactly this Coldfire debug interface, but the cheapest hardware options cost $300. (This would have probably worked, to be honest.)</li>
</ul>
<h2 id="building-a-bdm-interface_2">Building a BDM Interface <a class="head_anchor" href="#building-a-bdm-interface_2">#</a>
</h2>
<p>Given that it was fairly difficult or expensive to use existing tools, I looked to see if maybe I could build my own using common parts, like an Arduino. <a href="https://www.nxp.com/docs/en/data-sheet/MCF5307BUM.pdf">The CPU’s 484-page user manual</a> goes into tons of detail about how its debug port works: it’s really just a serial interface where the debugger sends <strong>one bit of information at a time</strong> over a single wire, while toggling a clock line to indicate when information is ready to be read. The CPU can then send data back to the debugger, also one bit at a time, by putting either 0V or 3.3V on its output line when the debugger toggles the clock line.</p>
<p><a href="https://svbtleusercontent.com/pGMaAGWYnk687cA8FiFT520xspap.png"><img src="https://svbtleusercontent.com/pGMaAGWYnk687cA8FiFT520xspap_small.png" alt="timing-diagram.png"></a></p>
<p>One very nice part of this serial interface is that it’s completely asynchronous: there are no timing requirements on either the debugger or the CPU. If the Arduino is busy doing something else, or <em>is just slow</em> (as Arduinos are) then the CPU doesn’t care - it just waits for the next bit to come in, one at a time.</p>
<p>On top of this serial interface, the Coldfire encodes its debug data into <strong>17-bit packets</strong> - one bit (called the “status”) to indicate if an error has occurred, and 16 bits to indicate the data in that packet:<br>
<a href="https://svbtleusercontent.com/3d3Z68aYsLpDJANsE8Jjqr0xspap.png"><img src="https://svbtleusercontent.com/3d3Z68aYsLpDJANsE8Jjqr0xspap_small.png" alt="bdm-packet-fields.png"></a></p>
<p>Then, on top of this packet format, different commands can be sent to the CPU to ask it to do things - like read or write memory addresses, read or write processor registers, continue processor execution, or put the processor into step mode.<br>
<a href="https://svbtleusercontent.com/w84dXs5aU2V6XuqPe1rTpy0xspap.png"><img src="https://svbtleusercontent.com/w84dXs5aU2V6XuqPe1rTpy0xspap_small.png" alt="bdm-command-fields.png"></a></p>
<p>These operations are enough to build a rudimentary debugger: we can halt the processor, move the program counter wherever we want, read registers and memory, and watch as the operating system tries to boot.</p>
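<p>As a concrete example, here’s roughly what a single 17-bit transfer looks like in Python. (This is a simplified sketch of the idea, not the library’s actual code: <code class="prettyprint">set_data</code>, <code class="prettyprint">pulse_clock</code>, and <code class="prettyprint">read_data</code> are hypothetical stand-ins for the pin-toggling that the Arduino performs.)</p>
<pre><code class="prettyprint lang-python"># set_data(), pulse_clock(), and read_data() are hypothetical helpers that
# toggle the DSI, DSCLK, and DSO pins via the Arduino serial bridge.
def transfer(packet: int) -> tuple[int, int]:
    """Clock one 17-bit packet out to the CPU, MSB first, while
    simultaneously clocking the CPU's 17-bit response back in."""
    response = 0
    for bit in reversed(range(17)):
        set_data((packet >> bit) & 1)  # present one bit on the data-in line
        pulse_clock()                  # no timing requirements: the CPU just waits
        response = (response << 1) | read_data()  # sample one bit coming back
    status, data = response >> 16, response & 0xFFFF
    return status, data
</code></pre>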
<p>So, I wrote (and published!) <a href="https://github.com/psobot/arduino-coldfire-bdm">a simple Python library called <code class="prettyprint">arduino-coldfire-bdm</code></a> that encodes data and provides an interface to the Coldfire’s BDM port. A tiny <a href="https://en.wikipedia.org/wiki/Arduino">Arduino</a> program allows using pretty much any Arduino as a serial bridge between my laptop and the Andromeda’s CPU, so that we can send commands directly from Python to the board.</p>
<p>With that, we’re able to capture an execution trace to see what the processor is doing when it tries to boot, and it’s kinda neat: we can watch the program counter tick up!</p>
<p><strong>And then the whole thing stops:</strong><br>
<a href="https://svbtleusercontent.com/qNxhAkmUWV4coi6MUrfhfT0xspap.png"><img src="https://svbtleusercontent.com/qNxhAkmUWV4coi6MUrfhfT0xspap_small.png" alt="captured-execution-trace.png"></a></p>
<h1 id="turning-to-ghidra_1">Turning to Ghidra <a class="head_anchor" href="#turning-to-ghidra_1">#</a>
</h1>
<p>Alright, now we’ve got an execution trace. We can watch the processor try to boot. And we can see that the processor gets a certain amount of the way through the process, and then halts.</p>
<p>To actually make sense of this without having to read assembly directly, I turned once again to <a href="https://ghidra-sre.org">Ghidra</a>, the NSA’s open-source reverse engineering tool, which includes good support for the Coldfire architecture and allows us to decompile assembly code into C.</p>
<p>Unlike <a href="http://blog.petersobot.com/patching-the-k2500">the last time that I wrote an extended blog post about using Ghidra</a>, this experience was much simpler: the bootloader of the Andromeda is quite readable. I’ve annotated the boot code below, which also (pretty much) corresponds with the execution trace above:<br>
<a href="https://svbtleusercontent.com/3ACD4gSzqi3mKZXshebhWY0xspap.png"><img src="https://svbtleusercontent.com/3ACD4gSzqi3mKZXshebhWY0xspap_small.png" alt="decompilation.png"></a></p>
<p>Based on the execution trace, it seems the initial code runs for a bit - and then enters the loop at the bottom of this initial function, which copies the bootloader into RAM. Then, immediately after jumping to the code that was just copied into RAM, the processor halts.</p>
<p>Well, that sounds suspicious. The code in RAM should be executable, but it seems that it’s either incorrect or didn’t get copied correctly. Let’s see if we can re-flash the bootloader firmware, to ensure that the code is correct.</p>
<h1 id="flashing-the-flash-in-a-flash_1">Flashing the Flash in a Flash <a class="head_anchor" href="#flashing-the-flash-in-a-flash_1">#</a>
</h1>
<p>Flash memory seems like it’s all around us today; it’s what you find in SD cards, in SSDs, and so on. Flash memory was novel when it first came out in the 1980s and ’90s, as it could hold its contents <strong>without power</strong>, whereas other kinds of memory (like static RAM, or SRAM) required power to keep their contents from fading away.</p>
<p>However, flash memory (NOR flash, in this case) has a couple of unexpected quirks that make it more complicated to use than regular RAM. A static RAM chip allows any address to be read or written in a single operation: set the address lines of the chip to the desired address, assert the “write enable” or “output enable” signal, then either assert the data on the data pins or read the data off of the data pins. Flash memory can be <em>read</em> the same way, but <strong>can only be written</strong> after sending special commands to the chip first.</p>
<p>Worse yet, <strong>bits in flash memory can only be switched from <code class="prettyprint">1</code> to <code class="prettyprint">0</code></strong> - writing <code class="prettyprint">0x0F</code> over a byte that currently holds <code class="prettyprint">0xF0</code> leaves <code class="prettyprint">0x00</code>, since no bit can rise back to <code class="prettyprint">1</code>. To change a <code class="prettyprint">0</code> to a <code class="prettyprint">1</code>, an entire block of memory (usually many kilobytes in size) must be <a href="https://en.wikipedia.org/wiki/Flash_memory#Invention_and_commercialization">“flashed”</a> (erased) at once, setting all of that block’s bits to <code class="prettyprint">1</code>. After that erase operation is complete, individual bytes or words of memory can be written one at a time - but only by flipping <code class="prettyprint">1</code> bits to <code class="prettyprint">0</code>.</p>
<p>All of this complexity means that if we want to reprogram the flash memory in the Andromeda, we’ll need to send a special sequence of commands to the CPU, rather than just asking it to write to memory. These commands are listed in the datasheet for each specific flash memory chip (although many chips share the same command sequences). The chip on the Andromeda main board responds to the commands in <a href="https://www.digchip.com/datasheets/parts/datasheet/013/AM29LV160D-pdf.php">the following table from its datasheet</a>:</p>
<p><a href="https://svbtleusercontent.com/s6RgM5muXcT2Lg8Rgz7Bs50xspap.png"><img src="https://svbtleusercontent.com/s6RgM5muXcT2Lg8Rgz7Bs50xspap_small.png" alt="am29lv160d.png"></a></p>
<p>What this somewhat hard-to-read table suggests is that to “program” (write) data to the flash memory, we need to send four individual writes to the memory chip: <code class="prettyprint">0x555 = 0xAA</code>, <code class="prettyprint">0x2AA = 0x55</code>, <code class="prettyprint">0x555 = 0xA0</code>, followed by a write directly to the address we want to place the data at. (This is presumably to prevent errant writes: an accidental write would almost certainly corrupt data, given flash’s inability to flip individual bits from <code class="prettyprint">0</code> back to <code class="prettyprint">1</code>.)</p>
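<p>In code, a single programmed write might look like the sketch below - here, <code class="prettyprint">write(address, value)</code> is assumed to be a helper that performs one BDM memory write into the flash chip’s address space:</p>
<pre><code class="prettyprint lang-python"># Program one 16-bit word: three unlock/command cycles, then the data itself.
write(0x555, 0xAA)    # unlock cycle 1
write(0x2AA, 0x55)    # unlock cycle 2
write(0x555, 0xA0)    # "program" command
write(address, word)  # the actual write, to the address we care about
</code></pre>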
<p>This is super slow, though. Sending four writes per word means that our writes actually go four times slower than they could, which - given how slow our custom BDM interface is - would result in us writing to the flash chip at a rate of only about 400 bytes per second. (It would take just over an hour to write just the bootloader at that rate.)</p>
<p>Luckily, this flash chip supports a feature its manufacturer calls “Unlock Bypass.” By sending a specific command to enter “Unlock Bypass” mode, writes can be performed by sending only two individual write commands instead of four. This doubles our writing speed, and allows us to upload the entire bootloader in only about half an hour.</p>
<p>To do so, though, we have to send commands in a very specific sequence:</p>
<pre><code class="prettyprint lang-python">import time

# `write(address, value)` performs a single BDM memory write, as above;
# `data` holds the firmware image to be written, as bytes.
# Send full-chip erase:
write(0x555, 0xAA)
write(0x2AA, 0x55)
write(0x555, 0x80)
write(0x555, 0xAA)
write(0x2AA, 0x55)
write(0x555, 0x10)
# Wait the 30 seconds it takes the chip to actually erase itself:
time.sleep(30)
# Enter "Unlock Bypass" mode to unlock the flash for writing:
write(0x555, 0xAA)
write(0x2AA, 0x55)
write(0x555, 0x20)
# Send one 16-bit word at a time:
for i in range(0, len(data), 2):
    # Send the "write a word" command:
    write(0x555, 0xA0)  # note: the address here could be anything
    # Send the actual data:
    write(i, (data[i] << 8) | data[i + 1])
# Exit "Unlock Bypass" mode:
write(0x90, 0x90)
write(0x00, 0x00)
</code></pre>
<p>After running this code with <a href="https://electro-music.com/forum/topic-59970.html">a copy of the latest bootloader found on the internet</a>, I was able to verify that the code had been uploaded correctly, and that the contents of the flash should allow the synth to boot. Then, when trying to boot…</p>
<p><em>Still nothing.</em> What happens if we try to run a quick RAM test, to ensure that the code being written to RAM is correct? Let’s use <a href="http://github.com/psobot/arduino-coldfire-bdm">the Python debugging library I wrote</a> to write data to RAM, then read it back over and over again:</p>
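<p>(The sketch below shows the idea; <code class="prettyprint">write_word</code> and <code class="prettyprint">read_word</code> are hypothetical stand-ins for the library’s BDM memory accessors, and the base address is illustrative, not the Andromeda’s actual memory map.)</p>
<pre><code class="prettyprint lang-python">SRAM_BASE = 0x02000000  # illustrative base address only

for address in range(SRAM_BASE, SRAM_BASE + 0x100, 2):
    write_word(address, 0xAAAA)  # write a known test pattern...
    value = read_word(address)   # ...then immediately read it back
    if value != 0xAAAA:
        print(f"Read failed at {address:#010x}: got {value:#06x}")
</code></pre>
<p>Running the real version of this test produced the following:</p>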
<p><a href="https://svbtleusercontent.com/az26uoWRfiUVWuHy7QtkJu0xspap.png"><img src="https://svbtleusercontent.com/az26uoWRfiUVWuHy7QtkJu0xspap_small.png" alt="read_failed.png"></a></p>
<p>Huh. It seems that if we write a value to the RAM, that value isn’t “sticky” - the RAM, <em>random access memory</em>, isn’t <em>remembering</em> what we’ve written. Something’s off. What if we print out the bits themselves, and show how they change over time?</p>
<p><a href="https://svbtleusercontent.com/7Z8hDsPRRKsgX26yTFXWkH0xspap.png"><img src="https://svbtleusercontent.com/7Z8hDsPRRKsgX26yTFXWkH0xspap_small.png" alt="Fh9PXM7XgAEW0NK.png"></a></p>
<p>That looks an awful lot like something is wrong with the RAM! The bits are quickly fading away to 0, which implies that either the bits weren’t written correctly, <em>or</em> they were written but are being read incorrectly, <em>or</em> maybe the chip is slowly losing power.</p>
<h1 id="reading-more-closely_1">Reading More Closely <a class="head_anchor" href="#reading-more-closely_1">#</a>
</h1>
<p>As it turns out, an SRAM chip can behave like this if any one of three pins is at the wrong voltage:</p>
<ul>
<li>the power pin, which should be at +3.3 volts</li>
<li>the ground pin, which should be at, well, ground</li>
<li>the “chip select” pin, which should be controlled by the CPU, but may have a pull-up resistor on it (as mentioned way at the top of this blog post).</li>
</ul>
<p>With my oscilloscope, I was able to measure and find that:</p>
<ul>
<li>the chip was getting +3.3v on the power pin</li>
<li>the chip’s ground pin was, indeed, ground</li>
<li>the “chip select” pin was high at +3.3v <strong>all the time</strong>.</li>
</ul>
<p>That last one was a bit suspicious: during memory accesses, we’d expect the chip select pin to go low - even if only for nanoseconds at a time - to indicate that the chip is selected.</p>
<p>I took a peek at the resistor that had been installed and read its colour bands, which indicate its value. I plugged them into an online calculator, and found:<br>
<a href="https://svbtleusercontent.com/x3dtaYXygZdyd44Yc5L6pN0xspap.png"><img src="https://svbtleusercontent.com/x3dtaYXygZdyd44Yc5L6pN0xspap_small.png" alt="four-point-seven-resistor.png"></a></p>
<p><strong>4.7Ω</strong>.</p>
<p>After all of this debugging, it turns out that the SRAM chip was properly connected and working; it was just <strong>never being enabled</strong>, because its chip-select pin was being <strong>held at +3.3V all the time</strong>. This resistor should have been something like 4.7kΩ - a thousand times more resistance. With a 4.7Ω pull-up, the CPU would have to sink roughly 700mA (3.3V ÷ 4.7Ω) to pull the pin low, far more than any CPU pin can drive; with 4.7kΩ, that drops to well under a milliamp, letting the CPU easily overcome the resistor when enabling the chip. I must have missed a single “k” when specifying the resistor value.</p>
<p>I pulled out a pair of snips, clipped the resistor off the chip, and:</p>
<p><a href="https://svbtleusercontent.com/u4Cn6cx7SFH7ZGdc74yzEc0xspap.jpeg"><img src="https://svbtleusercontent.com/u4Cn6cx7SFH7ZGdc74yzEc0xspap_small.jpeg" alt="IMG_1670.HEIC.jpeg"></a></p>
<p><strong>It lives!</strong></p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/FKvHwxbLBCc" title="YouTube video player"></iframe>
<h1 id="a-bunch-of-knobs_1">A Bunch of Knobs <a class="head_anchor" href="#a-bunch-of-knobs_1">#</a>
</h1>
<p>Now that the synth worked, there were a couple more problems to fix. Turning any of the knobs on the left side of the synth’s panel caused the synth to “glitch out” - values on the screen would jump around wildly. This wasn’t a complete dealbreaker, but it definitely made the synth hard to use. To figure out why this was happening, I had to go back to the schematics once more. The glitchy knobs all had one thing in common: they were connected to the same chip, a CD4051 analog multiplexer labeled <code class="prettyprint">U27</code>.</p>
<p><a href="https://svbtleusercontent.com/jTpFNkjnBHamC1416q2TnX0xspap.png"><img src="https://svbtleusercontent.com/jTpFNkjnBHamC1416q2TnX0xspap_small.png" alt="analog-comparator.png"></a></p>
<p>The Andromeda might be controlled by a digital CPU, but a lot of it is surprisingly analog. In fact, the analog signal from each knob on the front panel is sent all the way through to the main circuit board via a neat device called an <strong>analog multiplexer</strong>. Each potentiometer (knob) is connected to one multiplexer chip, which is essentially a controllable digital switch. The main CPU drives seven signals: <code class="prettyprint">POT_MUX_SEL[0-3]</code>, along with <code class="prettyprint">POT_ADDR[0-2]</code>. Only one of the four <code class="prettyprint">POT_MUX_SEL</code> signals is active at once, while the three <code class="prettyprint">POT_ADDR</code> lines encode 3 bits of data, and thus have 8 possible values; putting these together allows the CPU to select between <strong>32 different potentiometers</strong> whose values can be sampled (see the sketch below). The analog multiplexers that sit on these lines are, well, <em>analog</em>, which means that even though they’re controlled by a digital address bus, their output is completely analog, allowing for extremely high fidelity.</p>
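<p>To illustrate the addressing scheme (this is just a model of the logic, not code that runs anywhere on the synth):</p>
<pre><code class="prettyprint lang-python"># Selecting one of 32 potentiometers: 4 one-hot POT_MUX_SEL lines pick a
# multiplexer chip, and 3 POT_ADDR bits pick one of its 8 inputs.
def select_pot(pot_number: int) -> tuple[list[int], list[int]]:
    chip, channel = divmod(pot_number, 8)
    pot_mux_sel = [1 if c == chip else 0 for c in range(4)]  # only one line active
    pot_addr = [(channel >> bit) & 1 for bit in range(3)]    # 3-bit channel address
    return pot_mux_sel, pot_addr

assert select_pot(11) == ([0, 1, 0, 0], [1, 1, 0])  # pot 11 = chip 1, channel 3
</code></pre>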
<blockquote>
<p>A quick aside; what’s the difference between an analog and a digital synthesizer? Alesis claims in the Andromeda manual:</p>
<blockquote>
<p>An analog instrument uses electronic circuitry for sound creation and filtering that is not dependent on its computer chip. While the instrument’s processor provides many control and memory functions, <strong>the basic sound path is in the hardware that is separate from the microprocessor</strong>.</p>
</blockquote>
<p>This is true; but the line is somewhat blurred in the Andromeda, as the analog circuitry is controlled by a digital processor, whose inputs and outputs are 16-bit numbers.</p>
<p>In the 1990s, when this synthesizer was developed, most synthesizers used 8-bit resolution for their parameters; each knob only had 2<sup>8</sup> = 256 “steps,” which caused noticeable stair-stepping when turning knobs. (To think of this geometrically: if you turned a knob by less than about 1.4º, its value wouldn’t change due to this low resolution.) This led people to associate “digital” synthesis with “audible stair-steps when turning a knob.”</p>
<p>However, the designers of the Andromeda took a lot of care to keep all of the signals as analog as possible for as long as possible. As such, these front panel knobs send their analog values to the main board, where they’re then turned into digital values at the fairly high resolution of 16 bits, providing 65,536 possible steps. To put this in geometric terms again: if 8-bit synthesizers provide one step per ~1.4º of rotation, the Andromeda provides one step per <em>0.0055º</em> of rotation. That’s enough resolution that the steps would only be noticeable if you attached a 100-meter stick to each knob; at that scale, the far end of the stick would still move only about one centimeter per step. And with the number of parameters available on the synth, this level of detail means that there are approximately <strong>2.4x10<sup>462</sup> different unique combinations</strong> of sounds that could be made - <strong>1.5x10<sup>231</sup> <em>times</em> more</strong> than if the designers had used 8-bit parameter resolution.</p>
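<p>(A quick sanity check of those figures, assuming the synth exposes roughly 96 independent 16-bit parameters - a count back-solved from the numbers above, not an official specification:)</p>
<pre><code class="prettyprint lang-python">import math

PARAMS = 96  # illustrative parameter count, back-solved from the figures above
print(round(PARAMS * math.log10(65536)))        # 462: i.e. ~10^462 combinations
print(round(PARAMS * math.log10(65536 / 256)))  # 231: i.e. ~10^231 times more than 8-bit
</code></pre>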
</blockquote>
<p>So. If the Andromeda uses analog multiplexers to send its front panel values on to the main CPU, what could be causing those values to be glitchy? Well, let’s take a look at the analog value using an oscilloscope.</p>
<p><a href="https://svbtleusercontent.com/2r6pxsYcyuxSTbqjC33Eri0xspap.png"><img src="https://svbtleusercontent.com/2r6pxsYcyuxSTbqjC33Eri0xspap_small.png" alt="skyline-broken-old-mult.png"></a></p>
<p>Wow! This is kind of neat - we can see the multiplexing happening visually. Each “tower” in this visual “skyline” is the value of a different knob, with knobs turned all the way up appearing higher on the graph. If this were working correctly, we’d expect to see solid, flat values all along the graph. Spikes, slopes up or down, or noisy values all indicate that something is wrong; and sampling any of those values will probably result in the CPU thinking that some knobs are moving, even when they’re not.</p>
<p>Of particular interest here are the solid yellow sections of the graph, which indicate that values are moving up and down so quickly that they look like noise. Let’s zoom in on one of those:</p>
<p><a href="https://svbtleusercontent.com/gpTwiVpmiu2LuZ5RBv4cAz0xspap.png"><img src="https://svbtleusercontent.com/gpTwiVpmiu2LuZ5RBv4cAz0xspap_small.png" alt="skyline-broken-zoom.png"></a></p>
<p>Oof, that’s pretty bad. The value of this signal seems to be oscillating, which will make the CPU think that we’re turning this knob back and forth all the time. It’s hard to tell why this might be happening: this could be a broken multiplexer chip, or it could be one or more other broken chips causing bad signals to go into a multiplexer chip.</p>
<p>To debug this, I went ahead and ordered <a href="https://www.digikey.com/en/products/detail/texas-instruments/CD4051BM96/528372">a brand new multiplexer chip for the low price of $0.66 (plus shipping)</a>. Unlike my last soldering job, this chip was big enough that I could replace it myself. (One gotcha: the chips on this board weren’t just soldered in place, but were also <strong>glued</strong> in place from underneath, causing me to rip up a couple of traces despite my best attempts to be careful.)</p>
<p><a href="https://svbtleusercontent.com/hTAzjryPigBZEcwZUighXk0xspap.jpg"><img src="https://svbtleusercontent.com/hTAzjryPigBZEcwZUighXk0xspap_small.jpg" alt="new-multiplexer.jpg"></a></p>
<p>However, after installing this new chip, the problem wasn’t resolved: the new signal was even dirtier than before! The oscillation hadn’t stopped, the signal had more overall noise, and some values were now sloping down instead of remaining constant:</p>
<p><a href="https://svbtleusercontent.com/4JoKwZGZPND32T2nHcqtP40xspap.png"><img src="https://svbtleusercontent.com/4JoKwZGZPND32T2nHcqtP40xspap_small.png" alt="skyline-broken.png"></a></p>
<p>Let’s go back to the drawing board a bit. The schematic shows that this multiplexer is connected to four knobs on the front panel, as well as two other signals that I hadn’t tested, labeled <code class="prettyprint">PITCH</code> and <code class="prettyprint">MOD</code>.<br>
<a href="https://svbtleusercontent.com/sSVKn9iM8CedWLAGMh8fmf0xspap.png"><img src="https://svbtleusercontent.com/sSVKn9iM8CedWLAGMh8fmf0xspap_small.png" alt="analog-comparator.png"></a></p>
<p>These two signals come from the pitch and modulation <em>wheels</em> at the left side of the keyboard; they’re both potentiometers, but attached to large vertically-mounted wheels that can be used more easily during performance. Let’s trace the schematic a bit more to find out where those signals actually come from, and how they’re generated.</p>
<p><a href="https://svbtleusercontent.com/4t973gi1AGMU32vpuJ5xxD0xspap.png"><img src="https://svbtleusercontent.com/4t973gi1AGMU32vpuJ5xxD0xspap_small.png" alt="pitch-mod.png"></a></p>
<p>It looks like the “raw” signal from the pitch and modulation wheels goes through another chip - an operational amplifier, or <em>op-amp</em> - which amplifies the signal to match the 5V range output by the other potentiometers.</p>
<p>This is where I’d show you a screenshot of my oscilloscope to illustrate how high the voltage was, or what signal was coming off of the op-amps here. However, I don’t have that screenshot, because instead, I touched the op-amp while the synth was powered up, and gave myself a mild burn. <strong>It was red-hot.</strong></p>
<p><a href="https://svbtleusercontent.com/me8XK7MAPA4onn2uqmLtKp0xspap.png"><img src="https://svbtleusercontent.com/me8XK7MAPA4onn2uqmLtKp0xspap_small.png" alt="opamp-power.png"></a></p>
<p>The op-amp chip - <a href="https://www.ti.com/lit/ds/symlink/tl082-n.pdf">a TL082</a> - was supplied by two voltage rails: one at -15V, and one at +15V, making the maximum voltage across the chip a huge <strong>30 volts</strong>. (Huge is relative here; but for a synth with many components that operate at 3.3V, this is a problem.) An op-amp has no business getting this hot in a correctly-functioning circuit. My best guess is that this component failed on its own, or failed catastrophically when I accidentally plugged in the cable between the front panel and the main board <strong>backwards</strong> at one point.</p>
<p>Either way, this chip had to go. Not only was it causing instability in other chips, it was also sinking a ton of power and could have been a fire hazard or a danger to other parts of the circuit. Rather than trying to desolder it this time, though, I just cut it off with a pair of snips.</p>
<p><a href="https://svbtleusercontent.com/gX6VxgVaCdoFiG9QHyzu750xspap.jpg"><img src="https://svbtleusercontent.com/gX6VxgVaCdoFiG9QHyzu750xspap_small.jpg" alt="opamp-goodbye.jpg"></a></p>
<p>And with that, even without a new op-amp in place, the glitches were gone! Soldering in a new op-amp was pretty simple, but the result was great: all knobs worked again, including the pitch wheel and ribbon controller. The moral of the story: <em>don’t plug cables in backwards when working on delicate analog electronics</em>.</p>
<h1 id="the-conclusion_1">The Conclusion <a class="head_anchor" href="#the-conclusion_1">#</a>
</h1>
<p>Well, thirteen months and hundreds of dollars later, my impulse buy is now a fully-working, beautiful-sounding, ultra-rare synthesizer. I still have a couple things left to do - replace some missing knobs, get replacement side panels made, fix some dead LEDs, fix the mod wheel, and replace some yellowed and scratched keys - but the hard parts of the project are done.</p>
<p>What was the root cause of the failure? Well, despite the many twists and turns along the way, it certainly looks like the Andromeda’s CPU was just dead. A full CPU replacement was enough to kick it back into life. Second to that, the data in the flash ROM <em>may</em> have been bad, but it’s very hard to tell if that would have been a blocker. The other issues (broken SRAM, blown op-amp, bad multiplexer) were all caused by my own attempts to repair the Andromeda without having its CPU replaced.</p>
<blockquote>
<p><strong>What if I’ve got a broken A6 Andromeda?</strong></p>
<p>Having gone through this ordeal, I would suggest trying the following repair tips in order:</p>
<ul>
<li>If you’ve got an Arduino and are handy with software, open up your Andromeda and use <a href="https://github.com/psobot/arduino-coldfire-bdm">my <code class="prettyprint">arduino-coldfire-bdm</code> Python library</a> to try to connect to your Andromeda’s CPU over its debug port. From there, you’ll be able to see if the CPU is working and will be able to re-flash the firmware without buying any expensive equipment.</li>
<li>If that fails, try the simple fixes listed above: replace the resonator on the LCD (a $3 part that’s easy to solder), or try turning the power off and on a few times - if the synth occasionally boots, you’re probably hitting bus contention at startup, and a pull-up resistor on the SRAM chips’ chip-select pins should make a difference.</li>
<li>If that fails, buy a new MCF5307AI90B CPU online and replace it. This is very difficult to do without advanced soldering skills and the proper equipment; consider sending your Andromeda’s main board to <a href="https://videogamerepairs.ca">Daniel Wynne at VideogameRepairs.ca</a>. My repair bill came out to about $200 USD, but yours would likely be cheaper.</li>
<li>Whatever you do, don’t flip the cable that connects the front panel to the main board. This will blow an op-amp and maybe an analog multiplexer on the front panel - at the very least - and you’ll wind up with some non-functioning and glitchy knobs.</li>
</ul>
<p>If you’ve got an Alesis A6 Andromeda that’s in need of repair, stuck at the splash screen, not booting up, glitching out, or otherwise in a bad state: <a href="mailto:andromeda@petersobot.com">feel free to get in touch</a>, as I’m apparently a qualified Andromeda repair technician now.</p>
</blockquote>
<p><em>Was it worth it?</em> I definitely came out ahead, ignoring my own labour costs. <a href="https://reverb.com/p/alesis-andromeda#overflowing-row_heading-price-guide">As of 2022, Andromedas are selling on the second-hand market for somewhere between $3,000 and $5,000, according to Reverb</a>:<br>
<a href="https://svbtleusercontent.com/ruTKJg7fZyhVtb4do47Kwa0xspap.png"><img src="https://svbtleusercontent.com/ruTKJg7fZyhVtb4do47Kwa0xspap_small.png" alt="andromeda_price_history.png"></a></p>
<p>But does that mean I’ll be selling this synth? We’ll see. It’ll take a while to decide if this one-of-a-kind synth, which I put <em>so much</em> time into restoring, is worth getting rid of. (Maybe I’ll make a VST out of it instead. 👀)</p>
<hr>
<p>Thanks to <a href="https://musicmachinery.com">Paul Lamere</a>, <a href="https://zameermanji.com">Zameer Manji</a>, <a href="https://www.linklayer.com">Eric Evenchick</a>, and <a href="http://melatonin.dev">Sudara</a> for reviewing drafts of this post.</p>
<h1>Patching an Embedded OS from 1996 with Ghidra</h1>
<p>For reasons I won’t get into, I’ve been working on a tricky reverse engineering puzzle recently: how to patch the operating system of a 25-year-old synthesizer. To be specific, the <a href="https://www.vintagesynth.com/kurzweil/k2500.php">Kurzweil K2500</a>, a sample-based synthesizer released in 1996.</p>
<p><a href="https://svbtleusercontent.com/3zouwM9N4uforzGEfhdaUi0xspap.png"><img src="https://svbtleusercontent.com/3zouwM9N4uforzGEfhdaUi0xspap_small.png" alt="k2500xs_diagonal.png"></a></p>
<p>As with many digital musical instruments, this synthesizer is really just a computer with some extra chips. In this case, it’s a computer based around a CPU that was popular at the time: <a href="https://en.wikipedia.org/wiki/Motorola_68000">the Motorola 68000</a>, which was also famously used in the original Macintosh and the Sega Genesis. I want to patch the operating system of this beast to do all sorts of other things, most of which I’ll leave to the imagination in this already-very-long post.</p>
<h2 id="finding-the-operating-system_2">Finding the Operating System <a class="head_anchor" href="#finding-the-operating-system_2">#</a>
</h2>
<p>Modifying the operating system sounds great, but how do we get access to the code in the first place? Luckily, the K2500 operating systems <a href="https://kurzweil.com/content/migration/downloads/pub/Kurzweil/Pro_Products/K2000-K2vx-K2500/K2500/Operating_System/">are still provided by the manufacturer on what looks like an old FTP site</a>. Downloading and unzipping the operating system gives us a <code class="prettyprint">.KOS</code> file, which seems to be a custom format. Opening the file in <a href="https://hexfiend.com">Hex Fiend</a> shows its bytes directly:</p>
<p><a href="https://svbtleusercontent.com/xf2sbD8vBrz54qxagvDUnL0xspap.png"><img src="https://svbtleusercontent.com/xf2sbD8vBrz54qxagvDUnL0xspap_small.png" alt='A hex dump of "K25V00.KOS"'></a></p>
<p>Unfortunately, nothing stands out here. There seems to be a human-readable 4-byte header at the top: <code class="prettyprint">SYS0</code>, possibly followed by other header bytes, but it’s really hard to tell. Regardless, we already know that this operating system runs on a Motorola 68000 CPU. Let’s just try interpreting the data as a binary, and see how far we can get.</p>
<h2 id="enter-ghidra_2">Enter Ghidra <a class="head_anchor" href="#enter-ghidra_2">#</a>
</h2>
<p>The operating system file we’re using is <em>probably</em> raw machine code: literally the instructions and data interpreted by the CPU itself. To make any sense of this whatsoever, we’re going to need to <em>disassemble</em> it, to turn it back into assembly code - and hopefully eventually <em>decompile</em> it back into C-style code.</p>
<p>To do this, let’s use a tool called <a href="https://ghidra-sre.org">Ghidra</a>: an open-source reverse-engineering program built, maintained, and released by the United States National Security Agency. (Yes, that one. Really.) To start, let’s import the <code class="prettyprint">.KOS</code> file directly into Ghidra and analyze it with the default settings, which will search for instructions.</p>
<p><a href="https://svbtleusercontent.com/35HWgnWGJWDh3cToX5ntV0xspap.png"><img src="https://svbtleusercontent.com/35HWgnWGJWDh3cToX5ntV0xspap_small.png" alt="Untitled 4.png"></a></p>
<p>Scrolling through the file shows that parts of the data have been analyzed by Ghidra as valid 68k instructions, but much of the file remains unanalyzed. Strangely, scrolling further shows that Ghidra has correctly identified a number of human-readable strings in the file (great!), but the code seems to refer to those strings offset by some amount, showing up as cut-off strings in Ghidra.</p>
<p><a href="https://svbtleusercontent.com/3z17RxfqhFCbymoYVDxgYK0xspap.png"><img src="https://svbtleusercontent.com/3z17RxfqhFCbymoYVDxgYK0xspap_small.png" alt="Untitled 8.png"></a></p>
<p>This is because we just loaded the entire <code class="prettyprint">.KOS</code> file into Ghidra, ignoring the fact that it has a header and likely some other extra bytes. This is a pretty big problem. Any cross-references between functions will be inaccurate as we continue to reverse-engineer the data, sending us in the wrong direction nearly every time we try to follow a reference. We need to fix this first.</p>
<h2 id="reverse-engineering-the-bootloader_2">Reverse Engineering the Bootloader <a class="head_anchor" href="#reverse-engineering-the-bootloader_2">#</a>
</h2>
<p>To reverse-engineer the <code class="prettyprint">.KOS</code> file, it would be extremely useful to dig into the code that creates or consumes these files. We don’t have the creation code, but we do have access to the code that consumes these files: <a href="https://kurzweil.com/content/migration/downloads/pub/Kurzweil/Pro_Products/K2000-K2vx-K2500/K2500/Operating_System/Boot_Loader/">the bootloader for the synth itself, which is also still available online (edit: it appears they’ve taken this down, as of August 2022)</a>! Let’s load it into Ghidra and make an assumption to make our lives easier: let’s guess that the first 8 bytes of the file are part of a header.</p>
<blockquote>
<p>Where did that number come from? Well, I tried 0, +4, +8, +12, +16, and +20 byte offsets, and +8 produced the most correct disassembly. Yes, this took a while. In hindsight, all of this also happens to work because the code in the file gets loaded into address <code class="prettyprint">0x0</code> in memory. If it were loaded somewhere else, we’d have to figure out what that location is before we could effectively disassemble the code.</p>
</blockquote>
<p>Just like before, let’s look for something human-readable first. Searching through the strings brings up a couple error strings that seem like they might get thrown by the code we care about:</p>
<p><a href="https://svbtleusercontent.com/gKVzk72q5AxiF9pCfVcjhy0xspap.png"><img src="https://svbtleusercontent.com/gKVzk72q5AxiF9pCfVcjhy0xspap_small.png" alt="Untitled 12.png"></a></p>
<p>Ghidra has identified what it calls XREFs here - cross-references, indicating that these strings are referenced from a certain place. Let’s follow this reference:</p>
<p><a href="https://svbtleusercontent.com/gcBK4NjtDVJ8v5qgrjgB5k0xspap.png"><img src="https://svbtleusercontent.com/gcBK4NjtDVJ8v5qgrjgB5k0xspap_small.png" alt="Untitled 13.png"></a></p>
<p>Aha! Now we’re getting somewhere. This looks an awful lot like a switch statement, decompiled by Ghidra here as an <code class="prettyprint">if</code> tree. It seems like there are a series of error codes (<code class="prettyprint">0x100</code> through <code class="prettyprint">0x105</code>, then <code class="prettyprint">0x200</code>, <code class="prettyprint">0x201</code>, etc.) that each correspond with an error string that presumably gets printed on the screen. Let’s keep pulling on this thread. Using Ghidra’s “Find References” function, we end up at this function:</p>
<p><a href="https://svbtleusercontent.com/2DJ9WVpTN5TMHvFCPDQTjw0xspap.png"><img src="https://svbtleusercontent.com/2DJ9WVpTN5TMHvFCPDQTjw0xspap_small.png" alt="Untitled 15.png"></a></p>
<p>We’re getting closer! Ghidra’s done something great for us here: the decompiled code includes some variable names, automatically determined based on the strings that those variables point to. Given that we know that some of these variables are strings, we can take some guesses and use Ghidra’s “Rename” and “Retype” tools to make this function read a lot more clearly:</p>
<p><a href="https://svbtleusercontent.com/gMeWBvZRTUGa2oJ1RxqpPa0xspap.png"><img src="https://svbtleusercontent.com/gMeWBvZRTUGa2oJ1RxqpPa0xspap_small.png" alt="Untitled 16.png"></a></p>
<p>It looks like we have a two-stage process here: first, the new operating system file is checked by calling <code class="prettyprint">ActuallyCheckOrFlashTheOS?(0, ?)</code>. If the check passes, then the same function is called again with <code class="prettyprint">1</code>. It seems like that function probably reads the <code class="prettyprint">.KOS</code> file format we’re investigating: let’s dig in there.</p>
<p><a href="https://svbtleusercontent.com/rPojvMhT4qmmU5cx729fMG0xspap.png"><img src="https://svbtleusercontent.com/rPojvMhT4qmmU5cx729fMG0xspap_small.png" alt="Untitled 17.png"></a></p>
<p>This function doesn’t have any of the hints we saw before: there are no human-readable strings, nor any recognizable function names. Instead, we can look at the <u>structure</u> of this function to understand what it does. Even without variable names, the structure of this code looks pretty similar to opening a file in C! It looks like we have an <code class="prettyprint">fopen</code>-style call, followed by an <code class="prettyprint">fread</code>, followed by another <code class="prettyprint">fread</code> in a while loop. Let’s add comments to make this clearer.</p>
<p><a href="https://svbtleusercontent.com/2Qr5fommAbc8fv4V3yvrJA0xspap.png"><img src="https://svbtleusercontent.com/2Qr5fommAbc8fv4V3yvrJA0xspap_small.png" alt="Untitled 18.png"></a></p>
<p>With the comments, it seems we now have a couple questions answered:</p>
<ul>
<li>The <code class="prettyprint">.KOS</code> file starts with a 4-byte header: <code class="prettyprint">SYS0</code>
</li>
<li>After the header, the file is divided into fixed-size chunks</li>
<li>Each chunk starts with a single 4-byte integer</li>
<li>An unknown number of bytes of actual data are read</li>
<li>Each chunk ends with a single 4-byte integer which seems to be some sort of checksum</li>
</ul>
<p>However, there’s one new question we need to answer as well: why are certain constants and functions referenced at very high addresses in memory? (For example, <code class="prettyprint">0x021317ac</code> seems to contain the number of bytes in each chunk of the <code class="prettyprint">.KOS</code> file, but the data in the ROM doesn’t reach that high!)</p>
<p>To better understand what’s at those high addresses, let’s turn to the service manual for this unit. (Huge shoutout to <a href="https://www.linkedin.com/in/david-ryskalczyk-237747b/">David Ryskalczyk</a> for this idea!) Buried deep in a non-OCR’d PDF lies this useful tidbit of information, in a list of diagnostic procedures:</p>
<p><a href="https://svbtleusercontent.com/5ZAXWk4ttBBkp68tyS6eLw0xspap.png"><img src="https://svbtleusercontent.com/5ZAXWk4ttBBkp68tyS6eLw0xspap_small.png" alt="Untitled 19.png"></a></p>
<p>Thanks, service manual! It looks like <code class="prettyprint">0x021317ac</code> lands directly in the middle of this synth’s “volatile RAM” - the RAM used by the processor while it’s running.</p>
<blockquote>
<p>It’s great that we had the service manual as a reference here. Without this information, we could have made an educated guess based on address prefixes that show up often in the code. If that didn’t help, we could have tried to find an electrical schematic for the unit and traced the address lines coming from the various chips to the CPU. This stuff gets complicated <em>fast</em>.</p>
</blockquote>
<p>Let’s tell Ghidra to treat this as RAM in its “Memory Map” window, and then jump to near the address we’re interested in: <code class="prettyprint">0x021317ac</code>.</p>
<p><a href="https://svbtleusercontent.com/8tccFh78je8ZKcjR5SmwAn0xspap.png"><img src="https://svbtleusercontent.com/8tccFh78je8ZKcjR5SmwAn0xspap_small.png" alt="Untitled 21.png"></a></p>
<p>There’s no data here (as Ghidra knows this is RAM, which is randomly initialized when a computer starts), and it looks like the address in question is being read from (<code class="prettyprint">(R)</code>), but never written to (<code class="prettyprint">(W)</code>). Maybe the writes are happening further up?</p>
<p><a href="https://svbtleusercontent.com/kJu9hAXZ6qidEGWCifGPf20xspap.png"><img src="https://svbtleusercontent.com/kJu9hAXZ6qidEGWCifGPf20xspap_small.png" alt="Untitled 20.png"></a></p>
<p>Aha! Ghidra shows us that a function is writing directly to the start of RAM. Given that we had to scroll up 6,060 bytes to find the first write, maybe this method copies a bunch of data into RAM. Let’s click through to see what’s there.</p>
<p><a href="https://svbtleusercontent.com/eMEDtuGhFbk2JSJfW9dxBM0xspap.png"><img src="https://svbtleusercontent.com/eMEDtuGhFbk2JSJfW9dxBM0xspap_small.png" alt="Untitled 22.png"></a></p>
<p>Uh, one sec. Let’s rename some stuff again.</p>
<p><a href="https://svbtleusercontent.com/iV4t4noXP4p7a62LMoDya70xspap.png"><img src="https://svbtleusercontent.com/iV4t4noXP4p7a62LMoDya70xspap_small.png" alt="Untitled 23.png"></a></p>
<p>Much better. It looks like we’re copying a bunch of data from ROM into RAM - specifically from <code class="prettyprint">0x0001860a</code> to <code class="prettyprint">0x02130000</code>. How much is a bunch? Well, <code class="prettyprint">0x690</code> 32-bit long words, which works out to 6,720 bytes. (This snippet of code also then zeroes out the next <code class="prettyprint">0x1e1</code> 32-bit long words, or 1,924 bytes.) Now that we know that this part of RAM is probably initialized with the same data as that part of the ROM, we can tell Ghidra to map that part of the ROM to this part of the RAM directly.</p>
<p><a href="https://svbtleusercontent.com/m7P2tJGKvPaf2ZKzsDMXas0xspap.png"><img src="https://svbtleusercontent.com/m7P2tJGKvPaf2ZKzsDMXas0xspap_small.png" alt="Untitled 24.png"></a></p>
<p>Now, going back to the part of RAM that we were reading, we can see that there are bytes present instead of question marks. </p>
<p><a href="https://svbtleusercontent.com/rnK33wetySZthG6rGLWwEF0xspap.png"><img src="https://svbtleusercontent.com/rnK33wetySZthG6rGLWwEF0xspap_small.png" alt="Untitled 25.png"></a></p>
<p>It looks like the value stored at <code class="prettyprint">0x021317ac</code> is <code class="prettyprint">0x20000</code>, which works out to <strong>131,072 bytes</strong>! (We call this “128 kilobytes” because <a href="https://en.wikipedia.org/wiki/Kilobyte#Base_2_.281024_bytes.29">numbers are complicated</a>.)</p>
<p>Great! So we’ve now figured out that each chunk of the <code class="prettyprint">.KOS</code> file format is 128kB in size. That’s all we need to know to build a decoder for this format, remove the chunk headers, and end up with a file that will have correct relative offsets. This allows Ghidra to properly disassemble and decompile the file, and allows us to actually poke around at the operating system code. (I’ve gone ahead and done this already, and <a href="https://gist.github.com/psobot/bf50c2090bb0fbe5380aefaafea17eed">that <code class="prettyprint">.KOS</code> file packer/unpacker is available on GitHub</a>.)</p>
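<p>For illustration, a minimal unpacker based on what we’ve learned might look like the sketch below. (This is a simplification: the real script linked above also handles short final chunks and checksum verification, and may differ in exactly what the 128kB counts.)</p>
<pre><code class="prettyprint lang-python">import struct

CHUNK_SIZE = 0x20000  # 128kB per chunk, as discovered above

def unpack_kos(path: str) -> bytes:
    with open(path, "rb") as f:
        assert f.read(4) == b"SYS0"  # 4-byte file header
        payload = bytearray()
        while True:
            chunk_header = f.read(4)  # the 4-byte integer that starts each chunk
            if not chunk_header:
                break  # end of file
            payload += f.read(CHUNK_SIZE)  # assumes the chunk size counts only data
            (checksum,) = struct.unpack(">I", f.read(4))  # trailing checksum (unverified here)
        return bytes(payload)
</code></pre>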
<h2 id="exploring-the-operating-system_2">Exploring the Operating System <a class="head_anchor" href="#exploring-the-operating-system_2">#</a>
</h2>
<p>Alright, we’ve now got a “clean” dump of the operating system. Let’s open up that file in Ghidra, just like we tried to before. Let’s use Ghidra’s search function to find some interesting strings.</p>
<p><a href="https://svbtleusercontent.com/mnnhPaG33kotJD1mrmSCgN0xspap.png"><img src="https://svbtleusercontent.com/mnnhPaG33kotJD1mrmSCgN0xspap_small.png" alt="Untitled 26.png"></a></p>
<p>Let’s do a quick test to see if we can modify the operating system successfully. Ghidra lets you change instructions or data in a binary, so let’s change one of these strings to contain different text. (We’ll need to keep the length the same, to avoid moving other code around.)</p>
<p>After re-packing the operating system, let’s load it onto a floppy disk, install it on the real hardware, and…</p>
<p><a href="https://svbtleusercontent.com/kSwaFD7KJaMhvRRnqUC2eN0xspap.png"><img src="https://svbtleusercontent.com/kSwaFD7KJaMhvRRnqUC2eN0xspap_small.png" alt="fake_error.png"></a></p>
<p>What gives? Well, remember that “some sort of checksum” field we saw in the <code class="prettyprint">.KOS</code> format earlier? It turns out that field is actually checked by the hardware when installing a new OS. Luckily, Ghidra can help us here too - let’s go back to the bootloader and click through to <code class="prettyprint">FUN_0x021302b2</code>, which looks like it computes some sort of checksum for us.</p>
<p><a href="https://svbtleusercontent.com/wQKEXn6U92waidpf4BaSvc0xspap.png"><img src="https://svbtleusercontent.com/wQKEXn6U92waidpf4BaSvc0xspap_small.png" alt="Untitled 37.png"></a></p>
<p>And again, after guessing at some variable names:</p>
<p><a href="https://svbtleusercontent.com/5pEtT4NrqcWRazLRTk2vP70xspap.png"><img src="https://svbtleusercontent.com/5pEtT4NrqcWRazLRTk2vP70xspap_small.png" alt="Untitled 38.png"></a></p>
<p>It looks like this checksum function is pretty simple: for each byte <code class="prettyprint">x</code>, the new checksum is <code class="prettyprint">x + checksum</code> rotated left by one bit - that is, <code class="prettyprint">x + checksum</code> shifted left by one bit, bitwise OR’d with <code class="prettyprint">x + checksum</code> shifted right by 31 bits. That’s a neat checksum I hadn’t seen before, and one that even advanced checksum-reversing tools like the wonderful <a href="https://github.com/8051Enthusiast/delsum">delsum</a> can’t figure out.</p>
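<p>In Python, that works out to something like this - a minimal sketch, assuming a 32-bit accumulator that starts at zero:</p>
<pre><code class="prettyprint lang-python"># A sketch of the checksum described above: add each byte to a 32-bit
# accumulator, then rotate the accumulator left by one bit.
def kos_checksum(data: bytes) -> int:
    checksum = 0
    for x in data:
        total = (checksum + x) & 0xFFFFFFFF
        checksum = ((total << 1) | (total >> 31)) & 0xFFFFFFFF
    return checksum
</code></pre>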
<p>With this checksum, we can now change <a href="https://gist.github.com/psobot/bf50c2090bb0fbe5380aefaafea17eed">our <code class="prettyprint">.KOS</code> file dumping script</a> to properly re-pack new data with correct checksums. And once that’s done, let’s try flashing the OS again:</p>
<p><a href="https://svbtleusercontent.com/o8cVnXRvAzL6jLoqkCNUsR0xspap.png"><img src="https://svbtleusercontent.com/o8cVnXRvAzL6jLoqkCNUsR0xspap_small.png" alt="fake_success.png"></a></p>
<p>We can now flash a new operating system onto this hardware, modifying or extending its capabilities however we’d like. (That part, however, is left as an exercise for the reader.)</p>
<h2 id="what-we-learned_2">What We Learned <a class="head_anchor" href="#what-we-learned_2">#</a>
</h2>
<p>Wow, that was a bit of an ordeal. I’d never used Ghidra before trying this project, and now I feel comfortable enough to use it for future absurdly-obscure retrocomputing reverse engineering. The techniques that seemed to work the best were:</p>
<ul>
<li>Look for human-readable strings first.</li>
<li>Don’t be afraid to <a href="https://en.wiktionary.org/wiki/yak_shaving">take a “side quest”</a> (like reverse engineering a bootloader) to make your primary effort (patching an operating system) more successful.</li>
<li>Use Ghidra’s decompiler. It’s really amazing.</li>
<li>Look for structure in the decompiled code.</li>
<li>Rename functions, variables, and data types as soon as you have even a guess at what they might do.</li>
<li>Look up documentation and resources for the system if they’re available.</li>
<li>Sometimes, you just have to manually step through dozens of possible examples to find what you’re looking for. (It gets easier the more you do it!)</li>
</ul>
<p>(And thanks in part to this reverse engineering, <a href="https://github.com/mamedev/mame/pull/9545">the K2000 emulation in the MAME project now boots</a>!)</p>
<hr>
<p>Special thanks to David Ryskalczyk for unblocking my work halfway through this project, and to David Ryskalczyk and Zameer Manji for reviewing drafts of this post.</p>
tag:blog.petersobot.com,2014:Post/machine-learning-for-drummers2018-07-22T19:24:12-07:002018-07-22T19:24:12-07:00Machine Learning for Drummers<p>TL;DR: In this post, I build an app that classifies whether an audio sample is a kick drum, snare drum, or other drum sample with 87% accuracy, using 🎉 machine learning 🎉.</p>
<p>First and foremost, I’m a drummer. At my day job, I work on machine learning systems for recommending music to people at <a href="https://spotify.com">Spotify</a>. But outside my 9-to-5, I’m a musician, and my journey through music started as a drummer. When I’m not drumming in my spare time, I’ll often be creating electronic music - with a lot of percussion in it, of course.</p>
<p>If you’re not familiar with electronic music production, many (if not most) modern electronic music uses <em>drum samples</em> rather than real, live recordings of drummers to provide the rhythm. These drum samples are often distributed commercially, as sample packs, or created by musicians and shared for free online. Often, though, these samples can be hard to use, as their labeling and classification leaves a lot to be desired:</p>
<p><a href="https://svbtleusercontent.com/hgcDr9LYBJd8hVpFYqkj8T0xspap.png"><img src="https://svbtleusercontent.com/hgcDr9LYBJd8hVpFYqkj8T0xspap_small.png" alt="kicks.png"></a></p>
<p>Various companies have tried to tackle this problem by creating their own proprietary formats for sample packs, such as Native Instruments’ <em>Battery</em> or <em>Kontakt</em> formats. Both use explicit metadata and allow users to browse samples by a variety of tags. However, these are all (usually) expensive software packages and require you to learn their workflows.</p>
<p>In an effort to better understand how to use machine learning techniques, I decided to use machine learning to try to solve this fairly simple problem:</p>
<blockquote class="short">
<p><em>Is a given audio file a sample of a kick drum, snare drum, hi-hat, other percussion, or something else?</em></p>
</blockquote>
<p>For example, which drums do these two samples sound like?</p>
<p><audio style="width:100%;"></audio></p>
<p><audio style="width:100%;"></audio></p>
<p>Humans have no trouble classifying these two sounds, as we’ve likely heard them tens of thousands of times before. The human brain is great at this kind of problem - computers, however, require some training.</p>
<p>In machine learning, this is often called a <a href="https://en.wikipedia.org/wiki/Statistical_classification">classification problem</a>, because it takes some data and <em>classifies</em> (as in <em>chooses a class for</em>) it. You might think of this as a kind of <strong>automated sorting system</strong> (although I’m using the word “sorting” here to mean “sort into groups” rather than “to put in a specific ranking or order”).</p>
<p>If you’re unfamiliar with machine learning, you might ask:</p>
<blockquote class="short">
<p>Why not just train the computer to learn what a kick drum is (and so on) by giving it a whole bunch of data?</p>
</blockquote>
<p>This is <em>mostly</em> correct already! (Hooray, you’re a machine learning engineer!)</p>
<p>The trouble comes from deciding what <em>data</em> means in the above sentence. We could:</p>
<ol>
<li>Give the computer all of the data we have and let “<em>machine learning</em>” figure out what’s important and what’s not.</li>
<li>
<em>or</em> give the computer all of the data we have, but do a bit of pre-processing first to hint at parts of the data that might be important, then have “<em>machine learning</em>” classify our samples for us.</li>
</ol>
<p>Option 1 above is tricky, as our data comes in many different forms - long audio files, short audio files, different formats, different bit depths, sample rates, and so on, which would add a ton of complexity to our algorithm. Throwing all of this at a machine and asking it to make sense of it would require a <em>lot</em> of data for it to figure out what we humans already know.</p>
<p>Instead of making the computer do a ton of extra work, we can use option 2 as a middle ground: we can choose some <em>things</em> about the audio samples that we think might be relevant to the problem, and provide those <em>things</em> to a machine learning algorithm and have it do the math for us. These <em>things</em> are known as <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)"><em>features</em></a>.</p>
<p>(If this word is confusing, think of a feature just like a feature of, say, a TV - only instead of “42-inch screen” and “HDMI input”, our features might be “4.2 seconds long” and “maximum loudness 12dB”. The word means the same thing in both contexts.)</p>
<p>This process of figuring out what features we want to use is commonly known as <em>feature extraction</em>, which makes sense. Given our input data (audio files), let’s come up with a list of features that we, as humans, might find relevant to deciding if the file is a kick drum or a snare drum.</p>
<ul>
<li><p><strong>Overall file length</strong> is one simple feature - it’s easy to measure, and it’s possible that maybe a snare drum’s sound continues on for longer than a kick drum’s sound. (To prevent us from getting false positives here, let’s only count the length of time that the sound is not silent, or <strong>not quieter than -60dB</strong>, in the file.)</p></li>
<li><p><strong>Overall loudness</strong> might sound like a great feature to use (as maybe kicks are louder than snares?) but most samples used in electronic music are <a href="https://en.wikipedia.org/wiki/Audio_normalization"><em>normalized</em></a>, meaning their loudness is adjusted to be consistent between files. Instead, we can use <strong>maximum loudness</strong>, <strong>minimum loudness</strong>, and <strong>loudness at middle</strong> (that is, loudness at the 50% mark through the file) to get a better idea for how the loudness changes over time. Drum hits should be loudest at the start of the sample, and should quickly taper off to silence.</p></li>
<li><p>Humans can tell the difference between kick drums and snare drums intuitively, and we do so by listening to the frequencies present in the sound. Kick drum samples have a lot more low-frequency content in them, as kick drums sound low and bassy due to their large diameter. To teach this to a machine learning algorithm, we can take the <strong>average loudness in several frequency ranges</strong> to tell the algorithm a little more about the <a href="https://en.wikipedia.org/wiki/Timbre">timbre of the sound</a> as humans might hear it. (To better represent how this changes over time, we might take this loudness-per-frequency-band feature at regular intervals throughout the sample - 0% length, 5%, 50%, and so on.)</p></li>
<li><p>Drums, while being very percussive instruments, <a href="https://en.wikipedia.org/wiki/Drum_tuning">can still be <strong>tuned</strong></a> to various pitches. To quantify this tuning and help our algorithm use it as input, we can take the <a href="https://en.wikipedia.org/wiki/Fundamental_frequency">fundamental frequency</a> of the sample to help the algorithm distinguish between high drums and low drums.</p></li>
</ul>
<p>These are just some of the many features that might be useful for solving our classification problem, but let’s start with these four and see how far we get.</p>
<p>As with all machine learning problems, to teach the machine to do something, you have to have some sort of <em>training data</em>. In this case, I’m going to use a handful of samples - roughly 20-30 from each instrument - from the tens of thousands of samples I have in my sample collection. When choosing these samples, I want to find:</p>
<ul>
<li>samples that are representative of the different types of each instrument
(e.g.: a few acoustic kick drums, some electronic kick drums, some beatboxed kicks, and so on)</li>
<li>samples from different sources that might have different biases that humans have a harder time picking up on (e.g.: are all samples from one sample pack the exact same length? what about the same fundamental frequency?)</li>
<li>samples of things that <em>aren’t</em> drums, so that the algorithm can learn when a sample falls into the “something else” bucket</li>
</ul>
<p>I put together a list of these samples - 100 files, roughly 50 megabytes of sample data, in five separate folders: <code class="prettyprint">kick</code>, <code class="prettyprint">snare</code>, <code class="prettyprint">hat</code>, <code class="prettyprint">percussion</code>, and <code class="prettyprint">other</code>. (Most of these samples are from freesound.org and are licensed under a Creative Commons Attribution License, so special thanks to <a href="https://freesound.org/people/waveplay/">waveplay</a>, <a href="https://freesound.org/people/Seidhepriest/">Seidhepriest</a>, and <a href="https://freesound.org/people/quartertone">quartertone</a> for making their samples available for free!)</p>
<p>Now that we’ve got some data to train on, let’s write some code to perform the feature extraction mentioned earlier. These features aren’t super hard for us to calculate, but they’re also not super simple, so I’ve written some code below to extract them by using <a href="https://librosa.github.io/librosa/"><code class="prettyprint">librosa</code></a>, a wonderful Python library for audio analysis by the wonderful <a href="http://bmcfee.github.io/">Brian McFee</a> et al.</p>
<p>(All of the code in this blog post is <a href="https://github.com/psobot/machine-learning-for-drummers">available on Github</a> - feel free to download it and try running it on your own machine if you’re interested.)</p>
<pre><code class="prettyprint lang-python"># from feature_extract.py
def features_for(file):
# Load and trim the audio file to only the parts that aren't silent.
audio, rate = load_and_trim(file)
# Use poorly_estimate_fundamental to figure out what the rough
# pitch is, along with the standard deviation - how much it varies.
fundamental, f_stddev = poorly_estimate_fundamental(audio, rate)
# Like an equalizer, find out how loud each "frequency band" is.
# In this case, we're just splitting up the audio spectrum into
# three very wide sections, low, mid, and high.
low, mid, high = average_eq_bands(audio, 3)
return {
"duration": librosa.get_duration(audio, rate),
"start_loudness": loudness_at(audio, 0),
"mid_loudness": loudness_at(audio, len(audio) / 2),
"end_loudness": loudness_at(audio, len(audio)),
"fundamental_freq": fundamental,
"fundamental_deviation": f_stddev,
"average_eq_low": low,
"average_eq_mid": mid,
"average_eq_high": high,
}
</code></pre>
<p>Now we’ve got a number of features extracted from each sample. We can save these as one large JSON file for use later by our machine learning algorithm. (We haven’t done any learning yet, just figured out the data that we want to learn <em>with</em>.)</p>
<p>You can think of these features as measurements we’re taking of the samples, without having to use the entire contents of the samples themselves. (And that’s very true in this case - we started with over 50 megabytes of samples, but the features themselves are only 150 kilobytes - that’s more than 300 times smaller!)</p>
<p>Now, we can take these features and give them to a machine learning algorithm and have it learn from them. But hold on a sec - let’s get specific about which algorithm we’re talking about, and about what learning means in this context.</p>
<p>We’re going to use an algorithm called a <a href="https://en.wikipedia.org/wiki/Decision_tree">decision tree</a> in this post, which is a commonly used machine learning algorithm that <em>doesn’t</em> involve some of the buzzwords that you may have heard, like “neural networks,” “deep learning,” or “artificial intelligence.” A decision tree is a <strong>system that splits data into categories by learning thresholds for each feature</strong> in a recursive way. (If that’s confusing, don’t worry too much about it - but check out <a href="http://www.r2d3.us/visual-intro-to-machine-learning-part-1/">R2D3’s amazing visual example of how decision trees work</a> if you’re curious.)</p>
<pre><code class="prettyprint lang-python"># from classifier.py
def train_and_evaluate_model():
# First, let's read the features that we got from feature_extract.
features, classes, sample_names, _, _ = read_data()
# Let's use this percentage of the data to train, and the rest for
# testing. Why not just train on all the data? That would result in
# a model that is overfitted, or overly good at the data that it's
# seen and does poorly with data that it hasn't seen.
training_percentage = 0.75
num_training_samples = int(len(features) * training_percentage)
# Here we separate all of our features and classes into just the
# ones we want to train on...
train_features = features[:num_training_samples]
train_classes = classes[:num_training_samples]
# ...and we do the training, which creates our model!
# vvv MACHINE LEARNING HAPPENS ON THIS LINE BELOW vvv
model = DecisionTreeClassifier().fit(train_features, train_classes)
# ^^^ MACHINE LEARNING HAPPENS ON THIS LINE ABOVE ^^^
</code></pre>
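<p>The excerpt above stops right after training. The evaluation step isn’t shown, but a minimal sketch of it - reusing the same variable names, and scikit-learn’s real <code class="prettyprint">score</code> method - might look something like this:</p>
<pre><code class="prettyprint lang-python">    # A sketch only - not the code from this post's repository.
    # Hold out the remaining 25% of the data for testing...
    test_features = features[num_training_samples:]
    test_classes = classes[num_training_samples:]
    # ...then score the model on both the seen and unseen data.
    print("Training accuracy:", model.score(train_features, train_classes))
    print("Test accuracy:", model.score(test_features, test_classes))
</code></pre>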
<p>In this case, <code class="prettyprint">classifier.py</code> <em>trains a model</em> by creating a decision tree - which <em>is</em> our <strong>model</strong> - whose weights are statistically determined by the data that we pass in. Again, the specifics aren’t necessary to understand for the rest of this post, but here’s what a similar model looks like when visualized:<br>
<a href="https://svbtleusercontent.com/afzqagCv9TqY5LkQTP5zPD0xspap.png"><img src="https://svbtleusercontent.com/afzqagCv9TqY5LkQTP5zPD0xspap_small.png" alt="graph.png"></a><br>
Each new sample is passed into this tree, and the features that we provided are evaluated from the top down. For example, if a new sample has <code class="prettyprint">average_eq_2_10 ≤ -56.77</code>, as the top block in the diagram shows, the decision tree would move to the left and then check its <code class="prettyprint">fundamental_5</code> feature. It would continue to do so until it reaches the bottom of the tree, or a “leaf” (ha, tree, leaf, get it?), where it would declare that the given sample belongs to whatever <code class="prettyprint">class</code> (or colour, in this diagram) the leaf represents.</p>
<p>Now, if we run <code class="prettyprint">classifier.py</code>, we should see two results: the <strong>training</strong> accuracy (how well the model predicted the kind of sample for samples that it saw during training) and the <strong>test</strong> accuracy (how well the model predicted samples that it hadn’t seen before). Our training accuracy is 100%, which is not surprising - that data was used to create the model in the first place! And thanks to the features we selected, the model got most guesses (~87%) correct on samples it hadn’t seen before. This is pretty good for a first try! (If you run this code on your own laptop, you should find that it takes roughly 12 seconds to train on the provided example data.)</p>
<p><a href="https://svbtleusercontent.com/7asxppXK2cbj37RdeZJsj40xspap.png"><img src="https://svbtleusercontent.com/7asxppXK2cbj37RdeZJsj40xspap_small.png" alt="classification_results.png"></a></p>
<p>Our 87% accuracy is decent, but that 13% error rate might be considered an example of what’s called <em>overfitting</em> - our model has been trained to be overly specific: it’s completely accurate on data that it’s seen before, but it has trouble when it sees data that’s new to it. In some sense, this is similar to how humans learn; when someone sees something new that they hadn’t seen in school or heard about before, they’re bound to make mistakes.</p>
<p>To avoid overfitting our model, we could take a number of approaches:</p>
<ul>
<li>We could tune the algorithm’s parameters to try to force it to be less specific. This is a good place to start, especially with decision tree algorithms.</li>
<li>We could change our feature calculation to give more data to the algorithm, possibly introducing data that seems unintuitive to humans but would mathematically help solve our classification problem.</li>
<li>We could add more (and more varied) data so that the decision tree algorithm can create a more general tree, assuming that the existing set of data isn’t complete enough.</li>
</ul>
<p>All three of these are valid approaches, and they’re also left up to the reader to investigate. We could also try other classification methods instead of using a decision tree, although surprisingly a naïve decision tree works pretty well for this problem.</p>
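<p>As a concrete starting point for the first approach above: scikit-learn’s <code class="prettyprint">DecisionTreeClassifier</code> exposes parameters that limit how specific the tree is allowed to get. This is only a sketch of the idea (the values are arbitrary starting points, not tuned for this dataset):</p>
<pre><code class="prettyprint lang-python"># A sketch of constraining the tree to fight overfitting; the parameter
# values here are arbitrary and worth experimenting with.
model = DecisionTreeClassifier(
    max_depth=5,          # don't let the tree grow arbitrarily deep
    min_samples_leaf=3,   # require at least 3 samples at every leaf
    random_state=42,      # seed the RNG for reproducible trees
).fit(train_features, train_classes)
</code></pre>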
<p>So! We’ve built a machine learning classifier for drum samples. <strong><u>That’s kinda cool.</u></strong> There are a couple things to note about this system:</p>
<ul>
<li>We do our machine learning training on <em>features</em>, rather than the audio data itself. This means that if we wanted to write a program to classify new, unknown samples against this model, it would first have to run the sample through the same logic that’s in <code class="prettyprint">feature_extract.py</code> before it would be compatible with the model.</li>
<li>The current model is held <em>in memory</em> and never written out to disk. This is somewhat impractical, and in a real-world machine learning system, you’d likely save the model as a separate file that you could then pass around and use in different situations; see the sketch after this list. (In many popular machine learning systems, models are trained routinely on up to <em>terabytes</em> of input data, rather than the 50 megabytes we used here, so storing the resulting model on disk is very necessary.)</li>
<li>We’re currently training this model on around 150 samples, which gives okay results and allows us to test this model training in seconds rather than minutes or hours. We could try training this on <em>literally all of the samples available to us</em>, which might give much better results. (In tests on my entire sample library, I was able to get up to 90% accuracy, which is pretty good for a simple decision tree.)</li>
<li>This model is a <em>classifier</em>, which means that while it can put samples into buckets of sorts (and even give probability of a sample being in a bucket) it can’t tell you how much, say, a snare sounds like a kick. If you want to place your sounds along a continuous scale rather than into buckets, you’ll need another kind of machine learning algorithm.</li>
<li>The algorithm used by <code class="prettyprint">scikit</code> uses a random number generator to choose how to create its decision tree. If this model were to be used in production, this random number generator should be <a href="https://en.wikipedia.org/wiki/Random_seed">seeded</a> to allow for exactly reproducible results, which makes it easier to test, debug, and use the model.</li>
</ul>
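<p>On the second point above: in scikit-learn, saving a model to disk is a one-liner with <code class="prettyprint">joblib</code>. A minimal sketch (the file name is arbitrary):</p>
<pre><code class="prettyprint lang-python">import joblib

# Save the trained model to disk...
joblib.dump(model, "drum_classifier.joblib")

# ...then, in another process (or on another machine), load it back
# and use it to classify the features of new, unseen samples.
model = joblib.load("drum_classifier.joblib")
</code></pre>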
<p>If you’ve got your own sample library, or want to give this problem a try with samples you’ve found online, go for it! All of the code from this blog post is available <a href="https://github.com/psobot/machine-learning-for-drummers">here on Github</a>, and you can pop in your own sample packs and have fun. Some other things to try:</p>
<ul>
<li>Try using different features. <code class="prettyprint">librosa</code> is very advanced and exposes many parameters about the audio it’s analyzing - choose as many features as you’d like and try to improve your accuracy!</li>
<li>Try tuning the algorithm used for machine learning. Scikit’s <code class="prettyprint">DecisionTreeClassifier</code> has a lot of options that might improve accuracy by a lot. (If you end up trying to optimize this automatically, that’s called <a href="https://en.wikipedia.org/wiki/Hyperparameter_optimization">hyperparameter optimization</a> and is its own field of study within machine learning.)</li>
<li>Try throwing new kinds of audio files at this system to see what breaks. My training and test datasets didn’t include any longer audio files, full songs, podcasts, or other audio files that you might find. See how those files work with this model and see if you can improve it to handle those cases better.</li>
</ul>
<hr>
<p>Special thanks to <a href="http://jamie-wong.com/">Jamie Wong</a>, <a href="https://zameermanji.com/">Zameer Manji</a>, <a href="http://www.isaacezer.com/">Isaac Ezer</a>, and <a href="http://markkoh.net/">Mark Koh</a> for their proofreading and feedback on this post.</p>
tag:blog.petersobot.com,2014:Post/echo-dot-vs-chromecast-audio-an-evaluation2017-06-25T08:30:26-07:002017-06-25T08:30:26-07:00Echo Dot vs. Chromecast Audio: An Evaluation<p>I recently came into possession of both an <a href="https://www.amazon.com/All-New-Amazon-Echo-Dot-Add-Alexa-To-Any-Room/dp/B01DFKC2SO">Amazon Echo Dot</a> and a <a href="https://store.google.com/us/product/chromecast_audio?hl=en-US">Google Chromecast Audio</a>, two devices that can both stream music to speakers. While the Echo Dot includes voice control features and does much more than just play music, both devices can stream Spotify, which is basically all I use them for. So which sounds better?</p>
<blockquote>
<p><em>Disclaimer: As of time of posting, I am employed as a software engineer at Spotify, but this post does not reflect the views, opinions or position of my employer.</em></p>
</blockquote>
<p>A number of <a href="http://www.avsforum.com/forum/173-2-channel-audio/2659049-amazon-echo-dot-vs-chromecast-audio-streaming.html">forum posts</a> around the web feature audiophiles claiming that one device clearly sounds better, even after <a href="https://support.google.com/chromecast/answer/6290498?hl=en">enabling “Full Dynamic Range”</a> (really just turning off a built-in compressor) on the Chromecast. As I had already biased myself by reading these posts, I decided to perform an objective test.</p>
<p>To test this, I connected the 3.5mm audio outputs from each device to a <a href="https://www.presonus.com/products/audiobox-usb">USB audio interface</a> and streamed <a href="https://open.spotify.com/track/2L2ifQOG4wIdJDZh7ZgAqD">the same song</a> via each device’s Spotify integration.</p>
<iframe src="https://open.spotify.com/embed/track/2L2ifQOG4wIdJDZh7ZgAqD" width="300" height="100"></iframe>
<p>I took the resulting audio files and ran them through <a href="https://www.izotope.com/en/products/repair-and-edit/rx/rx-advanced.html">a spectrogram, followed by a spectral analyzer</a> to get an estimate of the real-world frequency response of each device.</p>
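<p>(If you’d like to reproduce this kind of estimate without iZotope’s tools, a rough equivalent can be computed in Python with <code class="prettyprint">scipy</code>. This is a sketch, with hypothetical file names:)</p>
<pre><code class="prettyprint lang-python"># A rough stand-in for a spectral analyzer: Welch's method averages
# the spectrum over time. The .wav file names are hypothetical.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def average_spectrum(path):
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # mix stereo down to mono
    freqs, power = welch(audio, fs=rate, nperseg=8192)
    return freqs, 10 * np.log10(power + 1e-12)  # convert power to dB

chromecast_freqs, chromecast_db = average_spectrum("chromecast_audio.wav")
echo_dot_freqs, echo_dot_db = average_spectrum("echo_dot.wav")
</code></pre>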
<h2 id="the-spectrograms_2">The Spectrograms <a class="head_anchor" href="#the-spectrograms_2">#</a>
</h2><h3 id="chromecast-audio_3">Chromecast Audio <a class="head_anchor" href="#chromecast-audio_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/rhgnvzwshkcg.png"><img src="https://svbtleusercontent.com/rhgnvzwshkcg_small.png" alt="chromecast_audio_starchy.png"></a></p>
<h3 id="echo-dot_3">Echo Dot <a class="head_anchor" href="#echo-dot_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/wfdpyrqck5cyrg.png"><img src="https://svbtleusercontent.com/wfdpyrqck5cyrg_small.png" alt="echo_dot_starchy.png"></a></p>
<h2 id="the-frequency-responses_2">The Frequency Responses <a class="head_anchor" href="#the-frequency-responses_2">#</a>
</h2><h3 id="chromecast-audio_3">Chromecast Audio <a class="head_anchor" href="#chromecast-audio_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/zbfl4lhersbeza.png"><img src="https://svbtleusercontent.com/zbfl4lhersbeza_small.png" alt="chromecast_audio_starchy_spectrum.png"></a></p>
<h3 id="echo-dot_3">Echo Dot <a class="head_anchor" href="#echo-dot_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/uv5v0eglaxsq.png"><img src="https://svbtleusercontent.com/uv5v0eglaxsq_small.png" alt="echo_dot_starchy_spectrum.png"></a></p>
<h2 id="the-conclusions_2">The Conclusions <a class="head_anchor" href="#the-conclusions_2">#</a>
</h2>
<p>Both devices performed very similarly in this simple test, but the Echo Dot seems to have a visible roll-off at around 16.5kHz, which is just barely within the audible range for most people. The Echo Dot also seemed to have imperceptibly worse stereo performance, with the left channel being about 0.25dB quieter than the right.</p>
<p>As a result of this test, I’m going to continue to use both the Echo Dot and Chromecast Audio, as that gives me the best of both worlds - convenient Spotify streaming with voice control, as well as casting arbitrary, high quality audio content from Google Cast-enabled devices. (It doesn’t hurt that the Echo Dot was free via <a href="https://developer.amazon.com/alexa-skills-kit/alexa-developer-skill-promotion">Amazon’s June 2017 “Publish a Skill, Get an Echo Dot” promotion</a>.) And to have both devices connected to a pair of powered studio monitors at the same time, I’m going to build <a href="http://www.instructables.com/id/Altoids-Tin-18-Stereo-Mixer/">a passive summing stereo mixer</a>.</p>
<h3 id="further-research_3">Further Research <a class="head_anchor" href="#further-research_3">#</a>
</h3>
<p>If I were to repeat this test, I’d take care to set my audio interface to 96kHz instead, as it’s possible that each device used a different sample rate, and that it’d be possible to see the difference when testing with a higher sample rate. It’s also possible that the quality difference comes from different source material - i.e.: the Chromecast Audio might use the 320kbps Ogg Vorbis stream from Spotify, while the Echo Dot might stream the 160kbps version. (I’d expect to see a more dramatic change in the frequency response in that case, though.)</p>
tag:blog.petersobot.com,2014:Post/debugging-an-empty-spam-email2016-10-12T09:52:51-07:002016-10-12T09:52:51-07:00Debugging an Empty Spam Email<p>Despite the best efforts of modern spam filters, we all still receive spam once in a while. When I see a spam email pop up in my main inbox, I often wonder what magic the spammer has discovered that allowed them to bypass Gmail’s spam filtering. (Often times, this translates into me being much more suspicious of a spam email than usual, as it must be “more advanced” in some way to have landed in my inbox.)</p>
<p>Just this past week, I received one such email. It had no subject, no body, was addressed to no one, but was cc’d to myself and 29 other Peters.</p>
<p><a href="https://svbtleusercontent.com/60baspv8nuvilq.png"><img src="https://svbtleusercontent.com/60baspv8nuvilq_small.png" alt="6C6C2CBC-9F08-4990-AA70-8D6B326C9717.png"></a></p>
<p>(The “…” box provided by Gmail did not expand or collapse any content when clicked.)</p>
<p>A side note on the recipients - it looks like most of the other unlucky email addresses contained the string “peter” in either the local part or the domain part. Interestingly, some of the recipients’ addresses did not contain the string “peter” at all, but visiting their domains revealed that they belonged to people named Peter. I suspect some other metadata was involved in choosing this list.</p>
<p>The return path of the email was a free account at a Russian webmail provider, bk.ru. It’s hard to tell if the spammer owns this email address, or compromised its credentials and is using it to send out spam, but I’m guessing the latter is true.</p>
<p>This email confused me for a few reasons. Why would a spammer waste time sending out an empty email? What’s the point of a spam email that has no content? To dig deeper into what’s in this email (and it’s not empty, that’s for sure) we’re going to have to look at the raw email body itself. Gmail provides access to the raw message body with the “Show original” option in its drop-down menu:</p>
<p><a href="https://svbtleusercontent.com/fpvskmy3qvh3a.png"><img src="https://svbtleusercontent.com/fpvskmy3qvh3a_small.png" alt="F4B7E714-CA29-4999-9140-A86250C3F52D.png"></a></p>
<p>Clicking on “Show original” will show a summary of the original message, as well as the original message body itself:</p>
<p><a href="https://svbtleusercontent.com/anqboi04n2hnpg.png"><img src="https://svbtleusercontent.com/anqboi04n2hnpg_small.png" alt="958ED991-791D-4890-8CA6-30872B484769.png"></a></p>
<p>If you’re not familiar with raw email message bodies, they’re not unlike HTTP requests. They start with headers, one header per line (with header names separated from values by colons). The end of the headers is indicated by a double newline (“\n\n”, or “\r\n\r\n”, depending on the line-ending convention in use). These headers contain everything from the sender’s email address to the recipients, to the servers in between that received and forwarded messages. Of particular importance, though, is the Content-Type header:</p>
<pre><code class="prettyprint">Content-Type: multipart/alternative; boundary="--ALT--FP504ntv5azlR7xUQktA3MxnXkgct5eW1475692425"
</code></pre>
<p>As in HTTP, this header denotes the MIME type of the content. This email, like most nowadays, is a <code class="prettyprint">multipart</code> email (as defined by <a href="https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html">RFC 1341</a>), which means it can contain multiple distinct parts. The <code class="prettyprint">multipart/alternative</code> type is a particular kind of multipart message that specifies its parts are semantically equivalent, but presented in different formats. This is how most HTML emails work, to preserve backwards compatibility with email clients that can’t (or are configured not to) display HTML emails. From <a href="http://stackoverflow.com/questions/3902455/smtp-multipart-alternative-vs-multipart-mixed#comment49349618_3984262">StackOverflow</a>:</p>
<blockquote>
<p>The last entry is the best/highest priority part, so you probably want to put the <code class="prettyprint">text/html</code> part as the last subpart. Per <a href="https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html">RFC 1341</a>.</p>
</blockquote>
<p>By specifying both text and HTML parts, older email clients can display the text part that they know how to render, while newer clients can display the HTML.</p>
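<p>(Python’s standard <code class="prettyprint">email</code> module understands this structure, which makes poking at a message like this one easy. A quick sketch - the file name is hypothetical:)</p>
<pre><code class="prettyprint lang-python">import email

# Parse a raw message, e.g. one saved from Gmail's "Show original" view...
with open("original.eml") as f:
    message = email.message_from_file(f)

# ...and walk its MIME parts. get_payload(decode=True) undoes the
# base64 Content-Transfer-Encoding for us automatically.
for part in message.walk():
    if not part.is_multipart():
        print(part.get_content_type(), part.get_payload(decode=True)[:80])
</code></pre>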
<p>So is that it? Does this mysterious empty email contain multiple parts that should be semantically equivalent (i.e.: contain the same message) but aren’t? Well, kind of. The first part of the email looks like this:</p>
<pre><code class="prettyprint">----ALT--FP504ntv5azlR7xUQktA3MxnXkgct5eW1475692425
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: base64
CgoKLS0gCldpbHNvbiBEYXZpZA==
</code></pre>
<p>Note that this is a plain-text content section, but with a content-transfer-encoding of <code class="prettyprint">base64</code>. We can decode the <code class="prettyprint">base64</code> string with Python:</p>
<pre><code class="prettyprint">In [49]: base64.decodestring('CgoKLS0gCldpbHNvbiBEYXZpZA==')
Out[49]: '\n\n\n-- \nWilson David'
</code></pre>
<p>And as it turns out, the plain text part of the email contains only the email signature. This is roughly what we’re seeing in Gmail, so one hypothesis would be that Gmail is skipping the HTML part and only displaying the text/plain part. But what about the HTML part? What’s in there?</p>
<pre><code class="prettyprint">----ALT--FP504ntv5azlR7xUQktA3MxnXkgct5eW1475692425
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
CjxIVE1MPjxCT0RZPjxicj48YnI+PGltZyBzcmM9ImRhdGE6aW1hZ2UvcG5nO2Jhc2U2NCxpVkJP...
...50,000 more bytes...
</code></pre>
<p>Hmm. So the email body actually contains 50kb of data, but Gmail’s only displaying a handful of bytes. Let’s run that <code class="prettyprint">base64</code>-encoded string through our Python string decoder again:</p>
<pre><code class="prettyprint">In [56]: base64.decodestring(a)
Out[56]: '\n<HTML><BODY><br><br><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAgAAAAGAC.../4DgZXPSWUmsAAAAASUVORK5CYII="><br>-- <br>Wilson David</BODY></HTML>\n'
</code></pre>
<p>Aha! So it’s HTML, and not very much HTML at that. In fact, there’s another base64-encoded string within the message, used to encode a data-URI for an embedded image. If we look at what Gmail renders in its DOM, we can actually see that it’s rendering the HTML part, but stripping out the <code class="prettyprint">src</code> attribute from the image:</p>
<p><a href="https://svbtleusercontent.com/dunnefjgzv8cka.png"><img src="https://svbtleusercontent.com/dunnefjgzv8cka_small.png" alt="D1740ADC-5B90-4FE9-B61F-6A09CF24B650.png"></a></p>
<p>So, what’s this image? For the third time, let’s use Python to decode it:</p>
<pre><code class="prettyprint">In [38]: png = base64.decodestring(b[53:-40])
In [39]: png
Out[39]: '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR
</code></pre>
<p>Well, it looks like a PNG. Many PNG decoders have been exploitable<sup>(<a href="https://securelist.com/blog/virus-watch/74297/png-embedded-malicious-payload-hidden-in-a-png-file/">1</a>, <a href="https://www.exploit-db.com/exploits/39767/">2</a>)</sup>, but I figure a fully-updated and patched Chrome should be impervious to any PNG exploits. After writing the decoded string to a file, I opened it in Chrome to find:</p>
<p><a href="https://svbtleusercontent.com/jgfpcuniixsq.png"><img src="https://svbtleusercontent.com/jgfpcuniixsq_small.png" alt="0DED9413-41FC-4AAC-A762-741FD5D00C28.png"></a></p>
<p>Success! As expected, a Togolese lawyer is offering me $9,580,000. So it looks like the email does contain some spam content - in particular, a PNG of some text. As most spam filters don’t bother doing deep inspection of image attachments (save for scanning for viruses), the text rendered in this particular image made its way through Gmail’s spam filter. However, Gmail’s failure to render the <code class="prettyprint">data-uri</code> image resulted in an empty email, unexpectedly removing the spam in a different way.</p>
tag:blog.petersobot.com,2014:Post/scaling-deepdream2015-07-25T11:22:50-07:002015-07-25T11:22:50-07:00A DeepDream Web Service for $5 a Month<p>Google’s <a href="http://googleresearch.blogspot.ca/2015/06/inceptionism-going-deeper-into-neural.html">DeepDream neural net image processing library</a> is a stunning application of advanced technology. If you haven’t heard of it, DeepDream uses an <a href="https://en.wikipedia.org/wiki/Computer_vision#Recognition">image recognition system</a> in reverse - instead of trying to identify which objects are in a photo, it accentuates what it sees, producing extremely trippy visuals:</p>
<p><a href="https://svbtleusercontent.com/evz3libaghivq.jpg"><img src="https://svbtleusercontent.com/evz3libaghivq_small.jpg" alt="San Francisco's Bay Bridge, through DeepDream"></a></p>
<p>While DeepDream is cool, it’s also notoriously difficult to set up, as it was built by researchers with exceedingly complex software tools. Shortly after its launch, <a href="http://mattogle.com">Matthew Ogle</a> and I decided to put together <a href="http://deepdre.am">a web interface - http://deepdre.am</a> to make the process simpler.</p>
<p><a href="http://deepdre.am"><img src="https://svbtleusercontent.com/vxhf5neec6na_small.png" alt="Screen Shot 2015-07-25 at 2.43.11 PM.png"></a></p>
<p>The site itself is pretty trivial - one page, with three options and one upload button. The fun part wasn’t the visual design or the user experience, but rather the scalable backend services that adapt the system to varying amounts of load without costing much more than a fancy coffee each month.</p>
<h1 id="really-tiny-microservices_1">Really Tiny Microservices <a class="head_anchor" href="#really-tiny-microservices_1">#</a>
</h1>
<p>“Microservice” is a buzzword that’s used exceedingly often on today’s Hacker News. The concept is simple - break the disparate functions of your application apart into small services that can be updated, scaled, and managed independently. In theory, this reduces the “blast radius” of a single system failure (be it a hardware failure, network instability, logical error, or any other problem) to at most a single service. In practice, microservices often become tightly intertwined with one another, preventing this goal from being achieved.</p>
<p>When building <a href="http://deepdre.am">deepdre.am</a>, I decided to naïvely try out this concept and split out each logical component of the application into its own service. This resulted in <strong>7 services</strong>, 5 of which are front-facing web services:</p>
<ul>
<li>
<code class="prettyprint">upload</code>, which accepts image uploads, validates their format and adds them to the queue of images to be processed</li>
<li>
<code class="prettyprint">progress</code>, which provides progress updates (via long polling)</li>
<li>
<code class="prettyprint">email</code>, which allows users to be notified when their image is ready</li>
<li>
<code class="prettyprint">abuse</code>, which allows users to report images that violate the TOS</li>
<li>
<code class="prettyprint">monitor</code>, which provides an administrative status dashboard</li>
<li>
<code class="prettyprint">process</code>, which:
<ul>
<li>removes images and metadata from a queue</li>
<li>runs the DeepDream algorithm on each image</li>
<li>emails the uploader to notify them that their image is done</li>
</ul>
</li>
<li>
<code class="prettyprint">scale</code>, which observes the queue and spins up cheap <a href="http://aws.amazon.com/ec2/purchasing-options/spot-instances/">Amazon EC2 spot instances</a> as necessary to process images</li>
</ul>
<p>All of these microservices are implemented in <a href="http://golang.org/">Go</a>, Google’s light, simple, and highly concurrent programming language. Using Go ensures some modicum of type safety, allowing me to catch trivial errors at compile time. Go is also trivial to deploy (binaries are generally statically linked and dependency-free) and extremely lightweight.</p>
<p>Each of the services listed above consumes around <strong>4MB</strong> of memory when serving HTTP requests - less than <em>half</em> the memory used by Ruby <em>just to load the interpreter</em>, not to mention loading Rails or Sinatra. When running on a bare-bones web server to keep costs down, this tiny memory footprint makes an extremely noticeable difference in website performance. On average, both static assets and API requests are served within 30 milliseconds - unheard of when using a large framework like Rails.</p>
<p>While microservices are generally supposed to be fairly isolated, each with their own data stores and infrastructure, this project is small enough that I opted to share data stores. In this case, Redis is used for ephemeral data (queueing tasks to be processed<sup id="fnref1"><a href="#fn1">1</a></sup>, progress updates, and notifications between processes) while MySQL<sup id="fnref2"><a href="#fn2">2</a></sup> is used for more permanent data. This approach allowed me to keep many of the advantages of microservices - including independent scalability, quick development, and tiny codebases - without using too many resources by spinning up multiple databases.</p>
<h1 id="hey-amazon-can-you-spot-me-5_1">Hey Amazon, can you spot me $5? <a class="head_anchor" href="#hey-amazon-can-you-spot-me-5_1">#</a>
</h1>
<p>As with all of my side projects, my primary goal when building <a href="http://deepdre.am">deepdre.am</a> was not just to create a service, but to do so at absolutely minimal cost. My target is to spend under $10 each month on everything - instance hosting, amortized domain costs, S3 usage, and bandwidth. (I’ve found <a href="https://www.digitalocean.com/?refcode=8df55bdeed1c">DigitalOcean</a><sup id="fnref3"><a href="#fn3">3</a></sup> to be powerful enough to support tens of thousands of monthly active users for $5/month, but there are countless hosts at similar price points.)</p>
<p>Hosting Redis, MySQL, and a handful of Go-based microservices on a $5 cloud host is trivial and speedy enough to support thousands of hits per minute. DeepDream, however, is a computationally taxing algorithm that requires a lot of processing power, and - ideally - a GPU to execute on.</p>
<p>This presented me with a hard problem. How do you provide quick response times without paying $468/month for a <code class="prettyprint">g2.2xlarge</code> instance on Amazon EC2?<br>
<a href="https://svbtleusercontent.com/kbkbf0kjwxjva.png"><img src="https://svbtleusercontent.com/kbkbf0kjwxjva_small.png" alt="g2.2xlarge.png"></a></p>
<p>The answer turned out to be simple: use a combination of a task queue and <a href="http://aws.amazon.com/ec2/purchasing-options/spot-instances/">EC2 spot instances</a>. When load on the system is low, the $5 VPS can slowly process images, saving money. When load on the system grows, however, spot instances are used to speed things up (a rough sketch of this logic in code follows the list below):</p>
<ul>
<li>A <code class="prettyprint">g2.2xlarge</code> spot instance can process approximately one image per second, and costs in the ballpark of $0.10/hour. However, as spot instances can be terminated at any time, applications must be termination-aware. (Amazon has a new “Spot Instance Termination Notice” feature that can come in handy here, but processes can also simply respond quickly to <code class="prettyprint">SIGTERM</code> signals to clean up before an instance is terminated<sup id="fnref4"><a href="#fn4">4</a></sup>.)</li>
<li>As EC2 instances are billed by the hour, rounded up, committing to spawning an instance will cost at least $0.10<sup id="fnref5"><a href="#fn5">5</a></sup> and can process at most 3,600 images per hour.</li>
<li>To maximize value, an instance should be spawned when the number of images waiting in the queue to be processed approaches 3600<sup id="fnref6"><a href="#fn6">6</a></sup>.</li>
<li>If an instance is spawned but is no longer necessary (due to the queue being emptied quickly) then it should remain running until its age reaches 59 minutes, as Amazon bills for instances by the hour and rounds up.<sup id="fnref5"><a href="#fn5">5</a></sup>
</li>
</ul>
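<p>Put together, the scaling rule is simple enough to sketch in a few lines. (The real <code class="prettyprint">scale</code> service is written in Go; this Python sketch uses invented names and simply restates the rules above.)</p>
<pre><code class="prettyprint lang-python"># Illustrative only - the real "scale" service is written in Go.
IMAGES_PER_INSTANCE_HOUR = 3600  # one image per second per g2.2xlarge
MAX_INSTANCE_AGE_MINUTES = 59    # instances are billed hourly, rounded up

def should_spawn_instance(queue_length, running_instances):
    # Spawn a spot instance once there's roughly an hour's worth of work
    # queued beyond what the running instances can already absorb.
    backlog = queue_length - running_instances * IMAGES_PER_INSTANCE_HOUR
    return backlog >= IMAGES_PER_INSTANCE_HOUR

def should_terminate_instance(queue_length, age_minutes):
    # Even if the queue empties, keep an instance until minute 59: the
    # hour is already paid for, and Amazon might terminate it first,
    # which would make the entire hour free.
    return queue_length == 0 and age_minutes >= MAX_INSTANCE_AGE_MINUTES
</code></pre>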
<p>To spawn spot instances, I used <a href="https://github.com/mitchellh/goamz">Mitchell Hashimoto’s <code class="prettyprint">goamz</code> package</a>, which is a thin Go wrapper around Amazon’s AWS APIs. A combination of a custom private <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html">AMI</a> (that includes all of the required software) and <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html">cloud-init user data</a> (to force a code update from source control) allows an instance to boot, connect to its data stores, and begin processing tasks in less than 120 seconds.</p>
<p>Practically, this combination of queueing and spot instances keeps expenses for <a href="http://deepdre.am">deepdre.am</a> extremely low - somewhere between $5 and $10 per month, depending on system load. (Amazon’s <a href="http://aws.amazon.com/about-aws/whats-new/2012/05/10/announcing-aws-billing-alerts/">Billing Alerts</a> also allow me to keep an eye on my usage, to avoid unexpected spending - and to respond to spikes in traffic as necessary.)</p>
<h1 id="don39t-forget-about-taxes_1">Don’t forget about taxes <a class="head_anchor" href="#don39t-forget-about-taxes_1">#</a>
</h1>
<p>The title of this post is <em>mostly</em> true - when load on the system is low, costs are around $5 each month. However, small additional costs do add up:</p>
<ul>
<li>The domain, <a href="http://deepdre.am">deepdre.am</a>, costs $73/year, or $6/month. (Special thanks to <a href="http://mattogle.com">Matthew Ogle</a> for buying an expensive <a href="https://en.wikipedia.org/wiki/.am">Armenian domain name</a> on a whim <a href="https://twitter.com/flaneur/status/617433314962579456">in response to a tweet</a>.)</li>
<li>I lazily used Amazon S3 to store and serve images, which <a href="http://www.wolframalpha.com/input/?i=%28US%240.03%2F%28GB+*+month%29%29+*+%284MB+*+48hours%29+%2B+%282*%284MB+*+US%240.09%2FGB%29%29">costs $0.0007 per image</a> in both storage and transfer fees. (That’s approximately 1,428 images per dollar - a cost that can go away once the S3 dependency is removed altogether.)</li>
<li>Assuming that the site waxes and wanes in popularity in a given month, it’s reasonable to expect about 20 hours of <code class="prettyprint">g2.2xlarge</code> spot instance usage, which costs approximately $2.</li>
</ul>
<h1 id="give-it-a-try_1">Give it a try! <a class="head_anchor" href="#give-it-a-try_1">#</a>
</h1>
<p>While the code’s not (yet) open source, the site is currently up and running - give it a try at <a href="http://deepdre.am">deepdre.am</a> and transform your images, if for no other reason than to stress test the system!</p>
<hr>
<p>Special thanks to <a href="http://malcolmocean.com/">Malcolm Ocean</a> for reviewing this post.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn1">
<p>As one does, I built <a href="https://github.com/psobot/pressure">my own Redis-backed queueing library with Go bindings</a> that came in very handy here. Many better alternatives exist - I would recommend using something more well supported like Github’s <a href="https://github.com/resque/resque">Resque</a> or Salvatore Sanfilippo’s <a href="https://github.com/antirez/disque">disque</a>. <a href="#fnref1">↩</a></p>
</li>
<li id="fn2">
<p>Yup, MySQL. I had a Puppet manifest laying around for a well-configured MySQL instance, and saved a grand total of 30 minutes by bolting together existing components rather than switching to Postgres. Such is the nature of quick hack projects. <a href="#fnref2">↩</a></p>
</li>
<li id="fn3">
<p>Yes, this link does contain a <a href="https://www.digitalocean.com/?refcode=8df55bdeed1c">DigitalOcean</a> referral code. You caught me. <a href="#fnref3">↩</a></p>
</li>
<li id="fn4">
<p>Terminating an instance seems to result in the normal ACPI shutdown process, which sends <code class="prettyprint">SIGTERM</code> to all processes, allowing processes to finish their tasks or put them back into queues if necessary. This is a terrible practice, as instances could suffer non-graceful failures at any time, and should not be relied upon to put their tasks back into queues - but for an application as frivolous and simple as <a href="http://deepdre.am">deepdre.am</a>, the 2-second delay between receiving <code class="prettyprint">SIGTERM</code> and losing power to an instance seems to allow for enough cleanup. <a href="#fnref4">↩</a></p>
</li>
<li id="fn5">
<p>Instances that are terminated by Amazon before their first hour has elapsed are free, so it’s also possible that an instance could cost $0.00. This means that it’s advantageous to wait until the 59 minute mark before terminating any spot instance, to increase the likelihood that Amazon will terminate the instance for you, making the entire hour free. <a href="#fnref5">↩</a></p>
</li>
<li id="fn6">
<p>This is a knob that can be tweaked - the two extremes are “spend lots of money and have images process very quickly” and “spend very little money and use a spot instance only when the queue becomes huge.” <a href="#fnref6">↩</a></p>
</li>
</ol>
</div>
tag:blog.petersobot.com,2014:Post/the-cost-of-waterloo-software-engineering2014-09-08T20:00:25-07:002014-09-08T20:00:25-07:00The Cost of Waterloo Software Engineering<p>This past June, I graduated from the University of Waterloo’s Software Engineering program. After 5 long and difficult years, I’m extremely proud to say that I’m a Waterloo grad, and very proud of my accomplishments and experiences at the school. Somewhat surprisingly, myself and most of my classmates were able to graduate from a top-tier engineering school with zero debt. (I know this might sound like a sales pitch - stick with me here.)</p>
<p>Waterloo is home to the world’s largest cooperative education program — meaning that every engineering student is required to take at least 5 internships over the course of their degree. Most take six. This lengthens the duration of the program to five years, and forces us into odd schedules where we alternate between four months of work and four months of school. We get no summer breaks.</p>
<p>One of the most important parts of Waterloo’s co-op program is that the school requires each placement be <em>paid</em>. Without meeting certain minimum requirements for compensation, a student can’t claim academic credit for their internship, and without five internships, they can’t graduate. This results in Waterloo co-op students being able to pay their tuition in full (hopefully) each semester. In disciplines like Software Engineering, where demand is at an all-time high and many students are skilled enough to hold their own at Silicon Valley tech giants, many students end up negotiating for higher salaries at their <em>internships</em>.</p>
<p>To help visualize this financial situation and aid younger Software Engineering students in planning their future, I decided to create a little tool: the <a href="https://github.com/psobot/secalculator">SE Calculator</a>.</p>
<p><img src="https://svbtleusercontent.com/mq5ugnqr5fzjza.png" alt="![secalculator.png](https://svbtleusercontent.com/mq5ugnqr5fzjza_small.png)"></p>
<p>This simple, free, <a href="https://github.com/psobot/secalculator">open-source</a> in-browser tool allows you to calculate and visualize how much money you’ll earn or owe at the end of a five-year Waterloo Software Engineering degree. While it’s not rigorous (and <strong>should not be used as a financial advisor</strong>) it has helped me visualize how much money I’ve earned and spent during my academic career.</p>
<p>By default, the site assumes you’re a student that pays average Software Engineering tuition and average Software Engineering fees, earns one scholarship in your first year, and spends each internship working at software companies in Waterloo. The calculator includes a bunch of preset values, taken from personal experience and that of classmates, to simulate what you might make and spend when working in different regions or industries. (For example, the San Francisco Bay Area preset has a ridiculously high housing cost, but a similarly high salary.)</p>
<p>The site also stores your data in the URL string, because – well, simply – I wanted to store the data somewhere quick and easy. Bookmark the page once you’ve plugged in some values and store multiple datasets in your bookmarks bar.</p>
<p>If you’re a Software Engineering student (or will soon be one), I hope you find the tool useful to you. If you’re a student in some other Waterloo Engineering discipline, or in Computer Science, hopefully most of the fields still apply to you and you might get some utility out of the tool as well. </p>
<p>If you’re interested in customizing the tool - to add new presets, to adapt it to your own academic situation, or just to fix bugs - please feel free to <a href="https://github.com/psobot/secalculator">fork it on GitHub</a>. The tool runs almost entirely in-browser with <a href="http://angularjs.org">Angular.js</a> and uses <a href="http://gulpjs.com">Gulp</a> as a build tool. Happy hacking!</p>
tag:blog.petersobot.com,2014:Post/the-holiday-party-hack2013-12-14T10:45:14-08:002013-12-14T10:45:14-08:00The Holiday Party Hack<p>For this year’s holiday party at <a href="http://twg.ca">The Working Group</a>, I helped build something special to spice up the party - a live, music-synced slideshow of the evening, powered by a nearby photo booth. Take a photo with your friends and loved ones, then see it show up on the big screen seconds later.</p>
<p><a href="https://svbtleusercontent.com/bvipn7c3djtktg.jpg"><img src="https://svbtleusercontent.com/bvipn7c3djtktg_small.jpg" alt="photobooth.jpg"></a></p>
<h2 id="the-hardware_2">The Hardware <a class="head_anchor" href="#the-hardware_2">#</a>
</h2>
<p>To take the photos, we mounted a <a href="http://www.amazon.com/Canon-T2i-Processor-3-0-inch-18-55mm/dp/B0035FZJHQ">Canon Rebel T2i</a> with an <a href="http://www.eye.fi">Eye-Fi card</a> on a tripod in front of a great backdrop. A generous serving of props was provided for people to play with, and the room was well lit.</p>
<p>Also significant - the photo booth had a glass wall on one side, making it easy for partygoers to notice the fun to be had inside, while still allowing for a little bit of separation from the cacophony outside.</p>
<p><a href="https://svbtleusercontent.com/b2is9svrpny0g.jpg"><img src="https://svbtleusercontent.com/b2is9svrpny0g_small.jpg" alt="outside.jpg"></a></p>
<p>Finally, to allow partygoers to trigger their photos themselves without needing someone behind the camera, <a href="http://twitter.com/bgilham">Brian Gilham</a> and I built a huge, industrial-looking remote with a massive green button. In reality, we just wrapped the camera’s tiny remote in a larger enclosure and physically lined up the remote’s button with the plunger of a larger button.</p>
<p><a href="https://svbtleusercontent.com/r6tn2fqjxhwehq.jpg"><img src="https://svbtleusercontent.com/r6tn2fqjxhwehq_small.jpg" alt="button.jpg"></a></p>
<p>The Eye-Fi card in the camera synced its photos automatically to a nearby MacBook Pro.</p>
<h2 id="the-software_2">The Software <a class="head_anchor" href="#the-software_2">#</a>
</h2>
<p>To get the photos onto the screen, they passed through a ridiculous number of steps. <a href="http://www.noodlesoft.com/hazel.php">Hazel</a>, running on the MacBook Pro, copied the photos from the Eye-Fi card’s folder into a dedicated folder in Dropbox. A Node.js app running on a Rackspace cloud server connected to the Dropbox API and received real-time updates whenever new photos were placed in the Dropbox folder. This app downloaded the high-res photos from Dropbox, used <a href="http://www.imagemagick.org/script/index.php">ImageMagick</a> to crop, scale, and rotate them appropriately, and streamed them down to all connected browsers.</p>
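<p>To give a flavour of the middle of that chain: the resize step boils down to a single ImageMagick invocation per photo. Here’s a rough Python sketch of it - the real app was written in Node.js, and the output path and target resolution here are invented for illustration:</p>
<pre><code class="prettyprint">import subprocess
from pathlib import Path

OUTPUT_DIR = Path("~/party-photos-web").expanduser()

def prepare_for_projector(photo: Path) -> Path:
    """Crop, scale, and auto-rotate one photo with ImageMagick's CLI."""
    out = OUTPUT_DIR / photo.name
    subprocess.run([
        "convert", str(photo),
        "-auto-orient",           # honour the camera's orientation flag
        "-resize", "1280x720^",   # scale to fill the projector's resolution...
        "-gravity", "center",
        "-extent", "1280x720",    # ...then crop to exactly that size
        str(out),
    ], check=True)
    return out
</code></pre>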
<p>A MacBook Pro connected to the projector ran a client-side JavaScript app that received real-time photo updates via <a href="http://socket.io">Socket.io</a>. This app also used the Web Audio API to run <a href="https://github.com/cjcliffe/beatdetektor/tree/master">BeatDetektor</a>, an open-source JavaScript beat detection library, on the audio picked up by the laptop’s microphone. Finally, <a href="http://www.schillmania.com/">Scott Schiller</a>’s 2003-era <a href="http://www.schillmania.com/projects/snowstorm/">snowstorm.js</a> library provided the wonderfully tacky snow falling in-browser.</p>
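<p>BeatDetektor itself is considerably more clever - it tracks competing tempo hypotheses across frequency bands - but the classic energy-spike approach gives the flavour of how beat detection works. A toy sketch in Python (to be clear, this is the textbook idea, not BeatDetektor’s algorithm):</p>
<pre><code class="prettyprint">import numpy as np

def naive_beats(samples: np.ndarray, rate: int = 44100, window: int = 1024):
    """Return rough beat timestamps: windows whose energy spikes above average."""
    energies = [
        float(np.sum(np.square(samples[i:i + window])))
        for i in range(0, len(samples) - window, window)
    ]
    history = rate // window  # roughly one second of recent windows
    beats = []
    for i in range(history, len(energies)):
        local_average = sum(energies[i - history:i]) / history
        if energies[i] > 1.4 * local_average:  # 1.4 is a tuning knob
            beats.append(i * window / rate)    # timestamp, in seconds
    return beats
</code></pre>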
<p>This complicated chain of events actually made the software super simple to build: piecing together pre-made components like Dropbox, Hazel, and BeatDetektor meant most of the work was already done. Some extra functionality even came for free - for example, by sharing the Dropbox folder with select people at the party, candid photos could be uploaded from people’s phones directly to the projector screen.</p>
<h2 id="the-results_2">The Results <a class="head_anchor" href="#the-results_2">#</a>
</h2>
<p>By the end of the night, more than 350 photos - 1.5GB of data - had been processed by the hack and made it to the big screen. At one point, so many photos were taken in quick succession that the server’s load average spiked to 38 and the machine crashed hard - taking down <a href="http://forever.fm">forever.fm</a>, my “infinite” radio station, with it. Despite the small technical hiccups, the hack turned out wonderfully.</p>
<p>–</p>
<p>Huge thanks go out to <a href="https://twitter.com/mud">Chris Mudiappahpillai</a>, <a href="https://twitter.com/bgilham">Brian Gilham</a>, <a href="https://twitter.com/dcwca">Derek Watson</a> and <a href="https://twitter.com/saryev">Shiera Aryev</a> and many more for making the hack - and the evening - a resounding success.</p>
tag:blog.petersobot.com,2014:Post/the-architecture-of-an-infinite-stream-of-music2013-11-05T09:22:13-08:002013-11-05T09:22:13-08:00The Architecture of an Infinite Stream of Music<p>Nearly a year ago, I launched <a href="http://forever.fm">forever.fm</a> - a free online radio station that seamlessly beat matches its songs together into a never-ending stream. At launch, it was hugely popular - with hundreds of thousands of people tuning in. In the months since its initial spike of popularity, I’ve had a chance to revisit the app and rebuild it from the ground up for increased stability and quality.</p>
<p><a href="https://svbtleusercontent.com/9wosn0nn6gksq.png"><img src="https://svbtleusercontent.com/9wosn0nn6gksq_small.png" alt="ffm.png"></a></p>
<p>(<em>Grab the free <a href="https://itunes.apple.com/app/forever.fm/id727405817">iOS</a> and <a href="https://play.google.com/store/apps/details?id=com.appstruments.foreverfm">Android</a> apps to listen to <a href="http://forever.fm">forever.fm</a> on the go.</em>)</p>
<hr>
<p>Initially, Forever.fm was a single-process Python app, written with the same framework I had built for my other popular web app, <a href="https://the.wubmachine.com">The Wub Machine</a>. While this worked as a proof of concept, there were a number of issues with this model.</p>
<ul>
<li>Single monolithic apps are very difficult to <strong>scale</strong>. In my case, Forever.fm’s monolithic Python process had to service web requests and generate the audio to send to its listeners. The latter is what’s known as a “soft real-time” task - one in which any delays or missed deadlines cause noticeable degradation of the user’s experience. As the app’s usage grew, it became difficult to balance the high load generated by different parts of the app in a single process. Sharding was not an option, as Forever is built around a single radio stream - only one of which should exist at a time. Unlike a typical CRUD app, I couldn’t just deploy the same app to multiple servers and point them at the same database.</li>
<li>Single monolithic apps are very difficult to <strong>update</strong>. Any modifications to the code base of Forever required a complete restart of the server. (In my initial iteration and blog post, I detailed a method for reloading Python modules without stopping the app - but ran into so many stability issues with this method that I had to abandon it altogether.) As with any v1 app, Forever had a constant stream of updates and fixes. Restarting the app every time a bug fix had to be made - thereby stopping the stream of music - was ridiculous.</li>
<li>Memory and CPU profiling were both difficult with a one-process app. Although Python ships with a number of profiling tools, none of them are made to be used in a production environment - which is often the only environment in which these problems appear. Being able to track down which part of the app is eating up gigabytes of memory is critical.</li>
</ul>
<p>To solve all of these problems in one go, I decided to re-architect Forever.fm as a <strong>streaming <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">service-oriented architecture</a></strong> with a custom queueing library called <a href="https://github.com/psobot/pressure"><code class="prettyprint">pressure</code></a>. </p>
<p><a href="https://svbtleusercontent.com/ogunduu3u59xg.png"><img src="https://svbtleusercontent.com/ogunduu3u59xg_small.png" alt="foreverq.png"></a></p>
<p>Usually, service oriented architectures are strongly request/response based, with components briefly talking with each other in short bursts. Forever does make use of this paradigm, but its central data structure is an unbounded stream of MP3 packets. As such, a lot of the app’s architecture is structured around <strong>pipelines of data</strong> of different formats. To make these pipelines reliable and fast when working with large amounts of streaming data, I constructed my own <a href="https://github.com/psobot/pressure">Redis-based bounded queue protocol</a> that currently has bindings in Python and C. It also creates really nice <a href="http://d3js.org">d3</a> graphs of the running system:</p>
<p><a href="https://svbtleusercontent.com/xa3lmm7qyv6g.png"><img src="https://svbtleusercontent.com/xa3lmm7qyv6g_small.png" alt="queues.png"></a></p>
<p>Forever.fm is broken down into multiple services that act on these pipelines of data:</p>
<ul>
<li> The <strong>brain</strong> picks tracks from a traditional relational database, orders them by approximately solving the <a href="http://en.wikipedia.org/wiki/Travelling_salesman_problem">Traveling Salesman Problem</a> on a graph of tracks and their similarities, and pushes them into a bounded queue.</li>
<li> The <strong>mixer</strong> reads tracks from this queue in order, analyzes the tracks and calculates the best-sounding overlaps between each track and the next. This is essentially the “listening” step. These calculations also go into a bounded queue.</li>
<li> The <strong>renderer</strong> reads calculations from this queue and actually renders the MP3 files into one stream, performing time stretching and volume compression as required. This step pushes MP3 frames, each roughly 26ms long, into another bounded queue.</li>
<li> The <strong>mp3_server</strong> reads MP3 frames from this queue at a precise rate (38.28125 frames per second, for 44.1kHz audio - see the sketch after this list) and sends them to each listener in turn over HTTP. (It also keeps track of who’s listening, to help produce a detailed report of how many people heard each song.)</li>
</ul>
<p>There are a number of other services that come together to make Forever.fm work, including the excitingly-named <strong>web_server</strong>, <strong>info_server</strong>, <strong>social_server</strong>, <strong>manager</strong>, <strong>tweeter</strong>, <strong>relay</strong> and <strong>playcounter</strong>. Each of these services consists of fewer than 1,000 lines of code, and some of them are written in vastly different languages. At the moment, they all run on the same machine - but that could easily change without downtime and <strong>without dropping the music</strong>. Each service has its own pid and memory space, making it easy to see which task is using up resources.</p>
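<p>That oddly specific 38.28125 isn’t arbitrary: an MPEG-1 Layer III frame at 44.1kHz holds exactly 1,152 samples, so 44100 / 1152 = 38.28125 frames per second, or about 26ms per frame. Here’s a minimal sketch of the pacing loop - not the real <strong>mp3_server</strong>, whose per-listener fan-out is more involved, and with hypothetical <code class="prettyprint">queue</code> and <code class="prettyprint">listeners</code> objects:</p>
<pre><code class="prettyprint">import time

SAMPLE_RATE = 44100
SAMPLES_PER_FRAME = 1152                      # fixed for MPEG-1 Layer III
FRAME_RATE = SAMPLE_RATE / SAMPLES_PER_FRAME  # = 38.28125 frames per second

def serve(queue, listeners):
    """Pop one MP3 frame per tick and fan it out to every listener."""
    start = time.monotonic()
    frames_sent = 0
    while True:
        frame = queue.pop()  # blocks until the renderer produces a frame
        for listener in listeners:
            listener.write(frame)
        frames_sent += 1
        # Sleep until the moment this frame *should* have gone out, so
        # timing errors never accumulate over millions of frames.
        deadline = start + frames_sent / FRAME_RATE
        time.sleep(max(0.0, deadline - time.monotonic()))
</code></pre>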
<p>To help achieve an unbroken stream of music and more easily satisfy the soft real-time requirements of the app, <code class="prettyprint">pressure</code> queues have two very important properties: <strong>bounds</strong> and <strong>buffers</strong>. </p>
<p>Each <code class="prettyprint">pressure</code> queue is <strong>bounded</strong> - meaning that a producer cannot push data into a full queue, and may choose to block or poll when this situation occurs. Forever uses this property to lazily compute data as required, reducing CPU and memory usage significantly. Each data pipeline necessarily has one <strong>sink</strong> - one node that consumes data but does not produce data - which is used to limit the data processing rate. By adjusting the rate of data consumption at this sink node, the rate (and amount of work required) of the entire processing chain can be controlled extremely simply. Furthermore, in Forever, if no users are listening to a radio stream, the sink can stop consuming data from its queue - implicitly stopping all of the backend processing and reducing the CPU load to zero. By blocking on IO, we let the OS schedule all of our work for us - and I trust the OS’s scheduler to do a much better job than Python’s.</p>
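<p><code class="prettyprint">pressure</code>’s actual protocol involves more Redis bookkeeping than this, but the core blocking-on-full behaviour can be sketched in a few lines of Python on top of <code class="prettyprint">redis-py</code>. To be clear, this is an illustration of the idea, not <code class="prettyprint">pressure</code>’s real API:</p>
<pre><code class="prettyprint">import time
import redis

r = redis.Redis()

def bounded_push(key: str, item: bytes, bound: int, poll: float = 0.1):
    """Push onto a Redis list, blocking while the queue is full.

    This is how backpressure propagates upstream: if a consumer
    stalls, every producer above it eventually blocks right here.
    """
    while r.llen(key) >= bound:  # (a real implementation would make
        time.sleep(poll)         #  this check-then-push atomic)
    r.rpush(key, item)

def blocking_pop(key: str) -> bytes:
    """Block until an item is available, then pop it in FIFO order."""
    _, item = r.blpop(key)
    return item
</code></pre>
<p>Because the list itself lives in Redis rather than in any one process, it doubles as reliable out-of-process storage - which is exactly the <strong>buffer</strong> property described next.</p>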
<p>In addition, each queue has a <strong>buffer</strong> of a set size that is kept in reliable out-of-process storage - Redis, in this case. If a process were to crash for any reason, the buffer in the queueing system would allow the next process to continue processing data for some amount of time before exhausting the queue. With current parameters, nearly all of the services in Forever could fail for up to 5 minutes without causing an audio interruption. These buffers allow each component to be independently stopped, started, upgraded or debugged <strong>in production</strong> without interrupting service. (This does lead to some high-pressure bug hunting sessions where I’ll set a timer before launching GDB.)</p>
<p>Most of the services involved in this pipeline are backend processors of data - not front-facing web servers. However, I’ve applied the same service-oriented philosophy to the frontend of the site, creating separate servers for each general type of data served by the app. In front of all of these web servers sits nginx, used as a fast, flexible proxy server that can also serve static files. HAProxy was considered but not adopted, as nginx has all of the features needed - including <a href="http://serverfault.com/questions/108261/how-to-make-modification-take-affect-without-restart-nginx">live configuration reloads</a>.</p>
<p>With this combination of multiple specialized processes and a reliable queueing system, Forever has enjoyed very high availability since the new architecture was deployed. I’ve personally found it indispensable to be able to iterate quickly on a live audio stream - often in production. The ability to make impactful changes to a real-time system in minutes is incredible - and although somewhat reckless at times, it can be an amazing productivity boon for a tiny startup.</p>
<hr>
<p>Partially thanks to this new architecture, I’ve also built free <a href="https://itunes.apple.com/app/forever.fm/id727405817">iOS</a> and <a href="https://play.google.com/store/apps/details?id=com.appstruments.foreverfm">Android</a> clients for <a href="http://forever.fm">forever.fm</a>. Download them and listen to infinite radio on the go!</p>
tag:blog.petersobot.com,2014:Post/co-working-at-the-working-group2013-10-25T06:29:55-07:002013-10-25T06:29:55-07:00Co-Working at The Working Group<p>Early in my academic career at the University of Waterloo, I was fortunate enough to land a co-op placement at The Working Group. Back then, the team was just over a dozen people. We were taking on our first mobile projects, and were starting to outgrow our old office at the Burroughes building – where we still had musical jam sessions with the partners every couple weeks. I learned more and had more fun in that four-month placement than I thought possible.</p>
<p><a href="https://svbtleusercontent.com/jgutijkimzwrya.jpg"><img src="https://svbtleusercontent.com/jgutijkimzwrya_small.jpg" alt="IMG_5800.jpg"></a></p>
<p>That was two years ago. In February 2013, I founded <a href="http://appstruments.com">a software company</a> that creates music apps that anybody can use. So far, our portfolio of products includes <a href="https://the.wubmachine.com">The Wub Machine</a>, an automatic music remixing app, and <a href="http://forever.fm">Forever.fm</a>, an app that creates an infinite DJ mix of the hottest songs on SoundCloud. These two apps have proven popular, and have already reached more than 1,000,000 people across the world. However, their development had also plateaued – the “next steps” in each project required too much time and effort for me to complete in my spare time. Luckily, as a Waterloo co-op student, my classes are interrupted regularly by mandatory four-month work terms. For my sixth and final internship slot, I decided to forgo the tempting internship offers from San Francisco startups – and to instead spend four months bootstrapping my own products.</p>
<p><a href="https://svbtleusercontent.com/eh2exn4uz6qmw.jpg"><img src="https://svbtleusercontent.com/eh2exn4uz6qmw_small.jpg" alt="IMG_6077.jpg"></a></p>
<p>When I set out on this plan, I was first greeted by incredulity from my classmates, who were returning to cushy internships in the Bay Area. One of the first people to offer encouragement was Andrés Aquino, partner at The Working Group. After I dropped back into the office to give <a href="https://speakerdeck.com/psobot/ops-for-devs">a tech talk</a> in early May, Andrés was quick to extend an invitation to return if I needed an environment to work in. For me, working full time to bootstrap my company, this simple invitation solved many problems. Without TWG, who would I bounce ideas off of? Who would I show my work to, to ensure I was building the right products? Most importantly, who would point out to me when I was making mistakes? Incubators like Y Combinator or Waterloo’s own <a href="http://velocity.uwaterloo.ca/">VeloCity Garage</a> usually provide people who can fill that mentorship role – but I wasn’t yet at a stage to be accepted by either.</p>
<p>So far, only one month into my endeavour, things have been going extremely well. Having a desk to come in to and co-workers to talk with has been surprisingly motivating. The office has a very open culture that’s made me feel like part of the team again, despite only sharing a desk and hanging out in the team’s HipChat room. Each week, I’m held accountable by participating in morning standup meetings. (While I should hope that I don’t need external motivation to accomplish my goals, being present at the office has made it impossible for me to procrastinate.) I also make a point to demo two things every Friday: both the product I’ve worked on and the technology behind it. If I don’t learn something new each day, I’m not satisfied with my progress – and if I don’t pass on what I learn to the team, then I’m not doing my part. This spirit of “learning and teaching” also helps me solidify what I’ve learned and distill it into meaningful information that’s useful to others.</p>
<p><a href="https://svbtleusercontent.com/aizve901almza.jpg"><img src="https://svbtleusercontent.com/aizve901almza_small.jpg" alt="IMG_6052.jpg"></a></p>
<p>In the three months I’ve got left at TWG, I have a long list of things to accomplish. If productivity stays as high as it has been in the past month, I’ll have plenty to show for it by the time I’m done. My goal is to make sure that the TWG team learns just as much as I do.</p>