Patching an Embedded OS from 1996 with Ghidra
For reasons I won’t get into, I’ve been working on a tricky reverse engineering puzzle recently: how to patch the operating system of a 26-year-old synthesizer. To be specific, the Kurzweil K2500, a sample-based synthesizer released in 1996.
As with many digital musical instruments, this synthesizer is really just a computer with some extra chips. In this case, it’s a computer based around the CPU that was popular at the time: the Motorola 68000, which was also famously used in the original Macintosh and the Sega Genesis. I want to patch the operating system of this beast to do all sorts of other things, most of which which I’ll leave to the imagination in this already-very-long post.
Finding the Operating System #
Modifying the operating system sounds great, but how do we get access to the code in the first place? Luckily, the K2500 operating systems are still provided by the manufacturer on what looks like an old FTP site. Downloading and unzipping the operating system gives us a .KOS
file, which seems to be a custom format. Opening the file in Hex Fiend shows its bytes directly:
Unfortunately, nothing stands out here. There seems to be a human-readable 4-byte header at the top: SYS0
, possibly followed by other header bytes, but it’s really hard to tell. Regardless, we already know that this operating system runs on a Motorola 68000 CPU. Let’s just try interpreting the data as a binary, and see how far we can get.
Enter Ghidra #
The operating system file we’re using is probably raw machine code: literally the instructions and data interpreted by the CPU itself. To make any sense of this whatsoever, we’re going to need to disassemble it, to turn it back into assembly code - and hopefully eventually decompile it back into C-style code.
To do this, let’s use a tool called Ghidra: an open-source reverse-engineering program built, maintained, and released by the United States National Security Agency. (Yes, that one. Really.) To start, let’s import the .KOS
file directly into Ghidra and analyze it with the default settings, which will search for instructions.
Scrolling through the file shows that parts of the data have been analyzed by Ghidra as valid 68k instructions, but much of the file remains unanalyzed. Strangely, scrolling further through the files shows that Ghidra has correctly identified a number of human-readable strings in the file (great!) but the code seems to be referring to the strings offset by some amount, showing up as cut-off strings in Ghidra.
This is because we just loaded the entire .KOS
file into Ghidra, ignoring the fact that it has a header and likely some other extra bytes. This is a pretty big problem. Any cross-references between functions will be inaccurate as we continue to reverse-engineer the data, sending us in the wrong direction nearly every time we try to follow a reference. We need to fix this first.
Reverse Engineering the Bootloader #
To reverse-engineer the .KOS
file, it would be extremely useful to dig into the code that creates or consumes these files. We don’t have the creation code, but we do have access to the code that consumes these files: the bootloader for the synth itself, which is also still available online (edit: it appears they’ve taken this down, as of August 2022)! Let’s load it into Ghidra and make an assumption to make our lives easier: let’s guess that the first 8 bytes of the file are part of a header.
Where did that number come from? Well, I tried 0, +4, +8, +12, +16, and +20 byte offsets, and +8 disassembled the most correctly. Yes, this took a while. In hindsight, all of this also happens to work because the code in the file gets loaded into address
0x0
in memory. If it was loaded somewhere else, we’d have to figure out what that location is before we could effectively disassemble the code.
Just like before, let’s look for something human-readable first. Searching through the strings brings up a couple error strings that seem like they might get thrown by the code we care about:
Ghidra has identified what it calls XREFs here - cross-references, indicating that these strings are called from a certain place. Let’s follow this reference:
Aha! Now we’re getting somewhere. This looks an awful lot like a switch statement, decompiled by Ghidra here as an if
tree. It seems like there are a series of error codes (0x100
through 0x105
, then 0x200
, 0x201
, etc.) that each correspond with an error string that presumably gets printed on the screen. Let’s keep pulling on this thread. Using Ghidra’s “Find References” function, we end up at this function:
We’re getting closer! Ghidra’s done something great for us here: the decompiled code includes some variable names, automatically determined based on the strings that those variables point to. Given that we know that some of these variables are strings, we can take some guesses and use Ghidra’s “Rename” and “Retype” tools to make this function read a lot more clearly:
It looks like we have a two-stage process here: first, the new operating system file is checked by calling ActuallyCheckOrFlashTheOS?(0, ?)
. If the check passes, then the same function is called again with 1
. It seems like that function probably reads the .KOS
file format we’re investigating: let’s dig in there.
This doesn’t have any of the hints we saw before; there are no human-readable strings we can read, nor are there any function names. Instead, we can look at the structure of this function to understand what it does. Even without variable names, the structure of this code looks pretty similar to opening a file in C! It looks like we have an fopen
-style call, followed by an fread
, followed by an fread
again in a while loop. Let’s add comments to make this clearer.
With the comments, it seems we now have a couple questions answered:
- The
.KOS
file starts with a 4-byte header:SYS0
- After the header, the file is divided into fixed-size chunks
- Each chunk starts with a single 4-byte integer
- An unknown number of bytes of actual data are read
- Each chunk ends with a single 4-byte integer which seems to be some sort of checksum
However, there’s one new question we need to answer as well: why are certain constants and functions referenced at very high addresses in memory? (i.e.: 0x021317ac
seems to contain the number of bytes in each chunk of the .KOS
file, but the data in the ROM doesn’t reach that high!)
To better understand what’s at those high addresses, let’s turn to the service manual for this unit. (Huge shoutout to David Ryskalczyk for this idea!) Buried deep in a non-OCR’d PDF lies this useful tidbit of information, in a list of diagnostic procedures:
Thanks, service manual! It looks like 0x021317ac
lands directly in the middle of this synth’s “volatile RAM” - the RAM used by the processor while it’s running.
It’s great that we had the service manual as a reference here. Without this information, we could have made an educated guess based on address prefixes that show up often in the code. If that didn’t help, we could have tried to find an electrical schematic for the unit and traced the address lines coming from the various chips to the CPU. This stuff gets complicated fast.
Let’s tell Ghidra to treat this as RAM in its “Memory Map” window, and then jump to near the address we’re interested in: 0x021317ac
.
There’s no data here (as Ghidra knows this is RAM, which is randomly initialized when a computer starts) and it looks like the address in question is being read from ((R)
) , but never written to ((W)
). Maybe the writes are happening further up?
Aha! Ghidra shows us that a function is writing directly to the start of RAM. Given that we had to scroll up 6,060 bytes to find the first write, maybe this method copies a bunch of data into RAM. Let’s click through to see what’s there.
Uh, one sec. Let’s rename some stuff again.
Much better. It looks like we’re coping a bunch of data from ROM into RAM - specifically from 0x0001860a
to 0x02130000
. How much is a bunch? Well, 0x690
32-bit long words, which works out to 6,720 bytes. (This snippet of code also then zeros-out the next 0x1e1
32-bit long words, or 1,924 bytes.) Now that we know that the code is probably initialized with the same data as that part of the ROM, we can tell Ghidra to map that part of the ROM to this part of the RAM directly.
Now, going back to the part of RAM that we were reading, we can see that there are bytes present instead of question marks.
It looks like the value stored at 0x021317ac
is 0x20000
, which works out to 131,072 bytes! (We call this “128 kilobytes” because numbers are complicated.)
Great! So we’ve now figured out that each chunk of the .KOS
file format is 128kB in size. That’s all we need to know to build a decoder for this format, remove the chunk headers, and end up with a file that will have correct relative offsets. This allows Ghidra to properly disassemble and decompile the file, and allows us to actually poke around at the operating system code. (I’ve gone ahead and done this already, and that .KOS
file packer/unpacker is available on GitHub.)
Exploring the Operating System #
Alright, we’ve now got a “clean” dump of the operating system. Let’s open up that file in Ghidra, just like we tried to before. Let’s use Ghidra’s search function to find some interesting strings.
Let’s do a quick test to see if we can modify the operating system successfully. Ghidra lets you change instructions or data in a binary; so here, let’s change one of these strings to contain different text. (We’ll need to keep the length the same, to avoid moving other code around.)
After re-packing the operating system, let’s load it onto a floppy disk, install it on the real hardware, and…
What gives? Well, remember that “some sort of checksum” field we saw in the .KOS
format earlier? It turns out, that’s actually checked by the hardware when installing a new OS. Luckily, Ghidra can help us here too -let’s go back to the bootloader and click through to FUN_0x021302b2
, which looks like it computes some sort of checksum for us.
And again, after guessing at some variable names:
It looks like this checksum function is pretty simple: for each byte x
, the checksum is equal to x + checksum
shifted left by one bit, which is then bitwise OR’d with x + checksum
shifted right by 31 bits. That’s a neat checksum I hadn’t seen before at all, and which advanced checksum reversing tools like the wonderful delsum can’t figure out either.
With this checksum, we can now change our .KOS
file dumping script to properly re-pack new data with correct checksums. And once that’s done, let’s try flashing the OS again:
We can now flash a new operating system onto this hardware, modifying or extending its capabilities however we’d like. (That part, however, is left as an exercise for the reader.)
What We Learned #
Wow, that was a bit of an ordeal. I’d never used Ghidra before trying this project, and now I feel comfortable enough to use it for future absurdly-obscure retrocomputing reverse engineering. The techniques that seemed to work the best were:
- Look for human-readable strings first.
- Don’t be afraid to take a “side quest” (like reverse engineering a bootloader) to make your primary effort (patching an operating system) more successful.
- Use Ghidra’s decompiler. It’s really amazing.
- Look for structure in the decompiled code.
- Rename functions, variables, and data types once you even have a guess at what they might do.
- Look up documentation and resources for the system if they’re available.
- Sometimes, you just have to manually step through dozens of possible examples to find what you’re looking for. (It gets easier the more you do it!)
(And thanks in part to this reverse engineering, the K2000 emulation in the MAME project now boots!)
Special thanks to David Ryskalczyk for unblocking my work half way through this project, and to David Ryskalczyk and Zameer Manji for reviewing drafts of this post.