<h1>Reviving an Analog Polysynth with an Arduino, Ghidra, and Python</h1>
<p>About a year ago, smack in the middle of the pandemic, I turned to the internet for some retail therapy. I’m a musician, so my usual retail therapist of choice is Reverb.com - a sort of fancy Craigslist or eBay just for musicians. Every day, I would see new listings pop up for instruments I might want, like a 6-string fan-fret bass, a nice electronic drum kit, or my “holy grail” - a super-rare, 22-year-old analog synthesizer: <a href="https://www.soundonsound.com/reviews/alesis-a6-andromeda">the Alesis Andromeda</a>.</p>
<p><a href="https://svbtleusercontent.com/3cu76eBhDJjHsZMcu2KB310xspap.jpg"><img src="https://svbtleusercontent.com/3cu76eBhDJjHsZMcu2KB310xspap_small.jpg" alt="2001-04-alesis-1-lYbHVu8tbC0bXK62woy5ayoQxdCJGIoS.jpg"></a></p>
<p>The Andromeda is a 16-voice polyphonic analogue synthesizer: basically, a keyboard that sounds very lush, human, and organic, and can play a lot of notes at once. (That combination is expensive, because each of those 16 voices needs its own copy of the analogue sound-generating circuitry.) Every function is controllable by a separate knob on the front panel, making it extremely interactive: every knob changes the sound in some way.</p>
<p>As a kid, I remember playing one of these at <a href="https://www.long-mcquade.com/location/Ontario/Burlington/">my local music store</a> in the early 2000s. It was the biggest, most expensive, and most intimidating thing in the shop. Here’s an extremely kitschy ‘90s demo video:</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/hXVB3g4Y1oA?start=214" title="YouTube video player"></iframe>
<p><a href="https://www.sweetwater.com/store/detail/MoogOne16--moog-one-16-voice-analog-synthesizer">Synthesizers with similar capabilities</a> cost over $9,000 today. The Andromeda was discontinued in 2010, and since then, prices for working units have shot up to around $6,000. There’s no way I could justify spending that kind of money.</p>
<p>Until one day, in October 2021, <a href="https://reverb.com/item/46101686-alesis-a6-andromeda-61-key-polyphonic-analog-synthesizer">a listing popped up on Reverb</a>. The seller was explicit: this unit was used, broken, and non-functioning.</p>
<p><a href="https://svbtleusercontent.com/gW9rJ9uXefyXmeKdWDuv3L0xspap.png"><img src="https://svbtleusercontent.com/gW9rJ9uXefyXmeKdWDuv3L0xspap_small.png" alt="F72BE355-D140-4940-9BD3-6F6E4B2A8D35.png"></a></p>
<blockquote>
<p><em>I’m selling this synth for parts. It turns on but hangs on the splash screen. It’s missing side trim pieces, pitch/mod assembly, several knobs and several screws. The casing has nicks and scratches. The metal sides are a bit bent. The cable that connects the analog board to the main board has a cracked tensioner so maybe that’s part of the issue? I don’t have the tools or knowledge to fix this one so I’m passing it on. I can’t get it to do any tests so I can’t tell if anything is working. No returns on this one.</em></p>
</blockquote>
<p>I was tempted. I looked up the service manual online, and found that there were many debugging steps one could take to try to fix problems like this. I’ve also had plenty of experience dealing with hardware. My computer hardware classes at university even dealt with the same CPU used by this synth - the Motorola Coldfire (which uses a variant of the M68k architecture) - and I had a small cache of tools that might be useful. Feeling bold, and desperately bored after 18 months of working from home, I sent an offer.</p>
<p><a href="https://svbtleusercontent.com/7HaWA39cfKgbHKrNR4QYhz0xspap.png"><img src="https://svbtleusercontent.com/7HaWA39cfKgbHKrNR4QYhz0xspap_small.png" alt="C2FEC189-ED9F-45E9-830C-70C32E5B357E.png"></a></p>
<p>After two weeks of eager waiting, the synth arrived at my apartment in New York from Portland in a massive box. As described, it was in bad shape. The service manual provided a list of debugging functions that could be accessed by holding down one of eight buttons on the front panel during boot:</p>
<p><a href="https://svbtleusercontent.com/pcxdEHxUk4esPFXKvwtJzv0xspap.png"><img src="https://svbtleusercontent.com/pcxdEHxUk4esPFXKvwtJzv0xspap_small.png" alt="9FEE8994-6CA1-43EC-8F18-269EC73C9326.png"></a></p>
<p>I had hoped that the seller just hadn’t discovered this information, but these functions didn’t work at all: no combination of buttons would do anything, nor would any other tips or tricks from the official manual. Time to dig deeper.</p>
<h1 id="rtfm-read-the-fancy-manual_1">RTFM (Read The Fancy Manual) <a class="head_anchor" href="#rtfm-read-the-fancy-manual_1">#</a>
</h1>
<p>Luckily, back in February of 2015, users on the popular forum Gearspace started <a href="https://gearspace.com/board/electronic-music-instruments-and-electronic-music-production/985035-troubleshooting-alesis-a6-andromeda-no-boot.html">a 15-page thread about how to debug a non-booting Andromeda</a>. This thread included links to <a href="https://archive.org/details/sm_Alesis_Andromeda_A6_Official_Service_Manual_BOM_PCB_files_PCB_Schematics">a confidential <em>service</em> manual</a> that contained more debugging tips, intended for distribution only to Alesis-approved service centers. This service manual also included full schematics for the entire synth, showing how all of the components are logically connected together.</p>
<p><a href="https://svbtleusercontent.com/hxcMyTHnzngrCAM4zHiwFC0xspap.png"><img src="https://svbtleusercontent.com/hxcMyTHnzngrCAM4zHiwFC0xspap_small.png" alt="B8A5E198-D97D-4F4F-897C-968691F0451D.png"></a></p>
<p>This service manual revealed a couple of important things: while this is an analogue synthesizer, meaning that sound is generated via non-digital, analogue circuitry, its brain is entirely digital. It uses a Coldfire CPU (an MCF5307 running at 90MHz), has 2MB of Flash memory to store its upgradeable operating system, 1MB of RAM for use at runtime, and 512kB of battery-backed RAM for persistent storage of user settings.</p>
<p>The great people in this thread also suggested a number of fixes to try:</p>
<ul>
<li><p><strong>Replacing the resonator on the LCD panel</strong>: the Andromeda will often fail to boot if the front panel controller can’t communicate with the LCD. The LCD panel uses a 3MHz ceramic resonator that can sometimes fail, and when the LCD’s clock is unstable, serial communication with it can fail too. Luckily, a new resonator costs about $1, and that part is very easy to replace. (Unluckily, that didn’t seem to help.)</p></li>
<li>
<p><strong>Adding a pull-up resistor to the SRAM chips</strong>: the Andromeda uses two external static RAM chips, providing a total of one megabyte of RAM. These external RAM chips are connected to the data and address busses of the CPU. At boot time, each SRAM chip may accidentally be enabled by default. If this happens, data placed on the bus by other devices (like the Flash memory chip that stores the operating system) will be corrupted by the data coming from each SRAM chip until the SRAMs are disabled. (Think: too many people talking at once.)</p>
<p>This problem is called <strong>bus contention</strong>. The solution to this problem is to “tie” the Chip Enable pin of each chip to its “off” value (+3.3 volts) by using a resistor. This resistor is called a <strong>pull-up</strong>, as it <em>pulls up</em> the voltage on the pin when no other devices are controlling the line. By using a resistor, other devices are still able to pull the pin high or low; the resistor essentially sets a default value.</p>
<p>This resistor is a cheap and plentiful part - a 4.7kΩ resistor costs only pennies. Adding the resistor requires very careful soldering, though, as the pins on the SRAM chip are extremely close together. More on that later.</p>
</li>
<li><p><strong>Replacing the Flash memory chip that stores the operating system</strong>. This logically made sense; Flash memory is reprogrammable, and it’s possible that the Flash may have been corrupted somehow, preventing the system from booting. Unfortunately, the Flash memory on the board is an extremely small part and is difficult to replace without advanced soldering skills or the proper equipment.</p></li>
<li><p><strong>Replacing the entire CPU</strong>. I was incredulous about this; I’d never heard of entire CPUs failing, but many people suggested that a whole-CPU replacement was necessary to get their synthesizers working again. This was a last-resort option, as the CPU has 208 extremely fine pins that would be difficult to solder.</p></li>
</ul>
<h1 id="breaking-out-the-soldering-iron_1">Breaking out the Soldering Iron <a class="head_anchor" href="#breaking-out-the-soldering-iron_1">#</a>
</h1>
<p>At this point, I thought the next step was to start making changes to the hardware to try to fix one or more broken parts. I’ve been using a soldering iron on and off since I was about 10 years old, so I thought I had the dexterity, patience, and steady hand required to solder a single resistor across two tiny, closely-spaced pins.</p>
<p>I did <strong><em>not</em></strong>.</p>
<p><a href="https://svbtleusercontent.com/fd47dMnVANFNUDJ4pnun8g0xspap.jpg"><img src="https://svbtleusercontent.com/fd47dMnVANFNUDJ4pnun8g0xspap_small.jpg" alt="damage_closeup.jpg"></a></p>
<p>In my effort to add a resistor between chip U12 and capacitor C50, I managed to short out multiple pins of U12. Then, when trying to fix my mistake by replacing the chip, I <strong>accidentally tore off at least 12 of the 44 solder pads</strong> that connect the chip to the circuit board.</p>
<p>If the synth wasn’t working before, it definitely wasn’t working now. I had to concede defeat and call in someone to help.</p>
<p>I started emailing local, NYC-area electronics repair shops - including the famed <a href="https://www.rossmanngroup.com">Rossmann Repair Group</a>, only blocks away - but none of them were able to fix a problem like this. After some more searching, I found <a href="https://videogamerepairs.ca/2019/03/25/not-just-video-game-consoles-alesis-andromeda-a6-vintage-synthesizer-repair/">a blog post from Edmonton-based VideogameRepairs.ca</a> showing that they had replaced the CPU and Flash chips of an Andromeda in the past, and sent them photos, asking if they could fix my self-inflicted soldering damage. To my surprise, they said they’d be able to repair the board and replace its CPU, although they’d have no means to test it.</p>
<p>Two months later, after buying a replacement CPU on eBay and shipping my main board from New York to Edmonton and back, I finally had a repaired board with a new SRAM chip and CPU. I had opted <strong>not</strong> to replace the Flash memory, as I didn’t know if it was good or bad, or how to go about reprogramming it. As part of the repair, <strong>a pull-up resistor was also added to the SRAM chips’ chip-select pin</strong>, just like the folks on Gearspace.com had suggested. (The repair job was amazing; huge thank you to <a href="https://videogamerepairs.ca/about/">Daniel Wynne at VideogameRepairs in Edmonton</a> for such intricate rework - and for only $300 USD, too!)<br>
<a href="https://svbtleusercontent.com/vavw1YycS9qvkehcp6Eoy30xspap.jpg"><img src="https://svbtleusercontent.com/vavw1YycS9qvkehcp6Eoy30xspap_small.jpg" alt="DSC03248.jpg"></a></p>
<p>…but the machine still <strong>refused to boot</strong>. Time to step things up a bit.</p>
<h1 id="breaking-out-the-debugger_1">Breaking out the Debugger <a class="head_anchor" href="#breaking-out-the-debugger_1">#</a>
</h1>
<p>As I was waiting for my repaired circuit board to arrive in the mail, I pored over the service manual carefully. Surely there had to be some way to get more insight into what was going wrong during the boot process. The design engineers at Alesis included many <a href="https://en.wikipedia.org/wiki/Test_point">test points</a> in the synth where it was possible to hook up an <a href="https://en.wikipedia.org/wiki/Oscilloscope">oscilloscope</a> or logic analyzer to ensure the system was behaving as expected.</p>
<p>Along with those test points, I noticed that the Coldfire CPU exposed a number of pins to a 26-pin header, conveniently labelled <code class="prettyprint">DEBUG PORT</code>.</p>
<p><a href="https://svbtleusercontent.com/ekiWSwhzZ7hBy5kZxUEoE50xspap.png"><img src="https://svbtleusercontent.com/ekiWSwhzZ7hBy5kZxUEoE50xspap_small.png" alt="AndromedaDebugPort.png"></a></p>
<p>Searching for some of the keywords on the circuit diagram - including <code class="prettyprint">DDATA</code> and <code class="prettyprint">PST0</code> - led me to discover that this was a proprietary (but well-documented) debugging interface specific to Coldfire processors. This is a form of debug interface known as <a href="https://en.wikipedia.org/wiki/Background_debug_mode_interface">Background Debug Mode, or BDM</a>, which provides much of the functionality required by today’s software debuggers, like <a href="https://en.wikipedia.org/wiki/GNU_Debugger">GDB</a> or <a href="https://lldb.llvm.org">LLDB</a>.</p>
<p>I spent a couple more days searching for existing software and hardware that could connect to this debug port. Unfortunately, each option was a pain, for different reasons:</p>
<ul>
<li>BDM interfaces exist on eBay for less than $20, but they target a slightly different BDM protocol than the one used by this CPU.</li>
<li>Open-source projects like <a href="https://usbdm.sourceforge.io/USBDM_V4.12/html/index.html">USBDM</a> exist, but require custom hardware interfaces that don’t seem to be for sale anywhere, only seem to work on Windows and Linux, and require proprietary IDEs like CodeWarrior.</li>
<li>
<a href="https://www.pemicro.com/index.cfm">PEMicro</a> sells debug probes that are pin-compatible with exactly this Coldfire debug interface, but the cheapest hardware options cost $300. (This would have probably worked, to be honest.)</li>
</ul>
<h2 id="building-a-bdm-interface_2">Building a BDM Interface <a class="head_anchor" href="#building-a-bdm-interface_2">#</a>
</h2>
<p>Given that it was fairly difficult or expensive to use existing tools, I looked to see if maybe I could build my own using common parts, like an Arduino. <a href="https://www.nxp.com/docs/en/data-sheet/MCF5307BUM.pdf">The CPU’s 484-page user manual</a> goes into tons of detail about how its debug port works: it’s really just a serial interface where the debugger sends <strong>one bit of information at a time</strong> over a single wire, while toggling a clock line to indicate when information is ready to be read. The CPU can then send data back to the debugger, also one bit at a time, by putting either 0V or 3.3V on its output line when the debugger toggles the clock line.</p>
<p><a href="https://svbtleusercontent.com/pGMaAGWYnk687cA8FiFT520xspap.png"><img src="https://svbtleusercontent.com/pGMaAGWYnk687cA8FiFT520xspap_small.png" alt="timing-diagram.png"></a></p>
<p>One very nice part of this serial interface is that it’s completely asynchronous: there are no timing requirements on either the debugger or the CPU. If the Arduino is busy doing something else, or <em>is just slow</em> (as Arduinos are) then the CPU doesn’t care - it just waits for the next bit to come in, one at a time.</p>
<p>On top of this serial interface, the Coldfire encodes its debug data into <strong>17-bit packets</strong> - one bit (called the “status”) to indicate if an error has occurred, and 16 bits to indicate the data in that packet:<br>
<a href="https://svbtleusercontent.com/3d3Z68aYsLpDJANsE8Jjqr0xspap.png"><img src="https://svbtleusercontent.com/3d3Z68aYsLpDJANsE8Jjqr0xspap_small.png" alt="bdm-packet-fields.png"></a></p>
<p>Then, on top of this packet format, different commands can be sent to the CPU to ask it to do things - like read or write memory addresses, read or write processor registers, continue processor execution, or put the processor into step mode.<br>
<a href="https://svbtleusercontent.com/w84dXs5aU2V6XuqPe1rTpy0xspap.png"><img src="https://svbtleusercontent.com/w84dXs5aU2V6XuqPe1rTpy0xspap_small.png" alt="bdm-command-fields.png"></a></p>
<p>These operations are enough to build a rudimentary debugger: we can halt the processor, move the program counter wherever we want, read registers and memory, and watch as the operating system tries to boot.</p>
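<p>As a concrete example, here’s roughly what a single 17-bit transfer looks like in Python. (This is a simplified sketch of the idea, not the library’s actual code: <code class="prettyprint">set_data</code>, <code class="prettyprint">pulse_clock</code>, and <code class="prettyprint">read_data</code> are hypothetical stand-ins for the pin-toggling that the Arduino performs.)</p>
<pre><code class="prettyprint lang-python"># set_data(), pulse_clock(), and read_data() are hypothetical helpers that
# toggle the DSI, DSCLK, and DSO pins via the Arduino serial bridge.
def transfer(packet: int) -> tuple[int, int]:
    """Clock one 17-bit packet out to the CPU, MSB first, while
    simultaneously clocking the CPU's 17-bit response back in."""
    response = 0
    for bit in reversed(range(17)):
        set_data((packet >> bit) & 1)  # present one bit on the data-in line
        pulse_clock()                  # no timing requirements: the CPU just waits
        response = (response << 1) | read_data()  # sample one bit coming back
    status, data = response >> 16, response & 0xFFFF
    return status, data
</code></pre>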
<p>So, I wrote (and published!) <a href="https://github.com/psobot/arduino-coldfire-bdm">a simple Python library called <code class="prettyprint">arduino-coldfire-bdm</code></a> that encodes data and provides an interface to the Coldfire’s BDM port. A tiny <a href="https://en.wikipedia.org/wiki/Arduino">Arduino</a> program allows using pretty much any Arduino as a serial bridge between my laptop and the Andromeda’s CPU, so that we can send commands directly from Python to the board.</p>
<p>With that, we’re able to capture an execution trace to see what the processor is doing when it tries to boot, and it’s kinda neat: we can watch the program counter tick up!</p>
<p><strong>And then the whole thing stops:</strong><br>
<a href="https://svbtleusercontent.com/qNxhAkmUWV4coi6MUrfhfT0xspap.png"><img src="https://svbtleusercontent.com/qNxhAkmUWV4coi6MUrfhfT0xspap_small.png" alt="captured-execution-trace.png"></a></p>
<h1 id="turning-to-ghidra_1">Turning to Ghidra <a class="head_anchor" href="#turning-to-ghidra_1">#</a>
</h1>
<p>Alright, now we’ve got an execution trace. We can watch the processor try to boot. And we can see that the processor gets a certain amount of the way through the process, and then halts.</p>
<p>To actually make sense of this without having to read assembly directly, I turned once again to <a href="https://ghidra-sre.org">Ghidra</a>, the NSA’s open-source reverse engineering tool, which includes good support for the Coldfire architecture and allows us to decompile assembly code into C.</p>
<p>Unlike <a href="http://blog.petersobot.com/patching-the-k2500">the last time that I wrote an extended blog post about using Ghidra</a>, this experience was much simpler: the bootloader of the Andromeda is quite readable. I’ve annotated the boot code below, which also (pretty much) corresponds with the execution trace above:<br>
<a href="https://svbtleusercontent.com/3ACD4gSzqi3mKZXshebhWY0xspap.png"><img src="https://svbtleusercontent.com/3ACD4gSzqi3mKZXshebhWY0xspap_small.png" alt="decompilation.png"></a></p>
<p>Based on the execution trace, it seems the initial code runs for a bit - and then enters the loop at the bottom of this initial function, which copies the bootloader into RAM. Then, immediately after jumping to the code that was just copied into RAM, the processor halts.</p>
<p>Well, that sounds suspicious. The code in RAM should be executable, but it seems that it’s either incorrect or didn’t get copied correctly. Let’s see if we can re-flash the bootloader firmware, to ensure that the code is correct.</p>
<h1 id="flashing-the-flash-in-a-flash_1">Flashing the Flash in a Flash <a class="head_anchor" href="#flashing-the-flash-in-a-flash_1">#</a>
</h1>
<p>Flash memory seems like it’s all around us today; it’s what you find in SD cards, in SSDs, and so on. Flash memory was novel when it first came out in the 1980s and ’90s, as it could hold its contents <strong>without power</strong>, whereas other kinds of memory (like static RAM, or SRAM) required power to keep their contents from fading away.</p>
<p>However, flash memory (NOR flash, in this case) has a couple of unexpected quirks that make it more complicated to use than regular RAM. A static RAM chip allows any address to be read or written in a single operation: set the address lines of the chip to the desired address, assert the “write enable” or “output enable” signal, then either assert the data on the data pins or read the data off of the data pins. Flash memory can be <em>read</em> the same way, but <strong>can only be written</strong> after sending special commands to the chip first.</p>
<p>Worse yet, <strong>bits in flash memory can only be switched from <code class="prettyprint">1</code> to <code class="prettyprint">0</code></strong> - writing <code class="prettyprint">0x0F</code> over a byte that currently holds <code class="prettyprint">0xF0</code> leaves <code class="prettyprint">0x00</code>, since no bit can rise back to <code class="prettyprint">1</code>. To change a <code class="prettyprint">0</code> to a <code class="prettyprint">1</code>, an entire block of memory (usually many kilobytes in size) must be <a href="https://en.wikipedia.org/wiki/Flash_memory#Invention_and_commercialization">“flashed”</a> (erased) at once, setting all of that block’s bits to <code class="prettyprint">1</code>. After that erase operation is complete, individual bytes or words of memory can be written one at a time - but only by flipping <code class="prettyprint">1</code> bits to <code class="prettyprint">0</code>.</p>
<p>All of this complexity means that if we want to reprogram the flash memory in the Andromeda, we’ll need to send a special sequence of commands to the CPU, rather than just asking it to write to memory. These commands are listed in the datasheet for each specific flash memory chip (although many chips share the same command sequences). The chip on the Andromeda main board responds to the commands in <a href="https://www.digchip.com/datasheets/parts/datasheet/013/AM29LV160D-pdf.php">the following table from its datasheet</a>:</p>
<p><a href="https://svbtleusercontent.com/s6RgM5muXcT2Lg8Rgz7Bs50xspap.png"><img src="https://svbtleusercontent.com/s6RgM5muXcT2Lg8Rgz7Bs50xspap_small.png" alt="am29lv160d.png"></a></p>
<p>What this somewhat hard-to-read table suggests is that to “program” (write) data to the flash memory, we need to send four individual writes to the memory chip: <code class="prettyprint">0x555 = 0xAA</code>, <code class="prettyprint">0x2AA = 0x55</code>, <code class="prettyprint">0x555 = 0xA0</code>, followed by a write directly to the address we want to place the data at. (This is presumably to prevent errant writes: an accidental write would almost certainly corrupt data, given flash’s inability to flip individual bits from <code class="prettyprint">0</code> back to <code class="prettyprint">1</code>.)</p>
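<p>In code, a single programmed write might look like the sketch below - here, <code class="prettyprint">write(address, value)</code> is assumed to be a helper that performs one BDM memory write into the flash chip’s address space:</p>
<pre><code class="prettyprint lang-python"># Program one 16-bit word: three unlock/command cycles, then the data itself.
write(0x555, 0xAA)    # unlock cycle 1
write(0x2AA, 0x55)    # unlock cycle 2
write(0x555, 0xA0)    # "program" command
write(address, word)  # the actual write, to the address we care about
</code></pre>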
<p>This is super slow, though. Sending four writes per word means that our writes actually go four times slower than they could, which - given how slow our custom BDM interface is - would result in us writing to the flash chip at a rate of only about 400 bytes per second. (It would take just over an hour to write just the bootloader at that rate.)</p>
<p>Luckily, this flash chip supports a feature its manufacturer calls “Unlock Bypass.” By sending a specific command to enter “Unlock Bypass” mode, writes can be performed by sending only two individual write commands instead of four. This doubles our writing speed, and allows us to upload the entire bootloader in only about half an hour.</p>
<p>To do so, though, we have to send commands in a very specific sequence:</p>
<pre><code class="prettyprint lang-python">import time

# `write(address, value)` performs a single BDM memory write, as above;
# `data` holds the firmware image to be written, as bytes.
# Send full-chip erase:
write(0x555, 0xAA)
write(0x2AA, 0x55)
write(0x555, 0x80)
write(0x555, 0xAA)
write(0x2AA, 0x55)
write(0x555, 0x10)
# Wait the 30 seconds it takes the chip to actually erase itself:
time.sleep(30)
# Enter "Unlock Bypass" mode to unlock the flash for writing:
write(0x555, 0xAA)
write(0x2AA, 0x55)
write(0x555, 0x20)
# Send one 16-bit word at a time:
for i in range(0, len(data), 2):
    # Send the "write a word" command:
    write(0x555, 0xA0)  # note: the address here could be anything
    # Send the actual data:
    write(i, (data[i] << 8) | data[i + 1])
# Exit "Unlock Bypass" mode:
write(0x90, 0x90)
write(0x00, 0x00)
</code></pre>
<p>After running this code with <a href="https://electro-music.com/forum/topic-59970.html">a copy of the latest bootloader found on the internet</a>, I was able to verify that the code had been uploaded correctly, and that the contents of the flash should allow the synth to boot. Then, when trying to boot…</p>
<p><em>Still nothing.</em> What happens if we try to run a quick RAM test, to ensure that the code being written to RAM is correct? Let’s use <a href="http://github.com/psobot/arduino-coldfire-bdm">the Python debugging library I wrote</a> to write data to RAM, then read it back over and over again:</p>
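<p>(The sketch below shows the idea; <code class="prettyprint">write_word</code> and <code class="prettyprint">read_word</code> are hypothetical stand-ins for the library’s BDM memory accessors, and the base address is illustrative, not the Andromeda’s actual memory map.)</p>
<pre><code class="prettyprint lang-python">SRAM_BASE = 0x02000000  # illustrative base address only

for address in range(SRAM_BASE, SRAM_BASE + 0x100, 2):
    write_word(address, 0xAAAA)  # write a known test pattern...
    value = read_word(address)   # ...then immediately read it back
    if value != 0xAAAA:
        print(f"Read failed at {address:#010x}: got {value:#06x}")
</code></pre>
<p>Running the real version of this test produced the following:</p>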
<p><a href="https://svbtleusercontent.com/az26uoWRfiUVWuHy7QtkJu0xspap.png"><img src="https://svbtleusercontent.com/az26uoWRfiUVWuHy7QtkJu0xspap_small.png" alt="read_failed.png"></a></p>
<p>Huh. It seems that if we write a value to the RAM, that value isn’t “sticky” - the RAM, <em>random access memory</em>, isn’t <em>remembering</em> what we’ve written. Something’s off. What if we print out the bits themselves, and show how they change over time?</p>
<p><a href="https://svbtleusercontent.com/7Z8hDsPRRKsgX26yTFXWkH0xspap.png"><img src="https://svbtleusercontent.com/7Z8hDsPRRKsgX26yTFXWkH0xspap_small.png" alt="Fh9PXM7XgAEW0NK.png"></a></p>
<p>That looks an awful lot like something is wrong with the RAM! The bits are quickly fading away to 0, which implies that either the bits weren’t written correctly, <em>or</em> they were written but are being read incorrectly, <em>or</em> maybe the chip is slowly losing power.</p>
<h1 id="reading-more-closely_1">Reading More Closely <a class="head_anchor" href="#reading-more-closely_1">#</a>
</h1>
<p>As it turns out, an SRAM chip can behave like this if any one of three pins is at the wrong voltage:</p>
<ul>
<li>the power pin, which should be at +3.3 volts</li>
<li>the ground pin, which should be at, well, ground</li>
<li>the “chip select” pin, which should be controlled by the CPU, but may have a pull-up resistor on it (as mentioned way at the top of this blog post).</li>
</ul>
<p>With my oscilloscope, I was able to measure and find that:</p>
<ul>
<li>the chip was getting +3.3v on the power pin</li>
<li>the chip’s ground pin was, indeed, ground</li>
<li>the “chip select” pin was high at +3.3v <strong>all the time</strong>.</li>
</ul>
<p>That last one was a bit suspicious: during memory accesses, we’d expect the chip select pin to go low - even if only for nanoseconds at a time - to indicate that the chip is selected.</p>
<p>I took a peek at the resistor that had been installed and read its colour bands, which indicate its value. I plugged them into an online calculator, and found:<br>
<a href="https://svbtleusercontent.com/x3dtaYXygZdyd44Yc5L6pN0xspap.png"><img src="https://svbtleusercontent.com/x3dtaYXygZdyd44Yc5L6pN0xspap_small.png" alt="four-point-seven-resistor.png"></a></p>
<p><strong>4.7Ω</strong>.</p>
<p>After all of this debugging, it turns out that the SRAM chip was properly connected and working; it was just <strong>never being enabled</strong>, because its chip-select pin was being <strong>held at +3.3V all the time</strong>. This resistor should have been something like 4.7kΩ - a thousand times more resistance. With a 4.7Ω pull-up, the CPU would have to sink roughly 700mA (3.3V ÷ 4.7Ω) to pull the pin low, far more than any CPU pin can drive; with 4.7kΩ, that drops to well under a milliamp, letting the CPU easily overcome the resistor when enabling the chip. I must have missed a single “k” when specifying the resistor value.</p>
<p>I pulled out a pair of snips, clipped the resistor off the chip, and:</p>
<p><a href="https://svbtleusercontent.com/u4Cn6cx7SFH7ZGdc74yzEc0xspap.jpeg"><img src="https://svbtleusercontent.com/u4Cn6cx7SFH7ZGdc74yzEc0xspap_small.jpeg" alt="IMG_1670.HEIC.jpeg"></a></p>
<p><strong>It lives!</strong></p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/FKvHwxbLBCc" title="YouTube video player"></iframe>
<h1 id="a-bunch-of-knobs_1">A Bunch of Knobs <a class="head_anchor" href="#a-bunch-of-knobs_1">#</a>
</h1>
<p>Now that the synth worked, there were a couple more problems to fix. Turning any of the knobs on the left side of the synth’s panel caused the synth to “glitch out” - values on the screen would jump around wildly. This wasn’t a complete dealbreaker, but it definitely made the synth hard to use. To figure out why this was happening, I had to go back to the schematics once more. The glitchy knobs all had one thing in common: they were connected to the same chip, a CD4051 analog multiplexer labeled <code class="prettyprint">U27</code>.</p>
<p><a href="https://svbtleusercontent.com/jTpFNkjnBHamC1416q2TnX0xspap.png"><img src="https://svbtleusercontent.com/jTpFNkjnBHamC1416q2TnX0xspap_small.png" alt="analog-comparator.png"></a></p>
<p>The Andromeda might be controlled by a digital CPU, but a lot of it is surprisingly analog. In fact, the analog signal from each knob on the front panel is sent all the way through to the main circuit board via a neat device called an <strong>analog multiplexer</strong>. Each potentiometer (knob) is connected to one multiplexer chip, which is essentially a controllable digital switch. The main CPU drives seven signals: <code class="prettyprint">POT_MUX_SEL[0-3]</code>, along with <code class="prettyprint">POT_ADDR[0-2]</code>. Only one of the four <code class="prettyprint">POT_MUX_SEL</code> signals is active at once, while the three <code class="prettyprint">POT_ADDR</code> lines encode 3 bits of data, and thus have 8 possible values; putting these together allows the CPU to select between <strong>32 different potentiometers</strong> whose values can be sampled (see the sketch below). The analog multiplexers that sit on these lines are, well, <em>analog</em>, which means that even though they’re controlled by a digital address bus, their output is completely analog, allowing for extremely high fidelity.</p>
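<p>To illustrate the addressing scheme (this is just a model of the logic, not code that runs anywhere on the synth):</p>
<pre><code class="prettyprint lang-python"># Selecting one of 32 potentiometers: 4 one-hot POT_MUX_SEL lines pick a
# multiplexer chip, and 3 POT_ADDR bits pick one of its 8 inputs.
def select_pot(pot_number: int) -> tuple[list[int], list[int]]:
    chip, channel = divmod(pot_number, 8)
    pot_mux_sel = [1 if c == chip else 0 for c in range(4)]  # only one line active
    pot_addr = [(channel >> bit) & 1 for bit in range(3)]    # 3-bit channel address
    return pot_mux_sel, pot_addr

assert select_pot(11) == ([0, 1, 0, 0], [1, 1, 0])  # pot 11 = chip 1, channel 3
</code></pre>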
<blockquote>
<p>A quick aside; what’s the difference between an analog and a digital synthesizer? Alesis claims in the Andromeda manual:</p>
<blockquote>
<p>An analog instrument uses electronic circuitry for sound creation and filtering that is not dependent on its computer chip. While the instrument’s processor provides many control and memory functions, <strong>the basic sound path is in the hardware that is separate from the microprocessor</strong>.</p>
</blockquote>
<p>This is true; but the line is somewhat blurred in the Andromeda, as the analog circuitry is controlled by a digital processor, whose inputs and outputs are 16-bit numbers.</p>
<p>In the 1990s, when this synthesizer was developed, most synthesizers used 8-bit resolution for their parameters; each knob only had 2<sup>8</sup> = 256 “steps,” which caused noticeable stair-stepping when turning knobs. (To think of this geometrically: if you turned a knob by less than about 1.4º, its value wouldn’t change due to this low resolution.) This led people to associate “digital” synthesis with “audible stair-steps when turning a knob.”</p>
<p>However, the designers of the Andromeda took a lot of care to keep all of the signals as analog as possible for as long as possible. As such, these front panel knobs send their analog values to the main board, where they’re then turned into digital values at the fairly high resolution of 16 bits, providing 65,536 possible steps. To put this in geometric terms again: if 8-bit synthesizers provide one step per ~1.4º of rotation, the Andromeda provides one step per <em>0.0055º</em> of rotation. That’s enough resolution that the steps would only be noticeable if you attached a 100-meter stick to each knob; at that scale, the far end of the stick would still move only about one centimeter per step. And with the number of parameters available on the synth, this level of detail means that there are approximately <strong>2.4x10<sup>462</sup> different unique combinations</strong> of sounds that could be made - <strong>1.5x10<sup>231</sup> <em>times</em> more</strong> than if the designers had used 8-bit parameter resolution.</p>
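<p>(A quick sanity check of those figures, assuming the synth exposes roughly 96 independent 16-bit parameters - a count back-solved from the numbers above, not an official specification:)</p>
<pre><code class="prettyprint lang-python">import math

PARAMS = 96  # illustrative parameter count, back-solved from the figures above
print(round(PARAMS * math.log10(65536)))        # 462: i.e. ~10^462 combinations
print(round(PARAMS * math.log10(65536 / 256)))  # 231: i.e. ~10^231 times more than 8-bit
</code></pre>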
</blockquote>
<p>So. If the Andromeda uses analog multiplexers to send its front panel values on to the main CPU, what could be causing those values to be glitchy? Well, let’s take a look at the analog value using an oscilloscope.</p>
<p><a href="https://svbtleusercontent.com/2r6pxsYcyuxSTbqjC33Eri0xspap.png"><img src="https://svbtleusercontent.com/2r6pxsYcyuxSTbqjC33Eri0xspap_small.png" alt="skyline-broken-old-mult.png"></a></p>
<p>Wow! This is kind of neat - we can see the multiplexing happening visually. Each “tower” in this visual “skyline” is the value of a different knob, with knobs turned all the way up appearing higher on the graph. If this were working correctly, we’d expect to see solid, flat values all along the graph. Spikes, slopes up or down, or noisy values all indicate that something is wrong; and sampling any of those values will probably result in the CPU thinking that some knobs are moving, even when they’re not.</p>
<p>Of particular interest here are the solid yellow sections of the graph, which indicate that values are moving up and down so quickly that they look like noise. Let’s zoom in on one of those:</p>
<p><a href="https://svbtleusercontent.com/gpTwiVpmiu2LuZ5RBv4cAz0xspap.png"><img src="https://svbtleusercontent.com/gpTwiVpmiu2LuZ5RBv4cAz0xspap_small.png" alt="skyline-broken-zoom.png"></a></p>
<p>Oof, that’s pretty bad. The value of this signal seems to be oscillating, which will make the CPU think that we’re turning this knob back and forth all the time. It’s hard to tell why this might be happening: this could be a broken multiplexer chip, or it could be one or more other broken chips causing bad signals to go into a multiplexer chip.</p>
<p>To debug this, I went ahead and ordered <a href="https://www.digikey.com/en/products/detail/texas-instruments/CD4051BM96/528372">a brand new multiplexer chip for the low price of $0.66 (plus shipping)</a>. Unlike my last soldering job, this chip was big enough that I could replace it myself. (One gotcha: the chips on this board weren’t just soldered in place, but were also <strong>glued</strong> in place from underneath, causing me to rip up a couple of traces despite my best attempts to be careful.)</p>
<p><a href="https://svbtleusercontent.com/hTAzjryPigBZEcwZUighXk0xspap.jpg"><img src="https://svbtleusercontent.com/hTAzjryPigBZEcwZUighXk0xspap_small.jpg" alt="new-multiplexer.jpg"></a></p>
<p>However, after installing this new chip, the problem wasn’t resolved: the new signal was even dirtier than before! The oscillation hadn’t stopped, the signal had more overall noise, and some values were now sloping down instead of remaining constant:</p>
<p><a href="https://svbtleusercontent.com/4JoKwZGZPND32T2nHcqtP40xspap.png"><img src="https://svbtleusercontent.com/4JoKwZGZPND32T2nHcqtP40xspap_small.png" alt="skyline-broken.png"></a></p>
<p>Let’s go back to the drawing board a bit. The schematic shows that this multiplexer is connected to four knobs on the front panel, as well as two other signals that I hadn’t tested, labeled <code class="prettyprint">PITCH</code> and <code class="prettyprint">MOD</code>.<br>
<a href="https://svbtleusercontent.com/sSVKn9iM8CedWLAGMh8fmf0xspap.png"><img src="https://svbtleusercontent.com/sSVKn9iM8CedWLAGMh8fmf0xspap_small.png" alt="analog-comparator.png"></a></p>
<p>These two signals come from the pitch and modulation <em>wheels</em> at the left side of the keyboard; they’re both potentiometers, but attached to large vertically-mounted wheels that can be used more easily during performance. Let’s trace the schematic a bit more to find out where those signals actually come from, and how they’re generated.</p>
<p><a href="https://svbtleusercontent.com/4t973gi1AGMU32vpuJ5xxD0xspap.png"><img src="https://svbtleusercontent.com/4t973gi1AGMU32vpuJ5xxD0xspap_small.png" alt="pitch-mod.png"></a></p>
<p>It looks like the “raw” signal from the pitch and modulation wheels goes through another chip - an operational amplifier, or <em>op-amp</em> - which amplifies the signal to match the 5V range output by the other potentiometers.</p>
<p>This is where I’d show you a screenshot of my oscilloscope to illustrate how high the voltage was, or what signal was coming off of the op-amps here. However, I don’t have that screenshot, because instead, I touched the op-amp while the synth was powered up, and gave myself a mild burn. <strong>It was red-hot.</strong></p>
<p><a href="https://svbtleusercontent.com/me8XK7MAPA4onn2uqmLtKp0xspap.png"><img src="https://svbtleusercontent.com/me8XK7MAPA4onn2uqmLtKp0xspap_small.png" alt="opamp-power.png"></a></p>
<p>The op-amp chip - <a href="https://www.ti.com/lit/ds/symlink/tl082-n.pdf">a TL082</a> - was supplied by two voltage rails: one at -15V, and one at +15V, making the maximum voltage across the chip a huge <strong>30 volts</strong>. (Huge is relative here; but for a synth with many components that operate at 3.3V, this is a problem.) An op-amp has no business getting this hot in a correctly-functioning circuit. My best guess is that this component failed on its own, or failed catastrophically when I accidentally plugged in the cable between the front panel and the main board <strong>backwards</strong> at one point.</p>
<p>Either way, this chip had to go. Not only was it causing instability in other chips, it was also sinking a ton of power and could have been a fire hazard or a danger to other parts of the circuit. Rather than trying to desolder it this time, though, I just cut it off with a pair of snips.</p>
<p><a href="https://svbtleusercontent.com/gX6VxgVaCdoFiG9QHyzu750xspap.jpg"><img src="https://svbtleusercontent.com/gX6VxgVaCdoFiG9QHyzu750xspap_small.jpg" alt="opamp-goodbye.jpg"></a></p>
<p>And with that, even without a new op-amp in place, the glitches were gone! Soldering in a new op-amp was pretty simple, but the result was great: all knobs worked again, including the pitch wheel and ribbon controller. The moral of the story: <em>don’t plug cables in backwards when working on delicate analog electronics</em>.</p>
<h1 id="the-conclusion_1">The Conclusion <a class="head_anchor" href="#the-conclusion_1">#</a>
</h1>
<p>Well, thirteen months and hundreds of dollars later, my impulse buy is now a fully-working, beautiful-sounding, ultra-rare synthesizer. I still have a couple things left to do - replace some missing knobs, get replacement side panels made, fix some dead LEDs, fix the mod wheel, and replace some yellowed and scratched keys - but the hard parts of the project are done.</p>
<p>What was the root cause of the failure? Well, despite the many twists and turns along the way, it certainly looks like the Andromeda’s CPU was just dead. A full CPU replacement was enough to kick it back into life. Second to that, the data in the flash ROM <em>may</em> have been bad, but it’s very hard to tell if that would have been a blocker. The other issues (broken SRAM, blown op-amp, bad multiplexer) were all caused by my own attempts to repair the Andromeda without having its CPU replaced.</p>
<blockquote>
<p><strong>What if I’ve got a broken A6 Andromeda?</strong></p>
<p>Having gone through this ordeal, I would suggest trying the following repair tips in order:</p>
<ul>
<li>If you’ve got an Arduino and are handy with software, open up your Andromeda and use <a href="https://github.com/psobot/arduino-coldfire-bdm">my <code class="prettyprint">arduino-coldfire-bdm</code> Python library</a> to try to connect to your Andromeda’s CPU over its debug port. From there, you’ll be able to see if the CPU is working and will be able to re-flash the firmware without buying any expensive equipment.</li>
<li>If that fails, try the simple fixes listed above: replace the resonator on the LCD (a $3 part that’s easy to solder), or try turning the power off and on a few times - if the synth occasionally boots, you’re probably hitting bus contention at startup, and a pull-up resistor on the SRAM chips’ chip-select pins should make a difference.</li>
<li>If that fails, buy a new MCF5307AI90B CPU online and replace it. This is very difficult to do without advanced soldering skills and the proper equipment; consider sending your Andromeda’s main board to <a href="https://videogamerepairs.ca">Daniel Wynne at VideogameRepairs.ca</a>. My repair bill came out to about $200 USD, but yours would likely be cheaper.</li>
<li>Whatever you do, don’t flip the cable that connects the front panel to the main board. This will blow an op-amp and maybe an analog multiplexer on the front panel - at the very least - and you’ll wind up with some non-functioning and glitchy knobs.</li>
</ul>
<p>If you’ve got an Alesis A6 Andromeda that’s in need of repair, stuck at the splash screen, not booting up, glitching out, or otherwise in a bad state: <a href="mailto:andromeda@petersobot.com">feel free to get in touch</a>, as I’m apparently a qualified Andromeda repair technician now.</p>
</blockquote>
<p><em>Was it worth it?</em> I definitely came out ahead, ignoring my own labour costs. <a href="https://reverb.com/p/alesis-andromeda#overflowing-row_heading-price-guide">As of 2022, Andromedas are selling on the second-hand market for somewhere between $3,000 and $5,000, according to Reverb</a>:<br>
<a href="https://svbtleusercontent.com/ruTKJg7fZyhVtb4do47Kwa0xspap.png"><img src="https://svbtleusercontent.com/ruTKJg7fZyhVtb4do47Kwa0xspap_small.png" alt="andromeda_price_history.png"></a></p>
<p>But does that mean I’ll be selling this synth? We’ll see. It’ll take a while to decide if this one-of-a-kind synth, which I put <em>so much</em> time into restoring, is worth getting rid of. (Maybe I’ll make a VST out of it instead. 👀)</p>
<hr>
<p>Thanks to <a href="https://musicmachinery.com">Paul Lamere</a>, <a href="https://zameermanji.com">Zameer Manji</a>, <a href="https://www.linklayer.com">Eric Evenchick</a>, and <a href="http://melatonin.dev">Sudara</a> for reviewing drafts of this post.</p>
<h1>Patching an Embedded OS from 1996 with Ghidra</h1>
<p>For reasons I won’t get into, I’ve been working on a tricky reverse engineering puzzle recently: how to patch the operating system of a 25-year-old synthesizer. To be specific, the <a href="https://www.vintagesynth.com/kurzweil/k2500.php">Kurzweil K2500</a>, a sample-based synthesizer released in 1996.</p>
<p><a href="https://svbtleusercontent.com/3zouwM9N4uforzGEfhdaUi0xspap.png"><img src="https://svbtleusercontent.com/3zouwM9N4uforzGEfhdaUi0xspap_small.png" alt="k2500xs_diagonal.png"></a></p>
<p>As with many digital musical instruments, this synthesizer is really just a computer with some extra chips. In this case, it’s a computer based around a CPU that was popular at the time: <a href="https://en.wikipedia.org/wiki/Motorola_68000">the Motorola 68000</a>, which was also famously used in the original Macintosh and the Sega Genesis. I want to patch the operating system of this beast to do all sorts of other things, most of which I’ll leave to the imagination in this already-very-long post.</p>
<h2 id="finding-the-operating-system_2">Finding the Operating System <a class="head_anchor" href="#finding-the-operating-system_2">#</a>
</h2>
<p>Modifying the operating system sounds great, but how do we get access to the code in the first place? Luckily, the K2500 operating systems <a href="https://kurzweil.com/content/migration/downloads/pub/Kurzweil/Pro_Products/K2000-K2vx-K2500/K2500/Operating_System/">are still provided by the manufacturer on what looks like an old FTP site</a>. Downloading and unzipping the operating system gives us a <code class="prettyprint">.KOS</code> file, which seems to be a custom format. Opening the file in <a href="https://hexfiend.com">Hex Fiend</a> shows its bytes directly:</p>
<p><a href="https://svbtleusercontent.com/xf2sbD8vBrz54qxagvDUnL0xspap.png"><img src="https://svbtleusercontent.com/xf2sbD8vBrz54qxagvDUnL0xspap_small.png" alt='A hex dump of "K25V00.KOS"'></a></p>
<p>Unfortunately, nothing stands out here. There seems to be a human-readable 4-byte header at the top: <code class="prettyprint">SYS0</code>, possibly followed by other header bytes, but it’s really hard to tell. Regardless, we already know that this operating system runs on a Motorola 68000 CPU. Let’s just try interpreting the data as a binary, and see how far we can get.</p>
<h2 id="enter-ghidra_2">Enter Ghidra <a class="head_anchor" href="#enter-ghidra_2">#</a>
</h2>
<p>The operating system file we’re using is <em>probably</em> raw machine code: literally the instructions and data interpreted by the CPU itself. To make any sense of this whatsoever, we’re going to need to <em>disassemble</em> it, to turn it back into assembly code - and hopefully eventually <em>decompile</em> it back into C-style code.</p>
<p>To do this, let’s use a tool called <a href="https://ghidra-sre.org">Ghidra</a>: an open-source reverse-engineering program built, maintained, and released by the United States National Security Agency. (Yes, that one. Really.) To start, let’s import the <code class="prettyprint">.KOS</code> file directly into Ghidra and analyze it with the default settings, which will search for instructions.</p>
<p><a href="https://svbtleusercontent.com/35HWgnWGJWDh3cToX5ntV0xspap.png"><img src="https://svbtleusercontent.com/35HWgnWGJWDh3cToX5ntV0xspap_small.png" alt="Untitled 4.png"></a></p>
<p>Scrolling through the file shows that parts of the data have been analyzed by Ghidra as valid 68k instructions, but much of the file remains unanalyzed. Strangely, scrolling further shows that Ghidra has correctly identified a number of human-readable strings in the file (great!), but the code seems to refer to those strings offset by some amount, showing up as cut-off strings in Ghidra.</p>
<p><a href="https://svbtleusercontent.com/3z17RxfqhFCbymoYVDxgYK0xspap.png"><img src="https://svbtleusercontent.com/3z17RxfqhFCbymoYVDxgYK0xspap_small.png" alt="Untitled 8.png"></a></p>
<p>This is because we just loaded the entire <code class="prettyprint">.KOS</code> file into Ghidra, ignoring the fact that it has a header and likely some other extra bytes. This is a pretty big problem. Any cross-references between functions will be inaccurate as we continue to reverse-engineer the data, sending us in the wrong direction nearly every time we try to follow a reference. We need to fix this first.</p>
<h2 id="reverse-engineering-the-bootloader_2">Reverse Engineering the Bootloader <a class="head_anchor" href="#reverse-engineering-the-bootloader_2">#</a>
</h2>
<p>To reverse-engineer the <code class="prettyprint">.KOS</code> file, it would be extremely useful to dig into the code that creates or consumes these files. We don’t have the creation code, but we do have access to the code that consumes these files: <a href="https://kurzweil.com/content/migration/downloads/pub/Kurzweil/Pro_Products/K2000-K2vx-K2500/K2500/Operating_System/Boot_Loader/">the bootloader for the synth itself, which is also still available online (edit: it appears they’ve taken this down, as of August 2022)</a>! Let’s load it into Ghidra and make an assumption to make our lives easier: let’s guess that the first 8 bytes of the file are part of a header.</p>
<blockquote>
<p>Where did that number come from? Well, I tried 0, +4, +8, +12, +16, and +20 byte offsets, and +8 produced the most correct disassembly. Yes, this took a while. In hindsight, all of this also happens to work because the code in the file gets loaded into address <code class="prettyprint">0x0</code> in memory. If it were loaded somewhere else, we’d have to figure out what that location is before we could effectively disassemble the code.</p>
</blockquote>
<p>Just like before, let’s look for something human-readable first. Searching through the strings brings up a couple error strings that seem like they might get thrown by the code we care about:</p>
<p><a href="https://svbtleusercontent.com/gKVzk72q5AxiF9pCfVcjhy0xspap.png"><img src="https://svbtleusercontent.com/gKVzk72q5AxiF9pCfVcjhy0xspap_small.png" alt="Untitled 12.png"></a></p>
<p>Ghidra has identified what it calls XREFs here - cross-references, indicating that these strings are referenced from a certain place. Let’s follow this reference:</p>
<p><a href="https://svbtleusercontent.com/gcBK4NjtDVJ8v5qgrjgB5k0xspap.png"><img src="https://svbtleusercontent.com/gcBK4NjtDVJ8v5qgrjgB5k0xspap_small.png" alt="Untitled 13.png"></a></p>
<p>Aha! Now we’re getting somewhere. This looks an awful lot like a switch statement, decompiled by Ghidra here as an <code class="prettyprint">if</code> tree. It seems like there are a series of error codes (<code class="prettyprint">0x100</code> through <code class="prettyprint">0x105</code>, then <code class="prettyprint">0x200</code>, <code class="prettyprint">0x201</code>, etc.) that each correspond with an error string that presumably gets printed on the screen. Let’s keep pulling on this thread. Using Ghidra’s “Find References” function, we end up at this function:</p>
<p><a href="https://svbtleusercontent.com/2DJ9WVpTN5TMHvFCPDQTjw0xspap.png"><img src="https://svbtleusercontent.com/2DJ9WVpTN5TMHvFCPDQTjw0xspap_small.png" alt="Untitled 15.png"></a></p>
<p>We’re getting closer! Ghidra’s done something great for us here: the decompiled code includes some variable names, automatically determined based on the strings that those variables point to. Given that we know that some of these variables are strings, we can take some guesses and use Ghidra’s “Rename” and “Retype” tools to make this function read a lot more clearly:</p>
<p><a href="https://svbtleusercontent.com/gMeWBvZRTUGa2oJ1RxqpPa0xspap.png"><img src="https://svbtleusercontent.com/gMeWBvZRTUGa2oJ1RxqpPa0xspap_small.png" alt="Untitled 16.png"></a></p>
<p>It looks like we have a two-stage process here: first, the new operating system file is checked by calling <code class="prettyprint">ActuallyCheckOrFlashTheOS?(0, ?)</code>. If the check passes, then the same function is called again with <code class="prettyprint">1</code>. It seems like that function probably reads the <code class="prettyprint">.KOS</code> file format we’re investigating: let’s dig in there.</p>
<p><a href="https://svbtleusercontent.com/rPojvMhT4qmmU5cx729fMG0xspap.png"><img src="https://svbtleusercontent.com/rPojvMhT4qmmU5cx729fMG0xspap_small.png" alt="Untitled 17.png"></a></p>
<p>This function doesn’t have any of the hints we saw before: there are no human-readable strings, nor any recognizable function names. Instead, we can look at the <u>structure</u> of this function to understand what it does. Even without variable names, the structure of this code looks pretty similar to opening a file in C! It looks like we have an <code class="prettyprint">fopen</code>-style call, followed by an <code class="prettyprint">fread</code>, followed by another <code class="prettyprint">fread</code> in a while loop. Let’s add comments to make this clearer.</p>
<p><a href="https://svbtleusercontent.com/2Qr5fommAbc8fv4V3yvrJA0xspap.png"><img src="https://svbtleusercontent.com/2Qr5fommAbc8fv4V3yvrJA0xspap_small.png" alt="Untitled 18.png"></a></p>
<p>With the comments, it seems we now have a couple questions answered:</p>
<ul>
<li>The <code class="prettyprint">.KOS</code> file starts with a 4-byte header: <code class="prettyprint">SYS0</code>
</li>
<li>After the header, the file is divided into fixed-size chunks</li>
<li>Each chunk starts with a single 4-byte integer</li>
<li>An unknown number of bytes of actual data are read</li>
<li>Each chunk ends with a single 4-byte integer which seems to be some sort of checksum</li>
</ul>
<p>However, there’s one new question we need to answer as well: why are certain constants and functions referenced at very high addresses in memory? (For example, <code class="prettyprint">0x021317ac</code> seems to contain the number of bytes in each chunk of the <code class="prettyprint">.KOS</code> file, but the data in the ROM doesn’t reach that high!)</p>
<p>To better understand what’s at those high addresses, let’s turn to the service manual for this unit. (Huge shoutout to <a href="https://www.linkedin.com/in/david-ryskalczyk-237747b/">David Ryskalczyk</a> for this idea!) Buried deep in a non-OCR’d PDF lies this useful tidbit of information, in a list of diagnostic procedures:</p>
<p><a href="https://svbtleusercontent.com/5ZAXWk4ttBBkp68tyS6eLw0xspap.png"><img src="https://svbtleusercontent.com/5ZAXWk4ttBBkp68tyS6eLw0xspap_small.png" alt="Untitled 19.png"></a></p>
<p>Thanks, service manual! It looks like <code class="prettyprint">0x021317ac</code> lands directly in the middle of this synth’s “volatile RAM” - the RAM used by the processor while it’s running.</p>
<blockquote>
<p>It’s great that we had the service manual as a reference here. Without this information, we could have made an educated guess based on address prefixes that show up often in the code. If that didn’t help, we could have tried to find an electrical schematic for the unit and traced the address lines coming from the various chips to the CPU. This stuff gets complicated <em>fast</em>.</p>
</blockquote>
<p>Let’s tell Ghidra to treat this as RAM in its “Memory Map” window, and then jump to near the address we’re interested in: <code class="prettyprint">0x021317ac</code>.</p>
<p><a href="https://svbtleusercontent.com/8tccFh78je8ZKcjR5SmwAn0xspap.png"><img src="https://svbtleusercontent.com/8tccFh78je8ZKcjR5SmwAn0xspap_small.png" alt="Untitled 21.png"></a></p>
<p>There’s no data here (as Ghidra knows this is RAM, which is randomly initialized when a computer starts), and it looks like the address in question is being read from (<code class="prettyprint">(R)</code>), but never written to (<code class="prettyprint">(W)</code>). Maybe the writes are happening further up?</p>
<p><a href="https://svbtleusercontent.com/kJu9hAXZ6qidEGWCifGPf20xspap.png"><img src="https://svbtleusercontent.com/kJu9hAXZ6qidEGWCifGPf20xspap_small.png" alt="Untitled 20.png"></a></p>
<p>Aha! Ghidra shows us that a function is writing directly to the start of RAM. Given that we had to scroll up 6,060 bytes to find the first write, maybe this method copies a bunch of data into RAM. Let’s click through to see what’s there.</p>
<p><a href="https://svbtleusercontent.com/eMEDtuGhFbk2JSJfW9dxBM0xspap.png"><img src="https://svbtleusercontent.com/eMEDtuGhFbk2JSJfW9dxBM0xspap_small.png" alt="Untitled 22.png"></a></p>
<p>Uh, one sec. Let’s rename some stuff again.</p>
<p><a href="https://svbtleusercontent.com/iV4t4noXP4p7a62LMoDya70xspap.png"><img src="https://svbtleusercontent.com/iV4t4noXP4p7a62LMoDya70xspap_small.png" alt="Untitled 23.png"></a></p>
<p>Much better. It looks like we’re copying a bunch of data from ROM into RAM - specifically from <code class="prettyprint">0x0001860a</code> to <code class="prettyprint">0x02130000</code>. How much is a bunch? Well, <code class="prettyprint">0x690</code> 32-bit long words, which works out to 6,720 bytes. (This snippet of code also then zeroes out the next <code class="prettyprint">0x1e1</code> 32-bit long words, or 1,924 bytes.) Now that we know that this part of RAM is probably initialized with the same data as that part of the ROM, we can tell Ghidra to map that part of the ROM to this part of the RAM directly.</p>
<p><a href="https://svbtleusercontent.com/m7P2tJGKvPaf2ZKzsDMXas0xspap.png"><img src="https://svbtleusercontent.com/m7P2tJGKvPaf2ZKzsDMXas0xspap_small.png" alt="Untitled 24.png"></a></p>
<p>Now, going back to the part of RAM that we were reading, we can see that there are bytes present instead of question marks. </p>
<p><a href="https://svbtleusercontent.com/rnK33wetySZthG6rGLWwEF0xspap.png"><img src="https://svbtleusercontent.com/rnK33wetySZthG6rGLWwEF0xspap_small.png" alt="Untitled 25.png"></a></p>
<p>It looks like the value stored at <code class="prettyprint">0x021317ac</code> is <code class="prettyprint">0x20000</code>, which works out to <strong>131,072 bytes</strong>! (We call this “128 kilobytes” because <a href="https://en.wikipedia.org/wiki/Kilobyte#Base_2_.281024_bytes.29">numbers are complicated</a>.)</p>
<p>Great! So we’ve now figured out that each chunk of the <code class="prettyprint">.KOS</code> file format is 128kB in size. That’s all we need to know to build a decoder for this format, remove the chunk headers, and end up with a file that will have correct relative offsets. This allows Ghidra to properly disassemble and decompile the file, and allows us to actually poke around at the operating system code. (I’ve gone ahead and done this already, and <a href="https://gist.github.com/psobot/bf50c2090bb0fbe5380aefaafea17eed">that <code class="prettyprint">.KOS</code> file packer/unpacker is available on GitHub</a>.)</p>
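<p>For illustration, a minimal unpacker based on what we’ve learned might look like the sketch below. (This is a simplification: the real script linked above also handles short final chunks and checksum verification, and may differ in exactly what the 128kB counts.)</p>
<pre><code class="prettyprint lang-python">import struct

CHUNK_SIZE = 0x20000  # 128kB per chunk, as discovered above

def unpack_kos(path: str) -> bytes:
    with open(path, "rb") as f:
        assert f.read(4) == b"SYS0"  # 4-byte file header
        payload = bytearray()
        while True:
            chunk_header = f.read(4)  # the 4-byte integer that starts each chunk
            if not chunk_header:
                break  # end of file
            payload += f.read(CHUNK_SIZE)  # assumes the chunk size counts only data
            (checksum,) = struct.unpack(">I", f.read(4))  # trailing checksum (unverified here)
        return bytes(payload)
</code></pre>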
<h2 id="exploring-the-operating-system_2">Exploring the Operating System <a class="head_anchor" href="#exploring-the-operating-system_2">#</a>
</h2>
<p>Alright, we’ve now got a “clean” dump of the operating system. Let’s open up that file in Ghidra, just like we tried to before. Let’s use Ghidra’s search function to find some interesting strings.</p>
<p><a href="https://svbtleusercontent.com/mnnhPaG33kotJD1mrmSCgN0xspap.png"><img src="https://svbtleusercontent.com/mnnhPaG33kotJD1mrmSCgN0xspap_small.png" alt="Untitled 26.png"></a></p>
<p>Let’s do a quick test to see if we can modify the operating system successfully. Ghidra lets you change instructions or data in a binary, so let’s change one of these strings to contain different text. (We’ll need to keep the length the same, to avoid moving other code around.)</p>
<p>After re-packing the operating system, let’s load it onto a floppy disk, install it on the real hardware, and…</p>
<p><a href="https://svbtleusercontent.com/kSwaFD7KJaMhvRRnqUC2eN0xspap.png"><img src="https://svbtleusercontent.com/kSwaFD7KJaMhvRRnqUC2eN0xspap_small.png" alt="fake_error.png"></a></p>
<p>What gives? Well, remember that “some sort of checksum” field we saw in the <code class="prettyprint">.KOS</code> format earlier? It turns out that field is actually checked by the hardware when installing a new OS. Luckily, Ghidra can help us here too - let’s go back to the bootloader and click through to <code class="prettyprint">FUN_0x021302b2</code>, which looks like it computes some sort of checksum for us.</p>
<p><a href="https://svbtleusercontent.com/wQKEXn6U92waidpf4BaSvc0xspap.png"><img src="https://svbtleusercontent.com/wQKEXn6U92waidpf4BaSvc0xspap_small.png" alt="Untitled 37.png"></a></p>
<p>And again, after guessing at some variable names:</p>
<p><a href="https://svbtleusercontent.com/5pEtT4NrqcWRazLRTk2vP70xspap.png"><img src="https://svbtleusercontent.com/5pEtT4NrqcWRazLRTk2vP70xspap_small.png" alt="Untitled 38.png"></a></p>
<p>It looks like this checksum function is pretty simple: for each byte <code class="prettyprint">x</code>, the new checksum is <code class="prettyprint">x + checksum</code> rotated left by one bit - that is, <code class="prettyprint">x + checksum</code> shifted left by one bit, bitwise OR’d with <code class="prettyprint">x + checksum</code> shifted right by 31 bits. That’s a neat checksum I hadn’t seen before, and one that even advanced checksum-reversing tools like the wonderful <a href="https://github.com/8051Enthusiast/delsum">delsum</a> can’t figure out.</p>
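<p>In Python, that works out to something like this - a minimal sketch, assuming a 32-bit accumulator that starts at zero:</p>
<pre><code class="prettyprint lang-python"># A sketch of the checksum described above: add each byte to a 32-bit
# accumulator, then rotate the accumulator left by one bit.
def kos_checksum(data: bytes) -> int:
    checksum = 0
    for x in data:
        total = (checksum + x) & 0xFFFFFFFF
        checksum = ((total << 1) | (total >> 31)) & 0xFFFFFFFF
    return checksum
</code></pre>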
<p>With this checksum, we can now change <a href="https://gist.github.com/psobot/bf50c2090bb0fbe5380aefaafea17eed">our <code class="prettyprint">.KOS</code> file dumping script</a> to properly re-pack new data with correct checksums. And once that’s done, let’s try flashing the OS again:</p>
<p><a href="https://svbtleusercontent.com/o8cVnXRvAzL6jLoqkCNUsR0xspap.png"><img src="https://svbtleusercontent.com/o8cVnXRvAzL6jLoqkCNUsR0xspap_small.png" alt="fake_success.png"></a></p>
<p>We can now flash a new operating system onto this hardware, modifying or extending its capabilities however we’d like. (That part, however, is left as an exercise for the reader.)</p>
<h2 id="what-we-learned_2">What We Learned <a class="head_anchor" href="#what-we-learned_2">#</a>
</h2>
<p>Wow, that was a bit of an ordeal. I’d never used Ghidra before trying this project, and now I feel comfortable enough to use it for future absurdly-obscure retrocomputing reverse engineering. The techniques that seemed to work the best were:</p>
<ul>
<li>Look for human-readable strings first.</li>
<li>Don’t be afraid to <a href="https://en.wiktionary.org/wiki/yak_shaving">take a “side quest”</a> (like reverse engineering a bootloader) to make your primary effort (patching an operating system) more successful.</li>
<li>Use Ghidra’s decompiler. It’s really amazing.</li>
<li>Look for structure in the decompiled code.</li>
<li>Rename functions, variables, and data types as soon as you have even a guess at what they might do.</li>
<li>Look up documentation and resources for the system if they’re available.</li>
<li>Sometimes, you just have to manually step through dozens of possible examples to find what you’re looking for. (It gets easier the more you do it!)</li>
</ul>
<p>(And thanks in part to this reverse engineering, <a href="https://github.com/mamedev/mame/pull/9545">the K2000 emulation in the MAME project now boots</a>!)</p>
<hr>
<p>Special thanks to David Ryskalczyk for unblocking my work halfway through this project, and to David Ryskalczyk and Zameer Manji for reviewing drafts of this post.</p>
tag:blog.petersobot.com,2014:Post/machine-learning-for-drummers2018-07-22T19:24:12-07:002018-07-22T19:24:12-07:00Machine Learning for Drummers<p>TL;DR: In this post, I build an app that classifies whether an audio sample is a kick drum, snare drum, or other drum sample with 87% accuracy, using 🎉 machine learning 🎉.</p>
<p>First and foremost, I’m a drummer. At my day job, I work on machine learning systems for recommending music to people at <a href="https://spotify.com">Spotify</a>. But outside my 9-to-5, I’m a musician, and my journey through music started as a drummer. When I’m not drumming in my spare time, I’ll often be creating electronic music - with a lot of percussion in it, of course.</p>
<p>If you’re not familiar with electronic music production, many (if not most) modern electronic music uses <em>drum samples</em> rather than real, live recordings of drummers to provide the rhythm. These drum samples are often distributed commercially, as sample packs, or created by musicians and shared for free online. Often, though, these samples can be hard to use, as their labeling and classification leaves a lot to be desired:</p>
<p><a href="https://svbtleusercontent.com/hgcDr9LYBJd8hVpFYqkj8T0xspap.png"><img src="https://svbtleusercontent.com/hgcDr9LYBJd8hVpFYqkj8T0xspap_small.png" alt="kicks.png"></a></p>
<p>Various companies have tried to tackle this problem by creating their own proprietary formats for sample packs, such as Native Instruments’ <em>Battery</em> or <em>Kontakt</em> formats. Both use explicit metadata and allow users to browse samples by a variety of tags. However, these are all (usually) expensive software packages and require you to learn their workflows.</p>
<p>In an effort to better understand how to use machine learning techniques, I decided to use machine learning to try to solve this fairly simple problem:</p>
<blockquote class="short">
<p><em>Is a given audio file a sample of a kick drum, snare drum, hi-hat, other percussion, or something else?</em></p>
</blockquote>
<p>For example, which drums do these two samples sound like?</p>
<p><audio style="width:100%;"></audio></p>
<p><audio style="width:100%;"></audio></p>
<p>Humans have no trouble classifying these two sounds, as we’ve likely heard them tens of thousands of times before. The human brain is great at this kind of problem - computers, however, require some training.</p>
<p>In machine learning, this is often called a <a href="https://en.wikipedia.org/wiki/Statistical_classification">classification problem</a>, because it takes some data and <em>classifies</em> (as in <em>chooses a class for</em>) it. You might think of this as a kind of <strong>automated sorting system</strong> (although I’m using the word “sorting” here to mean “sort into groups” rather than “to put in a specific ranking or order”).</p>
<p>If you’re unfamiliar with machine learning, you might ask:</p>
<blockquote class="short">
<p>Why not just train the computer to learn what a kick drum is (and so on) by giving it a whole bunch of data?</p>
</blockquote>
<p>This is <em>mostly</em> correct already! (Hooray, you’re a machine learning engineer!)</p>
<p>The trouble comes from deciding what <em>data</em> means in the above sentence. We could:</p>
<ol>
<li>Give the computer all of the data we have and let “<em>machine learning</em>” figure out what’s important and what’s not.</li>
<li>
<em>or</em> give the computer all of the data we have, but do a bit of pre-processing first to hint at parts of the data that might be important, then have “<em>machine learning</em>” classify our samples for us.</li>
</ol>
<p>Option 1 above is tricky, as our data comes in many different forms - long audio files, short audio files, different formats, different bit depths, sample rates, and so on, which would add a ton of complexity to our algorithm. Throwing all of this at a machine and asking it to make sense of it would require a <em>lot</em> of data for it to figure out what we humans already know.</p>
<p>Instead of making the computer do a ton of extra work, we can use option 2 as a middle ground: we can choose some <em>things</em> about the audio samples that we think might be relevant to the problem, and provide those <em>things</em> to a machine learning algorithm and have it do the math for us. These <em>things</em> are known as <a href="https://en.wikipedia.org/wiki/Feature_(machine_learning)"><em>features</em></a>.</p>
<p>(If this word is confusing, think of a feature just like a feature of, say, a TV - only instead of “42-inch screen” and “HDMI input”, our features might be “4.2 seconds long” and “maximum loudness 12dB”. The word means the same thing in both contexts.)</p>
<p>This process of figuring out what features we want to use is commonly known as <em>feature extraction</em>, which makes sense. Given our input data (audio files), let’s come up with a list of features that we, as humans, might find relevant to deciding if the file is a kick drum or a snare drum.</p>
<ul>
<li><p><strong>Overall file length</strong> is one simple feature - it’s easy to measure, and it’s possible that maybe a snare drum’s sound continues on for longer than a kick drum’s sound. (To prevent us from getting false positives here, let’s only count the length of time that the sound is not silent, or <strong>not quieter than -60dB</strong>, in the file.)</p></li>
<li><p><strong>Overall loudness</strong> might sound like a great feature to use (as maybe kicks are louder than snares?) but most samples used in electronic music are <a href="https://en.wikipedia.org/wiki/Audio_normalization"><em>normalized</em></a>, meaning their loudness is adjusted to be consistent between files. Instead, we can use <strong>maximum loudness</strong>, <strong>minimum loudness</strong>, and <strong>loudness at middle</strong> (that is, loudness at the 50% mark through the file) to get a better idea for how the loudness changes over time. Drum hits should be loudest at the start of the sample, and should quickly taper off to silence.</p></li>
<li><p>Humans can tell the difference between kick drums and snare drums intuitively, and we do so by listening to the frequencies present in the sound. Kick drum samples have a lot more low-frequency content in them, as kick drums sound low and bassy due to their large diameter. To teach this to a machine learning algorithm, we can take the <strong>average loudness in several frequency ranges</strong> to tell the algorithm a little more about the <a href="https://en.wikipedia.org/wiki/Timbre">timbre of the sound</a> as humans might hear it. (To better represent how this changes over time, we might take this loudness-per-frequency-band feature at regular intervals throughout the sample - 0% length, 5%, 50%, and so on.)</p></li>
<li><p>Drums, while being very percussive instruments, <a href="https://en.wikipedia.org/wiki/Drum_tuning">can still be <strong>tuned</strong></a> to various pitches. To quantify this tuning and help our algorithm use it as input, we can take the <a href="https://en.wikipedia.org/wiki/Fundamental_frequency">fundamental frequency</a> of the sample to help the algorithm distinguish between high drums and low drums.</p></li>
</ul>
<p>These are just some of the many features that might be useful for solving our classification problem, but let’s start with these four and see how far we get.</p>
<p>As with all machine learning problems, to teach the machine to do something, you have to have some sort of <em>training data</em>. In this case, I’m going to use a handful of samples - roughly 20-30 from each instrument - from the tens of thousands of samples I have in my sample collection. When choosing these samples, I want to find:</p>
<ul>
<li>samples that are representative of the different types of each instrument
(e.g.: a few acoustic kick drums, some electronic kick drums, some beatboxed kicks, and so on)</li>
<li>samples from different sources that might have different biases that humans have a harder time picking up on (e.g.: are all samples from one sample pack the exact same length? what about the same fundamental frequency?)</li>
<li>samples of things that <em>aren’t</em> drums, so that the algorithm can learn when a sample falls into the “something else” bucket</li>
</ul>
<p>I put together a list of these samples - 100 files, roughly 50 megabytes of sample data, in five separate folders: <code class="prettyprint">kick</code>, <code class="prettyprint">snare</code>, <code class="prettyprint">hat</code>, <code class="prettyprint">percussion</code>, and <code class="prettyprint">other</code>. (Most of these samples are from freesound.org and are licensed under a Creative Commons Attribution License, so special thanks to <a href="https://freesound.org/people/waveplay/">waveplay</a>, <a href="https://freesound.org/people/Seidhepriest/">Seidhepriest</a>, and <a href="https://freesound.org/people/quartertone">quartertone</a> for making their samples available for free!)</p>
<p>Now that we’ve got some data to train on, let’s write some code to perform the feature extraction mentioned earlier. These features aren’t super hard for us to calculate, but they’re also not super simple, so I’ve written some code below to extract them by using <a href="https://librosa.github.io/librosa/"><code class="prettyprint">librosa</code></a>, a wonderful Python library for audio analysis by the wonderful <a href="http://bmcfee.github.io/">Brian McFee</a> et al.</p>
<p>(All of the code in this blog post is <a href="https://github.com/psobot/machine-learning-for-drummers">available on Github</a> - feel free to download it and try running it on your own machine if you’re interested.)</p>
<pre><code class="prettyprint lang-python"># from feature_extract.py
def features_for(file):
# Load and trim the audio file to only the parts that aren't silent.
audio, rate = load_and_trim(file)
# Use poorly_estimate_fundamental to figure out what the rough
# pitch is, along with the standard deviation - how much it varies.
fundamental, f_stddev = poorly_estimate_fundamental(audio, rate)
# Like an equalizer, find out how loud each "frequency band" is.
# In this case, we're just splitting up the audio spectrum into
# three very wide sections, low, mid, and high.
low, mid, high = average_eq_bands(audio, 3)
return {
"duration": librosa.get_duration(audio, rate),
"start_loudness": loudness_at(audio, 0),
"mid_loudness": loudness_at(audio, len(audio) / 2),
"end_loudness": loudness_at(audio, len(audio)),
"fundamental_freq": fundamental,
"fundamental_deviation": f_stddev,
"average_eq_low": low,
"average_eq_mid": mid,
"average_eq_high": high,
}
</code></pre>
<p>Now we’ve got a number of features extracted from each sample. We can save these as one large JSON file for use later by our machine learning algorithm. (We haven’t done any learning yet, just figured out the data that we want to learn <em>with</em>.)</p>
<p>You can think of these features as measurements we’re taking of the samples, without having to use the entire contents of the samples themselves. (And that’s very true in this case - we started with over 50 megabytes of samples, but the features themselves are only 150 kilobytes - that’s more than 300 times smaller!)</p>
<p>Now, we can take these features and give them to a machine learning algorithm and have it learn from them. But hold on a sec - let’s get specific about which algorithm we’re talking about, and about what learning means in this context.</p>
<p>We’re going to use an algorithm called a <a href="https://en.wikipedia.org/wiki/Decision_tree">decision tree</a> in this post, which is a commonly used machine learning algorithm that <em>doesn’t</em> involve some of the buzzwords that you may have heard, like “neural networks,” “deep learning,” or “artificial intelligence.” A decision tree is a <strong>system that splits data into categories by learning thresholds for each feature</strong> in a recursive way. (If that’s confusing, don’t worry too much about it - but check out <a href="http://www.r2d3.us/visual-intro-to-machine-learning-part-1/">R2D3’s amazing visual example of how decision trees work</a> if you’re curious.)</p>
<pre><code class="prettyprint lang-python"># from classifier.py
def train_and_evaluate_model():
# First, let's read the features that we got from feature_extract.
features, classes, sample_names, _, _ = read_data()
# Let's use this percentage of the data to train, and the rest for
# testing. Why not just train on all the data? That would result in
# a model that is overfitted, or overly good at the data that it's
# seen and does poorly with data that it hasn't seen.
training_percentage = 0.75
num_training_samples = int(len(features) * training_percentage)
# Here we separate all of our features and classes into just the
# ones we want to train on...
train_features = features[:num_training_samples]
train_classes = classes[:num_training_samples]
# ...and we do the training, which creates our model!
# vvv MACHINE LEARNING HAPPENS ON THIS LINE BELOW vvv
model = DecisionTreeClassifier().fit(train_features, train_classes)
# ^^^ MACHINE LEARNING HAPPENS ON THIS LINE ABOVE ^^^
</code></pre>
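<p>The excerpt above stops right after training. The evaluation step isn’t shown, but a minimal sketch of it - reusing the same variable names, and scikit-learn’s real <code class="prettyprint">score</code> method - might look something like this:</p>
<pre><code class="prettyprint lang-python">    # A sketch only - not the code from this post's repository.
    # Hold out the remaining 25% of the data for testing...
    test_features = features[num_training_samples:]
    test_classes = classes[num_training_samples:]
    # ...then score the model on both the seen and unseen data.
    print("Training accuracy:", model.score(train_features, train_classes))
    print("Test accuracy:", model.score(test_features, test_classes))
</code></pre>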
<p>In this case, <code class="prettyprint">classifier.py</code> <em>trains a model</em> by creating a decision tree - which <em>is</em> our <strong>model</strong> - whose weights are statistically determined by the data that we pass in. Again, the specifics aren’t necessary to understand for the rest of this post, but here’s what a similar model looks like when visualized:<br>
<a href="https://svbtleusercontent.com/afzqagCv9TqY5LkQTP5zPD0xspap.png"><img src="https://svbtleusercontent.com/afzqagCv9TqY5LkQTP5zPD0xspap_small.png" alt="graph.png"></a><br>
Each new sample is passed into this tree, and the features that we provided are evaluated from the top down. For example, if a new sample has <code class="prettyprint">average_eq_2_10 ≤ -56.77</code>, as the top block in the diagram shows, the decision tree would move to the left and then check its <code class="prettyprint">fundamental_5</code> feature. It would continue to do so until it reaches the bottom of the tree, or a “leaf” (ha, tree, leaf, get it?), where it would declare that the given sample belongs to whatever <code class="prettyprint">class</code> (or colour, in this diagram) the leaf represents.</p>
<p>Now, if we run <code class="prettyprint">classifier.py</code>, we should see two results: the <strong>training</strong> accuracy (how well the model predicted the kind of sample for samples that it saw during training) and the <strong>test</strong> accuracy (how well the model predicted samples that it hadn’t seen before). Our training accuracy is 100%, which is not surprising - that data was used to create the model in the first place! And thanks to the features we selected, the model got most guesses (~87%) correct on samples it hadn’t seen before. This is pretty good for a first try! (If you run this code on your own laptop, you should find that it takes roughly 12 seconds to train on the provided example data.)</p>
<p><a href="https://svbtleusercontent.com/7asxppXK2cbj37RdeZJsj40xspap.png"><img src="https://svbtleusercontent.com/7asxppXK2cbj37RdeZJsj40xspap_small.png" alt="classification_results.png"></a></p>
<p>Our 87% accuracy is decent, but that 13% error rate might be considered an example of what’s called <em>overfitting</em> - our model has been trained to be overly specific: it’s completely accurate on data that it’s seen before, but it has trouble when it sees data that’s new to it. In some sense, this is similar to how humans learn; when someone sees something new that they hadn’t seen in school or heard about before, they’re bound to make mistakes.</p>
<p>To avoid overfitting our model, we could take a number of approaches:</p>
<ul>
<li>We could tune the algorithm’s parameters to try to force it to be less specific. This is a good place to start, especially with decision tree algorithms.</li>
<li>We could change our feature calculation to give more data to the algorithm, possibly introducing data that seems unintuitive to humans but would mathematically help solve our classification problem.</li>
<li>We could add more (and more varied) data so that the decision tree algorithm can create a more general tree, assuming that the existing set of data isn’t complete enough.</li>
</ul>
<p>All three of these are valid approaches, and they’re also left up to the reader to investigate. We could also try other classification methods instead of using a decision tree, although surprisingly a naïve decision tree works pretty well for this problem.</p>
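<p>As a concrete starting point for the first approach above: scikit-learn’s <code class="prettyprint">DecisionTreeClassifier</code> exposes parameters that limit how specific the tree is allowed to get. This is only a sketch of the idea (the values are arbitrary starting points, not tuned for this dataset):</p>
<pre><code class="prettyprint lang-python"># A sketch of constraining the tree to fight overfitting; the parameter
# values here are arbitrary and worth experimenting with.
model = DecisionTreeClassifier(
    max_depth=5,          # don't let the tree grow arbitrarily deep
    min_samples_leaf=3,   # require at least 3 samples at every leaf
    random_state=42,      # seed the RNG for reproducible trees
).fit(train_features, train_classes)
</code></pre>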
<p>So! We’ve built a machine learning classifier for drum samples. <strong><u>That’s kinda cool.</u></strong> There are a couple things to note about this system:</p>
<ul>
<li>We do our machine learning training on <em>features</em>, rather than the audio data itself. This means that if we wanted to write a program to classify new, unknown samples against this model, it would first have to run the sample through the same logic that’s in <code class="prettyprint">feature_extract.py</code> before it would be compatible with the model.</li>
<li>The current model is held <em>in memory</em> and never written out to disk. This is somewhat impractical, and in a real-world machine learning system, you’d likely save the model as a separate file that you could then pass around and use in different situations; see the sketch after this list. (In many popular machine learning systems, models are trained routinely on up to <em>terabytes</em> of input data, rather than the 50 megabytes we used here, so storing the resulting model on disk is very necessary.)</li>
<li>We’re currently training this model on around 150 samples, which gives okay results and allows us to test this model training in seconds rather than minutes or hours. We could try training this on <em>literally all of the samples available to us</em>, which might give much better results. (In tests on my entire sample library, I was able to get up to 90% accuracy, which is pretty good for a simple decision tree.)</li>
<li>This model is a <em>classifier</em>, which means that while it can put samples into buckets of sorts (and even give probability of a sample being in a bucket) it can’t tell you how much, say, a snare sounds like a kick. If you want to place your sounds along a continuous scale rather than into buckets, you’ll need another kind of machine learning algorithm.</li>
<li>The algorithm used by <code class="prettyprint">scikit</code> uses a random number generator to choose how to create its decision tree. If this model were to be used in production, this random number generator should be <a href="https://en.wikipedia.org/wiki/Random_seed">seeded</a> to allow for exactly reproducible results, which makes it easier to test, debug, and use the model.</li>
</ul>
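<p>On the second point above: in scikit-learn, saving a model to disk is a one-liner with <code class="prettyprint">joblib</code>. A minimal sketch (the file name is arbitrary):</p>
<pre><code class="prettyprint lang-python">import joblib

# Save the trained model to disk...
joblib.dump(model, "drum_classifier.joblib")

# ...then, in another process (or on another machine), load it back
# and use it to classify the features of new, unseen samples.
model = joblib.load("drum_classifier.joblib")
</code></pre>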
<p>If you’ve got your own sample library, or want to give this problem a try with samples you’ve found online, go for it! All of the code from this blog post is available <a href="https://github.com/psobot/machine-learning-for-drummers">here on Github</a>, and you can pop in your own sample packs and have fun. Some other things to try:</p>
<ul>
<li>Try using different features. <code class="prettyprint">librosa</code> is very advanced and exposes many parameters about the audio it’s analyzing - choose as many features as you’d like and try to improve your accuracy!</li>
<li>Try tuning the algorithm used for machine learning. Scikit’s <code class="prettyprint">DecisionTreeClassifier</code> has a lot of options that might improve accuracy by a lot. (If you end up trying to optimize this automatically, that’s called <a href="https://en.wikipedia.org/wiki/Hyperparameter_optimization">hyperparameter optimization</a> and is its own field of study within machine learning.)</li>
<li>Try throwing new kinds of audio files at this system to see what breaks. My training and test datasets didn’t include any longer audio files, full songs, podcasts, or other audio files that you might find. See how those files work with this model and see if you can improve it to handle those cases better.</li>
</ul>
<hr>
<p>Special thanks to <a href="http://jamie-wong.com/">Jamie Wong</a>, <a href="https://zameermanji.com/">Zameer Manji</a>, <a href="http://www.isaacezer.com/">Isaac Ezer</a>, and <a href="http://markkoh.net/">Mark Koh</a> for their proofreading and feedback on this post.</p>
tag:blog.petersobot.com,2014:Post/echo-dot-vs-chromecast-audio-an-evaluation2017-06-25T08:30:26-07:002017-06-25T08:30:26-07:00Echo Dot vs. Chromecast Audio: An Evaluation<p>I recently came into possession of both an <a href="https://www.amazon.com/All-New-Amazon-Echo-Dot-Add-Alexa-To-Any-Room/dp/B01DFKC2SO">Amazon Echo Dot</a> and a <a href="https://store.google.com/us/product/chromecast_audio?hl=en-US">Google Chromecast Audio</a>, two devices that can both stream music to speakers. While the Echo Dot includes voice control features and does much more than just play music, both devices can stream Spotify, which is basically all I use them for. So which sounds better?</p>
<blockquote>
<p><em>Disclaimer: As of time of posting, I am employed as a software engineer at Spotify, but this post does not reflect the views, opinions or position of my employer.</em></p>
</blockquote>
<p>A number of <a href="http://www.avsforum.com/forum/173-2-channel-audio/2659049-amazon-echo-dot-vs-chromecast-audio-streaming.html">forum posts</a> around the web feature audiophiles claiming that one device clearly sounds better, even after <a href="https://support.google.com/chromecast/answer/6290498?hl=en">enabling “Full Dynamic Range”</a> (really just turning off a built-in compressor) on the Chromecast. As I had already biased myself by reading these posts, I decided to perform an objective test.</p>
<p>To test this, I connected the 3.5mm audio outputs from each device to a <a href="https://www.presonus.com/products/audiobox-usb">USB audio interface</a> and streamed <a href="https://open.spotify.com/track/2L2ifQOG4wIdJDZh7ZgAqD">the same song</a> via each device’s Spotify integration.</p>
<iframe src="https://open.spotify.com/embed/track/2L2ifQOG4wIdJDZh7ZgAqD" width="300" height="100"></iframe>
<p>I took the resulting audio files and ran them through <a href="https://www.izotope.com/en/products/repair-and-edit/rx/rx-advanced.html">a spectrogram, followed by a spectral analyzer</a> to get an estimate of the real-world frequency response of each device.</p>
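<p>(If you’d like to reproduce this kind of estimate without iZotope’s tools, a rough equivalent can be computed in Python with <code class="prettyprint">scipy</code>. This is a sketch, with hypothetical file names:)</p>
<pre><code class="prettyprint lang-python"># A rough stand-in for a spectral analyzer: Welch's method averages
# the spectrum over time. The .wav file names are hypothetical.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def average_spectrum(path):
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # mix stereo down to mono
    freqs, power = welch(audio, fs=rate, nperseg=8192)
    return freqs, 10 * np.log10(power + 1e-12)  # convert power to dB

chromecast_freqs, chromecast_db = average_spectrum("chromecast_audio.wav")
echo_dot_freqs, echo_dot_db = average_spectrum("echo_dot.wav")
</code></pre>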
<h2 id="the-spectrograms_2">The Spectrograms <a class="head_anchor" href="#the-spectrograms_2">#</a>
</h2><h3 id="chromecast-audio_3">Chromecast Audio <a class="head_anchor" href="#chromecast-audio_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/rhgnvzwshkcg.png"><img src="https://svbtleusercontent.com/rhgnvzwshkcg_small.png" alt="chromecast_audio_starchy.png"></a></p>
<h3 id="echo-dot_3">Echo Dot <a class="head_anchor" href="#echo-dot_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/wfdpyrqck5cyrg.png"><img src="https://svbtleusercontent.com/wfdpyrqck5cyrg_small.png" alt="echo_dot_starchy.png"></a></p>
<h2 id="the-frequency-responses_2">The Frequency Responses <a class="head_anchor" href="#the-frequency-responses_2">#</a>
</h2><h3 id="chromecast-audio_3">Chromecast Audio <a class="head_anchor" href="#chromecast-audio_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/zbfl4lhersbeza.png"><img src="https://svbtleusercontent.com/zbfl4lhersbeza_small.png" alt="chromecast_audio_starchy_spectrum.png"></a></p>
<h3 id="echo-dot_3">Echo Dot <a class="head_anchor" href="#echo-dot_3">#</a>
</h3>
<p><a href="https://svbtleusercontent.com/uv5v0eglaxsq.png"><img src="https://svbtleusercontent.com/uv5v0eglaxsq_small.png" alt="echo_dot_starchy_spectrum.png"></a></p>
<h2 id="the-conclusions_2">The Conclusions <a class="head_anchor" href="#the-conclusions_2">#</a>
</h2>
<p>Both devices performed very similarly in this simple test, but the Echo Dot seems to have a visible roll-off at around 16.5kHz, which is just barely within the audible range for most people. The Echo Dot also seemed to have imperceptibly worse stereo performance, with the left channel being about 0.25dB quieter than the right.</p>
<p>As a result of this test, I’m going to continue to use both the Echo Dot and Chromecast Audio, as that gives me the best of both worlds - convenient Spotify streaming with voice control, as well as casting arbitrary, high quality audio content from Google Cast-enabled devices. (It doesn’t hurt that the Echo Dot was free via <a href="https://developer.amazon.com/alexa-skills-kit/alexa-developer-skill-promotion">Amazon’s June 2017 “Publish a Skill, Get an Echo Dot” promotion</a>.) And to have both devices connected to a pair of powered studio monitors at the same time, I’m going to build <a href="http://www.instructables.com/id/Altoids-Tin-18-Stereo-Mixer/">a passive summing stereo mixer</a>.</p>
<h3 id="further-research_3">Further Research <a class="head_anchor" href="#further-research_3">#</a>
</h3>
<p>If I were to repeat this test, I’d take care to set my audio interface to 96kHz instead, as it’s possible that each device used a different sample rate, and that it’d be possible to see the difference when testing with a higher sample rate. It’s also possible that the quality difference comes from different source material - i.e.: the Chromecast Audio might use the 320kbps Ogg Vorbis stream from Spotify, while the Echo Dot might stream the 160kbps version. (I’d expect to see a more dramatic change in the frequency response in that case, though.)</p>
tag:blog.petersobot.com,2014:Post/debugging-an-empty-spam-email2016-10-12T09:52:51-07:002016-10-12T09:52:51-07:00Debugging an Empty Spam Email<p>Despite the best efforts of modern spam filters, we all still receive spam once in a while. When I see a spam email pop up in my main inbox, I often wonder what magic the spammer has discovered that allowed them to bypass Gmail’s spam filtering. (Often times, this translates into me being much more suspicious of a spam email than usual, as it must be “more advanced” in some way to have landed in my inbox.)</p>
<p>Just this past week, I received one such email. It had no subject, no body, was addressed to no one, but was cc’d to myself and 29 other Peters.</p>
<p><a href="https://svbtleusercontent.com/60baspv8nuvilq.png"><img src="https://svbtleusercontent.com/60baspv8nuvilq_small.png" alt="6C6C2CBC-9F08-4990-AA70-8D6B326C9717.png"></a></p>
<p>(The “…” box provided by Gmail did not expand or collapse any content when clicked.)</p>
<p>A side note on the recipients - it looks like most of the other unlucky email addresses contained the string “peter” in either the local part or the domain part. Interestingly, some of the recipients’ addresses did not contain the string “peter” at all, but visiting their domains revealed that they belonged to people named Peter. I suspect some other metadata was involved in choosing this list.</p>
<p>The return path of the email was a free account at a Russian webmail provider, bk.ru. It’s hard to tell if the spammer owns this email address, or compromised its credentials and is using it to send out spam, but I’m guessing the latter is true.</p>
<p>This email confused me for a few reasons. Why would a spammer waste time sending out an empty email? What’s the point of a spam email that has no content? To dig deeper into what’s in this email (and it’s not empty, that’s for sure) we’re going to have to look at the raw email body itself. Gmail provides access to the raw message body with the “Show original” option in its drop-down menu:</p>
<p><a href="https://svbtleusercontent.com/fpvskmy3qvh3a.png"><img src="https://svbtleusercontent.com/fpvskmy3qvh3a_small.png" alt="F4B7E714-CA29-4999-9140-A86250C3F52D.png"></a></p>
<p>Clicking on “Show original” will show a summary of the original message, as well as the original message body itself:</p>
<p><a href="https://svbtleusercontent.com/anqboi04n2hnpg.png"><img src="https://svbtleusercontent.com/anqboi04n2hnpg_small.png" alt="958ED991-791D-4890-8CA6-30872B484769.png"></a></p>
<p>If you’re not familiar with raw email message bodies, they’re not unlike HTTP requests. They start with headers, one header per line (with header names separated from values by colons). The end of the headers is indicated by a double newline (“\n\n”, or “\r\n\r\n”, depending on the line-ending convention in use). These headers contain everything from the sender’s email address to the recipients, to the servers in between that received and forwarded messages. Of particular importance, though, is the Content-Type header:</p>
<pre><code class="prettyprint">Content-Type: multipart/alternative; boundary="--ALT--FP504ntv5azlR7xUQktA3MxnXkgct5eW1475692425"
</code></pre>
<p>As in HTTP, this header denotes the MIME type of the content. This email, like most nowadays, is a <code class="prettyprint">multipart</code> email (as defined by <a href="https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html">RFC 1341</a>), which means it can contain multiple distinct parts. The <code class="prettyprint">multipart/alternative</code> type is a particular kind of multipart message that specifies its parts are semantically equivalent, but presented in different formats. This is how most HTML emails work, to preserve backwards compatibility with email clients that can’t (or are configured not to) display HTML emails. From <a href="http://stackoverflow.com/questions/3902455/smtp-multipart-alternative-vs-multipart-mixed#comment49349618_3984262">StackOverflow</a>:</p>
<blockquote>
<p>The last entry is the best/highest priority part, so you probably want to put the <code class="prettyprint">text/html</code> part as the last subpart. Per <a href="https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html">RFC 1341</a>.</p>
</blockquote>
<p>By specifying both text and HTML parts, older email clients can display the text part that they know how to render, while newer clients can display the HTML.</p>
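<p>(Python’s standard <code class="prettyprint">email</code> module understands this structure, which makes poking at a message like this one easy. A quick sketch - the file name is hypothetical:)</p>
<pre><code class="prettyprint lang-python">import email

# Parse a raw message, e.g. one saved from Gmail's "Show original" view...
with open("original.eml") as f:
    message = email.message_from_file(f)

# ...and walk its MIME parts. get_payload(decode=True) undoes the
# base64 Content-Transfer-Encoding for us automatically.
for part in message.walk():
    if not part.is_multipart():
        print(part.get_content_type(), part.get_payload(decode=True)[:80])
</code></pre>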
<p>So is that it? Does this mysterious empty email contain multiple parts that should be semantically equivalent (i.e.: contain the same message) but aren’t? Well, kind of. The first part of the email looks like this:</p>
<pre><code class="prettyprint">----ALT--FP504ntv5azlR7xUQktA3MxnXkgct5eW1475692425
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: base64
CgoKLS0gCldpbHNvbiBEYXZpZA==
</code></pre>
<p>Note that this is a plain-text content section, but with a content-transfer-encoding of <code class="prettyprint">base64</code>. We can decode the <code class="prettyprint">base64</code> string with Python:</p>
<pre><code class="prettyprint">In [49]: base64.decodestring('CgoKLS0gCldpbHNvbiBEYXZpZA==')
Out[49]: '\n\n\n-- \nWilson David'
</code></pre>
<p>And as it turns out, the plain text part of the email contains only the email signature. This is roughly what we’re seeing in Gmail, so one hypothesis would be that Gmail is skipping the HTML part and only displaying the text/plain part. But what about the HTML part? What’s in there?</p>
<pre><code class="prettyprint">----ALT--FP504ntv5azlR7xUQktA3MxnXkgct5eW1475692425
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
CjxIVE1MPjxCT0RZPjxicj48YnI+PGltZyBzcmM9ImRhdGE6aW1hZ2UvcG5nO2Jhc2U2NCxpVkJP...
...50,000 more bytes...
</code></pre>
<p>Hmm. So the email body actually contains 50kb of data, but Gmail’s only displaying a handful of bytes. Let’s run that <code class="prettyprint">base64</code>-encoded string through our Python string decoder again:</p>
<pre><code class="prettyprint">In [56]: base64.decodestring(a)
Out[56]: '\n<HTML><BODY><br><br><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAgAAAAGAC.../4DgZXPSWUmsAAAAASUVORK5CYII="><br>-- <br>Wilson David</BODY></HTML>\n'
</code></pre>
<p>Aha! So it’s HTML, and not very much HTML at that. In fact, there’s another base64-encoded string within the message, used to encode a data-URI for an embedded image. If we look at what Gmail renders in its DOM, we can actually see that it’s rendering the HTML part, but stripping out the <code class="prettyprint">src</code> attribute from the image:</p>
<p><a href="https://svbtleusercontent.com/dunnefjgzv8cka.png"><img src="https://svbtleusercontent.com/dunnefjgzv8cka_small.png" alt="D1740ADC-5B90-4FE9-B61F-6A09CF24B650.png"></a></p>
<p>So, what’s this image? For the third time, let’s use Python to decode it:</p>
<pre><code class="prettyprint">In [38]: png = base64.decodestring(b[53:-40])
In [39]: png
Out[39]: '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR
</code></pre>
<p>Well, it looks like a PNG. Many PNG decoders have been exploitable<sup>(<a href="https://securelist.com/blog/virus-watch/74297/png-embedded-malicious-payload-hidden-in-a-png-file/">1</a>, <a href="https://www.exploit-db.com/exploits/39767/">2</a>)</sup>, but I figure a fully-updated and patched Chrome should be impervious to any PNG exploits. After writing the decoded string to a file, I opened it in Chrome to find:</p>
<p><a href="https://svbtleusercontent.com/jgfpcuniixsq.png"><img src="https://svbtleusercontent.com/jgfpcuniixsq_small.png" alt="0DED9413-41FC-4AAC-A762-741FD5D00C28.png"></a></p>
<p>Success! As expected, a Togolese lawyer is offering me $9,580,000. So it looks like the email does contain some spam content - in particular, a PNG of some text. As most spam filters don’t bother doing deep inspection of image attachments (save for scanning for viruses), the text rendered in this particular image made its way through Gmail’s spam filter. However, Gmail’s failure to render the <code class="prettyprint">data-uri</code> image resulted in an empty email, unexpectedly removing the spam in a different way.</p>
tag:blog.petersobot.com,2014:Post/scaling-deepdream2015-07-25T11:22:50-07:002015-07-25T11:22:50-07:00A DeepDream Web Service for $5 a Month<p>Google’s <a href="http://googleresearch.blogspot.ca/2015/06/inceptionism-going-deeper-into-neural.html">DeepDream neural net image processing library</a> is a stunning application of advanced technology. If you haven’t heard of it, DeepDream uses an <a href="https://en.wikipedia.org/wiki/Computer_vision#Recognition">image recognition system</a> in reverse - instead of trying to identify which objects are in a photo, it accentuates what it sees, producing extremely trippy visuals:</p>
<p><a href="https://svbtleusercontent.com/evz3libaghivq.jpg"><img src="https://svbtleusercontent.com/evz3libaghivq_small.jpg" alt="San Francisco's Bay Bridge, through DeepDream"></a></p>
<p>While DeepDream is cool, it’s also notoriously difficult to set up, as it was built by researchers with exceedingly complex software tools. Shortly after its launch, <a href="http://mattogle.com">Matthew Ogle</a> and I decided to put together <a href="http://deepdre.am">a web interface - http://deepdre.am</a> to make the process simpler.</p>
<p><a href="http://deepdre.am"><img src="https://svbtleusercontent.com/vxhf5neec6na_small.png" alt="Screen Shot 2015-07-25 at 2.43.11 PM.png"></a></p>
<p>The site itself is pretty trivial - one page, with three options and one upload button. The fun part wasn’t the visual design or the user experience, but rather the scalable backend services that adapt the system to varying amounts of load without costing much more than a fancy coffee each month.</p>
<h1 id="really-tiny-microservices_1">Really Tiny Microservices <a class="head_anchor" href="#really-tiny-microservices_1">#</a>
</h1>
<p>“Microservice” is a buzzword that’s used exceedingly often on today’s Hacker News. The concept is simple - break the disparate functions of your application apart into small services that can be updated, scaled, and managed independently. In theory, this reduces the “blast radius” of a single system failure (be it a hardware failure, network instability, logical error, or any other problem) to at most a single service. In practice, microservices often become tightly intertwined with one another, preventing this goal from being achieved.</p>
<p>When building <a href="http://deepdre.am">deepdre.am</a>, I decided to naïvely try out this concept and split out each logical component of the application into its own service. This resulted in <strong>7 services</strong>, 5 of which are front-facing web services:</p>
<ul>
<li>
<code class="prettyprint">upload</code>, which accepts image uploads, validates their format and adds them to the queue of images to be processed</li>
<li>
<code class="prettyprint">progress</code>, which provides progress updates (via long polling)</li>
<li>
<code class="prettyprint">email</code>, which allows users to be notified when their image is ready</li>
<li>
<code class="prettyprint">abuse</code>, which allows users to report images that violate the TOS</li>
<li>
<code class="prettyprint">monitor</code>, which provides an administrative status dashboard</li>
<li>
<code class="prettyprint">process</code>, which:
<ul>
<li>removes images and metadata from a queue</li>
<li>runs the DeepDream algorithm on each image</li>
<li>emails the uploader to notify them that their image is done</li>
</ul>
</li>
<li>
<code class="prettyprint">scale</code>, which observes the queue and spins up cheap <a href="http://aws.amazon.com/ec2/purchasing-options/spot-instances/">Amazon EC2 spot instances</a> as necessary to process images</li>
</ul>
<p>All of these microservices are implemented in <a href="http://golang.org/">Go</a>, Google’s light, simple, and highly concurrent programming language. Using Go ensures some modicum of type safety, allowing me to catch trivial errors at compile time. Go is also trivial to deploy (binaries are generally statically linked and dependency-free) and extremely lightweight.</p>
<p>Each of the services listed above consumes around <strong>4MB</strong> of memory when serving HTTP requests - less than <em>half</em> the memory used by Ruby <em>just to load the interpreter</em>, not to mention loading Rails or Sinatra. When running on a bare-bones web server to keep costs down, this tiny memory footprint makes an extremely noticeable difference in website performance. On average, both static assets and API requests are served within 30 milliseconds - unheard of when using a large framework like Rails.</p>
<p>While microservices are generally supposed to be fairly isolated, each with their own data stores and infrastructure, this project is small enough that I opted to share data stores. In this case, Redis is used for ephemeral data (queueing tasks to be processed<sup id="fnref1"><a href="#fn1">1</a></sup>, progress updates, and notifications between processes) while MySQL<sup id="fnref2"><a href="#fn2">2</a></sup> is used for more permanent data. This approach allowed me to keep many of the advantages of microservices - including independent scalability, quick development, and tiny codebases - without using too many resources by spinning up multiple databases.</p>
<h1 id="hey-amazon-can-you-spot-me-5_1">Hey Amazon, can you spot me $5? <a class="head_anchor" href="#hey-amazon-can-you-spot-me-5_1">#</a>
</h1>
<p>As with all of my side projects, my primary goal when building <a href="http://deepdre.am">deepdre.am</a> was not just to create a service, but to do so at absolutely minimal cost. My target is to spend under $10 each month on everything - instance hosting, amortized domain costs, S3 usage, and bandwidth. (I’ve found <a href="https://www.digitalocean.com/?refcode=8df55bdeed1c">DigitalOcean</a><sup id="fnref3"><a href="#fn3">3</a></sup> to be powerful enough to support tens of thousands of monthly active users for $5/month, but there are countless hosts at similar price points.)</p>
<p>Hosting Redis, MySQL, and a handful of Go-based microservices on a $5 cloud host is trivial and speedy enough to support thousands of hits per minute. DeepDream, however, is a computationally taxing algorithm that requires a lot of processing power, and - ideally - a GPU to execute on.</p>
<p>This presented me with a hard problem. How do you provide quick response times without paying $468/month for a <code class="prettyprint">g2.2xlarge</code> instance on Amazon EC2?<br>
<a href="https://svbtleusercontent.com/kbkbf0kjwxjva.png"><img src="https://svbtleusercontent.com/kbkbf0kjwxjva_small.png" alt="g2.2xlarge.png"></a></p>
<p>The answer turned out to be simple: use a combination of a task queue and <a href="http://aws.amazon.com/ec2/purchasing-options/spot-instances/">EC2 spot instances</a>. When load on the system is low, the $5 VPS can slowly process images, saving money. When load on the system grows, however, spot instances are used to speed things up (a rough sketch of this logic in code follows the list below):</p>
<ul>
<li>A <code class="prettyprint">g2.2xlarge</code> spot instance can process approximately one image per second, and costs in the ballpark of $0.10/hour. However, as spot instances can be terminated at any time, applications must be termination-aware. (Amazon has a new “Spot Instance Termination Notice” feature that can come in handy here, but processes can also simply respond quickly to <code class="prettyprint">SIGTERM</code> signals to clean up before an instance is terminated<sup id="fnref4"><a href="#fn4">4</a></sup>.)</li>
<li>As EC2 instances are billed by the hour, rounded up, committing to spawning an instance will cost at least $0.10<sup id="fnref5"><a href="#fn5">5</a></sup> and can process at most 3,600 images per hour.</li>
<li>To maximize value, an instance should be spawned when the number of images waiting in the queue to be processed approaches 3600<sup id="fnref6"><a href="#fn6">6</a></sup>.</li>
<li>If an instance is spawned but is no longer necessary (due to the queue being emptied quickly) then it should remain running until its age reaches 59 minutes, as Amazon bills for instances by the hour and rounds up.<sup id="fnref5"><a href="#fn5">5</a></sup>
</li>
</ul>
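<p>Put together, the scaling rule is simple enough to sketch in a few lines. (The real <code class="prettyprint">scale</code> service is written in Go; this Python sketch uses invented names and simply restates the rules above.)</p>
<pre><code class="prettyprint lang-python"># Illustrative only - the real "scale" service is written in Go.
IMAGES_PER_INSTANCE_HOUR = 3600  # one image per second per g2.2xlarge
MAX_INSTANCE_AGE_MINUTES = 59    # instances are billed hourly, rounded up

def should_spawn_instance(queue_length, running_instances):
    # Spawn a spot instance once there's roughly an hour's worth of work
    # queued beyond what the running instances can already absorb.
    backlog = queue_length - running_instances * IMAGES_PER_INSTANCE_HOUR
    return backlog >= IMAGES_PER_INSTANCE_HOUR

def should_terminate_instance(queue_length, age_minutes):
    # Even if the queue empties, keep an instance until minute 59: the
    # hour is already paid for, and Amazon might terminate it first,
    # which would make the entire hour free.
    return queue_length == 0 and age_minutes >= MAX_INSTANCE_AGE_MINUTES
</code></pre>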
<p>To spawn spot instances, I used <a href="https://github.com/mitchellh/goamz">Mitchell Hashimoto’s <code class="prettyprint">goamz</code> package</a>, which is a thin Go wrapper around Amazon’s AWS APIs. A combination of a custom private <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html">AMI</a> (that includes all of the required software) and <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html">cloud-init user data</a> (to force a code update from source control) allows an instance to boot, connect to its data stores, and begin processing tasks in less than 120 seconds.</p>
<p>Practically, this combination of queueing and spot instances keeps expenses for <a href="http://deepdre.am">deepdre.am</a> extremely low - somewhere between $5 and $10 per month, depending on system load. (Amazon’s <a href="http://aws.amazon.com/about-aws/whats-new/2012/05/10/announcing-aws-billing-alerts/">Billing Alerts</a> also allow me to keep an eye on my usage, to avoid unexpected spending - and to respond to spikes in traffic as necessary.)</p>
<h1 id="don39t-forget-about-taxes_1">Don’t forget about taxes <a class="head_anchor" href="#don39t-forget-about-taxes_1">#</a>
</h1>
<p>The title of this post is <em>mostly</em> true - when load on the system is low, costs are around $5 each month. However, small additional costs do add up:</p>
<ul>
<li>The domain, <a href="http://deepdre.am">deepdre.am</a>, costs $73/year, or $6/month. (Special thanks to <a href="http://mattogle.com">Matthew Ogle</a> for buying an expensive <a href="https://en.wikipedia.org/wiki/.am">Armenian domain name</a> on a whim <a href="https://twitter.com/flaneur/status/617433314962579456">in response to a tweet</a>.)</li>
<li>I lazily used Amazon S3 to store and serve images, which <a href="http://www.wolframalpha.com/input/?i=%28US%240.03%2F%28GB+*+month%29%29+*+%284MB+*+48hours%29+%2B+%282*%284MB+*+US%240.09%2FGB%29%29">costs $0.0007 per image</a> in both storage and transfer fees. (That’s approximately 1,428 images per dollar - a cost that can go away once the S3 dependency is removed altogether.)</li>
<li>Assuming that the site waxes and wanes in popularity in a given month, it’s reasonable to expect about 20 hours of <code class="prettyprint">g2.2xlarge</code> spot instance usage, which costs approximately $2.</li>
</ul>
<h1 id="give-it-a-try_1">Give it a try! <a class="head_anchor" href="#give-it-a-try_1">#</a>
</h1>
<p>While the code’s not (yet) open source, the site is currently up and running - give it a try at <a href="http://deepdre.am">deepdre.am</a> and transform your images, if for no other reason than to stress test the system!</p>
<hr>
<p>Special thanks to <a href="http://malcolmocean.com/">Malcolm Ocean</a> for reviewing this post.</p>
<div class="footnotes">
<hr>
<ol>
<li id="fn1">
<p>As one does, I built <a href="https://github.com/psobot/pressure">my own Redis-backed queueing library with Go bindings</a> that came in very handy here. Many better alternatives exist - I would recommend using something more well supported like Github’s <a href="https://github.com/resque/resque">Resque</a> or Salvatore Sanfilippo’s <a href="https://github.com/antirez/disque">disque</a>. <a href="#fnref1">↩</a></p>
</li>
<li id="fn2">
<p>Yup, MySQL. I had a Puppet manifest laying around for a well-configured MySQL instance, and saved a grand total of 30 minutes by bolting together existing components rather than switching to Postgres. Such is the nature of quick hack projects. <a href="#fnref2">↩</a></p>
</li>
<li id="fn3">
<p>Yes, this link does contain a <a href="https://www.digitalocean.com/?refcode=8df55bdeed1c">DigitalOcean</a> referral code. You caught me. <a href="#fnref3">↩</a></p>
</li>
<li id="fn4">
<p>Terminating an instance seems to result in the normal ACPI shutdown process, which sends <code class="prettyprint">SIGTERM</code> to all processes, allowing processes to finish their tasks or put them back into queues if necessary. This is a terrible practice, as instances could suffer non-graceful failures at any time, and should not be relied upon to put their tasks back into queues - but for an application as frivolous and simple as <a href="http://deepdre.am">deepdre.am</a>, the 2-second delay between receiving <code class="prettyprint">SIGTERM</code> and losing power to an instance seems to allow for enough cleanup. <a href="#fnref4">↩</a></p>
</li>
<li id="fn5">
<p>Instances that are terminated by Amazon before their first hour has elapsed are free, so it’s also possible that an instance could cost $0.00. This means that it’s advantageous to wait until the 59 minute mark before terminating any spot instance, to increase the likelihood that Amazon will terminate the instance for you, making the entire hour free. <a href="#fnref5">↩</a></p>
</li>
<li id="fn6">
<p>This is a knob that can be tweaked - the two extremes are “spend lots of money and have images process very quickly” and “spend very little money and use a spot instance only when the queue becomes huge.” <a href="#fnref6">↩</a></p>
</li>
</ol>
</div>
tag:blog.petersobot.com,2014:Post/the-cost-of-waterloo-software-engineering2014-09-08T20:00:25-07:002014-09-08T20:00:25-07:00The Cost of Waterloo Software Engineering<p>This past June, I graduated from the University of Waterloo’s Software Engineering program. After 5 long and difficult years, I’m extremely proud to say that I’m a Waterloo grad, and very proud of my accomplishments and experiences at the school. Somewhat surprisingly, myself and most of my classmates were able to graduate from a top-tier engineering school with zero debt. (I know this might sound like a sales pitch - stick with me here.)</p>
<p>Waterloo is home to the world’s largest cooperative education program — meaning that every engineering student is required to take at least 5 internships over the course of their degree. Most take six. This lengthens the duration of the program to five years, and forces us into odd schedules where we alternate between four months of work and four months of school. We get no summer breaks.</p>
<p>One of the most important parts of Waterloo’s co-op program is that the school requires each placement be <em>paid</em>. Without meeting certain minimum requirements for compensation, a student can’t claim academic credit for their internship, and without five internships, they can’t graduate. This results in Waterloo co-op students being able to pay their tuition in full (hopefully) each semester. In disciplines like Software Engineering, where demand is at an all-time high and many students are skilled enough to hold their own at Silicon Valley tech giants, many students end up negotiating for higher salaries at their <em>internships</em>.</p>
<p>To help visualize this financial situation and aid younger Software Engineering students in planning their future, I decided to create a little tool: the <a href="https://github.com/psobot/secalculator">SE Calculator</a>.</p>
<p><img src="https://svbtleusercontent.com/mq5ugnqr5fzjza.png" alt="![secalculator.png](https://svbtleusercontent.com/mq5ugnqr5fzjza_small.png)"></p>
<p>This simple, free, <a href="https://github.com/psobot/secalculator">open-source</a> in-browser tool allows you to calculate and visualize how much money you’ll earn or owe at the end of a five-year Waterloo Software Engineering degree. While it’s not rigorous (and <strong>should not be used as a financial advisor</strong>) it has helped me visualize how much money I’ve earned and spent during my academic career.</p>
<p>By default, the site assumes you’re a student that pays average Software Engineering tuition and average Software Engineering fees, earns one scholarship in your first year, and spends each internship working at software companies in Waterloo. The calculator includes a bunch of preset values, taken from personal experience and that of classmates, to simulate what you might make and spend when working in different regions or industries. (For example, the San Francisco Bay Area preset has a ridiculously high housing cost, but a similarly high salary.)</p>
<p>The site also stores your data in the URL string, because – well, simply – I wanted to store the data somewhere quick and easy. Bookmark the page once you’ve plugged in some values and store multiple datasets in your bookmarks bar.</p>
<p>If you’re a Software Engineering student (or will soon be one), I hope you find the tool useful to you. If you’re a student in some other Waterloo Engineering discipline, or in Computer Science, hopefully most of the fields still apply to you and you might get some utility out of the tool as well. </p>
<p>If you’re interested in customizing the tool - to add new presets, to adapt it to your own academic situation, or just to fix bugs - please feel free to <a href="https://github.com/psobot/secalculator">fork it on GitHub</a>. The tool runs almost entirely in-browser with <a href="http://angularjs.org">Angular.js</a> and uses <a href="http://gulpjs.com">Gulp</a> as a build tool. Happy hacking!</p>
tag:blog.petersobot.com,2014:Post/the-holiday-party-hack2013-12-14T10:45:14-08:002013-12-14T10:45:14-08:00The Holiday Party Hack<p>For this year’s holiday party at <a href="http://twg.ca">The Working Group</a>, I helped build something special to spice up the party - a live, music-synced slideshow of the evening, powered by a nearby photo booth. Take a photo with your friends and loved ones, then see it show up on the big screen seconds later.</p>
<p><a href="https://svbtleusercontent.com/bvipn7c3djtktg.jpg"><img src="https://svbtleusercontent.com/bvipn7c3djtktg_small.jpg" alt="photobooth.jpg"></a></p>
<h2 id="the-hardware_2">The Hardware <a class="head_anchor" href="#the-hardware_2">#</a>
</h2>
<p>To take the photos, we mounted a <a href="http://www.amazon.com/Canon-T2i-Processor-3-0-inch-18-55mm/dp/B0035FZJHQ">Canon Rebel T2i</a> with an <a href="http://www.eye.fi">Eye-Fi card</a> on a tripod in front of a great backdrop. A generous serving of props was provided for people to play with, and the room was well lit.</p>
<p>Also significant - the photo booth had a glass wall on one side, making it easy for partygoers to notice the fun to be had inside, while still allowing for a little bit of separation from the cacophony outside.</p>
<p><a href="https://svbtleusercontent.com/b2is9svrpny0g.jpg"><img src="https://svbtleusercontent.com/b2is9svrpny0g_small.jpg" alt="outside.jpg"></a></p>
<p>Finally, to allow partygoers to trigger their photos themselves without needing someone behind the camera, <a href="http://twitter.com/bgilham">Brian Gilham</a> and I built a huge, industrial-looking remote with a massive green button. In reality, we just wrapped the camera’s tiny remote in a larger enclosure and physically lined up the remote’s button with the plunger of a larger button.</p>
<p><a href="https://svbtleusercontent.com/r6tn2fqjxhwehq.jpg"><img src="https://svbtleusercontent.com/r6tn2fqjxhwehq_small.jpg" alt="button.jpg"></a></p>
<p>The Eye-Fi card in the camera synced its photos automatically to a nearby MacBook Pro.</p>
<h2 id="the-software_2">The Software <a class="head_anchor" href="#the-software_2">#</a>
</h2>
<p>To get the photos onto the screen, they passed through a ridiculous number of steps. <a href="http://www.noodlesoft.com/hazel.php">Hazel</a>, running on the MacBook Pro, copied the photos from the Eye-Fi card’s folder into a dedicated folder in Dropbox. A Node.js app running on a Rackspace cloud server connected to the Dropbox API and received real-time updates whenever new photos were placed in the Dropbox folder. This app downloaded the high-res photos from Dropbox, used <a href="http://www.imagemagick.org/script/index.php">ImageMagick</a> to crop, scale, and rotate them appropriately, and streamed them down to all connected browsers.</p>
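<p>To give a flavour of the middle of that chain: the resize step boils down to a single ImageMagick invocation per photo. Here’s a rough Python sketch of it - the real app was written in Node.js, and the output path and target resolution here are invented for illustration:</p>
<pre><code class="prettyprint">import subprocess
from pathlib import Path

OUTPUT_DIR = Path("~/party-photos-web").expanduser()

def prepare_for_projector(photo: Path) -> Path:
    """Crop, scale, and auto-rotate one photo with ImageMagick's CLI."""
    out = OUTPUT_DIR / photo.name
    subprocess.run([
        "convert", str(photo),
        "-auto-orient",           # honour the camera's orientation flag
        "-resize", "1280x720^",   # scale to fill the projector's resolution...
        "-gravity", "center",
        "-extent", "1280x720",    # ...then crop to exactly that size
        str(out),
    ], check=True)
    return out
</code></pre>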
<p>A MacBook Pro connected to the projector ran a client-side JavaScript app that received real-time photo updates via <a href="http://socket.io">Socket.io</a>. This app also used the Web Audio API to run <a href="https://github.com/cjcliffe/beatdetektor/tree/master">BeatDetektor</a>, an open-source JavaScript beat detection library, on the audio picked up by the laptop’s microphone. Finally, <a href="http://www.schillmania.com/">Scott Schiller</a>’s 2003-era <a href="http://www.schillmania.com/projects/snowstorm/">snowstorm.js</a> library provided the wonderfully tacky snow falling in-browser.</p>
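<p>BeatDetektor itself is considerably more clever - it tracks competing tempo hypotheses across frequency bands - but the classic energy-spike approach gives the flavour of how beat detection works. A toy sketch in Python (to be clear, this is the textbook idea, not BeatDetektor’s algorithm):</p>
<pre><code class="prettyprint">import numpy as np

def naive_beats(samples: np.ndarray, rate: int = 44100, window: int = 1024):
    """Return rough beat timestamps: windows whose energy spikes above average."""
    energies = [
        float(np.sum(np.square(samples[i:i + window])))
        for i in range(0, len(samples) - window, window)
    ]
    history = rate // window  # roughly one second of recent windows
    beats = []
    for i in range(history, len(energies)):
        local_average = sum(energies[i - history:i]) / history
        if energies[i] > 1.4 * local_average:  # 1.4 is a tuning knob
            beats.append(i * window / rate)    # timestamp, in seconds
    return beats
</code></pre>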
<p>This complicated chain of events actually made the software super simple to build: piecing together pre-made components like Dropbox, Hazel, and BeatDetektor meant most of the work was already done. Some extra functionality even came for free - for example, by sharing the Dropbox folder with select people at the party, candid photos could be uploaded from people’s phones directly to the projector screen.</p>
<h2 id="the-results_2">The Results <a class="head_anchor" href="#the-results_2">#</a>
</h2>
<p>By the end of the night, more than 350 photos - 1.5GB of data - had been processed by the hack and made it to the big screen. At one point, so many photos were taken in quick succession that the server’s load average spiked to 38 and the machine crashed hard - taking down <a href="http://forever.fm">forever.fm</a>, my “infinite” radio station, with it. Despite the small technical hiccups, the hack turned out wonderfully.</p>
<p>–</p>
<p>Huge thanks go out to <a href="https://twitter.com/mud">Chris Mudiappahpillai</a>, <a href="https://twitter.com/bgilham">Brian Gilham</a>, <a href="https://twitter.com/dcwca">Derek Watson</a> and <a href="https://twitter.com/saryev">Shiera Aryev</a> and many more for making the hack - and the evening - a resounding success.</p>
tag:blog.petersobot.com,2014:Post/the-architecture-of-an-infinite-stream-of-music2013-11-05T09:22:13-08:002013-11-05T09:22:13-08:00The Architecture of an Infinite Stream of Music<p>Nearly a year ago, I launched <a href="http://forever.fm">forever.fm</a> - a free online radio station that seamlessly beat matches its songs together into a never-ending stream. At launch, it was hugely popular - with hundreds of thousands of people tuning in. In the months since its initial spike of popularity, I’ve had a chance to revisit the app and rebuild it from the ground up for increased stability and quality.</p>
<p><a href="https://svbtleusercontent.com/9wosn0nn6gksq.png"><img src="https://svbtleusercontent.com/9wosn0nn6gksq_small.png" alt="ffm.png"></a></p>
<p>(<em>Grab the free <a href="https://itunes.apple.com/app/forever.fm/id727405817">iOS</a> and <a href="https://play.google.com/store/apps/details?id=com.appstruments.foreverfm">Android</a> apps to listen to <a href="http://forever.fm">forever.fm</a> on the go.</em>)</p>
<hr>
<p>Initially, Forever.fm was a single-process Python app, written with the same framework I had built for my other popular web app, <a href="https://the.wubmachine.com">The Wub Machine</a>. While this worked as a proof of concept, there were a number of issues with this model.</p>
<ul>
<li>Single monolithic apps are very difficult to <strong>scale</strong>. In my case, Forever.fm’s monolithic Python process had to service web requests and generate the audio to send to its listeners. The latter is what’s known as a “soft real-time” task - one in which any delays or missed deadlines cause noticeable degradation of the user’s experience. As the app’s usage grew, it became difficult to balance the high load generated by different parts of the app in a single process. Sharding was not an option, as Forever is built around a single radio stream - only one of which should exist at a time. Unlike a typical CRUD app, I couldn’t just deploy the same app to multiple servers and point them at the same database.</li>
<li>Single monolithic apps are very difficult to <strong>update</strong>. Any modifications to the code base of Forever required a complete restart of the server. (In my initial iteration and blog post, I detailed a method for reloading Python modules without stopping the app - but ran into so many stability issues with this method that I had to abandon it altogether.) As with any v1 app, Forever had a constant stream of updates and fixes. Restarting the app every time a bug fix had to be made - thereby stopping the stream of music - was ridiculous.</li>
<li>Memory and CPU profiling were both difficult with a one-process app. Although Python ships with a number of profiling tools, none of them are made to be used in a production environment - which is often the only environment in which these problems appear. Being able to track down which part of the app is eating up gigabytes of memory is critical.</li>
</ul>
<p>To solve all of these problems in one go, I decided to re-architect Forever.fm as a <strong>streaming <a href="http://en.wikipedia.org/wiki/Service-oriented_architecture">service-oriented architecture</a></strong> with a custom queueing library called <a href="https://github.com/psobot/pressure"><code class="prettyprint">pressure</code></a>. </p>
<p><a href="https://svbtleusercontent.com/ogunduu3u59xg.png"><img src="https://svbtleusercontent.com/ogunduu3u59xg_small.png" alt="foreverq.png"></a></p>
<p>Usually, service oriented architectures are strongly request/response based, with components briefly talking with each other in short bursts. Forever does make use of this paradigm, but its central data structure is an unbounded stream of MP3 packets. As such, a lot of the app’s architecture is structured around <strong>pipelines of data</strong> of different formats. To make these pipelines reliable and fast when working with large amounts of streaming data, I constructed my own <a href="https://github.com/psobot/pressure">Redis-based bounded queue protocol</a> that currently has bindings in Python and C. It also creates really nice <a href="http://d3js.org">d3</a> graphs of the running system:</p>
<p><a href="https://svbtleusercontent.com/xa3lmm7qyv6g.png"><img src="https://svbtleusercontent.com/xa3lmm7qyv6g_small.png" alt="queues.png"></a></p>
<p>Forever.fm is broken down into multiple services that act on these pipelines of data:</p>
<ul>
<li> The <strong>brain</strong> picks tracks from a traditional relational database, orders them by approximately solving the <a href="http://en.wikipedia.org/wiki/Travelling_salesman_problem">Traveling Salesman Problem</a> on a graph of tracks and their similarities, and pushes them into a bounded queue.</li>
<li> The <strong>mixer</strong> reads tracks from this queue in order, analyzes the tracks and calculates the best-sounding overlaps between each track and the next. This is essentially the “listening” step. These calculations also go into a bounded queue.</li>
<li> The <strong>renderer</strong> reads calculations from this queue and actually renders the MP3 files into one stream, performing time stretching and volume compression as required. This step pushes MP3 frames, each roughly 26ms long, into another bounded queue.</li>
<li> The <strong>mp3_server</strong> reads MP3 frames from this queue at a precise rate (38.28125 frames per second, for 44.1kHz audio - see the sketch after this list) and sends them to each listener in turn over HTTP. (It also keeps track of who’s listening, to help produce a detailed report of how many people heard each song.)</li>
</ul>
<p>There are a number of other services that come together to make Forever.fm work, including the excitingly-named <strong>web_server</strong>, <strong>info_server</strong>, <strong>social_server</strong>, <strong>manager</strong>, <strong>tweeter</strong>, <strong>relay</strong> and <strong>playcounter</strong>. Each of these services consists of fewer than 1,000 lines of code, and some of them are written in vastly different languages. At the moment, they all run on the same machine - but that could easily change without downtime and <strong>without dropping the music</strong>. Each service has its own pid and memory space, making it easy to see which task is using up resources.</p>
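<p>That oddly specific 38.28125 isn’t arbitrary: an MPEG-1 Layer III frame at 44.1kHz holds exactly 1,152 samples, so 44100 / 1152 = 38.28125 frames per second, or about 26ms per frame. Here’s a minimal sketch of the pacing loop - not the real <strong>mp3_server</strong>, whose per-listener fan-out is more involved, and with hypothetical <code class="prettyprint">queue</code> and <code class="prettyprint">listeners</code> objects:</p>
<pre><code class="prettyprint">import time

SAMPLE_RATE = 44100
SAMPLES_PER_FRAME = 1152                      # fixed for MPEG-1 Layer III
FRAME_RATE = SAMPLE_RATE / SAMPLES_PER_FRAME  # = 38.28125 frames per second

def serve(queue, listeners):
    """Pop one MP3 frame per tick and fan it out to every listener."""
    start = time.monotonic()
    frames_sent = 0
    while True:
        frame = queue.pop()  # blocks until the renderer produces a frame
        for listener in listeners:
            listener.write(frame)
        frames_sent += 1
        # Sleep until the moment this frame *should* have gone out, so
        # timing errors never accumulate over millions of frames.
        deadline = start + frames_sent / FRAME_RATE
        time.sleep(max(0.0, deadline - time.monotonic()))
</code></pre>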
<p>To help achieve an unbroken stream of music and more easily satisfy the soft real-time requirements of the app, <code class="prettyprint">pressure</code> queues have two very important properties: <strong>bounds</strong> and <strong>buffers</strong>. </p>
<p>Each <code class="prettyprint">pressure</code> queue is <strong>bounded</strong> - meaning that a producer cannot push data into a full queue, and may choose to block or poll when this situation occurs. Forever uses this property to lazily compute data as required, reducing CPU and memory usage significantly. Each data pipeline necessarily has one <strong>sink</strong> - one node that consumes data but does not produce data - which is used to limit the data processing rate. By adjusting the rate of data consumption at this sink node, the rate (and amount of work required) of the entire processing chain can be controlled extremely simply. Furthermore, in Forever, if no users are listening to a radio stream, the sink can stop consuming data from its queue - implicitly stopping all of the backend processing and reducing the CPU load to zero. By blocking on IO, we let the OS schedule all of our work for us - and I trust the OS’s scheduler to do a much better job than Python’s.</p>
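<p><code class="prettyprint">pressure</code>’s actual protocol involves more Redis bookkeeping than this, but the core blocking-on-full behaviour can be sketched in a few lines of Python on top of <code class="prettyprint">redis-py</code>. To be clear, this is an illustration of the idea, not <code class="prettyprint">pressure</code>’s real API:</p>
<pre><code class="prettyprint">import time
import redis

r = redis.Redis()

def bounded_push(key: str, item: bytes, bound: int, poll: float = 0.1):
    """Push onto a Redis list, blocking while the queue is full.

    This is how backpressure propagates upstream: if a consumer
    stalls, every producer above it eventually blocks right here.
    """
    while r.llen(key) >= bound:  # (a real implementation would make
        time.sleep(poll)         #  this check-then-push atomic)
    r.rpush(key, item)

def blocking_pop(key: str) -> bytes:
    """Block until an item is available, then pop it in FIFO order."""
    _, item = r.blpop(key)
    return item
</code></pre>
<p>Because the list itself lives in Redis rather than in any one process, it doubles as reliable out-of-process storage - which is exactly the <strong>buffer</strong> property described next.</p>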
<p>In addition, each queue has a <strong>buffer</strong> of a set size that is kept in reliable out-of-process storage - Redis, in this case. If a process were to crash for any reason, the buffer in the queueing system would allow the next process to continue processing data for some amount of time before exhausting the queue. With current parameters, nearly all of the services in Forever could fail for up to 5 minutes without causing an audio interruption. These buffers allow each component to be independently stopped, started, upgraded or debugged <strong>in production</strong> without interrupting service. (This does lead to some high-pressure bug hunting sessions where I’ll set a timer before launching GDB.)</p>
<p>Most of the services involved in this pipeline are backend processors of data - not front-facing web servers. However, I’ve applied the same service-oriented philosophy to the frontend of the site, creating separate servers for each general type of data served by the app. In front of all of these web servers sits nginx, used as a fast, flexible proxy server that can also serve static files. HAProxy was considered but not adopted, as nginx has all of the features needed - including <a href="http://serverfault.com/questions/108261/how-to-make-modification-take-affect-without-restart-nginx">live configuration reloads</a>.</p>
<p>With this combination of multiple specialized processes and a reliable queueing system, Forever has enjoyed very high availability since the new architecture was deployed. I’ve personally found it indispensable to be able to iterate quickly on a live audio stream - often in production. The ability to make impactful changes to a real-time system in minutes is incredible - and although somewhat reckless at times, it can be an amazing productivity boon for a tiny startup.</p>
<hr>
<p>Partially thanks to this new architecture, I’ve also built free <a href="https://itunes.apple.com/app/forever.fm/id727405817">iOS</a> and <a href="https://play.google.com/store/apps/details?id=com.appstruments.foreverfm">Android</a> clients for <a href="http://forever.fm">forever.fm</a>. Download them and listen to infinite radio on the go!</p>
tag:blog.petersobot.com,2014:Post/co-working-at-the-working-group2013-10-25T06:29:55-07:002013-10-25T06:29:55-07:00Co-Working at The Working Group<p>Early in my academic career at the University of Waterloo, I was fortunate enough to land a co-op placement at The Working Group. Back then, the team was just over a dozen people. We were taking on our first mobile projects, and were starting to outgrow our old office at the Burroughes building – where we still had musical jam sessions with the partners every couple weeks. I learned more and had more fun in that four-month placement than I thought possible.</p>
<p><a href="https://svbtleusercontent.com/jgutijkimzwrya.jpg"><img src="https://svbtleusercontent.com/jgutijkimzwrya_small.jpg" alt="IMG_5800.jpg"></a></p>
<p>That was two years ago. In February 2013, I founded <a href="http://appstruments.com">a software company</a> that creates music apps that anybody can use. So far, our portfolio of products includes <a href="https://the.wubmachine.com">The Wub Machine</a>, an automatic music remixing app, and <a href="http://forever.fm">Forever.fm</a>, an app that creates an infinite DJ mix of the hottest songs on SoundCloud. These two apps have proven popular, and have already reached more than 1,000,000 people across the world. However, their development had also plateaued – the “next steps” in each project required too much time and effort for me to complete in my spare time. Luckily, as a Waterloo co-op student, my classes are interrupted regularly by mandatory four-month work terms. For my sixth and final internship slot, I decided to forgo the tempting internship offers from San Francisco startups – and to instead spend four months bootstrapping my own products.</p>
<p><a href="https://svbtleusercontent.com/eh2exn4uz6qmw.jpg"><img src="https://svbtleusercontent.com/eh2exn4uz6qmw_small.jpg" alt="IMG_6077.jpg"></a></p>
<p>When I set out on this plan, I was first greeted by incredulity from my classmates, who were returning to cushy internships in the Bay Area. One of the first people to offer encouragement was Andrés Aquino, partner at The Working Group. After I dropped back into the office to give <a href="https://speakerdeck.com/psobot/ops-for-devs">a tech talk</a> in early May, Andrés was quick to extend an invitation to return if I needed an environment to work in. For me, working full time to bootstrap my company, this simple invitation solved many problems. Without TWG, who would I bounce ideas off of? Who would I show my work to, to ensure I was building the right products? Most importantly, who would point out to me when I was making mistakes? Incubators like Y Combinator or Waterloo’s own <a href="http://velocity.uwaterloo.ca/">VeloCity Garage</a> usually provide people who can fill that mentorship role – but I wasn’t yet at a stage to be accepted by either.</p>
<p>So far, only one month into my endeavour, things have been going extremely well. Having a desk to come in to and co-workers to talk with has been surprisingly motivating. The office has a very open culture that’s made me feel like part of the team again, despite only sharing a desk and hanging out in the team’s HipChat room. Each week, I’m held accountable by participating in morning standup meetings. (While I should hope that I don’t need external motivation to accomplish my goals, being present at the office has made it impossible for me to procrastinate.) I also make a point to demo two things every Friday: both the product I’ve worked on and the technology behind it. If I don’t learn something new each day, I’m not satisfied with my progress – and if I don’t pass on what I learn to the team, then I’m not doing my part. This spirit of “learning and teaching” also helps me solidify what I’ve learned and distill it into meaningful information that’s useful to others.</p>
<p><a href="https://svbtleusercontent.com/aizve901almza.jpg"><img src="https://svbtleusercontent.com/aizve901almza_small.jpg" alt="IMG_6052.jpg"></a></p>
<p>In the three months I’ve got left at TWG, I have a long list of things to accomplish. If productivity stays as high as it has been in the past month, I’ll have plenty to show for it by the time I’m done. My goal is to make sure that the TWG team learns just as much as I do.</p>