Improving Where's it Up by slowing it down
One of the common complaints about Where's it Up has been that the results are only available for a few minutes: a link that works now will be useless shortly after. While this fits the design goal of telling you whether something is up right now, it tends to be less useful in real life. As an example, I run a work URL through the system, see some anomalous results, and send the link to a co-worker. That co-worker happens to be in the middle of rocking out to a killer drum solo on Pandora, so he doesn't get to it for a minute. By the time he does, the link is broken and he calls me bad words.
As an added bonus, our site doesn't handle non-existent records well.
The reason this happens is that Where's it Up stores results in Memcache. Memcache is super fast (which is incredibly helpful when you get a huge traffic spike), but is also a temporary data store. While I could have extended the timeout on data (and hoped it lasted that long under load), I still wouldn't have been in a position to say you could email that link to someone, and it would work tomorrow, or next week when you do a post-mortem on the issue.
To solve the issue, I decided to start storing the results in MongoDB.
Despite this being my first production use of a NoSQL solution, I felt NoSQL was a good choice. The data we're generating isn't without form, but the form varies wildly: there could be 1-50 steps in each traceroute, and each step could have taken different routes for its various packets. HTTP requests can have 0-5 redirects. The dig command can yield wildly varying information. While it certainly would have been possible for me to normalize all of this across several tables, the query to obtain results would have been hairy, and probably pretty slow. I could have simply treated a relational database as a key-value store and shoved everything into a TEXT field, but then I'd never be able to go back and query on anything other than the key. So, for example, if I later wanted to find out how many traceroutes we've done through some network, I'd be left with a full-text search. Generating average connection times for HTTP requests would have been even hairier. Going with NoSQL lets me shove the data into a document in the format my application is already using, and query that data later without losing its structure.
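To make that concrete, here's a minimal sketch of the idea: two results with very different shapes living side by side, then queried by their structure. The field names, the "milan" location, and the collection layout are my own illustrative assumptions, not Where's it Up's actual schema; the real system is PHP, so this Python stands in for the pattern only.

```python
# Hypothetical result documents (assumed field names, not the real schema).
traceroute_result = {
    "type": "traceroute",
    "source": "milan",
    "target": "example.com",
    "hops": [
        {"ttl": 1, "responses": ["10.0.0.1", "10.0.0.1", "10.0.0.2"]},
        {"ttl": 2, "responses": ["203.0.113.9"]},
    ],
}

http_result = {
    "type": "http",
    "source": "toronto",
    "target": "example.com",
    "redirects": [],        # could just as easily hold 5 entries
    "connect_ms": 41,
}

# With pymongo this would be roughly:
#   db.results.insert_many([traceroute_result, http_result])
#   db.results.count_documents({"type": "traceroute", "source": "milan"})
# Here an in-memory list stands in for the collection so the sketch runs
# without a server.
collection = [traceroute_result, http_result]

milan_traces = [d for d in collection
                if d["type"] == "traceroute" and d["source"] == "milan"]
print(len(milan_traces))  # → 1
```

The point is that neither document had to be flattened into a fixed set of columns, yet the "how many traceroutes through this network" question is still a structured query rather than a full-text search.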
I've gone with MongoDB over other options simply because I have a network of friends and colleagues who have enough experience with it to answer my questions.
It took a few iterations to convert the system from Memcache to MongoDB. At first I was just falling back on MongoDB when Memcache failed; that added a lot of complexity to the codebase, and didn't feel like it was adding much. Next I got the system working using the same storage layout as the original Memcache implementation: one master document outlining the details of the request, and a series of child documents providing results for a given portion (e.g. a traceroute from milan to example.com). Finally, at the behest of my MongoDB expert friends, I moved to a single cohesive document that contains all the information from the request. It looks like this: Sample JSON
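As a rough illustration of that final layout (every field name here is an assumption for the sketch, not the actual document), the request details and all of the per-location results collapse into one document, so a single fetch returns everything:

```python
# Hypothetical single cohesive document: request details plus every
# location's results together, replacing the master + child documents.
request_doc = {
    "url": "http://example.com",
    "tests": ["http", "traceroute"],
    "results": {
        "milan": {
            "traceroute": {"hops": [{"ttl": 1, "responses": ["10.0.0.1"]}]},
            "http": {"status": 200, "redirects": []},
        },
        "toronto": {
            "http": {"status": 200,
                     "redirects": [{"to": "http://www.example.com"}]},
        },
    },
}

# With pymongo, one round trip would retrieve the whole thing:
#   db.requests.find_one({"_id": request_id})
# and navigating it is plain dictionary access:
milan_http = request_doc["results"]["milan"]["http"]
print(milan_http["status"])  # → 200
```

Compared with the master/child layout, there's no second query to gather the children, and the document mirrors the structure the application already passes around.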
I've got a few more clean-up items on my list for this project, but I'm really happy with this progress. I also feel this conversion will make it much easier for us to offer API access to this data (another frequently requested feature).