Where's it Up?
WonderProxy is proud to present Where's it Up?, a new tool to help system administrators determine whether their site is up from around the world. The tool accepts a URL and lets you select global locations. From each location you selected, it then attempts to connect to the given server, issues a HEAD request, and reports the results.
- Servers around the world
- Local DNS resolution on each server
- Reports IP and timing information
- Follows a reasonable number of redirects
- PHP with pecl_http
How it Works
Building a reasonably robust application was trivial, thanks to being able to leverage the great technology built by others. When a request is received, an intermediate script does some basic checking, then passes off a number of jobs to Gearman (one per requested location). The job ID is handed back to the user, and results are displayed via Ajax. In the background, Supervisord keeps five workers running at all times; they basically sit waiting for new work. When a job arrives, a worker first checks whether there's a recently cached result in memcached. Failing that, it uses an SSH tunnel to connect to the requested server, issues the request with curl, and parses the results. The result information is then pushed into memcached.
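The post's stack is PHP, Gearman, memcached, and Supervisord, and it doesn't show the worker code. As a rough illustration of the flow just described (one job per location, a pool of five workers, a cache check before doing any real work), here is a minimal Python sketch. It uses stand-ins: `queue.Queue` plays the part of Gearman, the small `TTLCache` class plays memcached, and `run_remote_check` is a hypothetical placeholder for the SSH and curl step. The `job-1` ID and the key formats are likewise made up.

```python
import queue
import threading
import time

# Stand-ins (assumptions, not the production stack):
#   queue.Queue      -> Gearman job server
#   TTLCache         -> memcached with an expiry time
#   run_remote_check -> hypothetical placeholder for "SSH to the location, run curl, parse"


class TTLCache:
    """Tiny thread-safe dict-backed cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            value, expires = item
            if time.monotonic() >= expires:
                del self._data[key]  # expired: behave as if it was never stored
                return None
            return value

    def set(self, key, value):
        with self._lock:
            self._data[key] = (value, time.monotonic() + self.ttl)


def run_remote_check(location, url):
    # Hypothetical placeholder for the real SSH + curl step.
    return {"location": location, "url": url, "status": 200}


def worker(jobs, cache, results):
    while True:
        job = jobs.get()  # blocks until work arrives
        if job is None:   # sentinel: shut this worker down
            jobs.task_done()
            break
        job_id, location, url = job
        key = f"{location}|{url}"
        result = cache.get(key)  # reuse a recent result if one is cached
        if result is None:
            result = run_remote_check(location, url)
            cache.set(key, result)
        results.set(f"{job_id}|{location}", result)  # publish for the client to poll
        jobs.task_done()


def submit(jobs, job_id, url, locations):
    # One job per requested location.
    for location in locations:
        jobs.put((job_id, location, url))


jobs = queue.Queue()
cache = TTLCache(ttl=60)     # short-lived cache of per-location checks
results = TTLCache(ttl=600)  # results waiting to be collected by the client
workers = [threading.Thread(target=worker, args=(jobs, cache, results)) for _ in range(5)]
for t in workers:
    t.start()

submit(jobs, "job-1", "http://example.com/", ["Sydney", "London", "Toronto"])
jobs.join()  # wait until every queued job has been processed
for _ in workers:
    jobs.put(None)
for t in workers:
    t.join()
```

Because workers only ever pull from the queue, a burst of requests makes the queue longer rather than overloading the workers.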
On the client side, the browser makes repeated requests back to the server, sending the job ID and the results it has already obtained. The server hands back new results as they become available.
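A sketch of the server side of that polling exchange, assuming results are stored under a hypothetical `<job_id>|<location>` key (a plain dict stands in for memcached here): the client sends the job ID plus the locations it already has, and gets back only what's new, along with a flag saying whether every location has reported.

```python
def poll(job_id, locations, already_have, results):
    """Return (new_results, finished) for one client poll.

    `results` is any object with a .get(key) method; a dict stands in for
    memcached. Keys follow the hypothetical "<job_id>|<location>" scheme.
    """
    fresh = {}
    for location in locations:
        if location in already_have:
            continue  # the client already has this one; don't resend it
        result = results.get(f"{job_id}|{location}")
        if result is not None:
            fresh[location] = result
    finished = all(loc in already_have or loc in fresh for loc in locations)
    return fresh, finished


# Example: the client already holds Sydney's result; London's has just arrived.
store = {
    "job-1|Sydney": {"status": 200},
    "job-1|London": {"status": 301},
}
fresh, finished = poll("job-1", ["Sydney", "London", "Toronto"], {"Sydney"}, store)
# fresh == {"London": {"status": 301}}; finished is False because Toronto is pending
```

Sending only the missing locations keeps each poll response small, and the client can stop polling once `finished` is true.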
The system also uses memcached to throttle incoming requests from a given IP, based on the example given by Simon Willison. In this case I'm actually using Swatch Beat time to manage the count; the edge case at midnight, when the beat count wraps from 999 back to 000, is quite easy to handle.
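Neither the throttling code nor Simon Willison's example is reproduced in the post, so the following is only an assumed reconstruction of the idea in Python: count requests per IP in buckets keyed by Swatch beat, and sum the most recent buckets. A Swatch beat is 1/1000 of a day (86.4 seconds) measured in Biel Mean Time (UTC+1), so beats run from @000 to @999, and the midnight edge case reduces to wrapping from 999 back to 0. The class name, key format, limit, and window size are all hypothetical, and a dict stands in for memcached.

```python
from datetime import datetime, timezone


def swatch_beat(now):
    """Swatch Internet Time for a timezone-aware datetime.

    The day is measured in Biel Mean Time (UTC+1, no daylight saving) and split
    into 1000 beats of 86.4 seconds each, so the result runs from 0 to 999.
    """
    utc = now.astimezone(timezone.utc)
    seconds = (utc.hour * 3600 + utc.minute * 60 + utc.second + 3600) % 86400
    return seconds * 10 // 864  # integer form of seconds / 86.4


class BeatRateLimiter:
    """Per-IP request counter bucketed by beat (an illustrative sketch).

    `cache` is a dict standing in for memcached. With real memcached you would
    give each bucket an expiry a little longer than the window, so buckets from
    the previous day are gone by the time the beat numbers repeat; a plain dict
    never expires anything, so this sketch ignores that case.
    """

    def __init__(self, limit, window_beats, cache):
        self.limit = limit          # max requests allowed inside the window
        self.window = window_beats  # window length in beats (1 beat = 86.4 s)
        self.cache = cache

    def _key(self, ip, beat):
        return f"throttle:{ip}:{beat:03d}"

    def allow(self, ip, now):
        beat = swatch_beat(now)
        # The current beat plus the preceding ones; "% 1000" handles the
        # midnight wrap, so beat 0 is preceded by beat 999.
        buckets = [(beat - i) % 1000 for i in range(self.window)]
        total = sum(self.cache.get(self._key(ip, b), 0) for b in buckets)
        if total >= self.limit:
            return False
        key = self._key(ip, beat)
        self.cache[key] = self.cache.get(key, 0) + 1
        return True


limiter = BeatRateLimiter(limit=3, window_beats=2, cache={})
before_midnight = datetime(2011, 5, 1, 22, 59, 30, tzinfo=timezone.utc)  # @999
after_midnight = datetime(2011, 5, 1, 23, 0, 10, tzinfo=timezone.utc)    # @000
allowed = [limiter.allow("203.0.113.5", before_midnight) for _ in range(3)]
blocked = limiter.allow("203.0.113.5", after_midnight)  # window still includes @999
other_ip = limiter.allow("198.51.100.7", after_midnight)
```

The appeal of beats over wall-clock minutes is that a bucket number is a single small integer that is the same for every server regardless of local time zone.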
Gearman is helpful in this case for three reasons. First, it provides great separation between the client requesting the work and the workers actually completing it. Second, it trivially allows multiple workers to resolve requests. Finally, it provides a very effective work queue: when the site is under heavy load, requests just take a bit longer to resolve. A small side benefit is the ability to take down all the workers, update the worker script, and restart them, all without affecting the jobs in the queue (apart from the momentary delay).
Memcached is helpful as a quick place to store values, and it handles expiry for me automagically. In this particular case, since only a single server is involved, something like APC would provide a thinner solution; however, I have a lot of experience building apps with memcached as a storage engine, and not much using APC.
Supervisord is helpful as it keeps the workers running: it starts them on machine boot and restarts them after a crash (which hasn't happened yet). The dAmonize your PHP post by Sean Coates was incredibly helpful in getting this going.
SSH is helpful as it allows us to display accurate timing information from the remote machine to the requested site. We could use our own proxies directly, but then we'd be measuring the time to connect from Washington to Sydney to your website. The extra math to remove that middle leg is only moderately tricky, but more importantly, the result is inconsistent. The tunnel lets us execute the commands directly on the remote server.
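The post says only that SSH is used and that curl issues the request; it doesn't give the exact commands. As a sketch of the idea, assuming key-based SSH access and curl installed on each remote server, the Python below runs curl on the remote host with plain `ssh host command`. The curl flags and `--write-out` variables are standard curl options chosen to match the feature list (HEAD request, a capped number of redirects, remote IP, timings); the function names, the host `sydney.example.net`, and the redirect and timeout limits are hypothetical.

```python
import shlex
import subprocess

# Values curl reports after the transfer via --write-out; all are documented
# curl variables. They are space-separated so the output is easy to split.
CURL_FORMAT = (
    "%{http_code} %{remote_ip} %{num_redirects} "
    "%{time_namelookup} %{time_connect} %{time_total}"
)


def build_command(host, url, max_redirects=5, timeout=10):
    """Build an ssh command that runs a HEAD request with curl on `host`."""
    remote = [
        "curl", "--silent", "--head", "--output", "/dev/null",
        "--location", "--max-redirs", str(max_redirects),
        "--max-time", str(timeout),
        "--write-out", CURL_FORMAT,
        url,
    ]
    # ssh hands the remote command to a shell, so quote every argument.
    return ["ssh", host, " ".join(shlex.quote(arg) for arg in remote)]


def parse_result(output):
    """Turn curl's --write-out line into a dict (times are in seconds)."""
    fields = output.strip().split(" ")
    if len(fields) != 6:
        raise ValueError(f"unexpected curl output: {output!r}")
    code, ip, redirects, dns, connect, total = fields
    return {
        "status": int(code),     # 0 means no HTTP response was received
        "ip": ip or None,        # empty when no connection was made
        "redirects": int(redirects),
        "dns_s": float(dns),     # resolved by the remote host's own resolver
        "connect_s": float(connect),
        "total_s": float(total),
    }


def check(host, url):
    """Run the check on `host` over SSH (needs key-based login and curl there)."""
    proc = subprocess.run(build_command(host, url), capture_output=True,
                          text=True, timeout=60)
    return parse_result(proc.stdout)


cmd = build_command("sydney.example.net", "http://example.com/")
sample = parse_result("200 203.0.113.10 1 0.012 0.180 0.402")  # made-up sample line
```

Because curl runs on the remote server, both the DNS lookup and the timings reflect that server's view of the site, which is the point made above about avoiding the Washington leg.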
Why we built it
We've used tools in the past that allow us to quickly check whether a site is up or down. They're nifty, but these days many sites use anycast and geographic DNS so that visitors in different parts of the world reach different IPs, served from multiple servers in multiple data centers. Simple tools just aren't able to adequately express whether such a site is up or down. By leveraging the network of servers we already had, building a more advanced and complete tool was easy.