Share and Enjoy

Note: This article was originally published at Planet PHP on 12 December 2010.

"Share and Enjoy" is the motto of a certain division of the largely successful Sirius Cybernetics Corporation. I'm here to complain about it.

Data is precious. It is expensive to produce, time consuming to fetch, and eventually discarded. It makes sense to share it and save all the hassle that goes into herding it out of the remote servers it lives in. There can be, however, too much of a good thing.

In short, sharing is awesome, but you should care where you put your data.

API rage

The best kind of sharing, of course, is the type you don't even know exists. Transparent caches work better than the regular user API kind, because the abuse, when it happens, is systematic and universal. Sooner or later, you'll need an API, but leaving the guts of the cache open to use will constantly have you saying, "You're doing it wrong."

Good caching APIs are hard to design. Making them foolproof is nearly impossible, because you end up punishing the users who access them properly. As an API designer, the hardest lesson to learn is to never make promises you don't want to keep.

I've learned that lesson with APC.

The return value trap

One hand taketh what the other hand giveth. When either one's empty, it's up to the documentation to explain why we ended up with a NULL where data was supposed to appear. Since you want a cache to be fast rather than completely honest, that NULL could mean anything from "I'm busy" to "I haven't got what you asked for" and everything in between.

Everything revolves around the clarity of the documentation here. In the above situation, most people using the cache would assume the latter. The problem is that what the cache might actually have meant was "I got bored of looking... uh, I mean ETIMEOUT."
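To make the ambiguity concrete, here is a minimal Python sketch of a toy in-process cache. It is not APC's API: ToyCache, Status, fetch_ambiguous and the busy flag are all names invented for illustration. The first lookup method collapses three different answers into one empty return value; the second says which one it meant.

```python
from enum import Enum
from typing import Any, Dict, Tuple


class Status(Enum):
    HIT = "hit"    # the key was found (its value may itself be None)
    MISS = "miss"  # the cache looked and the key is not there
    BUSY = "busy"  # the cache gave up before it could look


class ToyCache:
    """In-memory stand-in for a shared cache (hypothetical, not APC)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.busy = False  # assumption: simulates contention or a timeout

    def store(self, key: str, value: Any) -> None:
        self._data[key] = value

    def fetch_ambiguous(self, key: str) -> Any:
        # Busy, missing and "stored None" all come back as None.
        if self.busy:
            return None
        return self._data.get(key)

    def fetch(self, key: str) -> Tuple[Status, Any]:
        # Same lookup, but the reason travels with the value.
        if self.busy:
            return (Status.BUSY, None)
        if key in self._data:
            return (Status.HIT, self._data[key])
        return (Status.MISS, None)


cache = ToyCache()
cache.store("answer", None)  # a legitimately cached "nothing"

ambiguous = [cache.fetch_ambiguous("answer"), cache.fetch_ambiguous("question")]
hit = cache.fetch("answer")
miss = cache.fetch("question")

cache.busy = True
ambiguous.append(cache.fetch_ambiguous("answer"))
busy = cache.fetch("answer")
```

All three entries in ambiguous are None, so the caller cannot tell them apart, while hit, miss and busy each carry a different Status. The price is the one the article warns about: every explicit status is a promise the cache then has to keep.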

The story is the same when putting data into the cache. A successful return value means so little when the next queued request is about to discard the data. The delays that ensure that data was stored in its proper condition are not unlike waiting at the post office until your letter is delivered. In reality, sharing data should not be the responsibility of the one who has stored it.

Smoke signals

For a language like PHP, there is no built-in system to communicate between requests. APC's mechanisms for sharing data among multiple requests provide an easy shortcut to set flags and signal events between them. Unlike the good old-fashioned "best effort" type of cache, these new methods are expected to be immediately visible and reliable across the entire system. It is for these reasons that the system goes about doing everything slowly and steadily.

I die a little every time I see apc_add() used as a mechanism to ensure exclusivity, especially if the key includes _lock.
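The article does not spell out what goes wrong with this pattern, so the following is only one plausible failure mode, sketched in Python under stated assumptions. ToyCache is a hypothetical in-memory model, not APC; its add() mimics the general shape of an add-if-absent call with a TTL, and the manual clock stands in for real time.

```python
from typing import Any, Dict, Optional, Tuple


class ToyCache:
    """In-memory stand-in for a shared cache (hypothetical, not APC)."""

    def __init__(self) -> None:
        # key -> (value, expiry time or None for "never expires")
        self._entries: Dict[str, Tuple[Any, Optional[float]]] = {}
        self.now = 0.0  # manual clock keeps the example deterministic

    def add(self, key: str, value: Any, ttl: float = 0) -> bool:
        """Store only if the key is absent or expired; True on success."""
        entry = self._entries.get(key)
        if entry is not None:
            expires_at = entry[1]
            if expires_at is None or self.now < expires_at:
                return False  # somebody else still "holds" the key
        self._entries[key] = (value, self.now + ttl if ttl > 0 else None)
        return True


cache = ToyCache()
first = cache.add("report_lock", "request-A", ttl=5)   # A takes the "lock"
second = cache.add("report_lock", "request-B", ttl=5)  # B is refused
cache.now = 6.0                                        # A is slow; the TTL lapses
third = cache.add("report_lock", "request-B", ttl=5)   # B gets in while A still runs
```

In this model first and third both succeed, so two requests believe they hold the same lock. Dropping the TTL trades that for the opposite problem: a holder that dies without removing the key blocks everyone until the cache is cleared.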

Take me to the cleaners

Most caches are fast if you don't count the slow parts.

Because APC is used for signaling, cleaning up is the slowest part; the entire cache needs to be nuked for most of the code to work properly. But, just like any other portable PHP extension, this has to be done inside a manual invocation of a function. There is no implicit way to run an independent cleanup job in the background. Sooner or later, a request is going to get hit by a cleanup routine.

The standard issue with cleanup during a request is that the request can be aborted by a user; PHP can kill a partially-complete request if ignore_user_abort is not set. When this happens, the entire cache can become deadlocked with its memory ending up in an inconsistent state.

Fortunately, most caches rarely need a cleanup.

Fill 'er up

As you might have guessed, caches do overflow. People will cache whatever gives them a performance boost, as they should, but not all scenarios are created alike.

Scalability turns this into a strange and dangerously abused concept. Throwing user data into a shared space is an excellent strategy if you're building a system with a single server. You need it now, you need it later, and of course, you'll need it repeatedly. Throw in six hundred servers and have your users rotate among them, however, and this type of caching turns into a complete waste of time. The same cache that turned your system up to eleven in QA suddenly becomes a little CPU-eating monster, and not the cute kind.
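A back-of-the-envelope calculation shows how quickly the benefit evaporates. The Python sketch below rests on assumptions that are mine, not the article's: each server keeps its own local cache, every request is routed to a server chosen uniformly at random, and nothing is ever evicted. The function name is hypothetical.

```python
def local_hit_probability(servers: int, prior_requests: int) -> float:
    """Chance that the server handling this request has already seen
    the same user in at least one of their earlier requests."""
    never_landed_here = (1.0 - 1.0 / servers) ** prior_requests
    return 1.0 - never_landed_here


# One server: the second request always finds the cached data.
single = local_hit_probability(servers=1, prior_requests=1)

# Six hundred servers: even after ten earlier requests, the odds are poor.
fleet = local_hit_probability(servers=600, prior_requests=10)

print(f"single server: {single:.1%}, 600 servers: {fleet:.2%}")
```

Under these assumptions a single server hits every time, while the six-hundred-server fleet stays below a 2% hit rate after ten requests from the same user. Every miss still pays the cost of storing the data, which is where the CPU goes.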


Fundamentally, all of

Truncated by Planet PHP, read more at the original (another 1163 bytes)