Out with the Old

Note: This article was originally published at Planet PHP on 20 April 9600.

Jetpacks. Flying cars. Databases able to handle infinite amounts of data. Breakthrough after breakthrough, computer engineering forges a shiny future, and yet...

One of the first interview questions I was asked was to describe, in my own words, the difference between an inner join and an outer join. It is a question I have adopted, and I now ask it of every candidate. Over time, I've been interviewing more and more senior developers, most of whom know the correct answer, so the question has instead become a way to gaze into the developer soul, providing a brief glimpse of the developer's professional experience. Typically, answers take one of the following paths:

  • A dry, textbook answer. Bonus points for using table EMPLOYEES in the example.
  • A MySQL-specific answer, illustrated with a set of tables to support a blogging app.
  • Data relationships (and therefore joins) explained as a set of properties of objects. Most answers here include User objects and Friend relationships.
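
For readers who want the textbook answer in runnable form, here is a minimal sketch using Python's built-in sqlite3 module. The users and posts tables and their contents are invented for illustration (a nod to the blogging-app answer above), not taken from any real schema.

```python
import sqlite3

# In-memory database with a hypothetical two-table blogging schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'Hello world');
""")

# Inner join: only users that have at least one matching post.
inner_rows = conn.execute("""
    SELECT u.name, p.title
    FROM users u
    INNER JOIN posts p ON p.user_id = u.id
    ORDER BY u.id
""").fetchall()

# Left outer join: every user, with NULL (None) where no post matches.
outer_rows = conn.execute("""
    SELECT u.name, p.title
    FROM users u
    LEFT OUTER JOIN posts p ON p.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(inner_rows)  # [('alice', 'Hello world')]
print(outer_rows)  # [('alice', 'Hello world'), ('bob', None)]
```

The inner join drops bob because he has no posts; the outer join keeps him and fills the missing title with NULL.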

I see the last answer more frequently these days, which makes sense. NoSQL is making huge inroads in the world of software engineering, and frameworks are popping up left and right divorcing developers from the need to interact with databases (sorry, "data stores") directly. The combination of Moore's Law and developers' access to as much computing power as they need (or want) creates a situation where developers can get away with abandoning a schema-first approach to app design. This is great for those who have never learned SQL properly, or those who don't want to feel shackled by a relatively inflexible schema and prefer to focus on business priorities instead.

Not designing the schema first is cool, but data still has to live somewhere and continue to be managed and monitored. We have not skipped over databases; we've just changed, quite drastically, how we think about them. Over time, relational databases battled each other, pushing features and vertical designs (I'm looking at you here, Oracle) that promised end-to-end data management. In the meantime, NoSQL data stores thought about separation of function and specialization. Instead of trusting relational databases to solve the problems behind the CAP theorem, they went all postmodern on data-driven software and deconstructed how data is stored, retrieved, synchronized, and persisted within an app.

Gone are the days where a web app was a database with some PHP in front of it. Web apps are now layers and layers, a lot of them independent, many of them using completely different technologies, and some interacting with each other via HTTP-based APIs. The art of scaling web apps is the art of shifting bottlenecks around, and in data-first design, the bottleneck was frequently the monolith database. These days, you might encounter some of the following within your app:

  • A caching layer from which most stored data is retrieved, usually implemented using Memcached or Redis.
  • Decoupling the volume of user activity from the expectations of app responsiveness by delaying the computation of some tasks. This is sometimes implemented by using a queue-specialized data store, either by using a great key/value store like Redis or an off-the-shelf product like Gearman.
  • Data that is separated horizontally across many databases, perhaps a few MySQL instances with sharded/replicated data.
  • Data that is separated based on natural parameters, such as user data in a graph database like Neo4j for easy relationship digging, and articles the users write in a MongoDB instance for quick reads and writes.
  • A standalone search index of the content of your app in a search-specialized data store, such as Solr.
  • A layer, invisible to the end user, that siphons or samples data from designated data store locations, then crunches it until it's transformed into information and, later on, knowledge.
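
The first two items in the list, the caching layer and the deferred work queue, can be sketched in a few lines. This is only an illustration under loud assumptions: plain Python dicts and a deque stand in for the database, for Memcached/Redis, and for a queue product like Gearman, and every name here (get_article, record_view, run_worker) is hypothetical.

```python
from collections import deque

# Stand-ins only: a dict for the primary database, a dict for the cache
# (Memcached/Redis in a real app), and a deque for the task queue.
database = {"article:1": "Out with the Old"}
cache = {}
task_queue = deque()
stats = {"db_reads": 0}


def get_article(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    value = database.get(key)
    if value is not None:
        cache[key] = value  # populate the cache for the next reader
    return value


def record_view(key):
    """Defer the slow part: enqueue a task and return immediately."""
    task_queue.append(("count_view", key))


def run_worker(view_counts):
    """Drain the queue later, outside the request/response cycle."""
    while task_queue:
        task, key = task_queue.popleft()
        if task == "count_view":
            view_counts[key] = view_counts.get(key, 0) + 1


first = get_article("article:1")   # miss: reads the database
second = get_article("article:1")  # hit: served from the cache
record_view("article:1")
record_view("article:1")

view_counts = {}
run_worker(view_counts)
print(stats["db_reads"], view_counts)  # 1 {'article:1': 2}
```

The request path only touches the cache and the queue; the database read happens once, and the counting happens whenever the worker gets around to it.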

The bottlenecks still exist, but they are isolated explicitly in a way that can be either easily refactored or replaced with a new technology when it is invented. Your search might be slow, but in a few months, Solr might get replaced with X. Your articles might require a more error-tolerant write mechanism, but in a few weeks, your MongoDB instances might get replaced with Y.
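
One way to picture that kind of isolation is a thin interface between the app and whichever search technology sits behind it. The sketch below is hypothetical throughout: SearchBackend, LinearScanSearch, and InvertedIndexSearch are invented names, and the two toy backends merely stand in for "Solr" and "X".

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """The only thing the rest of the app knows about search."""

    def search(self, query: str) -> List[str]: ...


class LinearScanSearch:
    """Toy backend: case-insensitive substring scan over every document."""

    def __init__(self, documents: List[str]) -> None:
        self.documents = documents

    def search(self, query: str) -> List[str]:
        needle = query.lower()
        return [doc for doc in self.documents if needle in doc.lower()]


class InvertedIndexSearch:
    """Toy replacement: whole-word lookup in a prebuilt inverted index."""

    def __init__(self, documents: List[str]) -> None:
        self.index: Dict[str, List[str]] = {}
        for doc in documents:
            for word in set(doc.lower().split()):
                self.index.setdefault(word, []).append(doc)

    def search(self, query: str) -> List[str]:
        return list(self.index.get(query.lower(), []))


def find_articles(backend: SearchBackend, query: str) -> List[str]:
    """App code depends on the interface, not on a specific technology."""
    return backend.search(query)


docs = ["Inner joins explained", "Scaling with Redis"]
old_results = find_articles(LinearScanSearch(docs), "redis")
new_results = find_articles(InvertedIndexSearch(docs), "redis")
print(old_results == new_results)  # True
```

The two backends are not equivalent in general (one matches substrings, the other whole words), which is exactly the kind of difference a swap forces you to check.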

From the standpoint of an SQL aficionado, I enjoy seeing how relational databases continue to be used but are f

Truncated by Planet PHP, read more at the original (another 2345 bytes)