Note: This article was originally published at Planet PHP on 1 December 2010.

When creating a web site, it is important to consider the audience. Regardless of whether the web site is selling goods, providing a service, or making information available, one should consider how the site will be displayed to a visitor from another country. If visitors want to buy something, use the service, or read the information provided, will the web site be able to accommodate them? Should it? Localization allows the user interface to accommodate a user's expectations for the display of dates, currency, and text specific to their locale.

Many native English speakers don't realize that, despite the far reach of their language, it is still a small piece of the spoken-language pie. This article states that while English is definitely the most widely distributed language in the world, it is not the most spoken (it currently ranks third). There is so much useful information on the Web, and plenty of it is not in English. Solving language-related problems with computers is difficult, but making the web sites we create more internationally friendly is not as hard as it may seem. While it may not be practical for a site to cater to ten different languages, it may well be worthwhile to support a couple. For example, the Spanish-speaking population of the United States is growing, and a commerce site that can communicate with that growing subset of the country expands its potential customer base.

Before showing an example of how to start implementing locales for your web site, I'd like to take a moment to talk about character encoding. Even if you don't foresee needing to provide for localization in your web site, character sets (charsets) and encoding are important aspects to consider. If you are unfamiliar with character encoding, it is used to help software know which characters to expect from the user. For example, the common ISO 8859-1 character set is intended for characters common to Western Europe. So, I can use German umlauts along with the English alphabet within this charset. If, however, I were to try to display an Arabic character when using ISO 8859-1, what may be displayed instead could be a placeholder or an unrelated character, meaning that under the ISO 8859-1 charset, we don't know what that Arabic character is.
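To make the charset problem concrete, here is a minimal sketch of what happens to an umlaut and to an Arabic letter when ISO 8859-1 is involved. It is not from the original article. It assumes PHP 7.0 or later and the mbstring extension, and the choice of the Arabic letter beh (U+0628) is mine, purely for illustration.

```php
<?php
// Sketch only: requires the mbstring extension and PHP 7.0+ (for \u{...} escapes).

// Both strings are UTF-8 encoded: a German u-umlaut and the Arabic letter beh.
$umlaut = "\u{00FC}";
$arabic = "\u{0628}";

// Use "?" as the stand-in for characters the target charset cannot represent.
mb_substitute_character(0x3F);

// Converting to ISO 8859-1 works for the umlaut but not for the Arabic letter.
$umlautLatin1 = mb_convert_encoding($umlaut, 'ISO-8859-1', 'UTF-8');
$arabicLatin1 = mb_convert_encoding($arabic, 'ISO-8859-1', 'UTF-8');

// Misreading the UTF-8 bytes as if they were ISO 8859-1 yields two unrelated characters.
$misread = mb_convert_encoding($arabic, 'UTF-8', 'ISO-8859-1');

echo bin2hex($umlaut), "\n";       // c3bc (two bytes in UTF-8)
echo bin2hex($umlautLatin1), "\n"; // fc (one byte in ISO 8859-1)
echo bin2hex($arabic), "\n";       // d8a8 (two bytes in UTF-8)
echo bin2hex($arabicLatin1), "\n"; // 3f (a plain "?", the letter is lost)
echo bin2hex($misread), "\n";      // c398c2a8 (U+00D8 and U+00A8, not Arabic at all)
```

The hex dumps are used instead of printing the characters directly so the result does not depend on the encoding of your terminal.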

A very cross-language-friendly character encoding is UTF-8, which is used to represent the Unicode character set. It is ASCII-compatible, which is especially helpful in legacy apps. Using UTF-8 for a web site provides a large list of supported characters that will best accommodate your users. To implement a character encoding properly, you want to make sure it is the same across your entire app. If you are using UTF-8, then Apache, your database tables, PHP and its applicable functions, and the document type should all be set to UTF-8. If these are not consistent, you may end up with a mix of data in different character encodings, which can be very difficult, if not impossible, to remedy.

// Apache httpd.conf or .htaccess
// This will add a charset to the Content-Type response header.
AddDefaultCharset UTF-8

// php.ini
default_charset = "UTF-8"

// Example PHP function
htmlentities($data, ENT_COMPAT, 'UTF-8');

// XML

// html
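On the PHP side, one way to keep mixed encodings out of your output is to check that data really is UTF-8 before escaping it. The following is a rough sketch under a few assumptions: the mbstring extension is available, and escape_utf8() is a hypothetical helper name of my own, not something PHP provides.

```php
<?php
// Sketch only: escape_utf8() is a hypothetical helper; requires the mbstring extension.
function escape_utf8(string $data): string
{
    // Reject byte sequences that are not valid UTF-8 instead of silently
    // mixing encodings further down the line.
    if (!mb_check_encoding($data, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Same call as in the configuration example, with the charset stated explicitly.
    return htmlentities($data, ENT_COMPAT, 'UTF-8');
}

// Declare the charset on the response so the browser decodes the page as UTF-8.
// The guard lets this sketch run from the command line as well.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Make the multibyte string functions default to UTF-8 too.
mb_internal_encoding('UTF-8');

echo escape_utf8("Gr\u{00FC}\u{00DF}e"), "\n"; // Gr&uuml;&szlig;e
```

Throwing on invalid input is only one possible policy. Depending on the site, converting from a known legacy encoding or logging and dropping the value may be more appropriate.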

For more information about UTF-8 and why it is important, check out these links:

Now that we've spoken about character encoding, let's jump back to implementing localization. How do we determine which locale to load for a user? How do we know where they are coming from? One could figure it out with the user's IP address, but the simplest way would be to let the browser tell us what locale to use. The Accept-Language HTTP header is sent with an HTTP request and is something we can see in PHP's $_SERVER superglobal. It looks something like this:

Accept-Language: en-us,en;q=0.5
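Reading that header might look something like the sketch below. It assumes a header such as en-us,en;q=0.5, where entries are separated by commas and an optional q value gives each one a weight. The function names parse_accept_language() and pick_locale(), the list of supported locales, and the default are all hypothetical and only here for illustration.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not part of PHP.

// Returns the language tags from an Accept-Language header, most preferred first.
function parse_accept_language(string $header): array
{
    $entries = [];
    foreach (explode(',', $header) as $index => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }

        // A missing q parameter means the default weight of 1.0.
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $entries[] = ['tag' => $tag, 'q' => $q, 'pos' => $index];
    }

    // Highest weight first; ties keep the order the browser sent.
    usort($entries, function (array $a, array $b): int {
        if ($a['q'] === $b['q']) {
            return $a['pos'] <=> $b['pos'];
        }
        return $b['q'] <=> $a['q'];
    });

    return array_column($entries, 'tag');
}

// Picks the first supported locale, falling back from "es-mx" to "es" if needed.
// $supported is assumed to contain lowercase tags.
function pick_locale(string $header, array $supported, string $default): string
{
    foreach (parse_accept_language($header) as $tag) {
        if (in_array($tag, $supported, true)) {
            return $tag;
        }
        $primary = explode('-', $tag)[0];
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return $default;
}

// The header may be absent, so fall back to an empty string.
$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
echo pick_locale($header, ['en', 'es'], 'en'), "\n";
```

With a header of es-mx,en;q=0.5 and supported locales of en and es, pick_locale() returns es. When nothing matches, or the header is missing, it returns the default.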

While this is a simple way to detect the starting language in which to display the web site, it does not mean that it is

Truncated by Planet PHP, read more at the original (another 4785 bytes)