Internationalized domain names, are you ready?

Note: This article was originally published at Planet PHP on 21 October 2010.

Since May, 11 new TLDs (top-level domains) have been added. For these to work, a lot of applications will have to be fixed.

Many email-validation scripts might use an approach like this:

  $ok = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i', $email);

This one is pretty simple: it matches the most common address formats, as long as the TLD (.com, .nl, .uk, etc.) consists of 2 to 6 letters. For a bit more sophistication, you might want to restrict the TLD to ones that actually exist:

  $ok = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$/i', $email);

Note: both of these regexes were taken from regular-expressions.info. It is the top Google hit, and the examples are decent.

The new TLDs use non-ASCII characters, and they might become aliases for existing top-level domains, or new TLDs altogether. Here are the currently working examples:

At first sight these look like regular UTF-8 characters, but if you look at the source code of this page, you'll notice that they are actually encoded differently.

The Korean URL http://ie.i...OiSiS is actually encoded as http://xn--9n2bp8q.xn--9t4b11yi5a/. This encoding is called Punycode.
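To see why this matters for the two patterns quoted earlier, here is a quick check against the Punycode form of that domain. The variable names are only illustrative. The encoded TLD contains digits and hyphens and is longer than 6 characters, so neither pattern accepts it.

```php
<?php
// The two patterns quoted earlier in this article.
$simple = '/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i';
$strict = '/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$/i';

// An address on the Punycode-encoded domain from the text above.
$idnAddress   = 'example@xn--9n2bp8q.xn--9t4b11yi5a';
// A conventional address, for comparison.
$plainAddress = 'example@example.com';

var_dump(preg_match($simple, $plainAddress)); // int(1)
var_dump(preg_match($strict, $plainAddress)); // int(1)
var_dump(preg_match($simple, $idnAddress));   // int(0)
var_dump(preg_match($strict, $idnAddress));   // int(0)
```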

If you want to support these new URLs (and thus these domain names in email addresses), you need to support Punycode. You will likely receive UTF-8 encoded domain names in email addresses (example@ie.i...OiSiS), but internally you must make sure that you only deal with the Punycode representation.
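One way to normalize incoming addresses is sketched below. It assumes PHP's intl extension is installed, which provides idn_to_ascii(); the helper name email_to_ascii is hypothetical and not something the article defines. Only the domain part is converted, and the local part is left untouched.

```php
<?php
/**
 * Hypothetical helper: return the address with its domain part in
 * Punycode (ASCII) form, or null if the address has no usable domain
 * or the conversion fails. Requires the intl extension.
 */
function email_to_ascii(string $email): ?string
{
    // Split on the last "@", so the local part may contain anything before it.
    $at = strrpos($email, '@');
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return null;
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Convert a UTF-8 domain to its ASCII form; plain ASCII domains
    // come back unchanged (apart from lowercasing).
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);

    return $ascii === false ? null : $local . '@' . $ascii;
}
```

Calling email_to_ascii() once, as soon as the address enters the application, means every later step (validation, DNS lookups, storage) only ever sees the ASCII representation.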

This translation is also what modern browsers do. If you paste "http://xn--9n2bp8q.xn--9t4b11yi5a/" directly into the Firefox address bar, it will show you the UTF-8 characters instead. Firefox will re-encode to Punycode, though, and use that format for HTTP requests.

The best way to check for valid email addresses is really to use a very liberal regex, and then verify with a simple MX record lookup that a mail server exists for the given domain. This example is an expansion on the first regex.

  $email = 'example@xn--9n2bp8q.xn--9t4b11yi5a';

  if(preg_match

Truncated by Planet PHP, read more at the original (another 2579 bytes)
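Since the example above is cut off, here is a minimal sketch of the general idea (a liberal pattern followed by an MX lookup). It is not the author's code: the function name, the pattern, and the injectable lookup callback are all assumptions made for illustration. The lookup defaults to PHP's checkdnsrr(), and passing a callback makes it possible to try the function without network access.

```php
<?php
/**
 * Hypothetical helper: liberal syntax check plus an MX record lookup.
 *
 * @param string        $email Address whose domain is already in ASCII/Punycode form.
 * @param callable|null $hasMx Optional lookup taking a domain and returning bool;
 *                             defaults to a real DNS query via checkdnsrr().
 */
function looks_deliverable(string $email, ?callable $hasMx = null): bool
{
    // Very liberal: something, one "@", then a domain without spaces or "@".
    // The D modifier stops "$" from matching before a trailing newline.
    if (!preg_match('/^[^@\s]+@([^@\s]+)$/D', $email, $matches)) {
        return false;
    }

    if ($hasMx === null) {
        $hasMx = function (string $domain): bool {
            return checkdnsrr($domain, 'MX');
        };
    }

    // $matches[1] holds the domain part captured by the pattern.
    return (bool) $hasMx($matches[1]);
}
```

In production, call looks_deliverable($email) with no second argument so the real DNS lookup is used. Note that a domain without MX records can still accept mail through its A record, so a stricter or looser fallback is a design choice this sketch leaves out.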