Internationalized domain names, are you ready?
Since May 11 TLDs (top-level domain names) have been added. For this to work successfully, a lot of applications will have to be fixed.
Many email-validation scripts might use an approach like this:
- $ok = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i', $email);
This one is pretty simple: it matches the most common address formats, as long as the TLD (.com, .nl, .uk, etc.) is between 2 and 6 letters long. For a bit more sophistication, you might want to check the TLD against a list of known ones:
- $ok = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$/i',$email);
Note: both of these regexes were taken from regular-expressions.info, the top Google hit, which has decent examples.
The new TLDs use non-ASCII characters, and they might become aliases for existing top-level domains or new TLDs altogether. Here are the currently working examples:
- http://مثال.إختبار - Arabic
- http://例子.测试 - Chinese (simplified)
- http://例子.測試 - Chinese (traditional)
- http://παράδειγμα.δοκιμή - Greek
- http://उदाहरण.परीक्षा - Hindi
- http://例え.テスト - Japanese
- http://실례.테스트 - Korean
- http://مثال.آزمایشی - Persian
- http://пример.испытание - Russian
At first sight these look like regular UTF-8 characters, but if you look at the source code of this page, you'll notice that they are actually encoded differently.
The Korean URL http://실례.테스트 is actually encoded as http://xn--9n2bp8q.xn--9t4b11yi5a/. This is called Punycode.
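To see why the two patterns shown earlier are a problem, a quick check can run them against the Punycode form of the Korean example. This is only a sketch: the local part `example` and the variable names are made up for illustration.

```php
<?php
// The two patterns quoted earlier in this post.
$simple = '/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$/i';
$strict = '/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$/i';

// Illustrative addresses: one classic, one on the Korean domain in Punycode form.
$classic = 'example@example.com';
$idn     = 'example@xn--9n2bp8q.xn--9t4b11yi5a';

// preg_match() returns 1 on a match and 0 when the pattern does not match.
$classicSimple = preg_match($simple, $classic); // 1
$classicStrict = preg_match($strict, $classic); // 1
$idnSimple     = preg_match($simple, $idn);     // 0: the TLD contains digits and hyphens
$idnStrict     = preg_match($strict, $idn);     // 0: the TLD is not in the list

var_dump($classicSimple, $classicStrict, $idnSimple, $idnStrict);
```

Both patterns reject the address because the last label, `xn--9t4b11yi5a`, is longer than 6 characters and contains characters other than letters.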
If you want to support these new URLs (and thus domain names in email addresses), you need to support Punycode. You will likely receive UTF-8 encoded domain names in email addresses (example@실례.테스트), but internally you must make sure that you only deal with the Punycode representation.
This translation is also what modern browsers do. If you paste "http://xn--9n2bp8q.xn--9t4b11yi5a/" directly into the Firefox address bar, it will show you the UTF-8 characters instead. Firefox will re-encode to Punycode, though, and use that format for HTTP requests.
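One possible way to do this translation in PHP is the `idn_to_ascii()` function from the intl extension. The sketch below assumes that extension is installed; `normalizeEmailDomain()` is a hypothetical helper name chosen for this example, and it only converts the domain part, leaving the local part untouched.

```php
<?php
// Sketch only: requires PHP's intl extension for idn_to_ascii().
// normalizeEmailDomain() is a hypothetical helper, not part of any library.
function normalizeEmailDomain(string $email): ?string
{
    // Split on the last "@"; everything after it is treated as the domain.
    $at = strrpos($email, '@');
    if ($at === false || $at === 0) {
        return null;
    }
    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);
    if ($domain === '') {
        return null;
    }

    // Non-ASCII labels become "xn--..." labels; plain ASCII labels pass through.
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        return null;
    }

    return $local . '@' . $ascii;
}

// Should print the Punycode form of the Korean example address.
echo normalizeEmailDomain('example@실례.테스트'), "\n";
```

For display, `idn_to_utf8()` from the same extension goes the other way, much like the address bar behaviour described above.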
The best way to check for valid email addresses is really to use a very liberal regex and then verify with a simple MX record lookup that a mail server exists for the given domain. This example is an expansion of the first regex.
- $email = 'example@xn--9n2bp8q.xn--9t4b11yi5a';
- if(preg_match
Truncated by Planet PHP, read more at the original (another 2579 bytes)
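The general idea described above, a liberal regex followed by an MX lookup, could be sketched roughly as follows. This is an independent illustration, not the author's code: `hasMailServer()` and its optional `$mxLookup` parameter are hypothetical names, the pattern is deliberately looser than the ones quoted earlier, and the address is assumed to already be in Punycode form.

```php
<?php
// Sketch only; hasMailServer() is a hypothetical helper.
// $mxLookup can be swapped out so the logic can be tried without real DNS traffic.
function hasMailServer(string $email, ?callable $mxLookup = null): bool
{
    // Very liberal shape check: something, an "@", and a domain with at least one dot.
    if (!preg_match('/^[^@\s]+@([^@\s]+\.[^@\s.]+)$/', $email, $m)) {
        return false;
    }

    // Default to PHP's built-in DNS check; the trailing dot marks the name as fully qualified.
    $mxLookup = $mxLookup ?? function (string $domain): bool {
        return checkdnsrr($domain . '.', 'MX');
    };

    // $m[1] holds the domain captured by the pattern above.
    return (bool) $mxLookup($m[1]);
}
```

Calling `hasMailServer('example@xn--9n2bp8q.xn--9t4b11yi5a')` performs a real DNS query, so the result depends on the network and on whether the domain currently publishes MX records.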


