Tag Archives: uri

.EU Internationalised Domain Names

The .eu domain now supports Internationalised Domain Names (IDNs) in ACE-encoded form, which means that most European characters can now appear in the domain name part of URIs and URLs!

Examples:

Note: type the second (ACE-encoded) version in brackets if your keyboard doesn't have an à character!
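
To give a rough idea of what an ACE-encoded name looks like, here is a minimal Python sketch using the standard library's `idna` codec. The domain `café.eu` is a made-up example, not one of the names from this post, and the codec follows the older IDNA 2003 rules, so treat it as an approximation of what a registry or browser actually does.

```python
# Hypothetical domain name, used only for illustration ("\u00e9" is é).
unicode_name = "caf\u00e9.eu"

# The built-in "idna" codec converts each label to its ACE ("xn--") form.
ace_name = unicode_name.encode("idna").decode("ascii")
print(ace_name)  # xn--caf-dma.eu

# Decoding the ACE form gives back the original Unicode name.
round_trip = ace_name.encode("ascii").decode("idna")
print(round_trip == unicode_name)
```

The ACE form contains only ASCII letters, digits and hyphens, which is why it can be typed on any keyboard.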

See also:

And if you're interested in Arabic or Russian domain names see:


Unicode, UTF-8, UTF-16, UCS-2, UCS-4 and URIs

Unicode can be confusing!

For a start, there are a number of different encodings, such as:

  • UTF-8 (for example, € in UTF-8 is 0xE2 0x82 0xAC)
  • UTF-16 (which uses surrogate pairs to represent "characters" outside the Basic Multilingual Plane (BMP))
  • UCS-2 (a predecessor of UTF-16)
  • UCS-4
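
The differences between these encodings are easier to see side by side. The sketch below is only an illustration: it encodes the euro sign (inside the BMP) and a musical G clef (outside the BMP, chosen arbitrarily) with Python's built-in codecs. Python has no separate UCS-2 or UCS-4 codec, so `utf-16-be` and `utf-32-be` stand in for them here; for BMP characters such as € the UTF-16 and UCS-2 bytes are the same.

```python
euro = "\u20ac"      # EURO SIGN, inside the BMP
clef = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, outside the BMP

def encodings(ch):
    """Return the bytes of one character, as hex, in three encodings."""
    return {codec: ch.encode(codec).hex()
            for codec in ("utf-8", "utf-16-be", "utf-32-be")}

euro_bytes = encodings(euro)
clef_bytes = encodings(clef)

print(euro_bytes)  # UTF-8 needs 3 bytes, UTF-16 a single 16-bit unit
print(clef_bytes)  # UTF-16 needs two 16-bit units (a surrogate pair)
```

For the clef, the UTF-16 result is the surrogate pair D834 DD1E, while UTF-32 simply stores the code point 0001D11E.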

An RFC 2396 URI must be encoded / escaped using UTF-8 (and %hex values), so if you want to access a web page called €.php, the URI will end in %E2%82%AC.php.
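
The escaping step can be sketched with Python's `urllib.parse` module, assuming the page is named `€.php` (a euro sign followed by `.php`). This is just one way to produce the escaped form, not necessarily what any particular browser does internally.

```python
from urllib.parse import quote, unquote

page = "\u20ac.php"  # "€.php"

# quote() encodes the string as UTF-8 by default and writes each
# non-ASCII byte as a %XX escape.
escaped = quote(page)
print(escaped)  # %E2%82%AC.php

# unquote() reverses the escaping, again assuming UTF-8.
print(unquote(escaped) == page)
```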

Different browsers seem to handle Unicode URIs in different ways:

  • Safari works with both (€.php and %E2%82%AC.php) and helpfully (?) redisplays %E2%82%AC.php as €.php in the address bar
  • Firefox converts €.php (sometimes incorrectly, to %80.php), so you can only use / see %E2%82%AC.php
  • IE works with both (€.php and %E2%82%AC.php) but leaves both versions unchanged in the address bar
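
The %80 form is what you get if the euro sign is escaped as Windows-1252 instead of UTF-8, because Windows-1252 stores € as the single byte 0x80. Whether that is what Firefox actually did is an assumption; the sketch below only shows that the bytes match, and what happens when a server then reads %80 as UTF-8.

```python
from urllib.parse import quote, unquote

page = "\u20ac.php"  # "€.php"

utf8_form = quote(page, encoding="utf-8")
cp1252_form = quote(page, encoding="cp1252")
print(utf8_form)    # %E2%82%AC.php
print(cp1252_form)  # %80.php

# Read as UTF-8, a lone 0x80 byte is invalid, so unquote() substitutes
# the replacement character U+FFFD instead of the euro sign.
misread = unquote(cp1252_form)
print(misread == "\ufffd.php")

# Read as Windows-1252, the same escape decodes to the original name.
recovered = unquote(cp1252_form, encoding="cp1252")
print(recovered == page)
```

The two escaped forms name different byte sequences, so a server expecting UTF-8 would not find the page under `%80.php`.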
