I’m happy to announce that Jython finally supports Internationalized Domain Names.
In May 2010, the Internet Corporation for Assigned Names and Numbers (ICANN) permitted the first non-ASCII Internationalized Top-Level Domains to be made available.
An Internationalized Domain Name (IDN) is a domain name that includes characters from outside the ASCII character set. IDNs are specified in RFC 3490, and may contain any character from the Unicode character repertoire, meaning that domain names may now contain any writing symbol in use by human beings anywhere on the planet.
There are lots of interesting things and potential issues to note about IDNs.
They open up a whole new frontier for cybersquatters and phishers, by making homograph attacks possible, where the user is intentionally misled by the visual similarity of character glyphs from different character sets. For example, is this character β a Greek lower-case beta or a German eszett?
Will companies like Société Générale find themselves held hostage by cybersquatters who have registered the accented version of their brand name before they even realise that it is possible to register it? In the case of SocGen, the answer is no.
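To see how close that pair really is, here is a minimal sketch that asks the Unicode database what the two look-alike characters actually are. It assumes an interpreter with a complete unicodedata module (CPython, for example); the variable names are just for illustration.

```python
import unicodedata

# Two glyphs that are easy to confuse on screen, written as escapes
# so that there is no doubt about which code point is which.
greek_beta = u"\u03b2"
sharp_s = u"\u00df"

for ch in (greek_beta, sharp_s):
    # Print the code point and its official Unicode name.
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# U+03B2 GREEK SMALL LETTER BETA
# U+00DF LATIN SMALL LETTER SHARP S
```

The two characters come from different scripts and have different code points, so a domain name containing one is a different name from the one containing the other, however similar they look in a browser's address bar.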
Doubtless, various solutions will be adopted to solve these problems, as national registrars from around the world deal with how these problems relate specifically to the characters in use in their country. For example, the Russian national registrar Coordination Centre for RU TLD only permits Cyrillic characters to be used in the new .рф top-level domain.
Unfortunately, Jython's idna encoding is currently broken, due to shortcomings in the unicodedata module and a lack of stringprep support.
But IDNs are now supported when you run Jython on a Java 6 JVM, since that version of the JVM has built-in IDN support: I checked in the changes at revision 7198.
There is also a workaround for Java 5, where you can use GNU LibIDN to convert domain and host names to punycode before passing them to functions that expect to receive ASCII parameters. This workaround is documented on the Jython socket wiki.
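As a rough illustration of that conversion step, here is a sketch of a helper that turns a Unicode host name into its ASCII (punycode) form. It is not the code checked in to Jython: the helper name to_ascii is hypothetical, and the sketch assumes that java.net.IDN is importable when running on Jython with a Java 6 JVM, falling back to the interpreter's own idna codec elsewhere (CPython, for example). On Java 5 you would have to substitute a GNU LibIDN call instead.

```python
try:
    # Jython on a Java 6 (or later) JVM: use the JVM's built-in IDN support.
    from java.net import IDN

    def to_ascii(hostname):
        # IDN.toASCII applies the RFC 3490 ToASCII operation to each label.
        return str(IDN.toASCII(hostname))
except ImportError:
    # Not running on a JVM: assume the interpreter's "idna" codec works.
    def to_ascii(hostname):
        return hostname.encode("idna").decode("ascii")

# Convert first, then hand the ASCII form to code that expects ASCII.
ace = to_ascii(u"b\u00fccher.example")
print(ace)  # xn--bcher-kva.example
```

Names that are already plain ASCII pass through unchanged, so it is safe to run every host name through the helper before handing it to the socket functions.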