New adventures in Character Encoding

By and large, I’ve been able to ignore the whole character encoding issue in Java. During a move to a new server, however, I had to sit up and take notice. Part of our code uses String’s getBytes() method to prepare some text files. We’d been using the no-argument version, whose documentation states:

Encodes this String into a sequence of bytes using the platform’s default charset, storing the result into a new byte array.

The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
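For what it's worth, here is a minimal sketch of the CharsetEncoder route the documentation points to. The class and method names (StrictEncoder, encodeStrict, isEncodable) are made up for illustration, and it uses StandardCharsets, which only exists in newer Java versions. The idea is to ask the encoder to report characters it cannot map, rather than leaving the outcome unspecified.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original code base.
public class StrictEncoder {

    // Encodes text in the given charset, throwing if any character cannot be represented.
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Convenience wrapper: true if the whole string survives strict encoding.
    static boolean isEncodable(String text, Charset charset) {
        try {
            encodeStrict(text, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign, escaped so the source file's own encoding doesn't matter
        System.out.println("ISO-8859-1: " + isEncodable(pound, StandardCharsets.ISO_8859_1));
        System.out.println("US-ASCII:   " + isEncodable(pound, StandardCharsets.US_ASCII));
    }
}
```

The point of the design is simply that a failure becomes an exception you can see, instead of a quietly damaged file.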

Our new machine, of course, defaults to UTF-8, unlike our other servers, so when it produced our files it duly mangled all our £ symbols (UTF-8 writes £ as two bytes, where a single-byte charset such as ISO-8859-1 writes one). A system-property change later and all was well, but the episode has reinforced the fact that I should really pay more attention to all those little caveats in the API documentation.
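To make the difference concrete, here is a small sketch that prints the bytes produced for a £ under different charsets. It assumes the old servers used ISO-8859-1 (I haven't stated which charset they actually defaulted to), and the class name PoundEncodingDemo is purely illustrative. Naming the charset explicitly in getBytes() is one way to stop the output depending on the machine, as an alternative to changing a system property.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how the same string becomes different bytes.
public class PoundEncodingDemo {

    // \u00A3 is the pound sign; the escape keeps the example independent of the source file's encoding.
    static final String POUND = "\u00A3";

    // Renders a byte array as upper-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Depends on the machine: this is the call that bit us.
        System.out.println("default (" + Charset.defaultCharset() + "): " + toHex(POUND.getBytes()));

        // Explicit charsets give the same answer on every machine.
        System.out.println("ISO-8859-1: " + toHex(POUND.getBytes(StandardCharsets.ISO_8859_1))); // A3
        System.out.println("UTF-8:      " + toHex(POUND.getBytes(StandardCharsets.UTF_8)));      // C2A3
    }
}
```

A reader expecting the single byte A3 and receiving C2 A3 will typically show a stray character in front of the £, which is one way files like ours end up looking wrong.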

This entry was posted on Wednesday, January 14th, 2004 at 12:45 pm and is filed under Java.
