Understanding M17n (Multilingualization)
Big changes are coming to Ruby in version 1.9 with regard to character encodings. Ruby is going from a language with some of the weakest character encoding support to arguably some of the best support out there for working with different encodings. We're all grown up now.
The downside is that the new code comes with a good-sized learning curve. I would know, because I recently battled through figuring it out so I could add support to the standard CSV library for nearly all of the encodings. It was a battle too. It's brave new territory and there's not a lot of help out there yet for understanding Ruby's new features.
I'm hoping to change that.
This posting will be the start of a new series of blog articles designed to explain the character encoding support in Ruby 1.9. I'm going to assume you know absolutely nothing about character encodings, though, and begin by explaining in detail what they are and why we have them.
After that, we're going to examine the character encoding support in Ruby 1.8. There's a lot less support there to examine, but it's not well understood and I'm hoping that seeing it in detail will help with understanding how and why Ruby 1.9 is changing.
Finally, we will examine all the new encoding features of Ruby 1.9 in as much detail as possible. We will literally cover it all. Along the way, I'll talk strategy and give you all the helpful tips I know for successfully managing character encodings, both in general and in Ruby specifically.
This message will serve as a table of contents for this series of posts, so you may want to bookmark it if this topic is of interest to you. Here are all of the posts, in order:
- What is a character encoding?
- The Unicode Character Set and Encodings
- General Encoding Strategies
- Bytes and Characters in Ruby 1.8
- The $KCODE Variable and jcode Library
- Encoding Conversion With iconv
- Ruby 1.8 Character Encoding Flaws
- Ruby 1.9's String
- Ruby 1.9's Three Default Encodings
- Miscellaneous M17n Details
- What Ruby 1.9 Gives Us