What is a Character Encoding?
The first step to understanding character encodings is that we're going to need to talk a little about how computers store character data. I know we would love to believe that when we push the
akey on our keyboard, the computer records a little
asymbol somewhere, but that's just fantasy.
I imagine most of us know that deep in the heart of computers pretty much everything is eventually in terms of ones and zeros. That means that an
ahas to be stored as some number. In fact, it is. We can see what number using Ruby 1.8:
$ ruby -ve 'p ?a' ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin9.4.0] 97
?asyntax gives us a specific character, instead of a full
String. In Ruby 1.8 it does that by returning the code of that encoded character. You can also get this by indexing one character out of a
$ ruby -ve 'p "a"' ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-darwin9.4.0] 97
Stringbehaviors were deemed confusing by the Ruby core team and have been changed in Ruby 1.9. They now return one character
Strings. If you want to see the character codes in Ruby 1.9 you can use
Understanding M17n (Multilingualization)
Big changes are coming to Ruby in version 1.9 with regard to character encodings. Ruby is going from a language with some of the weakest character encoding support to arguably some of the best support out there for working with different encodings. We're all grown up now.
The downside is that the new code comes with a good size learning curve. I would know because I recently battled through figuring it out so I could add support to the standard CSV library for nearly all of the encodings. It was a battle too. It's brave new territory and there's not a lot of help out there yet for understanding Ruby's new features.
I'm hoping to change that.
This posting will be the start of a new series of blog articles designed to explain the character encoding support in Ruby 1.9. I'm going to assume you know absolutely nothing about character encodings though and begin by explaining in detail what they are and why we have them.
After that, we're going to examine the character encoding support in Ruby 1.8. There's a lot less support there to examine, but it's not well understood and I'm hoping that seeing it in detail will help with understanding how and why Ruby 1.9 is changing.