30 MAR 2009
Ruby 1.9's String
Ruby 1.9 has an all-new encoding engine called m17n (for multilingualization, with 17 letters between the m and n). This new engine may not be what you are used to from many other modern languages.
It's common to pick one versatile encoding, likely a Unicode encoding, and work with all data in that one format. Ruby 1.9 goes a different way. Instead of favoring one encoding, Ruby 1.9 makes it possible to work with data in over 80 encodings.
To accomplish this, changes had to be made in several places where Ruby works with character data. You're going to notice those changes the most in Ruby's String though, so let's begin by talking about what's changed there.
All Strings are now Encoded
In Ruby 1.8 a String was a collection of bytes. You sometimes treated those bytes as other things, like characters when you hit it with a Regexp or lines when you called each(). At its core though, it was just some bytes. You indexed the data by byte counts, sizes were in bytes, and so on.
In Ruby 1.9 a String is now a collection of encoded data. That means it is both the raw bytes and the attached Encoding information about how to interpret those bytes.
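A small sketch of what "bytes plus an Encoding" means in practice, assuming Ruby 1.9 or later. The magic comment on the first line is needed on 1.9, where source files default to US-ASCII; later versions default to UTF-8 and treat the comment as redundant. The word "résumé" is just an arbitrary non-ASCII sample.

```ruby
# encoding: UTF-8
word = "résumé"

# The String carries its Encoding along with its bytes.
puts word.encoding.name   # prints UTF-8

# Sizes are now counted in characters, not bytes.
puts word.size            # prints 6
puts word.bytesize        # prints 8, since each "é" takes two bytes in UTF-8

# Indexing returns characters too.
puts word[1] == "é"       # prints true

# Same bytes, different interpretation: relabel a copy as raw binary data.
raw = word.dup.force_encoding("ASCII-8BIT")
puts raw.size             # prints 8, one "character" per byte
```

Note that force_encoding() doesn't touch the bytes at all; it only swaps the Encoding used to interpret them.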
23 MAR 2009
The Grays at MountainWest RubyConf 2009
Dana and I really enjoyed our first MountainWest RubyConf experience. The talks were very high caliber, the venue great, and the hosts generous. This conference also proves that a single-track conference not only still works but is just plain better.
Anyway, we've had some requests for our slides, so I've made them available. You can view the slides for my LittleBIGRuby talk or Dana's slides from her Lightning talk on regular expressions.
8 FEB 2009
Pragmatic Thinking & Learning
I have a new standard by which all future reading material will be measured. Any book that casually mentions lock picking and follows it up with a footnote reference to further reading that will improve your lock picking skills when restricted to an improvised toolset is an instant hit. Pragmatic Thinking & Learning does exactly that. While that description may be a bit of hyperbole (I had to look that word up, Andy), the book really does deliver, both on the lock picking references and the great content.
If I had to sum up Pragmatic Thinking & Learning in one sentence it would be: it's a book about how to start thinking about thinking, with a moderate computer programmer slant. If that sounds a bit general, well, it is. A construction worker or anyone else could learn things from this book that would help them in their jobs and in their day-to-day lives. I know I would love for my teenage foreign exchange student to read it, because she could learn a lot from it. I'm pretty sure this book does exist in many other forms targeted at different groups of people. The advantage of this one is that I get the jokes and metaphors. Hooray for geek humor!
7 JAN 2009
The Evils of the For Loop
I've never liked the for…in loop in Ruby. I cringe every time I see it in examples (Rails seems to put it in views a lot) and I tend to switch it to an each() call. It really bugs me.
That's mostly just my gut reaction, but if I had to put it into words, it's that I fell in love with Ruby's iterators early on and for just doesn't seem to fit in well with them. I don't think that's just my emotions talking, either; it really doesn't fit in. I'll try to show you why I say that.
First, let's see what I'm talking about. We are all pretty comfortable with each(), right?
```ruby
(1..3).each { |i| p i }
# >> 1
# >> 2
# >> 3
```
I doubt that surprises anyone. Many of you probably also know that Ruby allows you to write that as:
```ruby
for i in 1..3
  p i
end
# >> 1
# >> 2
# >> 3
```
That's almost the same thing. It really does use each() under the hood, for example:
```ruby
class MyEachThing
  def each
    yield 1
    yield 42
    yield 2
    yield 42
    yield 3
  end
end

for i in MyEachThing.new
  p i
end
# >> 1
# >> 42
# >> 2
# >> 42
# >> 3
```
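The "almost" matters, though. One difference that is easy to check (a minimal sketch that runs on any Ruby): for doesn't open a new scope the way a block does, so its loop variable is still defined after the loop ends, while a block parameter is not.

```ruby
# The for loop assigns to a variable in the surrounding scope...
for i in 1..3
end
p i             # prints 3: i outlived the loop

# ...while each() passes values into a block, whose parameter stays local.
(1..3).each { |j| }
p defined?(j)   # prints nil: j never existed outside the block
```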
12 DEC 2008
RSS Upgrades
I've had an RSS feed for the entire blog up for some time now at http://graysoftinc.com/feed.xml. Yes, I know, it sucked. Several of you emailed me to tell me just how much it sucked.
In my defense, I wrote this blogging engine because I wanted to play around with different aspects of how this software could work. Spam prevention was high on that list for me, because it affected me a lot. Not being a feed reader junkie, RSS was less critical to me. I know it was important to you though, and I'm trying to make good on my promises, finally.
The number one complaint was that my RSS feeds did not include content, just descriptions. This is fixed. The full article is now in the feed.
I'm not sure how well that's going to work out yet. Regular readers know that I'm terribly wordy, so even just placing the last ten articles in the feed makes it quite large. I don't know how inconvenient that's going to be for you or me yet, but we can try it out for now.
As an added bonus, I've created category-specific feeds and article-specific feeds that show comments as they come in. Just click the feed link in the footer of any category or article page to try these out.
11 DEC 2008
Ruby 1.8 Character Encoding Flaws
Now that we have toured the entire landscape of Ruby 1.8's encoding support, we need to discuss the problems the system has. These long-standing issues are what pushed the core team to build the m17n (multilingualization) implementation for Ruby 1.9.
The main problems are:
- Not enough encodings supported
- Regexp-only support just isn't comprehensive enough
- $KCODE is a global setting for all encodings
I imagine most of those are pretty straightforward, but let's talk through them just to make sure we learn from the mistakes of the past. I'm pretty sure this will make it easier to understand why things are the way they are in Ruby 1.9.
The "not enough encodings" complaint should be the most obvious of all. Ruby 1.8 supports four, and one of those is just no encoding. That means you really only get UTF-8 and two Japanese encodings (EUC-JP and Shift_JIS). The UTF-8 support is how we've managed to make it this far, but there are a ton of common encodings that just aren't covered.
The most important thing to realize here, though, is that we can't just keep adding encodings to Ruby 1.8. The system wasn't designed with that in mind. We will run out of letters to tack onto the end of a Regexp very fast. It's just not practical.
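The one-letter scheme is still visible in Ruby 1.9 and later, which keep the Regexp flags for compatibility. This quick sketch must run on 1.9 or later, since Regexp#encoding doesn't exist in 1.8. It shows that each letter names exactly one encoding; later Rubies map /s to Windows-31J, Microsoft's variant of Shift_JIS.

```ruby
# Ruby 1.9+ only: Regexp#encoding doesn't exist in 1.8.
# Each flag letter names exactly one encoding (the fourth letter, n, means "no encoding").
flags = { "u" => /x/u, "e" => /x/e, "s" => /x/s }
flags.each do |letter, re|
  puts "/#{letter} => #{re.encoding.name}"
end
# prints:
# /u => UTF-8
# /e => EUC-JP
# /s => Windows-31J
```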
11 DEC 2008
The Definitive Guide to SQLite
I'm a huge fan of SQLite. Every time I do something with the little database it always manages to impress me in new ways. Here's a pop quiz for you:
- Did you know SQLite is totally free? I mean really free. All the code is in the public domain, protected by affidavits, and you can literally do anything you like with it.
- Did you know SQLite uses "manifest typing" which is similar to Ruby's dynamic typing? The database engine will really allow you to handle field types in whatever way is best for your needs. Of course, you can do type checking in triggers if you prefer to be more strict.
- Everyone knows SQLite shoves an entire database in one file, but did you know that it can work with more than one of those files at once? Yes, SQLite can query across multiple databases.
I could go on and on. Really, I could. SQLite is that cool.
It's almost silly to use flat files these days. If you find yourself needing one, you can load one gem instead, stick a full database in the file, take advantage of transactions and locking (very multiprocessing friendly), gain a full query language for working with the data, and have a prebuilt human interface completely separate from your code (the command-line tool is great for debugging). It's hard to beat that.
9 DEC 2008
XMPP and Metaprogramming Screencasts
I've mentioned some nice screencasts I've found in the past. Well, I've been watching quite a few more lately and I've uncovered some more hits.
First, PeepCode has another excellent screencast on using XMPP with Ruby. This video explains what XMPP is and isn't, why it's important, and shows a good deal of information about how you can work with the protocol to accomplish some real world server to server or human communication tasks. You don't need any prior XMPP knowledge going into this one.
It's hard to overstate exactly how much PeepCode got right with this video. For example, I've seen quite a few screencasts now that bite off more than they can chew for a short video. That's not the case here. XMPP turns out to be perfectly bite-sized, in that a one-hour video can serve as a strong introduction to pretty much all you need to know when using it. This has other advantages too. Since the creator isn't trying to squeeze too much content into too short a time, he can afford to drop some truly stellar related tips. In the case of the XMPP video, these are which IM client to use when debugging (one that lets you see the underlying protocol) and how to easily combine XMPP with DRb for fire-and-forget messaging. These extras really push this screencast over the top.
8 DEC 2008
Encoding Conversion With iconv
There's one last standard library we need to discuss for us to have completely covered Ruby 1.8's support for character encodings. The iconv library ships with Ruby and it can handle an impressive set of character encoding conversions.
This is an important piece of the puzzle. You may have accepted my advice that it's OK to just work with UTF-8 data whenever you have the choice, but the fact is that there's a lot of non-UTF-8 data in the world. Legacy systems may have produced data before UTF-8 was popular, some services may work in different encodings for any number of reasons, and not quite everyone has embraced Unicode fully yet. If you run into data like this, you will need a way to convert it to UTF-8 as you import it and possibly a way to convert it back when you export it. That's exactly what iconv does.
Instead of jumping right into Ruby's iconv library, let's come at it with a slightly different approach. iconv is actually a C library that performs these conversions, and on most systems where it is installed you will have a command-line interface for it.
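For readers on newer Rubies: the Iconv library was removed from the standard library in Ruby 2.0 (it survives as a separate gem), and in Ruby 1.9 and later String#encode handles the same import/export round trip. This is a swapped-in technique, not iconv itself. The minimal sketch below assumes ISO-8859-1 (Latin-1) as the legacy encoding and builds the bytes by hand so it doesn't depend on the source file's encoding.

```ruby
# "café" as a legacy system might store it in ISO-8859-1: é is the single byte 0xE9.
latin1 = [99, 97, 102, 0xE9].pack("C*").force_encoding("ISO-8859-1")

# Import: convert to UTF-8, where é becomes the two bytes 0xC3 0xA9.
utf8 = latin1.encode("UTF-8")
puts utf8.bytesize          # prints 5
puts utf8 == "caf\u00E9"    # prints true

# Export: convert back to the legacy encoding for the other system.
back = utf8.encode("ISO-8859-1")
puts back == latin1         # prints true
```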
3 DEC 2008
Browser CAPTCHA
I'm sure everyone has noticed that my rate of blog posting has fallen off dramatically. Unfortunately, I've been spending my blog time fighting the endless war against spam. I've made some progress there and thought I would share some details that others might find useful.
As I've covered previously, this blog now requires me to approve all comments. I'm super happy with this decision. I approve posts promptly, so there's pretty much no downside for users, and this means you have not seen a single spam message on this site since I made the change. This was literally the perfect solution… on the viewer's side of the fence.
What it didn't fix was the hassle on my side. I don't mind approving messages at all, as long as I have a reasonable pile to go through. However, the spammers have really ramped up their efforts against me lately, and this blog received 11,134 comment posts in the month of November alone. Six of those were legitimate comments. That exceeds my definition of reasonable.