
11 May 2012

Delaying Decisions

I love playing with Ruby's Hash. I think it has a neat API and experimenting with it can actually help you understand how to write good Ruby. Let's dig into this idea to see what I mean.

The nil Problem

In Destroy All Software #9, Gary chooses to show an example in Python because, unlike Ruby's Hash, a Python dictionary will raise an error for a non-existent key. Ruby just returns nil, he explains.

What Gary said isn't really true, but I'm guessing he just didn't know that at the time. He was in the process of switching to Ruby from Python and probably didn't have a deep enough understanding of Ruby's Hash yet. I bet he does know how it works now.

But assume he was right. What's he saying and why does it matter? Consider some code like this:

class SearchesController < ApplicationController
  def show
    terms = params[:terms]
    SomeModel.search(terms)
    # ...
  end
end

This is what Gary doesn't like, and rightfully so. Because I indexed into params here with the []() method, I will indeed get a nil if the :terms key wasn't in params.

The problem with nil is that it's a ticking time bomb just waiting to go off. It has a very limited interface by design, so calling methods on it may raise errors. It's probably not the interface your code is expecting. For example, in the code above I likely intended to get a String in terms, but if I got nil instead and I later try to call String methods on it, it's going to blow up.

This gets worse because nil values tend to bounce around in the code a bit. This means the error may finally manifest far from the line where I actually assigned terms, which is the real problem that needs fixing. For example, let's say the model code I handed off to looks something like this:

class SomeModel < ActiveRecord::Base
  def self.search(terms)
    words = terms.scan(/\S+/)
    # ...
  end
end

This is where the error would show up. The call to scan() would toss a NoMethodError since nil doesn't have scan(). The stack trace would lead us here, but, as I said before, this isn't what needs fixing.

Now, we could fix it here. One way to do that would be to insert a conversion before the scan() call:

words = terms.to_s.scan(/\S+/)

That's probably not a good idea though. We've now said that calling SomeModel.search(nil) is OK and that doesn't really make much sense.

A slightly better fix would be to add a guard before the call to scan():

fail ArgumentError, "You must pass search terms" if terms.nil?
words = terms.scan(/\S+/)

This would get us a better error message that actually tells us what went wrong. But the stack trace still wouldn't be ideal. It's going to bring us here first even though the real issue is in the controller.

My Favorite Method to the Rescue

This whole problem started because I allowed the Hash to give me a nil. I didn't want a nil, so I shouldn't have said it was OK.

That brings us to fetch(). Watch this:

>> hash = { }
=> {}
>> hash[:missing]
=> nil
>> hash.fetch(:missing)
KeyError: key not found: :missing
from (pry):3:in `fetch'

You can think of []() as saying "give me the value for this key, if it exists." You get a nil if it doesn't. Now fetch(), on the other hand, is more like saying "I need this key, no substitutions allowed." This forces fetch() to raise a KeyError when the key is missing, since nil isn't allowed.

The best fix to the problem I started with would be to replace the initial assignment in the controller with this:

terms = params.fetch(:terms)

You may also want to add some code to handle the KeyError, but the important thing is that the problem now triggers from the proper place. This is the most helpful error we could work with in this scenario.
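As a rough idea of what handling the KeyError could look like, here is a plain-Ruby sketch. The show() method below is only a stand-in for the controller action, params is an ordinary Hash rather than real Rails parameters, and both messages are invented for illustration.

```ruby
# Hypothetical stand-in for the controller action; `params` is a plain Hash.
def show(params)
  terms = params.fetch(:terms)  # raises KeyError right here if :terms is missing
  "Searching for #{terms}"
rescue KeyError
  # The problem is handled at the point where the bad lookup happened.
  "Please enter some search terms"
end

with_terms    = show({ terms: "ruby hash" })
without_terms = show({ })
```

The point is not the specific response, but that the missing key is noticed and dealt with in the method that asked for it.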

I Don't Know What You Need

Let's take a step back for a moment to the overall Hash. What exactly is a Hash in your code? The only right answer is that I have no idea.

You might be creating a typical key-to-value mapping:

>> name = {first: "James", last: "Gray"}
=> {:first=>"James", :last=>"Gray"}
>> name[:last]
=> "Gray"

Alternately, you could be tracking some counts:

>> counts = Hash.new(0)
=> {}
>> counts[:one] += 1
=> 1
>> 3.times do counts[:three] += 1 end
=> 3
>> counts.values_at(:zero, :one, :three)
=> [0, 1, 3]

Or you could be keeping some named buckets for data:

>> buckets = Hash.new { |hash, key| hash[key] = [ ] }
=> {}
>> buckets[:one] << 1
=> [1]
>> buckets[:three] << 1 << 2 << 3
=> [1, 2, 3]
>> buckets
=> {:one=>[1], :three=>[1, 2, 3]}

Perhaps it's even a deeply nested structure that you wish to track:

>> tree = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
=> {}
>> tree[:deeply][:nested][:structure] = 42
=> 42
>> tree[:deeply][:nested][:branch] = 1
=> 1
>> tree
=> {:deeply=>{:nested=>{:structure=>42, :branch=>1}}}

These are all very different, but valid, uses of a Hash in Ruby to model various data structures and there are other options too. Ruby can't know what you intend to do. But it was wisely designed so that it doesn't have to know.

Note the progression above that makes this possible. You can create a normal Hash. You can pass a default object, like an initialization value for counters. Or you can go all the way to defining the default behavior with custom code inside of a block as I did in the last two examples.

Ruby delays the decision of default behavior so that it can leave that decision to you. You know what you need better than Ruby does.

And it doesn't stop there.

Back to My Favorite

Being able to set some default behavior at Hash construction time is great, but what if I don't know everything I need to know even then? What if I need to delay the decision even more, until key lookup time? What if what I actually need is different at different times? The answer is that fetch() can handle those cases too.

>> hash = { }
=> {}
>> hash.fetch(:missing, 0)
=> 0
>> hash.fetch(:missing) { |key| key.to_s.split("").sort.join }
=> "giimnss"

You can see that fetch() also supports default return values for missing keys and it will even allow you to provide custom code to handle the situation however you need to.
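To make the "different at different times" idea concrete, here is a small sketch with one Hash and several lookups that each make their own decision. The variable names are just for illustration. It also shows that fetch() does not consult a default set at construction time, so the two mechanisms can coexist.

```ruby
counts = Hash.new(0)  # construction-time decision: missing counts read as 0
counts[:seen] += 1

from_default = counts[:missing]                                 # uses Hash.new's default
from_value   = counts.fetch(:missing, -1)                       # decided at this lookup
from_block   = counts.fetch(:missing) { |key| "no #{key} yet" } # decided by custom code

# With no fallback given, fetch() still raises, even though the Hash has a default.
strict =
  begin
    counts.fetch(:missing)
  rescue KeyError => error
    error.class
  end
```

Each call site picks the behavior it needs without changing how the Hash was built.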

If you only get one thing out of this article, make it Hash#fetch(). It's a powerhouse tool that enables you to do all kinds of fancy tricks. (It's worth noting that Hash#delete() is similarly flexible. Look it up.) But that's not really why I'm showing you these methods.
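For the curious, a quick sketch of the delete() variant. When given a block, delete() calls it for a key that isn't there and returns the block's result. The Hash contents here are made up.

```ruby
hash = { present: 1 }

removed = hash.delete(:present)  # the value that was stored
absent  = hash.delete(:missing)  # nil, just like []()
decided = hash.delete(:missing) { |key| "nothing stored under #{key}" }
```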

The Pattern is the Thing

The real reason to play with these methods is that they represent a great tactic for Ruby programming in general. It's important to remember that you won't know exactly what the users of your code will need. Delay making that decision whenever you can.

In fact, delay it as long as you can, ideally just pushing the decision off on the user when they finally need to make it. They will know what they need better than you do. Let them raise an error, do a conversion, or whatever else makes sense for them.
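Here is one way that might look in your own code. Registry is a hypothetical class invented for this sketch. Its lookup() method makes no decision about missing names at all. It simply hands the caller's block through to Hash#fetch().

```ruby
# Hypothetical class, for illustration only.
class Registry
  def initialize
    @handlers = { }
  end

  def register(name, handler)
    @handlers[name] = handler
    self
  end

  # Delay the "what if it's missing?" decision by passing the caller's
  # block (if any) straight through to Hash#fetch.
  def lookup(name, &if_missing)
    @handlers.fetch(name, &if_missing)
  end
end

registry = Registry.new.register(:json, "JSONHandler")

found     = registry.lookup(:json)
defaulted = registry.lookup(:xml) { :no_handler }           # caller wants a default
converted = registry.lookup(:xml) { |name| name.to_s.upcase } # caller wants a conversion
```

A caller who passes no block gets the KeyError from fetch(), which is also a perfectly reasonable choice for them to make.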

In a lot of ways, this discussion is about Ruby's blocks. That's the great tool Ruby gives us to allow us to delay these decisions. Consider even a simple method like this:

def do_some_io_operations_for_me
  # ... IO code here...
rescue IOError, Errno::EACCES #, ...
  block_given? ? yield : raise
end

This is a pretty perfect setup, in my opinion. This code tries to push through some IO operations. Those could fail for any number of reasons, so the code intelligently traps the errors that apply. If a user gives a block, they can control what happens on error without even needing to know which errors the IO code could trigger. Without a block, the original error is simply raised again. Either way, the decision of exactly what to do is left to the caller.
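A filled-in version of that shape might look like the following sketch. The read_settings() helper, the file path, and the choice of errors to rescue are all assumptions made for the example.

```ruby
# Hypothetical helper following the same rescue-then-yield shape.
def read_settings(path)
  File.read(path)
rescue IOError, Errno::ENOENT, Errno::EACCES
  # Let the caller's block decide; with no block, re-raise the original error.
  block_given? ? yield : raise
end

missing = "no_such_directory_for_this_sketch/settings.txt"  # assumed not to exist

# One caller is happy with an empty default...
settings = read_settings(missing) { "" }

# ...while another passes no block and lets the error through.
outcome =
  begin
    read_settings(missing)
  rescue SystemCallError => error
    error.class
  end
```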

Of course, the biggest limitation on this strategy is that we only get one block in Ruby. Once you delay one decision with it, you won't have it for other purposes. Try not to let that stop you though! Ruby doesn't.

Consider the find() iterator. With find() the block is already tied up for the test, but Ruby wants to let you decide what happens when nothing is found. Returning nil is not enough for those cases, because nil could legitimately be what the block found. Because of that, find() cheats to expose kind of a second block:

>> (0..9).find(-> { fail ArgumentError, "Not found" }) { |n| n >= 10 }
ArgumentError: Not found
from (pry):28:in `block in <main>'

I think these kinds of tricks come out a little cleaner with the new "stabby lambda()" (->) syntax that I used above.
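The nil ambiguity is easier to see side by side. The sketch below uses a made-up Array, and the first_match() method at the end is a hypothetical example of borrowing the same trick in your own code by accepting a callable argument alongside the block.

```ruby
values = [nil, 1, 2]

found_nil = values.find { |value| value.nil? }  # nil, because nil was found
no_match  = values.find { |value| value == 3 }  # nil, because nothing matched

# With a callable for the "not found" case, the two outcomes differ.
not_found   = -> { :not_found }
still_nil   = values.find(not_found) { |value| value.nil? }
told_apart  = values.find(not_found) { |value| value == 3 }

# Hypothetical method using the same idea: the block is the test and
# the if_none callable is the delayed "nothing matched" decision.
def first_match(items, if_none = -> { nil })
  items.each { |item| return item if yield(item) }
  if_none.call
end

fallback = first_match([1, 2, 3], -> { 0 }) { |n| n > 5 }
```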

Further Investigation

Getting a feel for when and where to use blocks in Ruby seems to take almost as long as learning the rest of the language does. Here are some sources that might shave a little time off of the journey though:

I wrote an article on blocks a long time ago. I think it has aged fairly well and still provides some insight into why blocks exist and what they really are.
Rake is a great example of the delayed decisions I am advocating in this article in many ways. When you think about it, Rake is really just an executable dependency graph of delayed decisions. For a more concrete example though, check out Rake's FileTask. It's a delayed decision about how to build a file from one or more dependencies any time the content is no longer fresh.
The 2.10.0 release of RSpec brings with it some matchers for yielded control to a block. This can make testing all of those block methods I'm asking you to write a little easier.