Rubies in the Rough

This is where I try to teach how I think about programming.

21 MAR 2012

Learn to Love Mix-ins

The road to mastering Ruby is paved with understanding some key Ruby concepts. Mix-ins are one of those concepts. I'm sure everyone reading this knows the mechanics of how mix-ins work, but it pays to spend some time really thinking about all that mix-ins imply. Let's do just that.

Adding a Type

One of the primary reasons that Ruby needs mix-ins is that it does not support multiple inheritance. That leaves mix-ins as our only option for modeling hybrid objects. It's the way Ruby programmers can add another type.

That's a good way to think about it too: adding a type.

Take pagination, for example. Pagination methods are usually defined to return an object like this:

class PaginatedCollection < Array
  # ... paginated helpers defined here ...
end

That's never really felt right to me though.

First, inheriting from Ruby's core classes can come back to bite you in some scenarios. The reason is that Ruby makes some performance tradeoffs to keep the core classes fast, but those tradeoffs mean that those classes don't always perfectly follow Ruby's rules.
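Here's a minimal sketch of one such surprise. ShoutingList is a made-up class for illustration, and this is only one example of the category: Array's built-in methods are implemented internally, so they don't necessarily go through the methods a subclass overrides, and some of them hand back a plain Array instead of the subclass.

```ruby
# ShoutingList is a hypothetical subclass, used only to show the problem.
class ShoutingList < Array
  # Override each() to upcase every element on its way out.
  def each
    super { |item| yield item.upcase }
  end
end

list = ShoutingList.new(%w[a b])

seen = []
list.each { |item| seen << item }     # goes through our override

mapped   = list.map { |item| item }   # map() does not call our each()
combined = list + %w[c]               # + builds a plain Array

puts "each() saw #{seen.inspect}"
puts "map() returned #{mapped.inspect}"
puts "+ returned a #{combined.class}"
```

The override is honored when each() is called directly, but map() never sees it, and the result of + has quietly lost the subclass.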

You can get around some of these issues by delegating to an Array, instead of inheriting from it. But that creates even more structure for this setup when it seems that Ruby has a totally workable built-in solution.

The name of the created Class is another hint that it's not ideal, in my opinion: PaginatedCollection. Collection is just another way to say Array, right? So why can't we just say Array? In truth, Collection implies a lot more than just an Array, but solutions like this don't generally adapt to other types.

That leaves just Paginated, which is used as an adjective here. Adjectives are generally better modeled by a Module than a Class. It's almost like Ruby is telling us the right way to model this concept, if we listen closely enough.

To me, it's telling us to build this:

module Paginated
  def paginate(page = 1, items_per_page = size, total_items = size)
    @page           = page
    @items_per_page = items_per_page
    @total_items    = total_items
  end

  attr_reader :page, :items_per_page, :total_items

  def total_pages
    (total_items / items_per_page.to_f).ceil
  end

  # etc...
end

collection = %w[first second third]

collection.extend(Paginated)
collection.paginate(1, 3, 10)

puts "There are #{collection.total_pages} pages."
puts "The collection is an Array."           if collection.is_a?(Array)
puts "The collection is a Paginated object." if collection.is_a?(Paginated)

That seems more natural. The item is an Array. It also happens to be Paginated. We can understand that duality and so can Ruby.

Semantics aside though, this has actual benefits. The code works just fine if I change this line:

collection = %w[first second third]

to:

require "set"
collection = Set.new(%w[first second third])

or even:

collection = {first: 1, second: 2, third: 3}

In other words, we can now paginate whatever we want, including our own classes. We can also mix Paginated into any result set we need to, without needing to use some special collection and a replace() method (as you sometimes see in Rails code using will_paginate).
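To make the "our own classes" point concrete, here's a small sketch. Playlist is a hypothetical class. The only thing Paginated asks of it is a size() method, which the default arguments of paginate() call. The module is repeated so the example runs by itself.

```ruby
module Paginated
  def paginate(page = 1, items_per_page = size, total_items = size)
    @page           = page
    @items_per_page = items_per_page
    @total_items    = total_items
  end

  attr_reader :page, :items_per_page, :total_items

  def total_pages
    (total_items / items_per_page.to_f).ceil
  end
end

# Playlist is a made-up class; it only needs to respond to size().
class Playlist
  def initialize(*songs)
    @songs = songs
  end

  def size
    @songs.size
  end
end

playlist = Playlist.new("Intro", "Verse", "Outro")
playlist.extend(Paginated)
playlist.paginate(1, 2)  # total_items falls back to size()

puts "The playlist has #{playlist.total_pages} pages."
```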

There's even another plus to this approach though: you can make these type decisions as the code runs. Let's dig into what that can do for us.

What do I Need Now?

Another great aspect of using mix-ins to build up these types is that objects can be different things at different times. We'll start with a somewhat silly, but illustrative, example:

class Human
  def initialize(name, strength)
    @name     = name
    @strength = strength
  end

  attr_reader :name, :strength

  def move
    "#{name} walks."
  end
end

module Werewolf
  module HybridForm
    def strength
      super + 5
    end

    def move
      super.sub("walks", "lopes")
    end
  end

  module WolfForm
    def strength
      2
    end

    def move
      super.sub("walks", "runs on all fours")
    end
  end
end

human = Human.new("James", 3)
puts "Normally, #{human.name} is a #{human.class} " +
     "with a strength of #{human.strength}.  #{human.move}"
puts

werewolf = human.dup.extend(Werewolf::HybridForm)
puts "But when there's a full moon, " +
     "strength raises to #{werewolf.strength} and #{werewolf.move}"
puts

wolf = human.dup.extend(Werewolf::WolfForm)
puts "Some also claim to have seen him as a normal wolf, " +
     "with a strength of #{wolf.strength}."
puts "In this form #{wolf.move}"

That code outputs:

Normally, James is a Human with a strength of 3.  James walks.

But when there's a full moon, strength raises to 8 and James lopes.

Some also claim to have seen him as a normal wolf, with a strength of 2.
In this form James runs on all fours.

As you can see, we're able to change the human object as needed. We can add in the abilities of the Werewolf::HybridForm or the Werewolf::WolfForm. The object then has those behaviors.

Believe it or not, this does have practical value outside of building fantasy games. For example, I once worked on a Rails application that had some pretty complex reporting needs. The queries used to do the reporting were a bit sluggish, so we wanted to make sure that kind of thing never got run during a normal request. Thus we had a model and a separate Module for the reporting code. They looked something like:

class User < ActiveRecord::Base
  # no generate_report method here
end

module Reportable
  def generate_report
    # build and return a processing intensive report
  end
end

The Module wasn't even loaded by our normal application. Instead, we ran a nightly background process, on a different server, that used our backup dump of the database to do the reporting. That code is where the Module actually lived. The reporting process then used it in a loop like this:

User.find_each do |user|
  user.extend(Reportable)
  report = user.generate_report
  # report sending code here...
end

This has other benefits besides safety. For example, it really helps with code organization. The reporting code was quite complex and it just didn't apply to the normal workings of the application. With this setup, we only had to dig through that code when it was relevant to what we were doing. The rest of the time we could pretty much forget it was there. (We did have tests to ensure User kept the contract Reportable counted on, so we wouldn't accidentally break it.)

I would like to see Rails move towards supporting a system like this. I'll build a plugin at some point that lets you add various mix-ins for the models. They could be stored in the application like this:

app/models/user.rb
app/models/user/authenticatible.rb
app/models/user/reportable.rb
…

Then I'll add a method for mixing these into the query results they apply to. For example:

user_logging_in = User.behavior(Authenticatible).find_by_email(email)

and:

User.behavior(Reportable).find_each do |user|
  # ... generate reports...
end

You get the idea.
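As a rough idea of what such a method could do, here's a plain-Ruby sketch. Nothing in it is a Rails API: User is a Struct standing in for a model, records() stands in for a database query, and WithBehavior is a name I made up for the wrapper.

```ruby
module Reportable
  def generate_report
    "Report for #{email}"
  end
end

# Hypothetical wrapper that extends each record as it is handed out.
class WithBehavior
  def initialize(model, mixins)
    @model  = model
    @mixins = mixins
  end

  def find_each
    @model.records.each { |record| yield record.extend(*@mixins) }
  end
end

# A Struct standing in for an ActiveRecord model.
User = Struct.new(:email) do
  # Stand-in for a database query; returns fresh objects on every call.
  def self.records
    [new("james@example.com"), new("dana@example.com")]
  end

  # Hypothetical: wrap the model so results come back with the mix-ins.
  def self.behavior(*mixins)
    WithBehavior.new(self, mixins)
  end
end

reports = []
User.behavior(Reportable).find_each do |user|
  reports << user.generate_report
end
puts reports
```

Records fetched without behavior() never see generate_report(), which is the safety property we were after.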

OK, let's consider modules in one more way to round out this exercise.

Monkey Patching is Generally Silly

We always say that we love how dynamic Ruby is. It allows us to rewrite the rules as we go. The ultimate expression of this is how we can just reopen a class and replace some of its methods wholesale. We lovingly refer to this practice as "monkey patching."

When you think about it though, the practice is almost silly for most of the cases we use it for.

To explain that claim, we need to take a quick detour and discuss Ruby's method lookup. Instead of explaining how it works, let's just ask Ruby to show us:

class Object
  def show_yourself
    puts "In Object."
  end
end

class Parent
  def show_yourself
    puts "In Parent."
    puts "Handing up the line..."
    puts
    super
  end
end

class Child < Parent
  def show_yourself
    puts "In Child."
    puts "Handing up the line..."
    puts
    super
  end
end

o = Child.new
o.show_yourself

That code prints:

In Child.
Handing up the line...

In Parent.
Handing up the line...

In Object.

Hopefully, that's pretty straightforward. We called a method on a Child instance. That method handed up to the same method defined in Parent, because Child inherits from Parent. Parent's method then handed up to Object, because any class that doesn't declare a superclass, as Parent doesn't, inherits from Object.

In Ruby though, that's not the full story. Let's change the last chunk of code to this:

# ...

o = Child.new
class << o
  def show_yourself
    puts "In the hidden singleton class."
    puts "Handing up the line..."
    puts
    super
  end
end
o.show_yourself

Now we get this output:

In the hidden singleton class.
Handing up the line...

In Child.
Handing up the line...

In Parent.
Handing up the line...

In Object.

Each instance in Ruby has its own Class, called the singleton class. (No, we're not talking about the Singleton design pattern here. This is a different use of the term.) This is why we can modify individual objects no matter what their Class is. The specializations end up in the singleton class. Ruby mostly hides this class from us as an implementation detail, which is why I had to open the object's singleton class explicitly to get it to show up above.

We can now see that method lookup is a straight line in Ruby. You start at the bottom, with the object's singleton class and go straight up until you find a matching method.
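Ruby can also list that line for you directly. This sketch assumes a Ruby new enough to have singleton_class() and to include the singleton class itself in ancestors(), which I believe means 2.1 or later. The classes are bare stand-ins for the ones above.

```ruby
class Parent; end
class Child < Parent; end

o = Child.new

# The lookup line for o, starting from its singleton class.
line = o.singleton_class.ancestors

# Show the first four stops; Kernel and BasicObject follow Object.
p line.first(4)
```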

That leads us to a question though: how do mix-ins play into this? The answer is pretty simple: they are inserted into the call chain behind a Class. [Update: Ruby 2's prepend() gives us the ability to insert mix-ins in front of a Class as well.] To see what I mean, let's mix something into Child:

# ...

module Mixin
  def show_yourself
    puts "In Mixin."
    puts "Handing up the line..."
    puts
    super
  end
end

class Child < Parent
  include Mixin

  def show_yourself
    puts "In Child."
    puts "Handing up the line..."
    puts
    super
  end
end

# ...

That prints:

In the hidden singleton class.
Handing up the line...

In Child.
Handing up the line...

In Mixin.
Handing up the line...

In Parent.
Handing up the line...

In Object.

See how Mixin just hopped in line behind Child? That's what they do.

Well, that's what include() does. Isn't extend() special? Not really. The fact is that extend() is just a shortcut that translates to this:

def extend(mixin)
  singleton_class.class_eval do
    include mixin
  end
end

As you can see, extend() just does an include() under the hood, but it does it on the singleton class. Using that knowledge, we can modify our example one last time to show off a superior form of monkey patching:

# ...

module SuperiorMonkeyPatch
  def show_yourself
    puts "In SuperiorMonkeyPatch."
    puts "Handing up the line..."
    puts
    super
  end
end

o = Child.new
o.extend(SuperiorMonkeyPatch)
class << o
  def show_yourself
    puts "In the hidden singleton class."
    puts "Handing up the line..."
    puts
    super
  end
end
o.show_yourself

Now the code prints:

In the hidden singleton class.
Handing up the line...

In SuperiorMonkeyPatch.
Handing up the line...

In Child.
Handing up the line...

In Mixin.
Handing up the line...

In Parent.
Handing up the line...

In Object.

Note that SuperiorMonkeyPatch is behind the singleton class, because include() always inserts a mix-in behind the Class doing the including. But that puts it in front of Child, which is right where we need it to be if we want to override some behavior.
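The same ordering can be checked in a compact form. In this sketch (all names are made up), each trail() adds its own name and calls super, so the returned Array shows the lookup order. It also exercises prepend() from the update above, and compares extend() with an explicit include() on the singleton class. It assumes Ruby 2.1 or later, where include() can be called as a public method.

```ruby
class Parent
  def trail
    ["Parent"]
  end
end

module Included
  def trail
    ["Included"] + super
  end
end

module Prepended
  def trail
    ["Prepended"] + super
  end
end

module Extended
  def trail
    ["Extended"] + super
  end
end

class Child < Parent
  include Included   # lands behind Child
  prepend Prepended  # lands in front of Child

  def trail
    ["Child"] + super
  end
end

a = Child.new
a.extend(Extended)

b = Child.new
b.singleton_class.include(Extended)  # what extend() boils down to

p a.trail
p b.trail
```

Both objects walk the same line: Extended, Prepended, Child, Included, and finally Parent.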

To sum this up, the key insights are:

  • Ruby's method lookup is just a straight line
  • Using mix-ins, we can insert code at various points along that line

This is the ideal way to override behavior. Why? Well, if you monkey patch a Class, you have changed it for the whole world. We've already seen though that mix-ins give us the choice of when we want the changes and when we don't.

Furthermore, you have to be very careful when monkey patching if you want to keep the old code. This involves a dance of aliasing (really copying) the old method, replacing it, and then referring to the renamed code. (That's how the infamous alias_method_chain() in Rails works.) With a mix-in though, super can be used normally.

This makes using a Module safer. We're not all playing in one namespace and aliasing a bunch of names that could eventually collide.
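Here are the two styles side by side in a small sketch. The Greeter classes and the Shouting module are hypothetical names, and the alias dance is written out by hand rather than using any Rails helper.

```ruby
# Style 1: reopen the class and do the alias dance.
class PatchedGreeter
  def greet
    "hello"
  end
end

class PatchedGreeter
  # Copy the old method under a new name, then replace the original.
  alias_method :greet_without_shouting, :greet

  def greet
    greet_without_shouting.upcase
  end
end

# Style 2: leave the class alone and put a Module in front of it.
class Greeter
  def greet
    "hello"
  end
end

module Shouting
  def greet
    super.upcase
  end
end

loud  = Greeter.new.extend(Shouting)
quiet = Greeter.new

puts PatchedGreeter.new.greet  # every instance is changed
puts loud.greet                # only this object is changed
puts quiet.greet               # other instances are untouched
```

The first style leaves an extra greet_without_shouting() method on every instance. The second adds no new names and affects only the objects we choose to extend.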

To give a realistic example of how you can use this technique in practice, let's talk about a support email I received recently. The programmer who wrote me was working with a database that spit out some pretty goofy CSV. It would escape quotes with \", even though the CSV format calls for "". It also escaped other things, like null characters as \0. Because this created a non-standard data format, he couldn't find anything that could read it. That made him write me and ask if there were some way he could modify the CSV library to read it.

"No need," I told him. CSV is a simple wrapper over a normal Ruby IO object, and it mostly just counts on one method: gets(). When you call CSV#gets, it pulls some data from IO#gets, then parses it and returns it to you. That gives us a familiar method lookup line:

 IO#gets
   ^
   |
CSV#gets

All we need to do is get some code in the middle of that line. That code can translate from the broken CSV format to the correct format the library expects and everything else will just work. No monkey patching is required. Here's the code I sent to him:

module CSVNormalizer
  def gets(*args)
    result = super
    if result
      # FIXME:  Improve escape handling
      result.gsub!(/\\(["0])/) { $1 == "0" ? "\0" : '""' }
    end
    result
  end
end

if __FILE__ == $PROGRAM_NAME
  require "csv"

  abort "#{$PROGRAM_NAME} FILE_PATH" unless ARGV.first

  File.open(ARGV.first) do |io|
    io.extend(CSVNormalizer)
    csv = CSV.new(io)
    csv.each do |row|
      # use row here...
    end
  end
end

My unescaping code may not be perfect if there are scenarios other than those he described, but that's not the point. Note how easy this was to set up: I defined a module that uses super to get the data, transforms it, and returns the fixed data; I manually set up the IO object, instead of letting CSV do it for me; I used my mix-in to make that object return good data; then I handed the now safe-to-use object on to CSV. This is a clean patch job that won't affect any other code, even if we use it in a project that also processes some normal CSV data. Perfect.
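To see the mix-in doing its job in isolation, here's a small sketch that uses a StringIO in place of a file and calls gets() directly, so it doesn't depend on how a particular CSV version reads from its IO. The sample line is made up, and the module is repeated so the example runs by itself.

```ruby
require "csv"
require "stringio"

module CSVNormalizer
  def gets(*args)
    result = super
    if result
      result.gsub!(/\\(["0])/) { $1 == "0" ? "\0" : '""' }
    end
    result
  end
end

# One line of the non-standard format: quotes escaped as \" instead of "".
io = StringIO.new(%q{1,"say \"hi\"",x} + "\n")
io.extend(CSVNormalizer)

line = io.gets        # our gets() runs first, then hands up to StringIO#gets
fields = CSV.parse_line(line)

puts line
p fields
```

Only this one StringIO was changed. Any other IO in the program still returns its data untouched.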

Keep Modules in Mind

Ruby gives us these different ways to construct types using mix-ins. We can combine them, use them conditionally, and even build up modified behaviors with them, as needed.

A lot of this power comes from the elegance of how Ruby handles method lookup. Once you realize that message passing happens down a straight line and that you can make interceptions at points along that line, you gain a lot of control over what is happening in your programs.

I strongly encourage you to play with these techniques in your programs, or just on the side. It can take a while for all of this to click, but it really expands your understanding of Ruby when you get there.

Learn More About Modules

If you enjoyed this discussion of how to make mix-ins work for you, I'm pretty sure you would also enjoy:

  • Module Magic is a presentation I gave at the Lone Star Ruby Conference in 2009. I showed some of the tricks discussed above (in less detail) and some other handy tricks you can use. My slides are also available.
  • The book Crafting Rails Applications is a detailed explanation of how Rails 3 works under the hood. One of the most significant changes from Rails 2 to Rails 3 was to rework the framework from using old style monkey patching (with alias_method_chain()) to the Module approach I've covered here. This had many benefits which this book covers very well. We discussed this book on the Ruby Rogues podcast.
  • Dave Thomas did a screencast series a while back on The Ruby Object Model and Metaprogramming. Though it uses an old version of Ruby, and misses out on some niceties added later (like the singleton_class() method), it's easily one of the best discussions of Ruby's method lookup system and a lot more. If you have struggled to understand the tricks we are using when metaprogramming, this is the place to get those questions answered.
Comments (1)
  1. Tim Rand, March 27th, 2012

    Thanks James,
    This was an inspiring article for me. I used the mix-in pattern to extend the behavior of an array to use operators the way that R does (vectorized functions)—applying the operator to each element in the array and giving you back the collected array. Of course, array.map{|item| item * 2 } works, but it feels like wasted keystrokes coming from the R world where c(1,2,3) * 2 suffices. The metaprogramming approach to "vectorize" the operator methods might be interesting to others so I thought I'd post and share.

    vector = [0,1,4,5,19]
    
    module VectorizedOperators
      # Integer stands in for Fixnum, which newer Rubies no longer define
      # for any Integer instance operator methods
      for method in (Integer.instance_methods).reject{|i| i.to_s.match(/^[A-Za-z_]/)}
        define_method(method) {|num| self.collect{|item| item.send(__method__, num) } }
      end 
    end
    
    vector.extend(VectorizedOperators)
    
    p vector * 2
    p vector ** 2 
    p vector - 40 
    p vector % 2 
    p vector == 4
    
    # >> [0, 2, 8, 10, 38]
    # >> [0, 1, 16, 25, 361]
    # >> [-40, -39, -36, -35, -21]
    # >> [0, 1, 0, 1, 1]
    # >> [false, false, true, false, false] 
    