21 MAR 2012
Learn to Love Mix-ins
The road to mastering Ruby is paved with understanding some key Ruby concepts. Mix-ins are one of those concepts. I'm sure everyone reading this knows the mechanics of how mix-ins work, but it pays to spend some time really thinking about all that mix-ins imply. Let's do just that.
Adding a Type
One of the primary reasons that Ruby needs mix-ins is that it does not support multiple inheritance. That leaves mix-ins as our only option for modeling hybrid objects. It's the way Ruby programmers can add another type.
That's a good way to think about it too: adding a type.
Take pagination, for example. Pagination methods are usually defined to return an object like this:
class PaginatedCollection < Array
  # ... paginated helpers defined here ...
end
That's never really felt right to me though.
First, inheriting from Ruby's core classes can come back to bite you in some scenarios. The reason is that Ruby makes some performance tradeoffs to keep the core classes fast, but those tradeoffs mean that those classes don't always perfectly follow Ruby's rules.
You can get around some of these issues by delegating to an Array, instead of inheriting from it. But that creates even more structure for this setup when it seems that Ruby has a totally workable built-in solution.
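To make the cost of delegation concrete, here is a minimal sketch of what a delegating version might look like. The choice of Forwardable, the keyword arguments, and the list of forwarded methods are my assumptions for illustration, not code from any particular pagination library:

```ruby
require "forwardable"

# Hypothetical delegating variant: wraps an Array instead of inheriting from it.
class PaginatedCollection
  extend Forwardable
  include Enumerable

  # Every Array method we want has to be forwarded by hand (or via Enumerable).
  def_delegators :@items, :each, :size, :[], :empty?

  attr_reader :page, :items_per_page, :total_items

  def initialize(items, page: 1, items_per_page: items.size, total_items: items.size)
    @items          = items
    @page           = page
    @items_per_page = items_per_page
    @total_items    = total_items
  end

  def total_pages
    (total_items / items_per_page.to_f).ceil
  end
end

collection = PaginatedCollection.new(%w[first second third], items_per_page: 3, total_items: 10)
puts collection.total_pages    # 4
p collection.map(&:upcase)     # ["FIRST", "SECOND", "THIRD"]
puts collection.is_a?(Array)   # false
```

It works, but the wrapper is no longer an Array, and the forwarding list is extra structure that has to be maintained.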
The name of the created Class is another hint that it's not ideal, in my opinion: PaginatedCollection. Collection is just another way to say Array, right? So why can't we just say Array? In truth, Collection implies a lot more than just an Array, but solutions like this don't generally adapt to other types.
That leaves just Paginated, which is used as an adjective here. Adjectives are generally better modeled by a Module than a Class. It's almost like Ruby is telling us the right way to model this concept, if we listen closely enough.
To me, it's telling us to build this:
module Paginated
  def paginate(page = 1, items_per_page = size, total_items = size)
    @page           = page
    @items_per_page = items_per_page
    @total_items    = total_items
  end

  attr_reader :page, :items_per_page, :total_items

  def total_pages
    (total_items / items_per_page.to_f).ceil
  end

  # etc...
end

collection = %w[first second third]
collection.extend(Paginated)
collection.paginate(1, 3, 10)

puts "There are #{collection.total_pages} pages."
puts "The collection is an Array."           if collection.is_a?(Array)
puts "The collection is a Paginated object." if collection.is_a?(Paginated)
That seems more natural. The item is an Array. It also happens to be Paginated. We can understand that duality and so can Ruby.
Semantics aside though, this has actual benefits. The code works just fine if I change this line:
collection = %w[first second third]
to:
require "set"
collection = Set.new(%w[first second third])
or even:
collection = {first: 1, second: 2, third: 3}
In other words, we can now paginate whatever we want, including our own classes. We can also mix Paginated into any result set we need to, without needing to use some special collection and a replace() method (as you sometimes see in Rails code using will_paginate).
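As a quick check of the "our own classes" claim, here is a small sketch. The Playlist class is a made-up example; the only thing Paginated asks of it is a size() method, which supplies the default arguments:

```ruby
module Paginated
  def paginate(page = 1, items_per_page = size, total_items = size)
    @page           = page
    @items_per_page = items_per_page
    @total_items    = total_items
  end

  attr_reader :page, :items_per_page, :total_items

  def total_pages
    (total_items / items_per_page.to_f).ceil
  end
end

# Hypothetical custom class: not an Array, a Set, or a Hash.
class Playlist
  def initialize(*songs)
    @songs = songs
  end

  def size
    @songs.size
  end
end

playlist = Playlist.new("one", "two", "three", "four", "five")
playlist.extend(Paginated)
playlist.paginate(1, 2)        # total_items falls back to playlist.size

puts playlist.total_pages      # 3
puts playlist.is_a?(Paginated) # true
```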
There's even another plus to this approach though: you can make these type decisions as the code runs. Let's dig into what that can do for us.
What do I Need Now?
Another great aspect of using mix-ins to build up these types is that objects can be different things at different times. We'll start with a somewhat silly, but illustrative, example:
class Human
  def initialize(name, strength)
    @name     = name
    @strength = strength
  end

  attr_reader :name, :strength

  def move
    "#{name} walks."
  end
end

module Werewolf
  module HybridForm
    def strength
      super + 5
    end

    def move
      super.sub("walks", "lopes")
    end
  end

  module WolfForm
    def strength
      2
    end

    def move
      super.sub("walks", "runs on all fours")
    end
  end
end

human = Human.new("James", 3)
puts "Normally, #{human.name} is a #{human.class} " +
     "with a strength of #{human.strength}. #{human.move}"
puts

werewolf = human.dup.extend(Werewolf::HybridForm)
puts "But when there's a full moon, " +
     "strength raises to #{werewolf.strength} and #{werewolf.move}"
puts

wolf = human.dup.extend(Werewolf::WolfForm)
puts "Some also claim to have seen him as a normal wolf, " +
     "with a strength of #{wolf.strength}."
puts "In this form #{wolf.move}"
That code outputs:
Normally, James is a Human with a strength of 3. James walks.
But when there's a full moon, strength raises to 8 and James lopes.
Some also claim to have seen him as a normal wolf, with a strength of 2.
In this form James runs on all fours.
As you can see, we're able to change the human object as needed. We can add in the abilities of the Werewolf::HybridForm or the Werewolf::WolfForm. The object then has those behaviors.
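It's worth poking at what extend() did and did not change here. The sketch below uses a trimmed-down Human and a standalone HybridForm module (my simplification of the example above). It shows that only the extended copy gains the behavior, and that dup() drops extended modules while clone() keeps them:

```ruby
class Human
  attr_reader :name, :strength

  def initialize(name, strength)
    @name     = name
    @strength = strength
  end
end

# Simplified stand-in for Werewolf::HybridForm.
module HybridForm
  def strength
    super + 5
  end
end

human    = Human.new("James", 3)
werewolf = human.dup.extend(HybridForm)

puts human.strength               # 3  (the original object is untouched)
puts werewolf.strength            # 8
puts werewolf.class               # Human
puts human.is_a?(HybridForm)      # false
puts werewolf.is_a?(HybridForm)   # true

# dup() does not copy the singleton class, so the mix-in is lost; clone() keeps it.
puts werewolf.dup.strength        # 3
puts werewolf.clone.strength      # 8
```

That last detail is why the example can safely call human.dup before each extend(): every form starts from a plain Human.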
Believe it or not, this does have practical value outside of building fantasy games. For example, I once worked on a Rails application that had some pretty complex reporting needs. The queries used to do the reporting were a bit sluggish, so we wanted to make sure that kind of thing never got run during a normal request. Thus we had a model and a separate Module for the reporting code. They looked something like:
class User < ActiveRecord::Base
  # no generate_report method here
end

module Reportable
  def generate_report
    # build and return a processing intensive report
  end
end
The Module wasn't even loaded by our normal application. Instead, we ran a nightly background process, on a different server, that used our backup dump of the database to do the reporting. That code is where the Module actually lived. The reporting process then used it in a loop like this:
User.find_each do |user|
  user.extend(Reportable)
  report = user.generate_report
  # report sending code here...
end
This has other benefits besides safety. For example, it really helps with code organization. The reporting code was quite complex and it just didn't apply to the normal workings of the application. With this setup, we only had to dig through that code when it was relevant to what we were doing. The rest of the time we could pretty much forget it was there. (We did have tests to ensure User kept the contract Reportable counted on, so we wouldn't accidentally break it.)
I would like to see Rails move towards supporting a system like this. I'll build a plugin at some point that lets you add various mix-ins for the models. They could be stored in the application like this:
app/models/user.rb
app/models/user/authenticatible.rb
app/models/user/reportable.rb
…
Then I'll add a method for mixing these into the query results they apply to. For example:
user_logging_in = User.behavior(Authenticatible).find_by_email(email)
and:
User.behavior(Reportable).find_each do |user|
  # ... generate reports...
end
You get the idea.
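To show the shape of the idea without Rails, here is a rough plain-Ruby sketch. Everything in it is hypothetical: the behavior() method, the BehaviorScope class, and the in-memory User.all stand in for a plugin and a database that don't exist yet:

```ruby
module Reportable
  def generate_report
    "Report for #{name}"
  end
end

# Hypothetical wrapper: yields each record with the requested mix-ins applied.
class BehaviorScope
  include Enumerable

  def initialize(model, mixins)
    @model  = model
    @mixins = mixins
  end

  def each
    return enum_for(:each) unless block_given?

    @model.all.each { |record| yield record.extend(*@mixins) }
  end
end

class User
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Stand-in for a database query.
  def self.all
    %w[Ann Bob].map { |name| new(name) }
  end

  # Hypothetical API, modeled on the behavior() calls above.
  def self.behavior(*mixins)
    BehaviorScope.new(self, mixins)
  end
end

p User.behavior(Reportable).map(&:generate_report)
# ["Report for Ann", "Report for Bob"]
p User.all.first.respond_to?(:generate_report)
# false
```

The point of the design is that records fetched the normal way never see the reporting code.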
OK, let's consider modules in one more way to round out this exercise.
Monkey Patching is Generally Silly
We always say that we love how dynamic Ruby is. It allows us to rewrite the rules as we go. The ultimate expression of this is how we can just reopen a class and replace some of its methods wholesale. We lovingly refer to this practice as "monkey patching."
When you think about it though, the practice is almost silly for most of the cases we use it for.
To explain that claim, we need to take a quick detour and discuss Ruby's method lookup. Instead of explaining how it works, let's just ask Ruby to show us:
class Object
  def show_yourself
    puts "In Object."
  end
end

class Parent
  def show_yourself
    puts "In Parent."
    puts "Handing up the line..."
    puts
    super
  end
end

class Child < Parent
  def show_yourself
    puts "In Child."
    puts "Handing up the line..."
    puts
    super
  end
end

o = Child.new
o.show_yourself
That code prints:
In Child.
Handing up the line...
In Parent.
Handing up the line...
In Object.
Hopefully, that's pretty straightforward. We called a method on a Child instance. That method then handed up to the same method defined in Parent, because Child inherits from Parent. That method then hands up to Object, because anything that doesn't declare a superclass, like Parent, gets Object.
In Ruby though, that's not the full story. Let's change the last chunk of code to this:
# ...

o = Child.new
class << o
  def show_yourself
    puts "In the hidden singleton class."
    puts "Handing up the line..."
    puts
    super
  end
end
o.show_yourself
Now we get this output:
In the hidden singleton class.
Handing up the line...
In Child.
Handing up the line...
In Parent.
Handing up the line...
In Object.
Each instance in Ruby has its own Class, called the singleton class. (No, we're not talking about the Singleton design pattern here. This is a different use of the term.) This is why we can modify individual objects no matter what their Class is. The specializations end up in the singleton class. Ruby mostly hides this class from us as an implementation detail, which is why I had to open the object's singleton class explicitly to get it to show up above.
We can now see that method lookup is a straight line in Ruby. You start at the bottom, with the object's singleton class and go straight up until you find a matching method.
That leads us to a question though: how do mix-ins play into this? The answer is pretty simple: they are inserted into the call chain behind a Class. [Update: Ruby 2's prepend() gives us the ability to insert mix-ins in front of a Class as well.] To see what I mean, let's mix something into Child:
# ...

module Mixin
  def show_yourself
    puts "In Mixin."
    puts "Handing up the line..."
    puts
    super
  end
end

class Child < Parent
  include Mixin

  def show_yourself
    puts "In Child."
    puts "Handing up the line..."
    puts
    super
  end
end

# ...
That prints:
In the hidden singleton class.
Handing up the line...
In Child.
Handing up the line...
In Mixin.
Handing up the line...
In Parent.
Handing up the line...
In Object.
See how Mixin just hopped in line behind Child? That's what they do.
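You don't have to trace puts() calls to see this line. Ruby will hand it to you with ancestors(). Here is a minimal sketch using empty stand-ins for the classes above. PrependedChild is a name I made up to show the prepend() case from the update note:

```ruby
module Mixin; end
class Parent; end

class Child < Parent
  include Mixin      # lands behind Child
end

class PrependedChild < Parent
  prepend Mixin      # lands in front of PrependedChild (Ruby 2.0+)
end

# Only the first three entries are shown; Object, Kernel, and
# BasicObject (plus anything mixed into Object) follow.
p Child.ancestors.first(3)           # [Child, Mixin, Parent]
p PrependedChild.ancestors.first(3)  # [Mixin, PrependedChild, Parent]
```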
Well, that's what include() does. Isn't extend() special? Not really. The fact is that extend() is just a shortcut that translates to this:
def extend(mixin)
  singleton_class.class_eval do
    include mixin
  end
end
As you can see, extend() just does an include() under the hood, but it does it on the singleton class. Using that knowledge, we can modify our example one last time to show off a superior form of monkey patching:
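If you want to convince yourself of that equivalence first, here is a small sketch. Animal and Loud are made-up names; the two objects are set up the two different ways and end up behaving the same:

```ruby
class Animal
  def speak
    "hello"
  end
end

module Loud
  def speak
    super.upcase
  end
end

a = Animal.new
a.extend(Loud)                    # the shortcut

b = Animal.new
b.singleton_class.include(Loud)   # the long way around

puts a.speak                           # HELLO
puts b.speak                           # HELLO
puts a.singleton_class.include?(Loud)  # true
puts b.singleton_class.include?(Loud)  # true
puts Animal.include?(Loud)             # false (the class itself is untouched)
puts Animal.new.speak                  # hello
```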
# ...

module SuperiorMonkeyPatch
  def show_yourself
    puts "In SuperiorMonkeyPatch."
    puts "Handing up the line..."
    puts
    super
  end
end

o = Child.new
o.extend(SuperiorMonkeyPatch)
class << o
  def show_yourself
    puts "In the hidden singleton class."
    puts "Handing up the line..."
    puts
    super
  end
end
o.show_yourself
Now the code prints:
In the hidden singleton class.
Handing up the line...
In SuperiorMonkeyPatch.
Handing up the line...
In Child.
Handing up the line...
In Mixin.
Handing up the line...
In Parent.
Handing up the line...
In Object.
Note that SuperiorMonkeyPatch sits behind the singleton class, because mix-ins are always inserted behind a Class. But that puts it in front of Child, which is right where we need it to be if we want to override some behavior.
To sum this up, the key insights are:
- Ruby's method lookup is just a straight line
- Using mix-ins, we can insert code at various points along that line
This is the ideal way to override behavior. Why? Well, if you monkey patch a Class, you have changed it for the whole world. We've already seen though that mix-ins give us the choice of when we want the changes and when we don't.
Furthermore, you have to be very careful when monkey patching if you want to keep the old code. This involves a dance of aliasing (really copying) the old method, replacing it, and then referring to the renamed code. (That's how the infamous alias_method_chain() in Rails works.) With a mix-in though, super can be used normally.
This makes using a Module safer. We're not all playing in one namespace and aliasing a bunch of names that could eventually collide.
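Here are the two styles side by side, as a minimal sketch. The Greeter classes and the Shouting module are invented for the comparison, and the alias version is hand-rolled in the spirit of alias_method_chain() rather than using the Rails helper itself:

```ruby
class AliasedGreeter
  def greet(name)
    "Hello, #{name}."
  end
end

# Old style monkey patch: copy the old method, replace it, call the copy.
class AliasedGreeter
  alias_method :greet_without_shouting, :greet

  def greet(name)
    greet_without_shouting(name).upcase
  end
end

class Greeter
  def greet(name)
    "Hello, #{name}."
  end
end

# Mix-in style: no renaming, just super.
module Shouting
  def greet(name)
    super.upcase
  end
end

loud = Greeter.new.extend(Shouting)

puts AliasedGreeter.new.greet("James")  # HELLO, JAMES.  (every instance is changed)
puts loud.greet("James")                # HELLO, JAMES.  (only this object is changed)
puts Greeter.new.greet("James")         # Hello, James.

# The alias leaves an extra public name behind; the mix-in adds none to Greeter.
p AliasedGreeter.public_method_defined?(:greet_without_shouting)  # true
p Greeter.instance_methods(false)                                 # [:greet]
```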
To give a realistic example of how you can use this technique in practice, let's talk about a support email I received recently. The programmer who wrote me was working with a database that spit out some pretty goofy CSV. It would escape quotes with \", even though the CSV format calls for "". It also escaped other things, like null characters as \0. Because this created a non-standard data format, he couldn't find anything that could read it. That made him write me and ask if there were some way he could modify the CSV library to read it.
"No need," I told him. CSV is a simple wrapper over a normal Ruby IO object and it mostly just counts on one method: gets(). When you call CSV#gets, it pulls some data from IO#gets, then parses it and returns it to you. That gives us a familiar method lookup line:
IO#gets
^
|
CSV#gets
All we need to do is get some code in the middle of that line. That code can translate from the broken CSV format to the correct format the library expects and everything else will just work. No monkey patching is required. Here's the code I sent to him:
module CSVNormalizer
  def gets(*args)
    result = super
    if result
      # FIXME: Improve escape handling
      result.gsub!(/\\(["0])/) { $1 == "0" ? "\0" : '""' }
    end
    result
  end
end

if __FILE__ == $PROGRAM_NAME
  require "csv"

  abort "#{$PROGRAM_NAME} FILE_PATH" unless ARGV.first

  # File.open avoids Kernel#open's special handling of paths starting with "|"
  File.open(ARGV.first) do |io|
    io.extend(CSVNormalizer)
    csv = CSV.new(io)
    csv.each do |row|
      # use row here...
    end
  end
end
My unescaping code may not be perfect, if there are other scenarios than those he described, but that's not the point. Note how easy this was to set up: I defined a module that uses super to get the data, transforms it, and returns the fixed data; I manually set up the IO object, instead of letting CSV do it for me; I used my mix-in to make that object return good data; then I handed the now safe-to-use object on to CSV. This is a clean patch job that won't affect any other code, even if we use it in a project that also processes some normal CSV data. Perfect.
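If you want to see the normalizer work without a data file, you can extend a StringIO instead. This is just a sketch: the sample line is made up, and it only exercises gets() on the extended object, not the CSV library itself:

```ruby
require "stringio"

module CSVNormalizer
  def gets(*args)
    result = super
    if result
      result.gsub!(/\\(["0])/) { $1 == "0" ? "\0" : '""' }
    end
    result
  end
end

# A made-up line in the broken format: quotes escaped as \" instead of "".
raw = %q{"he said \"hi\"",plain} + "\n"

io = StringIO.new(raw)
io.extend(CSVNormalizer)

line = io.gets
puts line                   # "he said ""hi""",plain
p io.gets                   # nil (end of input passes straight through)

# Other StringIO objects are unaffected by the patch.
puts StringIO.new(raw).gets # "he said \"hi\"",plain
```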
Keep Modules in Mind
Ruby gives us these different ways to construct types using mix-ins. We can combine them, use them conditionally, and even build up modified behaviors with them, as needed.
A lot of this power comes from the elegance of how Ruby handles method lookup. Once you realize that message passing happens down a straight line and that you can make interceptions at points along that line, you gain a lot of control over what is happening in your programs.
I strongly encourage you to play with these techniques in your programs, or just on the side. It can take a while for all of this to click, but it really expands your understanding of Ruby when you get there.
Learn More About Modules
If you enjoyed this discussion of how to make mix-ins work for you, I'm pretty sure you would also enjoy:
- Module Magic is a presentation I gave at the Lone Star Ruby Conference in 2009. I showed some of the tricks discussed above (in less detail) and some other handy tricks you can use. My slides are also available.
- The book Crafting Rails Applications is a detailed explanation of how Rails 3 works under the hood. One of the most significant changes from Rails 2 to Rails 3 was to rework the framework from using old style monkey patching (with alias_method_chain()) to the Module approach I've covered here. This had many benefits which this book covers very well. We discussed this book on the Ruby Rogues podcast.
- Dave Thomas did a screencast series a while back on The Ruby Object Model and Metaprogramming. Though it uses an old version of Ruby, and misses out on some niceties added later (like the singleton_class() method), it's easily one of the best discussions of Ruby's method lookup system and a lot more. If you have struggled to understand the tricks we are using when metaprogramming, this is the place to get those questions answered.
Comments (1)
Tim Rand, March 27th, 2012
Thanks James,
This was an inspiring article for me. I used the mix-in pattern to extend the behavior of an array to use operators the way that R does (vectorized functions): applying the operator to each element in the array and giving you back the collected array. Of course, array.map{|item| item * 2 } works, but it feels like wasted keystrokes coming from the R world where c(1,2,3) * 2 suffices. The metaprogramming approach to "vectorize" the operator methods might be interesting to others so I thought I'd post and share.
vector = [0,1,4,5,19]
module VectorizedOperators
  require 'mathn'
  #for any Fixnum instance operator methods
  for method in (Fixnum.instance_methods).reject{|i| i.to_s.match(/^[A-Za-z_]/)}
    define_method(method) {|num| self.collect{|item| item.send(__method__, num) } }
  end
end
vector.extend(VectorizedOperators)
p vector * 2
p vector ** 2
p vector - 40
p vector % 2
p vector == 4
# >> [0, 2, 8, 10, 38]
# >> [0, 1, 16, 25, 361]
# >> [-40, -39, -36, -35, -21]
# >> [0, 1, 0, 1, 1]
# >> [false, false, true, false, false]