21 May 2012
Decorators Versus the Mix-in
It is a neat time to be involved in the Ruby community, if you ask me. A large portion of us are currently studying the techniques for doing good object oriented development. We are looking at the ideas that have come before and trying to decide the best ways to apply those ideas to our favorite language. This leads to blog posts, forum threads, and conference talks about what we are learning. No matter what, we all gain from explorations like this. Everybody wins as our collective knowledge grows. We all deserve gold stars.
So far, there's one point pretty much everyone agrees on: composition should typically be preferred to inheritance. The trickier part of that discussion though is deciding what composition looks like in Ruby. Generally you see Rubyists comparing the merits of decorators and mix-ins. [Note: the comments correctly pointed out that this was a bad use of the word "composition" on my part, to describe mix-ins.] There's a very representative thread on the excellent Objects on Rails mailing list.
Much of the time, decorators seem to come out ahead, but comparing the two is pretty complex. In this article I want to add my opinions to the discussion. Let's start by looking at some examples.
Decoration
Let's say that I have a simple server that just receives messages and displays what was sent:
require "socket"
server = TCPServer.new("127.0.0.1", 61676)
Socket.accept_loop(server) do |connection, _|
  while (object_sent = connection.gets)
    puts "Received: %p" % [object_sent]
  end
end
This is pretty crude. It doesn't even handle concurrent connections, but it will work fine for the purposes of this example.
We also need a simple client to send some messages to our server:
require "socket"
server = TCPSocket.new("127.0.0.1", 61676)
object_to_send = "A message."
server.puts object_to_send
At this point I can:
- Start the server in one shell
- Run the client in a separate shell
- See output in the server's shell
The end result looks like this from the server's shell:
$ ruby server.rb
Received: "A message.\n"
That's nice, but let's say it's not quite what I want. I can really just send String messages using this simple setup, but perhaps I would prefer to be able to send other Ruby objects, like an Array or Hash.
Let's add that capability to our client and server. I'll start that process by creating a serialization.rb file and defining a couple of decorators in there:
class Serializer
  def initialize(socket)
    @socket = socket
  end
  def puts(*objects)
    objects.each do |object|
      serialized = Marshal.dump(object)
      @socket.write([serialized.length, serialized].pack("NA*"))
    end
  end
end
class Deserializer
  def initialize(socket)
    @socket = socket
  end
  def gets
    if (message = @socket.read(4))
      object_size = message.unpack("N").first
      Marshal.load(@socket.read(object_size))
    end
  end
end
Both of these objects take a Socket, store it away for later use, then wrap the simple IO methods that my client and server are using to communicate. The wrappers manage a very simple serialization process where we convert a Ruby object to a binary representation and then send the length in bytes followed by those bytes. The other end can peek at the length, read that many bytes, and rehydrate the object.
I've skimped on the error handling here to keep the code simple. A more robust version of this code would need to make sure the serialized object length doesn't overflow the size of the number we're sending and handle the various networking errors that can occur.
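As a rough sketch of that framing, plus the overflow check I just mentioned, here is the same idea run against an in-memory StringIO instead of a real socket. The helper names write_frame and read_frame and the constant MAX_FRAME_BYTES are assumptions made up for this illustration. They are not part of serialization.rb.

```ruby
require "stringio"

# Largest length that fits in the 4-byte unsigned "N" header.
MAX_FRAME_BYTES = 0xFFFF_FFFF

# Hypothetical helper: write one length-prefixed, marshaled object.
def write_frame(io, object)
  serialized = Marshal.dump(object)
  if serialized.bytesize > MAX_FRAME_BYTES
    raise ArgumentError, "object too large to frame"
  end
  io.write([serialized.bytesize, serialized].pack("NA*"))
end

# Hypothetical helper: read one frame, or return nil at end of stream.
def read_frame(io)
  header = io.read(4)
  return nil unless header && header.bytesize == 4
  size = header.unpack1("N")
  Marshal.load(io.read(size))
end

buffer = StringIO.new("".b)  # binary buffer standing in for the socket
write_frame(buffer, %w[A more complex message.])
write_frame(buffer, [1, 2, 3])
buffer.rewind

p read_frame(buffer)  # => ["A", "more", "complex", "message."]
p read_frame(buffer)  # => [1, 2, 3]
p read_frame(buffer)  # => nil
```

Networking errors (a connection that drops mid-frame, for example) are still ignored here.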
Now let's update the client to support serialization:
require "socket"
require_relative "serialization"
server = TCPSocket.new("127.0.0.1", 61676)
server = Serializer.new(server) # decorate the socket
object_to_send = %w[A more complex message.]
server.puts object_to_send
Notice that, aside from the added require_relative() call and the switch to a more complex object to show off serialization, there's really just one new line here. That line wraps the socket connection to the server in our decorator. This replaces the default puts() logic with our more involved process. That change actually takes effect on the last line of the client, but we didn't need to modify that code. That's the whole idea behind decoration.
The server is just as easy to update:
require "socket"
require_relative "serialization"
server = TCPServer.new("127.0.0.1", 61676)
Socket.accept_loop(server) do |connection, _|
  connection = Deserializer.new(connection) # decorate the socket
  while (object_sent = connection.gets)
    puts "Received: %p" % [object_sent]
  end
end
Again, all we need to do is wrap the connection to our client with the decorator. That replaces the gets() operation and we still didn't need to touch that code.
If I run the server and client again, we can see that the Array does come through:
$ ruby server.rb
Received: ["A", "more", "complex", "message."]
It's time to admit that I purposefully selected this example to be something decorators are just perfect for. They really excel at these layering-on-functionality tasks. Java's IO class hierarchy is built much like the code I just showed for exactly this reason.
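To make the layering point a bit more concrete, here is a sketch that stacks a second decorator underneath the Serializer. ByteCounter is a hypothetical class invented for this example, and a StringIO stands in for the TCPSocket so the code runs without a server.

```ruby
require "stringio"

# The Serializer from serialization.rb, repeated so the sketch stands alone.
class Serializer
  def initialize(socket)
    @socket = socket
  end

  def puts(*objects)
    objects.each do |object|
      serialized = Marshal.dump(object)
      @socket.write([serialized.length, serialized].pack("NA*"))
    end
  end
end

# Hypothetical second decorator: counts the bytes passing through write().
class ByteCounter
  attr_reader :bytes_written

  def initialize(io)
    @io            = io
    @bytes_written = 0
  end

  def write(data)
    @bytes_written += data.bytesize
    @io.write(data)
  end
end

raw     = StringIO.new("".b)       # stands in for the TCPSocket
counter = ByteCounter.new(raw)     # first layer
server  = Serializer.new(counter)  # second layer

server.puts %w[A more complex message.]
puts "#{counter.bytes_written} bytes written"
```

Neither layer knows about the other. Each one only relies on the single method it calls on the object it wraps.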
For comparison's sake though, let's switch to an example where I don't think decorators are as good a fit, at least in Ruby.
Mixing It Up
I've shown an example very similar to this next one before, in an article about mix-ins, but let me flesh it out a bit more this time so we can really discuss why I handle it the way I do.
Let's say that I have some not-quite-CSV data in a file:
"Name (last, first)",Job
"Gray, James \"JEG2\"",Developer
"Gray, Dana",Full-time Mommy
"Gray, Summer","\"Cutie Pie\""
That data isn't really valid, according to the CSV standard. It should be doubling quotes to escape them (""), not using backslashes (\").
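For reference, here is a small sketch of the doubled-quote form going through the standard library in both directions. The constants VALID_LINE and FIELDS are just names picked for this example.

```ruby
require "csv"

# A quote inside a quoted field is escaped by doubling it.
VALID_LINE = %q{"Gray, James ""JEG2""",Developer}
FIELDS     = ["Gray, Summer", %q{"Cutie Pie"}]

# Parsing turns each doubled quote back into a single quote.
p CSV.parse_line(VALID_LINE)  # => ["Gray, James \"JEG2\"", "Developer"]

# Generating a line produces the doubled form on its own.
print CSV.generate_line(FIELDS)  # => "Gray, Summer","""Cutie Pie"""
```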
I would like to read that data with the standard CSV library so I can take advantage of the things it can do for me, like parsing out the headers. I want to use some code like the following:
require "csv"
CSV.foreach("not_quite_csv.csv", headers: true) do |row|
  p row.to_hash
end
But the CSV library doesn't recognize this format:
$ ruby read_csv.rb
…: Missing or stray quote in line 2 (CSV::MalformedCSVError)
…
To handle that, I would use a mix-in that fixes the data between the time when it is read and the time when CSV tries to parse it. Let's code that up:
require "csv"
module CSVNormalizer
  def gets(*args)
    line = super
    if line
      line.gsub!(/"(?:\\\\|\\"|[^"])*"/) { |field|
        field.gsub(/(?<!\\)\\"/) { '""' }
      }
    end
    line
  end
end
open("not_quite_csv.csv") do |file|
  file.extend(CSVNormalizer)
  csv = CSV.new(file, headers: true)
  csv.each do |row|
    p row.to_hash
  end
end
CSVNormalizer is pretty straightforward. It just wraps gets() and gives me a place to do a few simple substitutions.
Don't lose too much sleep over understanding the regular expressions I used there. They just replace the escapes with the format CSV expects.
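If you do want to poke at those substitutions in isolation, a sketch like this one can help. The normalize_quotes helper is a name I made up for the example. It uses the non-mutating gsub() so it can be called on literals, but the patterns are the same two used in CSVNormalizer.

```ruby
# Hypothetical helper wrapping the two substitutions from CSVNormalizer#gets.
def normalize_quotes(line)
  # Outer pattern: one whole quoted field, where \\ and \" count as content.
  line.gsub(/"(?:\\\\|\\"|[^"])*"/) { |field|
    # Inner pattern: a \" not preceded by another backslash becomes "".
    field.gsub(/(?<!\\)\\"/) { '""' }
  }
end

puts normalize_quotes('"Gray, James \"JEG2\"",Developer')
# => "Gray, James ""JEG2""",Developer
puts normalize_quotes('"Gray, Summer","\"Cutie Pie\""')
# => "Gray, Summer","""Cutie Pie"""
puts normalize_quotes('Gray,Developer')
# => Gray,Developer
```

Working one quoted field at a time is what keeps the replacement from touching anything outside of the quotes.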
In the lower chunk of the CSVNormalizer listing, I switched how I do the hand-off to CSV. This lets me get a hold of the File object and extend() it before CSV tries to read from it.
This solution isn't really complete, because it doesn't yet handle the multi-line fields that real CSV data can contain. It does work for the data in my example though:
$ ruby read_csv.rb
{"Name (last, first)"=>"Gray, James \"JEG2\"", "Job"=>"Developer"}
{"Name (last, first)"=>"Gray, Dana", "Job"=>"Full-time Mommy"}
{"Name (last, first)"=>"Gray, Summer", "Job"=>"\"Cutie Pie\""}
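To see what extend() is doing in that hand-off, here is a small sketch that swaps the File for a StringIO and calls gets() directly, leaving CSV out of the picture. The normalized_io helper is a hypothetical name used only here.

```ruby
require "stringio"

module CSVNormalizer
  def gets(*args)
    line = super  # reaches the gets() of whatever object was extended
    if line
      line.gsub!(/"(?:\\\\|\\"|[^"])*"/) { |field|
        field.gsub(/(?<!\\)\\"/) { '""' }
      }
    end
    line
  end
end

# Hypothetical helper: an in-memory IO standing in for the File.
def normalized_io(text)
  io = StringIO.new(text)
  io.extend(CSVNormalizer)
  io
end

io = normalized_io('"Gray, Dana \"D\"",Mommy' + "\n")
puts io.gets  # => "Gray, Dana ""D""",Mommy

# The module sits in front of StringIO in this one object's lookup path.
chain = io.singleton_class.ancestors
p chain.index(CSVNormalizer) < chain.index(StringIO)  # => true

# Other StringIO objects are untouched.
puts StringIO.new('"a \"b\""').gets  # => "a \"b\""
```

The call to super works because Ruby finds CSVNormalizer#gets first and then continues up the same lookup path to the original method.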
The Decision
I said that the second example is better as a mix-in. But why?
I mean you could choose to solve it with a decorator, right? Of course. Let's do that to see how it goes.
At first glance, it looks like we could get away with a few minor changes:
require "csv"
class CSVNormalizer # switch to a class
  # store the IO for later use
  def initialize(io)
    @io = io
  end
  def gets(*args)
    line = @io.gets(*args) # forward to the IO
    if line
      line.gsub!(/"(?:\\\\|\\"|[^"])*"/) { |field|
        field.gsub(/(?<!\\)\\"/) { '""' }
      }
    end
    line
  end
end
open("not_quite_csv.csv") do |file|
  file = CSVNormalizer.new(file) # decorate
  csv = CSV.new(file, headers: true)
  csv.each do |row|
    p row.to_hash
  end
end
If you run that code, it will probably seem to run the same. But it's not equivalent. In order to show you why, I need to modify my data to use different line endings:
$ ruby -pi -e 'gsub "\n", "\r\n"' not_quite_csv.csv
I'm on Unix, so I've switched the line endings to something I might have received from a Windows machine. You would need to go the other way if you are on Windows, but the end result is that our decorator is now broken:
$ ruby read_csv.rb # using the decorator
…: Unquoted fields do not allow \r or \n (line 1). (CSV::MalformedCSVError)
…
That's interesting, because when I switch to the mix-in, it can read this data just fine:
$ ruby read_csv.rb # using the mix-in
{"Name (last, first)"=>"Gray, James \"JEG2\"", "Job"=>"Developer"}
{"Name (last, first)"=>"Gray, Dana", "Job"=>"Full-time Mommy"}
{"Name (last, first)"=>"Gray, Summer", "Job"=>"\"Cutie Pie\""}
To know why this happens, you have to understand a little about how CSV works internally. As you can see, this whole line ending issue is kind of annoying. CSV tries to save you from needing to worry about that. By default, it will attempt to guess your line endings. You can always specify the ending manually if you need to, but a lot of times the guessing means that your program will just work without you taking any action.
That's what's happening with the mix-in version. No matter which line endings I feed it, CSV correctly guesses them and adapts. It just works.
Obviously we broke that in the decorator version, but how? Well, under the hood, CSV has to peek ahead at the data to guess the endings. It expects to be passed an IO object, so it calls on some IO methods to fetch some data and then reset the position pointer to where it was.
Of course, this feature doesn't always work. Sometimes you can't skip around in a stream like this. Consider STDIN or a Socket for example. So, if CSV cannot guess, it just defaults to the normal line endings for the current platform. That's the best we can do with no information to go on and the user can always specify the line ending manually as needed.
Now you can probably deduce why the decorator broke. It's not a full IO stand-in. We wrapped the method CSV needs to read data, but not the methods it uses to do the line ending guessing. That means it has to give up and go with the platform default.
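A quick way to see the gap is to ask both objects what they respond to. In this sketch GetsOnlyWrapper is a stripped-down stand-in for the decorator, and the three positioning methods are picked purely for illustration. I'm not claiming these are the exact methods CSV calls.

```ruby
require "stringio"

# A bare-bones decorator in the style of the first CSVNormalizer class:
# it forwards gets() and nothing else.
class GetsOnlyWrapper
  def initialize(io)
    @io = io
  end

  def gets(*args)
    @io.gets(*args)
  end
end

# Positioning methods chosen for illustration only (an assumption).
POSITIONING_METHODS = %i[pos seek rewind]

# Hypothetical helper: which of those methods does the object lack?
def missing_methods(object)
  POSITIONING_METHODS.reject { |name| object.respond_to?(name) }
end

p missing_methods(StringIO.new("data"))
# => []
p missing_methods(GetsOnlyWrapper.new(StringIO.new("data")))
# => [:pos, :seek, :rewind]
```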
All of this leads to the question: could we fix that? Sure. One option is that we could delegate the needed methods manually. Multiple methods are needed though, and if I wasn't the guy who wrote CSV, I wouldn't really know what they are. Plus, they could change some day. (That has actually happened. The algorithm was changed at one point to use different methods so it would work with more IO-like objects.) Given that, it's probably better to just forward any messages I don't want to wrap to the underlying IO object. That's easy enough to do:
require "csv"
class CSVNormalizer < BasicObject # make sure we delegate most calls
  def initialize(io)
    @io = io
  end
  def gets(*args)
    line = @io.gets(*args)
    if line
      line.gsub!(/"(?:\\\\|\\"|[^"])*"/) { |field|
        field.gsub(/(?<!\\)\\"/) { '""' }
      }
    end
    line
  end
  # forward all other messages to the IO object
  def method_missing(method, *args, &block)
    @io.send(method, *args, &block)
  end
end
open("not_quite_csv.csv") do |file|
  file = CSVNormalizer.new(file)
  csv = CSV.new(file, headers: true)
  csv.each do |row|
    p row.to_hash
  end
end
As the comments show, I only made two changes. First, I explicitly inherited from BasicObject instead of accepting the default Object parent. This means that my decorator responds to very few messages. Almost all incoming messages will hit method_missing() because of that change. The second change was to provide that method_missing() and have it push everything down to the underlying IO. This does restore the line ending guessing functionality and get the decorator fully working:
$ ruby read_csv.rb # using the decorator again with tricky line endings
{"Name (last, first)"=>"Gray, James \"JEG2\"", "Job"=>"Developer"}
{"Name (last, first)"=>"Gray, Dana", "Job"=>"Full-time Mommy"}
{"Name (last, first)"=>"Gray, Summer", "Job"=>"\"Cutie Pie\""}
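For comparison, the manual delegation option I passed over might look something like this sketch, which uses the standard Forwardable module. The class name and the list of delegated methods are assumptions made for the example, which is exactly the weakness of this approach: somebody has to know the right list and keep it current.

```ruby
require "forwardable"
require "stringio"

class ManuallyDelegatingNormalizer
  extend Forwardable

  # Assumed list: the real set of methods CSV relies on is not spelled out here.
  def_delegators :@io, :pos, :seek, :rewind, :eof?

  def initialize(io)
    @io = io
  end

  def gets(*args)
    line = @io.gets(*args)
    if line
      line.gsub!(/"(?:\\\\|\\"|[^"])*"/) { |field|
        field.gsub(/(?<!\\)\\"/) { '""' }
      }
    end
    line
  end
end

io = ManuallyDelegatingNormalizer.new(StringIO.new('"a \"b\""' + "\n"))
puts io.gets                # => "a ""b"""
p io.eof?                   # => true
io.rewind
p io.pos                    # => 0
p io.respond_to?(:read)     # => false, because read() was never listed
```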
We now know what it takes to build the equivalent decorator. If we compare the two, I say the mix-in is superior in this case. Here are my reasons:
- The whole point of the BasicObject and method_missing() trick is to do our own method dispatching here. Why do that? Ruby is better at that kind of thing than we are, so I would rather leave that task to the language whenever we can. What do I mean by "better?" Well, for one thing, we can use super when Ruby is doing the job. For another, Ruby is going to be much faster when dispatching. Rack (another good example of decorators) has a little of this latter problem. As frameworks that use Rack, like Rails, move more and more functionality into middleware, the call stack just gets longer and longer. This is the primary reason that they filter backtraces in Rails and it has slowed things down a little as incoming requests must pass through all of the layers.
- The decorator object I ended up creating lies about what it really is. Consider methods like class(), is_a?(), and ancestors(). These are all going to tell me that I have a File object in the case above. That's only sort of accurate though. I could probably drop BasicObject, in this case, but then the lies just change to leave out any mention of the hidden File object. Compare that with the mix-in version which is fully reflection compliant. It can tell you all about its "types" no matter how many there are.
Because of this, I tend to prefer mix-ins for cases where I am going to delegate an entire API. I feel like the advantages are pretty clear for those cases.
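The reflection point is easy to check for yourself. This sketch uses a StringIO in place of the File and two hypothetical stand-ins, ForwardingDecorator and PassThrough, that do nothing except forward, so the only difference on display is what each object reports about itself.

```ruby
require "stringio"

# Decorator in the BasicObject style: nearly every message is forwarded.
class ForwardingDecorator < BasicObject
  def initialize(io)
    @io = io
  end

  def method_missing(method, *args, &block)
    @io.send(method, *args, &block)
  end
end

# Mix-in with equally empty behavior, for comparison.
module PassThrough
  def gets(*args)
    super
  end
end

# Hypothetical helpers so both styles are built the same way.
def decorate(io)
  ForwardingDecorator.new(io)
end

def mix_in(io)
  io.extend(PassThrough)
end

decorated = decorate(StringIO.new("data"))
mixed     = mix_in(StringIO.new("data"))

p decorated.class                        # => StringIO
p decorated.is_a?(ForwardingDecorator)   # => false
p mixed.class                            # => StringIO
p mixed.is_a?(PassThrough)               # => true
```

The decorator answers every question on behalf of the object it hides, while the extended object can report both of its "types."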
Arguments for and Against Decorators
I'm not trying to say there are no good uses for decorators. I've already provided some in this article. Greg Brown gives an even more spirited defense of them in Practicing Ruby 3.7, which is probably the most detailed, Ruby-focused discussion of these issues to date. Greg is discussing higher level inheritance concepts in that article, but his examples largely boil down to an analysis of mix-ins and decorators.
I think Greg has some good and not-so-good points.
The best argument in favor of decorators, by far in my opinion, is the encapsulation issue. I think we normally envision encapsulation as hiding our data from the outside world, but in this case we're talking more about hiding it from ourselves. When you have some object with several mix-ins, how do we know that none of those methods have instance variable conflicts? Heck, the methods could even be accidentally overriding each other with unrelated functionality if we're not careful with naming. This could create some maddening bugs where things like the load order matter. These are very real and tough issues.
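Here is a small sketch of how that can go wrong. Retryable, Paginated, and Report are hypothetical names. The two modules were "written independently" and both happen to pick @count for their state and status for a method name.

```ruby
# Hypothetical mix-in number one: tracks retries in @count.
module Retryable
  def record_retry
    @count = (@count || 0) + 1
  end

  def retries
    @count || 0
  end

  def status
    "retried #{retries} times"
  end
end

# Hypothetical mix-in number two: tracks a page offset, also in @count.
module Paginated
  def next_page
    @count = (@count || 0) + 10
  end

  def offset
    @count || 0
  end

  def status
    "at offset #{offset}"
  end
end

class Report
  include Retryable
  include Paginated  # included last, so its status() wins
end

report = Report.new
report.record_retry
report.next_page

p report.retries  # => 11, although only one retry was recorded
p report.status   # => "at offset 11"
```

Swap the two include lines and status() quietly changes meaning, which is the load order problem in miniature.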
How often are we bitten by this encapsulation issue? I wish I knew. I think a strong test suite can help to reduce it. I also imagine it's less of a problem when most of the API is written by a single team. I'm guessing they would be more likely to catch misuses or impose naming conventions that protect themselves from these issues. But in the world where we embrace the value of sending Pull Requests on GitHub to fix most problems, I'm not sure how much of this safety we can count on.
I am less sold on the "bloated contracts" or "too many entry points" arguments that I see in many places, including Greg's post and the Objects on Rails thread I mentioned earlier. The example everyone throws around is ActiveRecord::Base which does have a ton of mix-ins. However, languages, libraries, and frameworks (like ActiveRecord) have different goals than we do for our application code.
We want to keep application code as minimal as possible and expose even less than we write whenever possible. This helps with many, many aspects of application development.
As I mentioned though, languages, libraries, and frameworks have almost the opposite goals. They want to cast their nets far and wide. They expose a lot and let your code decide which pieces it needs to do its work.
This makes sense when you think about it. The more a library does for us, the less application code we end up writing. That satisfies both aims.
Even if you sympathize with the bloated contract argument, I'm not convinced that decorators do much for it. Each individual object may respond to fewer messages, but is that going to save you from reading the documentation of all of those decorator classes? It didn't save me a ton of memorization when I passed the Java certification, I can tell you that much. The big API is still there, even if we do a better job of hiding it.
Testability is another complaint I often see leveled at mix-ins that I'm skeptical of. By way of example, let's test the mix-in I made earlier. I created a csv_spec.rb file and added this code to it to get started:
module CSVNormalizer
  def gets(*args)
    line = super
    if line
      line.gsub!(/"(?:\\\\|\\"|[^"])*"/) { |field|
        field.gsub(/(?<!\\)\\"/) { '""' }
      }
    end
    line
  end
end
describe CSVNormalizer do
  let(:message_queue) { [ ] }
  def gets(*_)
    message_queue.shift
  end
  before do
    extend(CSVNormalizer)
  end
  it "translates escaped quotes as they are read" do
    message_queue << '"a \"quoted\" field"'
    gets.should eq('"a ""quoted"" field"')
  end
  # ...
end
As you can see, I just gave my example group the expected gets() method and then extended it with my CSVNormalizer. From there I can test normally. I don't feel like I'm jumping through a lot of extra hoops here compared to testing a normal object, but you be the judge.
The Winner Is…
I don't think we can definitively say that one approach is superior to the other in all cases. I hope I've given compelling use cases for both in this article.
Mix-ins seem to be the underdog in most of these discussions, but I'm not quite ready to write them off. I hope I've shown that there are at least some things going for them.
As with most things in programming, I think it pays to try to understand as many of these angles as we can and make the best choices we can on a case by case basis.
The biggest question remaining for me is if we can do better at modeling large APIs, like ActiveRecord, with the tools Ruby gives us. Is there a good way to use composition there without the mix-ins? I'm not sure. Hopefully some Rubyist will improve on the formula and enlighten us.
Further Reading
If you really enjoy this kind of analysis about how to build objects, there are some terrific resources out there that you might want to take a deeper look at:
- Smalltalk Best Practice Patterns can be expensive to pick up these days and you may need to learn a new language just to read it. The truth is though that it's totally worth it. I learned more about objects from this one book than probably all others combined. While it doesn't cover these issues directly, it gave me a lot of the foundation that I use to think through these issues.
- The Objects on Rails mailing list started out as a place to give Avdi Grimm feedback for the ebook of the same name. Luckily, it seems to be sticking around even now that the book is out and, better yet, growing into the go-to source for these discussions in Rubyland. Reading that list is pretty addictive for object geeks like me.
- Design Patterns in Ruby does a great job of looking at the various design patterns, like decorators, both in a traditional light and through Ruby-colored glasses. This can provide quite a bit of insight when it comes to adapting patterns from other languages.
Comments (7)
-
Gregory Brown, May 22nd, 2012
Hi James,
Very interesting article. I'm sure I'll have more comments for you later, but for now I wanted to drop a link to an article I wrote back in 2009 which attempts to use BOTH decorators and mixins together :)
http://blog.rubybestpractices.com/posts/gregory/008-decorator-delegator-disco.html
I never really ended up using this pattern in production, which is a sign that it was probably overkill. But it attempts a middle of the road solution to this problem.
-
Oh also, one important point:
So far, there's one point pretty much everyone agrees on: composition should typically be preferred to inheritance. The trickier part of that discussion though is deciding what composition looks like in Ruby. Generally you see Rubyists comparing the merits of decorators and mix-ins.
I would strongly suggest here that mixins should not be considered a form of composition at all. I would say that a couple defining characteristics of composition are the following:
- A composite object achieves code reuse through delegation (automatic or manual) to well-encapsulated sub-objects.
- Methods defined by the composite are invisible to its sub-objects (unless the composite explicitly passes a self-reference to its sub-objects). In other words, composite objects cannot affect sub-object behavior through late binding.
These two points are what separate composition from inheritance conceptually. Everything else is a matter of language semantics (i.e. different ways to model the same thing). With these points in mind, it's easy to see that mixins behave much more like class inheritance than they do composition.
In fact, it is almost fair to say that mixins are a limited form of multiple inheritance which cannot exist within a hierarchy and cannot be directly instantiated. Since calling mixins inheritance sounds a bit weird, I think maybe the term "implementation sharing" is a good one to use. Both class inheritance and mixins rely on implementation sharing, but composition CANNOT, otherwise it would not be composition. That forces composite objects to be written against interfaces only and not implementation details, which is arguably one of their benefits (of course, nothing says that mixins/class inheritance can't be used in the same way with a bit of care).
Anyway, this doesn't affect the validity of the rest of your article in any way, but I wanted to point out that at least from my point of view, it seems like if you are willing to consider mixins a form of composition, the distinction between composition and class based inheritance blurs too much.
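The late binding distinction in the second point above can be sketched in a few lines. All of the names here (Greeting, LoudMixedIn, Greeter, LoudComposite) are hypothetical and exist only to show the difference.

```ruby
# Hypothetical mix-in: greet() calls name() on whatever self turns out to be.
module Greeting
  def name
    "world"
  end

  def greet
    "Hello, #{name}!"
  end
end

# Implementation sharing: the includer's name() is picked up by greet(),
# because self is the includer (late binding).
class LoudMixedIn
  include Greeting

  def name
    "WORLD"
  end
end

# A plain object to be wrapped.
class Greeter
  include Greeting
end

# Composition: the wrapper's name() is invisible to the wrapped Greeter.
class LoudComposite
  def initialize(greeter)
    @greeter = greeter
  end

  def name
    "WORLD"
  end

  def greet
    @greeter.greet
  end
end

puts LoudMixedIn.new.greet                 # => Hello, WORLD!
puts LoudComposite.new(Greeter.new).greet  # => Hello, world!
```

For the composite to change the greeting it would have to pass something to the sub-object explicitly, which is the self-reference exception mentioned in that point.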
-
Really great article, thanks for sharing your thoughts. I think it's a bit weird to talk about mixins as being a form of composition, because at the Ruby language-level it's clearly implemented as inheritance. When a module is included in a class, Ruby just creates a new "wrapper/proxy class" and inserts it right above the including class. So if A includes module B, then module B's wrapper/proxy class is inserted directly above A. This means that B essentially becomes A's superclass, just as if B were a class and we did class A < B. To me, composition is one object that holds a reference to an instance of another object, which is orthogonal to including a mixin. All of these concepts are well articulated in Paolo Perrotta's Metaprogramming Ruby book.
I was recently looking at a new queueing Ruby library called Qless yesterday. They recently added middleware support for your workers. They describe that you can add custom middleware to the queueing system by opening up the Worker class directly and injecting middleware:
require 'qless/worker'
Qless::Worker.class_eval do
  include ReEstablishDBConnection
  include SomeOtherAwesomeMiddleware
end
Then, in your middleware, you would do this:
module MyMiddleware1
  def around_perform(job)
    # do something with the job
    # invoke the next guy with "super"
    super
  end
end
What do you guys think about using mixins for this type of architecture? This makes heavy use of the fact that Ruby's method lookup is a straight line, as JEG2 loves to say. :) However, once you start creating complex middleware that introduce other methods, you end up having a worker (the guy that includes the middleware) that has the potential to have namespace clashes with methods/state. Also, if you start including middleware that was created by other people, you can't really control which methods you're adding into your worker's method lookup path and avoid collisions.
How would the composition-style version of this look? Sidekiq uses a more composition approach for adding middleware:
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add MyServerHook
  end
end
An example middleware is not a module, but instead a simple class that is instantiated and invoked:
class MyMiddleware
  def call(*args)
    # do something fun
    # ....
    # invoke yield to move forward in the middleware chain
    yield
  end
end
I thought this was another interesting concrete example to compare mixins/inheritance vs. composition.
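To line the two styles up side by side, here is a library-free sketch. Nothing in it is Qless or Sidekiq API. BasePerform, MixinWorker, the step classes, and run_chain are all hypothetical names, and the base method lives in its own module so that the middleware modules can sit in front of it.

```ruby
# Mix-in style: middleware are modules layered into the lookup path.
module BasePerform
  def around_perform(job)
    job << :performed
  end
end

module Logging
  def around_perform(job)
    job << :log_start
    super
    job << :log_end
  end
end

module Timing
  def around_perform(job)
    job << :timer_start
    super
    job << :timer_stop
  end
end

class MixinWorker
  include BasePerform
  include Logging
  include Timing  # included last, so it runs outermost
end

# Call-chain style: middleware are plain objects that respond to call().
class LoggingStep
  def call(job)
    job << :log_start
    yield
    job << :log_end
  end
end

class TimingStep
  def call(job)
    job << :timer_start
    yield
    job << :timer_stop
  end
end

# Hypothetical runner: wraps the work in each step, first step outermost.
def run_chain(steps, job, &work)
  steps.reverse.reduce(work) { |inner, step|
    -> { step.call(job, &inner) }
  }.call
end

p MixinWorker.new.around_perform([])
# => [:timer_start, :log_start, :performed, :log_end, :timer_stop]

job = []
run_chain([TimingStep.new, LoggingStep.new], job) { job << :performed }
p job
# => [:timer_start, :log_start, :performed, :log_end, :timer_stop]
```

Both produce the same ordering. The difference is that the mix-in steps all share one object's methods and instance variables, while each call-chain step keeps its own.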
-
Ryan,
I prefer the Sidekiq style, and that's how Newman works. In fact, the preferred method for building applications in Newman (which are essentially middleware) is to use Newman::Application.new which actually defines the whole application at the instance level! The newman server just expects a list of objects responding to call(), and runs them in sequence.
That said, ActiveRecord plugins have taken the "just mix them all into one object" approach for quite some time and though I'm sure clashes happen, they are impressively more rare than we might expect!
-
Greg and Ryan: good point about my use of the word composition. It's not really appropriate there as you note.
As for the queuing example, we're essentially talking about Rack's design. I agree that it's preferred for that case, but beware that it does have tradeoffs. I mentioned the effects of the lengthened call stack (like speed) in my article. Aaron has also talked many times about how Rack's design sort of pushes all manner of extensions into one pipe. It could be much more efficient, at least in the case of Rails, to have separate concepts for areas like output filters, resource managers, etc.
-
Also, it's worth noting that Avdi Grimm wrote a similar article to mine before I did, called Decoration is best, except when it isn't. He has some great examples in there and talks about the connascence it can push us into. Be sure to check it out as well.
-
Yeah, you don't want an infinitely growing chain of middleware, for sure. I liked Aaron's suggestion of keeping the same basic API (chains of objects that respond to call), but introducing diversity at the top level, so rather than having one chain, you have one per common extension point.
It looks like Avdi's main point in that article was that some designs depend on late binding, and composition prohibits it. I may not have the time to get to it soon, but I wonder if that's a sign that a design which does not depend on late binding might make composition easier here. That's a question I plan to dig much deeper into via composingruby.com, so I'm glad to see a practical example of when it can be a problem.
The more I read about (and try to write about) composition vs. inheritance, the more I am thinking that it's not fair to simply translate examples without changing their overall design. Designs that are well suited for composition will make inheritance (both mixins and classes) look bad, and vice-versa. So I think instead we should seek to come up with a common scenario and then implement it in two very different ways if necessary, to allow each technique to put its best foot forward.
Of course, that's a tricky thing to do, and I think that I'm going to try to let these ideas stew in my mind for a while before discussing all of this further. I'm really glad we're working through these issues though!