Gray Soft / Rubies in the Rough / A Stylish Critique

1

APR
2012

A Stylish Critique

Before getting started, I feel compelled to point out that my dictionary defines a critique as "a detailed analysis and assessment of something." It seems like we often assume the worst of that word, but that's not how I intend it here.

The Ruby community seems to be talking about style guides lately. So let's talk about them.

The fact is that you will have many choices if you go looking for style guides for our favorite language. You can pick from:

The old stuff
Google's way
GitHub's popular guide
Guides from respected Rubyists, like Dan Kubb
One of the many forks of the popular guides on GitHub
And many more

It's obvious that Rubyists care about this topic. Let's see what's out there and consider what really is and is not useful from these guides.

What is a style guide really, and why do we even have them?

Defining style guides is surprisingly tough. I suspect they started out as formatting rules for code, but they have evolved pretty far beyond that now.

Most guides include general conventions that the author feels are important when writing the language in question. This can go all the way down to how to use certain constructs, opinions on what the author considers idiomatic, and syntax to outright avoid.

To sum it up, a style guide is often a "How I Write Language X" opinion piece from the author.

Style guides do exist for pretty good reasons. First, most programmers put some effort into making sure that their code looks a certain way. Generally this is to make it easier to read. However, you may not see the code as intended if you are using a different indent size or tabs instead of the spaces they intended.

With teams, these conventions become even more important. Teams are trying to work in concert, so they need to be able to understand code not written by them. A set of conventions the team follows can really help this along.

And when you think about it, who isn't on a team these days? Aren't open source developers a team? The entire Ruby community might even be a team with the way we interact online.

It seems like this is a topic we should all care about, at least a little.

The Rules for the Rules

Even though we have decided this topic is important, we must never forget how careful we have to be when imposing rules on others. The goal is to make communication between us easier, not to restrict creativity.

In my experience, one rule seems to hold sway over all other rules in life whether those rules are for parenting, laws, or programming style guides:

Use the fewest amount of rules you can possibly get away with.

If there are more rules than I can easily internalize, the entire point is defeated. Odds are that the authors don't even consistently follow their own long lists.

Now, it is worth noting that some list size variation make sense. For example, GitHub needs to keep their rules minimal for the reasons I have just mentioned. However, Dan Kubb talked to us on the Ruby Rogues about how he uses his style guide to try and restrict himself into being more aware as a programmer. It makes sense that he would have more rules for that.

That said, Dan's list is still probably a little long, in my opinion. It's just too much. Some of his rules actually encourage me to tune out what he's saying, like:

Avoid bugs.
Use common sense.

Again, if you lose your audience, the whole point is lost.

The classic book that describes itself as a style guide, Smalltalk Best Practice Patterns, includes ten "Formatting" patterns. That strikes me as a pretty common sense upper limit.

Another rule that's key is one that most parents know:

Have a good reason for the rule.

If your kid hits you with, "But why?," and you have to counter with, "Because I said so!," you lose. Rules need to be important. Important things have a reason for their existence. The reason needs to be a good one too.

If you are just cargo culting some rules, I would recommend actively breaking them for a time. You'll figure out pretty quick if they are important or not. If they are, you'll have the reason. If they aren't, you'll be free of a useless constraint.

For example, Ryan Davis's old rules say, "Use parens to disambiguate, otherwise avoid them." Ultimately, that leads to the "Seattle Style" of method definitions. GitHub's newer rules say, "Use def with parentheses when there are arguments." Both sides will tell you that it reads better.

I personally find that my eye seems to drift more without the parentheses, so I prefer GitHub's style in particular this case. However, the Seattle group does have a significant following.

To me, this implies that it isn't likely worth a rule. The reason doesn't seem to hold in enough cases to be considered good. Both of these groups seem to be able to coexist in our community without creating two versions of every gem, just to rewrite the method definitions. I say let it go.

Now, both sides do agree to leave parentheses off of argument-less definitions and calls. Perhaps that means that one is rule worthy. It's definitely gray area though.

Finally, with style guides, I think it's critical that we remember the most important thing I learned from Smalltalk Best Practice Patterns:

Code's primary purpose is to communicate with the reader.

If you make a rule that violates that, it's instantly very suspect. For example, GitHub's guide has this rule:

The and and or keywords are banned. It's just not worth it. Always use && and || instead.

Of course, not everyone agrees. The problem with this rule is that is that it makes Ruby less expressive. Those operators actually have a purpose. Though it's not popular, I even use them in conditionals, because I think they read better. [Update: I was eventually swayed by Avdi's argument so I no longer do this.]

GitHub is likely worried about how these operators interact with assignment statements. It can trip up new Rubyists. If you read Avdi's post though, you'll see that the precedence of these operators is actually why they exist. Even my usage of them in conditionals is rarely a problem, since assignments are not common in conditionals. Plus, if you do assign in a conditional, GitHub has a rule that fixes the precedence even with an and/or:

Using the return value of = (an assignment) is ok, but surround the assignment with parenthesis.

That's a neat rule, if you ask me, because it allows for what I consider to be a natural programming construct in Ruby: run this code if I can assign this variable. For example:

if (message = inbox.find { |m| m.unread? })
  # ... work with message here...
end

But, the parentheses flag the reader that you haven't made the common bug of confusing = with ==.

Getting back to the and/or case, I think education about these operators beats forbidding them.

What's Our Style?

Enough blathering on about how to make rules and why they exist. Let's take a look at what's actually out there for us Rubyists.

We Agree About

I think it's safe to say that the majority of Ruby style guides agree about at least these points:

An indent should be two spaces.
Use spaces, not tabs.
Keep lines at 80 characters in length or less.

A lot of communities fight pretty heavily over these points, but the Ruby community seems to pretty universally agree. The reasoning is also sound: two spaces is enough to notice, but it leaves you plenty of room on the line. Spaces make sure I see what you see, even if our editors have different settings.

The last one surprises me a little. I strongly believe in it, but I see a lot of Ruby code that doesn't. Still almost all of the guides I checked had it in there.

There were many more rules about spacing in the guides. While I agreed with a lot of them, I'm not sure everyone would. Are we certain sure it's select { |arg| arg } instead of select {|arg| arg }? I use the former, but I see some of the latter and it doesn't throw me into a rage. I'm just not sure I'm willing to burn a rule on stuff like that.

We also seem to agree on how to name things:

Use snake_case for variable and method names, CamelCase for classes and modules, and ALL_CAPS_SNAKE_CASE for other constants.

The reasoning is pretty obvious. This helps you tell what you are looking at right away. It makes it much easier to read code.

What else do most of the style guides care about? How you indent your case statements. They definitely prefer this style:

case age
when 0..10
  puts "Too young."
when 11..65
  puts "Access granted."
when 66..110
  puts "Too old."
else
  puts "Malformed age."
end

Or even:

response = case age
           when 0..10   then "Too young."
           when 11..65  then "Access granted."
           when 66..110 then "Too old."
           else              "Malformed age."
           end

But never:

case age
  when 0..10
    puts "Too young."
  when 11..65
    puts "Access granted."
  when 66..110
    puts "Too old."
  else
    puts "Malformed age."
end

Or:

response = case age
             when 0..10   then "Too young."
             when 11..65  then "Access granted."
             when 66..110 then "Too old."
             else              "Malformed age."
           end

The when statements need to line up with the case. The reasoning? I have no idea, but it looks right to me!

The Parts I Found Interesting

The Google style guide mentioned using Ruby's block comments to hide headers and footers in your code. For example:

=begin
  * Name:
  * Description:
  * Author:
  * Date:
  * License:
=end

I'm not sure how I feel about that. I worry about programming constructs that can go off-screen. To give an example, I don't like this method of declaring class methods:

class Whatever
  class << self
    def class_method
      # ...
    end
  end
end

The reason is that you could end up facing something line this:

class Whatever
  class << self
    # a page of definitions here...
    def what_am_i
      # ...
    end
    # another page of definitions here...
  end
end

If you are centered on the definition of what_am_i, the only way you can know it's defined as a class method is if you pick up on the extra indentation. I don't think that communicates well, so I prefer the following technique, even if it does introduce some mild duplication:

class Whatever
  def self.class_method_1
    # ...
  end

  def self.class_method_2
    # ...
  end
end

Block comments are subject to the same off-screen issue, but a smart editor is going to save you with them. It should syntax highlight the entire chunk as a comment, so you would still know what you are in the middle of.

I doubt this is rule worthy in either case, but it's interesting to think about.

I found the following rule from GitHub's guide interesting mainly because it nails my bad habit:

Prefer single-quoted Strings when you don't need string interpolation or special symbols such as \t, \n, ', etc.

I don't do this. Instead I use double-quoted Strings most of the time. I switch to single-quoting only in rare cases where it reduces some escaping. Even then, I'm just as likely to switch to a special String form, like %Q{...}.

My reasons are pretty weak. It's mainly that I too often found myself having to switch a single-quoted String to double-quotes so I could introduce some interpolation. It also helps in my editor, since I can have it always pair double-quotes while never pairing singles. I'll be typing the former to make Strings but the latter for things like apostrophes, so this is generally perfect.

The guide's rule is probably superior though, because it communicates with the reader that they are looking at a simple String.

Dan Kubb's guide led me to read about a couple of interesting concepts that I hadn't thought about enough:

Try to have methods either return the state of the object and have no side effects, or return self and have side effects. This is otherwise known as Command-query separation (CQS):

http://en.wikipedia.org/wiki/Command-query_separation
Do not program defensively. (See http://www.erlang.se/doc/programming_rules.shtml#HDR11.)

GitHub has a rule I doubt is common:

Indent the public, protected, and private methods as much the method definitions they apply to. Leave one blank line above them.

Here's the sample code they give for that:

class SomeClass
  def public_method
    # ...
  end

  private
  def private_method
    # ...
  end
end

I think it's more common to see a blank line below it as well:

class SomeClass
  def public_method
    # ...
  end

  private

  def private_method
    # ...
  end
end

I have also seen it "outdented" like this:

class SomeClass
  def public_method
    # ...
  end

private

  def private_method
    # ...
  end
end

That doesn't feel right to me though, since we indent other inline class method calls. I do see how it kind of delimits a section though, much like rescue would inside a method definition. I think you can make the argument either way.

There was another variant that you use to see a lot, but even Rails has decided it's wrong now:

class SomeClass
  def public_method
    # ...
  end

  private

    def private_method
      # ...
    end
end

Aaron Patterson says this is wrong because it's not how Emacs indents it. I think he was joking at the time, but it turns out that he may be more right about that than we could have guessed.

The first rule in Bozhidar Batsov's "Comments" section was:

Write self-documenting code and ignore the rest of this section. Seriously!

I really like that. The more programming experience I get the less respect I have for comments. I do use annotations, like TODO and FIXME (and Bozhidar has rules for those too), but most other comments are suspect, in my opinion.

The guides definitely disagreed heavily about which documentation scheme we should be using or if we should even be using one at all.

I also learned a neat trick from Bozhidar's guide that I hadn't encountered before:

When using class_eval (or other eval) with String interpolation, add a comment block showing its appearance if interpolated (a practice I learned from the Rails code):

# from activesupport/lib/active_support/core_ext/string/output_safety.rb
UNSAFE_STRING_METHODS.each do |unsafe_method|
  if 'String'.respond_to?(unsafe_method)
    class_eval <<-EOT, __FILE__, __LINE__ + 1
      # ...

      def #{unsafe_method}!(*args)  # def capitalize!(*args)
        @dirty = true               #   @dirty = true
        super                       #   super
      end                           # end
    EOT
  end
end

But the most entertaining rule award definitely goes to Ryan Davis for this gem:

Don’t rescue Exception. EVER. Or I will stab you.

What Are They Just Wrong About?

I hate to say it, but some rules were just bad. Let's look at a handful that bugged me. First, I submit:

Do not use single letter variable names. Avoid uncommunicative names.

I don't even think the first half of that rule agrees with the second half of that rule. Consider this code:

ary.each_index do |i|
  # ...
end

There's nothing wrong with that code. i is kind of the universal variable name for an index and as such it's even more communicative than a longer name, like index.

Avoid multi-line ?: (the ternary operator), use if/unless instead.

While I understand where this rule is coming from, it's just a bad idea to draw hard lines in the sand like this. You're sure this is always wrong? I'm not.

Ruby 1.9 even supports a nice new syntax for this:

result = o.test? ? o.calc_true_result
                 : o.calc_false_result

Now that Ruby allows those operators to line up, this can be a very readable construct, in my opinion.

Avoid using $1-9 as it can be hard to track what they contain. Named groups can be used instead.

Almost all of the guides did contain a rule like this, but I see a lot of Ruby code and variables like $1 are extremely common. The issue is that the MatchData methods aren't all that more expressive and it can be awkward to get a hold of the object to use them on.

I do like the named variables that this rule recommends (added in Ruby 1.9), but they can make a simple regex quite a bit wordier and introduce local variables in the recommended usage. I don't think I'm comfortable saying that's always OK.

I'm not really sure what the perceived problem is with $1 and friends either. Programmers seem to understand them just fine. Some may worry that they are evil globals, but the fact is that they are magic Thread local variables that just do the right thing.

Until I see a good reason for this rule, I'm not buying it.

Do not use inject in library code. In app code it is ok.

I saw this rule on Dan Kubb's guide earlier this week, but it was already removed before I published this article. I'm glad to see he's over this one.

I think this stems from a fear of inject() being a little slower. It's true that it is. Here's a benchmark that shows the problem:

require "benchmark"

def each_sum(numbers)
  sum = 0
  numbers.each do |n|
    sum += n
  end
  sum
end

def inject_sum(numbers)
  numbers.inject(0) { |sum, n| sum + n }
end

TESTS   = 10_000_000
NUMBERS = 1..10

Benchmark.bmbm do |bench|
  bench.report("each_sum") do
    TESTS.times do
      each_sum(NUMBERS)
    end
  end

  bench.report("inject_sum") do
    TESTS.times do
      inject_sum(NUMBERS)
    end
  end
end

That shows the following on Ruby 1.9.3:

Rehearsal ----------------------------------------------
each_sum     7.440000   0.000000   7.440000 (  7.442971)
inject_sum  10.170000   0.010000  10.180000 ( 10.177593)
------------------------------------ total: 17.620000sec

                 user     system      total        real
each_sum     7.540000   0.000000   7.540000 (  7.546324)
inject_sum  10.220000   0.000000  10.220000 ( 10.217871)

As you can see, it's not that slow. Many iterators have a similar penalty. I had to use ten million iterations to get it to show up well.

The situation use to be worse though. It's more than twice as slow as each on Ruby 1.8.7:

Rehearsal ----------------------------------------------
each_sum    23.340000   0.010000  23.350000 ( 23.354953)
inject_sum  51.260000   0.020000  51.280000 ( 51.279145)
------------------------------------ total: 74.630000sec

                 user     system      total        real
each_sum    23.730000   0.010000  23.740000 ( 23.734525)
inject_sum  52.320000   0.010000  52.330000 ( 52.323466)

So perhaps that's where this fear came from.

In any event, don't avoid the iterators. Use them for what they are for. Remember, expressive code is always the first goal.

If you find something to be too slow, you can always refactor for performance. That should only be needed in extremely rare edge cases though. Go for readability first.

Hopefully this section reminds us all that we always need to treat these style guides as suggestions, not laws. You're the programmer, so you're in charge. No exceptions.

Worth a Read

I've dropped a ton of links in this article, but it really is worth taking a little time to read through at least some of these style guides. They cover a ton of ideas, good and bad, that I didn't choose to rehash here. If nothing else, I promise that they will get you thinking.

If you are pressed for time, start with the GitHub guide. It seems to reflect a lot of current thinking. Some of it is controversial though, as I've noted. If you have time for a second source, check out Dan Kubb's guide. It is a good representation of the various forks and it contains some neat ideas, even if it does probably go a little too far in some areas.

In: Rubies in the Rough | Tags: Style | 1 Comment

Comments (1)

Adam Weller April 2nd, 2012 Reply Link

Great article, really like the wisdom in this:

If you are just cargo culting some rules, I would recommend actively breaking them for a time. You'll figure out pretty quick if they are important or not. If they are, you'll have the reason. If they aren't, you'll be free of a useless constraint.
1. Reply (using GitHub Flavored Markdown)
  
  Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.
  
  Or login with:
  Name * Email URL Comment *