21 DEC 2011
Refactoring: rcat
I use these Rubies in the Rough articles to teach how I think about code. Well, I have a scary admission to make: I didn't really understand refactoring until I was many years into being a programmer. Sure, I knew what it meant, but I just didn't get it. I hope to save you from the same mistake.
Refactoring is important. Very important. It may be one of the most important things we do as programmers. If I learned one thing from reading Smalltalk Best Practice Patterns, it's that code's primary purpose is to communicate with the reader. Let's face it though, when we are trying to get something working, it's often like stumbling around in the dark. We are running into all kinds of things, breaking stuff, and just trying to reach that "Holy cow it works!" moment. We're probably not thinking too long and hard about how well this mess we are making communicates and that's perfectly fine.
Refactoring is where you get to fix that. It's about taking working code and making it sexy. Note that I said it starts with working code. Until you have that, there's nothing worth communicating to a potential reader. Make it work, then make it sexy. (I believe that saying really involves speed, but that's a very different conversation we can have at a later date.)
Given the importance I place on this topic, refactoring will be a recurring theme in these articles. Expect to regularly see articles where we try to clean some code up. It's important that we practice this process and get good at it.
rcat
In Practicing Ruby 2.9, Gregory Brown builds a clone of the Unix cat program in pure Ruby. He calls this clone rcat. It's a great article and I recommend that you take a moment to read it before you finish reading this.
It turns out that the code for rcat is a great refactoring exercise for several reasons:
- Most of us are probably familiar with what it does
- It's working code
- It has tests, but they are a little unusual
- It's pretty darn good code
Did the last point surprise you? Yeah, I said it's good code. That's a win for us.
Hideous code is hard. You have to start by learning a lot about it. You just can't start editing it until you figure out which strings are safe to pull on and which are too risky. Then you have to attack it with the utmost care, baby stepping your way from safe foothold to safe foothold and verifying the heck out of what you are doing.
I'm not saying those skills aren't useful. Of course they are. It's a complex process though and I don't want to take us there on our first outing. We will get to that down the road a bit.
For now, let's look at some code that does a lot of things right. Most code could be better in some areas and rcat is no different. It makes a couple of minor missteps and we should be able to safely work on improving those.
You First
That's the task. We're going to try making some minor improvements to rcat.
I want you to work on a specific version of rcat: the one initially released with the article. That's the way I first saw it. Greg has made a couple of minor improvements to it since then and I would rather we had the chance to spot those issues too. So, here's how you can get a copy of the code locally and have it be on the version I want you to see:
$ git clone https://github.com/elm-city-craftworks/rcat.git
Cloning into rcat...
remote: Counting objects: 85, done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 85 (delta 26), reused 82 (delta 23)
Unpacking objects: 100% (85/85), done.
$ cd rcat
$ git branch as_released e28f2c
$ git checkout as_released
Switched to branch 'as_released'
Now, when you see these refactoring articles, you can do what you always do and just read along. I encourage you not to do that. You may learn something that way, but you just won't learn as much as you could. Instead, take a swing at the problem yourself. You probably won't do what I do and that will be just great. That will help you see the differences between what you focus on and what I focus on. If we do attack the same areas, then you will get to see the differences in how we go about it.
In this particular example, because it is good code, it will just be great to see if you can find the problems I have with it. They are fairly minor points, so consider this a test of your observation skills. Even if you don't do the edits before you see mine, just trying to spot what I'm going to fix could be helpful.
So, run the commands above to get the code and take a look through it before you read on. If you are feeling really adventurous, try making some fixes yourself.
That One Bump in the Testing Road
We're in luck, rcat has tests!
These tests are a little unusual in that they don't use a testing framework. It's just a file of tests that Greg wrote, comparing rcat to cat in several different scenarios. That makes me want to try the obvious:
$ ruby tests.rb
tests.rb:20:in ``': No such file or directory - rcat … (Errno::ENOENT)
from tests.rb:20:in `<main>'
Doh! We were doing so well too.
I must admit, this is totally normal for my running-other-people's-test track record. There always seems to be one non-obvious requirement to run them. For example, they may run but all error out while trying to connect to Redis. Then I fire up Redis and they work fine. It's sad that this is the norm, because we really want it to be trivial for everyone to run our tests!
I guess I have found the first thing I want to fix.
If you read the Practicing Ruby article that described rcat, you'll see that it talked about Greg setting an environment variable to get the rcat executable in his PATH. That tells us what the problem probably is, but we can't use Greg's exact suggestion since it's an absolute path on his machine. I tried translating it to a relative path that would work for me to see if I could get the tests working:
$ PATH="./bin:$PATH" ruby tests.rb
You passed the tests, yay!
Bingo. All I did was update my PATH variable for the command that followed by prepending the ./bin directory that holds the rcat executable. Now we know how to get the tests to work.
Really, the tests should do that for us. They know where they are and where rcat is. They should just handle this. Let's make that change.
At this point, it's time to actually peek at the tests.rb file. Skipping some comments at the top, this is how the code begins:
require "open3"
working_dir = File.dirname(__FILE__)
gettysburg_file = "#{working_dir}/data/gettysburg.txt"
spaced_file = "#{working_dir}/data/spaced_out.txt"
############################################################################
cat_output = `cat #{gettysburg_file}`
rcat_output = `rcat #{gettysburg_file}`
fail "Failed 'cat == rcat'" unless cat_output == rcat_output
############################################################################
cat_output = `cat #{gettysburg_file} #{spaced_file}`
rcat_output = `rcat #{gettysburg_file} #{spaced_file}`
fail "Failed 'cat [f1 f2] == rcat [f1 f2]'" unless cat_output == rcat_output
############################################################################
…
The first chunk shows that Greg was already aware of these path issues. He solved them for the data files by building local paths to them.
The second and third chunks show a couple of the tests. They are just shelling out to both programs and comparing the output. That's easy enough to understand.
We could solve the rcat problem the same way Greg handles the data files, but that would require us to edit every test. It would also create a strange asymmetry:
cat_output = `cat …`
rcat_output = `#{rcat} …`
Neither issue is a super big deal, but they make things just a touch harder to understand and I would rather not do that. Given that, let's just twiddle the path as I did externally. I changed the top code to:
require "open3"
working_dir = File.dirname(__FILE__)
gettysburg_file = "#{working_dir}/data/gettysburg.txt"
spaced_file = "#{working_dir}/data/spaced_out.txt"
# make sure rcat is in the PATH
ENV["PATH"] = [File.join(working_dir, "bin"), ENV["PATH"]].compact.join(":")
Now we can just run the Ruby file normally:
$ ruby tests.rb
You passed the tests, yay!
I like that. Keep it as easy as humanly possible for users to run your tests. This helps to get them into your code and that's always a plus.
What About the Duplication?
As you look through these tests, you'll see plenty of duplication. In just the two I posted above, you can already see it.
- The same arguments are always sent to cat and rcat
- We capture the output from both
- Output is compared and an error thrown if they don't match
Couldn't we wrap that in a method that handles those details for us? We definitely could, but it's not quite as straightforward as I might have made it seem.
First, read through the error messages. They differ a bit, so we probably couldn't copy them exactly as they are. We would likely standardize them based on the arguments. I think that's fine. It may actually be an improvement since the current messages seem a little inconsistent.
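To make the idea concrete, here's a rough sketch of what such a helper might look like. Nothing below is in the actual repository; the compare_to_cat name and the standardized failure message are my own inventions, and the calls assume the gettysburg_file and spaced_file variables already defined at the top of tests.rb:
# Hypothetical helper: run cat and rcat with the same arguments,
# then fail with a standardized message if their output differs.
def compare_to_cat(args)
  cat_output  = `cat #{args}`
  rcat_output = `rcat #{args}`
  fail "Failed 'cat #{args} == rcat #{args}'" unless cat_output == rcat_output
end
compare_to_cat(gettysburg_file)
compare_to_cat("#{gettysburg_file} #{spaced_file}")
With something like that in place, each of the simple comparison tests would collapse to a single line.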
But, if you keep reading down, you will see that the tests start to vary in form. Have a peek at this later test, for example:
`cat #{gettysburg_file}`
cat_success = $?
`rcat #{gettysburg_file}`
rcat_success = $?
unless cat_success.exitstatus == 0 && rcat_success.exitstatus == 0
fail "Failed 'cat and rcat success exit codes match"
end
Also, consider this one:
cat_out, cat_err, cat_process = Open3.capture3("cat some_invalid_file")
rcat_out, rcat_err, rcat_process = Open3.capture3("rcat some_invalid_file")
unless cat_process.exitstatus == 1 && rcat_process.exitstatus == 1
fail "Failed 'cat and rcat exit codes match on bad file"
end
unless rcat_err == "rcat: No such file or directory - some_invalid_file\n"
fail "Failed 'cat and rcat error messages match on bad file'"
end
We could handle these different cases, but we're going to need to introduce some infrastructure to do it. In turn, that infrastructure is going to make the tests less easy to browse.
It seems like there are some pros and cons to making that change. Because of that, I'm inclined not to go there. I can't tell if I'm going to make the tests communicate better or worse with a change like that. When I find myself at such a crossroads, I generally opt for a "Do No Harm" strategy.
Perhaps, as more tests are added down the road, the picture will crystallize more and I'll be able to make a stronger argument for the change. If that came to pass, I would make the change.
That's why I'm electing to leave the tests as they are. It's definitely a judgment call though. If you replaced them, I wouldn't blame you.
Reading the Code
With the tests running smoothly, I'm ready to actually start reading the code. For a small executable like this, I generally just crack open the executable file itself and follow where it leads me.
Using that strategy, this is some of the first code I ran into:
begin
RCat::Application.new(ARGV).run
rescue Errno::ENOENT => err
STDERR.puts "rcat: #{err.message}"
exit(1)
rescue OptionParser::InvalidOption => err
STDERR.puts "rcat: #{err.message}"
STDERR.puts "usage: rcat [-bns] [file ...]"
exit(1)
end
That has my Spidey Sense tingling. What are these errors doing up here? This looks like parts of the code leaking into places they don't belong. When I'm here, I have no idea what Errno::ENOENT or OptionParser::InvalidOption relate to. Also, I know I link to this book in almost every freaking article, but doesn't Exceptional Ruby say that begin is a code smell?
I'm also wondering why we are using STDERR here instead of $stderr. Plus, those puts() and exit() calls could be collapsed into calls to Ruby's abort(). (If you read the article, you may have noticed these issues were fixed. Greg updated them after I made the same mentions to him.)
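To show what I mean about abort(), here's roughly how that collapse could look while we keep the top-level rescue in place for a moment. This is just a sketch of the idea, not necessarily the change Greg shipped; abort() writes its argument to STDERR and then exits with status 1:
begin
  RCat::Application.new(ARGV).run
rescue Errno::ENOENT => err
  abort "rcat: #{err.message}"   # one call instead of the puts + exit(1) pair
rescue OptionParser::InvalidOption => err
  abort "rcat: #{err.message}\nusage: rcat [-bns] [file ...]"
end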
Going back to the article, does Greg give a reasoning for why these rescues are at the top level? It turns out that he does. Here's the best part:
Regardless of how these exceptions are labeled, it's important to note that I intentionally let them bubble all the way up to the outermost layer and only then rescue them and call Kernel#exit. Intermingling exit calls within control flow or modeling logic makes debugging nearly impossible and also makes automated testing a whole lot harder.
That's an interesting argument. I certainly don't want to make testing the code harder. However, we're hurting readability significantly by moving these pieces so far from the code they relate to. We have to find a way to balance those needs.
Can I Test exit()?
What is Greg saying about testing an exit() call? He's saying that if we call it in the middle of our test run, it's just going to halt the running test suite without giving us a chance to check anything. That would definitely be bad, but the real question is, "Is that true?"
Let's find out. I created a new file called exit_test.rb and entered this code:
require "minitest/autorun"
class ExitTest < MiniTest::Unit::TestCase
def test_exit
exit
end
end
If we run that, we will see that Greg is right:
$ ruby exit_test.rb
Run options: --seed 5481
# Running tests:
That's not good. It killed things before we could see some output.
I'll be honest and tell you that's not what I expected to see. I know that exit() just raises an Exception to end the program. I also know that testing frameworks catch any Exception raised during the run so that they can report them in the results. I expected it to catch the Exception and keep it from ending the program. I had to look at the source code to see why it wasn't doing that.
The answer is that while MiniTest does catch any Exception raised, it reraises some of them. It just so happens that the list of those that get reraised includes SystemExit, which is what exit() raises. That makes a lot of sense. If MiniTest didn't do that, exit() wouldn't work like exit() while it was running and that might make us mad if we were trying to use it to debug a problem.
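As an aside, you can watch those mechanics outside of any test framework. This throwaway snippet just demonstrates that exit() raises a SystemExit that ordinary code can rescue:
begin
  exit
rescue SystemExit => error
  # exit signals its intent with an exception, so we can intercept it
  puts "Caught a SystemExit with status #{error.status}."
end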
Greg is a smart guy. Thanks for the lesson, buddy.
However, just because MiniTest doesn't automatically catch the error for us doesn't mean we can't handle it. In fact, MiniTest will catch it for us, if we ask nicely. Watch this:
require "minitest/autorun"
class ExitTest < MiniTest::Unit::TestCase
def test_exit
assert_raises(SystemExit) do
exit
end
end
end
If we run that, we can see that our test works fine:
$ ruby exit_test.rb
Run options: --seed 50521
# Running tests:
.
Finished tests in 0.000427s, 2341.9204 tests/s, 2341.9204 assertions/s.
1 tests, 1 assertions, 0 failures, 0 errors, 0 skips
Now we're getting somewhere. Could we use this to test a method that sometimes calls exit()? Sure:
def good_and_bad(result)
result == :good ? 42 : exit
end
require "minitest/autorun"
class GoodAndBadTest < MiniTest::Unit::TestCase
def test_good
assert_equal(42, good_and_bad(:good))
end
def test_bad
assert_raises(SystemExit) do
good_and_bad(:bad)
end
end
end
That works just as you would expect:
$ ruby good_and_bad_test.rb
Run options: --seed 50819
# Running tests:
..
Finished tests in 0.000468s, 4273.5043 tests/s, 4273.5043 assertions/s.
2 tests, 2 assertions, 0 failures, 0 errors, 0 skips
So, while Greg is right that working with exit() in a test suite can have nasty consequences, that does not mean we can't test those methods. I think that tips the scales for me. I just feel it's too important to get that error handling into the right place. I make a note that it needs moving and keep reading the code.
Everything in its Place
If you can remember back to before I got off on that testing tangent, we saw the executable hand off to the library code like this: RCat::Application.new(ARGV).run. That makes me want to get a look at the Application object. I'm not going to dump that code in here, but I noticed two things when reading it:
- It's gorgeous. I have no complaints.
- It's also the exact place I was looking for to hang those wayward errors!
Let me show you the code as I saw it:
module RCat
class Application
def initialize(argv)
# setup code, safe to ignore...
end
def run
# open files that could toss Errno::ENOENT here...
end
def parse_options(argv)
# parse options with OptionParser here...
end
end
end
Now watch how it looks with the errors moved into here:
module RCat
class Application
def initialize(argv)
# setup code, safe to ignore...
end
def run
# open files that could toss Errno::ENOENT here...
rescue Errno::ENOENT => error
# handle Errno::ENOENT here...
end
def parse_options(argv)
# parse options with OptionParser here...
rescue OptionParser::InvalidOption => error
# handle OptionParser::InvalidOption here...
end
end
end
Burn that pattern into your brain, because it's what great error handling looks like. This code already had perfect compartmentalization into methods that just do one thing. All we did was tack the error handling for those things onto those methods. You have the "happy path" first, which is the way code should be. Then, if trouble arises, you just take the low road. It's also worth noting that there's not a begin in sight.
I can understand Greg's desire not to work with ordinary methods that occasionally try to halt everything. However, these aren't ordinary methods. These are the tasks handled by the Application object. Tasks like run(). Being able to stop everything is totally in the scope of what we are doing here.
Furthermore, look at the new and improved executable:
#!/usr/bin/env ruby
require_relative "../lib/rcat"
RCat::Application.new(ARGV).run
That's, in a word, ideal. This file is now just an interface to our library code. It just kicks off a process. It doesn't do any work. That allows us to safely forget about it. If we write proper tests, we will know this code is just fine.
I made the real move of the code, not just the skeleton comments shown above, and ran the tests. They passed, so we're looking great with our second change.
Thrown for a Loop
There are three code files left that I haven't gone inside of yet. However, one just requires some files and another just sets a constant. So really, we are down to one interesting stop left: lib/rcat/display.
The first thing I notice in here is that we're in a Display object, but doing things like render(). It might be worthwhile to rename this thing to a Renderer for consistent terminology, but this is a minor point and I'm not going to worry about it now.
There is a method here that concerns me, render_line(). Here's the code:
def render_line(lines)
current_line = lines.next
current_line_is_blank = current_line.chomp.empty?
case line_numbering_style
when :all_lines
print_labeled_line(current_line)
increment_line_number
when :significant_lines
if current_line_is_blank
print_unlabeled_line(current_line)
else
print_labeled_line(current_line)
increment_line_number
end
else
print_unlabeled_line(current_line)
increment_line_number
end
if squeeze_extra_newlines && current_line_is_blank
lines.next while lines.peek.chomp.empty?
end
end
I see some warning signs in this:
- The method is getting a bit long
- It's called render_line() but it operates on a set of lines (printing one and possibly discarding others)
- There's some duplication, like the two places where it prints a labeled line and then increments the line number
- The nested conditionals make it hard for me to reason about
- It's part of some flow control you can't even see by reading it
To explain that last point, this code works by calling next() on the passed Enumerator. That method will raise StopIteration when the list of items is exhausted. So that raise is happening here, either on the first line or the next-to-last line. But it's being handled elsewhere; in fact, the render() method rescues it:
def render(data)
@line_number = 1
lines = data.lines
loop { render_line(lines) }
end
Do you see a rescue clause there? It's a bit hidden. In Ruby 1.9, loop() was enhanced to quietly catch StopIteration and use it as a sign to halt the iteration. I'm fine with that in practice, but once you break it across two methods, like it is in this case, it seems to hide some pretty deep magic.
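If you haven't run into that behavior before, here's a tiny example, completely separate from rcat, showing loop() quietly ending when next() runs out of items:
letters = %w[a b c].each   # an Enumerator
loop do
  puts letters.next        # raises StopIteration after "c" is printed
end                        # loop() rescues StopIteration and simply stops
puts "All done."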
I want to try to fix this section, but I need to do it in pieces. First, I want to address the confusing flow control. Greg talks about the reason for selecting an Enumerator in his article and it definitely makes sense, but I'm hoping that reverting to a normal each() can help me clean up the flow. I'm going to try that.
The hard part about switching to each() is this chunk of code towards the end of render_line():
if squeeze_extra_newlines && current_line_is_blank
lines.next while lines.peek.chomp.empty?
end
We have to think about what that is doing. It's looking forward at upcoming lines and discarding them as long as they don't need to be shown. The trick is to reverse that. If we remove the need to see into the future, each() will work just fine. Without peek() though, it's going to hand us some lines we don't want. We will have to skip over those before we process them. Put another way, if we move this code to the head of the loop, we can drop the peek() and switch to Ruby's normal next keyword.
This should allow me to get the process into one normal iterator. Given that, I plan to drop render_line() altogether. I'm hoping to get render() clean enough that it won't be needed. Here's my first pass at switching to a normal iterator:
def render(data)
@line_number = 1
data.each do |current_line|
current_line_is_blank = current_line.chomp.empty?
if squeeze_extra_newlines && current_line_is_blank
next
end
case line_numbering_style
when :all_lines
print_labeled_line(current_line)
increment_line_number
when :significant_lines
if current_line_is_blank
print_unlabeled_line(current_line)
else
print_labeled_line(current_line)
increment_line_number
end
else
print_unlabeled_line(current_line)
increment_line_number
end
end
end
I'm liking that flow better, but don't get too excited. Greg's tests are giving me the thumbs down for this version:
$ ruby tests.rb
tests.rb:62:in `<main>': Failed 'cat -s == rcat -s' (RuntimeError)
This proves I didn't understand the peek() and next() combo as well as I thought I did. Things like this are why it's so critical to have working tests while you are refactoring.
OK, so what's the problem here? Well, it's because this code used to be at the end of the loop, meaning that it was checked after a line was processed. That means the first blank line made it through and then the check stopped the rest. Under my new setup though, I stop all of them. No good.
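Just to spell the desired behavior out, with input of my own invention rather than the test data: a run of blank lines should squeeze down to a single blank line.
input    = "one\n\n\n\ntwo\n"
expected = "one\n\ntwo\n"   # the first blank line survives, the rest are dropped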
I need to track another state variable to fix that. This code will do the trick:
def render(data)
@line_number = 1
in_blank_run = false
data.each do |current_line|
current_line_is_blank = current_line.chomp.empty?
if squeeze_extra_newlines && in_blank_run && current_line_is_blank
next
end
in_blank_run = current_line_is_blank
case line_numbering_style
when :all_lines
print_labeled_line(current_line)
increment_line_number
when :significant_lines
if current_line_is_blank
print_unlabeled_line(current_line)
else
print_labeled_line(current_line)
increment_line_number
end
else
print_unlabeled_line(current_line)
increment_line_number
end
end
end
That does indeed get the tests passing again.
Now, at first glance, it probably looks like I have made things worse. That's a little true. However, I mostly have the code in one place now and that makes it easier to do extractions. I'm betting that will help me clean things up even more.
I said "mostly have the code in one place" because there's an instance variable involved: @line_number
. That means other methods can modify it on me. It turns out they do. In fact, that's increment_line_number()
's job. I'm going to inline that method and switch to using a local variable to make sure I have this code under my control:
def render(data)
line_number = 1
in_blank_run = false
data.each do |current_line|
current_line_is_blank = current_line.chomp.empty?
if squeeze_extra_newlines && in_blank_run && current_line_is_blank
next
end
in_blank_run = current_line_is_blank
case line_numbering_style
when :all_lines
print_labeled_line(line_number, current_line)
line_number += 1
when :significant_lines
if current_line_is_blank
print_unlabeled_line(current_line)
else
print_labeled_line(line_number, current_line)
line_number += 1
end
else
print_unlabeled_line(current_line)
line_number += 1
end
end
end
I also had to update print_labeled_line(), so I could pass it the label:
def print_labeled_line(label, line)
print "%6d\t%s" % [label, line]
end
The tests still pass and I now have all the code in one place. I'm ready to start making this code pretty.
Those two state variables are tipping me off that a new object is called for here. That's what objects do, track state, so that's a way to predict that they are needed. I'll start by building an object that can track the line numbers. I was lazy and just put this new object in the same file for now. I can graduate it to its own file when the time comes.
class RendererStatus
def initialize
@lineno = 1
end
attr_reader :lineno
def increment
@lineno += 1
end
end
def render(data)
in_blank_run = false
data.each_with_object(RendererStatus.new) do |current_line, status|
current_line_is_blank = current_line.chomp.empty?
if squeeze_extra_newlines && in_blank_run && current_line_is_blank
next
end
in_blank_run = current_line_is_blank
case line_numbering_style
when :all_lines
print_labeled_line(status.lineno, current_line)
status.increment
when :significant_lines
if current_line_is_blank
print_unlabeled_line(current_line)
else
print_labeled_line(status.lineno, current_line)
status.increment
end
else
print_unlabeled_line(current_line)
status.increment
end
end
end
The RendererStatus object is trivial at this point. It just tracks a counter. But this gives me a place to hang status questions in the future and that should help me clean things up. Note that I push it through my loop using the new each_with_object() iterator in Ruby 1.9.
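If each_with_object() is new to you, it yields each element along with one extra object that you hand it, then returns that object when the iteration finishes. A quick standalone example:
upcased = %w[a b c].each_with_object([]) do |letter, list|
  list << letter.upcase   # the same array rides along through every iteration
end
upcased   # => ["A", "B", "C"]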
Now, it's time to start busting up that central case statement. The main problem with it is that it's doing two separate things: printing content and bumping the counter. I want to handle those separately, to cut out the repetition. Here's the split:
def render(data)
in_blank_run = false
data.each_with_object(RendererStatus.new) do |current_line, status|
current_line_is_blank = current_line.chomp.empty?
if squeeze_extra_newlines && in_blank_run && current_line_is_blank
next
end
in_blank_run = current_line_is_blank
case line_numbering_style
when :all_lines
print_labeled_line(status.lineno, current_line)
when :significant_lines
if current_line_is_blank
print_unlabeled_line(current_line)
else
print_labeled_line(status.lineno, current_line)
end
else
print_unlabeled_line(current_line)
end
unless line_numbering_style == :significant_lines &&
current_line_is_blank
status.increment
end
end
end
Now that they are separate, it's also easy to collapse the nested case/if logic:
def render(data)
in_blank_run = false
data.each_with_object(RendererStatus.new) do |current_line, status|
current_line_is_blank = current_line.chomp.empty?
if squeeze_extra_newlines && in_blank_run && current_line_is_blank
next
end
in_blank_run = current_line_is_blank
if line_numbering_style == :all_lines ||
( line_numbering_style == :significant_lines &&
!current_line_is_blank )
print_labeled_line(status.lineno, current_line)
else
print_unlabeled_line(current_line)
end
unless line_numbering_style == :significant_lines &&
current_line_is_blank
status.increment
end
end
end
That's better, but these conditional checks aren't very sexy. Let's add a couple of query methods and clean those up:
def squeeze_extra_newlines?
@squeeze_extra_newlines
end
def number_all_lines?
@line_numbering_style == :all_lines
end
def number_significant_lines?
@line_numbering_style == :significant_lines
end
def render(data)
in_blank_run = false
data.each_with_object(RendererStatus.new) do |current_line, status|
current_line_is_blank = current_line.chomp.empty?
next if squeeze_extra_newlines? &&
in_blank_run &&
current_line_is_blank
in_blank_run = current_line_is_blank
if number_all_lines? || ( number_significant_lines? &&
!current_line_is_blank )
print_labeled_line(status.lineno, current_line)
else
print_unlabeled_line(current_line)
end
unless number_significant_lines? && current_line_is_blank
status.increment
end
end
end
The tests are still passing and the code is getting a lot easier to understand. In fact, if you look at what's still not gorgeous, it all has to do with the blank line tracking. We already know where that state belongs too. We built an object just for that purpose. Let's fold that last variable into our status tracking:
class RendererStatus
def initialize
@lineno = 0
@in_blank_run = false
@significant = true
end
attr_reader :lineno
def in_blank_run?
@in_blank_run
end
def significant?
@significant
end
def insignificant?
!significant?
end
def extra_newline?
in_blank_run? && insignificant?
end
def update(line, squeeze, only_count_significant)
@in_blank_run = !significant?
@significant = !line.chomp.empty?
@lineno += 1 unless (squeeze && extra_newline?) ||
(only_count_significant && insignificant?)
end
end
That's becoming a fair bit of code, but note that it's all trivial. This just answers very basic questions and the method names are such that you can probably understand the checks without even reading the implementations.
The proof is in the pudding though. Look what that does to our iterator:
def render(data)
data.each_with_object(RendererStatus.new) do |line, status|
status.update( line, squeeze_extra_newlines?,
number_significant_lines? )
next if squeeze_extra_newlines? && status.extra_newline?
if number_all_lines? || ( number_significant_lines? &&
status.significant? )
print_labeled_line(status.lineno, line)
else
print_unlabeled_line(line)
end
end
end
I shortened current_line to line, added a call to RendererStatus#update(), and switched the remaining references to use the status object. I'll let you be the judge, but I find this version easier to digest. I can almost read what it does in plain English and I like that. When I do need to peek at a method, it's very easy to understand. It would also be easier to cover in unit tests.
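To back up that last claim, here's a sketch of what a unit test for the new object could look like. It assumes RendererStatus ends up reachable after requiring the library; the require path is my guess and these aren't tests from the repository:
require "minitest/autorun"
require_relative "lib/rcat"   # assumed entry point that also loads RendererStatus

class RendererStatusTest < MiniTest::Unit::TestCase
  def test_blank_lines_are_not_counted_when_numbering_significant_lines
    status = RendererStatus.new
    status.update("Four score\n", false, true)
    assert_equal 1, status.lineno
    status.update("\n", false, true)   # blank line should not bump the count
    assert_equal 1, status.lineno
  end
end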
Having added a RendererStatus object makes the choice of the Display name stick out even more. That made me decide that it was worth it to rename it to a Renderer. There are no surprises in that process, so I won't bore you by showing more code.
Try, Try Again
This is the second time I have tried this refactoring. I did it once, right after Greg released the code. You can find the results of that in the master branch of my fork on GitHub.
I started over for this article and I think I did an even better job this time. I've pushed these changes into a ritr_refactoring branch in that same repository. You can take a look at that to see the final product or to see how I organized the commits.
The point is, I got better by retrying this exercise. You will too. Always be refactoring code, even code you have refactored before. You'll learn more as you go and hopefully make the code easier on the eyes with every pass. That's our goal!
Comments (5)
-
Gregory Brown January 13th, 2012
Thanks for writing this one James. It's great to see a Practicing Ruby article cited elsewhere, with the code from it improved. Agreed with many of the suggestions you made here, with the possible exception of capturing SystemExit in tests.
The point I was trying to make about testing being difficult when you have exit calls buried in your call stack is that it makes manual testing hard too. I've scratched my head a hundred times trying to figure out why some code wasn't doing what I expected, only to find out that an exit call was to blame. An interesting thing I learned from this article is that attempting to rescue SystemExit might shed light on that sort of heisenbug, but I'd still prefer to never create it for the user in the first place :)
-
James,
(I sent you this in an email and you encouraged me to share it on the site, so here goes :))
I was a bit surprised by the way you refactored rcat. Not that I didn't like it, and I definitely learned from it. But I had expected you to go for a more object oriented design, splitting the responsibilities into separate classes.
In case you're interested, I had a go at using decorators for the refactoring. Here is my revised Display class:
class Display
  def initialize(params)
    @line_numbering_style   = params[:line_numbering_style]
    @squeeze_extra_newlines = params[:squeeze_extra_newlines]
  end

  def create_printer
    printer = LinePrinter.new
    printer = LineNumberer.new(printer, :mode => @line_numbering_style) if @line_numbering_style
    printer = ExtraNewlineSqueezer.new(printer) if @squeeze_extra_newlines
    printer
  end

  def render(data)
    printer = create_printer
    data.lines.each do |line|
      printer.print_line(line)
    end
  end
end
The complete file including decorators is at https://gist.github.com/1565178 and the repo is at https://github.com/chopmo/rcat.
-
Everyone should take a careful look at Jacob's work. It's a very clean refactoring that shows off the beauty of a good object oriented design.
-
Jacob:
Nice work. I think the general idea of using decorators would definitely clean up that code. However, the extensive use of super() makes me wonder if a composition based approach would have been better than inheritance here.
In any case, thanks for working on this! It's fun to see all the different approaches to this problem.
-
Right, now I see what you mean. Would be interesting to try using composition instead, maybe I'll revisit this some day. And thanks to you for the original code!