Higher-Order Ruby

A Rubyification of the Perl book about functional programming.



Iterators (Chapters 4 and 5)

Due to a printing error, these two chapters actually came out longer than intended. Originally their contents were: "Use Ruby."

All jokes aside, there's really not a whole lot for me to talk about from these chapters, since iterators are so internal to Ruby. Readers from our camp should run into a lot less surprises here that the intended audience. Just translate MDJ's anonymous subroutines to blocks, replace his returns with yields, and you are 90% of the way there.

Here are translations for some of the examples in these chapters. I think these all come out cleaner and more natural in Ruby, but you be the judge:


#!/usr/local/bin/ruby -w

def permute(items)
  0.upto(1.0/0.0) do |count|
    pattern = count_to_pattern(count, items.size) or break
    puts "Pattern #{pattern.join(' ')}:" if $DEBUG
    yield(pattern_to_permutation(pattern, items.dup))

def pattern_to_permutation(pattern, items)
  pattern.inject(Array.new) { |results, i| results + items.slice!(i, 1) }

def count_to_pattern(count, item_count)
  pattern = (1..item_count).inject(Array.new) do |pat, i|
    pat.unshift(count % i)
    count /= i
  count.zero? ? pattern : nil

if ARGV.empty?
  abort "Usage:  #{File.basename($PROGRAM_NAME)} LIST_OF_ITEMS"

permute(ARGV) { |perm| puts(($DEBUG ? "  " : "") + perm.join(" ")) }

Treasure Allocation

#!/usr/local/bin/ruby -w

def find_shares(target, treasures)
  shares = target.zero? ? [[target, Array.new, Array.new]] :
                          [[target, treasures, Array.new]]

  until shares.empty?
    goal, pool, share = shares.pop
    first             = pool.shift

    shares << [goal, pool.dup, share.dup] unless pool.empty?
    if goal == first
      yield(share + [first])
    elsif goal > first && !pool.empty?
      shares << [goal - first, pool.dup, share + [first]]

unless ARGV.size >= 2
  abort "Usage:  #{File.basename($PROGRAM_NAME)} TARGET TREASURES"

find_shares(ARGV.first.to_i, ARGV[1..-1].map { |n| n.to_i }) do |share|
  puts share.join(" ")

Integer Partitioning

#!/usr/local/bin/ruby -w

def partition(num)
  partitions = [[num]]

  until partitions.empty?
    current = partitions.shift


    largest = current.shift
    (([current.first.to_i, 1].max)..(largest / 2)).each do |n|
      partitions << Array[largest - n, n, *current.dup]

    partitions.sort! { |a, b| b <=> a }

unless ARGV.size == 1
  abort "Usage:  #{File.basename($PROGRAM_NAME)} INTEGER"

partition(ARGV.first.to_i) { |partition| puts partition.join(" ") }

Internal vs. External

All of the the iterators MJD uses are "external iterators." That's just a fancy way of saying that he returns these magic objects you can query for each item in a series. When you are done with an item, you can just ask for the next one, and it will let you know when it runs out of items to give you. (Java's iterators are also external, if you are familiar with that language.)

Ruby uses a different approach, called "internal iterators." Instead of asking for the magic object (pulling data), we pass in the operations to do on the items we are iterating over (pushing data) in the form of blocks.

Both models have their advantages.

First think about this: Ruby's iterators don't suffer from the semipredicate issues MJD keeps describing in these pages. You never have to worry about whether or not each() is trying to tell you it's out of items, because it handles all of that. That's an advantage of internal iterators.

However, sometimes we need to work through a couple of iterators in tandem. Internal iterators struggle with this, because we hand the processing code off for another object to run and manage. In these cases, external iterators fair better.

You should remember two things from this.

First, most of the time there is little difference and in the typical cases and when that's true internal iterators tend to come out cleaner, in my opinion. See the three examples above.

Second, some problems just need an external iterator. Because of that, you need to know how to get one in Ruby. There are a couple of ways, but tuck this one away in your mind because it's so easy to remember: all Enumerable objects have a to_a() method and an Array can be an external iterator (using indices).

MJD had a great example of a problem that doesn't bend well to internal iterators in this section, when he was playing with gene combinations. He solved it like I just hinted at, using arrays and tracking indices. Here's another option in Ruby using the standard Generator library, which is for switching internal iterators into external iterators:

[Note: The following code uses a standard library that has been removed from Ruby. It's no longer needed now that any iterator is trivially switched into an Enumerator, Ruby's current system for getting external iterators.]

#!/usr/local/bin/ruby -w

require "generator"

def permute_genes(pattern)
  tokens = pattern.split(/[()]/)
  (1...tokens.size).step(2) do |i|
    tokens[i] = Generator.new(tokens[i].split(""))

  loop do
    incrementing = false
    permutation = tokens.inject(String.new) do |result, token|
      if token.is_a?(String)
        result + token
        if incrementing
          result + token.current
          next_result = result + token.next
          if token.end?
            incrementing = true
    break unless incrementing

unless ARGV.size == 1
  abort "Usage:  #{File.basename($PROGRAM_NAME)} GENE_PATTERN"

permute_genes(ARGV.first) { |perm| puts perm }

I think that solution comes out very nice. However, I must warn you that Generator is pretty slow due to some implementation details. If you run into performance issues while using it, just remember the backup plan (to_a()).

A Web Robot

I took MJD's "extended example" and Rubyified it, to the best of my abilities. It requires my RobotRules port as well as the Rubyful Soup library by Leonard Richardson.

[Note: Rubyful Soup is no longer maintained and probably doesn't work on modern versions of Ruby.]


require "open-uri"
require "uri"

require "robot_rules"

require "rubygems"
require "rubyful_soup"

class SimpleRobot
  def initialize(&link_filter)

    @sites = Hash.new
    @rules = RobotRules.new("SimpleRobot/1.0")

  def filter(&link_filter)
    @filter = link_filter

  def traverse(top)
    filter = @filter || lambda { |link| link =~ /\A#{Regexp.escape(top)}/ }
    links  = [[top, "Top link."]]
    seen   = Hash.new(0)

    until links.empty?
      url, referrer = links.pop

      next unless robot_safe?(url)

      open(url) do |page|
        content = nil

        if page.content_type =~ /\Atext\/html\b/i
          tags = BeautifulSoup.new(content = page.read)
          links.push( *tags.find_all("a").map { |l| l.attrs["href"] }.
                                          map { |l| l.sub(/#.*\Z/, "") }.
                                          map { |l| l.sub(/\A(?!\w+:)/, top) }.
                                          select { |l| (seen[l] += 1) == 1 }.
                                          map { |l| [l, url] } )

        yield(url, page.meta, referrer, content)
      end rescue yield(url, Hash.new, referrer, nil)

  def robot_safe?(url)
    uri      = URI.parse(url)
    location = "#{uri.host}:#{uri.port}"

    return true unless %w{http https}.include?(uri.scheme)

    unless @sites.include?(location)
      @sites[location] = true

      robot_url  = "http://#{location}/robots.txt"
        robot_file = open(robot_url) { |page| page.read }
        return true
      @rules.parse(robot_url, robot_file)


You can use that to quickly make crawlers:


require "simple_robot"

bot = SimpleRobot.new
bot.traverse(ARGV.shift) do |url, head, referrer, html|
  puts url
Comments (2)
  1. Earle Clubb
    Earle Clubb March 18th, 2008 Reply Link

    There is a typo at the beginning of this post.
    "(Do) to a printing error" should be "(Due) to a printing error".

    1. Reply (using GitHub Flavored Markdown)

      Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

      Ajax loader
    2. James Edward Gray II
      James Edward Gray II March 18th, 2008 Reply Link

      Good catch. Thanks for pointing that out.

      1. Reply (using GitHub Flavored Markdown)

        Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

        Ajax loader
Leave a Comment (using GitHub Flavored Markdown)

Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

Ajax loader