23 MAY 2015
Rich Methods
Some APIs provide collections of dirt simple methods that just do one little thing.
This approach is less common in Ruby, though, especially in the core and standard library of the language itself. Ruby often gives us rich methods with lots of switches we can toggle and half-hidden behaviors.
Let's look at some examples of what I am talking about.
Get a Line at a Time
I suspect most Rubyists have used gets() to read lines of input from some kind of IO. Here's the basic usage:
>> require "stringio"
=> true
>> f = StringIO.new(<<END_STR)
<xml>
 <tags>Content</tags>
</xml>
END_STR
=> #<StringIO:0x007fd5a264fa08>
>> f.gets
=> "<xml>\n"
>> f.gets
=> " <tags>Content</tags>\n"
I didn't want to mess with external files for these trivial examples, so I just loaded StringIO from the standard library. It allows us to wrap a simple String (defined in this example using the heredoc syntax) in the IO interface. In other words, I'm calling gets() here on a String just as I could with a File or $stdin.
As the last two calls show, gets() reads until it finds a "\n" and then returns the content read. Actually, that's what it does by default, but you can tell gets() what character to read to, if you prefer:
>> f.rewind
=> 0
>> f.gets(">")
=> "<xml>"
>> f.gets(">")
=> "\n <tags>"
>> f.gets(">")
=> "Content</tags>"
When you're working with XML documents, newlines don't really mean much. You don't actually care where they are. What you do care about are tags. Reading from tag to tag is like reading one of those great books that skip the boring bits to give you interesting scene after interesting scene.
As you can see above, one tiny change to the gets() call, specifying the character to read to as the tag ending ">", can make this happen.
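If you want every chunk instead of calling gets() by hand, a loop does the job. This is just a minimal sketch using the same document as above; the `chunks` variable and the strip() cleanup are my own choices for illustration, not anything gets() requires.

```ruby
require "stringio"

f = StringIO.new("<xml>\n <tags>Content</tags>\n</xml>\n")

chunks = []
while (chunk = f.gets(">"))       # gets() returns nil at the end of the input
  chunks << chunk.strip           # drop the newlines and indentation between tags
end
chunks = chunks.reject(&:empty?)  # the trailing "\n" strips down to ""

p chunks
# => ["<xml>", "<tags>", "Content</tags>", "</xml>"]
```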
"But wait, there's more!"
>> f = StringIO.new("One\n\nTwo\n\nThree")
=> #<StringIO:0x007fd5a260efa8>
>> f.gets("")
=> "One\n\n"
>> f.gets("")
=> "Two\n\n"
The empty String ("") is a magic value for the character to read to, since it makes no sense as that value. This turns on paragraph mode, and in that mode Ruby will read one paragraph at a time. For this purpose, paragraphs are defined as being separated by two consecutive newlines (or a blank line in word processor terms).
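Here's a small sketch of paragraph mode in a loop, using the same input as above. The `paragraphs` Array and the strip() call are just my additions to tidy up the separators.

```ruby
require "stringio"

f = StringIO.new("One\n\nTwo\n\nThree")

paragraphs = []
while (paragraph = f.gets(""))   # "" switches gets() into paragraph mode
  paragraphs << paragraph.strip  # remove the blank-line separator
end

p paragraphs
# => ["One", "Two", "Three"]
```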
These aren't even all the features of gets(). It can do more. For example, you can provide an upper limit of bytes to read, to prevent wonky input from forcing your program to allocate tons of memory to hold large Ruby String objects.
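A quick sketch of the limit in action follows. The input String and the limit of 10 bytes are made up for illustration; the limit can be passed alone or after a separator.

```ruby
require "stringio"

f = StringIO.new("A very long line without a newline for a while\nshort\n")

first  = f.gets(10)        # limit only: stop after at most 10 bytes
second = f.gets("\n", 10)  # separator and limit together
rest   = f.gets            # no limit: read up to the next "\n"

p first   # => "A very lon"
p second  # => "g line wit"
p rest    # => "hout a newline for a while\n"
```

Notice that a limited read can stop in the middle of a line, so the next call picks up where the last one left off.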
Let's look at another method.
Hash Merging
Many Ruby methods sneak their rich functionality in through the use of blocks. Deferring some decision to the caller by allowing them to provide custom code for handling it makes some methods crazy flexible.
To show what I mean, let's play with good old merge():
>> {a: 1, b: 2}.merge(c: 3, d: 4)
=> {:a=>1, :b=>2, :c=>3, :d=>4}
Most Rubyists run into examples like this pretty early in their studies. The code just returns a fresh Hash containing the keys and values of both the receiver and the Hash passed as an argument to merge().
How are ties handled?
>> {a: 1, b: 2}.merge(b: :two, c: 3)
=> {:a=>1, :b=>:two, :c=>3}
The Hash passed as an argument to merge() wins. Again, I doubt this is much of a surprise to anyone.
However, I don't think everyone knows that you can take control of this merging process. During a merge(), any conflict will be passed to a block, if provided, and the block can return what to store in the new Hash:
>> {a: 1, b: 2}.merge(b: :two, c: 3) { |_, old, new| Array(old) + Array(new) }
=> {:a=>1, :b=>[2, :two], :c=>3}
You can throw away either item, log the conflict, combine them as I have done here, or do whatever else you can think of, all because merge() takes a block.
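As one more illustration, here's a sketch that adds up counts when both sides have the same key. The `monday`, `tuesday`, and `totals` names and their data are invented for the example.

```ruby
monday  = {apples: 3, pears: 5}
tuesday = {pears: 2, plums: 1}

# The block only runs for keys present in both Hashes (just :pears here).
totals = monday.merge(tuesday) { |_fruit, old, new| old + new }

p totals == {apples: 3, pears: 7, plums: 1}  # => true
```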
Can you guess how ActiveSupport implements reverse_merge!() now?
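One plausible answer, sketched in plain Ruby: merge in the defaults, but let the block keep whatever the receiver already had. To be clear, `reverse_merge_sketch!` is a hypothetical helper I wrote for this post, not ActiveSupport's source, and the real implementation may differ between versions.

```ruby
# Hypothetical helper (not ActiveSupport's code): fill in missing keys
# from defaults without overwriting values that are already set.
def reverse_merge_sketch!(target, defaults)
  target.merge!(defaults) { |_key, current, _default| current }
end

options = {color: "red"}
reverse_merge_sketch!(options, {color: "blue", size: 10})

p options == {color: "red", size: 10}  # => true
```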
Easy Tokenizing
Let's do one last method with a rich interface (even though Ruby has many more):
>> "1,2,3".split(",")
=> ["1", "2", "3"]
This is another very common method. It turns a String into an Array by dividing up the contents everywhere the passed separator is encountered. I used a String separator above, but a Regexp is also allowed:
>> "1, 2, 3".split(/\s*,\s*/)
=> ["1", "2", "3"]
This makes it easier to handle complex separators. For example, the Regexp above permits optional whitespace characters on either side of the comma.
But a Regexp can include capture groups. How are they handled?
>> "1, 2, 3".split(/\s*(,)\s*/)
=> ["1", ",", "2", ",", "3"]
Easy enough: the captured value(s) are returned with the separated contents.
The real question this raises for me is, "What the heck is this feature good for?" Well, one thing I have found over the years is that this usage of split() can make dividing some input into tokens pretty darn easy:
>> "<xml><tags>Content</tags></xml>".split(/(<[^>]+>)/)
=> ["", "<xml>", "", "<tags>", "Content", "</tags>", "", "</xml>"]
You can use this one feature as a backbone for a moderately complex parser. Mote does just that.
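To give a feel for the idea, here's a minimal sketch of a tokenizer built on that split() call. The `tokenize` method and its `:open`, `:close`, and `:text` labels are my own inventions for illustration; this is not how Mote itself is written.

```ruby
# Hypothetical tokenizer: split on tags, keeping them via the capture group,
# then label each piece as an opening tag, a closing tag, or plain text.
def tokenize(markup)
  markup.split(/(<[^>]+>)/).reject(&:empty?).map do |token|
    if token.start_with?("</")
      [:close, token[2..-2]]  # strip the leading "</" and trailing ">"
    elsif token.start_with?("<")
      [:open, token[1..-2]]   # strip the leading "<" and trailing ">"
    else
      [:text, token]
    end
  end
end

tokens = tokenize("<xml><tags>Content</tags></xml>")
p tokens
# => [[:open, "xml"], [:open, "tags"], [:text, "Content"], [:close, "tags"], [:close, "xml"]]
```

From here, a parser only has to walk the Array of labeled tokens instead of worrying about the raw String.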
I made a video explaining in detail how this parsing trick (and more) is accomplished. You can use the coupon BLOGREADER for $3 off if you want to check it out.