<p><em>Rich Methods</em>, by James Edward Gray II (2015-05-23). A quick look into the half-hidden extras of some common Ruby methods.</p>
<p>Some APIs provide collections of dirt simple methods that just do one little thing.</p>
<p>This approach is less common in Ruby, though, especially in the core and standard library of the language itself. Ruby often gives us rich methods with lots of switches we can toggle and half-hidden behaviors.</p>
<p>Let's look at some examples of what I am talking about.</p>
<h4>Get a <em>Line</em> at a Time</h4>
<p>I suspect most Rubyists have used <code>gets()</code> to read lines of input from some kind of <code>IO</code>. Here's the basic usage:</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="nb">require</span> <span class="s2">"stringio"</span>
<span class="o">=></span> <span class="kp">true</span>
<span class="o">>></span> <span class="n">f</span> <span class="o">=</span> <span class="no">StringIO</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="o"><<</span><span class="no">END_STR</span><span class="p">)</span>
<span class="sh"><xml></span>
<span class="sh"> <tags>Content</tags></span>
<span class="sh"></xml></span>
<span class="no">END_STR</span>
<span class="o">=></span> <span class="c1">#<StringIO:0x007fd5a264fa08></span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span>
<span class="o">=></span> <span class="s2">"<xml></span><span class="se">\n</span><span class="s2">"</span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span>
<span class="o">=></span> <span class="s2">" <tags>Content</tags></span><span class="se">\n</span><span class="s2">"</span>
</pre></div>
<p>I didn't want to mess with external files for these trivial examples, so I just loaded <code>StringIO</code> from the standard library. It allows us to wrap a simple <code>String</code> (defined in this example using <a href="http://graysoftinc.com/ruby-voodoo/working-with-multiline-strings">the <em>heredoc</em> syntax</a>) in the <code>IO</code> interface. In other words, I'm calling <code>gets()</code> here for a <code>String</code> just as I could with a <code>File</code> or <code>$stdin</code>.</p>
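<p>To make the "any <code>IO</code> will do" point concrete, here is a small sketch. The <code>first_line()</code> helper is hypothetical. It only calls <code>gets()</code>, so it works the same whether it is handed a <code>StringIO</code> or a real (temporary) <code>File</code>:</p>

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: it only needs an object that responds to gets(),
# so a StringIO, a File, or $stdin will all do.
def first_line(io)
  io.gets
end

from_string = first_line(StringIO.new("<xml>\n</xml>\n"))  # "<xml>\n"

# Tempfile.create with a block returns the block's value and cleans up afterwards.
from_file = Tempfile.create("rich_methods") do |file|
  file.write("<xml>\n</xml>\n")
  file.rewind       # move back to the start before reading what we wrote
  first_line(file)  # "<xml>\n"
end
```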
<p>As the last two calls show, <code>gets()</code> reads until it finds a <code>"\n"</code> and then returns everything it read, newline included. That's only the default, though. You can pass <code>gets()</code> a different separator to read up to, if you prefer:</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">rewind</span>
<span class="o">=></span> <span class="mi">0</span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span><span class="p">(</span><span class="s2">">"</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"<xml>"</span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span><span class="p">(</span><span class="s2">">"</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"</span><span class="se">\n</span><span class="s2"> <tags>"</span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span><span class="p">(</span><span class="s2">">"</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"Content</tags>"</span>
</pre></div>
<p>When you're working with XML documents, newlines don't really mean much. You don't actually care where they are. What you do care about are tags. Reading from tag to tag is like reading one of those great books that skip the boring bits to give you interesting scene after interesting scene.</p>
<p>As you can see above, one tiny change to the <code>gets()</code> call, passing the tag ending <code>">"</code> as the separator, makes this happen.</p>
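<p>The same separator argument is accepted by <code>each_line()</code>, so a whole document can be walked tag to tag in one pass. This is a minimal sketch using the XML from the earlier example; the <code>strip</code>/<code>reject</code> cleanup is just one way to discard the whitespace between tags:</p>

```ruby
require "stringio"

f = StringIO.new("<xml>\n  <tags>Content</tags>\n</xml>\n")

# each_line takes the same separator argument as gets(). Stripping removes
# the newlines and indentation between tags, and the final chunk ("\n")
# strips down to "", which reject then drops.
chunks = f.each_line(">").map(&:strip).reject(&:empty?)
# ["<xml>", "<tags>", "Content</tags>", "</xml>"]
```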
<p>"But wait, there's more!"</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="n">f</span> <span class="o">=</span> <span class="no">StringIO</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="s2">"One</span><span class="se">\n\n</span><span class="s2">Two</span><span class="se">\n\n</span><span class="s2">Three"</span><span class="p">)</span>
<span class="o">=></span> <span class="c1">#<StringIO:0x007fd5a260efa8></span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"One</span><span class="se">\n\n</span><span class="s2">"</span>
<span class="o">>></span> <span class="n">f</span><span class="o">.</span><span class="n">gets</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span>
<span class="o">=></span> <span class="s2">"Two</span><span class="se">\n\n</span><span class="s2">"</span>
</pre></div>
<p>The empty <code>String</code> (<code>""</code>) is a magic value for the separator, since reading up to "nothing" would make no sense. It turns on <em>paragraph mode</em>, in which Ruby reads one paragraph at a time. For this purpose, paragraphs are separated by two or more consecutive newlines (a blank line, in word processor terms).</p>
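<p>Because <code>gets("")</code> still returns <code>nil</code> at the end of the input, the usual read loop collects every paragraph. Here is a short sketch with the same input as above. Note that the last paragraph comes back without a trailing newline because the input doesn't end with one:</p>

```ruby
require "stringio"

f = StringIO.new("One\n\nTwo\n\nThree")

# Paragraph mode: each call returns one paragraph, then nil at end of input.
paragraphs = []
while (paragraph = f.gets(""))
  paragraphs << paragraph
end

paragraphs  # ["One\n\n", "Two\n\n", "Three"]
```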
<p>These aren't even all the features of <code>gets()</code>. For example, you can pass an upper limit on the number of bytes to read. That keeps wonky input (say, one enormous "line") from forcing your program to allocate tons of memory for huge Ruby <code>String</code> objects.</p>
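<p>A sketch of that limit in action, with an arbitrary 8-byte cap chosen only to keep the output short. Each call stops at the separator or at the limit, whichever comes first, so an overlong line arrives in pieces instead of all at once:</p>

```ruby
require "stringio"

f = StringIO.new("short\nthis line is long\n")

# The second argument is a byte limit: each call stops at "\n"
# or after 8 bytes, whichever comes first.
pieces = []
while (piece = f.gets("\n", 8))
  pieces << piece
end
pieces  # ["short\n", "this lin", "e is lon", "g\n"]

# Passing only an Integer sets the limit and keeps the default "\n" separator.
f.rewind
capped = f.gets(3)  # "sho"
```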
<p>Let's look at another method.</p>
<h4>
<code>Hash</code> Merging</h4>
<p>Many Ruby methods sneak their rich functionality in through the use of blocks. Deferring some decision to the caller by allowing them to provide custom code for handling it makes some methods crazy flexible.</p>
<p>To show what I mean, let's play with good old <code>merge()</code>:</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="p">{</span><span class="ss">a</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">b</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="ss">c</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="ss">d</span><span class="p">:</span> <span class="mi">4</span><span class="p">)</span>
<span class="o">=></span> <span class="p">{</span><span class="ss">:a</span><span class="o">=></span><span class="mi">1</span><span class="p">,</span> <span class="ss">:b</span><span class="o">=></span><span class="mi">2</span><span class="p">,</span> <span class="ss">:c</span><span class="o">=></span><span class="mi">3</span><span class="p">,</span> <span class="ss">:d</span><span class="o">=></span><span class="mi">4</span><span class="p">}</span>
</pre></div>
<p>Most Rubyists run into examples like this pretty early in their studies. The code just returns a fresh <code>Hash</code> containing the keys and values of both the receiver and the <code>Hash</code> passed as an argument to <code>merge()</code>.</p>
<p>How are ties handled?</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="p">{</span><span class="ss">a</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">b</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="ss">b</span><span class="p">:</span> <span class="ss">:two</span><span class="p">,</span> <span class="ss">c</span><span class="p">:</span> <span class="mi">3</span><span class="p">)</span>
<span class="o">=></span> <span class="p">{</span><span class="ss">:a</span><span class="o">=></span><span class="mi">1</span><span class="p">,</span> <span class="ss">:b</span><span class="o">=></span><span class="ss">:two</span><span class="p">,</span> <span class="ss">:c</span><span class="o">=></span><span class="mi">3</span><span class="p">}</span>
</pre></div>
<p>The <code>Hash</code> passed as an argument to <code>merge()</code> wins. Again, I doubt this is much of a surprise to anyone.</p>
<p>However, I don't think everyone knows that you can take control of this merging process. When a key appears in both hashes, <code>merge()</code> passes the key, the receiver's value, and the argument's value to the block, if one is provided, and stores whatever the block returns in the new <code>Hash</code>:</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="p">{</span><span class="ss">a</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">b</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span><span class="o">.</span><span class="n">merge</span><span class="p">(</span><span class="ss">b</span><span class="p">:</span> <span class="ss">:two</span><span class="p">,</span> <span class="ss">c</span><span class="p">:</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span> <span class="o">|</span><span class="n">_</span><span class="p">,</span> <span class="n">old</span><span class="p">,</span> <span class="kp">new</span><span class="o">|</span> <span class="nb">Array</span><span class="p">(</span><span class="n">old</span><span class="p">)</span> <span class="o">+</span> <span class="nb">Array</span><span class="p">(</span><span class="kp">new</span><span class="p">)</span> <span class="p">}</span>
<span class="o">=></span> <span class="p">{</span><span class="ss">:a</span><span class="o">=></span><span class="mi">1</span><span class="p">,</span> <span class="ss">:b</span><span class="o">=>[</span><span class="mi">2</span><span class="p">,</span> <span class="ss">:two</span><span class="o">]</span><span class="p">,</span> <span class="ss">:c</span><span class="o">=></span><span class="mi">3</span><span class="p">}</span>
</pre></div>
<p>You can throw away either item, log the conflict, combine them as I have done here, or do whatever else you can think of, all because <code>merge()</code> takes a block.</p>
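<p>Two more sketches of what the block can do. The fruit counts are made-up sample data. The first block combines conflicting values by adding them. The second records which keys collided while still letting the argument's value win:</p>

```ruby
monday  = {"apple" => 3, "pear" => 1}
tuesday = {"apple" => 2, "plum" => 5}

# The block runs only for keys found in both hashes, here just "apple".
totals = monday.merge(tuesday) { |_fruit, old_count, new_count| old_count + new_count }
# {"apple" => 5, "pear" => 1, "plum" => 5}

# Log each conflict, then return the argument's value (the default winner).
conflicts = []
logged = monday.merge(tuesday) do |fruit, _old_count, new_count|
  conflicts << fruit
  new_count
end
# conflicts is ["apple"]; logged is {"apple" => 2, "pear" => 1, "plum" => 5}
```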
<p>Can you guess <a href="https://github.com/rails/rails/blob/42e66fac38b54dd53d062fb5d3376218ed2ffdae/activesupport/lib/active_support/core_ext/hash/reverse_merge.rb#L17-L20">how <code>ActiveSupport</code> implements <code>reverse_merge!()</code></a> now?</p>
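<p>One way to get "keep what I already have" behavior with only core Ruby is to have the block return the receiver's value. The sketch below uses a hypothetical <code>apply_defaults!()</code> helper. It is not ActiveSupport's code, so follow the link to compare it with the real thing:</p>

```ruby
# Hypothetical helper, not ActiveSupport's code: fill in defaults without
# overwriting keys the caller already set. Mutates and returns options.
def apply_defaults!(options, defaults)
  # merge! calls the block only for keys present in both hashes;
  # returning the receiver's value (left) lets the existing entry win.
  options.merge!(defaults) { |_key, left, _right| left }
end

options = {color: "red"}
apply_defaults!(options, {color: "blue", size: :large})
options  # {color: "red", size: :large}
```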
<h4>Easy Tokenizing</h4>
<p>Let's do one last method with a rich interface (even though Ruby has many more):</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="s2">"1,2,3"</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">","</span><span class="p">)</span>
<span class="o">=></span> <span class="o">[</span><span class="s2">"1"</span><span class="p">,</span> <span class="s2">"2"</span><span class="p">,</span> <span class="s2">"3"</span><span class="o">]</span>
</pre></div>
<p>This is another very common method. It turns a <code>String</code> into an <code>Array</code> by dividing up the contents everywhere the passed separator is encountered. I used a <code>String</code> separator above but a <code>Regexp</code> is also allowed:</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="s2">"1, 2, 3"</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sr">/\s*,\s*/</span><span class="p">)</span>
<span class="o">=></span> <span class="o">[</span><span class="s2">"1"</span><span class="p">,</span> <span class="s2">"2"</span><span class="p">,</span> <span class="s2">"3"</span><span class="o">]</span>
</pre></div>
<p>This makes it easier to handle complex separators. For example, the <code>Regexp</code> above permits optional whitespace characters on either side of the comma.</p>
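<p>For contrast, a quick sketch of why the <code>Regexp</code> helps. A <code>String</code> separator matches only its exact text, so the spaces stay attached to the pieces. A <code>Regexp</code> can absorb them and even accept more than one separator character (the mixed <code>,</code>/<code>;</code> input is just an illustration):</p>

```ruby
# A String separator only matches the exact text, so the spaces stay attached.
plain = "1, 2, 3".split(",")               # ["1", " 2", " 3"]

# A Regexp can swallow surrounding spaces and allow either "," or ";".
mixed = "1, 2;3 ,4".split(/\s*[,;]\s*/)    # ["1", "2", "3", "4"]
```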
<p>But a <code>Regexp</code> can include capture groups. How are they handled?</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="s2">"1, 2, 3"</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sr">/\s*(,)\s*/</span><span class="p">)</span>
<span class="o">=></span> <span class="o">[</span><span class="s2">"1"</span><span class="p">,</span> <span class="s2">","</span><span class="p">,</span> <span class="s2">"2"</span><span class="p">,</span> <span class="s2">","</span><span class="p">,</span> <span class="s2">"3"</span><span class="o">]</span>
</pre></div>
<p>Easy enough: the captured value(s) are included in the returned <code>Array</code>, each one placed between the pieces it separated.</p>
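<p>One consequence worth seeing: if the capture group covers the whole separator, spaces included, nothing is thrown away. For this input, joining the pieces rebuilds the original <code>String</code> exactly:</p>

```ruby
# Capture the entire separator, not just the comma.
tokens = "1, 2, 3".split(/(\s*,\s*)/)
# ["1", ", ", "2", ", ", "3"]

# Every character lives in some token, so joining restores the input.
rebuilt = tokens.join  # "1, 2, 3"
```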
<p>The real question this raises for me is, "What the heck is this feature good for?" Well, one thing I have found over the years is that this usage of <code>split()</code> can make dividing some input into tokens pretty darn easy:</p>
<div class="highlight highlight-ruby"><pre><span class="o">>></span> <span class="s2">"<xml><tags>Content</tags></xml>"</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="sr">/(<[^>]+>)/</span><span class="p">)</span>
<span class="o">=></span> <span class="o">[</span><span class="s2">""</span><span class="p">,</span> <span class="s2">"<xml>"</span><span class="p">,</span> <span class="s2">""</span><span class="p">,</span> <span class="s2">"<tags>"</span><span class="p">,</span> <span class="s2">"Content"</span><span class="p">,</span> <span class="s2">"</tags>"</span><span class="p">,</span> <span class="s2">""</span><span class="p">,</span> <span class="s2">"</xml>"</span><span class="o">]</span>
</pre></div>
<p>You can use this one feature as a backbone for a moderately complex parser. <a href="https://github.com/soveran/mote/blob/b43b3879076dade130aac8c34b76cb06caf26e35/lib/mote.rb#L23-L26"><em>Mote</em> does</a> just that.</p>
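<p>Here is a minimal sketch of that idea. It is not Mote's code, and the <code>tokenize()</code> helper and its <code>[:tag, ...]</code>/<code>[:text, ...]</code> pairs are hypothetical. It drops the empty strings that <code>split()</code> leaves between adjacent tags and labels each remaining token:</p>

```ruby
# Hypothetical tokenizer sketch: split on tags, keeping them via the capture group.
def tokenize(markup)
  markup
    .split(/(<[^>]+>)/)   # captured tags are kept between the text pieces
    .reject(&:empty?)     # drop the "" produced where two tags touch
    .map { |token| token.start_with?("<") ? [:tag, token] : [:text, token] }
end

tokens = tokenize("<xml><tags>Content</tags></xml>")
# [[:tag, "<xml>"], [:tag, "<tags>"], [:text, "Content"], [:tag, "</tags>"], [:tag, "</xml>"]]
```

<p>The labeling step is deliberately naive. Text that contains a stray <code>"&lt;"</code> with no closing <code>"&gt;"</code> would be mislabeled as a tag, so a real parser would need a stricter check.</p>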
<p>I made <a href="https://codalyzed.com/videos/lesscode">a video explaining how this parsing trick (and more) is accomplished</a> in detail. You can use the coupon <code>BLOGREADER</code> for $3 off if you want to check it out.</p>