<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Gray Soft / Tags / Parsing</title>
  <id>tag:graysoftinc.com,2014-03-20:/tags/Parsing</id>
  <updated>2015-05-23T05:18:20Z</updated>
  <link rel="self" href="http://graysoftinc.com/tags/Parsing/feed.xml"/>
  <link rel="alternate" href="http://graysoftinc.com/tags/Parsing"/>
  <author>
    <name>James Edward Gray II</name>
  </author>
  <entry>
    <title>Rich Methods</title>
    <link rel="alternate" href="http://graysoftinc.com/ruby-voodoo/rich-methods"/>
    <id>tag:graysoftinc.com,2015-05-23:/posts/140</id>
    <updated>2015-05-23T05:18:20Z</updated>
    <summary>A quick look into the half hidden extras of some common Ruby methods.</summary>
    <content type="html">&lt;p&gt;Some APIs provide collections of dirt simple methods that just do one little thing.&lt;/p&gt;

&lt;p&gt;This approach is less common in Ruby, though, especially in the core and standard library of the language itself.  Ruby often gives us rich methods with lots of switches we can toggle and half hidden behaviors.&lt;/p&gt;

&lt;p&gt;Let's look at some examples of what I am talking about.&lt;/p&gt;

&lt;h4&gt;Get a &lt;em&gt;Line&lt;/em&gt; at a Time&lt;/h4&gt;

&lt;p&gt;I suspect most Rubyists have used &lt;code&gt;gets()&lt;/code&gt; to read lines of input from some kind of &lt;code&gt;IO&lt;/code&gt;.  Here's the basic usage:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"stringio"&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StringIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;END_STR&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="sh"&gt;&amp;lt;xml&amp;gt;&lt;/span&gt;
&lt;span class="sh"&gt;  &amp;lt;tags&amp;gt;Content&amp;lt;/tags&amp;gt;&lt;/span&gt;
&lt;span class="sh"&gt;&amp;lt;/xml&amp;gt;&lt;/span&gt;
&lt;span class="no"&gt;END_STR&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;StringIO:0x007fd5a264fa08&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;xml&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"  &amp;lt;tags&amp;gt;Content&amp;lt;/tags&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I didn't want to mess with external files for these trivial examples, so I just loaded &lt;code&gt;StringIO&lt;/code&gt; from the standard library.  It allows us to wrap a simple &lt;code&gt;String&lt;/code&gt; (defined in this example using &lt;a href="http://graysoftinc.com/ruby-voodoo/working-with-multiline-strings"&gt;the &lt;em&gt;heredoc&lt;/em&gt; syntax&lt;/a&gt;) in the &lt;code&gt;IO&lt;/code&gt; interface.  In other words, I'm calling &lt;code&gt;gets()&lt;/code&gt; here for a &lt;code&gt;String&lt;/code&gt; just as I could with a &lt;code&gt;File&lt;/code&gt; or &lt;code&gt;$stdin&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As the last two calls show, &lt;code&gt;gets()&lt;/code&gt; reads until it finds a &lt;code&gt;"\n"&lt;/code&gt; and then returns the content read.  Actually, that's what it does by default, but you can tell &lt;code&gt;gets()&lt;/code&gt; what character to read to, if you prefer:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rewind&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;xml&amp;gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  &amp;lt;tags&amp;gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"Content&amp;lt;/tags&amp;gt;"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When you're working with XML documents, newlines don't really mean much.  You don't actually care where they are.  What you do care about are tags.  Reading from tag to tag is like reading one of those great books that skip the boring bits to give you interesting scene after interesting scene.&lt;/p&gt;

&lt;p&gt;As you can see above, one tiny change to the &lt;code&gt;gets()&lt;/code&gt; call, specifying the character to read to as the tag ending &lt;code&gt;"&amp;gt;"&lt;/code&gt;, can make this happen.&lt;/p&gt;

&lt;p&gt;"But wait, there's more!"&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StringIO&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"One&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Two&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Three"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;#&amp;lt;StringIO:0x007fd5a260efa8&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"One&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"Two&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The empty &lt;code&gt;String&lt;/code&gt; (&lt;code&gt;""&lt;/code&gt;) is a magic value for the character to read to, since it makes no sense as that value.  This turns on &lt;em&gt;paragraph mode&lt;/em&gt; and in that mode Ruby will read one paragraph at a time.  For this purpose, paragraphs are defined as being separated by two consecutive newlines (or a blank line in word processor terms).&lt;/p&gt;

&lt;p&gt;These aren't even all the features of &lt;code&gt;gets()&lt;/code&gt;.  It can do more.  For example, you can provide an upper limit of bytes to read, to prevent wonky input from forcing your program to allocate tons of memory to hold large Ruby &lt;code&gt;String&lt;/code&gt; objects.&lt;/p&gt;
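
&lt;p&gt;Here's a minimal sketch of that byte limit in action.  The sample strings and variable names are made up for illustration; the only thing assumed is &lt;code&gt;StringIO&lt;/code&gt; from the standard library, as in the examples above.&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;
```ruby
require "stringio"

f = StringIO.new("One\n\nTwo\n\nThree")

# A lone Integer argument caps how many bytes a single call may return.
capped = f.gets(2)    # gives up after two bytes, well before the newline
rest   = f.gets(10)   # the newline arrives first, so the limit never applies

# A separator and a limit can be combined; whichever is hit first wins.
g     = StringIO.new("key=value;key2=value2;")
short = g.gets(";", 4)     # limit reached before any ";"
g.rewind
whole = g.gets(";", 100)   # separator reached before the limit

p capped  # "On"
p rest    # "e\n"
p short   # "key="
p whole   # "key=value;"
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A limited read just returns what it has so far, so the next call picks up where the last one stopped.&lt;/p&gt;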

&lt;p&gt;Let's look at another method.&lt;/p&gt;

&lt;h4&gt;&lt;code&gt;Hash&lt;/code&gt; Merging&lt;/h4&gt;

&lt;p&gt;Many Ruby methods sneak their rich functionality in through the use of blocks.  Deferring some decision to the caller by allowing them to provide custom code for handling it makes some methods crazy flexible.&lt;/p&gt;

&lt;p&gt;To show what I mean, let's play with good old &lt;code&gt;merge()&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;d&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:a&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:b&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:c&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:d&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Most Rubyists run into examples like this pretty early in their studies.  The code just returns a fresh &lt;code&gt;Hash&lt;/code&gt; containing the keys and values of both the receiver and the &lt;code&gt;Hash&lt;/code&gt; passed as an argument to &lt;code&gt;merge()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;How are ties handled?&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="ss"&gt;:two&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:a&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:b&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;:two&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:c&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;Hash&lt;/code&gt; passed as an argument to &lt;code&gt;merge()&lt;/code&gt; wins.  Again, I doubt this is much of a surprise to anyone.&lt;/p&gt;

&lt;p&gt;However, I don't think everyone knows that you can take control of this merging process.  During a &lt;code&gt;merge()&lt;/code&gt; any conflict will be passed to a block, if provided, and the block can return what to store in the new &lt;code&gt;Hash&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="ss"&gt;:two&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kp"&gt;new&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kp"&gt;new&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;:a&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:b&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:two&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:c&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can throw away either item, log the conflict, combine them as I have done here, or do whatever else you can think of, all because &lt;code&gt;merge()&lt;/code&gt; takes a block.&lt;/p&gt;

&lt;p&gt;Can you guess &lt;a href="https://github.com/rails/rails/blob/42e66fac38b54dd53d062fb5d3376218ed2ffdae/activesupport/lib/active_support/core_ext/hash/reverse_merge.rb#L17-L20"&gt;how &lt;code&gt;ActiveSupport&lt;/code&gt; implements &lt;code&gt;reverse_merge!()&lt;/code&gt;&lt;/a&gt; now?&lt;/p&gt;
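
&lt;p&gt;If you want to check your guess, here's one way to get that reversed behavior with nothing but a block.  This is only a sketch: the &lt;code&gt;fill_defaults()&lt;/code&gt; helper is a hypothetical name of my own, not &lt;code&gt;ActiveSupport&lt;/code&gt;'s code.  The block simply hands back the receiver's value, so existing keys win every tie.&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;
```ruby
# Hypothetical helper (not ActiveSupport's code): only fill in missing keys.
def fill_defaults(options, defaults)
  # The block runs only for keys present in both Hashes;
  # returning the receiver's value keeps what the caller already set.
  options.merge(defaults) { |_key, mine, _default| mine }
end

options = fill_defaults({b: :two, c: 3}, {a: 1, b: 2})

p options[:a]  # 1, filled in from the defaults
p options[:b]  # :two, the caller's value survives the conflict
p options[:c]  # 3, untouched
```
&lt;/pre&gt;&lt;/div&gt;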

&lt;h4&gt;Easy Tokenizing&lt;/h4&gt;

&lt;p&gt;Let's do one last method with a rich interface (even though Ruby has many more):&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"1,2,3"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;","&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is another very common method.  It turns a &lt;code&gt;String&lt;/code&gt; into an &lt;code&gt;Array&lt;/code&gt; by dividing up the contents everywhere the passed separator is encountered.  I used a &lt;code&gt;String&lt;/code&gt; separator above but a &lt;code&gt;Regexp&lt;/code&gt; is also allowed:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"1, 2, 3"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\s*,\s*/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This makes it easier to handle complex separators.  For example, the &lt;code&gt;Regexp&lt;/code&gt; above permits optional whitespace characters on either side of the comma.&lt;/p&gt;

&lt;p&gt;But a &lt;code&gt;Regexp&lt;/code&gt; can include capture groups.  How are they handled?&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"1, 2, 3"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\s*(,)\s*/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;","&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;","&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Easy enough:  the captured value(s) are returned with the separated contents.&lt;/p&gt;

&lt;p&gt;The real question this raises for me is, "What the heck is this feature good for?"  Well, one thing I have found over the years is that this usage of &lt;code&gt;split()&lt;/code&gt; can make dividing some input into tokens pretty darn easy:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;xml&amp;gt;&amp;lt;tags&amp;gt;Content&amp;lt;/tags&amp;gt;&amp;lt;/xml&amp;gt;"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/(&amp;lt;[^&amp;gt;]+&amp;gt;)/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;xml&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;tags&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Content"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;/tags&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;/xml&amp;gt;"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can use this one feature as a backbone for a moderately complex parser.  &lt;a href="https://github.com/soveran/mote/blob/b43b3879076dade130aac8c34b76cb06caf26e35/lib/mote.rb#L23-L26"&gt;&lt;em&gt;Mote&lt;/em&gt; does&lt;/a&gt; just that.&lt;/p&gt;
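
&lt;p&gt;To give a feel for how far that backbone can go, here's a rough sketch of a tiny template renderer.  The &lt;code&gt;render()&lt;/code&gt; method and the &lt;code&gt;{{name}}&lt;/code&gt; placeholder syntax are invented for this example and are not taken from &lt;em&gt;Mote&lt;/em&gt;.  The capture group keeps the placeholders in the token stream, and everything else is plain text.&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;
```ruby
# Hypothetical mini renderer built on split() with a capture group.
def render(template, values)
  # The capture group makes split() return the placeholders as tokens too.
  tokens = template.split(/(\{\{\w+\}\})/)
  tokens.map { |token|
    # Pull the name out of a placeholder token; plain text yields nil here.
    name = token[/\A\{\{(\w+)\}\}\z/, 1]
    name ? values.fetch(name.to_sym).to_s : token
  }.join
end

greeting = render(
  "Hello, {{name}}! You have {{count}} new messages.",
  {name: "Ruby", count: 3}
)

puts greeting  # Hello, Ruby! You have 3 new messages.
```
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Using &lt;code&gt;fetch()&lt;/code&gt; means a placeholder with no matching value raises a &lt;code&gt;KeyError&lt;/code&gt; instead of silently rendering nothing.&lt;/p&gt;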

&lt;p&gt;I made &lt;a href="https://codalyzed.com/videos/lesscode"&gt;a video explaining how this parsing trick (and more) is accomplished&lt;/a&gt; in detail.  You can use the coupon &lt;code&gt;BLOGREADER&lt;/code&gt; for $3 off if you want to check it out.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>"You can't parse [X]HTML with regex."</title>
    <link rel="alternate" href="http://graysoftinc.com/deadly-regular-expressions/you-cant-parse-x-html-with-regex"/>
    <id>tag:graysoftinc.com,2014-09-19:/posts/129</id>
    <updated>2014-09-19T17:02:46Z</updated>
    <summary>Everyone knows you can't parse HTML with a regular expression.  It's totally a given at this point.  So let's do it anyway.</summary>
    <content type="html">&lt;p&gt;The only explanation I'll give for the following code is to provide this link to &lt;a href="http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454"&gt;my favorite Stack Overflow answer&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;#!/usr/bin/env ruby -w&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"open-uri"&lt;/span&gt;

&lt;span class="no"&gt;URL&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"http://stackoverflow.com/questions/1732348/"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
         &lt;span class="s2"&gt;"regex-match-open-tags-except-xhtml-self-contained-tags"&lt;/span&gt;
&lt;span class="no"&gt;PARSER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;%r{&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;doctype_declaration&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt;!DOCTYPE\b (?&amp;lt;doctype&amp;gt; [^&amp;gt;]* ) &amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;comment&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt;!-- .*? --&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;

&lt;span class="sr"&gt;  (?&amp;lt;script_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt; \s* (?&amp;lt;tag_name&amp;gt; script ) \s* (?&amp;lt;attributes&amp;gt; [^&amp;gt;]* &amp;gt; )&lt;/span&gt;
&lt;span class="sr"&gt;      (?&amp;lt;script&amp;gt; .*? )&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt; \s* / \s* script \s* &amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;self_closed_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt; \s* (?&amp;lt;tag_name&amp;gt; \w+ ) \s* (?&amp;lt;attributes&amp;gt; [^&amp;gt;]* / \s* &amp;gt; )&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;unclosed_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt; \s*&lt;/span&gt;
&lt;span class="sr"&gt;    (?&amp;lt;tag_name&amp;gt; link | meta | br | input | hr | img ) \b&lt;/span&gt;
&lt;span class="sr"&gt;    \s*&lt;/span&gt;
&lt;span class="sr"&gt;    (?&amp;lt;attributes&amp;gt; [^&amp;gt;]* &amp;gt; )&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;open_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt; \s* (?&amp;lt;tag_name&amp;gt; \w+ ) \s* (?&amp;lt;attributes&amp;gt; [^&amp;gt;]* &amp;gt; )&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;close_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    &amp;lt; \s* / \s* (?&amp;lt;tag_name&amp;gt; \w+ ) \s* &amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;

&lt;span class="sr"&gt;  (?&amp;lt;attribute&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    (?&amp;lt;attribute_name&amp;gt; [-\w]+ )&lt;/span&gt;
&lt;span class="sr"&gt;    (?: \s* = \s* (?&amp;lt;attribute_value&amp;gt; "[^"]*" | '[^']*' | [^&amp;gt;\s]+ ) )? \s*&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;
&lt;span class="sr"&gt;  (?&amp;lt;attribute_list&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;attribute&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    (?= [^&amp;gt;]* &amp;gt; \z )  # attributes keep a trailing &amp;gt; to disambiguate from text&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;

&lt;span class="sr"&gt;  (?&amp;lt;text&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    (?! [^&amp;lt;]* /?\s*&amp;gt; \z )  # a guard to prevent this from parsing attributes&lt;/span&gt;
&lt;span class="sr"&gt;    [^&amp;lt;]+&lt;/span&gt;
&lt;span class="sr"&gt;  ){0}&lt;/span&gt;

&lt;span class="sr"&gt;  \G&lt;/span&gt;
&lt;span class="sr"&gt;  (?:&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;doctype_declaration&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;comment&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;script_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;self_closed_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;unclosed_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;open_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;attribute_list&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;close_tag&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;    |&lt;/span&gt;
&lt;span class="sr"&gt;    \g&amp;lt;text&amp;gt;&lt;/span&gt;
&lt;span class="sr"&gt;  )&lt;/span&gt;
&lt;span class="sr"&gt;  \s*&lt;/span&gt;
&lt;span class="sr"&gt;}mix&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;stack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="ss"&gt;:root&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="kp"&gt;loop&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PARSER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:doctype_declaration&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;add_to_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"DOCTYPE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:doctype&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:script_tag&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;add_to_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tag_name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:script&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:self_closed_tag&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:unclosed_tag&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:open_tag&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;add_to_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:tag_name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:open_tag&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:close_tag&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;
    &lt;span class="k"&gt;elsif&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
      &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:contents&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:text&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_to_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;include?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;is_a?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="n"&gt;branch&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_to_stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attributes_html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kp"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="ss"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;parse_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attributes_html&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="ss"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:empty?&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="n"&gt;tag_name&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="n"&gt;add_to_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:contents&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt;
  &lt;span class="n"&gt;stack&lt;/span&gt;                 &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tag&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attributes_html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kp"&gt;loop&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="n"&gt;attributes_html&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;PARSER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="n"&gt;add_to_tree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attribute_name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attribute_value&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="vg"&gt;$~&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attribute_name&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\A(["'])(.*)\1\z/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'\2'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="n"&gt;attributes&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_to_bbcode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_a?&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;Hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:name&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/\Astrike\z/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"s"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:contents&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__method__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.join}[/&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="nb"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]"&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="ss"&gt;:read&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;
&lt;span class="n"&gt;ast&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"html"&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"body"&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"class"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"container"&lt;/span&gt;      &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;    &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"content"&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;    &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"mainbar"&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;    &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"answers"&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;    &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"answer-1732454"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"table"&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"tr"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"td"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;div&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:attributes&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"class"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"answercell"&lt;/span&gt;     &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"div"&lt;/span&gt;&lt;span class="o"&gt;][&lt;/span&gt;&lt;span class="s2"&gt;"p"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="ss"&gt;:contents&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:convert_to_bbcode&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# to reach a wider audience&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;pre&gt;&lt;code&gt;$ ruby html_parser.rb
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex.
Regex is not a tool that can be used to correctly parse HTML. As I have
answered in HTML-and-regex questions here so many times before, the use of
regex will not allow you to consume HTML. Regular expressions are a tool that
is insufficiently sophisticated to understand the constructs employed by HTML.
HTML is not a regular language and hence cannot be parsed by regular
expressions. Regex queries are not equipped to break down HTML into its
meaningful parts. so many times but it is not getting to me. Even enhanced
irregular regular expressions as used by Perl are not up to the task of parsing
HTML. You will never make me crack. HTML is a language of sufficient
complexity that it cannot be parsed by regular expressions. Even Jon Skeet
cannot parse HTML using regular expressions. Every time you attempt to
parse HTML with regular expressions, the unholy child weeps the blood of
virgins, and Russian hackers pwn your webapp. Parsing HTML with regex
summons tainted souls into the realm of the living. HTML and regex go
together like love, marriage, and ritual infanticide. The &amp;amp;lt;center&amp;gt; cannot hold
it is too late. The force of regex and HTML together in the same conceptual space
will destroy your mind like so much watery putty. If you parse HTML with regex
you are giving in to Them and their blasphemous ways which doom us all to
inhuman toil for the One whose Name cannot be expressed in the Basic
Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the
sentient whilst you observe, your psyche withering in the onslaught of horror.
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow [i]it is too
late it is too late we cannot be saved[/i]the trangession of a chi͡ld ensures regex
will consume all living tissue (except for HTML which it cannot, as previously
prophesied) [i]dear lord help us how can anyone survive this scourge[/i]using
regex to parse HTML has doomed humanity to an eternity of dread torture and
security holes [i]using rege[/i]x as a tool to process HTML establishes a brea[i]ch
between this world[/i]and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities,
but [i]more corrupt) a mere glimp[/i]se of the world of reg​[b]ex parsers for HTML
will ins[/b]​tantly transport a p[i]rogrammer's consciousness i[/i]nto a w[i]orl[/i]d of
ceaseless screaming, he comes[s], the pestilent sl[/s]ithy regex-infection wil​[b]l
devour your HT[/b]​ML parser, application and existence for all time like Visual
Basic only worse [i]he comes he com[/i]es [i]do not fi[/i]​ght h[b]e com̡e̶s, ̕h̵i[/b]​s
un̨ho͞ly radiańcé de[i]stro҉ying all enli̍̈́̂̈́ghtenment, HTML tags [b]lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur
eye͢s̸ ̛l̕ik͏e liq[/b]​uid p[/i]ain, the song of re̸gular exp​re[s]ssion parsing [/s]will
exti[i]​nguish the voices of mor​[b]tal man from the sp[/b]​here I can see it can you
see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​[/i]he f[code]inal snuf[/code]fing o[i]f the lie​[b]s of Man ALL IS
LOŚ͖̩͇̗̪̏̈́T A[/b][/i][b]LL I​S L[/b]OST th[i]e pon̷y he come[/i]s he c̶̮om[s]es he
co[/s][b][s]me[/s]s t[i]he[/i]ich​[/b]or permeat[i]es al[/i]l MY FAC[i]E MY FACE ᵒh god
n[b]o NO NOO̼[/b][/i][b]O​O N[/b]Θ stop t[i]he an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨ[/i]e̠̅s[code]͎a̧͈͖r̽̾̈́͒͑e[/code]n[b]​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆
ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ T[/b]O͇̹̺ͅƝ̴ȳ̳ TH̘[b]Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝[/b]S̨̥̫͎̭ͯ̿̔̀ͅ
&lt;/code&gt;&lt;/pre&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>Doing it Wrong</title>
    <link rel="alternate" href="http://graysoftinc.com/rubies-in-the-rough/doing-it-wrong"/>
    <id>tag:graysoftinc.com,2011-11-11:/posts/108</id>
    <updated>2014-04-24T19:22:55Z</updated>
    <summary>This article takes a hard look at why the rules of programming exist, what we can learn from them, and how and when to go about safely breaking them.</summary>
    <content type="html">&lt;p&gt;Continuing with my &lt;em&gt;Breaking All of the Rules&lt;/em&gt; series, I want to peek into several little areas where I've been caught doing the wrong thing.  I'm a rule breaker and I'm determined to take someone down with me!&lt;/p&gt;

&lt;h4&gt;My Forbidden Parser&lt;/h4&gt;

&lt;p&gt;In one application, I work with an API that hands me very simple data like this:&lt;/p&gt;

&lt;div class="highlight highlight-xml"&gt;&lt;pre&gt;&lt;span class="nt"&gt;&amp;lt;emails&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;user1@example.com&lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;user2@example.com&lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;user3@example.com&lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  …
&lt;span class="nt"&gt;&amp;lt;/emails&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now I need to make a dirty confession:  I parsed this with a Regular Expression.&lt;/p&gt;

&lt;p&gt;I know, I know.  We should &lt;a href="http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454"&gt;never parse HTML or XML with a Regular Expression&lt;/a&gt;.  If you don't believe me, just take a moment to actually read that response.  Yikes!&lt;/p&gt;

&lt;p&gt;Oh and you shouldn't validate emails with a Regular Expression.  Oops.  We're talking about at least two violations here.&lt;/p&gt;

&lt;p&gt;But it gets worse.&lt;/p&gt;

&lt;p&gt;You may think I rolled a little parser based on Regular Expressions.  That might look like this:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;#!/usr/bin/env ruby -w&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"strscan"&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EmailParser&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@scanner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;StringScanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parse_emails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_emails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="vi"&gt;@scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;%r{\s*&amp;lt;emails&amp;gt;\s*}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nb"&gt;fail&lt;/span&gt; &lt;span class="s2"&gt;"Failed to match list start"&lt;/span&gt;
    &lt;span class="kp"&gt;loop&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
      &lt;span class="n"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="vi"&gt;@scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;%r{\s*&amp;lt;/emails&amp;gt;}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nb"&gt;fail&lt;/span&gt; &lt;span class="s2"&gt;"Failed to match list end"&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="vi"&gt;@scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;%r{&amp;lt;email&amp;gt;\s*}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="vi"&gt;@scanner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan_until&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;%r{&amp;lt;/email&amp;gt;\s*}&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;.&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kp"&gt;true&lt;/span&gt;
      &lt;span class="k"&gt;else&lt;/span&gt;
        &lt;span class="nb"&gt;fail&lt;/span&gt; &lt;span class="s2"&gt;"Failed to match email end"&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="kp"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="no"&gt;EmailParser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ARGF&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you aren't familiar with &lt;code&gt;StringScanner&lt;/code&gt;, it's a standard Ruby library that just wraps a &lt;code&gt;String&lt;/code&gt; and tracks your current position in that &lt;code&gt;String&lt;/code&gt;.  You can then throw Regular Expressions at the wrapped &lt;code&gt;String&lt;/code&gt;, like I did here with &lt;code&gt;scan()&lt;/code&gt; and &lt;code&gt;scan_until()&lt;/code&gt;.  Whenever an expression matches, your position is advanced past it.  Future expressions will then be tested at the new position.  A failed match has no effect on the position.&lt;/p&gt;
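
&lt;p&gt;The position tracking described above can be seen in a minimal sketch.  The sample string and variable names here are made up for illustration; only the &lt;code&gt;StringScanner&lt;/code&gt; calls themselves come from the standard library.&lt;/p&gt;

```ruby
require "strscan"

# Hypothetical sample input, chosen only to show how the position moves.
scanner = StringScanner.new("name: James")

word       = scanner.scan(/\w+/)  # matches "name" at the current position
after_word = scanner.pos          # the position has advanced past the match

miss       = scanner.scan(/\d+/)  # no digits at this position, so this is nil
after_miss = scanner.pos          # a failed match leaves the position alone

skipped    = scanner.scan_until(/:\s*/)  # consumes through the end of the match
remainder  = scanner.rest                # whatever has not been scanned yet

p word, after_word, miss, after_miss, skipped, remainder
```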

&lt;p&gt;With that understanding, the example above should be pretty easy to follow.  It just looks for the start of the emails list, works through each email in the list, and then ensures the list is closed properly.  I take a &lt;code&gt;block&lt;/code&gt; when you kick off the &lt;code&gt;parse()&lt;/code&gt; and call it each time I find a matching email to facilitate iterating over the emails.&lt;/p&gt;

&lt;p&gt;That's great to know, but it's not what I did.  I said I parsed it with &lt;strong&gt;a&lt;/strong&gt; Regular Expression.  My real solution looked more like this:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;#!/usr/bin/env ruby -wKU&lt;/span&gt;

&lt;span class="no"&gt;ARGF&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/[^&amp;gt;\s@]+@[^\s@]+\.[^\s@&amp;lt;]+/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you have any sense, you should be scared blind by this point.  Fair enough, but let's talk it through.  Then you can lynch me.&lt;/p&gt;

&lt;p&gt;My Regular Expression just hunts through the data for something that roughly looks like an email address.  It's super low tech.&lt;/p&gt;

&lt;p&gt;There are also new violations in this example, for those keeping score:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;My email-like expression doesn't match some valid emails&lt;/li&gt;
&lt;li&gt;With two minor character exceptions, I'm ignoring the XML completely&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Alright, it's time to actually address all of these violations that are piling up.  Let's start with:  don't validate emails with a Regular Expression.&lt;/p&gt;

&lt;p&gt;I'm not.&lt;/p&gt;

&lt;p&gt;Man, that was easy.  One down.&lt;/p&gt;

&lt;p&gt;No, seriously, I'm not.  I'm just hunting for emails in some data.  Perhaps the API validates them before it sends them down.  Perhaps we will go on to validate the addresses after we have them, most likely by actually sending a message to the address.  None of this is relevant to what I'm doing here.  I'm just working with what I was given.&lt;/p&gt;

&lt;p&gt;As always though, I would rather ask why this rule exists.  What is it trying to protect me from?  In this case, it's two things.  First, it's difficult to match a well-formed email address.  The specification is quite complex.  I bet you could do it with a Regular Expression (I've never tried), but it's not going to be pretty.&lt;/p&gt;

&lt;p&gt;Honestly, I think that's an argument in my favor.  I don't worry about validating at all, because you shouldn't.  I look for an email-ish thing and leave validation to the code properly suited to the task.&lt;/p&gt;

&lt;p&gt;On a related note, my email-like expression isn't perfect.  It doesn't match the entire email specification.  For example, &lt;code&gt;james@localhost&lt;/code&gt; is a valid email address that I don't match.  However, that address would be for a specific machine I couldn't send email to anyway.  Given that, I'm inclined to call this a feature instead of a bug.&lt;/p&gt;

&lt;p&gt;I'm not sure what the specification says about emails containing &amp;lt; or &amp;gt;, to be painfully honest.  But they should be escaped in XML anyway, so I get a bye on that issue.&lt;/p&gt;

&lt;p&gt;Parsing XML with a Regular Expression?  Again, I'm going to argue that it never happened.  We already decided that I ignored the XML, right?  So I'm off the hook there.&lt;/p&gt;

&lt;p&gt;Still, why is it that we are expected to use a real parser over anything else?  Well, XML can be complicated, right?  I mean, this XML is still a valid representation of the same list:&lt;/p&gt;

&lt;div class="highlight highlight-xml"&gt;&lt;pre&gt;&lt;span class="nt"&gt;&amp;lt;emails&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;
    user1@example.com
  &lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;
    user2@example.com
  &lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;
    user3@example.com
  &lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  …
&lt;span class="nt"&gt;&amp;lt;/emails&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A parser just handles differences like that for you.  But you know what?  So does my solution.  Try it!&lt;/p&gt;

&lt;p&gt;OK, but the XML could really change.  Let's say it became this:&lt;/p&gt;

&lt;div class="highlight highlight-xml"&gt;&lt;pre&gt;&lt;span class="nt"&gt;&amp;lt;users&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;user&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;first&amp;gt;&lt;/span&gt;User&lt;span class="nt"&gt;&amp;lt;/first&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;last&amp;gt;&lt;/span&gt;One&lt;span class="nt"&gt;&amp;lt;/last&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;user1@example.com&lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/user&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;user&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;first&amp;gt;&lt;/span&gt;User&lt;span class="nt"&gt;&amp;lt;/first&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;last&amp;gt;&lt;/span&gt;Two&lt;span class="nt"&gt;&amp;lt;/last&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;user2@example.com&lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/user&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;user&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;first&amp;gt;&lt;/span&gt;User&lt;span class="nt"&gt;&amp;lt;/first&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;last&amp;gt;&lt;/span&gt;Three&lt;span class="nt"&gt;&amp;lt;/last&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;email&amp;gt;&lt;/span&gt;user3@example.com&lt;span class="nt"&gt;&amp;lt;/email&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;/user&amp;gt;&lt;/span&gt;
  …
&lt;span class="nt"&gt;&amp;lt;/users&amp;gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now that's an interesting change, because I think a parser-based solution would almost certainly need updates to handle it.  However, my solution still works just the same.&lt;/p&gt;

&lt;p&gt;In fact, since I am ignoring the XML, I can handle non-XML data such as this boring text list:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;user1@example.com
user2@example.com
user3@example.com
…
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Not that my solution is perfect, of course.  It wouldn't handle a CSV list without changes, for example.  Still, it does seem surprisingly flexible.&lt;/p&gt;

&lt;p&gt;In the end though, it comes down to one point for me.  If this thing breaks, I'm out a few minutes of work.  It was very little effort to set up and it worked fine for our needs at the time.  If that changes, I can always replace it.&lt;/p&gt;

&lt;p&gt;This just didn't seem like a huge issue.  Given that, I didn't want to spend a lot of energy on it unless I was sure it was needed.  A solution should generally be in the same scale as the problem itself, in my opinion.&lt;/p&gt;

&lt;h4&gt;There's a Wrong Time for Everything&lt;/h4&gt;

&lt;p&gt;Let's move on to a separate issue.&lt;/p&gt;

&lt;p&gt;You know how you should schedule all of that pesky background work for off-peak hours so you don't hassle the users?  Well, I'm not very good at that either.  Have a look at these two &lt;code&gt;/etc/crontab&lt;/code&gt; entries from this application's server:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;00 17 * * *  deploy  … rake payments:schedule_due …
30 20 * * *  deploy  … rake reports:email_admin_statistics …
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;These times are UTC, and if you adjust for that, the first one lands right around lunchtime where I live in the middle of Central Time (11 AM or 12 PM depending on Daylight Saving Time).  That means I'm doing the busy work right in the middle of the day.  In fact, I chose that time because I am in the middle of the U.S. and adjusting for the time zones around me should still make it mid-morning or mid-afternoon.  I wanted to make sure it's in the middle of most users' days.&lt;/p&gt;

&lt;p&gt;Why?  Well, that first job triggers the submission of due payments.  There are a few steps for that process, but it will generally be resolved within a couple of hours.  That means anything in the process requiring user attention, like a failed payment, should surface when the user has a decent chance of being available to do something about it.&lt;/p&gt;

&lt;p&gt;To further that goal, the second job kicks in mid-afternoon and sends me a report of how everything went today.  If something does need addressing, I'm probably still at work to handle it.  My users are probably still around too, should I need to discuss anything with them.&lt;/p&gt;
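&lt;p&gt;The conversions behind those two entries can be checked with a minimal Ruby sketch.  It assumes fixed offsets for Central Time (UTC-6 in winter, UTC-5 during Daylight Saving Time) instead of consulting a time zone database, and the &lt;code&gt;central_time()&lt;/code&gt; helper is a name invented for this illustration:&lt;/p&gt;

```ruby
# Hypothetical helper:  convert a UTC hour and minute (as written in the
# crontab) to a wall clock time at a fixed offset.  The date is arbitrary.
def central_time(utc_hour, utc_minute, offset)
  Time.utc(2015, 1, 1, utc_hour, utc_minute).localtime(offset).strftime("%H:%M")
end

# Assumed fixed offsets; a real application would look these up instead.
offsets = {"standard" => "-06:00", "daylight" => "-05:00"}

[[17, 0], [20, 30]].each do |hour, minute|
  standard = central_time(hour, minute, offsets["standard"])
  daylight = central_time(hour, minute, offsets["daylight"])
  puts format("%02d:%02d UTC is %s standard or %s daylight",
              hour, minute, standard, daylight)
end
```

&lt;p&gt;That puts the payment job at 11:00 or 12:00 local time and the report at 14:30 or 15:30.&lt;/p&gt;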

&lt;p&gt;You get the idea.  We do work on off-times so it doesn't bother our users.  However, if that work might need to involve users, doing it on their time can totally make sense.&lt;/p&gt;

&lt;h4&gt;Serial Storage Violations&lt;/h4&gt;

&lt;p&gt;If I will violate parsing and processing laws, you just know my database usage has to be suspect.  You're right.&lt;/p&gt;

&lt;p&gt;One of the rules for relational data storage is that we shouldn't serialize a bunch of data into some field.  One way we sometimes see that done in Rails is with code like this:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Transaction&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ActiveRecord&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Base&lt;/span&gt;
  &lt;span class="n"&gt;serialize&lt;/span&gt; &lt;span class="ss"&gt;:data&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;By default that will convert whatever Ruby object is assigned to &lt;code&gt;data&lt;/code&gt; into YAML for storage.  Then, when you retrieve the record back, the &lt;code&gt;data&lt;/code&gt; field will be restored to its (likely non-&lt;code&gt;String&lt;/code&gt;) form.  You could even arrange to have the serialization done in a binary format or whatever you like; YAML is just the default.&lt;/p&gt;
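&lt;p&gt;The round trip itself can be approximated in plain Ruby.  This is only a sketch of the idea using the standard &lt;code&gt;yaml&lt;/code&gt; and &lt;code&gt;json&lt;/code&gt; libraries, not what Rails does internally, and the field names are made up:&lt;/p&gt;

```ruby
require "yaml"
require "json"

# Hypothetical fields standing in for whatever an API might return.
fields = {"status" => "approved", "reference" => "abc-123"}

# Roughly what happens on save:  the object becomes a String for the column.
stored = YAML.dump(fields)

# Roughly what happens on load:  the String becomes a Hash again.
restored = YAML.load(stored)

puts stored.class          # String
puts restored == fields    # true

# Swapping the format only changes the two conversion calls.
as_json = JSON.generate(fields)
puts JSON.parse(as_json) == fields    # true
```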

&lt;p&gt;Why don't we want to store data like that?  Mainly because it can make the data difficult, or even impossible, to query on.  That's painful because we would be giving up one of the primary reasons to use a relational database in the first place.&lt;/p&gt;

&lt;p&gt;Of course, there's always an assumption hidden in rules like this.  This time it is:  you will need to query by the serialized data.&lt;/p&gt;

&lt;p&gt;In the example shown above, it's not going to happen.  A &lt;code&gt;Transaction&lt;/code&gt; object in that system records fields passed to the application in response to API calls.  An API typically gives us quite a bit of data and we typically care about just a couple of fields.  Of course, we may want that data someday.  You never know.&lt;/p&gt;

&lt;p&gt;To handle that, a &lt;code&gt;Transaction&lt;/code&gt; is created for each API response.  A &lt;code&gt;Hash&lt;/code&gt; of all provided fields is stuffed into &lt;code&gt;data&lt;/code&gt; and serialized into the database.  Then the system goes on to access just the fields it cares about and ignore everything else, knowing it was tucked away for safe keeping.&lt;/p&gt;

&lt;p&gt;If we do ever need to pull those &lt;code&gt;Transaction&lt;/code&gt; objects, we sure as heck won't be doing it by data we ignored.  We wouldn't know what that is to use it in a query.  Instead, each &lt;code&gt;Transaction&lt;/code&gt; is tied to the relevant related record with a separate foreign key field.  We would have to query on that to have any kind of context.&lt;/p&gt;
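&lt;p&gt;Here is a small sketch of that pattern with an in-memory &lt;code&gt;Array&lt;/code&gt; standing in for the database table.  The &lt;code&gt;TransactionLog&lt;/code&gt; class, the &lt;code&gt;payment_id&lt;/code&gt; key, and the response fields are all hypothetical names chosen for illustration:&lt;/p&gt;

```ruby
require "yaml"

# An in-memory stand-in for a transactions table.  Each row keeps a foreign
# key (the hypothetical payment_id) next to the serialized response.
class TransactionLog
  def initialize
    @rows = []
  end

  # Tuck away every field the API sent, whether we use it or not.
  def record(payment_id, fields)
    @rows.push({payment_id: payment_id, data: YAML.dump(fields)})
  end

  # Look rows up by the foreign key, never by the serialized contents.
  def for_payment(payment_id)
    @rows.select { |row| row[:payment_id] == payment_id }
         .map    { |row| YAML.load(row[:data]) }
  end
end

log      = TransactionLog.new
response = {"status" => "approved", "amount" => "25.00 USD", "batch" => "B-42"}
log.record(7, response)

# The application acts on the one field it cares about...
puts response.fetch("status")             # approved

# ...and the rest is still there if it is ever needed.
puts log.for_payment(7).first["batch"]    # B-42
```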

&lt;p&gt;Plus, we get a big bonus for using this approach.  The API can change the fields it is giving us and that likely wouldn't affect anything.  Going one step further, we could switch to a totally different API and this setup wouldn't need to change at all.  In this case, not tying ourselves to specific fields gives us flexibility and only costs us what we don't need.  It's a double win.&lt;/p&gt;

&lt;h4&gt;Yes, There's a Point&lt;/h4&gt;

&lt;p&gt;Rules are great.  I love them and I hand them out as much as any other programmer.  Most of the time, they will help you.&lt;/p&gt;

&lt;p&gt;Programmers need to get into the habit of dissecting rules though.  You always need to be asking, "Why are they telling me this?"&lt;/p&gt;

&lt;p&gt;Once you understand the reasoning, you can follow the rule when it makes sense, but put it aside when it's not needed.  That will open up all sorts of new options for you.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
  <entry>
    <title>Ghost Wheel Example</title>
    <link rel="alternate" href="http://graysoftinc.com/my-projects/ghost-wheel-example"/>
    <id>tag:graysoftinc.com,2007-11-18:/posts/41</id>
    <updated>2014-04-05T14:56:38Z</updated>
    <summary>Showing the Ghost Wheel approach to the recently popular Treetop example.</summary>
    <content type="html">&lt;p&gt;There has been a fair bit of buzz around the &lt;a href="https://github.com/nathansobo/treetop"&gt;Treetop parser&lt;/a&gt; in the Ruby community lately.  Part of that is fueled by the nice &lt;a href="http://www.pivotalblabs.com/files/treetop-arithmetic-example.mov"&gt;screencast&lt;/a&gt; that shows off how to use the parser generator.&lt;/p&gt;

&lt;p&gt;It doesn't get talked about as much, but I wrote a parser generator too, called &lt;a href="http://rubygems.org/gems/ghostwheel"&gt;Ghost Wheel&lt;/a&gt;.  Probably the main reason Ghost Wheel doesn't receive much attention yet is that I have been slow in getting the documentation written.  Given that, I thought I would show how the code built in the Treetop screencast translates to Ghost Wheel:&lt;/p&gt;

&lt;div class="highlight highlight-ruby"&gt;&lt;pre&gt;&lt;span class="c1"&gt;#!/usr/bin/env ruby -wKU&lt;/span&gt;

&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"rubygems"&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"ghost_wheel"&lt;/span&gt;

&lt;span class="c1"&gt;# define a parser using Ghost Wheel's Ruby DSL&lt;/span&gt;
&lt;span class="no"&gt;RubyParser&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;GhostWheel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;build_parser&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;:additive&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;:multiplicative&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:additive_op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:additive&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="o"&gt;[-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
             &lt;span class="ss"&gt;:multiplicative&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:additive_op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"+"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;:multiplicative&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;:primary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:multiplicative_op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="ss"&gt;:multiplicative&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;].&lt;/span&gt;&lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mul&lt;/span&gt;&lt;span class="o"&gt;[-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
             &lt;span class="ss"&gt;:primary&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:multiplicative_op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:primary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:parenthized_additive&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:number&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="ss"&gt;:parenthized_additive&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"("&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:additive&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;")"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;par&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;par&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/[1-9][0-9]*|0/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:space&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sr"&gt;/\s*/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:exp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ss"&gt;:additive&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eof&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="c1"&gt;# define a parser using Ghost Wheel's grammar syntax&lt;/span&gt;
&lt;span class="no"&gt;GrammarParser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;GhostWheel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;build_parser&lt;/span&gt; &lt;span class="sx"&gt;%q{&lt;/span&gt;
&lt;span class="sx"&gt;  additive             =  multiplicative space additive_op space additive&lt;/span&gt;
&lt;span class="sx"&gt;                          { ast[0].send(ast[2], ast[-1]) }&lt;/span&gt;
&lt;span class="sx"&gt;                       |  multiplicative&lt;/span&gt;
&lt;span class="sx"&gt;  additive_op          =  "+" | "-"&lt;/span&gt;

&lt;span class="sx"&gt;  multiplicative       =  primary space multiplicative_op space multiplicative&lt;/span&gt;
&lt;span class="sx"&gt;                          { ast[0].send(ast[2], ast[-1])}&lt;/span&gt;
&lt;span class="sx"&gt;                       |  primary&lt;/span&gt;
&lt;span class="sx"&gt;  multiplicative_op    =  "*" | "/"&lt;/span&gt;

&lt;span class="sx"&gt;  primary              = parenthized_additive | number&lt;/span&gt;
&lt;span class="sx"&gt;  parenthized_additive =  "(" space additive space ")" { ast[2] }&lt;/span&gt;
&lt;span class="sx"&gt;  number               =  /[1-9][0-9]*|0/ { Integer(ast) }&lt;/span&gt;

&lt;span class="sx"&gt;  space                =  /\s*/&lt;/span&gt;
&lt;span class="sx"&gt;  exp                  := additive EOF { ast[0] }&lt;/span&gt;
&lt;span class="sx"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;__FILE__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="vg"&gt;$PROGRAM_NAME&lt;/span&gt;
  &lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"test/unit"&lt;/span&gt;

  &lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TestArithmetic&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;Test&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Unit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;TestCase&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parsing_numbers&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt;         &lt;span class="s2"&gt;"0"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt;         &lt;span class="s2"&gt;"1"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt;         &lt;span class="s2"&gt;"123"&lt;/span&gt;
      &lt;span class="n"&gt;assert_does_not_parse&lt;/span&gt; &lt;span class="s2"&gt;"01"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parsing_multiplicative&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1*2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1 * 2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1/2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1 / 2"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parsing_additive&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1+2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1 + 2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1-2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1 - 2"&lt;/span&gt;

      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1*2 + 3 * 4"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parsing_parenthized_expressions&lt;/span&gt;
      &lt;span class="n"&gt;assert_parses&lt;/span&gt; &lt;span class="s2"&gt;"1 * (2 + 3) * 4"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_parse_results&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"123"&lt;/span&gt;

      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1*2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1 * 2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1/2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1 / 2"&lt;/span&gt;

      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1+2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1 + 2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1-2"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1 - 2"&lt;/span&gt;

      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1*2 + 3 * 4"&lt;/span&gt;
      &lt;span class="n"&gt;assert_correct_result&lt;/span&gt; &lt;span class="s2"&gt;"1 * (2 + 3) * 4"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="kp"&gt;private&lt;/span&gt;

    &lt;span class="no"&gt;PARSERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="no"&gt;RubyParser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;GrammarParser&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assert_parses&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="no"&gt;PARSERS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="n"&gt;assert_nothing_raised&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;GhostWheel&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;FailedParseError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
          &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;end&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assert_does_not_parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="no"&gt;PARSERS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
        &lt;span class="n"&gt;assert_raises&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;GhostWheel&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;FailedParseError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assert_correct_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="no"&gt;PARSERS&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;assert_equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
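
&lt;p&gt;The transformation blocks in both parsers above lean on &lt;code&gt;send()&lt;/code&gt; accepting an operator name.  The following plain Ruby sketch does not use Ghost Wheel at all.  The arrays are built by hand, assuming a five-part sequence yields the left operand, a space, the operator, a space, and the right operand, which is what the indexes in the code above imply:&lt;/p&gt;

```ruby
# A hand-built stand-in for what a five-part sequence is assumed to yield:
# left operand, space, operator, space, right operand.
ast = [1, " ", "+", " ", 2]

# Operators are ordinary methods in Ruby, so send() can call them by name.
puts ast[0].send(ast[2], ast[-1])    # 3

# Because each rule recurses on its right side, the right operand is reduced
# first.  For "8 - 3 - 2" that would mean 8 - (3 - 2).
inner = [3, " ", "-", " ", 2]
outer = [8, " ", "-", " ", inner[0].send(inner[2], inner[-1])]
puts outer[0].send(outer[2], outer[-1])    # 7
```

&lt;p&gt;Ruby itself evaluates &lt;code&gt;8 - 3 - 2&lt;/code&gt; as 3, so chained subtraction or division is worth a test of its own if you adapt this grammar.&lt;/p&gt;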

&lt;p&gt;The primary differences you should note between the above code and Treetop are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I show two different ways to build the parser using Ghost Wheel:  using a Ruby DSL and using a grammar syntax.  I prefer the grammar syntax in this and, in fact, most cases.  The Ruby DSL can be handy when you want the AST transformations to be true closures though.&lt;/li&gt;
&lt;li&gt;Ghost Wheel builds on your regular expression knowledge.  Note that the grammar syntax is regex-like and you can even match &lt;code&gt;Regexp&lt;/code&gt; literals.&lt;/li&gt;
&lt;li&gt;Ghost Wheel's AST transformations are more Lispish compared to Treetop's very object-oriented syntax.  I think they both have strengths in certain scenarios.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;There's still plenty I want to do with Ghost Wheel, but maybe this will begin the process of getting the word out about it.  Feel free to post questions here.&lt;/p&gt;</content>
    <author>
      <name>James Edward Gray II</name>
    </author>
  </entry>
</feed>
