Experimenting With DATA
James Edward Gray II, 2012-01-11
<p>This article dives into just one little feature of Ruby, but shows the various ways that it inspires us and has been abused over time.</p>
<p>In the last article, I talked about the importance of a culture that encourages experimentation. It's hard to fiddle with something and not gain a better understanding of how it works. That knowledge is valuable to us programmers. I mentioned, though, that the way Perl programmers experiment is not the same way we Rubyists do it. Let me show you some actual Ruby experimentation I've witnessed over the years…</p>
<h4>Executing Your Email</h4>
<p>Some of Ruby's features are fairly obscure. Even worse, some of us who use those obscure features try to bend them to even stranger purposes. This is one way Rubyists like to experiment. Ironically, the features I'm going to talk about in this article are inherited from Perl.</p>
<p>Ruby can literally use your email as an executable program. Assume I have the following saved in a file called <code>email.txt</code>:</p>
<pre><code>Dear Nuby:
I just thought you would like to know what the Hello World program looks
like in Ruby. Here's the code:
#!/usr/bin/env ruby -w
puts "Hello world!"
__END__
I hope the simplicity of that inspires you to learn more.
May Ruby Be With You,
Ruby Jedi
</code></pre>
<p>If we ask nicely, Ruby will happily execute the code in that email:</p>
<pre><code>$ ruby -x email.txt
Hello world!
</code></pre>
<p>Now, there are two features that make this possible. The less interesting one is the <code>-x</code> switch I fed to the Ruby interpreter. It throws away all contents of the passed program up to the "shebang line" (<code>#!</code>) that mentions <code>ruby</code>. That's why Ruby ignored the top of the email message.</p>
<p><code>__END__</code>, the reason Ruby ignored the rest of the message, is a totally different story.</p>
<h4>Getting Your DATA</h4>
<p>If your Ruby program contains a line that is just <code>__END__</code>, a couple of things happen. First, Ruby treats that line as the end of the program's source, so anything that follows the special marker is never parsed or run. The other effect is that Ruby opens a special <code>IO</code>-like object (it's usually a <code>File</code> object, but it may be more generic if Ruby is reading the program from <code>stdin</code>), positions it just <em>after</em> the special marker, and places it in the <code>DATA</code> constant.</p>
<p>There's kind of a lot going on there, so let's walk through an example:</p>
<pre><code>$ cat data.rb
p DATA.read
__END__
Some data.
$ ruby data.rb
"Some data.\n"
</code></pre>
<p>Here's what happened:</p>
<ul>
<li>Ruby ignored what came after <code>__END__</code> (the <code>Some data.</code> line)</li>
<li>Ruby opened a <code>File</code> object and positioned it just before the <code>S</code>
</li>
<li>The code accessed that object via the <code>DATA</code> constant</li>
</ul><p>Hopefully that adequately explains this slightly odd feature. I believe the intended usage is for form-letter type content, like this:</p>
<pre><code>$ cat generate_thanks.rb
require "erb"
unless ARGV.size >= 2
abort "USAGE: #{$PROGRAM_NAME} NAME PURCHASE1 [PURCHASE2 ...]"
end
name = ARGV.shift
purchases = ARGV
letter = ERB.new(DATA.read, trim_mode: "%")
letter.run(binding)
__END__
Dear <%= name %>:
Thank you for the recent purchase of:
% purchases.each do |purchase|
* <%= purchase %>
% end
We hope these products don't steal too much of your work time.
The Distraction Team
$ ruby generate_thanks.rb James Skyrim Catherine
Dear James:
Thank you for the recent purchase of:
* Skyrim
* Catherine
We hope these products don't steal too much of your work time.
The Distraction Team
</code></pre>
<p>As you can see, it's nice not to have the code cluttered up with the huge letter <code>String</code>. Using <code>__END__</code> we can keep the two separate, but still let them interact via <code>DATA</code>.</p>
<p>If you don't recognize the <code>ERB</code> template I used above, it's still the same template engine Rails uses. I just turned on a "trim mode" to allow full lines of code starting with a percent sign (<code>%</code>). You can do the same for templates in Rails, if you like:</p>
<div class="highlight highlight-ruby"><pre><span class="n">config</span><span class="o">.</span><span class="n">action_view</span><span class="o">.</span><span class="n">erb_trim_mode</span> <span class="o">=</span> <span class="s2">"%"</span>
</pre></div>
<p><em>[<strong>Update</strong>: the above was true in old versions of Rails. Changes to the templating system in newer versions dropped this feature and <a href="https://github.com/rails/rails/pull/5915">the Rails core team elected not to restore it</a>.]</em></p>
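<p>If you'd rather get the letter back as a <code>String</code> (to test it, say) than print it, here's a minimal sketch assuming Ruby 2.6 or later, where <code>ERB.new</code> takes the trim mode as a <code>trim_mode:</code> keyword and <code>ERB#result_with_hash</code> supplies the template's local variables. The template lives in a constant instead of after <code>__END__</code>, and the <code>thank_you_letter</code> name is hypothetical.</p>

```ruby
require "erb"

# The same form letter as above, kept in a heredoc instead of DATA.
# With trim_mode "%", whole lines starting with % are treated as Ruby code.
LETTER = <<~'TEMPLATE'
  Dear <%= name %>:
  Thank you for the recent purchase of:
  % purchases.each do |purchase|
  * <%= purchase %>
  % end
  We hope these products don't steal too much of your work time.
  The Distraction Team
TEMPLATE

# Hypothetical helper: renders the letter and returns it as a String.
def thank_you_letter(name, purchases)
  ERB.new(LETTER, trim_mode: "%").result_with_hash(name: name, purchases: purchases)
end

print thank_you_letter("James", %w[Skyrim Catherine])
```

<p>Returning a <code>String</code> keeps rendering separate from output; the original script's <code>run</code> simply prints the same result.</p>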
<h4>A Cheat</h4>
<p>This feature is often used to cheat at writing a quine, a program that outputs its own source. Since <code>DATA</code> is an open handle on the program's own source file, we can rewind it to the beginning and read away. Observe:</p>
<pre><code>$ cat quine.rb
print DATA.tap(&:rewind).read
__END__
DO NOT DELETE: needed for DATA
$ ruby quine.rb
print DATA.tap(&:rewind).read
__END__
DO NOT DELETE: needed for DATA
</code></pre>
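<p>The process is simple. First, we need to backtrack <code>DATA</code> to the beginning of the file. We use <code>rewind()</code> for that, but it returns <code>0</code> rather than the <code>IO</code> itself, which is no help for chaining. We discard that return value with the help of <code>tap()</code>, which passes <code>DATA</code> to <code>rewind()</code> and then hands back <code>DATA</code>. Then we can just <code>read()</code> the source and <code>print()</code> it back out.</p>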
<p>It's worth noting that this really is a cheat. Most definitions of a quine forbid <code>IO</code> operations for what are now likely very obvious reasons. Still, it's interesting just how easily <code>DATA</code> cuts through this challenge.</p>
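<p>A quick way to check the quine claim is to run the program and compare its output with its source. Here's a minimal sketch using the standard <code>open3</code>, <code>rbconfig</code>, and <code>tempfile</code> libraries. It writes the script to a temporary file first, because <code>DATA</code> only exists when the file containing <code>__END__</code> is the program Ruby was asked to run. The <code>run_script</code> helper is hypothetical.</p>

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# The quine from above. Inside a heredoc, __END__ is ordinary text,
# so this file keeps running past it.
QUINE = <<~'RUBY'
  print DATA.tap(&:rewind).read
  __END__
  DO NOT DELETE: needed for DATA
RUBY

# Hypothetical helper: saves +source+ as a temporary .rb file, runs it with
# the current Ruby interpreter, and returns whatever it printed to stdout.
def run_script(source)
  Tempfile.create(["quine", ".rb"]) do |file|
    file.write(source)
    file.flush
    output, status = Open3.capture2(RbConfig.ruby, file.path)
    raise "script exited with #{status.exitstatus}" unless status.success?
    output
  end
end

puts(run_script(QUINE) == QUINE ? "It's a quine." : "Not a quine.")
```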
<h4>A Built-in Lock</h4>
<p>For some reason, this silly language feature seems to inspire us programmers. I have seen multiple Rubyists twist it to strange purposes. My favorite example comes from Daniel Berger (who stole it from those Perl guys).</p>
<p>Say you have a program that you only wish to run one copy of at any given time. There are many reasons you might need this, but a common one is that the script is run as a Cron job. It may do something like read records from a database and send emails to your users. If there are a ton to send and your system is bogged down, this could take a while. You don't want Cron to kick in another copy before the job finishes, because it might cause users to be emailed twice.</p>
<p>This is usually handled with a complex dance of having the process write out a PID file when it starts up and remove it as it finishes. When a new process starts, it can check for the existence of that PID file and exit without doing any work if it is still there. Of course, the original process may die without properly cleaning up the file, so processes that find the file should probably search the process table to make sure a job with that process ID is still running. If it isn't, they should ignore the PID file and start anyway.</p>
<p>That's the tried and true system because it works, but it's also a pain to code up and get all of the edge cases right. Look at this trivial recreation:</p>
<pre><code>$ cat exclusive.rb
DATA.flock(File::LOCK_EX | File::LOCK_NB) or abort "Already running."
trap("INT", "EXIT")
puts "Running..."
loop do
sleep
end
__END__
DO NOT DELETE: used for locking
$ ruby exclusive.rb
Running...
^Z
[1]+ Stopped ruby exclusive.rb
$ ruby exclusive.rb
Already running.
$ fg
ruby exclusive.rb
^C$ ruby exclusive.rb
Running...
</code></pre>
<p>This is essentially a one-liner that handles all of the scenarios above. We add a meaningless <code>__END__</code> section so Ruby opens the <code>File</code> object for us. Then we grab an exclusive file lock on that object. We tell <code>flock()</code> not to block waiting on that lock, so if another process already holds it, <code>flock()</code> returns <code>false</code> right away and hands off to our <code>abort()</code> call.</p>
<p>The rest of the code is just to make the example cleaner. The <code>trap()</code> call makes an interrupt signal from <code>⌃C</code> exit quietly. After that, the loop just sleeps forever to keep the process alive.</p>
<p>Now focus on the examples. I run the program and suspend it with a <code>⌃Z</code>. Note how it won't let me start a second copy after that, until I bring the original process back to the foreground and halt it.</p>
<p>The beauty of this system is that we don't have to do any cleanup. The operating system will remove the file lock as our process exits, even if it's because we crashed. That's ideal.</p>
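<p>You can watch a non-blocking lock turn away a second taker without juggling terminals. In this minimal sketch (the <code>try_lock</code> helper and the temporary file are illustrative), one process opens the same file twice. On Linux, <code>flock(2)</code> treats separate opens of a file independently, so the second <code>File</code> is refused just as a second process would be. Closing the first <code>File</code>, which the OS also does when a process exits, frees the lock.</p>

```ruby
require "tempfile"

# Hypothetical helper: opens +path+ and tries to take an exclusive lock
# without waiting. Returns the open File while it holds the lock (keep a
# reference to it!), or nil if someone else already has the lock.
def try_lock(path)
  file = File.open(path)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file
  else
    file.close
    nil
  end
end

Tempfile.create("lock_demo") do |tmp|
  first = try_lock(tmp.path)
  puts "first:  #{first ? 'locked' : 'busy'}"
  second = try_lock(tmp.path)
  puts "second: #{second ? 'locked' : 'busy'}"
  first.close # closing the handle releases its lock
  third = try_lock(tmp.path)
  puts "third:  #{third ? 'locked' : 'busy'}"
  third&.close
end
```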
<p>Never do the work you can push off on others.</p>
<p>That pattern comes up over and over again in programming. A lot of people complain that Ruby leaks memory (the truth of that is complex and for another post). They claim Ruby is not usable for a long-running process due to this leaking. Even if the complaint were true, the conclusion doesn't follow. I use this pattern when I want a long-running Ruby process:</p>
<ul>
<li>Write the simplest event loop I can that just pulls jobs and assigns workers</li>
<li>Fork a process for each worker, do the work, then exit</li>
</ul><p>Ruby can run forever like this. Why? Because <code>exit()</code> is the ultimate garbage collector. Properly cleaning up after yourself is hard. That's what operating systems are for. Leave that job to the pro.</p>
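<p>Here's a minimal sketch of the fork-a-worker half of that pattern, assuming a Unix-like system where <code>fork</code> is available. The <code>run_in_child</code> helper, the <code>SCRATCH</code> array, and the job names are hypothetical. The child sends its result back over a pipe, and anything it allocates, including changes to shared objects, disappears when it exits.</p>

```ruby
# Hypothetical helper: runs the block for +job+ in a forked child process
# and returns the String the child sends back over a pipe.
def run_in_child(job)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(yield(job).to_s)
    writer.close
    exit!(0) # skip at_exit handlers and buffered output inherited from the parent
  end
  writer.close
  result = reader.read # returns once the child's end of the pipe is closed
  reader.close
  Process.wait(pid)
  result
end

SCRATCH = [] # stands in for memory a long-running process might pile up

%w[alpha beta gamma].each do |job|
  answer = run_in_child(job) do |name|
    SCRATCH << name * 10_000 # garbage only the child ever holds
    name.upcase
  end
  puts "#{job} -> #{answer} (parent still holds #{SCRATCH.size} items)"
end
```

<p>The parent's <code>SCRATCH</code> stays empty no matter how much the children stuff into their copies, which is the point: <code>exit()</code> does the cleanup.</p>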
<h4>Carrying Your DATA With You</h4>
<p>I have done my own experiments with <code>__END__</code> and <code>DATA</code>. My efforts have been about actually storing content in <code>DATA</code>. Yes, I mean both reading and writing to it.</p>
<p>For example, let's say that I have some program that works on a Git repository. It runs through the various commits and does something expensive with them. We will say that it calculates some metrics and perhaps checks out the code for each SHA to do that. Plus, it stores the results somewhere else that we're not going to worry about for the sake of this example. But we don't want it to store duplicates.</p>
<p>If we pretend that <code>DATA</code> is read-write (it's not really intended for that), we can just toss the last SHA we worked with there. Each time, we work forward from that SHA and update it to the latest.</p>
<p>Here's the code to do something like that:</p>
<div class="highlight highlight-ruby"><pre><span class="c1"># remember position of DATA</span>
<span class="n">pos</span> <span class="o">=</span> <span class="no">DATA</span><span class="o">.</span><span class="n">pos</span>
<span class="c1"># read the last SHA processed</span>
<span class="n">last</span> <span class="o">=</span> <span class="no">DATA</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">to_s</span><span class="o">.</span><span class="n">strip</span>
<span class="c1"># work with the SHA after the last (all on first run)</span>
<span class="no">Dir</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="no">ARGV</span><span class="o">.</span><span class="n">first</span> <span class="o">||</span> <span class="no">Dir</span><span class="o">.</span><span class="n">pwd</span><span class="p">)</span> <span class="k">do</span>
<span class="n">command</span> <span class="o">=</span> <span class="s2">"git rev-list --reverse HEAD"</span>
<span class="n">command</span> <span class="o"><<</span> <span class="s2">" ^</span><span class="si">#{</span><span class="n">last</span><span class="si">}</span><span class="s2">"</span> <span class="k">if</span> <span class="n">last</span><span class="o">.</span><span class="n">size</span> <span class="o">==</span> <span class="mi">40</span>
<span class="n">shas</span> <span class="o">=</span> <span class="sb">`</span><span class="si">#{</span><span class="n">command</span><span class="si">}</span><span class="sb">`</span><span class="o">.</span><span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="o">&</span><span class="ss">:strip</span><span class="p">)</span>
<span class="n">shas</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">sha</span><span class="o">|</span>
<span class="nb">puts</span> <span class="s2">"Checking out </span><span class="si">#{</span><span class="n">sha</span><span class="o">[</span><span class="sr">/\A.{7}/</span><span class="o">]</span><span class="si">}</span><span class="s2">, calculating metrics, "</span> <span class="o">+</span>
<span class="s2">"and storing results..."</span>
<span class="k">end</span>
<span class="n">last</span> <span class="o">=</span> <span class="n">shas</span><span class="o">.</span><span class="n">last</span>
<span class="k">end</span>
<span class="c1"># write out the last SHA we processed</span>
<span class="k">if</span> <span class="n">last</span>
<span class="no">DATA</span><span class="o">.</span><span class="n">reopen</span><span class="p">(</span><span class="bp">__FILE__</span><span class="p">,</span> <span class="s2">"r+"</span><span class="p">)</span>
<span class="no">DATA</span><span class="o">.</span><span class="n">truncate</span><span class="p">(</span><span class="n">pos</span><span class="p">)</span>
<span class="no">DATA</span><span class="o">.</span><span class="n">seek</span><span class="p">(</span><span class="n">pos</span><span class="p">)</span>
<span class="no">DATA</span><span class="o">.</span><span class="n">puts</span> <span class="n">last</span>
<span class="k">end</span>
<span class="cp">__END__</span>
<span class="cp">NONE</span>
</pre></div>
<p>The biggest chunk of work above is in the middle and you can safely ignore that section since it's just me talking to Git and faking some work. The interesting bits are at the beginning and the end.</p>
<p>The first trick is to memorize where <code>DATA</code> starts out (right after <code>__END__</code>), because that's the point we need to go back to when we want to update the SHA. After we've memorized that key position, we load the previous SHA (if any) and do the work.</p>
<p>At the end, I have to handle the fact that <code>DATA</code> is really just for reading. To do that, I <code>reopen()</code> it for reading and writing. Then it's a simple matter of replacement, since I memorized the magic position number:</p>
<ul>
<li>Lop off the end of the file after the position</li>
<li>Move the write head to the new end (the position again)</li>
<li>Append the new SHA</li>
</ul><p>After I run that code on the repository for this site, the SHA is updated:</p>
<pre><code>$ cat walk_git_commits.rb | grep -A 1 __END__ walk_git_commits.rb
__END__
NONE
$ ruby walk_git_commits.rb ../Documents/subinterest
Checking out 9490ee4, calculating metrics, and storing results...
Checking out f2ce511, calculating metrics, and storing results...
Checking out dfaded9, calculating metrics, and storing results...
...
$ grep -A 1 __END__ walk_git_commits.rb
__END__
50cb651fa11de417c0db7978127ba8d06aa67f06
</code></pre>
<p>If I run it again immediately, it doesn't do any work (because the last recorded SHA is the latest). I can also manually update the SHA, if I want to start from some arbitrary point.</p>
<p>I view the current SHA as metadata for the code in this case, so this approach allows it to live with the code. If I email this script to a coworker, it will pick up at the right place in our shared repository.</p>
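<p>The remember, truncate, seek, and write steps don't depend on <code>DATA</code> itself, so they can be tried on any file. In this minimal sketch, a temporary file stands in for the script, and <code>replace_tail</code> is a hypothetical helper that takes a byte offset like the <code>pos</code> the script memorized.</p>

```ruby
require "tempfile"

# Hypothetical helper: replaces everything after byte offset +pos+ in the
# file at +path+ with +value+ plus a newline, like the end of the script above.
def replace_tail(path, pos, value)
  File.open(path, "r+") do |file|
    file.truncate(pos) # lop off the end of the file after the position
    file.seek(pos)     # move the write head to the new end
    file.puts value    # append the new value
  end
end

Tempfile.create("walk_demo") do |tmp|
  tmp.write("puts 'code'\n__END__\n")
  pos = tmp.pos # the offset DATA.pos would report: just past __END__
  tmp.write("NONE\n")
  tmp.flush

  replace_tail(tmp.path, pos, "50cb651fa11de417c0db7978127ba8d06aa67f06")
  print File.read(tmp.path)
end
```

<p>Truncating first matters: without it, a shorter new value would leave the tail of the old one behind.</p>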
<p>You can use this trick in other areas. For example, S3 objects can have custom headers associated with them. You can squirrel away some metadata in these fields, keeping it with the object it relates to. I like this better in many cases than needing to match an S3 object to a separate database record in order to have the full picture of what I'm looking at.</p>
<h4>No Magic Here</h4>
<p>One important thing to realize about these tricks is how unmagical they really are. Can you do the source locking trick without <code>DATA</code>? Sure. You just change this code:</p>
<div class="highlight highlight-ruby"><pre><span class="no">DATA</span><span class="o">.</span><span class="n">flock</span><span class="p">(</span><span class="no">File</span><span class="o">::</span><span class="no">LOCK_EX</span> <span class="o">|</span> <span class="no">File</span><span class="o">::</span><span class="no">LOCK_NB</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">abort</span> <span class="s2">"Already running."</span>
</pre></div>
<p>into this:</p>
<div class="highlight highlight-ruby"><pre><span class="c1"># hold on to the File so it isn't garbage collected (and unlocked)</span>
<span class="no">LOCK_FILE</span> <span class="o">=</span> <span class="no">File</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="bp">__FILE__</span><span class="p">)</span>
<span class="no">LOCK_FILE</span><span class="o">.</span><span class="n">flock</span><span class="p">(</span><span class="no">File</span><span class="o">::</span><span class="no">LOCK_EX</span> <span class="o">|</span> <span class="no">File</span><span class="o">::</span><span class="no">LOCK_NB</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">abort</span> <span class="s2">"Already running."</span>
</pre></div>
<p>We can open the file ourselves and lock it. If we do, we don't even need the special <code>__END__</code> marker.</p>
<p>It's the same with my rewriting example. I could just work with the file manually, though I would need to find the <code>__END__</code> token myself, and that's a bit of work if I want to do it as well as Ruby does.</p>
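<p>Finding the marker by hand might start like this minimal sketch. The <code>data_offset</code> helper is hypothetical and deliberately naive: it looks for the first line that is exactly <code>__END__</code>, so unlike Ruby's parser it would be fooled by such a line inside a heredoc or multi-line string.</p>

```ruby
require "tempfile"

# Hypothetical helper: returns the byte offset just past the first line of the
# file at +path+ that is exactly __END__ (where Ruby would position DATA),
# or nil if there is no such line. Naive: it knows nothing about heredocs.
def data_offset(path)
  File.open(path, "rb") do |file|
    file.each_line do |line|
      return file.pos if line.chomp == "__END__"
    end
  end
  nil
end

Tempfile.create(["script", ".rb"]) do |tmp|
  tmp.write("puts 'hi'\n__END__\nNONE\n")
  tmp.flush
  pos = data_offset(tmp.path)
  p pos                             # => 18
  p File.binread(tmp.path)[pos..-1] # => "NONE\n"
end
```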
<p>The point isn't that <code>DATA</code> makes these hacks possible. It's that the feature inspired programmers to find these hacks. Inspiration is powerful. That's why it's so key that many programmers say they enjoy working with Ruby. A design goal of the language was to be friendly to humans. That matters. It helps us think and play.</p>
<p>I've shown you some historical experimentation in Ruby. This may or may not appeal to you. That's fine. We all enjoy different things and find inspiration in different places. The important thing is that you do find yours and exercise it.</p>
<h4>Pop Quiz: One Last DATA Trick</h4>
<p>When I first showed the <code>__END__</code> marker, did you think you had seen it before in Sinatra? It uses the same feature, right?</p>
<p>Yes and no.</p>
<p>The problem is that a Sinatra application may not be the file executed. For example, the server may get kicked off with <code>rackup</code>. If it does, <code>rackup</code> is the executed file, and <code>DATA</code> could only be set by an <code>__END__</code> marker in that file. There can only be one <code>DATA</code>, after all. Other <code>__END__</code> markers still work for ignoring content, and Sinatra definitely counts on that, but it cannot use <code>DATA</code>.</p>
<p>Given that, can you puzzle out how it seems to do the same thing? Take your best guess, then <a href="https://github.com/sinatra/sinatra/blob/e111243e813ede1f0f4c6918d9a8cc029e776fc3/lib/sinatra/base.rb#L1049">check to see if you are right</a>.</p>
Perl's Golf Culture
James Edward Gray II, 2012-01-01
<p>This article takes a quick look at some crazy Perl code and reflects on why you might want to write it.</p>
<p>I'm stealing some time to write this while on vacation. I am also under the weather. Given that, we'll make this article short and easier on me to think up. That's not always a bad thing, though. There are plenty of simple concepts I would like to get across. For example, let's talk about how Perl programmers do what it is they do.</p>
<h4>Ruby's Sister Language</h4>
<p>I spent plenty of time in the Perl camps and I really learned a lot about programming there. That may shock you to hear, because Perl programmers often get a bad rap from the rest of the programming community.</p>
<p>One reason they catch a lot of flak is that their language is often terse to the point of obscurity. We joke that Perl is a "write only" language or too hard for other developers to read. That would be bad enough on its own, but Perl programmers seem to intentionally make this worse.</p>
<p>Perl programmers love to play the programmer's version of golf: writing a program with the fewest possible keystrokes. To shrink their program's size, they will resort to every dirty trick in the book, including:</p>
<ul>
<li>Using one-letter variable names</li>
<li>Eliminating all unneeded whitespace</li>
<li>Using and abusing global variables</li>
<li>Using magic numbers that only work on the desired subset of a problem</li>
<li>Intentionally avoiding the abstractions that keep us sane</li>
<li>Throwing as much code in every statement as the language allows</li>
</ul><p>I could go on and on, but perhaps it's better to just show you what I mean. Here's a golfed program that prints the first 32 numbers in the Fibonacci sequence:</p>
<pre><code>$ perl -e '$a=1;print$a-=$b+=$a*=-1,$/for 0..31'
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
1346269
</code></pre>
<p>If you know a little Perl, spend some time working out how that does its job. It took me about 20 minutes. I learned some great tricks too.</p>
<p>I'm not going to take you all the way there, but I'll give you some hints for how I broke that program down. This is your last chance to try it yourself, and learn more, before I start ruining it for you…</p>
<p>The biggest trick to understanding alien code is to modify it. Making changes allows us to observe what is happening. For example, I thought I remembered that <code>$/</code> was a special Perl variable, but I couldn't remember what it contained. To find out, I just replaced it:</p>
<pre><code>$ perl -e '$a=1;print$a-=$b+=$a*=-1,"-"for 0..31'
0-1-1-2-3-5-8-13-…1346269-
</code></pre>
<p>That reminded me that <code>$/</code> holds the input record separator, which defaults to a newline.</p>
<p>It didn't look like the <code>for</code> loop was doing much, so I tried to yank it out:</p>
<pre><code>$ perl -e '$a=1;print$a-=$b+=$a*=-1,$/'
0
</code></pre>
<p>I was right. It was causing the code to repeat, but the value of the loop wasn't being used.</p>
<p>Finally, I started trying to figure out the math transformations, bit by bit:</p>
<pre><code>$ perl -e '$a = 1; $a *= -1; print $a, "\n"'
-1
$ perl -e '$a = 1; $b += $a *= -1; print $b, "\n"'
-1
$ perl -e '$a = 1; $a -= $b += $a *= -1; print $a, "\n", $b, "\n"'
0
-1
</code></pre>
<p>You get the idea.</p>
<h4>Our Way</h4>
<p>I can translate that program into Ruby, but it loses something along the way, in my opinion.</p>
<p>Our version of <code>…for 0..31</code> is a minimum of <code>32.times{…}</code>. We also can't use Perl's autovivification of the <code>$b</code> variable, so we will need to initialize it manually. Finally, we need to break the math into two chunks, due to a difference in Ruby's assignment semantics. On the upside, we can switch from <code>print</code> to <code>puts</code> and drop some <code>$</code>'s.</p>
<p>Our code is the same size, but I feel it has lost some of its Perlish qualities:</p>
<pre><code>$ ruby -e 'a,b=1,0;32.times{b+=a*=-1;puts a-=b}'
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
10946
17711
28657
46368
75025
121393
196418
317811
514229
832040
1346269
</code></pre>
<p>The most surprising difference in the Ruby code is how I had to break up the math. It doesn't work correctly if I don't do that. Here's why:</p>
<pre><code>$ perl -e '$a = 1; $a -= $b += $a *= -1; print $a, "\n", $b, "\n"'
0
-1
$ ruby -e 'a, b = 1, 0; a -= b += a *= -1; puts a, b'
2
-1
</code></pre>
<p>As you can see, Perl and Ruby handle that last assignment differently. Both of them multiply <code>a</code>'s <code>1</code> by <code>-1</code>, assign the result back to <code>a</code>, add <code>-1</code> to <code>b</code>'s <code>0</code>, and assign that back to <code>b</code>. For the last step, Perl subtracts <code>-1</code> from the modified value of <code>a</code>: <code>-1</code> from the rightmost assignment. Ruby, on the other hand, subtracts <code>-1</code> from the original value of <code>a</code>: <code>1</code>. It looks like Ruby handles all of the variable dereferencing before it ever starts the assignments, so every read of <code>a</code> in the expression sees its original value. Perl seems to lazily resolve the variables, so it picks up that <code>a</code> changed in the same expression.</p>
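<p>One way to see that ordering on its own: Ruby treats <code>a -= x</code> as <code>a = a - x</code> and evaluates operands left to right, so it reads <code>a</code> before it evaluates <code>x</code>, even when <code>x</code> assigns to <code>a</code>. This small sketch isolates that and then replays the golfed expression step by step (the <code>original_a</code> name is just for illustration).</p>

```ruby
a = 1
a -= (a = 5) # reads a (1) first, then evaluates (a = 5), then assigns 1 - 5
p a          # => -4

# The golfed expression from above, step by step:
a, b = 1, 0
original_a = a     # Ruby has already read a's value: 1
a *= -1            # a is now -1
b += a             # b is now -1
a = original_a - b # 1 - (-1)
p [a, b]           # => [2, -1]
```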
<p>This difference makes sense when you think about it. Ruby has multiple assignment and it allows us to swap variables with it. For example, this does what you expect in Ruby:</p>
<pre><code>a, b = b, a
</code></pre>
<p>It makes sense that Ruby needs to dereference all of those variables before it handles that assignment.</p>
<p>Perl can do the same trick, but you have to use a different syntax to kick in its multiple assignment behavior. Without it, Perl seems to just lazily build up the references as needed.</p>
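<p>Ruby's evaluate-the-right-side-first rule for multiple assignment is also what makes the usual readable Fibonacci loop work: in <code>a, b = b, a + b</code>, both new values are computed from the old ones before either variable changes. Here's a minimal sketch (the <code>fibonacci</code> method is hypothetical) that produces the same 32 numbers as the golfed versions.</p>

```ruby
# Hypothetical helper: returns the first +count+ Fibonacci numbers, starting at 0.
def fibonacci(count)
  a, b = 0, 1
  Array.new(count) do
    current = a
    a, b = b, a + b # the right side is evaluated with the old a and b
    current
  end
end

puts fibonacci(32)
```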
<h4>Why We Do Crazy Things</h4>
<p>This silly exercise brings me to a point: Rubyists hate golf. Sure, some of us play, but there's no comparing the two cultures. It's huge in Perl and an afterthought in Ruby.</p>
<p>There are several reasons for that. First, as we saw above, Ruby fights being bent that way a bit more than Perl does. Also, Rubyists tend to place a super high value on readable code, so it's just against our nature to try and make it ugly.</p>
<p>However, you need to reconcile this in your mind: Perl's golfers, the good ones anyway, are generally great programmers. I know many of them and they are real experts. How can that be?</p>
<p>I've got one word for you: experimentation.</p>
<p>When Perl programmers are playing golf, they are experimenting with their language. It's hard to do that and not learn a ton. Heck, I learned quite a bit just translating 36 bytes of Perl above. I didn't expect to be bitten by how Ruby assigns variables, for example, and it took me a bit to work out what was different there.</p>
<p>I won't use that knowledge for evil. I'm not going to try collapsing my Rails applications down to one line. This exercise didn't make me think all variables should be one letter. I'm still the same programmer, but I'm now armed with more knowledge.</p>
<p>Someday I may find myself talking to a Rubyist who says something like, "I just don't understand what Ruby is doing with these assignments." I'll know and I'll be able to help them, all because I screwed around with a Perl one-liner today.</p>
<p>You don't have to play golf, but experimenting to learn more is important. Doing quizzes, katas, and the like can teach you a ton. Remember that programmers are always measured by how many ideas they can come up with. The more ways you can think of to attack a problem the better off you are. Experimentation gives you a head start on idea generation.</p>
<p>In the next article, I'll show you a very Rubyish form of experimentation and what we can learn from it…</p>