<p>Tokyo Cabinet's Key-Value Database Types, by James Edward Gray II (Gray Soft, 2010-01-10; updated 2014-04-19)</p>
<p>This article digs deeper into the capabilities of Tokyo Cabinet. B+Tree and Fixed-length Databases are discussed, as well as how to tune the database types to your specific needs.</p>
<p>We've taken a good look at Tokyo Cabinet's Hash Database, but there's a lot more to the library than just that. Tokyo Cabinet supports three other kinds of databases. In addition, each database type accepts various tuning parameters that can be used to change its behavior. Each database type and setting involves different tradeoffs, so you really have a lot of options for turning Tokyo Cabinet into exactly what you need. Let's look into some of those options now.</p>
<h4>The B+Tree Database</h4>
<p>Tokyo Cabinet's B+Tree Database is a little slower than the Hash Database we looked at before. That's its downside. However, giving up a little speed gains you several extra features that may just allow you to work smarter instead of faster.</p>
<p>The B+Tree Database is a more advanced form of the Hash Database. What that means is that all of the stuff I showed you in the last article still applies. You can set, read, and remove values by keys, iteration is supported, and you still have access to the neat options like adding to counters. With a B+Tree Database you get all of that and more.</p>
<p>The first major addition is that a B+Tree Database is ordered. You don't really need to do anything to turn this on; it's just the way it is. As you add pairs to the database, they will be ordered by the keys you use. The default ordering is lexical:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"ordered.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:c</span><span class="o">]</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:a</span><span class="o">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:b</span><span class="o">]</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">db</span><span class="o">.</span><span class="n">to_a</span> <span class="c1"># => [["a", "1"], ["b", "2"], ["c", "3"]]</span>
<span class="k">end</span>
</pre></div>
<p>This simple example shows us a couple of things. First, creating a B+Tree Database is as simple as changing the file extension. Remember when I said the <code>.tch</code> stood for <b>T</b>okyo <b>C</b>abinet <b>H</b>ash Database? Well, it shouldn't be too surprising that <code>.tcb</code> stands for <b>T</b>okyo <b>C</b>abinet <b>B</b>+Tree Database. Oklahoma Mixer will notice which extension you use and load the right features for that database type.</p>
<p>The other thing to notice here is the ordering. I purposely added the keys out of order, but you can see that <code>to_a()</code> shows them all lined up correctly. Now <code>to_a()</code> is really just an iterator the database object inherits from <code>Enumerable</code>, so we now know that iteration will be in database order. Methods like <code>keys()</code> and even <code>values()</code> will return their listings in order as well.</p>
<p>As I said, the default ordering is lexical, so number keys are a little strange:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"lexical.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="mi">1</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:first</span>
<span class="n">db</span><span class="o">[</span><span class="mi">2</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:middle</span>
<span class="n">db</span><span class="o">[</span><span class="mi">11</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:last</span>
<span class="n">db</span><span class="o">.</span><span class="n">to_a</span> <span class="c1"># => [["1", "first"], ["11", "last"], ["2", "middle"]]</span>
<span class="k">end</span>
</pre></div>
<p>Notice they don't come out in the order we would probably think is most natural, the order I described in the values. To fix that, we need to change the default ordering, and you can do that using a tuning parameter of the B+Tree Database. We are allowed to set a <em>comparison function</em> when we <code>open()</code> the database that will order the keys however we desire. This function is just like a block you would pass to <code>sort()</code> in Ruby: it will be handed two keys at a time to compare and it is expected to return a negative number, zero, or a positive number for the first argument being less than, equal to, or greater than the second. The good news is, you can usually cheat your way out of remembering these comparison rules by leaning on Ruby's <em>spaceship operator</em>:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span> <span class="s2">"numerical.tcb"</span><span class="p">,</span>
<span class="ss">:cmpfunc</span> <span class="o">=></span> <span class="nb">lambda</span> <span class="p">{</span> <span class="o">|</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="o">|</span> <span class="n">a</span><span class="o">.</span><span class="n">to_i</span> <span class="o"><=></span> <span class="n">b</span><span class="o">.</span><span class="n">to_i</span> <span class="p">}</span> <span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="mi">1</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:first</span>
<span class="n">db</span><span class="o">[</span><span class="mi">2</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:middle</span>
<span class="n">db</span><span class="o">[</span><span class="mi">11</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:last</span>
<span class="n">db</span><span class="o">.</span><span class="n">to_a</span> <span class="c1"># => [["1", "first"], ["2", "middle"], ["11", "last"]]</span>
<span class="k">end</span>
</pre></div>
<p>This example shows how tuning parameters get set with Oklahoma Mixer. Just pass some keyword arguments to <code>open()</code> for each parameter you need to adjust. This allows Oklahoma Mixer to perform the needed setup before connecting to your database. That's critical for things like a B+Tree comparison function that have to be set before the database is accepting data.</p>
<p>It's worth noting that the comparison function is not stored in the database file and needs to be reset (to the same function if you want to avoid unpredictable results) each time you <code>open()</code> that database.</p>
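<p>If you want to check a comparison function before trusting a database to it, you can try it with plain Ruby first. The following is just a sketch that needs no database at all: the constant names and the <code>sorted_with()</code> helper are made up for this illustration, and <code>sort()</code> merely stands in for the ordering the B+Tree Database would apply.</p>

```ruby
# Keys come back from the database as Strings, so model them that way.
KEYS = %w[1 2 11]

# The default ordering: plain String comparison, character by character.
LEXICAL = lambda { |a, b| a <=> b }

# A custom ordering: convert to Integer before comparing.
NUMERICAL = lambda { |a, b| a.to_i <=> b.to_i }

# Hypothetical helper: sort keys with a two-argument comparison function.
def sorted_with(keys, cmpfunc)
  keys.sort { |a, b| cmpfunc.call(a, b) }
end

p sorted_with(KEYS, LEXICAL)    # => ["1", "11", "2"]
p sorted_with(KEYS, NUMERICAL)  # => ["1", "2", "11"]
p NUMERICAL.call("2", "11")     # => -1
```

<p>The last line shows the contract at work: a negative result means the first key sorts before the second.</p>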
<p>OK, enough about ordering. What else do we get with the B+Tree Database?</p>
<p>You also get key ranges. Since the database has an inherent order, we're no longer limited to <code>:prefix</code> searches of the <code>keys()</code> and we can now ask for all of the keys between two endpoints:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"ranges.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="ss">:a</span> <span class="o">=></span> <span class="mi">1</span><span class="p">,</span> <span class="ss">:b</span> <span class="o">=></span> <span class="mi">2</span><span class="p">,</span> <span class="ss">:c</span> <span class="o">=></span> <span class="mi">3</span><span class="p">,</span> <span class="ss">:d</span> <span class="o">=></span> <span class="mi">4</span><span class="p">,</span> <span class="ss">:e</span> <span class="o">=></span> <span class="mi">5</span><span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">keys</span><span class="p">(</span><span class="ss">:range</span> <span class="o">=></span> <span class="s2">"ab"</span><span class="o">.</span><span class="n">.</span><span class="s2">"d"</span><span class="p">)</span> <span class="c1"># => ["b", "c", "d"]</span>
<span class="n">db</span><span class="o">.</span><span class="n">keys</span><span class="p">(</span><span class="ss">:range</span> <span class="o">=></span> <span class="s2">"ab"</span><span class="o">.</span><span class="n">.</span><span class="o">.</span><span class="s2">"d"</span><span class="p">)</span> <span class="c1"># => ["b", "c"]</span>
<span class="k">end</span>
</pre></div>
<p>Note that I used <code>"ab"</code> in my <code>Range</code> queries which is really between the actual <code>"a"</code> and <code>"b"</code> keys in the database. That works just fine.</p>
<p>You can also pass the <code>:limit</code> option I've shown before with <code>:range</code>, but you can't pass <code>:prefix</code>. It's one or the other: <code>:prefix</code> or <code>:range</code>.</p>
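<p>The endpoint rules match what Ruby's own <code>Range</code> does with <code>String</code>s under the default lexical ordering. Here is a small model of that, using a sorted <code>Array</code> in place of the database. The <code>keys_in()</code> helper is hypothetical and this is not how Oklahoma Mixer implements the query:</p>

```ruby
# A sorted Array stands in for the keys of the B+Tree Database.
SORTED_KEYS = %w[a b c d e]

# Hypothetical helper: select the keys that fall inside a Range of Strings.
def keys_in(range, keys = SORTED_KEYS)
  keys.select { |key| range.cover?(key) }
end

p keys_in("ab".."d")   # => ["b", "c", "d"]
p keys_in("ab"..."d")  # => ["b", "c"]
```

<p>The endpoints don't have to be real keys: <code>"a"</code> sorts before <code>"ab"</code>, so it is left out, and the three-dot <code>Range</code> excludes <code>"d"</code>.</p>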
<p>This ability to work with a <code>Range</code> of keys is even extended to the iterators. You've always had the ability to stop iterating whenever you like using Ruby's <code>break</code> keyword, but now you can tell the iterators where to start, making it possible to iterate over a subset of the pairs in the database:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"ranges.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="ss">:a</span> <span class="o">=></span> <span class="mi">1</span><span class="p">,</span> <span class="ss">:b</span> <span class="o">=></span> <span class="mi">2</span><span class="p">,</span> <span class="ss">:c</span> <span class="o">=></span> <span class="mi">3</span><span class="p">,</span> <span class="ss">:d</span> <span class="o">=></span> <span class="mi">4</span><span class="p">,</span> <span class="ss">:e</span> <span class="o">=></span> <span class="mi">5</span><span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">each</span><span class="p">(</span><span class="s2">"ab"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">|</span>
<span class="nb">puts</span> <span class="s2">"%p => %p"</span> <span class="o">%</span> <span class="o">[</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">]</span>
<span class="k">break</span> <span class="k">if</span> <span class="n">key</span> <span class="o">>=</span> <span class="s2">"d"</span>
<span class="k">end</span>
<span class="c1"># >> "b" => "2"</span>
<span class="c1"># >> "c" => "3"</span>
<span class="c1"># >> "d" => "4"</span>
<span class="k">end</span>
</pre></div>
<p>Again, I used <code>"ab"</code> and it jumped to the first key after that. The only place that might get a little confusing is if you try that same trick with the (also added to B+Tree Databases) <code>reverse_each()</code> iterator:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"ranges.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="ss">:a</span> <span class="o">=></span> <span class="mi">1</span><span class="p">,</span> <span class="ss">:b</span> <span class="o">=></span> <span class="mi">2</span><span class="p">,</span> <span class="ss">:c</span> <span class="o">=></span> <span class="mi">3</span><span class="p">,</span> <span class="ss">:d</span> <span class="o">=></span> <span class="mi">4</span><span class="p">,</span> <span class="ss">:e</span> <span class="o">=></span> <span class="mi">5</span><span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">reverse_each</span><span class="p">(</span><span class="s2">"ddd"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">|</span>
<span class="nb">puts</span> <span class="s2">"%p => %p"</span> <span class="o">%</span> <span class="o">[</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">]</span>
<span class="k">break</span> <span class="k">if</span> <span class="n">key</span> <span class="o"><=</span> <span class="s2">"b"</span>
<span class="k">end</span>
<span class="c1"># >> "e" => "5"</span>
<span class="c1"># >> "d" => "4"</span>
<span class="c1"># >> "c" => "3"</span>
<span class="c1"># >> "b" => "2"</span>
<span class="k">end</span>
</pre></div>
<p>See how it started with <code>"e"</code>? It always jumps to the first key equal to or <em>after</em> the one you provide, even if you are planning to iterate backwards. Since <code>"ddd"</code> is between <code>"d"</code> and <code>"e"</code>, that means we start on the key after <code>"ddd"</code> (<code>"e"</code>).</p>
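<p>One way to picture that jump rule is a binary search over the sorted keys for the first key that is equal to or after the one you asked for. This sketch models the behavior with a plain <code>Array</code>. The helper names are invented for the example, and returning an empty list when no key qualifies is my assumption, not something I've verified against the library:</p>

```ruby
# A sorted Array stands in for the keys of the B+Tree Database.
SORTED = %w[a b c d e]

# Index of the first key equal to or after +start+ (nil if there is none).
def jump_index(keys, start)
  keys.bsearch_index { |key| key >= start }
end

# Forward iteration: walk from the jump point to the end.
def forward_from(keys, start)
  index = jump_index(keys, start)
  index ? keys[index..-1] : []  # assumption: nothing to walk if no key qualifies
end

# Backward iteration: same jump point, then walk toward the beginning.
def backward_from(keys, start)
  index = jump_index(keys, start)
  index ? keys[0..index].reverse : []
end

p forward_from(SORTED, "ab")    # => ["b", "c", "d", "e"]
p backward_from(SORTED, "ddd")  # => ["e", "d", "c", "b", "a"]
```

<p>Both directions share the same starting rule, which is why the backward walk from <code>"ddd"</code> begins on <code>"e"</code>.</p>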
<p>B+Tree Databases have one more feature and it's a wild one. These databases support an additional storage mode that allows duplicate values to be stored under the same key:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"dupes.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="sx">%w[James Dana Baby]</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="nb">name</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"Gray"</span><span class="p">,</span> <span class="nb">name</span><span class="p">,</span> <span class="ss">:dup</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">db</span><span class="o">.</span><span class="n">to_a</span> <span class="c1"># => [["Gray", "James"], ["Gray", "Dana"], ["Gray", "Baby"]]</span>
<span class="k">end</span>
</pre></div>
<p>As you can see, the <code>:dup</code> storage mode shuts off the normal value replacing behavior and instead inserts the duplicate value after what was already stored for that key.</p>
<p>Several methods in Oklahoma Mixer have been expanded to support these duplicate values. For example, with a B+Tree Database you can scope <code>values()</code> or <code>size()</code> to a specific key, retrieving just the <code>values()</code> stored under that key or getting a count of how many values there are for that key:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"names.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"Matsumoto"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"Yukihiro"</span>
<span class="sx">%w[James Dana]</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="nb">name</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"Gray"</span><span class="p">,</span> <span class="nb">name</span><span class="p">,</span> <span class="ss">:dup</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">db</span><span class="o">.</span><span class="n">values</span> <span class="c1"># => ["James", "Dana", "Yukihiro"]</span>
<span class="n">db</span><span class="o">.</span><span class="n">values</span><span class="p">(</span><span class="s2">"Gray"</span><span class="p">)</span> <span class="c1"># => ["James", "Dana"]</span>
<span class="n">db</span><span class="o">.</span><span class="n">size</span> <span class="c1"># => 3</span>
<span class="n">db</span><span class="o">.</span><span class="n">size</span><span class="p">(</span><span class="s2">"Gray"</span><span class="p">)</span> <span class="c1"># => 2</span>
<span class="k">end</span>
</pre></div>
<p>You will need to use these methods to work with duplicate values because normal indexing, <code>fetch()</code>, and <code>delete()</code> still just work with the first value stored under a key. That behavior can be valuable too though:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"todo.tcb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="k">if</span> <span class="n">db</span><span class="o">.</span><span class="n">size</span><span class="o">.</span><span class="n">zero?</span>
<span class="sx">%w[B+tree Fixed-length tuning]</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">topic</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:blog</span><span class="p">,</span> <span class="n">topic</span><span class="p">,</span> <span class="ss">:dup</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="nb">puts</span> <span class="s2">"Write about </span><span class="si">#{</span><span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="ss">:blog</span><span class="p">)</span><span class="si">}</span><span class="s2">."</span>
<span class="k">end</span>
</pre></div>
<p>If I run that program three times, this is what you see:</p>
<pre><code>$ ruby tc_example.rb
Write about B+tree.
$ ruby tc_example.rb
Write about Fixed-length.
$ ruby tc_example.rb
Write about tuning.
</code></pre>
<p>See how <code>delete()</code> just kept pulling the first value that was left? That allowed us to use it as a simple queue in this case.</p>
<p>The <code>delete()</code> method can be passed the <code>:dup</code> storage mode as a second argument. When you do, all values under the passed key will be removed.</p>
<p>When working with duplicates, be aware that <code>keys()</code> and <code>each_key()</code> (or any iterator) behave differently. <code>keys()</code> returns a unique list, so keys with duplicate values under them will only be listed once. Iteration walks each pair in the database though, so a key will come up once for each value stored under it. Put another way, iteration does show duplicates while <code>keys()</code> won't.</p>
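<p>To make that difference concrete, here is a tiny model that uses an ordered list of pairs in place of the database. The helper names are hypothetical, and the only point is to contrast the two views of the same data:</p>

```ruby
# An ordered list of [key, value] pairs stands in for the stored records.
PAIRS = [%w[Gray James], %w[Gray Dana], %w[Matsumoto Yukihiro]]

# Models keys(): each key is listed once, however many values it holds.
def unique_keys(pairs)
  pairs.map(&:first).uniq
end

# Models iteration: every pair is visited, so keys repeat per value.
def iterated_keys(pairs)
  seen = []
  pairs.each { |key, _value| seen << key }
  seen
end

p unique_keys(PAIRS)    # => ["Gray", "Matsumoto"]
p iterated_keys(PAIRS)  # => ["Gray", "Gray", "Matsumoto"]
```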
<p>Let me show one last, slightly bigger example to bring together all of the features discussed above. Here's a little more involved queuing system:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">GROUPS</span> <span class="o">=</span> <span class="p">{</span><span class="kp">nil</span> <span class="o">=></span> <span class="mi">0</span><span class="p">,</span> <span class="s2">"critical"</span> <span class="o">=></span> <span class="mi">1</span><span class="p">,</span> <span class="s2">"normal"</span> <span class="o">=></span> <span class="mi">2</span><span class="p">,</span> <span class="s2">"low"</span> <span class="o">=></span> <span class="mi">3</span><span class="p">}</span>
<span class="n">order</span> <span class="o">=</span> <span class="nb">lambda</span> <span class="p">{</span> <span class="o">|</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="o">|</span>
<span class="n">a_group</span><span class="p">,</span> <span class="n">a_priority</span> <span class="o">=</span> <span class="n">a</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":"</span><span class="p">)</span>
<span class="n">b_group</span><span class="p">,</span> <span class="n">b_priority</span> <span class="o">=</span> <span class="n">b</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":"</span><span class="p">)</span>
<span class="o">[</span><span class="no">GROUPS</span><span class="o">[</span><span class="n">a_group</span><span class="o">]</span><span class="p">,</span> <span class="o">-</span><span class="n">a_priority</span><span class="o">.</span><span class="n">to_i</span><span class="o">]</span> <span class="o"><=></span> <span class="o">[</span><span class="no">GROUPS</span><span class="o">[</span><span class="n">b_group</span><span class="o">]</span><span class="p">,</span> <span class="o">-</span><span class="n">b_priority</span><span class="o">.</span><span class="n">to_i</span><span class="o">]</span>
<span class="p">}</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"queue.tcb"</span><span class="p">,</span> <span class="ss">:cmpfunc</span> <span class="o">=></span> <span class="n">order</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="k">case</span> <span class="no">ARGV</span><span class="o">.</span><span class="n">shift</span>
<span class="k">when</span> <span class="s2">"add"</span>
<span class="n">group</span> <span class="o">=</span> <span class="s2">"normal"</span>
<span class="no">ARGV</span><span class="o">.</span><span class="n">delete_if</span> <span class="p">{</span> <span class="o">|</span><span class="n">o</span><span class="o">|</span> <span class="n">o</span> <span class="o">=~</span> <span class="sr">/\A--(critical|low)\z/</span> <span class="ow">and</span> <span class="n">group</span> <span class="o">=</span> <span class="vg">$1</span> <span class="p">}</span>
<span class="n">priority</span> <span class="o">=</span> <span class="mi">10</span>
<span class="no">ARGV</span><span class="o">.</span><span class="n">delete_if</span> <span class="p">{</span> <span class="o">|</span><span class="n">o</span><span class="o">|</span> <span class="n">o</span> <span class="o">=~</span> <span class="sr">/\A-(\d+)\z/</span> <span class="ow">and</span> <span class="n">priority</span> <span class="o">=</span> <span class="vg">$1</span><span class="o">.</span><span class="n">to_i</span> <span class="p">}</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="n">group</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">priority</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="no">ARGV</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s2">"; "</span><span class="p">),</span> <span class="ss">:dup</span><span class="p">)</span>
<span class="k">when</span> <span class="s2">"list"</span>
<span class="n">db</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">|</span>
<span class="nb">puts</span> <span class="n">key</span>
<span class="nb">puts</span> <span class="s2">" </span><span class="si">#{</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="k">when</span> <span class="s2">"do_one"</span>
<span class="k">if</span> <span class="n">key</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">keys</span><span class="p">(</span><span class="ss">:limit</span> <span class="o">=></span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">first</span> <span class="ow">and</span> <span class="n">job</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="nb">eval</span><span class="p">(</span><span class="n">job</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">when</span> <span class="s2">"do_all"</span>
<span class="kp">loop</span> <span class="k">do</span>
<span class="k">if</span> <span class="n">key</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">keys</span><span class="p">(</span><span class="ss">:limit</span> <span class="o">=></span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">first</span> <span class="ow">and</span> <span class="n">job</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="nb">eval</span><span class="p">(</span><span class="n">job</span><span class="p">)</span>
<span class="k">else</span>
<span class="k">break</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">else</span>
<span class="nb">abort</span> <span class="s2">"Usage: </span><span class="si">#{</span><span class="vg">$PROGRAM_NAME</span><span class="si">}</span><span class="s2"> add|list|do_one|do_all [OPTIONS]"</span>
<span class="k">end</span>
<span class="k">end</span>
</pre></div>
<p>Most of that should be pretty straightforward code after all we've talked about, but let me point out one tricky spot. I had to add the <code>nil => 0</code> entry to my <code>GROUPS</code> because fetching a full, or in this case <code>:limit</code>ed, set of <code>keys()</code> is really just a <code>:prefix</code> query with an empty <code>:prefix</code>. Because of that, you want to make sure your ordering functions always order an empty <code>String</code> key before anything else. Calling <code>split()</code> on the empty <code>String</code> gives me a <code>nil</code> group, which is converted to a <code>0</code> so it will come out first.</p>
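<p>You can watch the empty <code>String</code> case play out in plain Ruby. This sketch repeats the ordering function from the program above (renamed to the constant <code>ORDER</code> so it can stand alone) and runs it through <code>sort()</code> instead of a database:</p>

```ruby
GROUPS = {nil => 0, "critical" => 1, "normal" => 2, "low" => 3}

# Same logic as the ordering function in the queue program.
ORDER = lambda { |a, b|
  a_group, a_priority = a.split(":")
  b_group, b_priority = b.split(":")
  [GROUPS[a_group], -a_priority.to_i] <=> [GROUPS[b_group], -b_priority.to_i]
}

# Splitting the empty String yields no fields, so the group is nil.
p "".split(":")                   # => []
p ORDER.call("", "critical:100")  # => -1

keys = ["low:10", "", "normal:10", "critical:10", "critical:100"]
p keys.sort { |a, b| ORDER.call(a, b) }
# => ["", "critical:100", "critical:10", "normal:10", "low:10"]

# Without the nil entry the Arrays can't be compared at all.
p [nil, 0] <=> [1, -100]          # => nil
```

<p>That final <code>nil</code> is what the comparison would produce for an empty key if <code>GROUPS</code> had no <code>nil</code> entry, which is not a valid answer for an ordering function.</p>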
<p>It's probably also worth pointing out that I could have just used <code>each()</code> with the <code>do_all</code> command. However, always fetching the first key and using that is a little better in a multiprocessing environment where other processes might be adding to the queue. If I'm iterating through the list, I won't see new <code>critical</code> jobs if they are added before my current position in the ordering. Using <code>keys()</code> though, I'll always grab the most important job next. This code isn't really built for multiprocessing to tell the truth, but let's save that discussion for a later article.</p>
<p>Anyway, here's an example of me playing around with the program above, so you can see how it works in practice:</p>
<pre><code>$ ruby queue.rb add 'puts "An average job."'
$ ruby queue.rb add --low 'puts "This can wait..."'
$ ruby queue.rb add --critical 'puts "Very important."'
$ ruby queue.rb add --critical -100 'puts "Most important!"'
$ ruby queue.rb list
critical:100
puts "Most important!"
critical:10
puts "Very important."
normal:10
puts "An average job."
low:10
puts "This can wait..."
$ ruby queue.rb do_one
Most important!
$ ruby queue.rb do_one
Very important.
$ ruby queue.rb do_all
An average job.
This can wait...
</code></pre>
<p>To summarize, the B+Tree Database gives you ordering, key ranges, cursor-based iteration (the ability to skip to a specific key), and duplicate storage. You pay a speed penalty for these added features though. That's the tradeoff.</p>
<h4>The Fixed-length Database</h4>
<p>Another type of database supported by Tokyo Cabinet is the Fixed-length Database. It too is an extension of the Hash Database, supporting most of the features I showed you in that article. However, I'm not going to lie to you, this database type comes with three significant restrictions.</p>
<p>First, all keys are <code>Integer</code>s greater than <code>0</code>. You can't use arbitrary <code>String</code>s as you do with the Hash and B+Tree Databases. As such, you lose the ability to do <code>:prefix</code> queries on <code>keys()</code>. The database is ordered though, similar to the B+Tree Database. You can't change this ordering, but it is done numerically (instead of lexically) since all keys are just <code>Integer</code>s anyway. Given that, <code>:range</code> queries on <code>keys()</code> are supported. Methods like <code>keys()</code> and the iterators will pass you <code>Integer</code> keys in Ruby, instead of the <code>String</code> keys you get with the other database types.</p>
<p>The second downside is that all values stored have a <em>fixed-length</em>, which is what gives the database its name. This length defaults to <code>255</code>, but you can tune it to anything you like with the <code>:width</code> tuning parameter:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"four.tcf"</span><span class="p">,</span> <span class="ss">:width</span> <span class="o">=></span> <span class="mi">4</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span> <span class="mi">1</span> <span class="o">=></span> <span class="ss">:one</span><span class="p">,</span>
<span class="mi">2</span> <span class="o">=></span> <span class="ss">:two</span><span class="p">,</span>
<span class="mi">3</span> <span class="o">=></span> <span class="ss">:three</span><span class="p">,</span>
<span class="mi">4</span> <span class="o">=></span> <span class="ss">:four</span><span class="p">,</span>
<span class="mi">5</span> <span class="o">=></span> <span class="ss">:five</span><span class="p">,</span>
<span class="mi">6</span> <span class="o">=></span> <span class="ss">:six</span><span class="p">,</span>
<span class="mi">7</span> <span class="o">=></span> <span class="ss">:seven</span><span class="p">,</span>
<span class="mi">8</span> <span class="o">=></span> <span class="ss">:eight</span> <span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">each_value</span> <span class="k">do</span> <span class="o">|</span><span class="n">num</span><span class="o">|</span>
<span class="nb">puts</span> <span class="n">num</span>
<span class="k">end</span>
<span class="c1"># >> one</span>
<span class="c1"># >> two</span>
<span class="c1"># >> thre</span>
<span class="c1"># >> four</span>
<span class="c1"># >> five</span>
<span class="c1"># >> six</span>
<span class="c1"># >> seve</span>
<span class="c1"># >> eigh</span>
<span class="k">end</span>
</pre></div>
<p>Notice how everything beyond my selected <code>:width</code> of <code>4</code> was just silently discarded. That's the fixed-length at work.</p>
<p>Also note that you create a <b>T</b>okyo <b>C</b>abinet <b>F</b>ixed-length Database as you probably expect by now, with the file extension <code>.tcf</code>.</p>
<p>Finally, the Fixed-length Database has one last size limit. The overall file size of the database is limited to <code>268435456</code> bytes, by default. This too can be tuned using the <code>:limsiz</code> tuning parameter and you are free to make the limit quite large. Just remember that values are fixed length, so setting <code>:width => 1024, :limsiz => 4 * 1024</code> will mean your database only holds four keys. Trying to add data beyond this limit will raise an <code>OklahomaMixer::Error::CabinetError</code>.</p>
<p>That's a lot of minuses and you are probably wondering why anyone would be willing to accept all of these limits when we've already seen that there are more powerful options. The answer is performance. The Fixed-length Database is Tokyo Cabinet's fastest weapon. It treats the database file as a raw array of bytes and it can jump straight to any value with simple math. (Due to this, <code>defrag()</code> isn't supported on a Fixed-length Database, though Oklahoma Mixer does provide a no-op just to match the interface of the other database types.) That makes it wicked quick. If your data storage needs are simple enough to fit within these limitations, you can take advantage of this added speed boost.</p>
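<p>To make the "simple math" idea concrete, here is a toy model in plain Ruby. Everything in it is an assumption made for illustration: <code>FixedWidthSketch</code> is a hypothetical class, and Tokyo Cabinet's real file layout also includes a header and per-record bookkeeping that I'm ignoring. It only shows how a fixed width turns a key lookup into a multiplication, and why values get cut off at the width:</p>

```ruby
# A toy model of fixed-width storage. This is NOT Tokyo Cabinet's real file
# layout; it only illustrates how a fixed width turns lookup into arithmetic.
class FixedWidthSketch
  def initialize(width, limit)
    @width = width
    @limit = limit                                   # maximum bytes we may use
    @bytes = String.new(encoding: Encoding::BINARY)  # stands in for the file
  end

  # Keys start at 1, so key 1 lives at offset 0, key 2 at width, and so on.
  def offset(key)
    (key - 1) * @width
  end

  def []=(key, value)
    unless key.is_a?(Integer) && key > 0
      raise ArgumentError, "keys must be Integers above 0"
    end
    needed = offset(key) + @width
    raise IndexError, "beyond the size limit" if needed > @limit
    # grow the "file" with NUL padding when writing past its current end
    @bytes << "\0" * (needed - @bytes.bytesize) if needed > @bytes.bytesize
    # anything past the width is discarded; shorter values are NUL padded
    @bytes[offset(key), @width] =
      value.to_s.b.byteslice(0, @width).ljust(@width, "\0")
  end

  def [](key)
    slot = @bytes.byteslice(offset(key), @width)
    return nil if slot.nil?
    value = slot.delete("\0")
    value.empty? ? nil : value   # the toy can't tell "" from "never set"
  end
end

db = FixedWidthSketch.new(4, 4 * 4)  # width of 4, room for four slots
db[3] = :three
db[1] = :one
p db.offset(3)  # => 8
p db[3]         # => "thre"
p db[1]         # => "one"
p db[2]         # => nil
```

<p>Trying <code>db[5] = :five</code> on this sketch raises an <code>IndexError</code>, which plays the role of the size limit error described above.</p>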
<p>The Fixed-length Database interface does have one other neat feature I should mention. It supports four special key names: <code>:min</code>, <code>:max</code>, <code>:prev</code>, and <code>:next</code>. You can use these values in many of the methods that take keys. For example:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"special.tcf"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span> <span class="mi">1</span> <span class="o">=></span> <span class="ss">:first</span><span class="p">,</span>
<span class="mi">2</span> <span class="o">=></span> <span class="ss">:middle</span><span class="p">,</span>
<span class="mi">42</span> <span class="o">=></span> <span class="ss">:last</span> <span class="p">)</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:min</span><span class="o">]</span> <span class="c1"># => "first"</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:max</span><span class="o">]</span> <span class="c1"># => "last"</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:next</span><span class="o">]</span> <span class="o">=</span> <span class="ss">:added</span>
<span class="n">db</span><span class="o">.</span><span class="n">keys</span> <span class="c1"># => [1, 2, 42, 43]</span>
<span class="n">db</span><span class="o">[</span><span class="mi">43</span><span class="o">]</span> <span class="c1"># => "added"</span>
<span class="k">end</span>
</pre></div>
<p>Be careful when using these. <code>:min</code> and <code>:max</code> will raise an <code>OklahomaMixer::Error::CabinetError</code> if there are no keys in the database. <code>:prev</code> (not shown above) is even pickier, requiring a <code>:min</code> key that is above <code>1</code>, so it can safely add below it without hitting <code>0</code>. I find <code>:next</code> pretty useful though, as it makes it possible to queue up values. Here's the simple queue example I showed in the B+Tree code rewritten to use a Fixed-length Database instead:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -wKU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"todo.tcf"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="c1"># load the data</span>
<span class="sx">%w[B+tree Fixed-length tuning]</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">topic</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:next</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"Write about </span><span class="si">#{</span><span class="n">topic</span><span class="si">}</span><span class="s2">."</span>
<span class="k">end</span>
<span class="c1"># read it back</span>
<span class="kp">loop</span> <span class="k">do</span>
<span class="k">begin</span>
<span class="nb">puts</span> <span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="ss">:min</span><span class="p">)</span>
<span class="k">rescue</span> <span class="no">OklahomaMixer</span><span class="o">::</span><span class="no">Error</span><span class="o">::</span><span class="no">CabinetError</span> <span class="c1"># no keys for :min</span>
<span class="k">break</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># >> Write about B+tree.</span>
<span class="c1"># >> Write about Fixed-length.</span>
<span class="c1"># >> Write about tuning.</span>
<span class="k">end</span>
</pre></div>
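<p>If it helps to see the special key names spelled out, here is how I think about them, modeled in plain Ruby. The <code>resolve_key()</code> helper is hypothetical and is not part of Oklahoma Mixer. It just mirrors the rules described above, using an ordinary <code>Hash</code> with <code>Integer</code> keys as a stand-in for the database:</p>

```ruby
# Hypothetical helper, not part of Oklahoma Mixer: resolves the special key
# names against the Integer keys currently in use.
def resolve_key(keys, key)
  case key
  when :min, :max
    raise KeyError, "no keys for #{key.inspect}" if keys.empty?
    key == :min ? keys.min : keys.max
  when :next
    keys.empty? ? 1 : keys.max + 1   # keys start at 1
  when :prev
    if keys.empty? || keys.min <= 1
      raise KeyError, "no room below the minimum key"
    end
    keys.min - 1
  else
    key                              # ordinary Integer keys pass through
  end
end

store = {}  # a plain Hash standing in for the database
%w[B+tree Fixed-length tuning].each do |topic|
  store[resolve_key(store.keys, :next)] = "Write about #{topic}."
end
p store.keys  # => [1, 2, 3]

first = store.delete(resolve_key(store.keys, :min))
p first       # => "Write about B+tree."
p store.keys  # => [2, 3]
```

<p>Because <code>:next</code> always lands one past the current maximum and <code>:min</code> always finds the oldest entry, the pair behaves like a first-in, first-out queue.</p>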
<p>To summarize, the B+Tree Database may slow you down a little, but the Fixed-length Database speeds you up, as long as you can accept certain restrictions. Select the database type that best fits the needs of your data.</p>
<h4>Tuning Parameters</h4>
<p>We've seen how to set tuning parameters in the examples above and already learned what some do. I'm going to save some parameters for discussions in later articles, when we talk about their specific functions. For now though, here are most of the tuning parameters available to the three databases we've covered so far.</p>
<ul>
<li>
<code>:bnum</code> (for Hash or B+Tree—can be used with <code>optimize()</code>): Specifies the <b>num</b>ber of elements to use in the <b>b</b>ucket array. The default is <code>131071</code> for Hash Databases and <code>32749</code> for B+Tree Databases. The suggested size is from 0.5 to 4 times the total number of records stored for Hash Databases or 1 to 4 times the total for B+Tree Databases.</li>
<li>
<code>:apow</code> (for Hash or B+Tree—can be used with <code>optimize()</code>): Specifies the size of record <b>a</b>lignment as a <b>pow</b>er of 2. The default is <code>4</code> for Hash Databases and <code>8</code> for B+Tree Databases, meaning <code>2 ** 4 = 16</code> and <code>2 ** 8 = 256</code> respectively.</li>
<li>
<code>:fpow</code> (for Hash or B+Tree—can be used with <code>optimize()</code>): Specifies the maximum number of elements in the <b>f</b>ree block pool as a <b>pow</b>er of 2. The default is <code>10</code>, meaning <code>2 ** 10 = 1024</code>.</li>
<li>
<code>:opts</code> (for Hash or B+Tree—can be used with <code>optimize()</code>): Specifies the <b>opt</b>ion<b>s</b> for the database in a <code>String</code> of recognized character codes. There are no options by default, but this is commonly set to <code>"ld"</code> or <code>"lb"</code> for bigger databases. The options are:
<ul>
<li>
<code>"l"</code> allows the database file to grow <b>l</b>arge (over 2 GB) by using a 64-bit bucket array.</li>
<li>
<code>"d"</code> compresses each record in a Hash Database or page in a B+Tree Database with <b>D</b>eflate compression.</li>
<li>
<code>"b"</code> compresses each record in a Hash Database or page in a B+Tree Database with <b>B</b>ZIP2 compression.</li>
<li>
<code>"t"</code> compresses each record in a Hash Database or page in a B+Tree Database with <b>T</b>CBS compression.</li>
</ul>
</li>
<li>
<code>:rcnum</code> (for Hash): Specifies the maximum <b>num</b>ber of <b>r</b>ecords to be <b>c</b>ached. It is <code>0</code> or disabled by default.</li>
<li>
<code>:xmsiz</code> (for Hash or B+Tree): Specifies the <b>siz</b>e of e<b>x</b>tra mapped <b>m</b>emory. The default is <code>67108864</code> for Hash Databases or <code>0</code> (disabled) for B+Tree Databases.</li>
<li>
<code>:dfunit</code> (for Hash or B+Tree): Specifies the auto <b>d</b>e<b>f</b>ragmentation <b>unit</b> step number. It is <code>0</code> or disabled by default.</li>
<li>
<code>:cmpfunc</code> (for B+Tree): Specifies the <b>c</b>o<b>mp</b>arison <b>func</b>tion used to order B+Tree Databases. See the detailed examples above for an explanation.</li>
<li>
<code>:lmemb</code> (for B+Tree—can be used with <code>optimize()</code>): Specifies the number of <b>memb</b>ers in each <b>l</b>eaf page. The default is <code>128</code>.</li>
<li>
<code>:nmemb</code> (for B+Tree—can be used with <code>optimize()</code>): Specifies the number of <b>memb</b>ers in each <b>n</b>on-leaf page. The default is <code>256</code>.</li>
<li>
<code>:lcnum</code> (for B+Tree): Specifies the maximum <b>num</b>ber of <b>l</b>eaf nodes to be <b>c</b>ached. The default is <code>1024</code>.</li>
<li>
<code>:ncnum</code> (for B+Tree): Specifies the maximum <b>num</b>ber of <b>n</b>on-leaf nodes to be <b>c</b>ached. The default is <code>512</code>.</li>
<li>
<code>:width</code> (for Fixed-length—can be used with <code>optimize()</code>): Specifies the <b>width</b> of values in Fixed-length Databases. See the detailed examples above for an explanation.</li>
<li>
<code>:limsiz</code> (for Fixed-length—can be used with <code>optimize()</code>): Specifies the <b>lim</b>it on database file <b>siz</b>e in Fixed-length Databases. See the detailed examples above for an explanation.</li>
</ul><p>I apologize for keeping the cryptic names in Oklahoma Mixer, but I felt it was better to stick with what Tokyo Cabinet uses so users could read about them in documentation and other resources for that library. Tokyo Tyrant also uses these names to configure a database from the command line, so you will find them in several different contexts.</p>
<p>Database objects have an <code>optimize()</code> method that can be used to modify the tuning parameters of an <code>open()</code> database. The parameters that can be used as such are noted above. There are sometimes additional restrictions though. For example, the <code>:limsiz</code> of a Fixed-length Database usually has to be increased when changed through <code>optimize()</code>.</p>
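<p>The power-of-2 parameters and the sizing advice are easier to reason about with a little arithmetic. The helpers below are hypothetical names of my own, not anything from Oklahoma Mixer or Tokyo Cabinet. They only restate the numbers quoted in the list above, and the record count is made up:</p>

```ruby
# Hypothetical helpers for back-of-the-envelope tuning; the multipliers and
# defaults are the ones quoted in the parameter list.
def alignment_bytes(apow)
  2**apow
end

def free_block_pool_size(fpow)
  2**fpow
end

# Suggested :bnum range for a Hash Database: 0.5 to 4 times the record count.
def hash_bnum_range(records)
  (records / 2)..(records * 4)
end

p alignment_bytes(4)          # => 16   (Hash Database default)
p alignment_bytes(8)          # => 256  (B+Tree Database default)
p free_block_pool_size(10)    # => 1024
p hash_bnum_range(1_000_000)  # => 500000..4000000

# A tuning Hash for an imagined database of about a million records. It would
# be passed as the second argument to OklahomaMixer.open(), the same way
# :width was passed in the earlier example.
tuning = { :bnum => 2_000_000, :opts => "ld" }
p hash_bnum_range(1_000_000).cover?(tuning[:bnum])  # => true
```

<p>Treat the result as a starting point for experiments rather than a recommendation. The right values depend on your data.</p>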
<p>That covers the various key-value database types in Tokyo Cabinet. The fourth type is quite a bit different from what we've looked at so far, so I'll do a full article on it next.</p>James Edward Gray IITokyo Cabinet as a Key-Value Storetag:graysoftinc.com,2010-01-01:/posts/922014-06-05T18:57:10ZThis article covers the usage of Tokyo Cabinet's Hash Database. It shows basic Hash-like storage as well as some special features provided by Tokyo Cabinet.<p>Like most key-value stores, Tokyo Cabinet has a very <code>Hash</code>-like interface from Ruby (assuming you use Oklahoma Mixer). You can almost think of a Tokyo Cabinet database as a <code>Hash</code> that just happens to be stored in a file instead of memory. The advantage of that is that your data doesn't have to fit into memory. Luckily, you don't have to pay a big speed penalty to get this disk-backed storage. Tokyo Cabinet is pretty darn fast.</p>
<h4>Getting and Setting Keys</h4>
<p>Let's have a look at the normal <code>Hash</code>-like methods as well as the file storage aspect:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="k">if</span> <span class="n">db</span><span class="o">.</span><span class="n">size</span><span class="o">.</span><span class="n">zero?</span>
<span class="nb">puts</span> <span class="s2">"Loading the database. Rerun to read back the data."</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:one</span><span class="o">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:two</span><span class="o">]</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="ss">:three</span> <span class="o">=></span> <span class="mi">3</span><span class="p">,</span> <span class="ss">:four</span> <span class="o">=></span> <span class="mi">4</span><span class="p">)</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"users:1"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"James"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"users:2"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"Ruby"</span>
<span class="k">else</span>
<span class="nb">puts</span> <span class="s2">"Reading data."</span>
<span class="sx">%w[ db[:one]</span>
<span class="sx"> db["users:2"]</span>
<span class="sx"> -</span>
<span class="sx"> db.keys</span>
<span class="sx"> db.keys(:prefix\ =>\ "users:")</span>
<span class="sx"> db.keys(:limit\ =>\ 2)</span>
<span class="sx"> db.values</span>
<span class="sx"> -</span>
<span class="sx"> db.values_at(:one,\ :two) ]</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">command</span><span class="o">|</span>
<span class="nb">puts</span><span class="p">(</span><span class="n">command</span> <span class="o">==</span> <span class="s2">"-"</span> <span class="p">?</span> <span class="s2">""</span> <span class="p">:</span> <span class="s2">"</span><span class="si">#{</span><span class="n">command</span><span class="si">}</span><span class="s2"> = %p"</span> <span class="o">%</span> <span class="o">[</span><span class="nb">eval</span><span class="p">(</span><span class="n">command</span><span class="p">)</span><span class="o">]</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</pre></div>
<p>If I run that code twice, I see:</p>
<pre><code>$ ruby tc_example.rb
Loading the database. Rerun to read back the data.
$ ruby tc_example.rb
Reading data.
db[:one] = "1"
db["users:2"] = "Ruby"
db.keys = ["one", "two", "three", "four", "users:1", "users:2"]
db.keys(:prefix => "users:") = ["users:1", "users:2"]
db.keys(:limit => 2) = ["one", "two"]
db.values = ["1", "2", "3", "4", "James", "Ruby"]
db.values_at(:one, :two) = ["1", "2"]
</code></pre>
<p>The file storage should be pretty obvious here. The first run of the program populated the data file and the second run read the data back. Obviously the data exists outside the process. It's actually stored in the file I named in my call to <code>open()</code>: <code>"data.tch"</code>. We will dig a lot more into the meaning of the file extensions later, but for now it's enough to know that <code>.tch</code> stands for <b>T</b>okyo <b>C</b>abinet <b>H</b>ash database. It's also worth pointing out that you don't have to pass a block to <code>open()</code>. When not passed a block <code>open()</code> will return the database reference and expect you to call <code>close()</code> manually when you are done, just as you could with any <code>IO</code> object from Ruby. Tokyo Cabinet can buffer output just like Ruby's <code>IO</code> streams can, so know that your data isn't guaranteed to have hit the disk until after a <code>close()</code>. You can <code>flush()</code> the data to disk before that though, if needed.</p>
<p>The getting and setting methods shouldn't be much of a surprise. I started off by calling <code>size()</code> to count the pairs already in the database. I then used <code>[]=</code> to set a few keys. Note that I also used <code>update()</code> to add multiple keys at once. (The <code>merge()</code>/<code>merge!()</code> methods of <code>Hash</code> don't really make sense for the database so you do need to use the <code>update()</code> alias.) Later I read the data back with <code>[]</code>. It's all very <code>Hash</code>-like. I was even able to ask for all of the <code>keys()</code> as you can with a <code>Hash</code>, but the Oklahoma Mixer version of that method supports some extra filters like the <code>:prefix</code> and <code>:limit</code> shown above. There's also the matching <code>values()</code> call, though it doesn't have any filters. You can see that Oklahoma Mixer also allows us to fetch multiple keys at once with <code>values_at()</code>.</p>
<p>The last thing to get out of this example is the usual truth of key-value storage: keys and values are generally considered <code>String</code>s. Notice how <code>db[:one] = 1</code> actually stored a value of <code>"1"</code> under the key <code>"one"</code>. Make sure you remember to convert it back when you read it if you really need the number.</p>
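<p>Converting back is a one-liner, but it's worth choosing the conversion method deliberately. In this sketch a plain <code>Hash</code> stands in for the database and the explicit <code>to_s()</code> calls mimic the <code>String</code> conversion described above, so nothing here depends on Tokyo Cabinet:</p>

```ruby
# A plain Hash stands in for the database; the explicit to_s calls mimic the
# String conversion applied to keys and values on the way in.
db = {}
db[:one.to_s] = 1.to_s

raw = db["one"]
p raw             # => "1"
p Integer(raw)    # => 1
p Float("2.5")    # => 2.5

# Integer() is strict while to_i() is lenient, which matters for bad data.
p "oops".to_i     # => 0
begin
  Integer("oops")
rescue ArgumentError => error
  puts "rejected: #{error.message}"
end
```

<p>I tend to prefer the strict <code>Integer()</code> and <code>Float()</code> forms when reading values back, since a silent <code>0</code> can hide a corrupted or unexpected value.</p>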
<p>Another cool <code>Hash</code>-like feature you can make use of is defaults. You can set a static object to be used as the default value for keys not in the database or provide code to run to generate the default. Here is some code showing the possibilities in action:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="c1"># no default set</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:missing</span><span class="o">]</span> <span class="c1"># => nil</span>
<span class="c1"># an Object default</span>
<span class="n">db</span><span class="o">.</span><span class="n">default</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:missing</span><span class="o">]</span> <span class="c1"># => 0</span>
<span class="k">end</span>
<span class="c1"># another way to set an Object default</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">,</span> <span class="ss">:default</span> <span class="o">=></span> <span class="mi">42</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:missing</span><span class="o">]</span> <span class="c1"># => 42</span>
<span class="k">end</span>
<span class="c1"># a Proc default</span>
<span class="nb">proc</span> <span class="o">=</span> <span class="nb">lambda</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span>
<span class="n">type</span><span class="p">,</span> <span class="nb">id</span> <span class="o">=</span> <span class="n">key</span><span class="o">.</span><span class="n">to_s</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":"</span><span class="p">)</span>
<span class="s2">"New </span><span class="si">#{</span><span class="n">type</span><span class="si">}</span><span class="s2"> with id </span><span class="si">#{</span><span class="nb">id</span><span class="si">}</span><span class="s2">"</span>
<span class="p">}</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">,</span> <span class="ss">:default</span> <span class="o">=></span> <span class="nb">proc</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"user:3"</span><span class="o">]</span> <span class="c1"># => "New user with id 3"</span>
<span class="k">end</span>
</pre></div>
<p><code>Proc</code> defaults are always executed, so if you want a default that returns a <code>Proc</code>, just pass a <code>Proc</code> that creates the desired <code>Proc</code>. All other objects are returned when indexing a missing value.</p>
<p>The important thing to remember about the defaults is that they are not stored in the file. They are just a convenience from the Ruby interface and you will need to set them again anytime you make a new connection to the database.</p>
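<p>For comparison, core Ruby's <code>Hash</code> offers the same two flavors of default, and they behave in a similar "computed on read, never stored" way. This sketch uses only a plain <code>Hash</code>. One difference to keep in mind is that a <code>Hash</code> default block receives the <code>Hash</code> and the key, while the database default shown above is a <code>Proc</code> that takes just the key:</p>

```ruby
# An Object default: returned for any missing key, but never stored.
counts = Hash.new(0)
p counts[:missing]       # => 0
p counts.key?(:missing)  # => false

# A block default: run for each missing key to build a value on the fly.
labels = Hash.new { |_hash, key|
  type, id = key.to_s.split(":")
  "New #{type} with id #{id}"
}
p labels["user:3"]       # => "New user with id 3"
p labels.size            # => 0
```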
<p>You can also walk the key-value pairs of a Tokyo Cabinet database using the standard iterators you expect in Ruby:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"pp"</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="c1"># a Hash-like each()</span>
<span class="n">db</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">|</span>
<span class="nb">puts</span> <span class="s2">"db[%p] = %p"</span> <span class="o">%</span> <span class="o">[</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="o">]</span>
<span class="k">end</span>
<span class="c1"># other iterators from Enumerable are supported</span>
<span class="nb">puts</span>
<span class="n">pp</span> <span class="n">db</span><span class="o">.</span><span class="n">select</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">_</span><span class="o">|</span> <span class="n">key</span> <span class="o">=~</span> <span class="sr">/\Ausers:/</span> <span class="p">}</span>
<span class="n">pp</span> <span class="n">db</span><span class="o">.</span><span class="n">find</span> <span class="p">{</span> <span class="o">|</span><span class="n">_</span><span class="p">,</span> <span class="n">value</span><span class="o">|</span> <span class="n">value</span> <span class="o">=~</span> <span class="sr">/\A\D/</span> <span class="p">}</span>
<span class="k">end</span>
</pre></div>
<p>Running that gives us:</p>
<div class="highlight highlight-ruby"><pre><span class="n">db</span><span class="o">[</span><span class="s2">"one"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"1"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"two"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"2"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"three"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"3"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"four"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"4"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"users:1"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"James"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"users:2"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"Ruby"</span>
<span class="o">[[</span><span class="s2">"users:1"</span><span class="p">,</span> <span class="s2">"James"</span><span class="o">]</span><span class="p">,</span> <span class="o">[</span><span class="s2">"users:2"</span><span class="p">,</span> <span class="s2">"Ruby"</span><span class="o">]]</span>
<span class="o">[</span><span class="s2">"users:1"</span><span class="p">,</span> <span class="s2">"James"</span><span class="o">]</span>
</pre></div>
<p>You can see that we get an <code>each()</code> that walks key-value pairs, just as a <code>Hash</code> would. We also get all of the other standard <code>Enumerable</code> iterators. This gives us several different ways to comb the data for specific keys.</p>
<p>When you are done playing around with data, you have multiple options for getting rid of it. You can just <code>clear()</code> all keys if you are sure that's safe. Of course, just deleting the file has pretty much the same effect. If you need to selectively remove data, you can <code>delete()</code> a single key-value pair or use the <code>delete_if()</code> iterator to programmatically remove pairs.</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">delete</span><span class="p">(</span><span class="ss">:one</span><span class="p">)</span> <span class="c1"># => "1"</span>
<span class="n">db</span><span class="o">.</span><span class="n">delete_if</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">_</span><span class="o">|</span> <span class="n">key</span> <span class="o">=~</span> <span class="sr">/\Ausers:/</span> <span class="p">}</span>
<span class="n">db</span><span class="o">.</span><span class="n">keys</span> <span class="c1"># => ["two", "three", "four"]</span>
<span class="n">db</span><span class="o">.</span><span class="n">clear</span>
<span class="n">db</span><span class="o">.</span><span class="n">keys</span> <span class="c1"># => []</span>
<span class="k">end</span>
</pre></div>
<p>The <code>delete()</code> method does return the value for the removed key, or <code>nil</code> if that key wasn't in the database. That feature isn't really safe if multiple processes are manipulating the data at once, unless you take the right precautions. We will talk a lot more about that later though.</p>
<p>That covers the basic <code>Hash</code> style interface to Tokyo Cabinet. Let's move into some other aspects of the library now.</p>
<h4>Counters and Appended Values</h4>
<p>We've already seen the standard <code>Hash</code>-like method of storing data with <code>db[:key] = :value</code>. The less common <code>store()</code> method from <code>Hash</code> is also supported (as is <code>fetch()</code> for retrieving values), so you can do things like <code>db.store(:key, :value)</code>. The advantage of using <code>store()</code> is that it supports modes. You can use these modes to manipulate values in different ways. Let's look at some of the options.</p>
<p>Most key-value stores provide an action for atomically incrementing a counter and Tokyo Cabinet is no exception. This is important because it allows you to track unique IDs. Have a look at the various ways you can use the <code>store()</code> method to manage counters with the <code>:add</code> mode:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"globals:user_id"</span><span class="o">]</span> <span class="c1"># => nil</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"globals:float"</span><span class="o">]</span> <span class="c1"># => nil</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"globals:user_id"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">:add</span><span class="p">)</span> <span class="c1"># => 1</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"globals:user_id"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">:add</span><span class="p">)</span> <span class="c1"># => 2</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"globals:float"</span><span class="p">,</span> <span class="mi">2</span><span class="o">.</span><span class="mi">1</span><span class="p">,</span> <span class="ss">:add</span><span class="p">)</span> <span class="c1"># => 2.1</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"globals:user_id"</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="ss">:add</span><span class="p">)</span> <span class="c1"># => 1</span>
<span class="k">end</span>
</pre></div>
<p>While all of that should be pretty obvious, this mode has a few gotchas you want to stay aware of. It's OK to start <code>:add</code>ing to a <code>nil</code> field as I've shown above, but don't try to use a field already set to a non-<code>:add</code>ed value or you will likely get an <code>OklahomaMixer::Error::CabinetError</code>. This is true even if you have what you think is a number in the value. Tokyo Cabinet's numbers are a C-ish chunk of bytes so it won't recognize digits in <code>String</code> form. This also means you don't generally want to read an <code>:add</code>ed value with a normal call to <code>[]</code>. It probably won't look like anything you are expecting. Tokyo Cabinet also uses different formats for <code>Integer</code> and <code>Float</code> values, so you will get the same error if you try to switch. Always add the same type of number to a given field.</p>
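<p>You can get a feel for what a "C-ish chunk of bytes" looks like without touching the database. The sketch below uses Ruby's own <code>pack()</code> and <code>unpack1()</code>, and it assumes the counters are laid out as a native C <code>int</code> and a native C <code>double</code>. That's my reading of how Tokyo Cabinet handles them, so treat the exact formats as an assumption:</p>

```ruby
# Assumption: :add counters are stored as a native C int or double.
int_bytes   = [1].pack("i")    # native C int
float_bytes = [2.1].pack("d")  # native C double

p int_bytes == "1"             # => false (raw bytes, not the digit "1")
p int_bytes.bytesize           # typically 4
p float_bytes.bytesize         # typically 8

# Reading the bytes back requires knowing which format was used.
p int_bytes.unpack1("i")       # => 1
p float_bytes.unpack1("d")     # => 2.1
```

<p>The two encodings don't even have the same length on typical platforms, which is one way to see why a field can't switch between <code>Integer</code> and <code>Float</code> additions.</p>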
<p>Tokyo Cabinet supports another unusual kind of value management: appending to a value with <code>:cat</code> mode. For example:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:friend_ids</span><span class="p">,</span> <span class="s2">" 1"</span><span class="p">,</span> <span class="ss">:cat</span><span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:friend_ids</span><span class="p">,</span> <span class="s2">" 3"</span><span class="p">,</span> <span class="ss">:cat</span><span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:friend_ids</span><span class="p">,</span> <span class="s2">" 5"</span><span class="p">,</span> <span class="ss">:cat</span><span class="p">)</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:friend_ids</span><span class="p">,</span> <span class="s2">" 3"</span><span class="p">,</span> <span class="ss">:cat</span><span class="p">)</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:friend_ids</span><span class="o">]</span> <span class="c1"># => " 1 3 5 3"</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:friend_ids</span><span class="o">].</span><span class="n">to_s</span><span class="o">.</span><span class="n">scan</span><span class="p">(</span><span class="sr">/\S+/</span><span class="p">)</span><span class="o">.</span><span class="n">uniq</span> <span class="c1"># => ["1", "3", "5"]</span>
<span class="k">end</span>
</pre></div>
<p>As you can see, this method will create a value if it didn't exist and then continue appending to the value after it does. If you need the opposite behavior, to avoid messing with a key that already exists, try <code>:keep</code> mode instead:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:exists</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"Can't touch this!"</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:exists</span><span class="p">,</span> <span class="s2">"Lost."</span><span class="p">,</span> <span class="ss">:keep</span><span class="p">)</span> <span class="c1"># => false</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:exists</span><span class="o">]</span> <span class="c1"># => "Can't touch this!"</span>
<span class="k">end</span>
</pre></div>
<p>Similarly, you can just pass a block to <code>store()</code> that will be called if a key already exists. That block is expected to return the value that should be saved to the database:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">adder</span> <span class="o">=</span> <span class="nb">lambda</span> <span class="p">{</span> <span class="o">|</span><span class="n">key</span><span class="p">,</span> <span class="n">old_value</span><span class="p">,</span> <span class="n">new_value</span><span class="o">|</span> <span class="n">old_value</span><span class="o">.</span><span class="n">to_i</span> <span class="o">+</span> <span class="n">new_value</span> <span class="p">}</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:num</span><span class="o">]</span> <span class="c1"># => nil</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:num</span><span class="p">,</span> <span class="mi">41</span><span class="p">,</span> <span class="o">&</span><span class="n">adder</span><span class="p">)</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:num</span><span class="o">]</span> <span class="c1"># => "41"</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="ss">:num</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">adder</span><span class="p">)</span>
<span class="n">db</span><span class="o">[</span><span class="ss">:num</span><span class="o">]</span> <span class="c1"># => "42"</span>
<span class="k">end</span>
</pre></div>
<p>These modes give you some powerful ways to build up values over time, even with different processes working on the same data. Their effects are atomic and that's important in any multiprocessing environment.</p>
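<p>If it helps to see the rules side by side, here is a toy model of the <code>:keep</code>, <code>:cat</code>, and block behaviors on a plain <code>Hash</code>. This is only a sketch of the semantics described above. The <code>toy_store()</code> name is hypothetical, returning <code>true</code> on success is my assumption, and nothing here is atomic or reflects how Oklahoma Mixer is implemented.</p>

```ruby
# Toy model of the store() modes on a plain Hash. This is NOT how
# Oklahoma Mixer or Tokyo Cabinet implement them, and it is not atomic.
# toy_store is a hypothetical name; returning true on success is an
# assumption made for this sketch.
def toy_store(db, key, value, mode = nil)
  key = key.to_s
  if mode == :keep
    return false if db.key?(key)               # never overwrite an existing key
    db[key] = value.to_s
  elsif mode == :cat
    db[key] = db.fetch(key, "") + value.to_s   # create, then keep appending
  elsif block_given? && db.key?(key)
    db[key] = yield(key, db[key], value).to_s  # the block picks the value
  else
    db[key] = value.to_s                       # plain overwrite
  end
  true
end

db = {}
toy_store(db, :friend_ids, " 1", :cat)
toy_store(db, :friend_ids, " 3", :cat)
p db["friend_ids"]                          # => " 1 3"
p toy_store(db, :friend_ids, "x", :keep)    # => false
toy_store(db, :num, 41) { |_key, old, new| old.to_i + new }
toy_store(db, :num, 1)  { |_key, old, new| old.to_i + new }
p db["num"]                                 # => "42"
```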
<h4>Transactions</h4>
<p>Transactions are a big part of what makes Tokyo Cabinet great to work with. With them you can define a set of actions that must succeed or fail as a whole. Let's start with the classic example of transferring money between accounts:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">db</span><span class="o">.</span><span class="n">transaction</span> <span class="k">do</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">].</span><span class="n">to_i</span> <span class="o">-</span> <span class="mi">10</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">].</span><span class="n">to_i</span> <span class="o">+</span> <span class="mi">10</span>
<span class="k">end</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">]</span> <span class="c1"># => "90"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">]</span> <span class="c1"># => "110"</span>
<span class="k">end</span>
</pre></div>
<p>That code should be easy to understand. I just removed an amount from one account and added that same amount to the other. I've done this transfer inside of a <code>transaction()</code>, but it doesn't really have any effect when things go right as they did here. Let's break something and see what happens:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="mi">100</span>
<span class="k">begin</span>
<span class="n">db</span><span class="o">.</span><span class="n">transaction</span> <span class="k">do</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">].</span><span class="n">to_i</span> <span class="o">-</span> <span class="mi">10</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">]</span> <span class="o">=</span> <span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">].</span><span class="n">to_i</span> <span class="o">+</span> <span class="mi">10</span>
<span class="nb">fail</span> <span class="s2">"Oops!"</span>
<span class="k">end</span>
<span class="k">rescue</span>
<span class="c1"># do nothing: just continue on to checking the balances</span>
<span class="k">end</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:1:balance"</span><span class="o">]</span> <span class="c1"># => "100"</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"accounts:2:balance"</span><span class="o">]</span> <span class="c1"># => "100"</span>
<span class="k">end</span>
</pre></div>
<p>This time we see the difference. Both of my actions against the database had already been processed. However, my <code>fail()</code> call was part of the same <code>transaction()</code> and the <code>Exception</code> meant everything had to be undone. Notice that the account balances were restored to their previous values.</p>
<p>It is possible for you to cancel a <code>transaction()</code> without triggering an <code>Exception</code>. That's what the <code>abort()</code> method is for:</p>
<div class="highlight highlight-ruby"><pre><span class="c1">#!/usr/bin/env ruby -KU</span>
<span class="nb">require</span> <span class="s2">"oklahoma_mixer"</span>
<span class="no">OklahomaMixer</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">"data.tch"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">db</span><span class="o">|</span>
<span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"globals:user_id"</span><span class="p">,</span> <span class="mi">41</span><span class="p">,</span> <span class="ss">:add</span><span class="p">)</span> <span class="c1"># pretend we have a few users</span>
<span class="n">db</span><span class="o">[</span><span class="s2">"users:42:last_name"</span><span class="o">]</span> <span class="o">=</span> <span class="s2">"Nobody"</span> <span class="c1"># and some bad data</span>
<span class="n">user</span> <span class="o">=</span> <span class="p">{</span><span class="ss">:first_name</span> <span class="o">=></span> <span class="s2">"James"</span><span class="p">,</span> <span class="ss">:last_name</span> <span class="o">=></span> <span class="s2">"Gray"</span><span class="p">}</span>
<span class="n">db</span><span class="o">.</span><span class="n">transaction</span> <span class="k">do</span>
<span class="n">user</span><span class="o">[</span><span class="ss">:id</span><span class="o">]</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"globals:user_id"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="ss">:add</span><span class="p">)</span>
<span class="k">if</span> <span class="n">user</span><span class="o">.</span><span class="n">all?</span> <span class="p">{</span> <span class="o">|</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="o">|</span> <span class="n">db</span><span class="o">.</span><span class="n">store</span><span class="p">(</span><span class="s2">"users:</span><span class="si">#{</span><span class="n">user</span><span class="o">[</span><span class="ss">:id</span><span class="o">]</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">k</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="ss">:keep</span><span class="p">)</span> <span class="p">}</span>
<span class="n">user</span><span class="o">[</span><span class="ss">:saved</span><span class="o">]</span> <span class="o">=</span> <span class="kp">true</span>
<span class="k">else</span>
<span class="n">db</span><span class="o">.</span><span class="n">abort</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">unless</span> <span class="n">user</span><span class="o">[</span><span class="ss">:saved</span><span class="o">]</span>
<span class="nb">puts</span> <span class="s2">"Unable to save user. Problem field(s):"</span>
<span class="n">user</span><span class="o">.</span><span class="n">each_key</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">[</span><span class="s2">"users:</span><span class="si">#{</span><span class="n">user</span><span class="o">[</span><span class="ss">:id</span><span class="o">]</span><span class="si">}</span><span class="s2">:</span><span class="si">#{</span><span class="n">key</span><span class="si">}</span><span class="s2">"</span><span class="o">]</span>
<span class="nb">puts</span> <span class="sx">%Q{db["users:</span><span class="si">#{</span><span class="n">user</span><span class="o">[</span><span class="ss">:id</span><span class="o">]</span><span class="si">}</span><span class="sx">:</span><span class="si">#{</span><span class="n">key</span><span class="si">}</span><span class="sx">"] = </span><span class="si">#{</span><span class="n">value</span><span class="o">.</span><span class="n">inspect</span><span class="si">}</span><span class="sx">}</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># >> Unable to save user. Problem field(s):</span>
<span class="c1"># >> db["users:42:last_name"] = "Nobody"</span>
<span class="k">end</span>
<span class="k">end</span>
</pre></div>
<p>As you can see, <code>abort()</code> didn't toss an <code>Exception</code> but it rolled back my <code>transaction()</code> all the same. None of the new user fields were added to the database because they couldn't all be safely added. I knew that because one of the <code>:keep</code> mode calls to <code>store()</code> returned <code>false</code> when it tried to set an already existing key.</p>
<p>That's the magic of transactions. They are an all-or-nothing thing. Only if your block completes with no <code>Exception</code> raised and no call to <code>abort()</code> will all of the changes be made.</p>
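<p>The all-or-nothing rule can be modeled in a few lines of plain Ruby. The <code>ToyDB</code> class below is hypothetical and only mimics the behavior shown above by snapshotting a <code>Hash</code>. It says nothing about how Tokyo Cabinet handles transactions internally.</p>

```ruby
# Toy model of all-or-nothing semantics on a plain Hash. ToyDB and its
# snapshot approach are hypothetical and say nothing about how Tokyo
# Cabinet implements transactions internally.
class ToyDB
  def initialize
    @data = {}
  end

  def [](key)
    @data[key.to_s]
  end

  def []=(key, value)
    @data[key.to_s] = value.to_s
  end

  def transaction
    snapshot  = @data.dup                # remember the starting state
    committed = false
    catch(:abort_transaction) do
      yield
      committed = true                   # only reached if nothing went wrong
    end
    committed
  ensure
    @data = snapshot unless committed    # roll back on abort or exception
  end

  def abort
    throw :abort_transaction
  end
end

db = ToyDB.new
db["accounts:1:balance"] = 100
db.transaction { db["accounts:1:balance"] = 90; db.abort }
p db["accounts:1:balance"]   # => "100"
begin
  db.transaction { db["accounts:1:balance"] = 90; fail "Oops!" }
rescue RuntimeError
  # do nothing: just continue on to checking the balance
end
p db["accounts:1:balance"]   # => "100"
db.transaction { db["accounts:1:balance"] = 90 }
p db["accounts:1:balance"]   # => "90"
```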
<h4>Database File Maintenance</h4>
<p>There are a lot of advantages that come with a database that's just one file in the file system. You can build symlinks to it, set permissions on it, and check its size with the normal tools your OS provides (though Oklahoma Mixer does have a <code>file_size()</code> method that returns the file size in bytes, if you need it). Of course, there are also tradeoffs you should stay aware of.</p>
<p>First, the file can get a little bloated over time. The reason is normal fragmentation: Tokyo Cabinet may clear a key, freeing up some space, and later fill that space with a not-quite-as-big item. It may not find a good use for the even smaller remaining space for a long time. This creates small pockets of unused space that grow the file over time.</p>
<p>The easiest way to deal with this is to call <code>defrag()</code> periodically at a slow time. This will lock up the database for a few seconds while Tokyo Cabinet cleans it up, reclaiming the wasted space and shrinking the file back down (assuming it was fragmented).</p>
<p>Another issue to stay aware of is how you make backup copies of the database file. You need to be careful about using standard tools like <code>cp</code> or <code>rsync</code> on a Tokyo Cabinet file. It's fine if you know all connections to it are currently closed, but it's not safe when a connection might be changing the data inside of it mid-copy. If you try that, you will likely get a corrupt copy of the database.</p>
<p>The solution is to call <code>copy()</code> and pass in the path where you would like to create a copy of the database. It will synchronize the data, lock out changes, and then make a full duplicate. This process is quite snappy, even with bigger data sets. If desired, you can ask Oklahoma Mixer for the <code>path()</code> of the original database, edit it in some small way, and use that as the path for the duplicate database.</p>
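<p>Editing the path "in some small way" could be as simple as slipping a date stamp in front of the file extension. The <code>backup_path_for()</code> helper below is a hypothetical name of my own and is plain Ruby, not part of Oklahoma Mixer:</p>

```ruby
# backup_path_for is a hypothetical helper, not part of Oklahoma Mixer.
# It turns "data.tch" into something like "data-20100110.tch".
def backup_path_for(path, stamp = Time.now.strftime("%Y%m%d"))
  ext  = File.extname(path)                 # ".tch"
  base = path[0, path.length - ext.length]  # everything before the extension
  "#{base}-#{stamp}#{ext}"
end

p backup_path_for("db/data.tch", "20100110")  # => "db/data-20100110.tch"
```

<p>With an open database you would presumably hand the result to <code>copy()</code>, something like <code>db.copy(backup_path_for(db.path))</code>.</p>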
<p>Just make sure you keep these issues in mind as you plan out your storage.</p>
<p>Those are the basics of using Tokyo Cabinet as a key-value store, but there's really a lot more to what Tokyo Cabinet can do. I'll show you what all is built onto this simple foundation in upcoming articles.</p>
James Edward Gray II
Using Key-Value Stores From Ruby
tag:graysoftinc.com,2009-09-14:/posts/87
2014-04-18T21:01:21Z
The table of contents for a series of posts about working with some popular key-value stores from Ruby code.
<p>I've been playing with a few different key-value stores recently. My choices are pretty popular and you can find documentation for them. However, it can still be a bit of work to relate everything to Ruby-specific usage, which is what I care about. Given that, here are my notes on the systems I've used.</p>
<h4>Redis</h4>
<ol>
<li><a href="/key-value-stores/setting-up-the-redis-server">Setting up the Redis Server</a></li>
<li><a href="/key-value-stores/using-redis-as-a-key-value-store">Using Redis as a Key-Value Store</a></li>
<li><a href="/key-value-stores/lists-and-sets-in-redis">Lists and Sets in Redis</a></li>
<li><a href="/key-value-stores/where-redis-is-a-good-fit">Where Redis is a Good Fit</a></li>
</ol><h4>Tokyo Cabinet, Tokyo Tyrant, and Tokyo Dystopia</h4>
<ol>
<li>Installing the Tokyo Software</li>
<li><a href="/key-value-stores/tokyo-cabinet-as-a-key-value-store">Tokyo Cabinet as a Key-Value Store</a></li>
<li><a href="/key-value-stores/tokyo-cabinets-key-value-database-types">Tokyo Cabinet's Key-Value Database Types</a></li>
<li>Tokyo Cabinet's Tables</li>
<li>Threads and Multiprocessing With Tokyo Cabinet</li>
<li>Tokyo Tyrant as a Network Interface</li>
<li>The Strengths of Tokyo Cabinet</li>
</ol>James Edward Gray II