10
JAN2010
Tokyo Cabinet's Key-Value Database Types
We've taken a good look at Tokyo Cabinet's Hash Database, but there's a lot more to the library than just that. Tokyo Cabinet supports three other kinds of databases. In addition, each database type accepts various tuning parameters that can be used to change its behavior. Each database type and setting involves different tradeoffs so you really have a lot of options for turning Tokyo Cabinet into exactly what you need. Let's look into some of those options now.
The B+Tree Database
Tokyo Cabinet's B+Tree Database is a little slower than the Hash Database we looked at before. That's its downside. However, giving up a little speed gains you several extra features that may just allow you to work smarter instead of faster.
The B+Tree Database is a more advanced form of the Hash Database. What that means is that all of the stuff I showed you in the last article still applies. You can set, read, and remove values by keys, iteration is supported, and you still have access to the neat options like adding to counters. With a B+Tree Database you get all of that and more.
The first major addition is that a B+Tree Database is ordered. You don't really need to do anything to turn this on, it's just the way it is. As you add pairs to the database, they will be ordered by the keys you use. The default ordering is lexical:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("ordered.tcb") do |db|
db[:c] = 3
db[:a] = 1
db[:b] = 2
db.to_a # => [["a", "1"], ["b", "2"], ["c", "3"]]
end
This simple example shows us a couple of things. First, creating a B+Tree Database is as simple as changing the file extension. Remember when I said the .tch
stood for Tokyo Cabinet Hash Database? Well, it shouldn't be too surprising that .tcb
stands for Tokyo Cabinet B+Tree Database. Oklahoma Mixer will notice which extension you use and load the right features for that database type.
The other thing to notice here is the ordering. I purposely added the keys out of order, but you can see that to_a()
shows them all lined up correctly. Now to_a()
is really just an iterator the database object inherits from Enumerable
, so we now know that iteration will be in database order. Methods like keys()
and even values()
will also return their listings in order as well.
As I said, the default ordering is lexical, so number keys are little strange:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("lexical.tcb") do |db|
db[1] = :first
db[2] = :middle
db[11] = :last
db.to_a # => [["1", "first"], ["11", "last"], ["2", "middle"]]
end
Notice they don't come out in the order we would probably think is most natural, as I described in the values. To fix that we need to change the default ordering and you can do that using a tuning parameter of the B+Tree Database. We are allowed to set a comparison function when we open()
the database that will order the keys however we desire. This function is just like a block you would pass to sort()
in Ruby: it will be handed two keys at a time to compare and it is expected to return negative, zero, or positive for the first argument being less than, equal to, or greater than the second. The good news is, you can usually cheat your way out of remembering these comparison rules by leaning on Ruby's spaceship operator:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open( "numerical.tcb",
:cmpfunc => lambda { |a, b| a.to_i <=> b.to_i } ) do |db|
db[1] = :first
db[2] = :middle
db[11] = :last
db.to_a # => [["1", "first"], ["2", "middle"], ["11", "last"]]
end
This example shows how tuning parameters get set with Oklahoma Mixer. Just pass some keyword arguments to open()
for each parameter you need to adjust. This allows Oklahoma Mixer to perform the needed setup before connecting to your database. That's critical for things like a B+Tree comparison function that have to be set before the database is accepting data.
It's worth noting that the comparison function is not stored in the database file and needs to be reset (to the same function if you want to avoid unpredictable results) each time you open()
that database.
OK, enough about ordering. What else do we get with the B+Tree Database?
You also get key ranges. Since the database has an inherit order, we're no longer limited to :prefix
searches of the keys()
and we can now ask for all of the keys between two endpoints:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("ranges.tcb") do |db|
db.update(:a => 1, :b => 2, :c => 3, :d => 4, :e => 5)
db.keys(:range => "ab".."d") # => ["b", "c", "d"]
db.keys(:range => "ab"..."d") # => ["b", "c"]
end
Note that I used "ab"
in my Range
queries which is really between the actual "a"
and "b"
keys in the database. That works just fine.
You can also pass the :limit
option I've shown before with :range
, but you can't pass :prefix
. It's one or the other: :prefix
or :range
.
This ability to work with a Range
of keys is even extended to the iterators. You've always had the ability to stop iterating whenever you like using Ruby's break
keyword, but now you can tell the iterators where to start, making it possible to iterate over a subset of the pairs in the database:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("ranges.tcb") do |db|
db.update(:a => 1, :b => 2, :c => 3, :d => 4, :e => 5)
db.each("ab") do |key, value|
puts "%p => %p" % [key, value]
break if key >= "d"
end
# >> "b" => "2"
# >> "c" => "3"
# >> "d" => "4"
end
Again, I used "ab"
and it jumped to the first key after that. The only place that might get a little confusing is if you try that same trick with the (also added to B+Tree Databases) reverse_each()
iterator:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("ranges.tcb") do |db|
db.update(:a => 1, :b => 2, :c => 3, :d => 4, :e => 5)
db.reverse_each("ddd") do |key, value|
puts "%p => %p" % [key, value]
break if key <= "b"
end
# >> "e" => "5"
# >> "d" => "4"
# >> "c" => "3"
# >> "b" => "2"
end
See how it started with "e"
? It always jumps to the first key equal to or after the one you provide, even if you are planning to iterate backwards. Since "ddd"
is between "d"
and "e"
, that means we start on the key after "ddd"
("e"
).
B+Tree Databases have one more feature and it's a wild one. These databases support an additional storage mode that allows duplicate values to be stored under the same key:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("dupes.tcb") do |db|
%w[James Dana Baby].each do |name|
db.store("Gray", name, :dup)
end
db.to_a # => [["Gray", "James"], ["Gray", "Dana"], ["Gray", "Baby"]]
end
As you can see, the :dup
storage mode shuts off the normal value replacing behavior and instead inserts the duplicate value after what was already stored for that key.
Several methods in Oklahoma Mixer have been expanded to support these duplicate values. For example, with a B+Tree Database you can scope values()
or size()
to a specific key, retrieving just the values()
stored under that key or getting a count of how many values there are for that key:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("names.tcb") do |db|
db["Matsumoto"] = "Yukihiro"
%w[James Dana].each do |name|
db.store("Gray", name, :dup)
end
db.values # => ["James", "Dana", "Yukihiro"]
db.values("Gray") # => ["James", "Dana"]
db.size # => 3
db.size("Gray") # => 2
end
You will need to use these methods to work with duplicate values because normal indexing, fetch()
, and delete()
still just work with the first value stored under a key. That behavior can be valuable too though:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("todo.tcb") do |db|
if db.size.zero?
%w[B+tree Fixed-length tuning].each do |topic|
db.store(:blog, topic, :dup)
end
end
puts "Write about #{db.delete(:blog)}."
end
If I run that program three times, this is what you see:
$ ruby tc_example.rb
Write about B+tree.
$ ruby tc_example.rb
Write about Fixed-length.
$ ruby tc_example.rb
Write about tuning.
See how delete()
just kept pulling the first value that was left? That allowed us to use it as a simple queue in this case.
The delete()
method can be passed the :dup
storage mode as a second argument. When you do, all values under the passed key will be removed.
When working with duplicates, be aware that keys()
and each_key()
(or any iterator) behave differently. keys()
returns a unique list, so keys with duplicate values under them will only be listed once. Iteration walks each pair in the database though, so a key will come up once for each value stored under it. Put another way, iteration does show duplicates while keys()
won't.
Let me show one last, slightly bigger example to bring together all of the features discussed above. Here's a little more involved queuing system:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
GROUPS = {nil => 0, "critical" => 1, "normal" => 2, "low" => 3}
order = lambda { |a, b|
a_group, a_priority = a.split(":")
b_group, b_priority = b.split(":")
[GROUPS[a_group], -a_priority.to_i] <=> [GROUPS[b_group], -b_priority.to_i]
}
OklahomaMixer.open("queue.tcb", :cmpfunc => order) do |db|
case ARGV.shift
when "add"
group = "normal"
ARGV.delete_if { |o| o =~ /\A--(critical|low)\z/ and group = $1 }
priority = 10
ARGV.delete_if { |o| o =~ /\A-(\d+)\z/ and priority = $1.to_i }
db.store("#{group}:#{priority}", ARGV.join("; "), :dup)
when "list"
db.each do |key, value|
puts key
puts " #{value}"
end
when "do_one"
if key = db.keys(:limit => 1).first and job = db.delete(key)
eval(job)
end
when "do_all"
loop do
if key = db.keys(:limit => 1).first and job = db.delete(key)
eval(job)
else
break
end
end
else
abort "Usage: #{$PROGRAM_NAME} add|list|do_one|do_all [OPTIONS]"
end
end
Most of that should be pretty straightforward code after all we've talked about, but let me point out one tricky spot. I had to add the nil => 0
entry to my GROUPS
because fetching a full, or in this case :limit
ed, set of keys()
is really just a :prefix
query with an empty :prefix
. Because of that, you want to make sure your ordering functions always order an empty String
key before anything else. Calling split()
on the empty String
gives me a nil
group, which is converted to a 0
so it will come out first.
It's probably also worth pointing out that I could have just used each()
with the do_all
command. However, always fetching the first key and using that is a little better in a multiprocessing environment where other processes might be adding to the queue. If I'm iterating through the list, I won't see new critical
jobs if they are added above where I am at. Using keys()
though, I'll always grab the most important job next. This code isn't really built for multiprocessing to tell the truth, but let's save that discussion for a later article.
Anyway, here's an example of me playing around with the program above, so you can see how it works in practice:
$ ruby queue.rb add 'puts "An average job."'
$ ruby queue.rb add --low 'puts "This can wait..."'
$ ruby queue.rb add --critical 'puts "Very important."'
$ ruby queue.rb add --critical -100 'puts "Most important!"'
$ ruby queue.rb listcritical:100
puts "Most important!"
critical:10
puts "Very important."
normal:10
puts "An average job."
low:10
puts "This can wait..."
$ ruby queue.rb do_one
Most important!
$ ruby queue.rb do_one
Very important.
$ ruby queue.rb do_all
An average job.
This can wait...
To summarize, the B+Tree Database gives you ordering, key ranges and cursor based iteration (the ability to skip to a specific key), and duplicate storage. You pay a speed penalty for these added features though. That's the tradeoff.
The Fixed-length Database
Another type of database supported by Tokyo Cabinet is the Fixed-length Database. It too is an extension of the Hash Database, supporting most of the features I showed you in that article. However, I'm not going to lie to you, this database type comes with three significant restrictions.
First, all keys are Integer
s greater than 0
. You can't use arbitrary String
s as you do with the Hash and B+Tree Databases. As such, you lose the ability to do :prefix
queries on keys()
. The database is ordered though, similar to the B+Tree Database. You can't change this ordering, but it is done numerically (instead of lexically) since all keys are just Integer
s anyway. Given that, :range
queries on keys()
are supported. Methods like keys()
and the iterators will pass you Integer
keys in Ruby, instead of the String
keys you get with the other database types.
The second downside is that all values stored have a fixed-length, which is what gives the database its name. This length defaults to 255
, but you can tune it to anything you like with the :width
tuning parameter:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("four.tcf", :width => 4) do |db|
db.update( 1 => :one,
2 => :two,
3 => :three,
4 => :four,
5 => :fix,
6 => :six,
7 => :seven,
8 => :eight )
db.each_value do |num|
puts num
end
# >> one
# >> two
# >> thre
# >> four
# >> fix
# >> six
# >> seve
# >> eigh
end
Notice how everything beyond my selected :width
of 4
was just silently discarded. That's the fixed-length at work.
Also note that you create a Tokyo Cabinet Fixed-length Database as you probably expect by now, with the file extension .tcf
.
Finally, the Fixed-length Database has one last size limit. The overall file size of the database is limited to 268435456
bytes, by default. This too can be tuned using the :limsiz
tuning parameter and you are free to make the limit quite large. Just remember that values are fixed length, so setting :width => 1024, :limsiz => 4 * 1024
will mean your database only holds four keys. Trying to add data beyond this limit will raise an OklahomaMixer::Error::CabinetError
.
That's a lot of minuses and you are probably wondering why anyone would be willing to accept all of these limits when we've already seen that there are more powerful options. The answer is performance. The Fixed-length Database is Tokyo Cabinet's fastest weapon. It treats the database file as a raw array of bytes and it can jump straight to any value with simple math. (Due to this, defrag()
isn't supported on a Fixed-length Database, though Oklahoma Mixer does provide a no-op just to match the interface of the other database types.) That makes it wicked quick. If your data storage needs are simple enough to fit within these limitations, you can take advantage of this added speed boost.
The Fixed-length Database interface does have one other neat feature I should mention. It supports four special key names: :min
, :max
, :prev
, and :next
. You can use these values in many of the methods that take keys. For example:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("special.tcf") do |db|
db.update( 1 => :first,
2 => :middle,
42 => :last )
db[:min] # => "first"
db[:max] # => "last"
db[:next] = :added
db.keys # => [1, 2, 42, 43]
db[43] # => "added"
end
Be careful when using these. :min
and :max
will raise an OklahomaMixer::Error::CabinetError
if there are no keys in the database. :prev
(not shown above) is even pickier, requiring a :min
key that is above 1
, so it can safely add below it without hitting 0
. I find :next
pretty useful though, as it makes it possible to queue up values. Here's the simple queue example I showed in the B+Tree code rewritten to use a Fixed-length Database instead:
#!/usr/bin/env ruby -wKU
require "oklahoma_mixer"
OklahomaMixer.open("todo.tcf") do |db|
# load the data
%w[B+tree Fixed-length tuning].each do |topic|
db[:next] = "Write about #{topic}."
end
# read it back
loop do
begin
puts db.delete(:min)
rescue OklahomaMixer::Error::CabinetError # no keys for :min
break
end
end
# >> Write about B+tree.
# >> Write about Fixed-length.
# >> Write about tuning.
end
To summarize, the B+Tree Database may slow you down a little, but the Fixed-length speeds you up, as long as you can accept certain restrictions. Select the database type that best fits the needs of your data.
Tuning Parameters
We've seen how to set tuning parameters in the examples above and already learned what some do. I'm going to save some parameters for discussions in later articles, when we talk about their specific functions. For now though, here are most of the tuning parameters available to the three databases we've covered so far.
-
:bnum
(for Hash or B+Tree—can be used withoptimize()
): Specifies the number of elements to use in the bucket array. The default is131071
for Hash Databases and32749
for B+Tree Databases. The suggested size is from 0.5 to 4 times the total number of records stored for Hash Databases or 1 to 4 times the total for B+Tree Databases. -
:apow
(for Hash or B+Tree—can be used withoptimize()
): Specifies to the size of record alignment as a power of 2. The default is4
for Hash Databases and8
for B+Tree Databases, meaning2 ** 4 = 16
and2 ** 8 = 256
respectively. -
:fpow
(for Hash or B+Tree—can be used withoptimize()
): Specifies the maximum number of elements in the free block pool as a power of 2. The default is10
, meaning2 ** 10 = 1024
. -
:opts
(for Hash or B+Tree—can be used withoptimize()
): Specifies the options for the database in aString
of recognized character codes. There are no options by default, but this is commonly set to"ld"
or"lb"
for bigger databases. The options are:-
"l"
allows the database file to grow large (over 2 GB) by using a 64-bit bucket array. -
"d"
compresses each record in a Hash Database or page in a B+Tree Database with Deflate compression. -
"b"
compresses each record in a Hash Database or page in a B+Tree Database with BZIP2 compression. -
"t"
compresses each record in a Hash Database or page in a B+Tree Database with TCBS compression.
-
-
:rcnum
(for Hash): Specifies the maximum number of records to be cached. It is0
or disabled by default. -
:xmsiz
(for Hash or B+Tree): Specifies the size of extra mapped memory. The default is67108864
for Hash Databases or0
(disabled) for B+Tree Databases. -
:dfunit
(for Hash or B+Tree): Specifies the auto defragmentation unit step number. It is0
or disabled by default. -
:cmpfunc
(for B+tree): Specifies the comparison function used to order B+Tree Databases. See the detailed examples above for an explanation. -
:lmemb
(for B+Tree—can be used withoptimize()
): Specifies the number of members in each leaf page. The default is128
. -
:nmemb
(for B+Tree—can be used withoptimize()
): Specifies the number of members in each non-leaf page. The default is256
. -
:lcnum
(for B+tree): Specifies the maximum number of leaf nodes to be cached. The default is1024
. -
:ncnum
(for B+tree): Specifies the maximum number of non-leaf nodes to be cached. The default is512
. -
:width
(for Fixed-length—can be used withoptimize()
): Specifies the width of values in Fixed-length Databases. See the detailed examples above for an explanation. -
:limsiz
(for Fixed-length—can be used withoptimize()
): Specifies the limit on database file size in Fixed-length Databases. See the detailed examples above for an explanation.
I apologize for keeping the cryptic names in Oklahoma Mixer, but I felt it was better to stick with what Tokyo Cabinet uses so users could read about them in documentation and other resources for that library. Tokyo Tyrant also uses these names to configure a database by command-line, so you will find them in several different contexts.
Database objects have an optimize()
method that can be used to modify the tuning parameters of an open()
database. The parameters that can be used as such are noted above. There are sometimes additional restrictions though. For example, the :limsiz
of a Fixed-length Database usually has to be increased when changed through optimize()
.
That covers the various key-value database types in Tokyo Cabinet. The fourth type is quite a bit different from what we've look at so far, so I'll do a full article on it next.
Comments (4)
-
Ilya Grigorik January 16th, 2010 Reply Link
Awesome stuff James. Great to see a good explanation on all the tuning parameters - though to be honest, I think those deserve an entire post by themselves! Lots of people get confused and lost in them.
-
I'm told there is some good documentation for them in Japanese, but I haven't seen a lot of great stuff in English, no.
-
-
James, I found your two posts on on Tokyo Cabinet to be priceless. Any thoughts on building out a Kyoto Cabinet equivalent ?
-
I may do that some day. I have yet to play with it myself yet, but I want to.
-