Key-Value Stores

Notes from my learning about simple NoSQL storage solutions.

16

SEP
2009

Lists and Sets in Redis

[Update: though all of the techniques I show here still apply, many methods of the Redis gem have changed names to match the actual Redis commands they call. There are also easier and more powerful ways to do some of what I show in here, thanks to additions to Redis.]

Redis adds one huge twist to traditional key-value storage: collections. Supporting both lists and sets through some very powerful atomic operations allows for advanced key-value usage.

Lists

Redis allows a single key to hold a list of values. This is your typical ordered list with the operations you would expect: appending, indexed access, and access to a range of values.

This has many potential uses. I'll cover two that I think will be very common. First, if you are going to use Redis as a full database, you store things that are naturally a list of items, like comments, in a real list. Let's look at some code:

#!/usr/bin/env ruby -wKU

require "redis"

CLEAR = `clear`

# create an article to comment on
db                = Redis.new
article_id        = db.incr("global:next_article_id")
article           = "article:#{article_id}"
class << article
  def method_missing(field, *args, &blk)
    return super unless field.to_s !~ /[!?=]\z/ && args.empty? && blk.nil?
    "#{self}:#{field}"
  end
end
db[article.title] = "My Favorite Language"
db[article.body]  = "I love Ruby!"
# initialize some session details
comments_per_page = 2
comment_page      = 1
login             = ARGV.shift || "JEG2"

loop do
  # show article
  print CLEAR
  puts "#{db[article.title]}:"
  puts "  #{db[article.body]}"

  # paginate comments
  start      =  comments_per_page * (comment_page - 1)
  finish     =  start + comments_per_page - 1
  comments   =  db.list_range(article.comments, start, finish)
  pagination =  Array(start.zero? ? nil : "(p)revious")
  pagination << "(n)ext" if db.list_length(article.comments) - 1 > finish
  # show comments
  comments.each do |comment|
    posted, user, body = comment.split("|", 3)
    puts "----"
    puts "  #{body}"
    puts "  posted by #{user} on #{posted}"
  end

  # handle commands
  puts
  print "Command? [#{(%w[(c)omment (q)uit] + pagination).join(', ')}]  "
  case (command = gets)
  when /\Ac(?:omment)?\Z/i  # add a comment
    print "Your comment?  "
    comment = gets or break
    posted  = Time.now.strftime('%m/%d/%Y at %H:%M:%S')
    db.push_tail(article.comments, "#{posted}|#{login}|#{comment.strip}")
  when /\Ap(?:revious)?\Z/i  # view previous page of comments
    if pagination.first =~ /\A\(p\)/
      comment_page -= 1
    else
      puts "You are on the first page of comments."
      gets or break
    end
  when /\An(?:ext)?\Z/i  # view next page of comments
    if pagination.last =~ /\A\(n\)/
      comment_page += 1
    else
      puts "You are on the last page of comments."
      gets or break
    end
  when /\Aq(?:uit)?\Z/i, nil  # exit program
    break
  end
end

I know that looks like a lot of code, but it's mostly interface. It also shows much of the common list interactions in just three methods.

You can see that adding to the list was a simple matter of calling push_tail(). Similar to how incr()/decr() initialize counters to 0, list actions will default to an empty list if the key is undefined when they are triggered. You will get an error if you use the operations on keys that have already been set to non-list values though, so be careful with that.

When I was ready to read the list back, I paginated through the results using list_length() and list_range(). You pass a starting and ending index to list_range() and negative indices can be used to count backwards from the end just as Ruby allows. Redis doesn't allow you to read a full list with a simple key lookup ([] or get()), so use list_range(…, 0, -1) instead.

Here is what this example looks like in practice after I've entered a few comments:

My Favorite Language:
  I love Ruby!
----
  First!
  posted by JEG2 on 09/07/2009 at 15:35:11
----
  Yeah, we know.
  posted by JEG2 on 09/07/2009 at 15:35:29

Command? [(c)omment, (q)uit, (n)ext]

Then if I type an n followed by a return:

My Favorite Language:
  I love Ruby!
----
  ruby { |love| love + 1 }
  posted by JEG2 on 09/07/2009 at 15:35:52
----
  Who doesn't?
  posted by JEG2 on 09/07/2009 at 15:36:20

Command? [(c)omment, (q)uit, (p)revious, (n)ext]

You get the idea.

Let's go back to how I added items to the list for just a moment though.

As the name push_tail() might lead you to guess, there are three related methods: pop_tail(), push_head(), pop_head(). The push_head() method will add items to the beginning of the list, instead of the end as I did above. Then pop_head() and pop_tail() can be used to remove entries from either end. This means that lists can function as stacks and queues, just as Ruby's Array does.

I believe using Redis lists as queues will be another popular setup:

#!/usr/bin/env ruby -wKU

require "redis"

WORKER_COUNT = 3

# spawn some workers
WORKER_COUNT.times do
  fork do
    db = Redis.new
    loop do
      next  unless work = db.pop_head(:work)
      break if     work == "QUIT"
      puts "Work from #{Process.pid}: #{work} = #{eval work}"
    end
  end
end

# generate some work
db = Redis.new
10.times do
  db.push_tail( :work,
                "#{rand(10)} #{%w[+ - * / %][rand(5)]} #{rand(9) + 1}" )
end

# finish off all processes
WORKER_COUNT.times do
  db.push_tail(:work, "QUIT")
end
Process.waitall

When I run that, we can see the workers pulling off their assignments and taking care of the work:

Work from 1478: 9 + 9 = 18
Work from 1479: 0 / 7 = 0
Work from 1478: 1 % 3 = 1
Work from 1479: 2 % 4 = 2
Work from 1479: 4 * 7 = 28
Work from 1478: 1 / 9 = 0
Work from 1479: 1 + 1 = 2
Work from 1479: 3 * 4 = 12
Work from 1478: 6 - 9 = -3
Work from 1480: 0 / 1 = 0

Adding and removing work are both atomic actions. Once a work has a job, it's guaranteed other workers won't receive it.

You aren't limited to a single queue either, of course. Each key can hold a queue, so you can divide jobs up by priority, time required, or anything else that makes sense.

I know of at least one big site using this kind of setup to process their background jobs. They switched to this approach because just saving the jobs to a relational database was too much of a time penalty for their needs.

You can edit a list by index using list_set(), but unlike in Ruby, you will get an error if you try to set an index that doesn't already exist. Use list_index() to read a single value at an index.

If you want to use lists as a circular buffer, say for storing recent log entries or notices without allowing the data to grow infinitely, the list_trim() method is what you need. It takes identical arguments to list_range(), but instead of returning the indicated keys it shaves the list down to include only those keys. Thus, if you want to keep a list at no more than 100 entries, you can use code like:

db.push_tail(:list, "whatever")
db.list_trim(:list, -100, -1)

If you need to delete items out of the middle of a list, use the list_rm() method. It takes the key, a count of how many matching values to remove, and the value to match against. A count of 0 will remove all matching values from within the list and negative counts will start removing at the tail of the list, moving toward the head.

Finally, you may want to know about the rename() method, in case you need to atomically replace an entire list. You can build up the new list under a separate key an then just use rename() to replace the old key. Be careful with rename() though because it expects the old key name followed by the new key name, which is backwards from how we usually do things in Ruby. If you need a rename() that won't destroy old data, there's also a rename_unless_exists() method.

Sets

Redis has one more type of collection: sets. Sets are probably even simpler to work with operation wise, but are a large source in the power of Redis over normal key-value stores.

To Redis a set is an unordered collection with no duplicate members. If you add an item to a set more than once, it will still just be listed one time.

Let's begin by looking at some basic set operations:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

db.set_member?(:nums, "one")  # => false
db.set_add(:nums, "one")
db.set_member?(:nums, "one")  # => 1
db.set_delete(:nums, "one")
db.set_member?(:nums, "one")  # => false

50.times do
  db.set_add(:nums, "one")
end
db.set_add(:nums, "two")
db.set_count(:nums)    # => 2
db.set_members(:nums)  # => ["two", "one"]

db.spop(:nums)  # => "two"
db.spop(:nums)  # => "one"
db.spop(:nums)  # => nil

db.set_add(:work, "write about sets")
db.set_add(:work, "write about sorting")
db.set_members(:work)  # => ["write about sorting", "write about sets"]
db.set_move(:work, :finished, "write about sets")
db.set_members(:work)      # => ["write about sorting"]
db.set_members(:finished)  # => ["write about sets"]

I assume each of those examples is pretty easy to follow. The first chunk of code shows adding to and deleting from a set, plus testing membership. After that we see the rules I mentioned earlier: no duplicates and unordered. Next we see the spop() method which is just a random member remover function. The final chunk of code shows how you can atomically move members from one set to another. Those are the basics of sets.

The real power of sets comes from how you can use them to build simple queries in Redis. That's possible through the support of some atomic set operations: intersect(ion), union, and diff(erence). Here are some examples:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

db.set_add(:odd,            "one")
db.set_add(:even,           "two")
db.set_add(:odd,            "three")
db.set_add(:divisible_by_3, "three")
db.set_add(:even,           "four")
db.set_add(:divisible_by_4, "four")
db.set_add(:odd,            "five")
db.set_add(:even,           "six")
db.set_add(:divisible_by_3, "six")
db.set_add(:odd,            "seven")
db.set_add(:even,           "eight")
db.set_add(:divisible_by_4, "eight")
db.set_add(:odd,            "nine")
db.set_add(:divisible_by_3, "nine")
db.set_add(:even,           "ten")
db.set_add(:odd,            "eleven")
db.set_add(:even,           "twelve")
db.set_add(:divisible_by_3, "twelve")
db.set_add(:divisible_by_4, "twelve")

db.set_intersect(:even, :divisible_by_3)                   # => ["six",
                                                           #     "twelve"]
db.set_intersect(:even, :divisible_by_3, :divisible_by_4)  # => ["twelve"]

db.set_diff(:divisible_by_3, :even)                   # => ["three", "nine"]
db.set_diff(:even, :divisible_by_3)                   # => ["four", "ten",
                                                      #     "eight", "two"]
db.set_diff(:even, :divisible_by_3, :divisible_by_4)  # => ["ten", "two"]

db.set_union(:divisible_by_3, :divisible_by_4)  # => ["four", "six",
                                                #     "twelve", "three",
                                                #     "eight", "nine"]

db.set_union_store(:all, :even, :odd)
db.set_members(:all)  # => ["four", "eleven", "seven", "eight", "one",
                      #     "six", "ten", "twelve", "three", "five", "nine",
                      #     "two"]

Again, I assume the results of these examples are pretty straight forward. set_intersect() returns the members in all listed sets, set_diff() subtracts the members of each successive set listed from the first set, and set_union() returns all members present in any of the listed sets. Note that order is significant in set_diff() due to the way it is defined. Finally, I showed a variant that stores the results instead of returning the entire set. Though I only showed set_union_store(), set_intersect_store() and set_diff_store() do exist.

A typical usage for these methods is to store unique identifiers of records in the system by various categorical breakdowns. You can then use the set operations to find simple query results of those present in more than one category, in one category but not others, or those present in any category. This may be able to save you a trip to a more powerful but expensive tool, like SQL, for some queries.

To give an example of building queries with sets, let's look at a simple program. This code will load some first and last names from the U.S Census Bureau into a Redis database. We will then allow the user to make queries by name position (first or last), gender, popularity, and any leading prefix.

To keep the code simple, we're going to adopt a pretty crude definition of popularity. The files are already in order of rank, so we will just call the first third of the file popular, the second third average, and the rest uncommon.

We're also going to need to be a bit clever to handle prefix searches with just sets. Redis doesn't really have text search features, so we will need build an index we can use. The idea is simple: given the name "James" we will add it to the sets "prefix:j", "prefix:ja", "prefix:jam", "prefix:jame", and "prefix:james". Later, when we are answering queries, we can just add the set for the input the user gave us into our set intersection.

Here's the code:

#!/usr/bin/env ruby -wKU

require "abbrev"

require "redis"

# prepare the database
db = Redis.new
if db.dbsize.zero?
  puts "Loading names into the database..."
  started, count = Time.now, 0
  # load names
  %w[all.last female.first male.first].each do |group|
    gender, position = group.split(".")
    category         = ([position, gender] - %w[all]).join(":")
    top_third        = File.size("dist.#{group}.txt") / 3
    File.open("dist.#{group}.txt") do |names|
      names.each do |given|
        name = given[/\w+/]
        db.pipelined do |commands|
          commands.set_add("position:#{position}", name)
          commands.set_add("gender:#{gender}",     name) \
            unless gender == "all"
          Abbrev.abbrev(name.downcase).keys.each do |prefix|
            commands.set_add("prefix:#{prefix}", name)
          end
          popularity = if    names.pos < top_third     then "popular"
                       elsif names.pos < top_third * 2 then "average"
                       else                                 "uncommon"
                       end
          commands.set_add("popularity:#{category}:#{popularity}", name)
        end
        count += 1
      end
    end
  end
  # save and report
  db.save
  puts "Loaded #{count} names in #{Time.now - started} seconds."
  puts
end

# perform queries
position   = %w[first last]
gender     = %w[male female]
popularity = %w[popular average uncommon]
# UI for entering queries
def ask(choices)
  print [ choices[0..-2].join(", "),
          choices[-1] ].join(", or ").capitalize + "?  "
  choice = gets.to_s.strip
  choice.empty? ? nil : Abbrev.abbrev(choices)[choice]
end
query = [ ]
if choice = ask(position)
  query << "position:#{choice}"
end
unless choice == "last"
  if choice = ask(gender)
    query << "gender:#{choice}"
  end
end
if choice = ask(popularity)
  query << "popularity:#{query.map { |f| f[/\w+\z/] }.join(':')}:#{choice}"
end
print "Prefix?  "
prefix = gets.to_s.strip
unless prefix.empty?
  query << "prefix:#{prefix.downcase}"
end
puts
# execute query and show stats
puts "Running query..."
width = query.map { |set| set.size }.max
query.each do |set|
  puts "  %#{width}s:  #{db.set_count(set)} names" % set
end
started = Time.now
puts
puts( query.empty? ? db.set_union("position:first", "position:last") :
                     db.set_intersect(*query) )
puts
puts "#{Time.now - started} seconds."

Using that code, we can load the database and search for names like mine:

$ ruby names.rb
Loading names into the database...
Loaded 94293 names in 58.25652 seconds.

First, or last?  f
Male, or female?  m
Popular, average, or uncommon?  p
Prefix?  Ja

Running query...
                 position:first:  5163 names
                    gender:male:  1219 names
  popularity:first:male:popular:  406 names
                      prefix:ja:  501 names

JARED
JACKIE
JAVIER
JAY
JAIME
JAMES
JACK
JAMIE
JACOB
JASON

0.003345 seconds.

As you can see, a little setup work really pays us back. Redis can chew through those multi-set queries in no time at all.

Since we've now seen all of the Redis value types, it's worth a quick mention that you can ask the database for the type() of a given key. The response will be one of: "none" (key not set), "string", "list", or "set".

Sorting

As soon as you have collections of data, you will want tools for controlling the order of those collections. This is especially important with things like set that don't have an order until you define one.

Redis has a single method for this, called sort(), but it's by far the most complex and powerful method in the entire API. Let's examine what it can do piece by piece. First, here is a basic sort:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

db.set_add(:nums, "1")
db.set_add(:nums, "2")
db.set_add(:nums, "10")
db.set_add(:nums, "11")

db.sort(:nums)  # => ["1", "2", "10", "11"]

That's easy enough to understand. I built a set (note that sort() works with lists too) and asked Redis to sort the members. It did.

Here's a question though: did the order surprise you? Values are generally considered Strings in Redis, but those numbers weren't sorted as Strings. This is another one of those sometimes numeric meaning exceptions I've mentioned before. Redis goes one step further with sort() though and even recognizes Floats:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

db.set_add(:floats, "3")
db.set_add(:floats, "3.02")
db.set_add(:floats, "3.14")
db.set_add(:floats, "3.2")
db.set_add(:floats, "4")

db.sort(:floats)  # => ["3", "3.02", "3.14", "3.2", "4"]

Now, if you want a String ordering, you can ask for it. You can also reverse either order. Here's how those options work in our original example:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

db.set_add(:nums, "1")
db.set_add(:nums, "2")
db.set_add(:nums, "10")
db.set_add(:nums, "11")

db.sort(:nums, :order => "ALPHA")       # => ["1", "10", "11", "2"]
db.sort(:nums, :order => "ALPHA DESC")  # => ["2", "11", "10", "1"]

db.sort(:nums, :order => "DESC")  # => ["11", "10", "2", "1"]

The sort() method also supports a limit with offset that can be used to fetch a subset of entries and also handle pagination:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

db.set_add(:nums, "1")
db.set_add(:nums, "2")
db.set_add(:nums, "10")
db.set_add(:nums, "11")

db.sort(:nums, :limit => [0, 3])  # => ["1", "2", "10"]
db.sort(:nums, :limit => [3, 3])  # => ["11"]

There are two more options you can set with sort(). The first option allows you to use a key lookup to find the actual value to order by. Thus, if you had a set of ID's, but actually wanted to sort() by an attribute associated with those ID's, you could look up the attribute. Let me show you what I mean:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

%w[one two three four].each do |num|
  id = db.incr("global:num_id")
  db["num:#{id}:word"]      = num
  db["num:#{id}:word_size"] = num.size
  db.set_add(:nums, id)
end

db.sort(:nums, :by => "num:*:word_size")  # => ["1", "2", "4", "3"]

I made the ID's match up with the numbers here just to make the example easy to follow, but notice how those ID's where actually orders by their "num:ID:word_size" key. As you can see, sort() replaced the * in my key name with the actual member of the set, which was the ID in this case.

The final feature lets us take that one step further. Just as we can fetch a key for the order, we can also fetch a key for the result. That allows us to return not just the ID, but the actual data. For example:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

%w[one two three four].each do |num|
  id = db.incr("global:num_id")
  db["num:#{id}:word"]      = num
  db["num:#{id}:word_size"] = num.size
  db.set_add(:nums, id)
end

db.sort( :nums, :by  => "num:*:word_size",
                :get => "num:*:word" )  # => ["one", "two", "four", "three"]

So the sort() fetched the word_size keys to build the ordering and then fetched the word keys as the actual result. If you want, you can even fetch multiple keys for each result:

#!/usr/bin/env ruby -wKU

require "redis"

db = Redis.new

%w[one two three four].each do |num|
  id = db.incr("global:num_id")
  db["num:#{id}:word"]      = num
  db["num:#{id}:word_size"] = num.size
  db.set_add(:nums, id)
end

db.sort( :nums, :by  => "num:*:word_size",
                :get => %w[ num:*:word
                            num:*:word_size ] )  # => ["one", "3",
                                                 #     "two", "3",
                                                 #     "four", "4",
                                                 #     "three", "5"]

sort() is the power tool of collection ordering and fetching. I've shown most of the options separately to make things easier to understand, but you can combine them as needed to order, limit, and even lookup your data.

The collections types add quite a bit of flexibility to simple key-value storage. You can use these tools to group keys and process them by various criteria. This makes Redis useable in a wider range of scenarios than some simple key-value stores.

Comments (4)
  1. Keith Gautreaux
    Keith Gautreaux September 16th, 2009 Reply Link

    I am enjoying this series of posts. But, am I confused or are your comments for the command case statement for (p)revious and (n)ext reversed?

    1. Reply (using GitHub Flavored Markdown)

      Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

      Ajax loader
    2. James Edward Gray II
      James Edward Gray II September 16th, 2009 Reply Link

      You are right that they were. All fixed. Thanks!

      1. Reply (using GitHub Flavored Markdown)

        Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

        Ajax loader
  2. WK
    WK March 28th, 2011 Reply Link

    As not Ruby user I found Redis gem methods cleaner than DSL itself. Despite it, it seems that nowadays Redis gem uses same syntax as DSL has. So it was bit confusing to me not getting your code to work as prompt. Like I said, it was second or third time to see Ruby code to me.

    If I am right and there is some change in gem API, maybe could you add some warning in the beginning of your enlightening posts. I am really grateful for your shared knowledge!

    1. Reply (using GitHub Flavored Markdown)

      Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

      Ajax loader
    2. James Edward Gray II
      James Edward Gray II March 28th, 2011 Reply Link

      Yeah, these are definitely old. I hope to redo them someday to improve the information and show off some new features.

      1. Reply (using GitHub Flavored Markdown)

        Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

        Ajax loader
Leave a Comment (using GitHub Flavored Markdown)

Comments on this blog are moderated. Spam is removed, formatting is fixed, and there's a zero tolerance policy on intolerance.

Ajax loader