My Projects

The projects that I have spent significant time on.

16 APR 2007

No Longer the Fastest Game in Town

If your number one concern when working with CSV data in Ruby is raw speed, you might want to know that FasterCSV is no longer the fastest option.

There are a couple of new contenders for Ruby CSV processing, including a C extension called SimpleCSV and a pure Ruby library called LightCsv. I haven't been able to test SimpleCSV locally because I can't get it to build on my box, but users tell me it's faster. I have run some trivial benchmarks for LightCsv, though, and it too is pretty quick:

$ rake benchmark
(in /Users/james/Documents/faster_csv)
time ruby -r csv -e '6.times { CSV.foreach("test/test_data.csv") { |row| } }'

real    0m5.481s
user    0m5.468s
sys     0m0.010s
time ruby -r lightcsv -e \
'6.times { LightCsv.foreach("test/test_data.csv") { |row| } }'

real    0m0.358s
user    0m0.349s
sys     0m0.008s
time ruby -r lib/faster_csv -e \
'6.times { FasterCSV.foreach("test/test_data.csv") { |row| } }'

real    0m0.742s
user    0m0.732s
sys     0m0.009s

In this run, LightCsv finished in roughly half the time FasterCSV took (0.358s versus 0.742s real), and both left the standard CSV library (5.481s) far behind. It's important to note that LightCsv is indeed very "light," though. FasterCSV has grown up into a feature-rich library that provides many different ways to look at your data. In contrast, LightCsv doesn't yet allow you to set column or row separators. Given that, it's only an option for vanilla CSV that you just need to iterate over. If that's what you have, and speed counts, it might just be the right choice.

For the curious, LightCsv achieves its speed advantage in two ways. First, it uses StringScanner to manage the parsing. StringScanner is a C extension, though it ships with Ruby as part of the standard library.
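To give a feel for what StringScanner-driven parsing looks like, here is a minimal sketch. It is not LightCsv's actual code: `parse_csv_line` is a hypothetical helper invented for illustration, and it assumes a comma separator and a single line of input with no embedded newlines.

```ruby
require "strscan"

# Hypothetical helper, not taken from LightCsv or FasterCSV.
# Splits one line of vanilla CSV into fields with StringScanner.
def parse_csv_line(line)
  scanner = StringScanner.new(line.chomp)
  fields  = []
  loop do
    if scanner.scan(/"((?:[^"]|"")*)"/)
      # Quoted field: group 1 holds the content, with quotes doubled as "".
      fields << scanner[1].gsub('""', '"')
    else
      # Unquoted field: everything up to the next comma (possibly empty).
      fields << scanner.scan(/[^,]*/)
    end
    # A comma means another field follows; anything else ends the line.
    break unless scanner.scan(/,/)
  end
  fields
end

p parse_csv_line('1,"hello, world","say ""hi"""')
```

The scanner keeps its own position in the string, so each `scan` call picks up where the last one stopped and no intermediate substrings need to be sliced off by hand.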

More importantly, I suspect, LightCsv uses an input buffer for reading while FasterCSV works line by line. This second difference probably accounts for the majority of the speed increase, since the buffered code will hit the hard drive quite a bit less for the average CSV file. The buffer does cost more memory, of course.
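The two reading strategies can be sketched roughly as follows. Both method names and the 64 KB chunk size are assumptions made for illustration, not code from either library, and the sketch ignores quoted fields that contain newlines.

```ruby
require "stringio"

# Line by line: one gets call per row.
def each_line_unbuffered(io)
  while (line = io.gets)
    yield line.chomp
  end
end

# Buffered: read fixed-size chunks, peel complete lines off the front of the
# buffer, and carry any partial line over to the next chunk.
# The chunk size is an arbitrary assumption.
def each_line_buffered(io, chunk_size = 64 * 1024)
  buffer = +""
  while (chunk = io.read(chunk_size))
    buffer << chunk
    # slice! removes and returns the first complete line, or nil if none.
    while (line = buffer.slice!(/\A[^\n]*\n/))
      yield line.chomp
    end
  end
  # Whatever remains is a final line without a trailing newline.
  yield buffer unless buffer.empty?
end

each_line_buffered(StringIO.new("a,b\nc,d\ne,f"), 4) { |line| p line }
```

The buffered version makes far fewer read calls for the same data, at the price of holding a chunk plus any leftover partial line in memory. Whether that translates into the speedup measured above would need profiling to confirm.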

Aside from these differences, FasterCSV and LightCsv have very similar parsers.

Comments (2)
  1. tommy, April 18th, 2007

    LightCsv does not use StringIO.
    It uses StringScanner.

    1. James Edward Gray II, April 18th, 2007

      Oops. Good catch. I have corrected the article.
