FasterCSV

Posts tagged with "FasterCSV."
  • 13 OCT 2008

    The Secret Shell Helper

    Someone pops onto the Ruby Talk mailing list fairly regularly asking how to break up content like:

    one "two" "a longer three"
    

    They expect to end up with a three-element Array, where the third item contains spaces. They generally expect the quotes to have been removed as well.

    If your needs are very, very simple you may be able to handle this with a regular expression:

    data = 'one "two" "a longer three"'
    p data.scan(/"([^"]*)"|(\S+)/).flatten.compact
    # >> ["one", "two", "a longer three"]
    

    That just searches for either a set of quotes with some non-quote characters between them or a run of non-whitespace characters. Those are the two possibilities for the fields. Note that the two separate captures here mean scan() will return contents in the form:

    [[nil, "one"], ["two", nil], ["a longer three", nil]]
    

    That's why I added a flatten() and compact() to get down to the actual matches.

    The regular expression approach can get pretty complex though if any kind of escaping for quotes is involved. When that happens, you may need to step up to a parser.
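
    To see where the pattern gives out, here is a small sketch that compares it with Shellwords from Ruby's standard library, which splits text using shell-style quoting rules, including backslash-escaped quotes. Shellwords is only one possible parser for this job and is assumed here for illustration; the two helper names are made up for the example.

```ruby
require "shellwords"

# The regular expression from above, wrapped in a hypothetical helper.
def split_with_regex(data)
  data.scan(/"([^"]*)"|(\S+)/).flatten.compact
end

# Shellwords follows Bourne shell word-splitting rules.
def split_with_shellwords(data)
  Shellwords.split(data)
end

simple  = 'one "two" "a longer three"'
escaped = 'say "a \"quoted\" word"'

p split_with_regex(simple)        # both approaches agree on simple input
p split_with_shellwords(simple)
p split_with_regex(escaped)       # the escaped quotes break the field apart
p split_with_shellwords(escaped)  # one field, with the inner quotes kept
```

    The regular expression has no notion of an escape character, so it ends the quoted field at the first quote it sees, escaped or not.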

    Read more…

  • 10 APR 2008

    Five ActiveRecord Tips

    This article was written for the Railscasts 100th Episode Contest. I think the idea is great and I look forward to reading great tips from all who decide to participate.

    1. create_or_find_by_…

    I imagine most of you know that ActiveRecord can handle finders like:

    MyARClass.find_or_create_by_name(some_name)
    

    This will attempt to find the object that has some_name in its name field or, if the find fails, a new object will be created with that name. It's important to note that the order is exactly as I just listed it: find then create. Here are the relevant lines from the current Rails source showing the process:

    record = find_initial(options)
    
    if record.nil?
      record = self.new { |r| r.send(:attributes=, attributes, guard_protected_attributes) }
      #{'yield(record) if block_given?'}
      #{'record.save' if instantiator == :create}
      record
    else
      record
    end
    

    The above code is inside a String literal fed to class_eval(), which is why you see interpolation being used.
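
    As a rough illustration of that technique, here is a minimal sketch of building methods from an interpolated String passed to class_eval(). StepRecorder and its method names are hypothetical and are not part of Rails; only the pattern of conditionally interpolating a line of source is borrowed from the excerpt above.

```ruby
# Hypothetical class, not part of Rails: it only mimics the technique.
class StepRecorder
  [:new, :create].each do |instantiator|
    # The heredoc is an ordinary String, so the interpolation runs now, while
    # the source text is being built, and decides which lines the method gets.
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def self.find_or_#{instantiator}_steps
        steps = [:find, :instantiate]
        #{'steps << :save' if instantiator == :create}
        steps
      end
    RUBY
  end
end

p StepRecorder.find_or_new_steps
# >> [:find, :instantiate]
p StepRecorder.find_or_create_steps
# >> [:find, :instantiate, :save]
```

    When the condition is false the interpolation yields nil, which becomes an empty String, so the generated method simply has a blank line there.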

    Unfortunately, this process is subject to race conditions because the object could be created by another process (or Thread) between the find and the creation. If that happens, you are likely to run into another hardship in that calls to create() fail quietly (returning the unsaved object). These are some pretty rare happenings for sure, but they can be avoided under certain conditions.
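
    The heading of this tip hints at reversing the order. Below is a minimal sketch of that idea in plain Ruby, not the code from this post or from Rails: TinyStore is a hypothetical in-memory stand-in for a table with a unique index, and create_or_find() is an illustrative name. It assumes the storage layer rejects duplicates atomically, which is the condition the whole approach depends on.

```ruby
# Hypothetical stand-in for a table with a unique index on name.
class TinyStore
  class DuplicateError < StandardError; end

  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  def find(name)
    @lock.synchronize { @rows[name] }
  end

  # Like an INSERT against a unique index: atomic, and it raises on duplicates.
  def create(name)
    @lock.synchronize do
      raise DuplicateError, name if @rows.key?(name)
      @rows[name] = { name: name }
    end
  end

  def size
    @lock.synchronize { @rows.size }
  end
end

# Create first; fall back to a find only if the row already exists.
def create_or_find(store, name)
  store.create(name)
rescue TinyStore::DuplicateError
  store.find(name)
end

store   = TinyStore.new
records = Array.new(8) { Thread.new { create_or_find(store, "some_name") } }.map(&:value)

p store.size           # >> 1
p records.uniq.size    # >> 1
```

    Because the uniqueness check and the insert happen as one step, there is no window between a find and a create for another Thread to slip into.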

    Read more…

  • 2 JAN 2008

    Getting FasterCSV Ready for Ruby 1.9

    The call came down from on high just before the Ruby 1.9 release: replace the standard csv.rb library with faster_csv.rb. With only hours to make the change it was a little harder than I expected. The FasterCSV code base was pretty vanilla Ruby, but it required more work than I would have guessed to get running on Ruby 1.9. Let me share a few of the tips I learned while doctoring the code in the hope that it will help others get their code ready for Ruby 1.9.

    Ruby's String Class Grows Up

    One of the biggest changes in Ruby 1.9 is the addition of m17n (multilingualization). This means that Ruby's Strings are now encoding aware and we must clarify in our code if we are working with bytes, characters, or lines.

    This is a good change, but the odds are that most of us have lazily used the old way to our advantage in the past. If you've ever written code like:

    lines = str.to_a
    

    you have bad habits to break. I sure did. Under Ruby 1.9 that code would translate to:

    lines = str.lines.to_a
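
    To make the bytes, characters, and lines distinction concrete, here is a small sketch for Ruby 1.9 or later. The string_views() helper is a made-up name for illustration, and the sample text is an assumption chosen so that the character and byte counts differ.

```ruby
# Hypothetical helper: three explicit views of the same String.
def string_views(str)
  {
    lines: str.each_line.to_a,
    chars: str.each_char.to_a,
    bytes: str.each_byte.to_a
  }
end

str   = "r\u00E9sum\u00E9\nna\u00EFve\n"  # two lines of UTF-8 text with accented letters
views = string_views(str)

p views[:lines].size  # >> 2
p views[:chars].size  # >> 13
p views[:bytes].size  # >> 16
```

    The three accented letters each take two bytes in UTF-8, which is why the byte count is three higher than the character count.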
    

    Read more…

  • 16 APR 2007

    No Longer the Fastest Game in Town

    If your number one concern when working with CSV data in Ruby is raw speed, you might want to know that FasterCSV is no longer the fastest option.

    There are a couple of new contenders for Ruby CSV processing, including a C extension called SimpleCSV and a pure Ruby library called LightCsv. I haven't been able to test SimpleCSV locally, because I can't get it to build on my box, but users do tell me it's faster. I have run some trivial benchmarks for LightCsv though and it too is pretty quick:

    $ rake benchmark
    (in /Users/james/Documents/faster_csv)
    time ruby -r csv -e '6.times { CSV.foreach("test/test_data.csv") { |row| } }'
    
    real    0m5.481s
    user    0m5.468s
    sys     0m0.010s
    time ruby -r lightcsv -e \
    '6.times { LightCsv.foreach("test/test_data.csv") { |row| } }'
    
    real    0m0.358s
    user    0m0.349s
    sys     0m0.008s
    time ruby -r lib/faster_csv -e \
    '6.times { FasterCSV.foreach("test/test_data.csv") { |row| } }'
    
    real    0m0.742s
    user    0m0.732s
    sys     0m0.009s
    

    It's important to note that LightCsv is indeed very "light." FasterCSV has grown up into a feature-rich library that provides many different ways to look at your data. In contrast, LightCsv doesn't yet allow you to set column or row separators. Given that, it's only an option for vanilla CSV that you just need to iterate over. If that's what you have though, and speed counts, it might just be the right choice.
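
    For a sense of what custom separators look like in practice, here is a minimal sketch. It assumes a Ruby (1.9 or later) where require "csv" loads the FasterCSV-derived standard library rather than the faster_csv gem benchmarked above; parse_custom() and the sample data are made up for illustration.

```ruby
require "csv"

# Hypothetical helper: fields are separated by ";" and rows by "|".
def parse_custom(data)
  CSV.parse(data, col_sep: ";", row_sep: "|")
end

p parse_custom("one;two|three;four|")
# >> [["one", "two"], ["three", "four"]]
```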

    Read more…