Email: james@graysoftinc.com


FasterCSV

Posts tagged with "FasterCSV."

13 OCT 2008
The Secret Shell Helper

Someone pops onto the Ruby Talk mailing list fairly regularly asking how to break up content like:
```
one "two" "a longer three"
```
They expect to end up with a three-element Array, where the third item contains spaces. They generally expect the quotes to have been removed as well.

If your needs are very, very simple, you may be able to handle this with a regular expression:
```
data = 'one "two" "a longer three"'
p data.scan(/"([^"]*)"|(\S+)/).flatten.compact
# >> ["one", "two", "a longer three"]
```
That just searches for either a set of quotes with some non-quote characters between them or a run of non-whitespace characters. Those are the two possibilities for the fields. Note that the two separate captures here mean scan() will return contents in the form:
```
[[nil, "one"], ["two", nil], ["a longer three", nil]]
```
That's why I added a flatten() and compact() to get down to the actual matches.

The regular expression approach can get pretty complex, though, if any kind of escaping for quotes is involved. When that happens, you may need to step up to a parser.
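As a rough sketch of what stepping up can look like, one parser that ships with Ruby is the standard Shellwords library. I'm assuming here that shell-style quoting rules fit your data, and the split_fields() name is just a hypothetical wrapper for illustration:

```ruby
require "shellwords"

# Hypothetical helper name; Shellwords.split() is the standard library call.
def split_fields(data)
  Shellwords.split(data)
end

p split_fields('one "two" "a longer three"')
# >> ["one", "two", "a longer three"]

# A backslash-escaped quote inside a quoted field is handled too.
p split_fields('one "say \"hi\"" three')
# >> ["one", "say \"hi\"", "three"]
```

Keep in mind that this follows shell rules, so single quotes and backslashes carry meaning as well, and an unmatched quote raises an ArgumentError.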
Read more…
In: The Standard Library | Tags: FasterCSV, Hidden Features & Unix Shells | 1 Comment

10 APR 2008
Five ActiveRecord Tips

This article was written for the Railscasts 100th Episode Contest. I think the idea is great and I look forward to reading great tips from all who decide to participate.

1. create_or_find_by_…

I imagine most of you know that ActiveRecord can handle finders like:
```
MyARClass.find_or_create_by_name(some_name)
```
This will attempt to find the object that has some_name in its name field or, if the find fails, a new object will be created with that name. It's important to note that the order is exactly as I just listed it: find then create. Here are the relevant lines from the current Rails source showing the process:
```
record = find_initial(options)

if record.nil?
  record = self.new { |r| r.send(:attributes=, attributes, guard_protected_attributes) }
  #{'yield(record) if block_given?'}
  #{'record.save' if instantiator == :create}
  record
else
  record
end
```
The above code is inside a String literal fed to class_eval(), which is why you see interpolation being used.
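If that trick is new to you, here is a minimal sketch of the same idea outside of Rails. The Registry class and define_builder() method are hypothetical names I made up for the example; only class_eval() itself is real. The interpolation runs once, while the String is being built, so each generated method only contains the lines it needs:

```ruby
class Registry
  # Hypothetical method generator, loosely imitating the Rails excerpt above.
  def self.define_builder(name, instantiator)
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{name}(value)
        record = { :value => value, :saved => false }
        #{'record[:saved] = true' if instantiator == :create}
        record
      end
    RUBY
  end

  define_builder :build_item,  :new     # the "save" line is left out
  define_builder :create_item, :create  # the "save" line is included
end

p Registry.new.build_item(1)[:saved]
# >> false
p Registry.new.create_item(1)[:saved]
# >> true
```

When the condition is false the interpolation produces nil, which becomes an empty line in the generated source.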

Unfortunately, this process is subject to race conditions because the object could be created by another process (or Thread) between the find and the creation. If that happens, you are likely to run into another hardship in that calls to create() fail quietly (returning the unsaved object). These are some pretty rare happenings for sure, but they can be avoided under certain conditions.
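To make the ordering issue concrete, here is a small simulation that reverses the steps: create first, then fall back to a find. This is not ActiveRecord code. TinyTable, DuplicateKey, and create_or_find() are hypothetical stand-ins, and I'm assuming a unique index that rejects duplicates atomically, which the Mutex imitates:

```ruby
# Hypothetical error raised when the "unique index" is violated.
class DuplicateKey < StandardError; end

# Hypothetical in-memory stand-in for a table with a unique index on name.
class TinyTable
  def initialize
    @rows = {}
    @lock = Mutex.new  # plays the role of the database's atomic unique index
  end

  def find(name)
    @lock.synchronize { @rows[name] }
  end

  def insert(name)
    @lock.synchronize do
      raise DuplicateKey, name if @rows.key?(name)
      @rows[name] = { :id => @rows.size + 1, :name => name }
    end
  end

  def size
    @lock.synchronize { @rows.size }
  end
end

# Try the insert first; if someone else won the race, find their row instead.
def create_or_find(table, name)
  table.insert(name)
rescue DuplicateKey
  table.find(name)
end

table   = TinyTable.new
records = Array.new(10) { Thread.new { create_or_find(table, "James") } }.map(&:value)

p table.size
# >> 1
p records.uniq.size
# >> 1
```

Because the check and the write happen in one atomic step, there is no gap for another Thread to slip into, and every caller ends up with the same row.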
Read more…
In: Rails | Tags: ActiveRecord & FasterCSV | 11 Comments

2 JAN 2008
Getting FasterCSV Ready for Ruby 1.9

The call came down from on high just before the Ruby 1.9 release: replace the standard csv.rb library with faster_csv.rb. With only hours to make the change, it was a little harder than I expected. The FasterCSV code base was pretty vanilla Ruby, but it required more work than I would have guessed to get running on Ruby 1.9. Let me share a few of the tips I learned while doctoring the code in the hope that they will help others get their code ready for Ruby 1.9.

Ruby's String Class Grows Up

One of the biggest changes in Ruby 1.9 is the addition of m17n (multilingualization). This means that Ruby's Strings are now encoding aware and we must clarify in our code if we are working with bytes, characters, or lines.

This is a good change, but the odds are that most of us have lazily used the old way to our advantage in the past. If you've ever written code like:
```
lines = str.to_a
```
you have bad habits to break. I sure did. Under Ruby 1.9 that code would translate to:
```
lines = str.lines.to_a
```
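Here's a quick sketch of why the distinction matters once Strings know their encoding. The breakdown() helper is a hypothetical name of mine, and I'm assuming UTF-8 data, where the accented characters take two bytes each:

```ruby
# Hypothetical helper: count the same String three different ways.
def breakdown(str)
  { :lines => str.lines.to_a.size,
    :chars => str.chars.to_a.size,
    :bytes => str.bytes.to_a.size }
end

# "résumé" and "naïve", written with escapes to sidestep source encoding issues.
str    = "r\u00E9sum\u00E9\nna\u00EFve\n"
counts = breakdown(str)

p counts[:lines]
# >> 2
p counts[:chars]
# >> 13
p counts[:bytes]
# >> 16
```

The three answers differ, so code has to say which one it means.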
Read more…
In: My Projects | Tags: FasterCSV, Iterators & Multilingualization | 33 Comments

16 APR 2007
No Longer the Fastest Game in Town

If your number one concern when working with CSV data in Ruby is raw speed, you might want to know that FasterCSV is no longer the fastest option.

There are a couple of new contenders for Ruby CSV processing, including a C extension called SimpleCSV and a pure Ruby library called LightCsv. I haven't been able to test SimpleCSV locally, because I can't get it to build on my box, but users do tell me it's faster. I have run some trivial benchmarks for LightCsv, though, and it too is pretty quick:
```
$ rake benchmark
(in /Users/james/Documents/faster_csv)
time ruby -r csv -e '6.times { CSV.foreach("test/test_data.csv") { |row| } }'

real    0m5.481s
user    0m5.468s
sys     0m0.010s
time ruby -r lightcsv -e \
'6.times { LightCsv.foreach("test/test_data.csv") { |row| } }'

real    0m0.358s
user    0m0.349s
sys     0m0.008s
time ruby -r lib/faster_csv -e \
'6.times { FasterCSV.foreach("test/test_data.csv") { |row| } }'

real    0m0.742s
user    0m0.732s
sys     0m0.009s
```
It's important to note that LightCsv is indeed very "light." FasterCSV has grown up into a feature-rich library that provides many different ways to look at your data. In contrast, LightCsv doesn't yet allow you to set column or row separators. Given that, it's only an option for vanilla CSV that you just need to iterate over. If that's what you have though, and speed counts, it might just be the right choice.
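If you want to run the same kind of trivial timing from inside Ruby, something like the following would do. It's only a sketch: time_runs() and with_sample_csv() are hypothetical helpers, the data is a throwaway file rather than my test_data.csv, and I only exercise the csv library that ships with Ruby, so swap in whichever parser you want to compare:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: run the block `runs` times and return elapsed seconds.
def time_runs(runs)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical helper: build a small throwaway CSV file and yield its path.
def with_sample_csv(rows)
  Tempfile.create(["sample", ".csv"]) do |file|
    rows.times { |i| file.puts %(#{i},"field #{i}","a longer field #{i}") }
    file.flush
    yield file.path
  end
end

with_sample_csv(1_000) do |path|
  count   = 0
  elapsed = time_runs(6) { CSV.foreach(path) { |_row| count += 1 } }
  puts "parsed #{count} rows in #{'%.3f' % elapsed}s"
end
```

As with the rake task above, numbers from a toy file like this are only good for rough comparisons on your own machine.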
Read more…
In: My Projects | Tags: FasterCSV & Performance | 2 Comments