-
13 OCT 2008
The Secret Shell Helper
Someone pops onto the Ruby Talk mailing list fairly regularly asking how to break up content like:
one "two" "a longer three"
They expect to end up with a three-element Array, where the third item contains spaces. They generally expect the quotes to have been removed as well.
If your needs are very, very simple you may be able to handle this with a regular expression:
data = 'one "two" "a longer three"'
p data.scan(/"([^"]*)"|(\S+)/).flatten.compact
# >> ["one", "two", "a longer three"]
That just searches for either a set of quotes with some non-quote characters between them or a run of non-whitespace characters. Those are the two possibilities for the fields. Note that the two separate captures here mean scan() will return contents in the form:
[[nil, "one"], ["two", nil], ["a longer three", nil]]
That's why I added a flatten() and compact() to get down to the actual matches.
The regular expression approach can get pretty complex though if any kind of escaping for quotes is involved. When that happens, you may need to step up to a parser.
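The excerpt stops before naming a parser. As one possibility, and only as an assumption about where this is heading, Ruby's standard Shellwords library splits a line using Bourne shell quoting rules, which covers both the simple case above and backslash-escaped quotes. A minimal sketch (the sample strings are made up for illustration):

```ruby
require "shellwords"

# The same input the regular expression handled above.
data   = 'one "two" "a longer three"'
fields = Shellwords.split(data)
p fields
# >> ["one", "two", "a longer three"]

# Escaped quotes inside a quoted field, the case that makes the regex grow.
# (Inside a single-quoted Ruby literal, \" stays as backslash plus quote.)
escaped = 'one "say \"hi\"" three'
tricky  = Shellwords.split(escaped)
p tricky
# >> ["one", "say \"hi\"", "three"]
```

Note that Shellwords follows shell rules, so single quotes and bare backslashes are also treated as special. If your data uses a different escaping scheme, this may not be the right fit.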
-
10 APR 2008
Five ActiveRecord Tips
This article was written for the Railscasts 100th Episode Contest. I think the idea is great and I look forward to reading great tips from all who decide to participate.
1. create_or_find_by_…
I imagine most of you know that ActiveRecord can handle finders like:
MyARClass.find_or_create_by_name(some_name)
This will attempt to find the object that has some_name in its name field or, if the find fails, a new object will be created with that name. It's important to note that the order is exactly as I just listed it: find then create. Here are the relevant lines from the current Rails source showing the process:
record = find_initial(options)
if record.nil?
  record = self.new { |r| r.send(:attributes=, attributes, guard_protected_attributes) }
  #{'yield(record) if block_given?'}
  #{'record.save' if instantiator == :create}
  record
else
  record
end
The above code is inside a String literal fed to class_eval(), which is why you see interpolation being used.
Unfortunately, this process is subject to race conditions because the object could be created by another process (or Thread) between the find and the creation. If that happens, you are likely to run into another hardship in that calls to create() fail quietly (returning the unsaved object). These are some pretty rare happenings for sure, but they can be avoided under certain conditions.
-
2 JAN 2008
Getting FasterCSV Ready for Ruby 1.9
The call came down from on high just before the Ruby 1.9 release: replace the standard csv.rb library with faster_csv.rb. With only hours to make the change it was a little harder than I expected. The FasterCSV code base was pretty vanilla Ruby, but it required more work than I would have guessed to get running on Ruby 1.9. Let me share a few of the tips I learned while doctoring the code in the hope that it will help others get their code ready for Ruby 1.9.
Ruby's String Class Grows Up
One of the biggest changes in Ruby 1.9 is the addition of m17n (multilingualization). This means that Ruby's Strings are now encoding aware and we must clarify in our code if we are working with bytes, characters, or lines.
This is a good change, but the odds are that most of us have lazily used the old way to our advantage in the past. If you've ever written code like:
lines = str.to_a
you have bad habits to break. I sure did. Under Ruby 1.9 that code would translate to:
lines = str.lines.to_a
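To make the bytes, characters, and lines distinction concrete, here is a small sketch for Ruby 1.9 or later. The sample string is made up for illustration, and the \u escapes are used only so the example does not depend on the source file's encoding:

```ruby
# Two lines of UTF-8 text containing three multibyte characters.
str = "r\u00E9sum\u00E9\nna\u00EFve\n"

# String no longer mixes in Enumerable, so the old shortcut is gone.
has_to_a = str.respond_to?(:to_a)   # false on Ruby 1.9 and later

# Say which view of the data you want.
lines = str.lines.to_a   # line by line, terminators kept
chars = str.chars.to_a   # character by character, encoding aware
bytes = str.bytes.to_a   # raw bytes

p lines.size   # >> 2
p chars.size   # >> 13
p bytes.size   # >> 16
```

The character and byte counts differ because each accented letter takes two bytes in UTF-8.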
-
16 APR 2007
No Longer the Fastest Game in Town
If your number one concern when working with CSV data in Ruby is raw speed, you might want to know that FasterCSV is no longer the fastest option.
There are a couple of new contenders for Ruby CSV processing including a C extension called SimpleCSV and a pure Ruby library called LightCsv. I haven't been able to test SimpleCSV locally, because I can't get it to build on my box, but users do tell me it's faster. I have run some trivial benchmarks for LightCsv though and it too is pretty quick:
$ rake benchmark
(in /Users/james/Documents/faster_csv)
time ruby -r csv -e '6.times { CSV.foreach("test/test_data.csv") { |row| } }'
real 0m5.481s
user 0m5.468s
sys 0m0.010s
time ruby -r lightcsv -e \
  '6.times { LightCsv.foreach("test/test_data.csv") { |row| } }'
real 0m0.358s
user 0m0.349s
sys 0m0.008s
time ruby -r lib/faster_csv -e \
  '6.times { FasterCSV.foreach("test/test_data.csv") { |row| } }'
real 0m0.742s
user 0m0.732s
sys 0m0.009s
It's important to note that LightCsv is indeed very "light." FasterCSV has grown up into a feature-rich library that provides many different ways to look at your data. In contrast, LightCsv doesn't yet allow you to set column or row separators. Given that, it's only an option for vanilla CSV you just need to iterate over. If that's what you have though, and speed counts, it might just be the right choice.
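For a sense of what setting column and row separators looks like, here is a minimal sketch. It assumes a modern Ruby, where the standard csv library is the FasterCSV code base under the CSV name, and it uses made-up sample data:

```ruby
require "csv"

# Not vanilla CSV: fields are split on semicolons and rows end with a pipe.
data = "one;two|three;four|"

# :col_sep and :row_sep tell the parser which separators to expect.
rows = CSV.parse(data, col_sep: ";", row_sep: "|")
p rows
# >> [["one", "two"], ["three", "four"]]
```

Data like this is where a library without separator options falls short, whatever its raw speed.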