2 JAN 2008
Getting FasterCSV Ready for Ruby 1.9
The call came down from on high just before the Ruby 1.9 release: replace the standard csv.rb library with faster_csv.rb. With only hours to make the change it was a little harder than I expected. The FasterCSV code base was pretty vanilla Ruby, but it required more work than I would have guessed to get running on Ruby 1.9. Let me share a few of the tips I learned while doctoring the code in the hope that it will help others get their code ready for Ruby 1.9.
Ruby's String Class Grows Up
One of the biggest changes in Ruby 1.9 is the addition of m17n (multilingualization). This means that Ruby's Strings are now encoding aware and we must clarify in our code if we are working with bytes, characters, or lines.
This is a good change, but the odds are that most of us have lazily used the old way to our advantage in the past. If you've ever written code like:
lines = str.to_a
you have bad habits to break. I sure did. Under Ruby 1.9 that code would translate to:
lines = str.lines.to_a
String#lines() returns an Enumerable::Enumerator by default (more on that shortly), so you need to add the to_a() call unless you are going to follow up with other iteration methods.
Now, if you need the code to run on both 1.8 and 1.9, you will need one more trick. First, if you just need to iterate over the lines you can use String#each_line(), which is present in both versions. For less basic iterations, I recommend:
lines = str.send(str.respond_to?(:lines) ? :lines : :to_s).to_a
Here I just call String#lines() if it is available and a no-op String#to_s() when it's not. You can safely follow that with any Enumerable method and it will work in Ruby 1.8 and Ruby 1.9.
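As a rough sketch, the same trick can be wrapped up for reuse. The name lines_of is a hypothetical helper of my own choosing, not part of FasterCSV or either Ruby version:

```ruby
# Hypothetical helper: returns the lines of str as an Array on Ruby 1.8 and 1.9.
def lines_of(str)
  # 1.9 has String#lines(); on 1.8 we fall back to the no-op String#to_s(),
  # whose result still answers to_a() with the lines.
  str.send(str.respond_to?(:lines) ? :lines : :to_s).to_a
end

lines_of("one\ntwo\n").each { |line| puts line.chomp }
```

Keeping the version check in one place means the rest of the code can treat the result as a plain Array.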
Enumerable#zip() Took a Beating
[Update: Both of my complaints about zip() were eventually addressed. The 1.8 behavior has been restored.]
If you were a fan of Enumerable#zip() under Ruby 1.8, odds are good that it's going to surprise you under Ruby 1.9.
First, the standard Enumerable::Enumerator library has been moved into the core as we already saw with String#lines(). With this move the core iteration methods have been enhanced to return an Enumerable::Enumerator, if called without a block. This is generally a nice iterator chaining feature. For example, making the fictional but oft-requested map_with_index() is now as easy as:
enum.each_with_index.map { … }
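For instance, a minimal sketch of that chaining on Ruby 1.9 (the data and the labeled variable are made up for illustration):

```ruby
names = %w[alpha beta gamma]

# each_with_index() without a block returns an enumerator,
# so map() receives each element together with its index.
labeled = names.each_with_index.map { |name, i| "#{i}: #{name}" }

puts labeled
```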
Enumerable#zip() may be the exception though. It already had a meaningful return value when called without a block. That has been overridden by the new behavior though, so you will now get an Enumerable::Enumerator when you probably expected an Array. I've found that I now need to type the following to get what I usually want:
enum.zip(other_enum).to_a
It's hard to see that as an improvement, but the fact is that it gets worse. For some reason I can't justify, another change was made to Enumerable#zip(). Let's look at what happens with Enumerable objects of different sizes under Ruby 1.8:
>> short = [1, 2]
=> [1, 2]
>> long = %w[one two three four]
=> ["one", "two", "three", "four"]
>> short.zip(long)
=> [[1, "one"], [2, "two"]]
>> long.zip(short)
=> [["one", 1], ["two", 2], ["three", nil], ["four", nil]]
Note that the size of the result set is based on the size of the Enumerable that is used as the receiver for the Enumerable#zip() call. This works out well in practice, because you can always find the longer count if you need to preserve all of the data. If you want the shorter results, you can lead with the smaller set or filter out the nil objects. The choice is in your hands.
Unfortunately, Ruby 1.9 changes the rules:
>> short.zip(long).to_a
=> [[1, "one"], [2, "two"]]
>> long.zip(short).to_a
=> [["one", 1], ["two", 2]]
As you can see, the shortest Enumerable now limits the results no matter where it occurs. The problem with this change is that it discards data and you have to go out of your way to save it. This new behavior is documented though, so I assume it's intentional.
What do you do if you want a safe, 1.8-style, data-preserving Enumerable#zip() that works on 1.8 and 1.9? About the best I can come up with is:
require "enumerator"
zipped = long.enum_for(:each_with_index).
              map { |e, i| [e, short.to_a[i]] }
Obviously, I'm open to better ideas.
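One other possibility, offered as a sketch rather than a definitive answer, is to pad to the longer of the two collections yourself so nothing is discarded no matter which side comes first. Note that this is not identical to the 1.8 behavior, which sized the result by the receiver. The name zip_all is hypothetical, and the sketch assumes both arguments respond to to_a():

```ruby
# Hypothetical helper: pairs up elements by position and pads the
# shorter side with nil, so no data is dropped on Ruby 1.8 or 1.9.
def zip_all(left, right)
  left  = left.to_a
  right = right.to_a
  size  = [left.size, right.size].max
  (0...size).map { |i| [left[i], right[i]] }
end

short = [1, 2]
long  = %w[one two three four]
p zip_all(short, long)
p zip_all(long, short)
```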
FasterCSV is the New CSV
I found the above incompatibilities by introducing a new one. FasterCSV has replaced the standard CSV class in the standard library. By replaced, I mean that it is now called CSV. This will cause problems for code that used the old library.
The methods provided on the CSV object are similar, but the old CSV code used positional parameters whereas the new library uses a Hash argument syntax (e.g., row_sep: "\r\n"). That's going to trip up any non-trivial usage.
The new library is feature rich and fully documented, so I don't expect anyone to have trouble getting their code working under 1.9. The problem will be writing code that works on both versions. For that, I recommend using code like the following to determine which library you are working with:
require "csv"
if CSV.const_defined? :Reader
  # use old CSV code here…
else
  # use FasterCSV style code, but with CSV class, here…
end
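To make that concrete, here is a hedged sketch of one such branch. The helper name parse_crlf is hypothetical, and the positional call in the first branch reflects my recollection of the old csv.rb signature (field separator, then row separator), so treat it as an assumption to verify against your 1.8 install:

```ruby
require "csv"

# Hypothetical helper: parse CRLF-terminated CSV text with either library.
def parse_crlf(data)
  if CSV.const_defined?(:Reader)
    # Old csv.rb (assumed signature): positional field and row separators.
    CSV.parse(data, ",", "\r\n")
  else
    # FasterCSV-style CSV: options go in a trailing Hash.
    # The :key => value form is used so this file still parses on 1.8.
    CSV.parse(data, :row_sep => "\r\n")
  end
end

rows = parse_crlf("name,size\r\nfoo,1\r\n")
p rows
```

The 1.9-only row_sep: "\r\n" literal is avoided on purpose, since Ruby 1.8 would reject the whole file at parse time before the version check ever ran.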
Feel free to email me with any other CSV compatibility questions.
This is Just a Start
The above is a short list of issues I've run into a couple of times now. Please feel free to add your own observations about Ruby 1.9 compatibility in the comments below. Let's do our best to make this post a generally useful resource for all.
Comments (33)
-
Sam Ruby, January 2nd, 2008
Porting REXML to Ruby 1.9 overlaps slightly and covers some additional ground.
-
Ruby 1.9 introduces an incompatible syntax change for conditional statements such as if and case/when. Previously a colon could be used as a shorthand for a then statement; this is perhaps most useful with multiple when statements on one line.
The following is legitimate Ruby in 1.8:
case x
when Regexp : puts 'a regex'
when Hash : puts 'a hash'
when Numeric : puts 'a number'
when String : puts 'a string'
end
But not in Ruby 1.9; now an explicit then statement must be used:
case x
when Regexp then puts 'a regex'
...
-
Just to be clear, the then keyword was also supported in Ruby 1.8, so using it for conditionals is fine for both versions.
-
"But not in Ruby 1.9; now an explicit then statement must be used"
Or you could just do what everyone else does, and put what happens then on a new line. Then, you won't need then, and your code is more consistent and readable.
I prefer this implementation. Allowing same-line then with a colon was bad style, IMO, as is the use of colons as meaningful operators in general.
-
If you still prefer the single-character, single-line notation, you can just substitute a semicolon for the colon.
1.9:
case sound
when /bamf/i; puts 'Nightcrawler'
when /boff/i; puts 'Batman'
end
-
I thought the semicolon functioning as an alias for then was great; it's easier to read, IMO.
-
It's actually the colon, not the semicolon, that used to stand in for then. It was removed because it is being used in other ways, like the new Hash syntax.
-
-
-
The : for case statements was removed from Ruby syntax because ; works in all versions and does not require special syntax.
case x
when Hash ; puts 'a hash'
end
I prefer it to then.
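As a small sketch of the two spellings discussed in this thread that parse on both 1.8 and 1.9 (describe is a hypothetical method name used only for illustration):

```ruby
# Hypothetical example: one-line when clauses without the 1.8-only colon.
def describe(x)
  case x
  when Regexp  then "a regex"   # explicit then
  when Hash    then "a hash"
  when Numeric then "a number"
  when String; "a string"       # a semicolon works as well
  else "something else"
  end
end

puts describe(/bamf/i)
puts describe("boff")
```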
-
-
A bunch of methods like instance_variables, constants, etc… that used to return strings now return symbols.
-
I found what Frederick said is especially important for the typical BlankSlate type of class. What was in 1.8:
class BlankSlate
  instance_methods.each { |meth| undef_method(meth) unless meth =~ /\A__/ }
  ...
end
becomes in 1.9 (for example):
class BlankSlate
  instance_methods.each { |meth| undef_method(meth) unless meth.to_s =~ /\A__/ }
  ...
end
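The same to_s() normalization can be used for name lookups, since these reflection methods return Strings on 1.8 and Symbols on 1.9. A minimal sketch, where defines_instance_method? is a hypothetical helper name:

```ruby
# Hypothetical helper: true if klass has a public or protected instance
# method with the given name, whether names come back as Strings (1.8)
# or Symbols (1.9).
def defines_instance_method?(klass, name)
  klass.instance_methods.map { |meth| meth.to_s }.include?(name.to_s)
end

puts defines_instance_method?(String, "upcase")
puts defines_instance_method?(String, :each_char)
```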
Other than that, I noticed that Thread#critical and Thread#critical= went away, but for those of us who want to explicitly schedule stuff, Fibers are nicer anyway.
IO#getc will return a String that's one character long instead of the ASCII value of the character itself.
1.8:
STDIN.getc
a
=> 97
1.9:
irb(main):002:0> STDIN.getc
a
=> "a"
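One way to paper over the getc difference, sketched under the assumption that a one-character String is what the calling code wants on both versions (read_char is a hypothetical helper name):

```ruby
require "stringio"

# Hypothetical helper: always returns a one-character String,
# or nil at end of input, on Ruby 1.8 and 1.9.
def read_char(io)
  char = io.getc
  # 1.8 hands back an Integer byte value; 1.9 already returns a String.
  char.is_a?(Integer) ? char.chr : char
end

first = read_char(StringIO.new("abc"))
p first
```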
This also breaks the excellent HighLine lib. hint hint
-
Yes, I do need to get HighLine working under 1.9. I'll try to get to that before too long now. Thanks for reminding me.
-
Francisco Laguna, since symbols now respond to #=~, I don't think that change is necessary, unless I am missing something. It should be noted, though, that for some reason Ruby 1.9 now emits a warning about undef'ing object_id, so you may want to preserve it too.
-
-
-
Using each or map on a result from Enumerable#zip is also extremely slow in 1.9. I discovered this when an application took twice as long in 1.9 as in 1.8. The innermost loop had a zip_with() call (zip -> map) which caused this.
require 'benchmark'
a = Array.new(25){rand}
Benchmark.bmbm{|x|
  x.report("zip"){ 1_000_000.times { a.zip(a).map{} } }
}
Results in 1.8.6
          user     system      total        real
zip  16.170000   0.320000  16.490000 ( 16.576643)
Results in 1.9
          user     system      total        real
zip 192.360000   1.430000 193.790000 (195.467429)
Using to_a gives the same results.
And this is with the slow 1.8.6 ubuntu/enable-pthread version vs an -O3/no-pthread compiled 1.9. On most code the 1.9 version is about four times as fast as the 1.8.6 version.
-
For extension writers, ruby1.9 has, incorrectly in my opinion, deprecated Ruby's version.h file.
This means it is not possible to know the ruby version easily and you now MUST write a Makefile of some sort to pass the proper defines or to check if your ruby supports some feature through some try-compile checks.
This probably ranks as one of the worst changes in ruby 1.9.
This obviously begs the question why this was done (as there's no benefit) and what should extension developers do if some function exists in both ruby1.8 and ruby1.9 but has different functionality (as some of the cases show here).
-
James, I am just working on RQ#151 and I want my solution to be version agnostic. Up to now the following was my idea:
Write the code in v1.9 and just require a file to upgrade 1.8 ruby just enough to run your code, so that the require can go away one day. Here is a very first shot:
class String
  unless instance_methods.include?( "to_char" ) then
    require "enumerator"
    def each_char &blk
      return enum_for(:each_byte).map{ |b| b.chr } unless blk
      enum_for(:each_byte).each do |b|
        blk.call b.chr
      end
    end
  end
end
Of course it would be much better to wrap the whole include file into a version test, but that very test might be a tough one. The following is rather a bad example:
begin
  "".to_a
  def to_char...
    ...
  end
rescue
  nil
end
Going for the Ruby version constant
if /^1\.8/ === RUBY_VERSION then
  ...
end
is probably a sound decision after all.
What do you think?
Cheers
-
I would probably just do:
require "jcode" unless "".respond_to?(:each_char)
-
James
Now for the idea of saying
require "jcode" unless "".respond_to?(:each_char)
This is an approach I first saw in JavaScript for browser quirks, but after some thought I believe that it is a bad idea for libraries. What if a require before our require just added each_char to String? And that is not exactly a far-fetched idea either.
For applications, however, it will work, unless you require third-party libraries carelessly before the code above. This, however, can be debugged easily…
For libraries there would be no way to debug or even fix it in a general manner.
One could of course argue that someone could tamper with RUBY_VERSION too, but well, we still have to let people kill themselves if they insist, sigh!
-
Well, I hope that any each_char() implementation would give me the expected one character at a time.
My main point though was that I felt safer using the each_char() method that comes with Ruby 1.8 than building my own.
-
I see we are talking about two different things.
- I wanted a guard against the ruby version for lots of definitions, not only String#to_char.
- I cannot use Ruby's String#to_char because I need the 1.9 functionality of the returned Enumerator in case it is called without a block.
Maybe the idea to write version agnostic code was not really what you are after here, and your emphasis is on 1.9. In that case I am a little bit OT, as usual...
-
Robert: I guess I am still a little confused about our discussion. You've mentioned both String#to_char() and String#each_char() and your code checks for one but creates the other. String#to_char() isn't a method I'm familiar with and I can't locate any documentation on it.
Just FYI, I believe your code also has a bug in it. Checks like instance_methods.include?("some_str") don't work as expected in Ruby 1.9. Those method names are now returned as Symbol objects so include?() will fail to match the String.
You can load jcode and enumerator and use enum_for(:each_char) to get an Enumerable::Enumerator in Ruby 1.8 or 1.9. I do now understand that we were discussing many methods instead of a specific example though, so that may not help.
-
James
Many apologies for so many typos; I was referring to #each_char only. Thanks for the hint with jcode and instance_methods.include?. I missed jcode's functionality.
-
The zip problems appear to be fixed with the January 8th version; it's still ~50% slower than 1.8, but that's manageable.
James, I saw you post about this on the Ruby-CORE mailing list. Is this the way to go for posting bugs?
I'm asking this because I discovered what I think is a rather serious bug and posted a bug report on Rubyforge about three weeks ago, but there is no reply to the report or any of my follow-ups, other than the bug getting assigned to Matz.
-
Yes, I was able to sway Matz and Enumerable#zip() has been "repaired."
Opinions seem to differ on whether to use the bug tracker on Rubyforge or the Ruby Core mailing list. I believe the core team is trying to get more into the bug tracker habit, so it's probably best to start there for most things. I find I have more success with topics that should be discussed, like the Enumerable#zip() issue, on Ruby Core though. For serious issues, I recommend putting it in the bug tracker then drawing attention to it on Ruby Core.
-
-
When Ruby 1.8 was on the horizon and 1.6 was the normal version for people to use, someone created a library called "shim", which allowed the use of 1.7/1.8-style features in 1.6 code.
With compatibility between 1.8 and 1.9 a key issue for many people, such a "shim" library could be very useful.
-
When Ruby 1.8 was on the horizon and 1.6 was the normal version for people to use, someone created a library called "shim", which allowed the use of 1.7/1.8-style features in 1.6 code.
With compatibility between 1.8 and 1.9 a key issue for many people, such a "shim" library could be very useful.
I would recommend backports:
http://github.com/marcandre/backports
As it is pure Ruby, I believe you should still take care where speed is important (though it would be better if users upgrade to 1.9, of course).
I found these "hacks" very bad to read. I would personally be tempted to use something like backports instead (or maybe only a part of it).
-
-
Hi James,
Thanks for sharing. Based on your code I've tried the following for backwards compatibility with Ruby 1.8 where everything uses the CSV class constant.
require "csv"
if CSV.const_defined? :Reader
  # Ruby 1.8 compatible
  require 'fastercsv'
  Object.send(:remove_const, :CSV)
  CSV = FasterCSV
else
  # CSV is now FasterCSV in ruby 1.9
end
-
Thanks to Michael Barton. His fix is just what I sought.
-
-
Hi James,
I just upgraded from Ruby 1.8.6 to 1.9.1 and I found a couple of issues while starting script/server.
- The case stmt with : was throwing an error. I changed : to then and it worked.
- Just below the case stmt I had the below line
<%= link_to ( h(oval), :action => 'find_oval', :id => [@sas_report_id, new_oval , the_cve ].join("_"))%><br>
It threw errors like
-------------------------------------------------------------
/home/ton/vsweb/nvd/app/views/report/_sas_manual_show.rhtml:16: syntax error, unexpected ',', expecting ')'
...( oval, :action => 'find_oval',:id => [@sas_report_id, new_o...
... ^
/home/ton/vsweb/nvd/app/views/report/_sas_manual_show.rhtml:16: syntax error, unexpected ')', expecting keyword_end
...l , the_cve ].join("_"))).to_s); @output_buffer.concat "<br>...
---------------------------------------------------------
I then removed the opening and closing parentheses
<%= link_to h(oval), :action => 'find_oval', :id => [@sas_report_id, new_oval , the_cve ].join("_")%><br>
It worked !!
Wondering what changed here in Ruby 1.9.1.
What other syntax changes are present in 1.9.1? I don't know which other places in our code need to change.
Has anybody else gotten this error with 1.9.1? Do you have a solution? Please let me know.
Thanks
Chandu
-
I'm just guessing, but I believe the issue was that you had a space between the method name (link_to) and the opening parenthesis. Try taking out the space and I bet that fixes it up.
-
I think you are right. After removing the space it looks good. It seems to be working.
But I am not sure how the same code was working with Ruby 1.8.6.
Thanks
Chandan
-
It was allowed in Ruby 1.8.6, though it did print a warning back then that it was deprecated.
-
Hi James,
I am now getting the following error
=============================================================
formal argument cannot be an instance variable
...@checklists.each do |@checklist|; @output_buffer.concat "\r\...
... ^
Extracted source (around line #14):
11: <%= sort_header_tag('last_modified_datetime', :title => 'Last Modified') %>
12: <%= sort_header_tag('last_import_datetime', :title => 'Last Import') %>
13: <th>Resources</th>
14: <% @checklists.each do |@checklist|%>
15: <tr>
16: <td>
17: <%
==============================================================
Have you ever gotten this error? What could be the issue?
I am getting all these errors after migrating to Ruby 1.9.1.
Thanks
Chandu
-
Yeah, blocks are no longer allowed to abuse variables like that (since they stick around after the iteration). You will need to switch to an explicit assignment, if that's what's really needed:
@checklists.each do |checklist|
  @checklist = checklist
  # ...
end
Hope that helps.
-