13 OCT 2008
The Secret Shell Helper
Someone pops onto the Ruby Talk mailing list fairly regularly asking how to break up content like:
one "two" "a longer three"
They expect to end up with a three-element Array, where the third item contains spaces. They generally expect the quotes to have been removed as well.
If your needs are very, very simple you may be able to handle this with a regular expression:
data = 'one "two" "a longer three"'
p data.scan(/"([^"]*)"|(\S+)/).flatten.compact
# >> ["one", "two", "a longer three"]
That just searches for either a set of quotes with some non-quote characters between them or a run of non-whitespace characters. Those are the two possibilities for the fields. Note that the two separate capture groups here mean scan() will return its contents in the form:
[[nil, "one"], ["two", nil], ["a longer three", nil]]
That's why I added a flatten() and compact() to get down to the actual matches.
The regular expression approach can get pretty complex though if any kind of escaping for quotes is involved. When that happens, you may need to step up to a parser.
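To give a feel for how quickly the pattern grows, here is a minimal sketch that extends the expression above to allow backslash-escaped characters inside the quotes. The split_quoted() helper is a hypothetical name invented for this illustration, and it assumes backslash-style escaping only inside double quotes. It does not attempt single quotes, escaped spaces in bare words, or unbalanced input.

```ruby
# Hypothetical helper, not part of any library: split on whitespace while
# honoring double-quoted fields that may contain backslash escapes.
def split_quoted(data)
  # First group: quoted field body, made of escaped pairs (\x) or any
  # character that is neither a quote nor a backslash.
  # Second group: a bare run of non-whitespace characters.
  data.scan(/"((?:\\.|[^"\\])*)"|(\S+)/).map do |quoted, bare|
    # Strip the escaping backslashes from quoted fields only.
    quoted ? quoted.gsub(/\\(.)/, '\1') : bare
  end
end

p split_quoted('one "two" "a \"longer\" three"')
# >> ["one", "two", "a \"longer\" three"]
```

Even this small step doubles the size of the pattern and adds a cleanup pass, which is a fair hint that a real parser is the better tool once escaping shows up.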
One choice for that would be to abuse a CSV parser to get it to divide up the data for you. Here's how you would do that with FasterCSV:
require "rubygems"
require "faster_csv"
data = 'one "two" "a longer three"'
p data.parse_csv(:col_sep => " ")
# >> ["one", "two", "a longer three"]
As you see, replacing the column separator (traditionally a comma) with a simple space gets FasterCSV to break down this data correctly.
This parser will handle escaping, though it's CSV style escaping. That means that quotes will need to be doubled:
require "rubygems"
require "faster_csv"
data = 'simple "embedded ""quote"" characters"'
p data.parse_csv(:col_sep => " ")
# >> ["simple", "embedded \"quote\" characters"]
I doubt that fits the data very often. I suspect that an escaped quote is more often \" than "". Why is that? Well, data of this type isn't typically CSV data. Which leads us to the natural question: what kind of data are we really working with here?
I'm guessing it's shell data more often than not. Most shells handle quoting like this:
$ ruby -e 'p ARGV' one "two" "a longer three"
["one", "two", "a longer three"]
If that's really the case, we're going to need a shell-oriented parser. It's sadly not well known, I assume because it's strangely absent from http://ruby-doc.org/, but Ruby ships with such a parser. The standard Shellwords library will break these down for you:
require "shellwords"
data = "one 'two' 'a longer three'"
p Shellwords.shellwords(data)
# >> ["one", "two", "a longer three"]
If your data really is shell content, you'll be glad to know that Shellwords will handle all the special cases:
require "shellwords"
s = lambda { |shell| p Shellwords.shellwords(shell) }
s[%Q{"escaped \\"quote\\" characters"}]
s[%Q{escaped\\ spaces}]
s[%Q{'back to'" back quoting"}]
# >> ["escaped \"quote\" characters"]
# >> ["escaped spaces"]
# >> ["back to back quoting"]
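One case worth knowing about is input with unbalanced quotes, which Shellwords rejects by raising an ArgumentError instead of guessing. The sketch below wraps the call in a hypothetical safe_shellwords() helper (a name made up for this example) that returns nil for such input. Whether nil is the right fallback is an assumption that depends on your application.

```ruby
require "shellwords"

# Hypothetical helper: returns the parsed words, or nil when the quoting
# in the line is unbalanced and Shellwords raises ArgumentError.
def safe_shellwords(line)
  Shellwords.shellwords(line)
rescue ArgumentError
  nil
end

p safe_shellwords(%q{one "two" "a longer three"})
p safe_shellwords(%q{one "two})
# >> ["one", "two", "a longer three"]
# >> nil
```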
Shellwords has some new features in Ruby 1.9 as well:
- shellsplit() is added as an alias for shellwords()
- shellescape() was added to escape a String for use in Bash
- shelljoin() was added to escape and join an Array of arguments
- shellsplit() and shellescape() are added to String and shelljoin() is added to Array for easier access
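As a rough sketch of how those additions fit together, the following assumes a Ruby where the String and Array shortcuts listed above are available. It joins an Array of arguments into one shell-safe line and then splits it back apart. The args and line names are only for illustration.

```ruby
require "shellwords"

args = ["one", "a longer three", "it's"]

# Array#shelljoin escapes each element and joins them with spaces.
line = args.shelljoin

# String#shellsplit reverses the process, so the round trip is lossless.
p line.shellsplit == args
# >> true

# String#shellescape protects a single argument.
puts "a longer three".shellescape
# >> a\ longer\ three
```

The round trip is the useful property here: anything built with shelljoin() can be handed to a shell, or back to shellsplit(), and come out as the same list of arguments.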
Hopefully this helps you find the right parser for your data.
Comments (1)
- Kelsey, May 13th, 2009
Thanks for posting this, James. I never knew this existed.