Parsing

Posts tagged with "Parsing."
  • 11 NOV 2011

    Doing it Wrong

    Continuing with my Breaking All of the Rules series, I want to peek into several little areas where I've been caught doing the wrong thing. I'm a rule breaker and I'm determined to take someone down with me!

    My Forbidden Parser

    In one application, I work with an API that hands me very simple data like this:

    <emails>
      <email>user1@example.com</email>
      <email>user2@example.com</email>
      <email>user3@example.com</email>
    </emails>
    

    Now I need to make a dirty confession: I parsed this with a Regular Expression.

    I know, I know. We should never parse HTML or XML with a Regular Expression. If you don't believe me, just take a moment to actually read that response. Yikes!

    Oh, and you shouldn't validate emails with a Regular Expression either. Oops. We're talking about at least two violations here.

    But it gets worse.

    You may think I rolled a little parser based on Regular Expressions. That might look like this:

    #!/usr/bin/env ruby -w
    
    require "strscan"
    
    class EmailParser
      def initialize(data)
        @scanner = StringScanner.new(data)
      end
    
      def parse(&block)
        parse_emails(&block)
      end
    
      private
    
      def parse_emails(&block)
        @scanner.scan(%r{\s*<emails>\s*}) or fail "Failed to match list start"
        loop do
          parse_email(&block) or break
        end
        @scanner.scan(%r{\s*</emails>}) or fail "Failed to match list end"
      end
    
      def parse_email(&block)
        if @scanner.scan(%r{<email>\s*})
          if email = @scanner.scan_until(%r{</email>\s*})
            block[email.strip[0..-9].strip]
            return true
          else
            fail "Failed to match email end"
          end
        end
        false
      end
    end
    
    EmailParser.new(ARGF.read).parse do |email|
      puts email
    end
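
    The parser above leans on two StringScanner calls from Ruby's standard library: scan only matches at the current position and returns nil otherwise, while scan_until consumes everything up to and including the match. Here is a minimal sketch of that behavior on a single line of input. The sample string and the variable names are made up for illustration and are not part of the API response:

```ruby
require "strscan"

# One illustrative line of input, with extra whitespace around the address.
scanner = StringScanner.new("<email> user1@example.com </email>\n")

# scan matches only at the current position and advances past the match.
opening = scanner.scan(%r{<email>\s*})       # "<email> "

# A failed scan returns nil and leaves the position untouched.
missing = scanner.scan(%r{<emails>})         # nil

# scan_until returns everything up to and including the match.
rest = scanner.scan_until(%r{</email>\s*})   # "user1@example.com </email>\n"

# "</email>" is eight characters long, so [0..-9] chops it off the end.
address = rest.strip[0..-9].strip

puts address          # user1@example.com
puts scanner.eos?     # true
```

    The nil return is what makes the `scan(...) or fail "..."` idiom work, and it is also how parse_email signals that there are no more entries.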
    

    Read more…

  • 18 NOV 2007

    Ghost Wheel Example

    There has been a fair bit of buzz around the Treetop parser in the Ruby community lately. Part of that is fueled by the nice screencast that shows off how to use the parser generator.

    It doesn't get talked about as much, but I wrote a parser generator too, called Ghost Wheel. Probably the main reason Ghost Wheel doesn't receive much attention yet is that I have been slow in getting the documentation written. Given that, I thought I would show how the code built in the Treetop screencast translates to Ghost Wheel:

    #!/usr/bin/env ruby -wKU
    
    require "rubygems"
    require "ghost_wheel"
    
    # define a parser using Ghost Wheel's Ruby DSL
    RubyParser    = GhostWheel.build_parser do
      rule( :additive,
            alt( seq( :multiplicative,
                      :space,
                      :additive_op,
                      :space,
                      :additive ) { |add| add[0].send(add[2], add[-1]) },
                 :multiplicative ) )
      rule(:additive_op, alt("+", "-"))
    
      rule( :multiplicative,
            alt( seq( :primary,
                      :space,
                      :multiplicative_op,
                      :space,
                      :multiplicative ) { |mul| mul[0].send(mul[2], mul[-1]) },
                 :primary ) )
      rule(:multiplicative_op, alt("*", "/"))
    
      rule(:primary, alt(:parenthized_additive, :number))
      rule( :parenthized_additive,
            seq("(", :space, :additive, :space, ")") { |par| par[2] } )
      rule(:number, /[1-9][0-9]*|0/) { |n| Integer(n) }
    
      rule(:space, /\s*/)
      parser(:exp, seq(:additive, eof) { |e| e[0] })
    end
    
    # define a parser using Ghost Wheel's grammar syntax
    GrammarParser = GhostWheel.build_parser %q{
      additive             =  multiplicative space additive_op space additive
                              { ast[0].send(ast[2], ast[-1]) }
                           |  multiplicative
      additive_op          =  "+" | "-"
    
      multiplicative       =  primary space multiplicative_op space multiplicative
                              { ast[0].send(ast[2], ast[-1]) }
                           |  primary
      multiplicative_op    =  "*" | "/"
    
      primary              =  parenthized_additive | number
      parenthized_additive =  "(" space additive space ")" { ast[2] }
      number               =  /[1-9][0-9]*|0/ { Integer(ast) }
    
      space                =  /\s*/
      exp                  := additive EOF { ast[0] }
    }
    
    if __FILE__ == $PROGRAM_NAME
      require "test/unit"
    
      class TestArithmetic < Test::Unit::TestCase
        def test_parsing_numbers
          assert_parses         "0"
          assert_parses         "1"
          assert_parses         "123"
          assert_does_not_parse "01"
        end
    
        def test_parsing_multiplicative
          assert_parses "1*2"
          assert_parses "1 * 2"
          assert_parses "1/2"
          assert_parses "1 / 2"
        end
    
        def test_parsing_additive
          assert_parses "1+2"
          assert_parses "1 + 2"
          assert_parses "1-2"
          assert_parses "1 - 2"
    
          assert_parses "1*2 + 3 * 4"
        end
    
        def test_parsing_parenthized_expressions
          assert_parses "1 * (2 + 3) * 4"
        end
    
        def test_parse_results
          assert_correct_result "0"
          assert_correct_result "1"
          assert_correct_result "123"
    
          assert_correct_result "1*2"
          assert_correct_result "1 * 2"
          assert_correct_result "1/2"
          assert_correct_result "1 / 2"
    
          assert_correct_result "1+2"
          assert_correct_result "1 + 2"
          assert_correct_result "1-2"
          assert_correct_result "1 - 2"
    
          assert_correct_result "1*2 + 3 * 4"
          assert_correct_result "1 * (2 + 3) * 4"
        end
    
        private
    
        PARSERS = [RubyParser, GrammarParser]
    
        def assert_parses(input)
          PARSERS.each do |parser|
            assert_nothing_raised(GhostWheel::FailedParseError) do
              parser.parse(input)
            end
          end
        end
    
        def assert_does_not_parse(input)
          PARSERS.each do |parser|
            assert_raises(GhostWheel::FailedParseError) { parser.parse(input) }
          end
        end
    
        def assert_correct_result(input)
          PARSERS.each { |parser| assert_equal(eval(input), parser.parse(input)) }
        end
      end
    end
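
    To make the shape of that grammar easier to see, here is a rough hand-written sketch of the same rules as a recursive descent evaluator. It does not use Ghost Wheel at all: the TinyArithmetic class, its method names, and the ArgumentError it raises are all hypothetical stand-ins, built only on StringScanner from the standard library. Each method mirrors one rule, and the `left.send(op, right)` call plays the role of the `ast[0].send(ast[2], ast[-1])` actions:

```ruby
require "strscan"

# Hypothetical stand-in for the grammar above. This is not Ghost Wheel code.
class TinyArithmetic
  def self.parse(input)
    new(input).parse
  end

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # exp := additive EOF
  def parse
    result = additive
    fail ArgumentError, "Unexpected input at #{@scanner.pos}" unless @scanner.eos?
    result
  end

  private

  # additive = multiplicative space additive_op space additive | multiplicative
  def additive
    binary(%r{\s*([-+])\s*}, :multiplicative, :additive)
  end

  # multiplicative = primary space multiplicative_op space multiplicative | primary
  def multiplicative
    binary(%r{\s*([*/])\s*}, :primary, :multiplicative)
  end

  # Shared shape of both rules: an operand, then optionally an operator
  # followed by the same rule again (right recursion).
  def binary(operator, operand_rule, same_rule)
    left = send(operand_rule)
    return left unless @scanner.scan(operator)
    op = @scanner[1]                 # save the operator before recursing
    left.send(op, send(same_rule))   # e.g. 1.send("+", 2)
  end

  # primary = parenthized_additive | number
  def primary
    if @scanner.scan(/\(\s*/)
      value = additive
      @scanner.scan(/\s*\)/) or fail ArgumentError, "Expected ) at #{@scanner.pos}"
      value
    elsif (digits = @scanner.scan(/[1-9][0-9]*|0/))
      Integer(digits)
    else
      fail ArgumentError, "Expected a number at #{@scanner.pos}"
    end
  end
end

puts TinyArithmetic.parse("1 * (2 + 3) * 4")   # 20
puts TinyArithmetic.parse("1*2 + 3 * 4")       # 14
puts TinyArithmetic.parse("1 - 2 - 3")         # 2
```

    The last line shows a consequence of following the rules literally: because each rule recurses on its right-hand side, chained operators group to the right, so the sketch computes 1 - (2 - 3) where Ruby's own eval would give -4. The tests above only use one operator of each precedence level at a time, so they never exercise that case.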
    

    Read more…