Deadly Regular Expressions

What can we learn by using regular expression to do what it cannot do?

22

SEP
2014

A Regex Can't Match Balanced Parentheses

Can we do math with regular expressions?

#!/usr/bin/env ruby -w

def build_preparation_regex(number_regex, ops)
  %r{
    (?<number>             #{number_regex}                                   ){0}
    (?<operator>           [#{ops.map(&Regexp.method(:escape)).join}]        ){0}
    (?<term_operator_term> \g<term> \s* \g<operator> \s* \g<term>            ){0}
    (?<term>               \g<number> | \( \s* \g<term_operator_term> \s* \) ){0}

    \g<term_operator_term>(?=\s*\z|[^)])
  }x
end

NUMBER_REGEX               = %r{
  -?            # an optional minus
  \d+           # an integer
  (?: \. \d+)?  # an optional fractional bit
}x
PREPARE_MULT_AND_DIV_REGEX = build_preparation_regex(NUMBER_REGEX, %w[* /])
PREPARE_ADD_AND_SUB_REGEX  = build_preparation_regex(NUMBER_REGEX, %w[* / + -])
CHECK_REGEX                = %r{
  \A                   # the start of the expression
  (?<term>             # a term, which is:
    #{NUMBER_REGEX}    # a number
    |                  # or
    \( \s*             # a parenthesized group of
      \g<term>         # a term
      \s* [*/+\-] \s*  # an operator
      \g<term>         # and another term
    \s* \)             # the end of the parenthesized group
  )
  \z                   # the end of the expression
}x
MATH_REGEX                 = %r{
  \( \s*
  (?<left>     #{NUMBER_REGEX} )
  \s*
  (?<operator> [*/+\-]         )
  \s*
  (?<right>    #{NUMBER_REGEX} )
  \s* \)
}x

verbose = ARGV.delete("-v")
problem = ARGV.first.strip or abort "USAGE:  #{$PROGRAM_NAME} MATH_EXPRESSION"
steps   = [ ]

[PREPARE_MULT_AND_DIV_REGEX, PREPARE_ADD_AND_SUB_REGEX].each do |preparation|
  loop do
    steps << problem.dup if verbose
    problem.sub!(preparation) { |term| "(#{term})" } or break
  end
end

problem =~ CHECK_REGEX or abort "Error:  Invalid expression"

solution = problem.dup
loop do
  steps << solution.dup if verbose
  solution.sub!(MATH_REGEX) {
    $~[:left].to_f.public_send($~[:operator], $~[:right].to_f)
  } or break
end

puts steps.uniq[0..-2] if verbose
puts solution.sub(/\.0+\z/, "")
# $ ruby math.rb '2 + 3 * 4 / 6 - 5 + 1'
# 0
# $ ruby math.rb -v '2 + 3 * 4 / 6 - 5 + 1'
# 2 + 3 * 4 / 6 - 5 + 1
# 2 + (3 * 4) / 6 - 5 + 1
# 2 + ((3 * 4) / 6) - 5 + 1
# (2 + ((3 * 4) / 6)) - 5 + 1
# ((2 + ((3 * 4) / 6)) - 5) + 1
# (((2 + ((3 * 4) / 6)) - 5) + 1)
# (((2 + (12.0 / 6)) - 5) + 1)
# (((2 + 2.0) - 5) + 1)
# ((4.0 - 5) + 1)
# (-1.0 + 1)
# 0
# $ ruby math.rb -v '-2.1 * 5'
# -2.1 * 5
# (-2.1 * 5)
# -10.5
# $ ruby math.rb -v '2 +'
# Error:  Invalid expression
Comments (3)
  1. Iazel
    Iazel October 1st, 2014 Link

    Nice code! I like the approach of defining regexp parts in build_preparation_regexp and also the recursive thing (I wasn't aware of this possibility!)

  2. Krishna Dole
    Krishna Dole January 2nd, 2015 Link
    $ ruby math.rb -v '1 + 2 * (3 + 4)'
    1 + 2 * (3 + 4)
    (1 + 2) * (3 + 4)
    ((1 + 2) * (3 + 4))
    (3.0 * (3 + 4))
    (3.0 * 7.0)
    21
    

    Is this a known limitation?

    1. James Edward Gray II
      James Edward Gray II January 2nd, 2015 Link

      It's known now, but, no, I wasn't aware of this bug. Good find.