Gray Soft

20 SEP 2014

Can You snake_case/CamelCase With One Regex?

In Rails, methods like underscore() and camelize() use several regexen to transform the String under the hood. Many people have asked if you can do it with a single regex though. These specs I borrowed from Rails seem to say yes:

#!/usr/bin/env ruby -w

class String
  def snake_case(acronyms = self.class.acronyms)
    gsub( %r{
      (?:
        (?<before>  \b | [A-Za-z\d]   )
        (?<acronym> #{acronyms.regex} )
        (?<after>   \b | [^a-z]       )
      )
      |
      (?: (?<before> [A-Z]+ ) (?<after> [A-Z][^A-Z] ) )
      |
      (?: (?<before> [^A-Z:] ) (?<after> [A-Z] ) )
      |
      (?<nesting> :: )
    }x ) { |m|
      if $~[:nesting]
        "/"
      else
        [$~[:before], $~[:acronym], $~[:after]]
          .compact
          .reject(&:empty?)
          .join("_")
      end
    }.downcase
  end

  def CamelCase(acronyms = self.class.acronyms)
    gsub( %r{
      (?:
        (?: \A | _ | (?<nesting> / ) )
        (?<acronym> #{acronyms.inverted_regex} )
        (?= \b | [A-Z_] )
      )
      |
      (?: (?: \A | _ ) (?<letter> . ) )
      |
      (?: (?<nesting> / ) (?<letter> . ) )
    }mx ) {
      nested      = $~[:nesting] && "::"
      capitalized = acronyms.capitalize($~[:acronym]) { $~[:letter].upcase }
      "#{nested}#{capitalized}"
    }
  end

  def camelCase
    self.CamelCase.sub(/\A[A-Z]/) { |first_char| first_char.downcase }
  end

  def self.acronyms
    @acronyms ||= AcronymManager.new
  end
end

class AcronymManager
  NEVER_MATCHES = /\zA/

  def initialize
    @acronyms = { }
    @inverted = { }
  end

  attr_reader :acronyms, :inverted
  private     :acronyms, :inverted

  def add(acronym)
    acronyms[acronym] = acronym.downcase
    @inverted         = acronyms.invert
  end

  def regex
    return NEVER_MATCHES if acronyms.empty?

    /(?:#{acronyms.keys.map(&Regexp.method(:escape)).join('|')})/
  end

  def inverted_regex
    return NEVER_MATCHES if acronyms.empty?

    /(?:#{inverted.keys.map(&Regexp.method(:escape)).join('|')})/
  end

  def capitalize(acronym, &default)
    inverted.fetch(acronym, &default)
  end
end

if $PROGRAM_NAME == __FILE__
  require "minitest/autorun"

  describe "Case changing" do
    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test_cases.rb#L118-L123
    let(:examples) {
      {
        "Product"               => "product",
        "SpecialGuest"          => "special_guest",
        "ApplicationController" => "application_controller",
        "Area51Controller"      => "area51_controller",
      }
    }
    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test_cases.rb#L139-L145
    let(:one_way_snake_examples) {
      {
        "HTMLTidy"              => "html_tidy",
        "HTMLTidyGenerator"     => "html_tidy_generator",
        "FreeBSD"               => "free_bsd",
        "HTML"                  => "html",
        "ForceXMLController"    => "force_xml_controller"
      }
    }
    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test.rb#L98
    let(:one_way_camel_examples) {
      {
        "CamelCase"             => "Camel_Case"
      }
    }
    # added by James
    let(:path_examples) {
      {
        "SomeLib::WithClass"    => "some_lib/with_class"
      }
    }
    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test.rb#L101-L145
    let(:acronym_examples) {
      {
        "API"                   => "api",
        "APIController"         => "api_controller",
        "Nokogiri::HTML"        => "nokogiri/html",
        "HTTPAPI"               => "http_api",
        "HTTP::Get"             => "http/get",
        "SSLError"              => "ssl_error",
        "RESTful"               => "restful",
        "RESTfulController"     => "restful_controller",
        "Nested::RESTful"       => "nested/restful",
        "IHeartW3C"             => "i_heart_w3c",
        "PhDRequired"           => "phd_required",
        "IRoRU"                 => "i_ror_u",
        "RESTfulHTTPAPI"        => "restful_http_api",

        # misdirection
        "Capistrano"            => "capistrano",
        "CapiController"        => "capi_controller",
        "HttpsApis"             => "https_apis",
        "Html5"                 => "html5",
        "Restfully"             => "restfully",
        "RoRails"               => "ro_rails"
      }
    }

    it "can snake_case a String" do
      examples.each do |camel, snake|
        camel.snake_case.must_equal(snake)
      end
    end

    it "can handle some tricky one-way cases for snake_case" do
      one_way_snake_examples.each do |camel, snake|
        camel.snake_case.must_equal(snake)
      end
    end

    it "can CamelCase a String" do
      examples.each do |camel, snake|
        snake.CamelCase.must_equal(camel)
      end
    end

    it "can handle some tricky one-way cases for CamelCase" do
      one_way_camel_examples.each do |camel, snakey|
        snakey.CamelCase.must_equal(camel)
      end
    end

    it "can camelCase a String" do
      "camel_case".camelCase.must_equal("camelCase")
    end

    it "can convert nesting to paths and back" do
      path_examples.each do |camel, snake|
        camel.snake_case.must_equal(snake)
        snake.CamelCase.must_equal(camel)
      end
    end

    it "is aware of acronyms" do
      acronyms = AcronymManager.new
      acronyms.add("API")
      acronyms.add("HTML")
      acronyms.add("HTTP")
      acronyms.add("RESTful")
      acronyms.add("W3C")
      acronyms.add("PhD")
      acronyms.add("RoR")
      acronyms.add("SSL")

      acronym_examples.each do |camel, snake|
        camel.snake_case(acronyms).must_equal(snake)
        snake.CamelCase(acronyms).must_equal(camel)
      end
    end
  end
end
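To see the core idea without the acronym machinery, here is a stripped-down sketch that uses one `gsub` in each direction, with lookarounds instead of named captures. The names `simple_snake_case`, `simple_camel_case`, `SNAKE_BOUNDARY`, and `CAMEL_BOUNDARY` are hypothetical. This version deliberately ignores acronyms, so it only aims at the non-acronym specs above.

```ruby
# Hypothetical, acronym-free sketch: one regex per direction.
SNAKE_BOUNDARY = /
  (?<=[A-Z])(?=[A-Z][a-z])   # end of a capital run: "HTMLTidy" splits before "Ti"
  | (?<=[a-z\d])(?=[A-Z])    # lowercase letter or digit followed by a capital
  | ::                       # namespace separator becomes a path separator
/x

def simple_snake_case(camel)
  # zero-width boundaries become "_", "::" becomes "/"
  camel.gsub(SNAKE_BOUNDARY) { |boundary| boundary == "::" ? "/" : "_" }.downcase
end

CAMEL_BOUNDARY = %r{(?:\A|_)(.)|/(.)}

def simple_camel_case(snake)
  # $1 is set for a leading or underscore-preceded character, $2 for one after "/"
  snake.gsub(CAMEL_BOUNDARY) { $1 ? $1.upcase : "::#{$2.upcase}" }
end
```

Acronyms are exactly where this breaks down: without a list of known acronyms, `simple_camel_case("html_tidy")` returns `"HtmlTidy"`, which is why the full version threads an `AcronymManager` through both regexes.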

In: Deadly Regular Expressions | Tags: Regular Expression | 0 Comments

19 SEP 2014

"You can't parse [X]HTML with regex."

The only explanation I'll give for the following code is to provide this link to my favorite Stack Overflow answer.

#!/usr/bin/env ruby -w

require "open-uri"

URL    = "http://stackoverflow.com/questions/1732348/" +
         "regex-match-open-tags-except-xhtml-self-contained-tags"
PARSER = %r{
  (?<doctype_declaration>
    <!DOCTYPE\b (?<doctype> [^>]* ) >
  ){0}
  (?<comment>
    <!-- .*? -->  # lazy, so one comment can't swallow the page up to the last -->
  ){0}

  (?<script_tag>
    < \s* (?<tag_name> script ) \s* (?<attributes> [^>]* > )
      (?<script> .*? )
    < \s* / \s* script \s* >
  ){0}
  (?<self_closed_tag>
    < \s* (?<tag_name> \w+ ) \s* (?<attributes> [^>]* / \s* > )
  ){0}
  (?<unclosed_tag>
    < \s*
    (?<tag_name> link | meta | br | input | hr | img ) \b
    \s*
    (?<attributes> [^>]* > )
  ){0}
  (?<open_tag>
    < \s* (?<tag_name> \w+ ) \s* (?<attributes> [^>]* > )
  ){0}
  (?<close_tag>
    < \s* / \s* (?<tag_name> \w+ ) \s* >
  ){0}

  (?<attribute>
    (?<attribute_name> [-\w]+ )
    (?: \s* = \s* (?<attribute_value> "[^"]*" | '[^']*' | [^>\s]+ ) )? \s*
  ){0}
  (?<attribute_list>
    \g<attribute>
    (?= [^>]* > \z )  # attributes keep a trailing > to disambiguate from text
  ){0}

  (?<text>
    (?! [^<]* /?\s*> \z )  # a guard to prevent this from parsing attributes
    [^<]+
  ){0}

  \G
  (?:
    \g<doctype_declaration>
    |
    \g<comment>
    |
    \g<script_tag>
    |
    \g<self_closed_tag>
    |
    \g<unclosed_tag>
    |
    \g<open_tag>
    |
    \g<attribute_list>
    |
    \g<close_tag>
    |
    \g<text>
  )
  \s*
}mix

def parse(html)
  stack = [{attributes: [ ], contents: [ ], name: :root}]
  loop do
    html.sub!(PARSER, "") or break
    if $~[:doctype_declaration]
      add_to_tree(stack.last, "DOCTYPE", $~[:doctype].strip)
    elsif $~[:script_tag]
      add_to_stack(stack, $~[:tag_name], $~[:attributes], $~[:script])
    elsif $~[:self_closed_tag] || $~[:unclosed_tag] || $~[:open_tag]
      add_to_stack(stack, $~[:tag_name], $~[:attributes], "", $~[:open_tag])
    elsif $~[:close_tag]
      stack.pop
    elsif $~[:text]
      stack.last[:contents] << $~[:text]
    end
  end
  stack.pop
end

def add_to_tree(branch, name, value)
  if branch.include?(name)
    branch[name]  = [branch[name]] unless branch[name].is_a?(Array)
    branch[name] << value
  else
    branch[name] = value
  end
end

def add_to_stack(stack, tag_name, attributes_html, contents, open = false)
  tag = { attributes: parse_attributes(attributes_html),
          contents:   [contents].reject(&:empty?),
          name:       tag_name }
  add_to_tree(stack.last, tag_name, tag)
  stack.last[:contents] << tag
  stack                 << tag if open
end

def parse_attributes(attributes_html)
  attributes = { }
  loop do
    attributes_html.sub!(PARSER, "") or break
    add_to_tree(
      attributes,
      $~[:attribute_name],
      ($~[:attribute_value] || $~[:attribute_name]).sub(/\A(["'])(.*)\1\z/, '\2')
    )
  end
  attributes
end

def convert_to_bbcode(node)
  if node.is_a?(Hash)
    name = node[:name].sub(/\Astrike\z/, "s")
    "[#{name}]#{node[:contents].map { |c| send(__method__, c) }.join}[/#{name}]"
  else
    node
  end
end

html = URI.open(URL, &:read).strip  # Kernel#open no longer fetches URLs in Ruby 3.0+
ast  = parse(html)
puts ast["html"]["body"]["div"]
  .find { |div| div[:attributes]["class"] == "container"      }["div"]
  .find { |div| div[:attributes]["id"]    == "content"        }["div"]["div"]
  .find { |div| div[:attributes]["id"]    == "mainbar"        }["div"]
  .find { |div| div[:attributes]["id"]    == "answers"        }["div"]
  .find { |div| div[:attributes]["id"]    == "answer-1732454" }["table"]["tr"]
  .first["td"]
  .find { |div| div[:attributes]["class"] == "answercell"     }["div"]["p"]
  .first[:contents]
  .map(&method(:convert_to_bbcode))  # to reach a wider audience
  .join
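The parser leans on two Onigmo features that are easy to miss: a named group followed by `{0}` defines a pattern without matching anything where it is written, and `\g<name>` calls that definition later while still filling in the named capture. Combined with `\G` and a `sub!` loop that chews input off the front, that gives a small tokenizer. Here is the same skeleton shrunk to a toy grammar; `TOKEN` and `tokenize` are hypothetical names used only for illustration.

```ruby
TOKEN = %r{
  (?<number> \d+    ){0}   # defined here, matched only when called below
  (?<word>   [a-z]+ ){0}

  \G (?: \g<number> | \g<word> ) \s*
}x

def tokenize(input)
  input  = input.dup  # sub! mutates, so work on a copy
  tokens = [ ]
  # sub! returns nil once nothing matches at the front, ending the loop
  while input.sub!(TOKEN, "")
    tokens << ($~[:number] ? [:number, $~[:number].to_i] : [:word, $~[:word]])
  end
  tokens
end
```

For example, `tokenize("12 apples 7")` yields `[[:number, 12], [:word, "apples"], [:number, 7]]`, and anything the grammar doesn't recognize simply stops the loop, just as unparseable markup stops `parse()` above.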

In: Deadly Regular Expressions | Tags: For Fun, Parsing & Regular Expression | 1 Comment

11 SEP 2014
Experimenting With Ownership

Let's use a trivial exercise to see what we can learn about ownership, moving, borrowing, and more in Rust. Here's the idea:
1. We'll allocate a list of numbers
2. We'll add one to each number in the list
3. We'll print the resulting list of numbers
This is a simple process requiring only a few lines of code:
```
fn main() {
    let mut numbers = vec![1u, 2, 3];
    for n in numbers.mut_iter() {
        *n += 1;
    }
    println!("{}", numbers);
}
```
The output is hopefully what we all expect to see:
```
$ ./one_function 
[2, 3, 4]
```
In this code there is just one variable: numbers. That variable owns a list of numbers on the heap and its scope is limited to the main() function, which is just a way to say that the data exists for the length of that function call. Since all three steps happen in that one function call, ownership doesn't really affect us here.

To better examine what ownership really means, let's add one small twist to our exercise:
- The increment of each number in the list must happen in a separate function
Read more…
In: Rusting | Tags: Iterators, Rust & Syntax | 0 Comments
6 SEP 2014
Taking Rust to Task

Now that I've reached the point where I can get some Rust code running without asking questions in IRC every five minutes, I really wanted to play with some tasks. Tasks are the way Rust handles multiprocessing code. Under the hood they can map one-to-one with operating system threads, or you can use a many-to-one mapping that I'm not ready to go into yet.

Probably one of the most exciting aspects of tasks in Rust, in my opinion, is that unsafe use of shared memory is rejected outright as a compile error. That led me to want to figure out how you communicate correctly. (Spoiler: the same way you do in Ruby: just pass messages.)
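As a point of comparison for that spoiler, here is roughly what passing messages looks like in Ruby, using a thread and the core `Queue` class so that nothing is shared except the queues themselves. The `shout_all` name, the `upcase` work, and the `:done` sentinel are illustrative choices, not anything taken from the Rust program in this post.

```ruby
# The worker talks to the caller only through two queues. Queue#pop blocks
# until a message arrives, so neither side needs to poll.
def shout_all(words)
  inbox  = Queue.new
  outbox = Queue.new

  worker = Thread.new do
    while (word = inbox.pop) != :done
      outbox << word.upcase
    end
    outbox << :done  # tell the receiver nothing else is coming
  end

  words.each { |word| inbox << word }
  inbox << :done

  results = [ ]
  while (result = outbox.pop) != :done
    results << result
  end
  worker.join
  results
end
```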

Ready to dive in, I grossly simplified a recent challenge from work and coded it up in Rust. You can get the idea with a glance at main():
```
use std::collections::HashMap;

// ...

fn string_vec(strs: &[&'static str]) -> Vec<String> {
    let mut v = Vec::new();
    for s in strs.iter() {
        v.push(s.to_string());
    }
    v
}

fn main() {
    let mut services = HashMap::new();
    services.insert("S1".to_string(), string_vec(["A", "B"]));
    services.insert("S2".to_string(), string_vec(["A", "C"]));
    services.insert("S3".to_string(), string_vec(["C", "D", "E", "F"]));
    services.insert("S4".to_string(), string_vec(["D", "B"]));
    services.insert("S5".to_string(), string_vec(["A", "Z"]));

    let work = Work(Search::new("A".to_string(), "B".to_string()));

    let mut task_manager = TaskManager::new(services);
    task_manager.run(work);
}
```
Read more…
In: Rusting | Tags: Concurrency & Rust | 1 Comment

27 AUG 2014

Which Types to Type

I've mentioned before that I'm writing some Rust code, specifically an RPN calculator as a simple exercise. I'm going to dump the code here so we can discuss one aspect of it, but do remember that I'm very new to Rust and this code could surely be better:

use std::fmt;
use std::os;

struct Stack {
    numbers: Vec<f64>
}
impl Stack {
    fn new() -> Stack {
        Stack{numbers: vec![]}
    }

    fn is_empty(&self) -> bool {
        self.numbers.is_empty()
    }

    fn push(&mut self, number: f64) {
        self.numbers.push(number);
    }

    fn result(&self) -> f64 {
        *self.numbers.last().expect("Stack empty.")
    }

    fn add(&mut self)      { self._do_binary_operation(|l, r| l + r); }
    fn subtract(&mut self) { self._do_binary_operation(|l, r| l - r); }
    fn multiply(&mut self) { self._do_binary_operation(|l, r| l * r); }
    fn divide(&mut self)   { self._do_binary_operation(|l, r| l / r); }

    fn _do_binary_operation(&mut self, operation: |f64, f64| -> f64) {
        let r = self.numbers.pop().expect("Stack underflow.");
        let l = self.numbers.pop().expect("Stack underflow.");
        self.numbers.push(operation(l, r));
    }
}
impl fmt::Show for Stack {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let mut s = String::new();
        let mut i = self.numbers.len();
        for number in self.numbers.iter() {
            i -= 1;
            s = s.add(&format!("{}: {}\n", i, number));
        }
        s.pop_char();
        write!(f, "{}", s)
    }
}

struct Tokenizer {
    tokens: Vec<String>,
    i:      uint
}
impl Tokenizer {
    fn new(expression: &str) -> Tokenizer {
        Tokenizer{
          tokens: expression.split(|c: char| c.is_whitespace())
                            .map(|s| s.to_string())
                            .collect(),
          i:      0
        }
    }

    fn has_next_token(&self) -> bool {
        self.i < self.tokens.len()
    }

    fn next_token(&mut self) -> &str {
        if !self.has_next_token() { fail!("Tokens exhausted.") }

        let token = self.tokens[self.i].as_slice();
        self.i   += 1;
        token
    }
}

struct RPNCalculator {
    stack:  Stack,
    tokens: Tokenizer
}
impl RPNCalculator {
    fn new(stack: Stack, tokens: Tokenizer) -> RPNCalculator {
        RPNCalculator{stack: stack, tokens: tokens}
    }

    fn calculate(&mut self) -> f64 {
        while self.tokens.has_next_token() {
            let token = self.tokens.next_token();
            if !self.stack.is_empty() {
                println!("{}", self.stack);
            }
            println!("T: {}\n", token);
            match token {
                "+" => { self.stack.add(); }
                "-" => { self.stack.subtract(); }
                "*" => { self.stack.multiply(); }
                "/" => { self.stack.divide(); }
                n   => { self.stack.push(from_str(n).expect("Not a number.")); }
            }
        }
        if !self.stack.is_empty() {
            println!("{}\n", self.stack);
        }
        self.stack.result()
    }
}

fn main() {
    let     expression = os::args();
    let     stack      = Stack::new();
    let     tokenizer  = Tokenizer::new(expression[1].as_slice());
    let mut calculator = RPNCalculator::new(stack, tokenizer);
    println!("{}", calculator.calculate());
}
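For contrast, here is the same RPN algorithm in Ruby, where none of these types have to be written down. It is a hypothetical sketch (`OPERATIONS` and `rpn` are illustrative names) that mirrors the Rust version's behavior: tokens are split on whitespace, the right operand is popped before the left, and an underflowing or empty stack is an error. It leaves out the step-by-step stack printing.

```ruby
OPERATIONS = {
  "+" => ->(l, r) { l + r },
  "-" => ->(l, r) { l - r },
  "*" => ->(l, r) { l * r },
  "/" => ->(l, r) { l / r }
}

def rpn(expression)
  stack = [ ]
  expression.split.each do |token|
    if (operation = OPERATIONS[token])
      right = stack.pop or fail "Stack underflow."  # right operand comes off first
      left  = stack.pop or fail "Stack underflow."
      stack.push(operation.call(left, right))
    else
      stack.push(Float(token))  # raises ArgumentError for anything that isn't a number
    end
  end
  stack.last or fail "Stack empty."
end
```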

In: Rusting | Tags: Rust & Style | 0 Comments

22 AUG 2014

Sleepy Programs

When we think of real multiprocessing, our thoughts probably drift more towards languages like Erlang, Go, Clojure, or Rust. Such languages really focus on getting separate "processes" to communicate via messages. This makes it a lot easier to know when one process is waiting on another, because calls to receive messages typically block until one is available.

But what about Ruby? Can we do intelligent process coordination in Ruby?

Yes, we can. The tools for it are more awkward though. It's easy to run into tricky edge cases and hard to code your way out of them correctly.

Let's play with an example to see how good we can make things. Here's what we will do:

1. We will start one parent process that will fork() a single child process
2. The child will push three messages onto a RabbitMQ queue and exit()
3. The parent will listen for three messages to arrive, then exit()

Here's a somewhat sloppy first attempt at solving this:

#!/usr/bin/env ruby

require "benchmark"

require "bunny"

QUEUE_NAME = "example"
MESSAGES   = %w[first second third]

def send_messages(*messages)
  connection = Bunny.new.tap(&:start)
  exchange   = connection.create_channel.default_exchange

  messages.each do |message|
    exchange.publish(message, routing_key: QUEUE_NAME)
  end

  connection.close
end

def listen_for_messages(received_messages)
  connection = Bunny.new.tap(&:start)
  queue      = connection.create_channel.queue(QUEUE_NAME, auto_delete: true)

  queue.subscribe do |delivery_info, metadata, payload|
    received_messages << payload
  end

  time_it("Received #{MESSAGES.size} messages") do
    yield
  end

  connection.close
end

def time_it(name)
  elapsed = Benchmark.realtime do
    yield
  end
  puts "%s: %.2fs" % [name, elapsed]
end

def wait_for_messages(received_messages)
  until received_messages == MESSAGES
    sleep 0.1  # don't peg the CPU while we wait
  end
end

def send_and_receive
  pid = fork do
    sleep 3  # make sure we're receiving before they are sent
    send_messages(*MESSAGES)
  end
  Process.detach(pid)

  received_messages = [ ]
  listen_for_messages(received_messages) do
    wait_for_messages(received_messages)
  end
end

send_and_receive
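The `sleep` calls above exist only because the receiver polls. As a sketch of the blocking-receive style described at the start of this post, here the parent and a forked child coordinate through `IO.pipe` instead of RabbitMQ. That swap is an assumption made so the example runs without a broker. `gets` blocks until the child has written a line, so there is no polling and no need to give the receiver a head start. The method name is hypothetical, and `fork` is unavailable on Windows.

```ruby
def send_and_receive_over_pipe(messages)
  reader, writer = IO.pipe

  pid = fork do
    reader.close  # the child only writes
    messages.each { |message| writer.puts(message) }
    writer.close
    exit!(0)      # skip at_exit hooks inherited from the parent
  end

  writer.close    # the parent only reads
  # gets blocks until a full line arrives, so no sleep is needed
  received = messages.map { reader.gets.chomp }
  reader.close
  Process.wait(pid)  # reap the child rather than detaching it
  received
end
```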

In: Rubies in the Rough | Tags: Concurrency & Performance | 8 Comments

21 AUG 2014

Guard Clauses, Rust Style

When I'm programming in Ruby, I will often use guard clauses to prevent undesirable scenarios. For example, let's say I'm building a simple Stack:

class Stack
  def initialize
    @numbers = [ ]
  end

  def push(number)
    @numbers.push(number)
  end

  def peek
    fail "Stack underflow" if @numbers.empty?

    @numbers.last
  end
end

stack = Stack.new

ARGV.each do |number|
  stack.push(number.to_f)
end

p stack.peek

If I only want to work with numbers everywhere, I add a line like the call to fail() above. This prevents a nil from being returned from peek(), ruining my expectation that I will have numbers everywhere.

When I first started playing with Rust, I wanted to write code the same way:

use std::os;

struct Stack {
    numbers: Vec<f64>
}
impl Stack {
    fn new() -> Stack {
        Stack{numbers: vec![]}
    }

    fn push(&mut self, number: f64) {
        self.numbers.push(number);
    }

    fn peek(&self) -> f64 {
        if self.numbers.is_empty() { fail!("Stack underflow"); }

        self.numbers.last()
    }
}

fn main() {
    let mut stack = Stack::new();

    for number in os::args().tail().iter() {
        stack.push(from_str(number.as_slice()).expect("Not a number"));
    }

    println!("{}", stack.peek());
}

In: Rusting | Tags: Error Handling, Patterns & Rust | 2 Comments

21 AUG 2014

Asking Better Questions

I've been playing with some Rust lately and learning a bunch of new concepts.

As part of my experimentation, I built an RPN calculator in the language, just as an exercise that would require a moderate amount of code to be worked out. Honestly, I don't fully understand everything that I had to do to get this code working yet. Maybe 20% of it came from me following instructions in what seem to be very helpful compiler errors.

I wanted to start attacking these concepts that I didn't understand to increase my knowledge. Of course, I was impatient. I had some working code and I knew what I was missing, so I jumped into IRC, pointed at the code, and asked some not-at-all complete questions. All I got was crickets.

This isn't a failing of the Rust community. It's a lesson I have to relearn every now and then. You have to take the time to present a question good enough that answering it is easy and worthwhile. It's hard work to get your head around 100 lines of code and, even if you do, you'll still be missing plenty of context if the question isn't well formed. Given that, most people just ignore the question. That's probably for the better too, because any answers provided likely would have missed the points I really needed help with.
Read more…

In: Rusting | Tags: Community & Rust | 0 Comments
20 JUL 2014

Dave's No Tests Challenge

I've mentioned before my difficulties in the 2014 IPSC. But taking one beating is no reason not to try again. The first loss just showed me that the contest still had more to teach me.

A buddy of mine has spent some time with the crossword problem and told me that he enjoyed it. I didn't try this problem during the actual event, but I was a little familiar with it from my friend's description.

To add to the fun, I decided this would be a great excuse to take up the recent challenge Dave Thomas gave to the Ruby Rogues: "Stop writing tests."

Step 1: Feedback Loops

Without tests to guide me, I really want to see what's going on. One of the biggest advantages of tests, in my opinion, is the feedback loop they provide. So I set out to provide my own feedback.

Since the problem at hand involves filling in a crossword board, the easiest feedback loop I could think of was to see the board as it fills in. The final board is also the required output. Therefore, I decided a good first step would just be to read the board into some data structure and write it back out. Once I had that, I could insert code between those steps to fill it in. And constantly seeing the board evolve would let me eyeball things for obvious mistakes.
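That read-it-then-write-it-back step fits in a few lines. The board format here is an assumption (one row per line, `#` for blocked squares, `.` for open ones), not the contest's actual input format, and `Board` is a hypothetical name.

```ruby
# Hypothetical board format: one row per line, "#" blocked, "." open.
Board = Struct.new(:rows) do
  def self.parse(text)
    new(text.lines.map { |line| line.chomp.chars })
  end

  def [](row, column)
    rows[row][column]
  end

  def []=(row, column, letter)
    rows[row][column] = letter
  end

  def to_s
    rows.map(&:join).join("\n")
  end
end

board = Board.parse("#..\n...\n..#")
puts board          # round trip: prints the input unchanged
board[0, 1] = "C"   # solving code would go between parsing and printing
puts board
```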
Read more…

In: Rubies in the Rough | Tags: Databases, Process & Test-Driven Development | 0 Comments
10 JUL 2014
One Programmer's Library

I have always enjoyed the posts where people list out all of the books they think are important, given some subject. They vary wildly. For example, some are very to-the-point while others are rich in detail (and Danielle makes one of those each year).

I began to wonder what my list would look like. Below I've tried to make those decisions. I was surprised by just how hard it is to restrict myself to the bare essentials. A lot of things that I read influence me in some way.

The Classics

I'll start with the super obvious titles you've likely heard mentioned before.
- Refactoring: Improving the Design of Existing Code is probably the classically great programming text that had the biggest effect on me. This book teaches you how to turn the code you have into the code you want. It doesn't get much more essential than that.
- Smalltalk Best Practice Patterns is the book I learned a new language just to read. It's worth that. This is The Field Manual of Object-Oriented Tactics and it helps you know what to do line-by-line.
- Patterns of Enterprise Application Architecture is the book you should read so you can see how much great knowledge we've had about building programs for over a decade. Odds are that this book can teach you multiple strategies for problems you face regularly.
- The Pragmatic Programmer: From Journeyman to Master is one of those rare books that can make you a better person, in addition to a better programmer. Advice like "Fix Broken Windows" and "Make Stone Soup" has universal scope. This is a must-read.
- Programming Pearls (2nd Edition) is a book about algorithms and, eventually, all programmers need to learn some algorithms. The upside of this title is that it's fun from page one. That helps to keep you interested in what can be a dry topic.
- Growing Object-Oriented Software, Guided by Tests is almost surely the single biggest influence on how I do Test-Driven Development. There are other schools of thought, but pick one and dig deep enough into TDD that you're confident about when and how to use it. This is a tool everyone needs in their toolbox.
Read more…
In: Book Reviews | Tags: Process | 1 Comment