Gray Soft

The programming blog of James Edward Gray II (JEG2).
  • 19

    SEP
    2014

    "You can't parse [X]HTML with regex."

    The only explanation I'll give for the following code it to provide this link to my favorite Stack Overflow answer.

    #!/usr/bin/env ruby -w
    
    require "open-uri"
    
    URL    = "http://stackoverflow.com/questions/1732348/" +
             "regex-match-open-tags-except-xhtml-self-contained-tags"
    PARSER = %r{
      (?<doctype_declaration>
        <!DOCTYPE\b (?<doctype> [^>]* ) >
      ){0}
      (?<comment>
        <!-- .* -->
      ){0}
    
      (?<script_tag>
        < \s* (?<tag_name> script ) \s* (?<attributes> [^>]* > )
          (?<script> .*? )
        < \s* / \s* script \s* >
      ){0}
      (?<self_closed_tag>
        < \s* (?<tag_name> \w+ ) \s* (?<attributes> [^>]* / \s* > )
      ){0}
      (?<unclosed_tag>
        < \s*
        (?<tag_name> link | meta | br | input | hr | img ) \b
        \s*
        (?<attributes> [^>]* > )
      ){0}
      (?<open_tag>
        < \s* (?<tag_name> \w+ ) \s* (?<attributes> [^>]* > )
      ){0}
      (?<close_tag>
        < \s* / \s* (?<tag_name> \w+ ) \s* >
      ){0}
    
      (?<attribute>
        (?<attribute_name> [-\w]+ )
        (?: \s* = \s* (?<attribute_value> "[^"]*" | '[^']*' | [^>\s]+ ) )? \s*
      ){0}
      (?<attribute_list>
        \g<attribute>
        (?= [^>]* > \z )  # attributes keep a trailing > to disambiguate from text
      ){0}
    
      (?<text>
        (?! [^<]* /?\s*> \z )  # a guard to prevent this from parsing attributes
        [^<]+
      ){0}
    
      \G
      (?:
        \g<doctype_declaration>
        |
        \g<comment>
        |
        \g<script_tag>
        |
        \g<self_closed_tag>
        |
        \g<unclosed_tag>
        |
        \g<open_tag>
        |
        \g<attribute_list>
        |
        \g<close_tag>
        |
        \g<text>
      )
      \s*
    }mix
    
    def parse(html)
      stack = [{attributes: [ ], contents: [ ], name: :root}]
      loop do
        html.sub!(PARSER, "") or break
        if $~[:doctype_declaration]
          add_to_tree(stack.last, "DOCTYPE", $~[:doctype].strip)
        elsif $~[:script_tag]
          add_to_stack(stack, $~[:tag_name], $~[:attributes], $~[:script])
        elsif $~[:self_closed_tag] || $~[:unclosed_tag] || $~[:open_tag]
          add_to_stack(stack, $~[:tag_name], $~[:attributes], "", $~[:open_tag])
        elsif $~[:close_tag]
          stack.pop
        elsif $~[:text]
          stack.last[:contents] << $~[:text]
        end
      end
      stack.pop
    end
    
    def add_to_tree(branch, name, value)
      if branch.include?(name)
        branch[name]  = [branch[name]] unless branch[name].is_a?(Array)
        branch[name] << value
      else
        branch[name] = value
      end
    end
    
    def add_to_stack(stack, tag_name, attributes_html, contents, open = false)
      tag = { attributes: parse_attributes(attributes_html),
              contents:   [contents].reject(&:empty?),
              name:       tag_name }
      add_to_tree(stack.last, tag_name, tag)
      stack.last[:contents] << tag
      stack                 << tag if open
    end
    
    def parse_attributes(attributes_html)
      attributes = { }
      loop do
        attributes_html.sub!(PARSER, "") or break
        add_to_tree(
          attributes,
          $~[:attribute_name],
          ($~[:attribute_value] || $~[:attribute_name]).sub(/\A(["'])(.*)\1\z/, '\2')
        )
      end
      attributes
    end
    
    def convert_to_bbcode(node)
      if node.is_a?(Hash)
        name = node[:name].sub(/\Astrike\z/, "s")
        "[#{name}]#{node[:contents].map { |c| send(__method__, c) }.join}[/#{name}]"
      else
        node
      end
    end
    
    html = open(URL, &:read).strip
    ast  = parse(html)
    puts ast["html"]["body"]["div"]
      .find { |div| div[:attributes]["class"] == "container"      }["div"]
      .find { |div| div[:attributes]["id"]    == "content"        }["div"]["div"]
      .find { |div| div[:attributes]["id"]    == "mainbar"        }["div"]
      .find { |div| div[:attributes]["id"]    == "answers"        }["div"]
      .find { |div| div[:attributes]["id"]    == "answer-1732454" }["table"]["tr"]
      .first["td"]
      .find { |div| div[:attributes]["class"] == "answercell"     }["div"]["p"]
      .first[:contents]
      .map(&method(:convert_to_bbcode))  # to reach a wider audience
      .join
    

    Read more…

  • 11

    SEP
    2014

    Experimenting With Ownership

    Let's use a trivial exercise to see what we can learn about ownership, moving, borrowing, and more in Rust. Here's the idea:

    1. We'll allocate a list of numbers
    2. We'll add one to each number in the list
    3. We'll print the resulting list of numbers

    This is a simple process requiring only a few lines of code:

    fn main() {
        let mut numbers = vec![1u, 2, 3];
        for n in numbers.mut_iter() {
            *n += 1;
        }
        println!("{}", numbers);
    }
    

    The output is hopefully what we all expect to see:

    $ ./one_function 
    [2, 3, 4]
    

    In this code there is just one variable: numbers. That variable owns a list of numbers on the heap and it's scope is limited to the main() function, which is just a way to say that the data exists for the length of that function call. Since all three steps happen in that one function call, ownership doesn't really affect us here.

    To better examine what ownership really means, let's add one small twist to our exercise:

    • The increment of each number in the list must happen in a separate function

    Read more…

  • 6

    SEP
    2014

    Taking Rust to Task

    Now that I've reached the point where I can get some Rust code running without asking questions in IRC every five minutes, I really wanted to play with some tasks. Tasks are the way Rust handles multiprocessing code. Under the hood they can map one-to-one with operating system threads or you can use a many-to-one mapping that I'm not ready to go into yet.

    Probably one of the most exciting aspect of tasks in Rust, in my opinion, is that unsafe use of shared memory is rejected outright as a compile error. That lead me to want to figure out how you communicate correctly. (Spoiler: the same was you do in Ruby: just pass messages.)

    Ready to dive in, I grossly simplified a recent challenge from work and coded it up in Rust. You can get the idea with a glance at main():

    use std::collections::HashMap;
    
    // ...
    
    fn string_vec(strs: &[&'static str]) -> Vec<String> {
        let mut v = Vec::new();
        for s in strs.iter() {
            v.push(s.to_string());
        }
        v
    }
    
    fn main() {
        let mut services = HashMap::new();
        services.insert("S1".to_string(), string_vec(["A", "B"]));
        services.insert("S2".to_string(), string_vec(["A", "C"]));
        services.insert("S3".to_string(), string_vec(["C", "D", "E", "F"]));
        services.insert("S4".to_string(), string_vec(["D", "B"]));
        services.insert("S5".to_string(), string_vec(["A", "Z"]));
    
        let work = Work(Search::new("A".to_string(), "B".to_string()));
    
        let mut task_manager = TaskManager::new(services);
        task_manager.run(work);
    }
    

    Read more…

    In: Rusting | Tags: Concurrency & Rust | 1 Comment
  • 27

    AUG
    2014

    Which Types to Type

    I've mentioned before that I'm writing some Rust code, specifically an RPN calculator as a simple exercise. I'm going to dump the code here so we can discuss one aspect of it, but do remember that I'm very new to Rust and this code could surely be better:

    use std::fmt;
    use std::os;
    
    struct Stack {
        numbers: Vec<f64>
    }
    impl Stack {
        fn new() -> Stack {
            Stack{numbers: vec![]}
        }
    
        fn is_empty(&self) -> bool {
            self.numbers.is_empty()
        }
    
        fn push(&mut self, number: f64) {
            self.numbers.push(number);
        }
    
        fn result(&self) -> f64 {
            *self.numbers.last().expect("Stack empty.")
        }
    
        fn add(&mut self)      { self._do_binary_operation(|l, r| l + r); }
        fn subtract(&mut self) { self._do_binary_operation(|l, r| l - r); }
        fn multiply(&mut self) { self._do_binary_operation(|l, r| l * r); }
        fn divide(&mut self)   { self._do_binary_operation(|l, r| l / r); }
    
        fn _do_binary_operation(&mut self, operation: |f64, f64| -> f64) {
            let r = self.numbers.pop().expect("Stack underflow.");
            let l = self.numbers.pop().expect("Stack underflow.");
            self.numbers.push(operation(l, r));
        }
    }
    impl fmt::Show for Stack {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            let mut s = String::new();
            let mut i = self.numbers.len();
            for number in self.numbers.iter() {
                i -= 1;
                s = s.add(&format!("{}: {}\n", i, number));
            }
            s.pop_char();
            write!(f, "{}", s)
        }
    }
    
    struct Tokenizer {
        tokens: Vec<String>,
        i:      uint
    }
    impl Tokenizer {
        fn new(expression: &str) -> Tokenizer {
            Tokenizer{
              tokens: expression.split(|c: char| c.is_whitespace())
                                .map(|s| s.to_string())
                                .collect(),
              i:      0
            }
        }
    
        fn has_next_token(&self) -> bool {
            self.i < self.tokens.len()
        }
    
        fn next_token(&mut self) -> &str {
            if !self.has_next_token() { fail!("Tokens exhausted.") }
    
            let token = self.tokens[self.i].as_slice();
            self.i   += 1;
            token
        }
    }
    
    struct RPNCalculator {
        stack:  Stack,
        tokens: Tokenizer
    }
    impl RPNCalculator {
        fn new(stack: Stack, tokens: Tokenizer) -> RPNCalculator {
            RPNCalculator{stack: stack, tokens: tokens}
        }
    
        fn calculate(&mut self) -> f64 {
            while self.tokens.has_next_token() {
                let token = self.tokens.next_token();
                if !self.stack.is_empty() {
                    println!("{}", self.stack);
                }
                println!("T: {}\n", token);
                match token {
                    "+" => { self.stack.add(); }
                    "-" => { self.stack.subtract(); }
                    "*" => { self.stack.multiply(); }
                    "/" => { self.stack.divide(); }
                    n   => { self.stack.push(from_str(n).expect("Not a number.")); }
                }
            }
            if !self.stack.is_empty() {
                println!("{}\n", self.stack);
            }
            self.stack.result()
        }
    }
    
    fn main() {
        let     expression = os::args();
        let     stack      = Stack::new();
        let     tokenizer  = Tokenizer::new(expression[1].as_slice());
        let mut calculator = RPNCalculator::new(stack, tokenizer);
        println!("{}", calculator.calculate());
    }
    

    Read more…

    In: Rusting | Tags: Rust & Style | 0 Comments
  • 22

    AUG
    2014

    Sleepy Programs

    When we think of real multiprocessing, our thoughts probably drift more towards languages like Erlang, Go, Clojure, or Rust. Such languages really focus on getting separate "processes" to communicate via messages. This makes it a lot easier to know when one process is waiting on another, because calls to receive messages typically block until one is available.

    But what about Ruby? Can we do intelligent process coordination in Ruby?

    Yes, we can. The tools for it are more awkward though. It's easy to run into tricky edge cases and hard to code your way out of them correctly.

    Let's play with an example to see how good we can make things. Here's what we will do:

    1. We will start one parent process that will fork() a single child process
    2. The child will push three messages onto a RabbitMQ queue and exit()
    3. The parent will listen for three messages to arrive, then exit()

    Here's a somewhat sloppy first attempt at solving this:

    #!/usr/bin/env ruby
    
    require "benchmark"
    
    require "bunny"
    
    QUEUE_NAME = "example"
    MESSAGES   = %w[first second third]
    
    def send_messages(*messages)
      connection = Bunny.new.tap(&:start)
      exchange   = connection.create_channel.default_exchange
    
      messages.each do |message|
        exchange.publish(message, routing_key: QUEUE_NAME)
      end
    
      connection.close
    end
    
    def listen_for_messages(received_messages)
      connection = Bunny.new.tap(&:start)
      queue      = connection.create_channel.queue(QUEUE_NAME, auto_delete: true)
    
      queue.subscribe do |delivery_info, metadata, payload|
        received_messages << payload
      end
    
      time_it("Received #{MESSAGES.size} messages") do
        yield
      end
    
      connection.close
    end
    
    def time_it(name)
      elapsed = Benchmark.realtime do
        yield
      end
      puts "%s: %.2fs" % [name, elapsed]
    end
    
    def wait_for_messages(received_messages)
      until received_messages == MESSAGES
        sleep 0.1  # don't peg the CPU while we wait
      end
    end
    
    def send_and_receive
      pid = fork do
        sleep 3  # make sure we're receiving before they are sent
        send_messages(*MESSAGES)
      end
      Process.detach(pid)
    
      received_messages = [ ]
      listen_for_messages(received_messages) do
        wait_for_messages(received_messages)
      end
    end
    
    send_and_receive
    

    Read more…

  • 21

    AUG
    2014

    Guard Clauses, Rust Style

    When I'm programming in Ruby, I will often use guard clauses to prevent undesirable scenarios. For example, let's say I'm building a simple Stack:

    class Stack
      def initialize
        @numbers = [ ]
      end
    
      def push(number)
        @numbers.push(number)
      end
    
      def peek
        fail "Stack underflow" if @numbers.empty?
    
        @numbers.last
      end
    end
    
    stack = Stack.new
    
    ARGV.each do |number|
      stack.push(number.to_f)
    end
    
    p stack.peek
    

    If I only want to work with numbers everywhere, I add a line like the call to fail() above. This prevents a nil from being returned from peek(), ruining my expectation that I will have numbers everywhere.

    When I first started playing with Rust, I wanted to write code the same way:

    use std::os;
    
    struct Stack {
        numbers: Vec<f64>
    }
    impl Stack {
        fn new() -> Stack {
            Stack{numbers: vec![]}
        }
    
        fn push(&mut self, number: f64) {
            self.numbers.push(number);
        }
    
        fn peek(&self) -> f64 {
            if self.numbers.is_empty() { fail!("Stack underflow"); }
    
            self.numbers.last()
        }
    }
    
    fn main() {
        let mut stack = Stack::new();
    
        for number in os::args().tail().iter() {
            stack.push(from_str(number.as_slice()).expect("Not a number"));
        }
    
        println!("{}", stack.peek());
    }
    

    Read more…

  • 21

    AUG
    2014

    Asking Better Questions

    I've been playing with some Rust lately and learning a bunch of new concepts.

    As part of my experimentation, I built an RPN calculator in the language, just as an exercise that would require a moderate amount of code to be worked out. Honestly, I don't fully understand everything that I had to do to get this code working yet. Maybe 20% of it came about as me following instructions from what seem to be very helpful compiler errors.

    I wanted to start attacking these concepts that I didn't understand to increase my knowledge. Of course, I was impatient. I had some working code and I knew what I was missing, so I jumped into IRC, pointed at the code, and asked some not-at-all complete questions. All I got was crickets.

    This isn't a failing of the Rust community. It's a lesson I have to relearn every now and then. You have to take the time to present a good enough question that answering it is easy enough and worth it. It's hard work to get your head around 100 lines of code and, even if you do, you'll still be missing plenty of context if the question isn't really well formed. Given that, most people just ignore the question. That's probably for the better too, because any answers provided likely would have missed the points I really needed help with.

    Read more…

    In: Rusting | Tags: Community & Rust | 0 Comments
  • 20

    JUL
    2014

    Dave's No Tests Challenge

    I've mentioned before my difficulties in the 2014 IPSC. But taking one beating is no reason not to try again. The first loss just showed me that the contest still had more to teach me.

    A buddy of mine has spent some time with the crossword problem and told me that he enjoyed it. I didn't try this problem during the actual event, but I was a little familiar with it from my friend's description.

    To add to the fun, I decided this would be a great excuse to take up the recent challenge Dave Thomas gave to the Ruby Rogues: "Stop writing tests."

    Step 1: Feedback Loops

    Without tests to guide me, I really want to see what's going on. One of the biggest advantages of tests, in my opinion, is the feedback loop it provides. So I set out to provide my own feedback.

    Since the problem at hand involves filling in a crossword board, the easiest feedback loop I could think of was to see the board as it fills in. The final board is also the required output. Therefor, I decided a good first step would just be to read the board into some data structure and write it back out. Once I had that, I could insert code between those steps to fill it in. And constantly seeing the board evolve would let me eyeball things for obvious mistakes.

    Read more…

  • 10

    JUL
    2014

    One Programmer's Library

    I have always enjoyed the posts where people list out all of the books they think are important, given some subject. They vary wildly. For example, some are very to-the-point while others are rich in detail (and Danielle makes one of those each year).

    I began to wonder what my list would look like. Below I've tried to make those decisions. I was surprised by just how hard it is to restrict myself to the bare essentials. A lot of things that I read influence me in some way.

    The Classics

    I'll start with the super obvious titles you've likely heard mentioned before.

    • Refactoring: Improving the Design of Existing Code is probably the classically great programming text that had the biggest effect on me. This book teaches you how to turn the code you have into the code you want. It doesn't get much more essential than that.
    • Smalltalk Best Practice Patterns is the book I learned a new language just to read. It's worth that. This is The Field Manual of Object-Oriented Tactics and it helps you know what to do line-by-line.
    • Patterns of Enterprise Application Architecture is the book you should read so you can see how much great knowledge we've had about building programs for over a decade. Odds are that this book can teach you multiple strategies for problems you face regularly.
    • The Pragmatic Programmer: From Journeyman to Master is one of those rare books that can make you a better person, in addition to a better programmer. Advice like "Fix Broken Windows" and "Make Stone Soup" have universal scope. This is a must read.
    • Programming Pearls (2nd Edition) is a book about algorithms and, eventually, all programmers need to learn some algorithms. The upside of this title is that it's fun from page one. That helps to keep you interested in what can be a dry topic.
    • Growing Object-Oriented Software, Guided by Tests is almost surely the single biggest influence on how I do Test-Driven Development. There are other schools of thought but pick one and dig deep enough into TDD until you're confident about when and how to use it. This is a tool everyone needs in their toolbox.

    Read more…

  • 26

    JUN
    2014

    IPSC 2014 Postmortem

    I decided to give the Internet Problem Solving Contest (IPSC) a go this year, with a couple of friends. I've done it in the past and enjoyed it. I like how it only eats a few hours one day and I like how the variety in the problems they give you keeps things interesting.

    That said, my performance in the IPSC this year is probably best described as, "Three strikes and you're out!" I did terrible.

    I solved one very simple problem. I spent the rest of the contest chasing after a much harder challenge that I couldn't complete in the time allowed.

    The worst part is that I made some silly mistakes that I've learned to avoid in the past. As penance, I offer up this article, mainly as a reminder to myself, but hopefully also as a tool that could save others from some of my folly.

    Let's start with a simple mistake I made…

    Not All Problems are Programming Problems

    The IPSC does a great job each year of reminding us that some problems are trivial to solve without programming. It's a good thing they do too, because I seem to need a lot of reminding.

    Read more…