-
20 SEP 2014
Can You snake_case/CamelCase With One Regex?
In Rails, methods like underscore() and camelize() use several regexen to transform the String under the hood. Many people have asked if you can do it with a single regex though. These specs I borrowed from Rails seem to say yes:

#!/usr/bin/env ruby -w

class String
  def snake_case(acronyms = self.class.acronyms)
    gsub(
      %r{
        (?:
          (?<before>  \b | [A-Za-z\d] )
          (?<acronym> #{acronyms.regex} )
          (?<after>   \b | [^a-z] )
        )
        |
        (?: (?<before> [A-Z]+ ) (?<after> [A-Z][^A-Z] ) )
        |
        (?: (?<before> [^A-Z:] ) (?<after> [A-Z] ) )
        |
        (?<nesting> :: )
      }x
    ) { |m|
      if $~[:nesting]
        "/"
      else
        [$~[:before], $~[:acronym], $~[:after]]
          .compact
          .reject(&:empty?)
          .join("_")
      end
    }.downcase
  end

  def CamelCase(acronyms = self.class.acronyms)
    gsub(
      %r{
        (?:
          (?: \A | _ | (?<nesting> / ) )
          (?<acronym> #{acronyms.inverted_regex} )
          (?= \b | [A-Z_] )
        )
        |
        (?: (?: \A | _ ) (?<letter> . ) )
        |
        (?: (?<nesting> / ) (?<letter> . ) )
      }mx
    ) {
      nested      = $~[:nesting] && "::"
      capitalized = acronyms.capitalize($~[:acronym]) { $~[:letter].upcase }
      "#{nested}#{capitalized}"
    }
  end

  def camelCase
    self.CamelCase.sub(/\A[A-Z]/) { |first_char| first_char.downcase }
  end

  def self.acronyms
    @acronyms ||= AcronymManager.new
  end
end

class AcronymManager
  NEVER_MATCHES = /\zA/

  def initialize
    @acronyms = { }
    @inverted = { }
  end

  attr_reader :acronyms, :inverted
  private     :acronyms, :inverted

  def add(acronym)
    acronyms[acronym] = acronym.downcase
    @inverted         = acronyms.invert
  end

  def regex
    return NEVER_MATCHES if acronyms.empty?

    /(?:#{acronyms.keys.map(&Regexp.method(:escape)).join('|')})/
  end

  def inverted_regex
    return NEVER_MATCHES if acronyms.empty?

    /(?:#{inverted.keys.map(&Regexp.method(:escape)).join('|')})/
  end

  def capitalize(acronym, &default)
    inverted.fetch(acronym, &default)
  end
end

if $PROGRAM_NAME == __FILE__
  require "minitest/autorun"

  describe "Case changing" do
    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test_cases.rb#L118-L123
    let(:examples) {
      {
        "Product"               => "product",
        "SpecialGuest"          => "special_guest",
        "ApplicationController" => "application_controller",
        "Area51Controller"      => "area51_controller",
      }
    }

    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test_cases.rb#L139-L145
    let(:one_way_snake_examples) {
      {
        "HTMLTidy"           => "html_tidy",
        "HTMLTidyGenerator"  => "html_tidy_generator",
        "FreeBSD"            => "free_bsd",
        "HTML"               => "html",
        "ForceXMLController" => "force_xml_controller"
      }
    }

    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test.rb#L98
    let(:one_way_camel_examples) {
      {
        "CamelCase" => "Camel_Case"
      }
    }

    # added by James
    let(:path_examples) {
      {
        "SomeLib::WithClass" => "some_lib/with_class"
      }
    }

    # https://github.com/rails/rails/blob/
    # 620f4a4fc962c863b91a51876ffdf58f33bedb9c/activesupport/test/
    # inflector_test.rb#L101-L145
    let(:acronym_examples) {
      {
        "API"               => "api",
        "APIController"     => "api_controller",
        "Nokogiri::HTML"    => "nokogiri/html",
        "HTTPAPI"           => "http_api",
        "HTTP::Get"         => "http/get",
        "SSLError"          => "ssl_error",
        "RESTful"           => "restful",
        "RESTfulController" => "restful_controller",
        "Nested::RESTful"   => "nested/restful",
        "IHeartW3C"         => "i_heart_w3c",
        "PhDRequired"       => "phd_required",
        "IRoRU"             => "i_ror_u",
        "RESTfulHTTPAPI"    => "restful_http_api",

        # misdirection
        "Capistrano"        => "capistrano",
        "CapiController"    => "capi_controller",
        "HttpsApis"         => "https_apis",
        "Html5"             => "html5",
        "Restfully"         => "restfully",
        "RoRails"           => "ro_rails"
      }
    }

    it "can snake_case a String" do
      examples.each do |camel, snake|
        camel.snake_case.must_equal(snake)
      end
    end

    it "can handle some tricky one-way cases for snake_case" do
      one_way_snake_examples.each do |camel, snake|
        camel.snake_case.must_equal(snake)
      end
    end

    it "can CamelCase a String" do
      examples.each do |camel, snake|
        snake.CamelCase.must_equal(camel)
      end
    end

    it "can handle some tricky one-way cases for CamelCase" do
      one_way_camel_examples.each do |camel, snakey|
        snakey.CamelCase.must_equal(camel)
      end
    end

    it "can camelCase a String" do
      "camel_case".camelCase.must_equal("camelCase")
    end

    it "can convert nesting to paths and back" do
      path_examples.each do |camel, snake|
        camel.snake_case.must_equal(snake)
        snake.CamelCase.must_equal(camel)
      end
    end

    it "is aware of acronyms" do
      acronyms = AcronymManager.new
      acronyms.add("API")
      acronyms.add("HTML")
      acronyms.add("HTTP")
      acronyms.add("RESTful")
      acronyms.add("W3C")
      acronyms.add("PhD")
      acronyms.add("RoR")
      acronyms.add("SSL")

      acronym_examples.each do |camel, snake|
        camel.snake_case(acronyms).must_equal(snake)
        snake.CamelCase(acronyms).must_equal(camel)
      end
    end
  end
end
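For a feel of how much a single pattern can cover, here is a much smaller sketch than the code above. It ignores acronyms entirely and swaps the named captures for zero-width lookarounds, so it is not the approach the specs exercise. The name simple_snake_case is made up for illustration.

```ruby
# A simplified sketch: one regex, no acronym support.
# simple_snake_case is a hypothetical helper, not part of the code above.
def simple_snake_case(string)
  string.gsub(
    %r{
      (?<=[a-z\d]) (?=[A-Z])        # boundary inside SpecialGuest or Area51Controller
      |
      (?<=[A-Z])   (?=[A-Z][a-z])   # boundary between HTML and Tidy
      |
      ::                            # namespace separator
    }x
  ) { |match| match == "::" ? "/" : "_" }.downcase
end

puts simple_snake_case("SomeLib::WithClass")
```

The first two alternatives match empty strings at word boundaries, so the block only has to decide between inserting an underscore and turning a namespace separator into a slash.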
-
19 SEP 2014
"You can't parse [X]HTML with regex."
The only explanation I'll give for the following code is to provide this link to my favorite Stack Overflow answer.
#!/usr/bin/env ruby -w

require "open-uri"

URL = "http://stackoverflow.com/questions/1732348/" +
      "regex-match-open-tags-except-xhtml-self-contained-tags"

PARSER = %r{
  (?<doctype_declaration> <!DOCTYPE\b (?<doctype> [^>]* ) > ){0}
  (?<comment> <!-- .* --> ){0}

  (?<script_tag>
    < \s* (?<tag_name> script ) \s* (?<attributes> [^>]* > )
    (?<script> .*? )
    < \s* / \s* script \s* >
  ){0}
  (?<self_closed_tag>
    < \s* (?<tag_name> \w+ ) \s* (?<attributes> [^>]* / \s* > )
  ){0}
  (?<unclosed_tag>
    < \s*
    (?<tag_name> link | meta | br | input | hr | img ) \b
    \s*
    (?<attributes> [^>]* > )
  ){0}
  (?<open_tag>
    < \s* (?<tag_name> \w+ ) \s* (?<attributes> [^>]* > )
  ){0}
  (?<close_tag> < \s* / \s* (?<tag_name> \w+ ) \s* > ){0}

  (?<attribute>
    (?<attribute_name> [-\w]+ )
    (?: \s* = \s* (?<attribute_value> "[^"]*" | '[^']*' | [^>\s]+ ) )? \s*
  ){0}
  (?<attribute_list>
    \g<attribute>
    (?= [^>]* > \z )  # attributes keep a trailing > to disambiguate from text
  ){0}

  (?<text>
    (?! [^<]* /?\s*> \z )  # a guard to prevent this from parsing attributes
    [^<]+
  ){0}

  \G
  (?:
    \g<doctype_declaration>
    |
    \g<comment>
    |
    \g<script_tag>
    |
    \g<self_closed_tag>
    |
    \g<unclosed_tag>
    |
    \g<open_tag>
    |
    \g<attribute_list>
    |
    \g<close_tag>
    |
    \g<text>
  )
  \s*
}mix

def parse(html)
  stack = [{attributes: [ ], contents: [ ], name: :root}]
  loop do
    html.sub!(PARSER, "") or break
    if $~[:doctype_declaration]
      add_to_tree(stack.last, "DOCTYPE", $~[:doctype].strip)
    elsif $~[:script_tag]
      add_to_stack(stack, $~[:tag_name], $~[:attributes], $~[:script])
    elsif $~[:self_closed_tag] || $~[:unclosed_tag] || $~[:open_tag]
      add_to_stack(stack, $~[:tag_name], $~[:attributes], "", $~[:open_tag])
    elsif $~[:close_tag]
      stack.pop
    elsif $~[:text]
      stack.last[:contents] << $~[:text]
    end
  end
  stack.pop
end

def add_to_tree(branch, name, value)
  if branch.include?(name)
    branch[name] = [branch[name]] unless branch[name].is_a?(Array)
    branch[name] << value
  else
    branch[name] = value
  end
end

def add_to_stack(stack, tag_name, attributes_html, contents, open = false)
  tag = {
    attributes: parse_attributes(attributes_html),
    contents:   [contents].reject(&:empty?),
    name:       tag_name
  }
  add_to_tree(stack.last, tag_name, tag)
  stack.last[:contents] << tag
  stack << tag if open
end

def parse_attributes(attributes_html)
  attributes = { }
  loop do
    attributes_html.sub!(PARSER, "") or break
    add_to_tree(
      attributes,
      $~[:attribute_name],
      ($~[:attribute_value] || $~[:attribute_name]).sub(/\A(["'])(.*)\1\z/, '\2')
    )
  end
  attributes
end

def convert_to_bbcode(node)
  if node.is_a?(Hash)
    name = node[:name].sub(/\Astrike\z/, "s")
    "[#{name}]#{node[:contents].map { |c| send(__method__, c) }.join}[/#{name}]"
  else
    node
  end
end

html = open(URL, &:read).strip
ast  = parse(html)
puts ast["html"]["body"]["div"]
  .find { |div| div[:attributes]["class"] == "container" }["div"]
  .find { |div| div[:attributes]["id"] == "content" }["div"]["div"]
  .find { |div| div[:attributes]["id"] == "mainbar" }["div"]
  .find { |div| div[:attributes]["id"] == "answers" }["div"]
  .find { |div| div[:attributes]["id"] == "answer-1732454" }["table"]["tr"]
  .first["td"]
  .find { |div| div[:attributes]["class"] == "answercell" }["div"]["p"]
  .first[:contents]
  .map(&method(:convert_to_bbcode))  # to reach a wider audience
  .join
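The trick that keeps PARSER readable is defining each named group with {0}, so it matches nothing where it is written, and then calling it later with \g<name>. The following is a tiny sketch of the same idiom on a made-up three-rule grammar. TOKEN and tokenize are illustrative names, and the grammar is far too small for real HTML.

```ruby
# Define-then-call: every (?<name> ... ){0} is a definition only,
# and \g<name> invokes it. TOKEN and tokenize are hypothetical names.
TOKEN = %r{
  (?<tag_name>  [a-z]+ ){0}
  (?<open_tag>  < \g<tag_name> > ){0}
  (?<close_tag> < / \g<tag_name> > ){0}
  (?<text>      [^<]+ ){0}

  \G (?: \g<open_tag> | \g<close_tag> | \g<text> )
}x

def tokenize(html)
  tokens   = [ ]
  position = 0
  # \G anchors each attempt at the position where the last match ended
  while (match = TOKEN.match(html, position))
    kind = %i[open_tag close_tag text].find { |name| match[name] }
    tokens << [kind, match[0]]
    position = match.end(0)
  end
  tokens
end

p tokenize("<b>hi</b>")
```

Checking which named group is set after each match is how this sketch tells the rules apart, which mirrors the if/elsif chain in parse() above.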
-
11 SEP 2014
Experimenting With Ownership
Let's use a trivial exercise to see what we can learn about ownership, moving, borrowing, and more in Rust. Here's the idea:
- We'll allocate a list of numbers
- We'll add one to each number in the list
- We'll print the resulting list of numbers
This is a simple process requiring only a few lines of code:
fn main() {
    let mut numbers = vec![1u, 2, 3];
    for n in numbers.mut_iter() {
        *n += 1;
    }
    println!("{}", numbers);
}
The output is hopefully what we all expect to see:
$ ./one_function
[2, 3, 4]
In this code there is just one variable: numbers. That variable owns a list of numbers on the heap and its scope is limited to the main() function, which is just a way to say that the data exists for the length of that function call. Since all three steps happen in that one function call, ownership doesn't really affect us here.
To better examine what ownership really means, let's add one small twist to our exercise:
- The increment of each number in the list must happen in a separate function
-
6 SEP 2014
Taking Rust to Task
Now that I've reached the point where I can get some Rust code running without asking questions in IRC every five minutes, I really wanted to play with some tasks. Tasks are the way Rust handles multiprocessing code. Under the hood they can map one-to-one with operating system threads or you can use a many-to-one mapping that I'm not ready to go into yet.
Probably one of the most exciting aspects of tasks in Rust, in my opinion, is that unsafe use of shared memory is rejected outright as a compile error. That led me to want to figure out how you communicate correctly. (Spoiler: the same way you do in Ruby: just pass messages.)
Ready to dive in, I grossly simplified a recent challenge from work and coded it up in Rust. You can get the idea with a glance at main():

use std::collections::HashMap;

// ...

fn string_vec(strs: &[&'static str]) -> Vec<String> {
    let mut v = Vec::new();
    for s in strs.iter() {
        v.push(s.to_string());
    }
    v
}

fn main() {
    let mut services = HashMap::new();
    services.insert("S1".to_string(), string_vec(["A", "B"]));
    services.insert("S2".to_string(), string_vec(["A", "C"]));
    services.insert("S3".to_string(), string_vec(["C", "D", "E", "F"]));
    services.insert("S4".to_string(), string_vec(["D", "B"]));
    services.insert("S5".to_string(), string_vec(["A", "Z"]));

    let work = Work(Search::new("A".to_string(), "B".to_string()));

    let mut task_manager = TaskManager::new(services);
    task_manager.run(work);
}
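For comparison, the "just pass messages" style mentioned earlier can be sketched in Ruby with a thread and the built-in Queue. This is only an illustration of the idea, not a translation of the Rust program. The name sum_via_messages and the :done sentinel are assumptions made for the example.

```ruby
# A worker thread owns the running total; the caller only sends it messages.
# sum_via_messages and the :done sentinel are hypothetical choices.
def sum_via_messages(numbers)
  inbox  = Queue.new
  worker = Thread.new do
    total = 0
    # pop blocks until a message is available
    while (message = inbox.pop) != :done
      total += message
    end
    total
  end

  numbers.each { |number| inbox << number }
  inbox << :done

  worker.value  # waits for the thread and returns the block's result
end

puts sum_via_messages([1, 2, 3])
```

Because the total lives only inside the worker, there is no shared mutable state for the two threads to fight over.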
-
27 AUG 2014
Which Types to Type
I've mentioned before that I'm writing some Rust code, specifically an RPN calculator as a simple exercise. I'm going to dump the code here so we can discuss one aspect of it, but do remember that I'm very new to Rust and this code could surely be better:
use std::fmt;
use std::os;

struct Stack {
    numbers: Vec<f64>
}
impl Stack {
    fn new() -> Stack {
        Stack{numbers: vec![]}
    }

    fn is_empty(&self) -> bool {
        self.numbers.is_empty()
    }

    fn push(&mut self, number: f64) {
        self.numbers.push(number);
    }

    fn result(&self) -> f64 {
        *self.numbers.last().expect("Stack empty.")
    }

    fn add(&mut self)      { self._do_binary_operation(|l, r| l + r); }
    fn subtract(&mut self) { self._do_binary_operation(|l, r| l - r); }
    fn multiply(&mut self) { self._do_binary_operation(|l, r| l * r); }
    fn divide(&mut self)   { self._do_binary_operation(|l, r| l / r); }

    fn _do_binary_operation(&mut self, operation: |f64, f64| -> f64) {
        let r = self.numbers.pop().expect("Stack underflow.");
        let l = self.numbers.pop().expect("Stack underflow.");
        self.numbers.push(operation(l, r));
    }
}
impl fmt::Show for Stack {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let mut s = String::new();
        let mut i = self.numbers.len();
        for number in self.numbers.iter() {
            i -= 1;
            s = s.add(&format!("{}: {}\n", i, number));
        }
        s.pop_char();
        write!(f, "{}", s)
    }
}

struct Tokenizer {
    tokens: Vec<String>,
    i:      uint
}
impl Tokenizer {
    fn new(expression: &str) -> Tokenizer {
        Tokenizer{
            tokens: expression.split(|c: char| c.is_whitespace())
                              .map(|s| s.to_string())
                              .collect(),
            i:      0
        }
    }

    fn has_next_token(&self) -> bool {
        self.i < self.tokens.len()
    }

    fn next_token(&mut self) -> &str {
        if !self.has_next_token() { fail!("Tokens exhausted.") }

        let token = self.tokens[self.i].as_slice();
        self.i += 1;
        token
    }
}

struct RPNCalculator {
    stack:  Stack,
    tokens: Tokenizer
}
impl RPNCalculator {
    fn new(stack: Stack, tokens: Tokenizer) -> RPNCalculator {
        RPNCalculator{stack: stack, tokens: tokens}
    }

    fn calculate(&mut self) -> f64 {
        while self.tokens.has_next_token() {
            let token = self.tokens.next_token();
            if !self.stack.is_empty() {
                println!("{}", self.stack);
            }
            println!("T: {}\n", token);
            match token {
                "+" => { self.stack.add(); }
                "-" => { self.stack.subtract(); }
                "*" => { self.stack.multiply(); }
                "/" => { self.stack.divide(); }
                n   => { self.stack.push(from_str(n).expect("Not a number.")); }
            }
        }
        if !self.stack.is_empty() {
            println!("{}\n", self.stack);
        }
        self.stack.result()
    }
}

fn main() {
    let     expression = os::args();
    let     stack      = Stack::new();
    let     tokenizer  = Tokenizer::new(expression[1].as_slice());
    let mut calculator = RPNCalculator::new(stack, tokenizer);
    println!("{}", calculator.calculate());
}
-
22 AUG 2014
Sleepy Programs
When we think of real multiprocessing, our thoughts probably drift more towards languages like Erlang, Go, Clojure, or Rust. Such languages really focus on getting separate "processes" to communicate via messages. This makes it a lot easier to know when one process is waiting on another, because calls to receive messages typically block until one is available.
But what about Ruby? Can we do intelligent process coordination in Ruby?
Yes, we can. The tools for it are more awkward though. It's easy to run into tricky edge cases and hard to code your way out of them correctly.
Let's play with an example to see how good we can make things. Here's what we will do:
- We will start one parent process that will fork() a single child process
- The child will push three messages onto a RabbitMQ queue and exit()
- The parent will listen for three messages to arrive, then exit()
Here's a somewhat sloppy first attempt at solving this:
#!/usr/bin/env ruby

require "benchmark"

require "bunny"

QUEUE_NAME = "example"
MESSAGES   = %w[first second third]

def send_messages(*messages)
  connection = Bunny.new.tap(&:start)
  exchange   = connection.create_channel.default_exchange

  messages.each do |message|
    exchange.publish(message, routing_key: QUEUE_NAME)
  end

  connection.close
end

def listen_for_messages(received_messages)
  connection = Bunny.new.tap(&:start)
  queue      = connection.create_channel.queue(QUEUE_NAME, auto_delete: true)

  queue.subscribe do |delivery_info, metadata, payload|
    received_messages << payload
  end

  time_it("Received #{MESSAGES.size} messages") do
    yield
  end

  connection.close
end

def time_it(name)
  elapsed = Benchmark.realtime do
    yield
  end
  puts "%s: %.2fs" % [name, elapsed]
end

def wait_for_messages(received_messages)
  until received_messages == MESSAGES
    sleep 0.1  # don't peg the CPU while we wait
  end
end

def send_and_receive
  pid = fork do
    sleep 3  # make sure we're receiving before they are sent
    send_messages(*MESSAGES)
  end
  Process.detach(pid)

  received_messages = [ ]
  listen_for_messages(received_messages) do
    wait_for_messages(received_messages)
  end
end

send_and_receive
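The two sleep() calls are what make this attempt sloppy: one guesses at timing and the other polls. As a point of contrast, here is a minimal sketch of a parent that blocks on a read instead of sleeping. It swaps RabbitMQ for a plain IO.pipe, so it shows the blocking idea only and is not a fix for the Bunny code. The name receive_from_child is made up for the example, and fork() is assumed to be available (it is not on Windows).

```ruby
# The parent blocks on the pipe until the child has written and closed it.
# receive_from_child is a hypothetical helper; a pipe stands in for the queue.
def receive_from_child(messages)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    messages.each { |message| writer.puts(message) }
    writer.close
    exit!(0)  # leave without running the parent's at_exit handlers
  end

  writer.close  # the parent only reads
  received = reader.each_line.map(&:chomp)  # returns at EOF, no polling
  reader.close
  Process.wait(pid)
  received
end

p receive_from_child(%w[first second third])
```

Nothing here needs a timing guess, because each_line simply waits for data and stops when the child closes its end of the pipe.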
-
21 AUG 2014
Guard Clauses, Rust Style
When I'm programming in Ruby, I will often use guard clauses to prevent undesirable scenarios. For example, let's say I'm building a simple Stack:

class Stack
  def initialize
    @numbers = [ ]
  end

  def push(number)
    @numbers.push(number)
  end

  def peek
    fail "Stack underflow" if @numbers.empty?

    @numbers.last
  end
end

stack = Stack.new
ARGV.each do |number|
  stack.push(number.to_f)
end
p stack.peek
If I only want to work with numbers everywhere, I add a line like the call to fail() above. This prevents a nil from being returned from peek(), which would ruin my expectation that I will have numbers everywhere.
When I first started playing with Rust, I wanted to write code the same way:
use std::os;

struct Stack {
    numbers: Vec<f64>
}
impl Stack {
    fn new() -> Stack {
        Stack{numbers: vec![]}
    }

    fn push(&mut self, number: f64) {
        self.numbers.push(number);
    }

    fn peek(&self) -> f64 {
        if self.numbers.is_empty() { fail!("Stack underflow"); }

        self.numbers.last()
    }
}

fn main() {
    let mut stack = Stack::new();
    for number in os::args().tail().iter() {
        stack.push(from_str(number.as_slice()).expect("Not a number"));
    }
    println!("{}", stack.peek());
}
-
21 AUG 2014
Asking Better Questions
I've been playing with some Rust lately and learning a bunch of new concepts.
As part of my experimentation, I built an RPN calculator in the language, just as an exercise that would require a moderate amount of code to be worked out. Honestly, I don't fully understand everything that I had to do to get this code working yet. Maybe 20% of it came about as me following instructions from what seem to be very helpful compiler errors.
I wanted to start attacking these concepts that I didn't understand to increase my knowledge. Of course, I was impatient. I had some working code and I knew what I was missing, so I jumped into IRC, pointed at the code, and asked some not-at-all complete questions. All I got was crickets.
This isn't a failing of the Rust community. It's a lesson I have to relearn every now and then. You have to take the time to present a good enough question that answering it is easy enough and worth it. It's hard work to get your head around 100 lines of code and, even if you do, you'll still be missing plenty of context if the question isn't really well formed. Given that, most people just ignore the question. That's probably for the better too, because any answers provided likely would have missed the points I really needed help with.
-
20 JUL 2014
Dave's No Tests Challenge
I've mentioned before my difficulties in the 2014 IPSC. But taking one beating is no reason not to try again. The first loss just showed me that the contest still had more to teach me.
A buddy of mine has spent some time with the crossword problem and told me that he enjoyed it. I didn't try this problem during the actual event, but I was a little familiar with it from my friend's description.
To add to the fun, I decided this would be a great excuse to take up the recent challenge Dave Thomas gave to the Ruby Rogues: "Stop writing tests."
Step 1: Feedback Loops
Without tests to guide me, I really want to see what's going on. One of the biggest advantages of tests, in my opinion, is the feedback loop they provide. So I set out to provide my own feedback.
Since the problem at hand involves filling in a crossword board, the easiest feedback loop I could think of was to see the board as it fills in. The final board is also the required output. Therefore, I decided a good first step would just be to read the board into some data structure and write it back out. Once I had that, I could insert code between those steps to fill it in. And constantly seeing the board evolve would let me eyeball things for obvious mistakes.
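That read-it-in, write-it-out first step might look something like the sketch below. The board format is an assumption, since the contest's actual input isn't shown here: one row per line, with "." for an empty cell and "#" for a blocked one. The names read_board and render_board are also made up for the example.

```ruby
# Round-trip a board through a nested Array of single-character Strings.
# The "." and "#" cell markers are assumed, not taken from the contest.
def read_board(text)
  text.each_line.map { |line| line.chomp.chars }
end

def render_board(board)
  board.map(&:join).join("\n")
end

board = read_board("C.T\n.#.\nD.G\n")
board[0][1] = "A"  # solving code would go between reading and rendering
puts render_board(board)
```

Printing the board after every change is the whole feedback loop: a wrong letter or a clobbered cell shows up immediately.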
-
10 JUL 2014
One Programmer's Library
I have always enjoyed the posts where people list out all of the books they think are important, given some subject. They vary wildly. For example, some are very to-the-point while others are rich in detail (and Danielle makes one of those each year).
I began to wonder what my list would look like. Below I've tried to make those decisions. I was surprised by just how hard it is to restrict myself to the bare essentials. A lot of things that I read influence me in some way.
The Classics
I'll start with the super obvious titles you've likely heard mentioned before.
- Refactoring: Improving the Design of Existing Code is probably the classically great programming text that had the biggest effect on me. This book teaches you how to turn the code you have into the code you want. It doesn't get much more essential than that.
- Smalltalk Best Practice Patterns is the book I learned a new language just to read. It's worth that. This is The Field Manual of Object-Oriented Tactics and it helps you know what to do line-by-line.
- Patterns of Enterprise Application Architecture is the book you should read so you can see how much great knowledge we've had about building programs for over a decade. Odds are that this book can teach you multiple strategies for problems you face regularly.
- The Pragmatic Programmer: From Journeyman to Master is one of those rare books that can make you a better person, in addition to a better programmer. Advice like "Fix Broken Windows" and "Make Stone Soup" has universal scope. This is a must read.
- Programming Pearls (2nd Edition) is a book about algorithms and, eventually, all programmers need to learn some algorithms. The upside of this title is that it's fun from page one. That helps to keep you interested in what can be a dry topic.
- Growing Object-Oriented Software, Guided by Tests is almost surely the single biggest influence on how I do Test-Driven Development. There are other schools of thought but pick one and dig deep enough into TDD until you're confident about when and how to use it. This is a tool everyone needs in their toolbox.