Archive for May 9th, 2009

Baldrick – a dogsbody

May 9, 2009

Ruby is an awesome language for hacking stuff together (amongst other things of course) but do you want something that takes care of more of the plumbing, especially where web feeds are involved? Baldrick will service your whims.

Check it out at Github.

The Problem

Where I used to work we used a few Delcom build lights to monitor our continuous integration build. The scripts used to run these things are great fun to write (probably why we had a few different ones floating around) but the code to monitor the RSS feed containing the build status was quite repetitive. What we really cared about was linking a change in status to a change in light colour (and behaviour), not how to pull apart the RSS for the stuff we needed.

We had modified our light scripts to make the lights flash when the build had been newly broken. Someone could then ‘claim’ the build and stop it flashing by hitting a particular web URL. We also tended to communicate such things over IRC or some other means of broadcast. I thought it would be cool if you only had to claim the build in your message and that something could pick that up and change the status of the light for you.

At the same time I was playing a lot with Sinatra and I was giddy as a schoolgirl at just how easy it was to knock out a simple web server in a few lines. The magic ability to just execute a script and have it run as a web server really ticked my fancy. I thought that I’d love to have something like Sinatra that took care of the plumbing and allowed me to easily glue events to actions.

I was also impressed with the syntax of Cucumber steps and the ability to join up a textual step with the implementation via regular expressions.

All of things came together as Baldrick:

#cuppa.rb
require 'rubygems'
require 'baldrick_serve'

listen_to :feed, :at => 'http://search.twitter.com/search.atom?q=cup+of'

on_hearing /cup of (.*?)[\.,]/ do |beverage, order|
  puts "#{order[:who]} would like a cup of #{beverage}"  
end

Executing the above script will start a Baldrick server that listens to a Twitter feed for messages containing ‘cup of’. When a tweet containing a cup of something is found, the name of the tweeter and the beverage (perhaps) are spat out to the console.

How Does it Work?

Baldrick listens to a number of sources (at the moment RSS/Atom feeds and Injour statuses) for orders. The content of these sources is wrangled into a common format containing who, what, where and when.

From there its a case of hooking an order up to a task. When you define a task you give it a block to call when it is triggered. On receiving a new order, Baldrick will trigger all matching tasks. This also means that you can have orders from multiple sources triggering the same task if you so desire. Capturing groups within the regex will be passed as arguments to the block, followed by the order.

Baldrick uses the same tricks as Sinatra to allow an arbitrary script to be run as a server (the #at_exit hook).

Writing your own listeners is a snap, check out the wiki for details.

Try it out and drop me a line to tell me what you use it for.

Advertisements

Numerouno – number parsing for Ruby

May 9, 2009

How do I turn a string like ‘forty two’ into something I can manipulate as a number? String has the #to_i method but that only works on numerals like ‘3’. Numerouno is an English natural language parser for numbers.

Check it out at github.

The Problem

I hit this problem a few times in the past while writing Cucumber features that contained textual descriptions of numbers. Being good little BDD elves, we had worked very hard to keep the feature language true to that used by the customer. We were already using the awesome Chronic for parsing descriptions of dates and times which went a long way to preserving the language.

Unfortunately, describing numbers still seemed a bit clunky. We had steps like:

When I hop 37 times

The above is not ugly by any means, more mildly irritating. The main thing is that this is not how I would write the sentence. Maybe you find ’37’ more concise but to me it sticks out like a sore digit (ha ha) in an otherwise natural looking sentence. I want to write something like:

When I hop thirty seven times

And indeed now I can! Hurray hurrah!

require 'numerouno'
'thirty six billion, three hundred and ninety two'.as_number
 => 36000000392

How does it work?

The problem of parsing English number phrases was an interesting one and it took me a while to model it in a way that wasn’t totally confusing. Basically the current approach goes a little like this:

Identify individual numbers in the string

The first thing is to turn ‘thirty six billion, three hundred and ninety two’ into something we can manipulate a little easier, [30, 6, 1000000000, 300, 90, 2]. Simple regex matching is used to identify individual numbers.

Combine numbers

The English language has certain rules for interpreting numbers in a sentence. The rules most often revolve around numbers that are powers of ten, one hundred, one thousand, one million and so on. Once you hit one of these numbers you can start applying rules for the numbers either side of it to mash them into a combined figure.

The rules typically lead to you multiplying by the number to the left and then adding the number to the right. For example ‘five thousand and one’ => [5, 1000, 1] => 5 * 1000 + 1 => 5001.

Combination is done in several passes to ensure that lower powers of ten are combined properly before attempting to combine them with higher ones. Once all combination passes have been made a final step sums up the resulting list of combined numbers for the actual figure.

Limitations

At the moment only whole numbers up to those in the trillions are supported. The following things are not:

  • anything bigger than nine hundred and ninety nine trillion, nine hundred and ninety nine billion, nine hundred and ninety nine million, nine hundred and ninety nine thousand, nine hundred and ninety nine
  • fractions be they decimal or otherwise
  • other variations of numbers like ‘third’, ‘thirteenth’
  • slang like ‘K’, ‘grand’
  • any language except English. The rules for interpreting number are specific to the English language.

Yes, ironically Numerouno does not recognise ‘numero uno’.

If in doubt, try it out. Rhymes.