Archive for May, 2009

keeping on the tail of code quality with a ratchet

May 29, 2009

High code quality is one of my all-time favourite things, up there with beer, ice cream and when a bird sings. When I talk about quality in this sense I mean the maintainability of code. Quality is not a finite thing; instead it is a subjective little creature, a slippery invertebrate that squirms and changes over time. The subjective nature of quality is something that we have to live with; a more solvable problem is one of adapting to our changing ideas of it.

The Problem

Bash!

A number of tools already exist for measuring and monitoring the quality of code in a variety of programming languages, but some of the ways in which we use them are flawed. A common approach to style checking involves encoding our current view of what constitutes quality as a series of rules. Those rules can then be engraved in stone and used to bludgeon our code and developers from that day forward.

Our idea of quality changes over time as we gain experience and understanding. Frequent reassessment of what constitutes quality is a fantastic thing but unfortunately we don’t tend to do it all that much. It’s hard to keep discussion rolling about design and standards unless we are forced to. It doesn’t help that a sudden shift in ideas about quality can cause us to view our legacy code base in a different light.

What do we do when we have a bunch of legacy code written to a previous set of code quality standards but then the standards change? There seem to be two options: make all of the code conform in a big bang refactoring, or relax the automated checks.

Big bang refactorings are risky, exhausting and disruptive. If your idea of what constitutes quality code has changed dramatically then you may have a lot of work ahead of you. I for one do not relish the idea of having to absorb that in one huge hit.

Relaxing the automated checks would be asking for trouble. What is to stop someone introducing additional code that completely violates your shiny new standards of quality? Nothing. You just need to survive on good will and pixie dust until you can do the above, the big bang refactoring.

As the size of a team grows, communication becomes harder. We tend to have fewer informal discussions about things like quality because of the difficulties of coordinating a bigger group. Sometimes communication gets a little neglected unless we are prompted to discuss things regularly.

How can you have the freedom to reassess your code quality standards and bring existing code in line with them over time? How can you prompt regular reassessment and discussion?

Style Violations

Yuck

We are used to seeing style violations as nasty little cockroaches, scuttling about on our precious code. They make us cry. Nobody likes them. But cockroaches get a bad rap; they are relatively clean little fellows. They just love a filthy surface.

Style violations are not the problem; they are only symptomatic of it (maybe). They are indications of a possible problem. The problem is not that the method complexity metric was violated; the problem may be that the method is too complex to understand.

As any university student will tell you, cockroaches are only a problem if your hygiene standards are sufficiently high. If you all of a sudden decide that you really hate cockroaches, you may get a nasty surprise the next time you turn the kitchen light on. You can’t treat the problem immediately though; it takes time. This involves tolerating a certain number of violations.

The violation threshold represents the number of violations that you will tolerate; the minimum level of hygiene that must be maintained. If the current number of violations exceeds this threshold, the style checks should fail. If the number drops below it, the threshold should be tightened to the new count. The aim should be to keep driving the threshold towards zero; it should only ever be increased when the style rules are changed. So:

  • threshold only goes up when style rules have changed, never because of code changes
  • threshold only goes down when code has been improved or if the style rules are relaxed (and this should never happen lightly)
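
Those rules boil down to very little logic. Here is a minimal Ruby sketch of a single run of the style checks; the `ratchet` method is a made-up name for illustration and is not part of any style checking tool:

```ruby
# Hypothetical helper: decides the outcome of one run of the style checks.
# Returns [passed, new_threshold].
def ratchet(violations, threshold)
  if violations > threshold
    [false, threshold]   # the code got worse: fail and keep the threshold
  else
    [true, violations]   # same or better: pass and tighten to the new count
  end
end

ratchet(120, 100) # fails, threshold stays at 100
ratchet(100, 100) # passes, threshold stays at 100
ratchet(85, 100)  # passes, threshold tightens to 85
```

Note that code changes can never raise the threshold here; raising it after a rule change is a deliberate, manual act.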

Introducing a Ratchet

I love that clicking noise

I first heard about the concept of a ratchet in Chris Stevenson’s blog post. The gist of this approach is to steadily tighten accepted levels of one metric. The ratchet effect is that the levels are never allowed to slacken, only to be tightened. When used to improve code quality, old code will be tolerated until it can be cleaned up (but not allowed to degrade any further) and new code is held to the newer, stricter quality standards.

The first step in implementing a code quality ratchet is to decide what our current idea of quality is. Pick out some common metrics like complexity and class/method length and come up with some starting levels. Choose levels that are aggressive and remember that they won’t be set in stone; they should cause another discussion later on.

Once you have decided on some initial checks and levels, run them against your current code and note the number of violations. The number of violations that pops up is your initial threshold. Most style checking tools like Checkstyle have the ability to set a maximum violations figure; use whatever means the tool provides to set that figure to your threshold.

Once you have your ratchet in place preventing things from getting worse, you need to figure out how to tighten it. The best method is to tighten the levels automatically as soon as they drop. You can work your own build magic to do this but it’s prettier in some build languages than it is in others (I’m looking at you, Ant).
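
As a rough idea of the build magic involved, here is a minimal Ruby sketch that keeps the threshold in a plain text file. The file name and the `apply_ratchet` method are assumptions for illustration; getting the violation count out of your style checker is left to you:

```ruby
# Hypothetical build step: fail if things got worse, otherwise record the
# current count as the new threshold.
def apply_ratchet(violations, threshold_file)
  # With no recorded threshold yet, the first count becomes the threshold.
  threshold = if File.exist?(threshold_file)
                Integer(File.read(threshold_file).strip)
              else
                violations
              end

  if violations > threshold
    raise "Ratchet broken: #{violations} violations, only #{threshold} allowed"
  end

  # Only reached when the count is at or below the old threshold,
  # so the file can never be loosened by this step.
  File.write(threshold_file, "#{violations}\n")
  violations
end

# Example call from a build script (the count variable is an assumption):
#   apply_ratchet(count_from_style_checker, 'style_threshold.txt')
```

One option is to keep the threshold file under version control next to the code, so that a tightened ratchet travels with the change that earned it.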

Having the facilities to tighten the ratchet when the number of violations drops is good, but how do you continually drive them down? Try to set targets for each iteration or release. Make the current threshold easily visible to everyone and review progress regularly. The easy pickings will soon evaporate and expose the meatier challenges.

If you do ever drive your threshold to zero violations then make sure to have more discussions about quality. Were the metrics aggressive enough? Is the system there in terms of desired quality levels? If so, pat yourself on the back. If not, reset the rules, set a new threshold and get to it.

Communication

Keep an ear out for friction

So this can be a useful technical approach for increasing the level of code quality, but I think the real value lies in the discussion it encourages.

When someone is being prevented from checking in because their changes have broken the ratchet, the rest of the team will know about it (usually manifested as “arrrgh! the fucking ratchet won’t let me check in!”). These times are a prompt to have a discussion about the changes, namely which rule was violated and what it suggests about the current design of the code. Why did we set this rule? What situation is it trying to guard against?

Violations do not pop out as neat little tickets telling you what is wrong and how you need to fix it. Style violations are the prompt for the team to find out what the real problem is and how it can be fixed.

Understandably, violations will be a major cause of frustration. Legacy code will have many tissue-thin spots that teeter on the edge of breaking; it sucks to be the one holding the bomb when it goes off. People can view the style checks as a nuisance when they are too focused on their primary goal of just getting their chunk of work out of the door. A significant share of the focus needs to be awarded to maintaining and improving quality. Negative energy needs to be channeled into discussion about the real problem and how it is going to be fixed. Bigger, more painful nips from the style checking tool should also encourage people to run the checks more frequently.

Style violations have the magic effect of being a catalyst for design discussions that would not normally have taken place. The timing is not ideal (the code has already been produced) but it is better than nothing. More often than not, there is a better design that would satisfy the current notion of quality. Hopefully the problem is then resolved, a new direction is set and everyone has learned something new that they otherwise wouldn’t have.

Of course, the design isn’t always the problem. These situations can also suggest that the style rules need to be reviewed. This is why you should set the levels to be aggressive; it is better to change the level of a rule based on experience rather than taking a stab at it during initial discussions. In an ideal world the settings for each rule are the result of real experiences of what is acceptable and what is not.

Care needs to be taken to act on these prompts to communicate, otherwise the approach will fail. When violations pop up:

  • move focus away from the symptom and onto the cause
  • relate the problem back to overall quality goals
  • review the rules but only if all options for a better design have been explored and there is widespread agreement
  • treat the goal of excellent quality as primary; don’t compromise it just to get the current story out of the door

You ideally need one or more people to really champion this approach. They should be constantly listening for people being bitten by violations and should be ready to fire up the necessary discussions as soon as it happens.

You can’t force a team to continually focus on code quality; all you can do is create an environment that is more conducive to such an attitude. The mechanical aspects of using a ratchet should make the job easier but the real key is consistent communication.

Baldrick – a dogsbody

May 9, 2009

Ruby is an awesome language for hacking stuff together (amongst other things of course) but do you want something that takes care of more of the plumbing, especially where web feeds are involved? Baldrick will service your whims.

Check it out at GitHub.

The Problem

Where I used to work we used a few Delcom build lights to monitor our continuous integration build. The scripts used to run these things are great fun to write (probably why we had a few different ones floating around) but the code to monitor the RSS feed containing the build status was quite repetitive. What we really cared about was linking a change in status to a change in light colour (and behaviour), not how to pull apart the RSS for the stuff we needed.

We had modified our light scripts to make the lights flash when the build had been newly broken. Someone could then ‘claim’ the build and stop it flashing by hitting a particular web URL. We also tended to communicate such things over IRC or some other means of broadcast. I thought it would be cool if you only had to claim the build in your message and that something could pick that up and change the status of the light for you.

At the same time I was playing a lot with Sinatra and I was giddy as a schoolgirl at just how easy it was to knock out a simple web server in a few lines. The magic ability to just execute a script and have it run as a web server really tickled my fancy. I thought that I’d love to have something like Sinatra that took care of the plumbing and allowed me to easily glue events to actions.

I was also impressed with the syntax of Cucumber steps and the ability to join up a textual step with the implementation via regular expressions.

All of these things came together as Baldrick:

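# cuppa.rb
require 'rubygems'
require 'baldrick_serve'

# Listen to a Twitter search feed for tweets mentioning 'cup of'
listen_to :feed, :at => 'http://search.twitter.com/search.atom?q=cup+of'

# The captured beverage is passed to the block first, followed by the order
on_hearing /cup of (.*?)[\.,]/ do |beverage, order|
  puts "#{order[:who]} would like a cup of #{beverage}"
end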

Executing the above script will start a Baldrick server that listens to a Twitter feed for messages containing ‘cup of’. When a tweet containing a cup of something is found, the name of the tweeter and the beverage (perhaps) are spat out to the console.

How Does it Work?

Baldrick listens to a number of sources (at the moment RSS/Atom feeds and Injour statuses) for orders. The content of these sources is wrangled into a common format containing who, what, where and when.

From there it’s a case of hooking an order up to a task. When you define a task you give it a block to call when it is triggered. On receiving a new order, Baldrick will trigger all matching tasks. This also means that you can have orders from multiple sources triggering the same task if you so desire. Capturing groups within the regex will be passed as arguments to the block, followed by the order.
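
To make the matching concrete, here is a toy dispatcher in plain Ruby. It is a sketch of the idea rather than Baldrick’s actual internals: the `Dispatcher` class is made up, and so is the assumption that the message text lives under `:what` in the order.

```ruby
# Hypothetical mini-dispatcher; names are illustrative only.
Task = Struct.new(:pattern, :block)

class Dispatcher
  def initialize
    @tasks = []
  end

  # Register a block to call when an order's text matches the pattern.
  def on_hearing(pattern, &block)
    @tasks << Task.new(pattern, block)
  end

  # Trigger every matching task, passing the capture groups and then the order.
  def receive(order)
    @tasks.each do |task|
      match = task.pattern.match(order[:what])
      task.block.call(*match.captures, order) if match
    end
  end
end

heard = []
dispatcher = Dispatcher.new
dispatcher.on_hearing(/cup of (.*?)[\.,]/) do |beverage, order|
  heard << "#{order[:who]} would like a cup of #{beverage}"
end

# Orders from any source share the same who/what/where/when shape.
order = { who: 'edmund', what: 'I could murder a cup of tea, Baldrick',
          where: 'feed', when: Time.now }
dispatcher.receive(order)
puts heard
```

Because the dispatcher only ever sees the common order format, it does not care whether the order came from a feed or anywhere else.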

Baldrick uses the same tricks as Sinatra to allow an arbitrary script to be run as a server (the #at_exit hook).
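
The trick itself fits in a few lines. This is a stripped-down sketch and not Baldrick’s or Sinatra’s source; `TinyServer` and `task` are invented names, and the "server" just runs each task once instead of looping:

```ruby
# Hypothetical library code, as it might look inside a required file.
module TinyServer
  @tasks = []

  class << self
    attr_reader :tasks

    # Stand-in for a real listen loop: run each registered task once.
    def run!
      tasks.each(&:call)
    end
  end
end

# Top-level method that the script author calls to register work.
def task(&block)
  TinyServer.tasks << block
end

# Registered at require time. It fires only after the script body has finished
# defining its tasks, and is skipped if the script died with an error.
at_exit { TinyServer.run! if $!.nil? }

# What the user's script looks like: just declarations, no explicit "start".
task { puts 'polishing the turnips' }
```

The script never calls `run!` itself; reaching the end of the file is what starts the server.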

Writing your own listeners is a snap; check out the wiki for details.

Try it out and drop me a line to tell me what you use it for.

Numerouno – number parsing for Ruby

May 9, 2009

How do I turn a string like ‘forty two’ into something I can manipulate as a number? String has the #to_i method but that only works on numerals like ‘3’. Numerouno is an English natural language parser for numbers.

Check it out at GitHub.

The Problem

I hit this problem a few times in the past while writing Cucumber features that contained textual descriptions of numbers. Being good little BDD elves, we had worked very hard to keep the feature language true to that used by the customer. We were already using the awesome Chronic for parsing descriptions of dates and times which went a long way to preserving the language.

Unfortunately, describing numbers still seemed a bit clunky. We had steps like:

When I hop 37 times

The above is not ugly by any means, more mildly irritating. The main thing is that this is not how I would write the sentence. Maybe you find ’37’ more concise but to me it sticks out like a sore digit (ha ha) in an otherwise natural looking sentence. I want to write something like:

When I hop thirty seven times

And indeed now I can! Hurray hurrah!

require 'numerouno'
'thirty six billion, three hundred and ninety two'.as_number
 => 36000000392

How does it work?

The problem of parsing English number phrases was an interesting one and it took me a while to model it in a way that wasn’t totally confusing. Basically the current approach goes a little like this:

Identify individual numbers in the string

The first thing is to turn ‘thirty six billion, three hundred and ninety two’ into something we can manipulate a little easier, [30, 6, 1000000000, 3, 100, 90, 2]. Simple regex matching is used to identify individual numbers.
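
A minimal sketch of this step, assuming a simple lookup table of number words. The table here is deliberately tiny and the names are mine, not Numerouno’s:

```ruby
# Hypothetical word table; a real parser needs the full vocabulary.
NUMBER_WORDS = {
  'one' => 1, 'two' => 2, 'five' => 5, 'forty' => 40,
  'hundred' => 100, 'thousand' => 1_000, 'million' => 1_000_000
}.freeze

# One alternation built from the table. The \b anchors stop 'one'
# from matching inside a word like 'money'.
NUMBER_PATTERN = /\b(#{NUMBER_WORDS.keys.join('|')})\b/i

# Pull out every number word, in order, and map it to its value.
# Filler words like 'and' simply never match.
def identify_numbers(phrase)
  phrase.scan(NUMBER_PATTERN).flatten.map { |word| NUMBER_WORDS[word.downcase] }
end

identify_numbers('five thousand and one') # => [5, 1000, 1]
identify_numbers('Forty two')             # => [40, 2]
```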

Combine numbers

The English language has certain rules for interpreting numbers in a sentence. The rules most often revolve around numbers that are powers of ten: one hundred, one thousand, one million and so on. Once you hit one of these numbers you can start applying rules for the numbers either side of it to mash them into a combined figure.

The rules typically lead to you multiplying by the number to the left and then adding the number to the right. For example ‘five thousand and one’ => [5, 1000, 1] => 5 * 1000 + 1 => 5001.

Combination is done in several passes to ensure that lower powers of ten are combined properly before attempting to combine them with higher ones. Once all combination passes have been made a final step sums up the resulting list of combined numbers for the actual figure.
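
Here is one way the passes could be written. It is a simplified sketch under my own assumptions (one pass per multiplier, smallest first) and is not lifted from Numerouno’s source:

```ruby
# Multipliers handled one pass at a time, lowest first.
MULTIPLIERS = [100, 1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000].freeze

def combine(numbers)
  MULTIPLIERS.each do |multiplier|
    combined = []
    numbers.each do |n|
      if n == multiplier
        # Gather the smaller numbers sitting immediately to the left,
        # then multiply their sum by the multiplier.
        left = 0
        left += combined.pop while combined.any? && combined.last < multiplier
        combined << (left.zero? ? 1 : left) * multiplier
      else
        combined << n
      end
    end
    numbers = combined
  end

  # Final step: whatever is left over gets added together.
  numbers.sum
end

combine([5, 1000, 1])                           # => 5001
combine([30, 6, 1_000_000_000, 3, 100, 90, 2])  # => 36000000392
```

Running the hundreds pass before the thousands pass is what lets ‘two hundred thousand’ become 200 first and only then 200000.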

Limitations

At the moment only whole numbers up to those in the trillions are supported. The following things are not:

  • anything bigger than nine hundred and ninety nine trillion, nine hundred and ninety nine billion, nine hundred and ninety nine million, nine hundred and ninety nine thousand, nine hundred and ninety nine
  • fractions be they decimal or otherwise
  • other variations of numbers like ‘third’, ‘thirteenth’
  • slang like ‘K’, ‘grand’
  • any language except English. The rules for interpreting numbers are specific to the English language.

Yes, ironically Numerouno does not recognise ‘numero uno’.

If in doubt, try it out. Rhymes.