Archive for October, 2010

What Is Going On With Ironman Perl?

Herbert Breunung’s The Pearl Metaphor has been at the top for days. Did he manage to break it just by writing a post with a future date?


Read Full Post »

Why Micro-Benchmark?

I have had some feedback along the lines of why write a micro-benchmark such as Tokyo Cabinet vs Berkeley DB. Everyone knows that:

"premature optimization is the root of all evil"


“The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”

That kinda misses the point. Micro-benchmarking is nothing to do with optimization. Its purpose is to avoid premature pessimization due to choosing the wrong technology.

I have a project to replace an existing system where the backing store is by far the biggest bottleneck. The new system needs to be signicantly faster than the existing system and I have a lot of flexibility in choosing the backing store.

The two alternatives that immediately come to mind are.

  1. create a nice abstraction layer that can easily switch between Perl DBI, Redis, Tokyo Cabinet and a bunch of other alternatives… or
  2. micro-benchmark them with something similar to my read/write data profile.

Time pressure unfortunately compels me to choose #2. Maybe it’s time to take a second look at KiokuDB, for inspiration at least, even if I can’t use it directly.

Read Full Post »

Perl And Type Checking

(aka known as which language am I using again?)

Dave Rolsky has a new post on perl 5 overloading. It’s fairly informative, but it contains this little gem (emphasis mine):

If you don’t care about defensive programming, then Perl 5’s overloading is perfect, and you can stop reading now. Also, please let me know so I can avoid working on code with you, thanks.

Defensive programming, for the purposes of this entry, can be defined as "checking sub/method arguments for sanity".

Blanket statements like this really get my gripe up. Let’s have a look at defensive argument checking in Perl taken to its illogical conclusion.

Defensive Primitive Type Checking

use strict;
use warnings;

use Carp;
use Scalar::Util 'looks_like_number';

sub can_legally_drink
    my $age = shift;
    croak "$age is not a number" unless looks_like_number($age);
    return $age >= 18;

print can_legally_drink('x'), "\n";

And fantastic news! can_legally_drink correctly detected that my argument isn’t a number.

x is not a number at age.pl line 10
        main::can_legally_drink('x') called at age.pl line 14

But hang on a minute. Not all integers are ages. Surely we want to check if a real age type was passed in.

Checking For A ‘Real’ Type

My stripped down defensive age type might look something like this.

package age;

use Carp;
use Scalar::Util 'looks_like_number';

sub isa_age
    my $arg = shift;
    return ref($arg) and blessed $arg and $arg->isa('age');

sub new {
    my ($class, $years) = @_;
    croak "$years is not a number" unless looks_like_number($years);
    bless { years => $years }, $class;

sub years {
    return $_[0]->{'years'}

sub less_than {
    my ($self, $other) = @_;
    croak "$other is not an age" unless isa_age($other);
    return $self->years() < $other->years();

And then my calling code can look like this:

package main;

sub can_legally_drink
    my $age = shift;
    croak "$age is not an age" unless $age->isa('age');
    return ! $age->less_than(age->new(18));

print can_legally_drink(age->new(18)), "\n";
print can_legally_drink(18), "\n";

And woohoo, the second line throws an error as I wanted.

Actually, I don’t write Perl like this. Dave, you probably want to avoid working on code with me, thanks.


To be fair, Rolsky is talking his own book. Moose has a bunch of stuff that handles all this type checking malarky nice and cleanly. If you’re building something big in Perl, you should take a look.

But if you really care about types, I mean defensive programming that much, you could use a statically typed language instead and then you even get fast code thrown in for free.

Read Full Post »

In response to my Berkeley DB benchmarking post, Pedro Melo points out that Tokyo Cabinet is faster and that JSON::XS is faster than Storable.

I couldn’t find an up to date Ubuntu package that included the TC perl libraries so I had to build everything from source. It was pretty straightforward though.

First we need to get the database handle.

my $tc_file = "$ENV{HOME}/test.tc";
unlink $tc_file;
my $hdb = TokyoCabinet::HDB->new();

if(!$hdb->open($tc_file, $hdb->OWRITER | $hdb->OCREAT)){
    my $ecode = $hdb->ecode();
    printf STDERR ("open error: %s\n", $hdb->errmsg($ecode));

Presumably putasync is the fastest database put method.

my $ORDER_ID = 0;

sub store_record_tc
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->putasync($key, $json ? encode_json($ref_record)
                              : Storable::freeze($ref_record));

I needed to amend store_record to compare json and storable too.

sub store_record
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->db_put($key, $json ? encode_json($ref_record)
                            : Storable::freeze($ref_record));
    $db->db_sync() unless $no_sync;

The benchmarking code looks like this.

Benchmark::cmpthese(-1, {
    'json-only-50/50' => sub { json_only($db, $rec_50_50) },
    'freeze-only-50/50' => sub { freeze_only($db, $rec_50_50) },

    'freeze-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1) },
    'freeze-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1) },

    'json-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1, 1) },
    'json-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1, 1) },

And the results are as follows:

        Rate freeze-no-sync-50/50 json-no-sync-50/50 freeze-no-sync-50/50-tc json-no-sync-50/50-tc freeze-only-50/50 json-only-50/50
freeze-no-sync-50/50     7791/s                   --                -9%                    -39%                  -47%              -59%            -81%
json-no-sync-50/50       8605/s                  10%                 --                    -33%                  -41%              -55%            -79%
freeze-no-sync-50/50-tc 12800/s                  64%                49%                      --                  -13%              -33%            -69%
json-no-sync-50/50-tc   14698/s                  89%                71%                     15%                    --              -23%            -64%
freeze-only-50/50       19166/s                 146%               123%                     50%                   30%                --            -54%
json-only-50/50         41353/s                 431%               381%                    223%                  181%              116%              --

Pedro was right. Tokyo Cabinet is significantly faster than Berkeley DB, at least in this simple benchmark.

Edit: json and no_sync parameter switch has been fixed.

Read Full Post »

Finding Useful Posts

One of the weaknesses of many blogs, including my own, is the difficulty of finding old, useful articles. There are a few ways to find articles written previously, such as the Archives, the categories and the tags. But to be honest, a lot of the stuff I write is only relevant (at best) at the time of publishing. And even I have difficulty finding my useful posts again.

There are several possible solutions, e.g. I could tag pages within delicious as curiousprogrammer/useful. For the moment, I’ve decided to keep a blog highlights page, and list the posts which are useful for me. Later on I might try a more comprehensive index.

Read Full Post »

I’m not really a fan of nested loops, so when I need to create a list of combinations based on two or three other lists, I really miss list comprehensions1 such as those in Python.

l = [(x,y) for x in range(5) for y in range(5)]

Very elegant.

If I was using Lisp, I might use a nested mapcar.

(mapcar (lambda (out) (mapcar (lambda (in) (cons in out))
                              '(a b c d e)))
          '(1 2 3 4 5))

Perl map uses $_ so you don’t need to explicitly specify the name of the lambda parameter. How can I differentiate between the outer $_ and the inner $_?

my $outer;
my @x = map { $outer = $_;
              (map { $outer . $_ } qw(a b c d e)) }

Note: if you are trying to do something as simple as this, take a look at Set::CrossProduct instead.

1. It’s available on the CPAN of course

Read Full Post »