Feeds:
Posts
Comments

Archive for the ‘Perl’ Category

In my recent Connecting Software Systems post, I included a teaser:

There is a great trick to connect two processes together using the shell (hint: using pipes) that I’m going to talk about next time

If you need to connect a single producer to a single consumer, you can easily connect them using a unix pipe without changing much of the underlying code. The trick is to write to STDOUT whenever work is available for the consumer. The consumer reads work requests from STDIN.

Script Preamble

The preamble will be mostly common to both the producer and the consumer. I set STDOUT to autoflush to avoid any slowness due to buffering.

use 5.010;

use strict;
use warnings;

use File::Path 'make_path';
use IO::Handle;
use POSIX 'strftime';

use File::Slurp;

STDOUT->autoflush(1);

sub hms
{
    return strftime('%H:%M:%S', localtime(time()));
}

sub my_log
{
    print '[ ', hms(), ' ] : ', @_, "\n";
}

The Producer

The work files are created in /var/tmp/pro-co. Hopefully create_file would be doing something a bit more useful in a real application!

sub create_file
{
    my $top = shift;

    my $filename = $top . '/' . int(rand(1_000_000)) . '.txt';
    my $content = rand(); # or something more useful...
    # simulate taking some time for processing
    sleep rand(5);
    write_file($filename, $content);
    return $filename;
}

# --

my $top = '/var/tmp/pro-co';

make_path($top);

for (1..5) {
    my_log "Iteration $_";
    my $filename = create_file($top);
    my_log "PRODUCED $filename";
}

The Consumer

The consumer removes the timestamp from the producer output. Then, any line that indicates a unit of work (marked in the example by PRODUCED) is passed to process_file().

sub process_file
{
    my $file = shift;
    my_log "Processing file [$file]";
    # File processing logic here ...
}

# --

while (defined(my $line = <STDIN>)) {
    # Strip the timestamp
    $line =~ s{\[\s[0-9:]+\s\]\s:\s}{};
    chomp $line;

    if ($line =~ /^PRODUCED\s+(.+)/) {
        process_file($1);
    } else {
        my_log "FROM PRODUCER [$line]";
    }
}

The Example Run

As you can see from the output below, the consumer was able to process the work as it became available.

$ ./producer.pl | tee producer.log | ./consumer.pl
[ 09:22:20 ] : FROM PRODUCER [Iteration 1]
[ 09:22:20 ] : Processing file [/var/tmp/pro-co/944319.txt]
[ 09:22:20 ] : FROM PRODUCER [Iteration 2]
[ 09:22:23 ] : Processing file [/var/tmp/pro-co/141765.txt]
[ 09:22:23 ] : FROM PRODUCER [Iteration 3]
[ 09:22:27 ] : Processing file [/var/tmp/pro-co/463599.txt]
[ 09:22:27 ] : FROM PRODUCER [Iteration 4]
[ 09:22:28 ] : Processing file [/var/tmp/pro-co/423055.txt]
[ 09:22:28 ] : FROM PRODUCER [Iteration 5]
[ 09:22:28 ] : Processing file [/var/tmp/pro-co/233909.txt]
$ cat producer.log
[ 09:22:20 ] : Iteration 1
[ 09:22:20 ] : PRODUCED /var/tmp/pro-co/944319.txt
[ 09:22:20 ] : Iteration 2
[ 09:22:23 ] : PRODUCED /var/tmp/pro-co/141765.txt
[ 09:22:23 ] : Iteration 3
[ 09:22:27 ] : PRODUCED /var/tmp/pro-co/463599.txt
[ 09:22:27 ] : Iteration 4
[ 09:22:28 ] : PRODUCED /var/tmp/pro-co/423055.txt
[ 09:22:28 ] : Iteration 5
[ 09:22:28 ] : PRODUCED /var/tmp/pro-co/233909.txt

Read Full Post »

So Why is This Post on Ironman Perl?

Okay, I waded in too fast on Gabor’s post. Sorry Gabor. And you may be wondering who died and made me the Ironman Perl police1. No-one, but riddle me this: why is this WordPress related post on Perl Ironman?

It doesn’t have any perl related tags and neither was it posted in the perl category. I thought that perhaps there was a sneaky mention of Perl in the post. But no, find in page doesn’t pick anything up.

Apart from the blog title that is – Perls of Wisdom. Is the blog title included in the content that is checked for the word perl2?


1. No-one, I promise. In the next post hopefully I’ll be back to the occasionally useful perl snippets.

2. Although I guess it is reasonable not to think of this. Perl is not a dictionary word so you probably wouldn’t expect it in the name of a blog.

Read Full Post »

What has been on my mind recently is if I write some software to do a task, say process some files and dump some other files to disk, how can that process inform another process that the files are ready?

Using Shell

There are millions of ways to do this. The easiest is probably to wrap both processes in a shell script:

create_files
process_created_files

The magic of unix means that process_created_files is triggered as soon as create_files exits.

This has a couple of problems. What if there were any errors in create_files? What if create_files takes a long time, and we want to start processing as soon as the first file has been created?

There is a great trick to connect two processes together using the shell (hint: using pipes) that I’m going to talk about next time. For now, let’s focus on the more heavy-weight options.

Using a Filewatcher

There are a few modules on CPAN that make it easy to build a filewatcher. A couple of examples are:

Maybe it is just me, but watching for file changes seems a bit 1970s, so moving swiftly on…

Using Third-Party Mechanisms

For connecting 1 producer to 1 consumer, I really like message queues. Apache has a crufy enterprise protocol called AMQP

RestMQ is a queue built on top of Redis.

Or you could use the Pub/Sub mechanism built into Redis directly.

ZeroMQ pretty much solves my problem without having to drop down to raw sockets.

# —

Unfortunately, none of these options are available to me. That brings me on to the raw socket options.

Using Raw Sockets

If the producer provides a server for interested consumers to connect to, you can have a bunch of consumers listening for events. Even better, you can add one layer of indirection and allow for many to many interactions.

An AnyEvent Notifier

The core of this code was taken from my Emulating POSIX Signals post.

use 5.010;

use strict;
use warnings;

package Notifier;

use constant CTRL_D => 4;
use constant DEBUG => $ENV{NOTIFIER_DEBUG};

my $cr = "\015\012";

use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Socket;

sub new
{
    my $class = shift;
    my $self = {
        pid => 0,
        watchers => {},
    };
    bless $self, $class;
    return $self;
}

sub disconnect_socket
{
    my ($self, $handle, $pid) = @_;
    $handle->on_drain(sub {
        say "disconnecting PID $pid...";
        $handle->destroy();
    });
    delete $self->{watchers}{$pid};
}

sub get_other_handles
{
    my ($self, $pid) = @_;
    return map { $self->{watchers}{$_}{handle} }
           grep { $_ != $pid } keys %{$self->{watchers}};
}

sub broadcast_line
{
    my ($self, $handles_ref, $line) = @_;
    foreach my $handle (@$handles_ref) {
        $handle->push_write($line . ${cr});
    }
}

sub watch_socket
{
    my ($self, $sock, $host, $port) = @_;

    # Closure variables
    my $pid = ++$self->{pid};

    my $handle; $handle = AnyEvent::Handle->new(
        fh => $sock,
        on_error => sub {
            say "Error: $! (PID $pid)";
            delete $self->{watchers}{$pid};
            $handle->destroy();
        },
        on_eof => sub {
            say "Disconnected (PID $pid)";
            delete $self->{watchers}{$pid};
            $handle->destroy();
        },
        on_read => sub {
            my $buffer = $handle->rbuf();
            $handle->rbuf() = '';

            if (length($buffer) == 1 and ord($buffer) == CTRL_D) {
                $handle->push_write('Received EOF.  ' .
                                    "Closing connection...${cr}");
                $self->disconnect_socket($handle, $pid);
            } else {
                my @handles = $self->get_other_handles($pid);
                foreach my $line (split /\r?\n/, $buffer) {
                    say "$host/$port : $line";
                    if (lc($line) =~ /quit|exit/) {
                        $self->disconnect_socket($handle, $pid);
                        return;
                    } else {
                        $self->broadcast_line(\@handles, $line);
                    }
                }
            }
        }
    );

    $handle->keepalive();
    $self->{watchers}{$pid} = { handle => $handle };
    $handle->push_write("Connected.  (PID $pid)${cr}");
}

sub prepare_handler
{
    my ($fh, $host, $port) = @_;
    DEBUG && warn "Listening on $host:$port\n";
}

sub _accept_handler
{
    my $self = shift;

    return sub {
        my ($sock, $peer_host, $peer_port) = @_;

        DEBUG && warn "Accepted connection from $peer_host:$peer_port\n";
        if (! $sock) {
            warn '$sock undefined' . "\n";
            return;
        }

        $self->watch_socket($sock, $peer_host, $peer_port);
    };
}

sub start_listen
{
    my ($self, $host, $port) = @_;

    $self->{server} = tcp_server($host,
                                 $port,
                                 $self->_accept_handler(),
                                 \&prepare_handler);
}

package main;

my $host = undef;
my $port = 12345;

my $kernel = Notifier->new();
$kernel->start_listen($host, $port);

AE::cv->recv();

Read Full Post »

Systems Monitoring

I almost missed this excellent post on system monitoring by (I think) Yanick Champoux. It’s a clever idea. The monitor has a bunch of TAP1 tests that check various parts of the system – e.g. whether diskspace on a particular partition has breached a certain threshold.

The tests are all run with Smolder, "a web-based continuous integration smoke server". This means that the results can be made available via email, RSS or Smolder’s web interface. Very neat.


1. Test Anything Protocol

Read Full Post »

IRC vs Planet Perl Iron Man

Mark Keating left a thoughtful comment on my throwaway post about Planet Perl Iron Man (PPIM).

IRC

I’m pretty introverted so IRC isn’t an ideal medium for me – it’s a direct conversation. Conversely, having a blog posted on PPIM sometime gets the message1 out to the important people and there is little need to interact with people directly ;) For me, PPIM is better than IRC.

Malicious or Careless Users


“the code does assume that we’re all honest and nice people and that we’ll use good tags and the correct date”

In this case, I’d assume it was user error rather than a deliberate and somewhat annoying attempt to monopolise the top of PPIM. Having said that, whenever I’ve needed to develop a webfacing app, I’ve noticed that assuming your users are going to play nice is the wrong thing to do. If even one of your thousands of users is malicious and effective, it’s going to ruin your day.

If I was responsible for implementing Planet Ironman, I wouldn’t have planned for a future dated post either. But having seen it, it should be fairly simple to discard any post with a date more than 1 day into the future.


1. In this case, the message wasn’t really important so finding the appropriate delivery channel didn’t really matter.

Read Full Post »

What Is Going On With Ironman Perl?

Herbert Breunung’s The Pearl Metaphor has been at the top for days. Did he manage to break it just by writing a post with a future date?

Read Full Post »

Why Micro-Benchmark?

I have had some feedback along the lines of why write a micro-benchmark such as Tokyo Cabinet vs Berkeley DB. Everyone knows that:

"premature optimization is the root of all evil"

and

“The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”

That kinda misses the point. Micro-benchmarking is nothing to do with optimization. Its purpose is to avoid premature pessimization due to choosing the wrong technology.

I have a project to replace an existing system where the backing store is by far the biggest bottleneck. The new system needs to be signicantly faster than the existing system and I have a lot of flexibility in choosing the backing store.

The two alternatives that immediately come to mind are.

  1. create a nice abstraction layer that can easily switch between Perl DBI, Redis, Tokyo Cabinet and a bunch of other alternatives… or
  2. micro-benchmark them with something similar to my read/write data profile.

Time pressure unfortunately compels me to choose #2. Maybe it’s time to take a second look at KiokuDB, for inspiration at least, even if I can’t use it directly.

Read Full Post »

Perl And Type Checking

(aka known as which language am I using again?)

Dave Rolsky has a new post on perl 5 overloading. It’s fairly informative, but it contains this little gem (emphasis mine):

If you don’t care about defensive programming, then Perl 5′s overloading is perfect, and you can stop reading now. Also, please let me know so I can avoid working on code with you, thanks.

Defensive programming, for the purposes of this entry, can be defined as "checking sub/method arguments for sanity".

Blanket statements like this really get my gripe up. Let’s have a look at defensive argument checking in Perl taken to its illogical conclusion.

Defensive Primitive Type Checking

use strict;
use warnings;

use Carp;
use Scalar::Util 'looks_like_number';

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not a number" unless looks_like_number($age);
    return $age >= 18;
}

print can_legally_drink('x'), "\n";

And fantastic news! can_legally_drink correctly detected that my argument isn’t a number.

x is not a number at age.pl line 10
        main::can_legally_drink('x') called at age.pl line 14

But hang on a minute. Not all integers are ages. Surely we want to check if a real age type was passed in.

Checking For A ‘Real’ Type

My stripped down defensive age type might look something like this.

package age;

use Carp;
use Scalar::Util 'looks_like_number';

sub isa_age
{
    my $arg = shift;
    return ref($arg) and blessed $arg and $arg->isa('age');
}

sub new {
    my ($class, $years) = @_;
    croak "$years is not a number" unless looks_like_number($years);
    bless { years => $years }, $class;
}

sub years {
    return $_[0]->{'years'}
}

sub less_than {
    my ($self, $other) = @_;
    croak "$other is not an age" unless isa_age($other);
    return $self->years() < $other->years();
}

And then my calling code can look like this:

package main;

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not an age" unless $age->isa('age');
    return ! $age->less_than(age->new(18));
}

print can_legally_drink(age->new(18)), "\n";
print can_legally_drink(18), "\n";

And woohoo, the second line throws an error as I wanted.

Actually, I don’t write Perl like this. Dave, you probably want to avoid working on code with me, thanks.

Moose

To be fair, Rolsky is talking his own book. Moose has a bunch of stuff that handles all this type checking malarky nice and cleanly. If you’re building something big in Perl, you should take a look.

But if you really care about types, I mean defensive programming that much, you could use a statically typed language instead and then you even get fast code thrown in for free.

Read Full Post »

In response to my Berkeley DB benchmarking post, Pedro Melo points out that Tokyo Cabinet is faster and that JSON::XS is faster than Storable.

I couldn’t find an up to date Ubuntu package that included the TC perl libraries so I had to build everything from source. It was pretty straightforward though.

First we need to get the database handle.

my $tc_file = "$ENV{HOME}/test.tc";
unlink $tc_file;
my $hdb = TokyoCabinet::HDB->new();

if(!$hdb->open($tc_file, $hdb->OWRITER | $hdb->OCREAT)){
    my $ecode = $hdb->ecode();
    printf STDERR ("open error: %s\n", $hdb->errmsg($ecode));
}

Presumably putasync is the fastest database put method.

my $ORDER_ID = 0;

sub store_record_tc
{
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->putasync($key, $json ? encode_json($ref_record)
                              : Storable::freeze($ref_record));
}

I needed to amend store_record to compare json and storable too.

sub store_record
{
    my ($db, $ref_record, $no_sync, $json) = @_;
    $json //= 0;
    $no_sync //= 0;
    $ref_record->{'order_id'} = ++$ORDER_ID;
    my $key = "/order/$ref_record->{'order_id'}";
    $db->db_put($key, $json ? encode_json($ref_record)
                            : Storable::freeze($ref_record));
    $db->db_sync() unless $no_sync;
}

The benchmarking code looks like this.

Benchmark::cmpthese(-1, {
    'json-only-50/50' => sub { json_only($db, $rec_50_50) },
    'freeze-only-50/50' => sub { freeze_only($db, $rec_50_50) },

    'freeze-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1) },
    'freeze-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1) },

    'json-no-sync-50/50' => sub { store_record($db, $rec_50_50, 1, 1) },
    'json-no-sync-50/50-tc' => sub { store_record_tc($hdb, $rec_50_50, 1, 1) },
});

And the results are as follows:

        Rate freeze-no-sync-50/50 json-no-sync-50/50 freeze-no-sync-50/50-tc json-no-sync-50/50-tc freeze-only-50/50 json-only-50/50
freeze-no-sync-50/50     7791/s                   --                -9%                    -39%                  -47%              -59%            -81%
json-no-sync-50/50       8605/s                  10%                 --                    -33%                  -41%              -55%            -79%
freeze-no-sync-50/50-tc 12800/s                  64%                49%                      --                  -13%              -33%            -69%
json-no-sync-50/50-tc   14698/s                  89%                71%                     15%                    --              -23%            -64%
freeze-only-50/50       19166/s                 146%               123%                     50%                   30%                --            -54%
json-only-50/50         41353/s                 431%               381%                    223%                  181%              116%              --

Pedro was right. Tokyo Cabinet is significantly faster than Berkeley DB, at least in this simple benchmark.

Edit: json and no_sync parameter switch has been fixed.

Read Full Post »

Finding Useful Posts

One of the weaknesses of many blogs, including my own, is the difficulty of finding old, useful articles. There are a few ways to find articles written previously, such as the Archives, the categories and the tags. But to be honest, a lot of the stuff I write is only relevant (at best) at the time of publishing. And even I have difficulty finding my useful posts again.

There are several possible solutions, e.g. I could tag pages within delicious as curiousprogrammer/useful. For the moment, I’ve decided to keep a blog highlights page, and list the posts which are useful for me. Later on I might try a more comprehensive index.

Read Full Post »

« Newer Posts - Older Posts »

Follow

Get every new post delivered to your Inbox.