Feeds:
Posts
Comments

Archive for the ‘Programming’ Category

Previously we looked at connecting one producer to one consumer using a unix pipe. If we want to do many to many connections, we can use the AnyEvent Notifier I mentioned in Connecting Software Systems.

Instead of echoing a notification that a work unit has been created to STDOUT I write it to a socket connected to the notifier.

IO::Socket makes this absurdly easy.

use IO::Socket ':crlf';
use constant SHUTDOWN_SOCK_RW => 2;

my $sock = IO::Socket::INET->new(PeerAddr => 'localhost:12345');

for (1..5) {
    my_log "Iteration $_";
    my $filename = create_file($top);
    print $sock "/producer/file-creator/new-file $filename" . CRLF;
    my_log "PRODUCED $filename";
}

$sock->shutdown(SHUTDOWN_SOCK_RW);
$sock->close();

I often use a subject<SPACE>message format to make it easy for clients to filter between important messages.

I’ll demonstrate the consumer next time.

Read Full Post »

The Perl Flip Flop Operator

Mike Taylor dismisses Perl with a pithy reference to a section of its excellent documentation1. For some reason, I mis-remembered that he was complaining about the flip-flop operator rather than context in general.

So, I’ve come to defend the flip-flop operator, and the opposition hasn’t turned up! Oh well, never mind.

In scalar context, “..” returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each “..” operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again.

Scanning Logfiles

So, say your logfile looks something like this:

... 100,000 lines ...
10:22:25.279 The first interesting line
... 30 more interesting lines ...
10:22:25.772 Another interesting line
10:22:25.772 The last interesting line
10:22:25.779 And then this line isn't interesting any more
... 100,000 lines ...

If you specify the beginning timestamp and end timestamp then you will get one uninteresting line which you can strip with head -n-1.

And that is it. Pretty easy eh?

jared@localhost $ cat flip-flop.muse \
> | perl -ne 'print if /^10:22:25.279/ .. /^10:22:25.779/' \
> | head -n-1 \
> | mail jared
10:22:25.279 The first interesting line
... 30 more interesting lines ...
10:22:25.772 Another interesting line
10:22:25.772 The last interesting line

Notes:

In a regex I often use an unescaped period (.) to match a period if it doesn’t matter like here

And for anyone thinking useless use of cat… it’s deliberate.


1. With impressive inconsistency, he later on says that Perl is a contender to be His Favourite Language which is why the alternative title for this post was Why Mike Taylor is not my Favourite Blogger.

Just kidding Mike.

Read Full Post »

In my recent Connecting Software Systems post, I included a teaser:

There is a great trick to connect two processes together using the shell (hint: using pipes) that I’m going to talk about next time

If you need to connect a single producer to a single consumer, you can easily connect them using a unix pipe without changing much of the underlying code. The trick is to write to STDOUT whenever work is available for the consumer. The consumer reads work requests from STDIN.

Script Preamble

The preamble will be mostly common to both the producer and the consumer. I set STDOUT to autoflush to avoid any slowness due to buffering.

use 5.010;

use strict;
use warnings;

use File::Path 'make_path';
use IO::Handle;
use POSIX 'strftime';

use File::Slurp;

STDOUT->autoflush(1);

sub hms
{
    return strftime('%H:%M:%S', localtime(time()));
}

sub my_log
{
    print '[ ', hms(), ' ] : ', @_, "\n";
}

The Producer

The work files are created in /var/tmp/pro-co. Hopefully create_file would be doing something a bit more useful in a real application!

sub create_file
{
    my $top = shift;

    my $filename = $top . '/' . int(rand(1_000_000)) . '.txt';
    my $content = rand(); # or something more useful...
    # simulate taking some time for processing
    sleep rand(5);
    write_file($filename, $content);
    return $filename;
}

# --

my $top = '/var/tmp/pro-co';

make_path($top);

for (1..5) {
    my_log "Iteration $_";
    my $filename = create_file($top);
    my_log "PRODUCED $filename";
}

The Consumer

The consumer removes the timestamp from the producer output. Then, any line that indicates a unit of work (marked in the example by PRODUCED) is passed to process_file().

sub process_file
{
    my $file = shift;
    my_log "Processing file [$file]";
    # File processing logic here ...
}

# --

while (defined(my $line = <STDIN>)) {
    # Strip the timestamp
    $line =~ s{\[\s[0-9:]+\s\]\s:\s}{};
    chomp $line;

    if ($line =~ /^PRODUCED\s+(.+)/) {
        process_file($1);
    } else {
        my_log "FROM PRODUCER [$line]";
    }
}

The Example Run

As you can see from the output below, the consumer was able to process the work as it became available.

$ ./producer.pl | tee producer.log | ./consumer.pl
[ 09:22:20 ] : FROM PRODUCER [Iteration 1]
[ 09:22:20 ] : Processing file [/var/tmp/pro-co/944319.txt]
[ 09:22:20 ] : FROM PRODUCER [Iteration 2]
[ 09:22:23 ] : Processing file [/var/tmp/pro-co/141765.txt]
[ 09:22:23 ] : FROM PRODUCER [Iteration 3]
[ 09:22:27 ] : Processing file [/var/tmp/pro-co/463599.txt]
[ 09:22:27 ] : FROM PRODUCER [Iteration 4]
[ 09:22:28 ] : Processing file [/var/tmp/pro-co/423055.txt]
[ 09:22:28 ] : FROM PRODUCER [Iteration 5]
[ 09:22:28 ] : Processing file [/var/tmp/pro-co/233909.txt]
$ cat producer.log
[ 09:22:20 ] : Iteration 1
[ 09:22:20 ] : PRODUCED /var/tmp/pro-co/944319.txt
[ 09:22:20 ] : Iteration 2
[ 09:22:23 ] : PRODUCED /var/tmp/pro-co/141765.txt
[ 09:22:23 ] : Iteration 3
[ 09:22:27 ] : PRODUCED /var/tmp/pro-co/463599.txt
[ 09:22:27 ] : Iteration 4
[ 09:22:28 ] : PRODUCED /var/tmp/pro-co/423055.txt
[ 09:22:28 ] : Iteration 5
[ 09:22:28 ] : PRODUCED /var/tmp/pro-co/233909.txt

Read Full Post »

So Why is This Post on Ironman Perl?

Okay, I waded in too fast on Gabor’s post. Sorry Gabor. And you may be wondering who died and made me the Ironman Perl police1. No-one, but riddle me this: why is this WordPress related post on Perl Ironman?

It doesn’t have any perl related tags and neither was it posted in the perl category. I thought that perhaps there was a sneaky mention of Perl in the post. But no, find in page doesn’t pick anything up.

Apart from the blog title that is – Perls of Wisdom. Is the blog title included in the content that is checked for the word perl2?


1. No-one, I promise. In the next post hopefully I’ll be back to the occasionally useful perl snippets.

2. Although I guess it is reasonable not to think of this. Perl is not a dictionary word so you probably wouldn’t expect it in the name of a blog.

Read Full Post »

What has been on my mind recently is if I write some software to do a task, say process some files and dump some other files to disk, how can that process inform another process that the files are ready?

Using Shell

There are millions of ways to do this. The easiest is probably to wrap both processes in a shell script:

create_files
process_created_files

The magic of unix means that process_created_files is triggered as soon as create_files exits.

This has a couple of problems. What if there were any errors in create_files? What if create_files takes a long time, and we want to start processing as soon as the first file has been created?

There is a great trick to connect two processes together using the shell (hint: using pipes) that I’m going to talk about next time. For now, let’s focus on the more heavy-weight options.

Using a Filewatcher

There are a few modules on CPAN that make it easy to build a filewatcher. A couple of examples are:

Maybe it is just me, but watching for file changes seems a bit 1970s, so moving swiftly on…

Using Third-Party Mechanisms

For connecting 1 producer to 1 consumer, I really like message queues. Apache has a crufy enterprise protocol called AMQP

RestMQ is a queue built on top of Redis.

Or you could use the Pub/Sub mechanism built into Redis directly.

ZeroMQ pretty much solves my problem without having to drop down to raw sockets.

# —

Unfortunately, none of these options are available to me. That brings me on to the raw socket options.

Using Raw Sockets

If the producer provides a server for interested consumers to connect to, you can have a bunch of consumers listening for events. Even better, you can add one layer of indirection and allow for many to many interactions.

An AnyEvent Notifier

The core of this code was taken from my Emulating POSIX Signals post.

use 5.010;

use strict;
use warnings;

package Notifier;

use constant CTRL_D => 4;
use constant DEBUG => $ENV{NOTIFIER_DEBUG};

my $cr = "\015\012";

use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Socket;

sub new
{
    my $class = shift;
    my $self = {
        pid => 0,
        watchers => {},
    };
    bless $self, $class;
    return $self;
}

sub disconnect_socket
{
    my ($self, $handle, $pid) = @_;
    $handle->on_drain(sub {
        say "disconnecting PID $pid...";
        $handle->destroy();
    });
    delete $self->{watchers}{$pid};
}

sub get_other_handles
{
    my ($self, $pid) = @_;
    return map { $self->{watchers}{$_}{handle} }
           grep { $_ != $pid } keys %{$self->{watchers}};
}

sub broadcast_line
{
    my ($self, $handles_ref, $line) = @_;
    foreach my $handle (@$handles_ref) {
        $handle->push_write($line . ${cr});
    }
}

sub watch_socket
{
    my ($self, $sock, $host, $port) = @_;

    # Closure variables
    my $pid = ++$self->{pid};

    my $handle; $handle = AnyEvent::Handle->new(
        fh => $sock,
        on_error => sub {
            say "Error: $! (PID $pid)";
            delete $self->{watchers}{$pid};
            $handle->destroy();
        },
        on_eof => sub {
            say "Disconnected (PID $pid)";
            delete $self->{watchers}{$pid};
            $handle->destroy();
        },
        on_read => sub {
            my $buffer = $handle->rbuf();
            $handle->rbuf() = '';

            if (length($buffer) == 1 and ord($buffer) == CTRL_D) {
                $handle->push_write('Received EOF.  ' .
                                    "Closing connection...${cr}");
                $self->disconnect_socket($handle, $pid);
            } else {
                my @handles = $self->get_other_handles($pid);
                foreach my $line (split /\r?\n/, $buffer) {
                    say "$host/$port : $line";
                    if (lc($line) =~ /quit|exit/) {
                        $self->disconnect_socket($handle, $pid);
                        return;
                    } else {
                        $self->broadcast_line(\@handles, $line);
                    }
                }
            }
        }
    );

    $handle->keepalive();
    $self->{watchers}{$pid} = { handle => $handle };
    $handle->push_write("Connected.  (PID $pid)${cr}");
}

sub prepare_handler
{
    my ($fh, $host, $port) = @_;
    DEBUG && warn "Listening on $host:$port\n";
}

sub _accept_handler
{
    my $self = shift;

    return sub {
        my ($sock, $peer_host, $peer_port) = @_;

        DEBUG && warn "Accepted connection from $peer_host:$peer_port\n";
        if (! $sock) {
            warn '$sock undefined' . "\n";
            return;
        }

        $self->watch_socket($sock, $peer_host, $peer_port);
    };
}

sub start_listen
{
    my ($self, $host, $port) = @_;

    $self->{server} = tcp_server($host,
                                 $port,
                                 $self->_accept_handler(),
                                 \&prepare_handler);
}

package main;

my $host = undef;
my $port = 12345;

my $kernel = Notifier->new();
$kernel->start_listen($host, $port);

AE::cv->recv();

Read Full Post »

Systems Monitoring

I almost missed this excellent post on system monitoring by (I think) Yanick Champoux. It’s a clever idea. The monitor has a bunch of TAP1 tests that check various parts of the system – e.g. whether diskspace on a particular partition has breached a certain threshold.

The tests are all run with Smolder, "a web-based continuous integration smoke server". This means that the results can be made available via email, RSS or Smolder’s web interface. Very neat.


1. Test Anything Protocol

Read Full Post »

IRC vs Planet Perl Iron Man

Mark Keating left a thoughtful comment on my throwaway post about Planet Perl Iron Man (PPIM).

IRC

I’m pretty introverted so IRC isn’t an ideal medium for me – it’s a direct conversation. Conversely, having a blog posted on PPIM sometime gets the message1 out to the important people and there is little need to interact with people directly ;) For me, PPIM is better than IRC.

Malicious or Careless Users


“the code does assume that we’re all honest and nice people and that we’ll use good tags and the correct date”

In this case, I’d assume it was user error rather than a deliberate and somewhat annoying attempt to monopolise the top of PPIM. Having said that, whenever I’ve needed to develop a webfacing app, I’ve noticed that assuming your users are going to play nice is the wrong thing to do. If even one of your thousands of users is malicious and effective, it’s going to ruin your day.

If I was responsible for implementing Planet Ironman, I wouldn’t have planned for a future dated post either. But having seen it, it should be fairly simple to discard any post with a date more than 1 day into the future.


1. In this case, the message wasn’t really important so finding the appropriate delivery channel didn’t really matter.

Read Full Post »

What Is Going On With Ironman Perl?

Herbert Breunung’s The Pearl Metaphor has been at the top for days. Did he manage to break it just by writing a post with a future date?

Read Full Post »

Why Micro-Benchmark?

I have had some feedback along the lines of why write a micro-benchmark such as Tokyo Cabinet vs Berkeley DB. Everyone knows that:

"premature optimization is the root of all evil"

and

“The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.”

That kinda misses the point. Micro-benchmarking is nothing to do with optimization. Its purpose is to avoid premature pessimization due to choosing the wrong technology.

I have a project to replace an existing system where the backing store is by far the biggest bottleneck. The new system needs to be signicantly faster than the existing system and I have a lot of flexibility in choosing the backing store.

The two alternatives that immediately come to mind are.

  1. create a nice abstraction layer that can easily switch between Perl DBI, Redis, Tokyo Cabinet and a bunch of other alternatives… or
  2. micro-benchmark them with something similar to my read/write data profile.

Time pressure unfortunately compels me to choose #2. Maybe it’s time to take a second look at KiokuDB, for inspiration at least, even if I can’t use it directly.

Read Full Post »

Perl And Type Checking

(aka known as which language am I using again?)

Dave Rolsky has a new post on perl 5 overloading. It’s fairly informative, but it contains this little gem (emphasis mine):

If you don’t care about defensive programming, then Perl 5′s overloading is perfect, and you can stop reading now. Also, please let me know so I can avoid working on code with you, thanks.

Defensive programming, for the purposes of this entry, can be defined as "checking sub/method arguments for sanity".

Blanket statements like this really get my gripe up. Let’s have a look at defensive argument checking in Perl taken to its illogical conclusion.

Defensive Primitive Type Checking

use strict;
use warnings;

use Carp;
use Scalar::Util 'looks_like_number';

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not a number" unless looks_like_number($age);
    return $age >= 18;
}

print can_legally_drink('x'), "\n";

And fantastic news! can_legally_drink correctly detected that my argument isn’t a number.

x is not a number at age.pl line 10
        main::can_legally_drink('x') called at age.pl line 14

But hang on a minute. Not all integers are ages. Surely we want to check if a real age type was passed in.

Checking For A ‘Real’ Type

My stripped down defensive age type might look something like this.

package age;

use Carp;
use Scalar::Util 'looks_like_number';

sub isa_age
{
    my $arg = shift;
    return ref($arg) and blessed $arg and $arg->isa('age');
}

sub new {
    my ($class, $years) = @_;
    croak "$years is not a number" unless looks_like_number($years);
    bless { years => $years }, $class;
}

sub years {
    return $_[0]->{'years'}
}

sub less_than {
    my ($self, $other) = @_;
    croak "$other is not an age" unless isa_age($other);
    return $self->years() < $other->years();
}

And then my calling code can look like this:

package main;

sub can_legally_drink
{
    my $age = shift;
    croak "$age is not an age" unless $age->isa('age');
    return ! $age->less_than(age->new(18));
}

print can_legally_drink(age->new(18)), "\n";
print can_legally_drink(18), "\n";

And woohoo, the second line throws an error as I wanted.

Actually, I don’t write Perl like this. Dave, you probably want to avoid working on code with me, thanks.

Moose

To be fair, Rolsky is talking his own book. Moose has a bunch of stuff that handles all this type checking malarky nice and cleanly. If you’re building something big in Perl, you should take a look.

But if you really care about types, I mean defensive programming that much, you could use a statically typed language instead and then you even get fast code thrown in for free.

Read Full Post »

« Newer Posts - Older Posts »

Follow

Get every new post delivered to your Inbox.