Archive for November, 2010

The Perl Flip Flop Operator

Mike Taylor dismisses Perl with a pithy reference to a section of its excellent documentation1. For some reason, I misremembered and thought he was complaining about the flip-flop operator rather than about context in general.

So, I’ve come to defend the flip-flop operator, and the opposition hasn’t turned up! Oh well, never mind.

This is how perldoc perlop describes it:

In scalar context, “..” returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each “..” operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again.
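
A minimal standalone illustration (my own sketch, not from the original post): print lines 3 to 5 of whatever arrives on STDIN. The flip-flop is false until the left operand fires on line 3, stays true up to and including line 5, and then goes false again.

use strict;
use warnings;

while (<STDIN>) {
    # $. is the current input line number.  The flip-flop switches on when
    # $. == 3 is true and switches off after the line where $. == 5 is true.
    print if $. == 3 .. $. == 5;
}

Feeding it seq 1 10 prints 3, 4 and 5.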

Scanning Logfiles

So, say your logfile looks something like this:

... 100,000 lines ...
10:22:25.279 The first interesting line
... 30 more interesting lines ...
10:22:25.772 Another interesting line
10:22:25.772 The last interesting line
10:22:25.779 And then this line isn't interesting any more
... 100,000 lines ...

If you give the flip-flop the beginning and end timestamps as its operands, the output includes one uninteresting trailing line, which you can strip with head -n-1.

And that is it. Pretty easy, eh?

jared@localhost $ cat flip-flop.muse \
> | perl -ne 'print if /^10:22:25.279/ .. /^10:22:25.779/' \
> | head -n-1 \
> | mail jared
10:22:25.279 The first interesting line
... 30 more interesting lines ...
10:22:25.772 Another interesting line
10:22:25.772 The last interesting line
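
An aside that isn’t in the original one-liner: the flip-flop also gives you a way to drop that final line without head. In scalar context the value it returns for true is a sequence number (false is the empty string), and on the line where the right operand matches, the sequence number has the string "E0" appended, so you can exclude the endpoint directly. Something like this should do it:

perl -ne 'my $r = /^10:22:25.279/ .. /^10:22:25.779/; print if $r && $r !~ /E0$/' flip-flop.muse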

Notes:

In a regex I often use an unescaped period (.) to match a literal period when the distinction doesn’t matter, as it doesn’t here.

And for anyone thinking “useless use of cat”… it’s deliberate.


1. With impressive inconsistency, he later says that Perl is a contender to be His Favourite Language, which is why the alternative title for this post was Why Mike Taylor is not my Favourite Blogger.

Just kidding, Mike.

In my recent Connecting Software Systems post, I included a teaser:

There is a great trick to connect two processes together using the shell (hint: using pipes) that I’m going to talk about next time

If you need to connect a single producer to a single consumer, you can easily connect them using a unix pipe without changing much of the underlying code. The trick is to write to STDOUT whenever work is available for the consumer. The consumer reads work requests from STDIN.

Script Preamble

The preamble is mostly common to both the producer and the consumer. I set STDOUT to autoflush so that each line reaches the consumer as soon as it is printed, rather than sitting in a buffer.

use 5.010;

use strict;
use warnings;

use File::Path 'make_path';
use IO::Handle;
use POSIX 'strftime';

use File::Slurp;

STDOUT->autoflush(1);

# Current time as HH:MM:SS
sub hms
{
    return strftime('%H:%M:%S', localtime(time()));
}

# Print a timestamped line to STDOUT: this is what the consumer will read
sub my_log
{
    print '[ ', hms(), ' ] : ', @_, "\n";
}

The Producer

The work files are created in /var/tmp/pro-co. Hopefully create_file would be doing something a bit more useful in a real application!

sub create_file
{
    my $top = shift;

    my $filename = $top . '/' . int(rand(1_000_000)) . '.txt';
    my $content = rand(); # or something more useful...
    # simulate taking some time for processing
    sleep rand(5);
    write_file($filename, $content);
    return $filename;
}

# --

my $top = '/var/tmp/pro-co';

make_path($top);

for (1..5) {
    my_log "Iteration $_";
    my $filename = create_file($top);
    my_log "PRODUCED $filename";
}

The Consumer

The consumer removes the timestamp from the producer output. Then, any line that indicates a unit of work (marked in the example by PRODUCED) is passed to process_file().

sub process_file
{
    my $file = shift;
    my_log "Processing file [$file]";
    # File processing logic here ...
}

# --

while (defined(my $line = <STDIN>)) {
    # Strip the timestamp
    $line =~ s{\[\s[0-9:]+\s\]\s:\s}{};
    chomp $line;

    if ($line =~ /^PRODUCED\s+(.+)/) {
        process_file($1);
    } else {
        my_log "FROM PRODUCER [$line]";
    }
}

The Example Run

As you can see from the output below, the consumer was able to process the work as it became available.

$ ./producer.pl | tee producer.log | ./consumer.pl
[ 09:22:20 ] : FROM PRODUCER [Iteration 1]
[ 09:22:20 ] : Processing file [/var/tmp/pro-co/944319.txt]
[ 09:22:20 ] : FROM PRODUCER [Iteration 2]
[ 09:22:23 ] : Processing file [/var/tmp/pro-co/141765.txt]
[ 09:22:23 ] : FROM PRODUCER [Iteration 3]
[ 09:22:27 ] : Processing file [/var/tmp/pro-co/463599.txt]
[ 09:22:27 ] : FROM PRODUCER [Iteration 4]
[ 09:22:28 ] : Processing file [/var/tmp/pro-co/423055.txt]
[ 09:22:28 ] : FROM PRODUCER [Iteration 5]
[ 09:22:28 ] : Processing file [/var/tmp/pro-co/233909.txt]
$ cat producer.log
[ 09:22:20 ] : Iteration 1
[ 09:22:20 ] : PRODUCED /var/tmp/pro-co/944319.txt
[ 09:22:20 ] : Iteration 2
[ 09:22:23 ] : PRODUCED /var/tmp/pro-co/141765.txt
[ 09:22:23 ] : Iteration 3
[ 09:22:27 ] : PRODUCED /var/tmp/pro-co/463599.txt
[ 09:22:27 ] : Iteration 4
[ 09:22:28 ] : PRODUCED /var/tmp/pro-co/423055.txt
[ 09:22:28 ] : Iteration 5
[ 09:22:28 ] : PRODUCED /var/tmp/pro-co/233909.txt

So Why is This Post on Ironman Perl?

Okay, I waded in too fast on Gabor’s post. Sorry, Gabor. And you may be wondering who died and made me the Ironman Perl police1. No-one, but riddle me this: why is this WordPress-related post on Perl Ironman?

It doesn’t have any Perl-related tags, nor was it posted in the perl category. I thought that perhaps there was a sneaky mention of Perl in the post, but no: find-in-page doesn’t pick anything up.

Apart from the blog title, that is: Perls of Wisdom. Is the blog title included in the content that is checked for the word perl2?


1. No-one, I promise. In the next post I’ll hopefully be back to the occasionally useful Perl snippets.

2. Although I guess it is reasonable not to think of this: Perl is not a dictionary word, so you probably wouldn’t expect it in the name of a blog.

Gaming Ironman Perl

I subscribed to Planet Perl Iron Man in order to read posts relating to Perl.

From the original announcement:

The rules are very simple: you blog, about Perl. Any aspect of Perl you like.

And Ironman makes some effort to enforce that, so by subscribing I should get what I am after: unrelated posts are filtered out.

Of course it is very easy to game. But in traditional perlish style, we trust people not to game the system because we asked nicely, not because we have shotguns.

“Frankly I have no idea how does this relate to Perl but I need it in the text so the IronMan will pick up the post.”

I haven’t either. So why did you force this post, which is completely unrelated to Perl, onto Ironman Perl?

/Grumpy

What has been on my mind recently is this: if I write some software to do a task, say processing some files and dumping some other files to disk, how can that process inform another process that the files are ready?

Using Shell

There are millions of ways to do this. The easiest is probably to wrap both processes in a shell script:

create_files
process_created_files

The magic of unix means that process_created_files is triggered as soon as create_files exits.

This has a couple of problems. What if there are errors in create_files? What if create_files takes a long time, and we want to start processing as soon as the first file has been created?

There is a great trick to connect two processes together using the shell (hint: using pipes) that I’m going to talk about next time. For now, let’s focus on the more heavy-weight options.

Using a Filewatcher

There are a few modules on CPAN that make it easy to build a filewatcher.

Maybe it is just me, but watching for file changes seems a bit 1970s, so moving swiftly on…

Using Third-Party Mechanisms

For connecting one producer to one consumer, I really like message queues. There is a crufty enterprise protocol called AMQP (Apache’s Qpid project is one implementation).

RestMQ is a queue built on top of Redis.

Or you could use the Pub/Sub mechanism built into Redis directly.
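
To give a flavour, here is a rough sketch using the Redis module from CPAN (this is mine, not from the original post; the channel name and file path are illustrative, and a Redis server is assumed on localhost):

use strict;
use warnings;
use Redis;

# Producer side: announce each finished file on a channel.
my $redis = Redis->new(server => 'localhost:6379');
$redis->publish('files-ready', '/var/tmp/pro-co/944319.txt');

# Consumer side (a separate process): react to each announcement.
my $listener = Redis->new(server => 'localhost:6379');
$listener->subscribe('files-ready', sub {
    my ($message, $topic) = @_;
    print "Ready to process: $message\n";
});
$listener->wait_for_messages(1) while 1;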

ZeroMQ pretty much solves my problem without having to drop down to raw sockets.

# —

Unfortunately, none of these options is available to me. That brings me to the raw socket options.

Using Raw Sockets

If the producer provides a server for interested consumers to connect to, you can have a bunch of consumers listening for events. Even better, you can add one layer of indirection and allow for many-to-many interactions.

An AnyEvent Notifier

The core of this code was taken from my Emulating POSIX Signals post.

use 5.010;

use strict;
use warnings;

package Notifier;

use constant CTRL_D => 4;
use constant DEBUG => $ENV{NOTIFIER_DEBUG};

my $cr = "\015\012";   # CRLF line terminator sent to clients

use AnyEvent;
use AnyEvent::Handle;
use AnyEvent::Socket;

sub new
{
    my $class = shift;
    my $self = {
        pid => 0,
        watchers => {},
    };
    bless $self, $class;
    return $self;
}

# Let any pending output drain to the client, then destroy the handle.
sub disconnect_socket
{
    my ($self, $handle, $pid) = @_;
    $handle->on_drain(sub {
        say "disconnecting PID $pid...";
        $handle->destroy();
    });
    delete $self->{watchers}{$pid};
}

# Return the handles of every watcher except the one identified by $pid.
sub get_other_handles
{
    my ($self, $pid) = @_;
    return map { $self->{watchers}{$_}{handle} }
           grep { $_ != $pid } keys %{$self->{watchers}};
}

# Send a CRLF-terminated line to each of the given handles.
sub broadcast_line
{
    my ($self, $handles_ref, $line) = @_;
    foreach my $handle (@$handles_ref) {
        $handle->push_write($line . ${cr});
    }
}

# Wrap an accepted socket in an AnyEvent::Handle and register it as a watcher.
sub watch_socket
{
    my ($self, $sock, $host, $port) = @_;

    # Closure variables
    my $pid = ++$self->{pid};

    my $handle; $handle = AnyEvent::Handle->new(
        fh => $sock,
        on_error => sub {
            say "Error: $! (PID $pid)";
            delete $self->{watchers}{$pid};
            $handle->destroy();
        },
        on_eof => sub {
            say "Disconnected (PID $pid)";
            delete $self->{watchers}{$pid};
            $handle->destroy();
        },
        on_read => sub {
            my $buffer = $handle->rbuf();
            $handle->rbuf() = '';

            if (length($buffer) == 1 and ord($buffer) == CTRL_D) {
                $handle->push_write('Received EOF.  ' .
                                    "Closing connection...${cr}");
                $self->disconnect_socket($handle, $pid);
            } else {
                my @handles = $self->get_other_handles($pid);
                foreach my $line (split /\r?\n/, $buffer) {
                    say "$host/$port : $line";
                    if (lc($line) =~ /quit|exit/) {
                        $self->disconnect_socket($handle, $pid);
                        return;
                    } else {
                        $self->broadcast_line(\@handles, $line);
                    }
                }
            }
        }
    );

    $handle->keepalive(1);   # enable TCP keepalive on this connection
    $self->{watchers}{$pid} = { handle => $handle };
    $handle->push_write("Connected.  (PID $pid)${cr}");
}

sub prepare_handler
{
    my ($fh, $host, $port) = @_;
    DEBUG && warn "Listening on $host:$port\n";
}

sub _accept_handler
{
    my $self = shift;

    return sub {
        my ($sock, $peer_host, $peer_port) = @_;

        DEBUG && warn "Accepted connection from $peer_host:$peer_port\n";
        if (! $sock) {
            warn '$sock undefined' . "\n";
            return;
        }

        $self->watch_socket($sock, $peer_host, $peer_port);
    };
}

sub start_listen
{
    my ($self, $host, $port) = @_;

    $self->{server} = tcp_server($host,
                                 $port,
                                 $self->_accept_handler(),
                                 \&prepare_handler);
}

package main;

my $host = undef;
my $port = 12345;

my $kernel = Notifier->new();
$kernel->start_listen($host, $port);

AE::cv->recv();
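
To exercise it, telnet localhost 12345 from a couple of terminals works: a line typed in one session is broadcast to the others. A minimal Perl client along these lines should also do the job (this sketch is mine, not part of the original post; host and port match the defaults above):

use strict;
use warnings;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'localhost',
    PeerPort => 12345,
    Proto    => 'tcp',
) or die "Cannot connect to notifier: $!";

# Print every line the notifier broadcasts to us.
while (defined(my $line = <$sock>)) {
    $line =~ s/\r?\n\z//;
    print "NOTIFIED: $line\n";
}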

Clever Emacs Lisp

Here’s another clever trick. Don’t embed the data together with the logic, as I do with my directory aliases. Save it in a separate file and load it in using (read).

Okay, it’s not that clever: I do it all the time in Perl using YAML and what-not. What is slightly concerning, though, is that I didn’t automatically translate the knowledge gained from Perl to Lisp. What was I thinking?

The code fragment is beautiful:

(defvar variable-name)

(with-temp-buffer
  (insert-file-contents-literally <filename>)
  (setq variable-name (read (current-buffer))))
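
For comparison, the Perl habit mentioned above might look something like this (a sketch; the file name is made up and YAML is just one of the “what-not” options):

use strict;
use warnings;
use YAML qw(LoadFile);

# Keep the data out of the code: load the aliases from a separate file.
my $aliases = LoadFile('directory-aliases.yml');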

Systems Monitoring

I almost missed this excellent post on system monitoring by (I think) Yanick Champoux. It’s a clever idea. The monitor has a bunch of TAP1 tests that check various parts of the system; for example, whether disk usage on a particular partition has breached a certain threshold.
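
A check like that is easy to sketch with vanilla TAP tooling. The following is my guess at the shape of such a test, not code from the post (the partition and threshold are made up):

use strict;
use warnings;
use Test::More tests => 1;

# Read the "Capacity" column of df -P for the partition we care about.
my @df = qx{df -P /var};
my ($capacity) = (split /\s+/, $df[1])[4];   # e.g. "42%"
$capacity =~ s/%//;

cmp_ok($capacity, '<=', 90, '/var is no more than 90% full');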

The tests are all run with Smolder, "a web-based continuous integration smoke server". This means that the results can be made available via email, RSS or Smolder’s web interface. Very neat.


1. Test Anything Protocol

IRC vs Planet Perl Iron Man

Mark Keating left a thoughtful comment on my throwaway post about Planet Perl Iron Man (PPIM).

IRC

I’m pretty introverted, so IRC isn’t an ideal medium for me: it’s a direct conversation. Conversely, having a blog post appear on PPIM sometimes gets the message1 out to the important people, and there is little need to interact with people directly ;) For me, PPIM is better than IRC.

Malicious or Careless Users


“the code does assume that we’re all honest and nice people and that we’ll use good tags and the correct date”

In this case, I’d assume it was user error rather than a deliberate and somewhat annoying attempt to monopolise the top of PPIM. Having said that, whenever I’ve needed to develop a web-facing app, I’ve noticed that assuming your users are going to play nice is the wrong thing to do. If even one of your thousands of users is malicious and effective, it’s going to ruin your day.

If I were responsible for implementing Planet Ironman, I wouldn’t have planned for a future-dated post either. But having seen it, it should be fairly simple to discard any post dated more than a day into the future.
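
Something along these lines would do it (a sketch, not the aggregator’s actual code; the ISO 8601 date format is an assumption based on what Atom feeds provide):

use strict;
use warnings;
use Time::Piece;

# Discard any feed entry whose publication date is more than a day in the future.
sub too_far_in_future
{
    my $published = shift;    # e.g. "2010-11-30T09:22:20Z"
    my $post_time = Time::Piece->strptime($published, '%Y-%m-%dT%H:%M:%SZ');
    return $post_time->epoch > time() + 24 * 60 * 60;
}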


1. In this case, the message wasn’t really important so finding the appropriate delivery channel didn’t really matter.
