Feeds:
Posts
Comments

Archive for February, 2011

Replacing sed

I’ve spoken about processing logfiles with perl previously. Occasionally though, I still reach for sed.

Say I have a logfile that looks like this:

[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2
... 1000s of lines
[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2
... 1000s of lines

Picking out lines following a pattern is easy with sed - p prints the current match and n takes the next line.

$ < log.txt sed -n '/somefunc()/ {p;n;p;n;p}'
[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2
[ <timestamp> ] : somefunc()
[ <timestamp> ] : interesting line 1
[ <timestamp> ] : interesting line 2

My first attempt to replace that with perl looks a bit ugly

< log.txt \
perl -ne 'if (/somefunc\(\)/) {print; print scalar(<>) for (1..2)}'

I'm not that happy with the module I came up with to hide the messiness either.

package Logfiles;

require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw(process);

use Carp;

sub process (&$;$)
{
    my ($sub, $regex, $lines) = @_;
    $lines ||= 0;
    return unless /$regex/;
    if (! $lines) {
        $sub->($_);
    } else {
        croak "process() arg 3 not ref ARRAY" unless ref($lines) eq 'ARRAY';
        my $line = 0;
        my @lines = @$lines;
        while (1) {
            if ($lines[0] == $line) {
                $sub->($_);
                shift @lines;
            }
            ++$line;
            last if ($line > $lines[0] or (not @lines));
            $_ = <>;
        }
    }
}

1;

But at least my typical spelunking looks a little cleaner now.

< log.txt perl -MLogfiles=process -ne \
    'process { print } qr/somefunc\(\)/, [0..2]'

Any suggestions on how to improve this (without reverting to sed)?

Read Full Post »

Moose Types

Rants about generalisations notwithstanding, I’m a fan of typeful programming (I’m sure I’d love Ada). For a script that will be moderately complex, I like to sit down and think about the types I’m going to use before I start.

Any library that will enable me to specify my types more precisely and concisely is obviously a win.

And speaking of Moose…

Moose has a bunch of methods to specify your types and a built-in type hierarchy available that you can build off.

Abusing an example I’ve used before:

use Moose;
use Moose::Util::TypeConstraints;
use MooseX::Params::Validate;

subtype 'LegalDrinkingAge'
    => as 'Int'
    => where { $_ >= 18 }
;

coerce 'LegalDrinkingAge'
    => from 'Int'
    => via { 1 }
;

sub can_legally_drink
{
    my $age = pos_validated_list(
        \@_,
        { isa => 'LegalDrinkingAge' },
    );

    return 1;
}

print can_legally_drink(18), "\n";
print can_legally_drink(17), "\n";

Checking for a LegalDrinkingAge type here is obviously the wrong thing to do, but for the purposes of the example it will do.

The resulting error is fine, if a little ugly.

$ perl5.10.1 moose-types.pl
1
Parameter #1 ("17") to main::can_legally_drink did not pass the 'checking type constraint for LegalDrinkingAge' callback
 at /u/packages/lib/perl5/site_perl/5.10.1/MooseX/Params/Validate.pm line 168
        MooseX::Params::Validate::pos_validated_list('ARRAY(0x976ce40)', 'HASH(0x911b6f8)') called at moose-types.pl line 24
        main::can_legally_drink(17) called at moose-types.pl line 33

Read Full Post »

Zawinski not withstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions1.

Why am I thinking about this now?

So, I was asked to review a perl script and it was pretty good, apart from the fact that the main regex2 didn’t quite match all the expected patterns.

The line to match was:

<some_tag>quickbrownfox</some_tag>

Other valid values within the tags included quick,brown,fox or quick brown fox.

And the regex in the script was:

m{\<some_tag\>(\w+)\<\/some_tag\>}

As you can see, there were also a few unnecessary backslashes.

I suggested [^<]+ instead of \w+ with my reasons.

The developer changed the regex to:

m{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>}

Okay slightly better, but it still didn't match all the possibilities.

I pointed out a valid pattern it didn't match and asked again for [^<]+.

This was the result:

m{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>}

Okay, fine. The \s space matchers are redundant, but at least it covers everything.

So basically, if you find yourself writing a bunch of regex regularly, and you don't really understand it, you could do worse than read Friedl's book.


1. Not an affiliate link

2. I know... "parsing" xml with regex will unleash cuthulu or something. It's not my script.

Read Full Post »

Living Without Emacs

I have integrated emacs into most parts of my daily workflow. In a typical day I might need to:

Some of these I have aliases for in my shell already.

A number of database GUIs have template functionality built-in or I could probably substitute the sql templates with scripts if I was sufficiently creative with the command line args.

And if I kept my TODO/Notes list in a wiki, it might be generally useful to the rest of my team.

Yes, thinking about it, I could live without Emacs. But life would be much less pleasant.

Read Full Post »

Follow

Get every new post delivered to your Inbox.