Archive for the ‘Perl’ Category

IPC::ConcurrencyLimit

(This is just a note for myself).

IPC::ConcurrencyLimit is a handy module for implementing a number of concurrency patterns.

Steffen Mueller mentioned it in a comment on my blog back in 2011. Since then, there have been a couple of articles about it on the Booking.com dev blog.
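To remind myself what "limiting concurrency" means in practice, here is a rough sketch of the general lock-file idea using only core Perl. It is not the module's API or its implementation; claim_slot, the slot file names and the temporary directory are all names I made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: try to claim one of $max_procs slots under $dir.
# Returns (slot number, filehandle) on success, or an empty list if every
# slot is taken. The lock lasts for as long as the filehandle stays open.
sub claim_slot {
    my ($dir, $max_procs) = @_;
    for my $slot (1 .. $max_procs) {
        open(my $fh, '>>', "$dir/$slot.lock") or die "open failed: $!";
        return ($slot, $fh) if flock($fh, LOCK_EX | LOCK_NB);
        close $fh;
    }
    return;
}

my $dir = tempdir(CLEANUP => 1);   # stand-in for a real lock directory
my ($slot, $lock) = claim_slot($dir, 2);
if (defined $slot) {
    print "running in slot $slot\n";
    # ... do the work; the lock goes away when $lock is closed ...
} else {
    print "limit reached, exiting\n";
}
```

A cron job that must never overlap with itself would use a limit of 1 and simply exit when no slot is free.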

Growing Perl

Chris Wellons has an interesting comment on Guy Steele’s classic Growing a Language which made me think of Perl:

  • The point about a more mature version of a language failing vs an earlier version can’t be right. For example, if Perl 4 and Perl 5 were released at the same time, which would people choose?
  • On the other hand, Perl 5 obviously would not exist without Perl 4, because there would have been nothing to evolve from. I think I saw a speech by Larry where he said he had tried to lay grass down (e.g. symbol table hacking) for others to turn into sidewalks (e.g. object orientation).
  • Perl 5 is nicely designed for evolution. Take the Try::Tiny module for example. How many commercially acceptable languages could you add Try/Catch/Finally to if it wasn’t already baked into the language?
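To make the last point concrete, here is a minimal sketch of how block-style try/catch can be bolted on with nothing more than sub prototypes and eval. It is a toy under my own assumptions, not Try::Tiny's code: it ignores finally, list context and most of the edge cases the real module handles.

```perl
use strict;
use warnings;

# catch just hands its block back so that try can receive it as an argument.
sub catch (&;@) {
    my ($handler, @rest) = @_;
    return ($handler, @rest);
}

# try runs the first block; if it dies, the error is passed to the handler
# both as $_ and as the first argument.
sub try (&;@) {
    my ($body, $handler) = @_;
    my $value;
    my $ok = eval { $value = $body->(); 1 };
    return $value if $ok;
    my $error = $@;
    return unless $handler;
    local $_ = $error;
    return $handler->($error);
}

my $message = try { die "boom\n" } catch { "caught: $_" };
print $message;

my $fine = try { 42 } catch { 0 };
print "$fine\n";
```

The (&;@) prototype is what lets a bare block stand in for an anonymous sub, which is the "grass" that makes this kind of syntax extension possible.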

Perl “Not Generators”

Reading through the Generators page on the Python wiki inspired me to knock up something almost completely unlike generators using closures (something Perl has that Python hasn’t…) just for fun.

Edit: correction, thanks Bernhard.

Standard preamble…

use strict;
use warnings;

The count() not-generator

sub count
{
    my $cnt = -1;
    return sub {
        return ++$cnt;
    };
}

Helper function to make it easier to make not-generators

sub make_generator (&)
{
    my $sub = $_[0];
    my $finished = 0;
    return sub {
        return undef if $finished;
        local $_ = $sub->();
        $finished = 1 unless defined $_;
        return $_;
    };
}

Composing not-generators

sub compose (&$)
{
    my ($op, $generator) = @_;
    return sub {
        local $_ = $generator->();
        return undef unless defined $_;
        return $op->($_);
    };
}

Takewhile …

sub takewhile (&$)
{
    my ($match, $generator) = @_;
    return make_generator {
        local $_ = $generator->();
        # An exhausted upstream generator ends this one too.
        return undef unless defined $_;
        return $match->($_) ? $_ : undef;
    };
}

Foreach not-generator

I have to implement my own looping of course.

sub generator_for (&$)
{
    my ($fn, $generator) = @_;
    while (defined(my $ret = $generator->())) {
        $fn->($ret);
    }
}

Whew. And finally, after all that I can do the squares thing.

my $squares = compose { $_ * $_ } count();
my $bounded_squares = takewhile { $_ < 100 } $squares;
generator_for { print @_, "\n" } $bounded_squares;
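The same closure trick extends to other helpers. As a self-contained sketch, here is a take not-generator that yields at most N values; take is my own hypothetical addition and was not part of the code above.

```perl
use strict;
use warnings;

# Same counter as in the post: yields 0, 1, 2, ...
sub count {
    my $cnt = -1;
    return sub { return ++$cnt };
}

# Hypothetical helper: yield at most $n values from $generator,
# then undef forever.
sub take {
    my ($n, $generator) = @_;
    return sub {
        return undef if $n <= 0;
        $n--;
        return $generator->();
    };
}

my $first_five = take(5, count());
my @seen;
while (defined(my $value = $first_five->())) {
    push @seen, $value;
}
print "@seen\n";   # prints "0 1 2 3 4"
```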

Stop me if you’ve heard this one:

A Perl programmer and a Python programmer walk into a bar.

Python dev says, “Why are you using Perl? Python is much clearer.”

“What do you mean”, says Perl dev, “how is it clearer?”

“It’s obvious, innit,” says Python dev. “It’s cleaner and better. It’s got, er, objects and stuff.”

C/C++ integration aside, has anyone got anything more meaningful than “cleaner and better”? I’m genuinely curious.

Parallel::Iterator

When I wrote up the Job Manager script last week, I omitted the section where each job in the batch returns its result to the manager.

The job serialises a hash containing the results to disk using Storable. When the jobs have all finished, the manager retrieves the data using the identifier.

my $results = {};
my $id = $manager->identifier();
foreach (</tmp/*_$id.result>) {
    if (! m{^/tmp/(\d+)_}) {
        say "Error: unable to retrieve id from $_";
        next;
    }
    $results->{$1} = retrieve($_);
}

use Data::Dumper;
print Dumper($results);
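The job side of that exchange is not shown above. A minimal sketch of the round trip, assuming the job writes to a file named after its own identifier, might look like this. The payload, the file name and the temporary directory (standing in for /tmp) are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

my $dir        = tempdir(CLEANUP => 1);   # stand-in for /tmp in the post
my $id         = '193228';                # hypothetical manager identifier
my $job_number = '01';
my $file       = "$dir/${job_number}_echo_$id.result";

# Job side: serialise the result hash to disk.
my %result = (status => 'ok', lines => 42);   # hypothetical payload
store(\%result, $file);

# Manager side: read it back once the job has finished.
my $loaded = retrieve($file);
print "$loaded->{status} $loaded->{lines}\n";   # prints "ok 42"
```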

Now it turns out there is yet another handy CPAN module called Parallel::Iterator, which can return the output of each job in an output list. (Under the covers, it has pipes between the processes and serialises the data between them using Storable.)
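The pipe-plus-Storable mechanism is easy to sketch with core modules. This is my own illustration of the general technique, not Parallel::Iterator's code or API; run_in_child is a hypothetical helper.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(store_fd fd_retrieve);

# Hypothetical helper: run $work->($input) in a child process and hand its
# result (which must be a reference) back to the parent over a pipe.
sub run_in_child {
    my ($work, $input) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: compute, serialise the result into the pipe, then leave
        # without running the parent's END blocks or flushing its buffers.
        close $reader;
        store_fd($work->($input), $writer);
        close $writer;
        POSIX::_exit(0);
    }

    # Parent: read the serialised result back, then reap the child.
    close $writer;
    my $result = fd_retrieve($reader);
    close $reader;
    waitpid($pid, 0);
    return $result;
}

my $result = run_in_child(sub { return { square => $_[0] ** 2 } }, 7);
print "square: $result->{square}\n";   # prints "square: 49"
```

Compared with the result files in /tmp, nothing is left on disk and there are no file names to parse.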

And I was going to say, it would be nice if folks on Ironman talked about useful modules they came across from time to time.

Except they do already. dagolden already spoke about Parallel::Iterator here.

Wouldn’t it be handy if you could tag your Ironman posts with a hashtag, like #cpanmodules, and clicking on the hashtag would return the results?

Ironman: #cpanmodules #fork

Wanted: A guide to CPAN

The other day I was looking at a script that ran a bunch of more or less independent jobs in batches of four.

I’ve reproduced the core of the script as best as I can remember it.

Job

It has a class to represent the jobs themselves.

package Job;

use Moose;

has identifier => (
    is => 'ro',
    required => 1,
);

has cmd => (
    is => 'ro',
    required => 1,
);

no Moose;
__PACKAGE__->meta->make_immutable;

Job Manager

and a class that tries to ensure that 4 jobs are running in parallel wherever possible.

package JobManager;

use Moose;

use POSIX 'strftime';
use feature 'say';    # needed for the say() calls below

has identifier => (
    is => 'ro',
    default => sub { strftime('%H%M%S', localtime(time())); },
);

has max_processes => (
    is => 'ro',
    default => 4,
);

has _job_id => (
    is => 'ro',
    writer => '_set_job_id',
    init_arg => undef,
    default => 1,
);

has queued_jobs => (
    is => 'ro',
    traits => ['Array'],
    isa => 'ArrayRef[Job]',
    default => sub { [] },
    handles => {
        enqueue_job => 'push',
        dequeue_job => 'shift',
        exist_queued_jobs => 'count',
    },
);

has running_jobs => (
    is => 'ro',
    traits => ['Hash'],
    isa => 'HashRef[Job]',
    default => sub { {} },
    handles => {
        add_running_job => 'set',
        delete_running_job => 'delete',
        num_jobs => 'count',
    },
);

sub next_job_id
{
    my $self = shift;
    my $job_id = $self->_job_id();
    $self->_set_job_id($job_id + 1);
    return sprintf "%02d", $job_id;
}

sub run_job
{
    my ($self, $job) = @_;

    my ($identifier, $cmd) = ($job->identifier(), $job->cmd());
    my $pid = fork();
    if (! defined($pid)) {
        say "Failed to run job $identifier";
    } elsif ($pid) {
        say "Running job $identifier ($pid)";
        $self->add_running_job($pid, $job);
    } else {
        system("$cmd > /tmp/$identifier.output 2>&1");
        exit;
    }
}

sub add_job
{
    my ($self, $name, $cmd) = @_;

    my $job = Job->new(
        identifier => (sprintf "%s_${name}_%s",
                               $self->next_job_id(), $self->identifier()),
        cmd => $cmd);

    if ($self->num_jobs() >= $self->max_processes()) {
        $self->enqueue_job($job);
    } else {
        $self->run_job($job);
    }
}

sub main_loop
{
    my $self = shift;

    while (1) {
        my $pid = wait();
        last if ($pid < 0);
        say "Child $pid has exited";

        $self->delete_running_job($pid);
        while ($self->num_jobs() < $self->max_processes()) {
            last unless $self->exist_queued_jobs();
            $self->run_job($self->dequeue_job());
        }
    }
}

no Moose;
__PACKAGE__->meta->make_immutable;

Test Code

My test code to check if I got the code more or less correct.

my $manager = JobManager->new();

$manager->add_job('echo', 'sleep 10 ; echo hello');
for (1..9) {
    $manager->add_job('echo', 'sleep 2 ; echo hello');
}

$manager->main_loop();

jared@localhost $ ls -ltr /tmp/*echo*
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/05_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/04_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/03_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/02_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/08_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/07_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/06_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/10_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/09_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/01_echo_193228.output

Conclusion

I took two lessons away.

  • Parallel::Queue would have greatly simplified the core of this script. How many CPAN modules could my code benefit from equally if only I knew about them?
  • fork() is nice and easy to deal with. The code to manage the processes isn’t hugely complicated and seems pretty robust (careful, I may not have duplicated the robustness here).
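Stripped of the Moose classes, the throttling pattern boils down to a top-up loop around wait(). A minimal sketch, with made-up job names and children that exit immediately instead of doing real work:

```perl
use strict;
use warnings;
use POSIX ();

my $max_processes = 4;                    # same limit as the script above
my @queue = map { "job$_" } 1 .. 10;      # hypothetical job names
my %running;                              # pid => job name
my @finished;
my $peak = 0;

while (@queue or %running) {
    # Top up to the limit before blocking in wait().
    while (@queue and keys(%running) < $max_processes) {
        my $job = shift @queue;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: a real job would do its work here.
            POSIX::_exit(0);
        }
        $running{$pid} = $job;
    }
    $peak = keys %running if keys(%running) > $peak;

    # Block until any child exits, then free its slot.
    my $pid = wait();
    last if $pid < 0;
    push @finished, delete $running{$pid};
}

print "finished ", scalar(@finished), " jobs, peak concurrency $peak\n";
```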

Perl 6

The latest from chromatic (emphasis mine):

"If you think people don’t like Perl because the Perl 6 project started almost ten years ago, you haven’t been paying attention.

(Think Python has better marketing? Guido announced Python 3000 before Larry announced Perl 6, and it still took the better part of eight years for the Python developers to produce Python 3, and people are still upset that Python 3 is a wholesale replacement for Python 2, and there’s still a debate over when – and in some cases, if – major projects using Python will embrace Python 3 and abandon Python 2. Think about that.)"

Okay, I didn’t see any regret for shafting Perl 5 for the last 10 years, but great! I’m so happy that the opposition made the same stupid mistake that we did. </sarcasm>
