Feeds:
Posts
Comments

Archive for the ‘Programming’ Category

Parallel::Iterator

While looking at the Job Manager script from last week, I omitted the section where each job section of the batch returns the result to the manager.

The job serialises a hash containing the results to disk using Storable. When the jobs have all finished, the manager retrieves the data using the identifier.

my $results = {};
my $id = $manager->identifier();
foreach (>/tmp/*_$id.result<) {
    if (! m{^/tmp/(\d+)_}) {
        say "Error: unable to retrieve id from $_";
        next;
    }
    $results->{$1} = retrieve($_);
}

use Data::Dumper;
print Dumper($results);

Now it turns out, there is yet another handy cpan module called parallel::iterator, which can return the output of each job in an output list. (Under the covers, it has pipes between the processes and serialises the data between them using Storable).

And I was going to say, it would be nice if folks on Ironman talked about useful modules they came across from time to time.

Except they do already. dagolden already spoke about parallel::iterator here.

Wouldn’t it be handy if you could tag your ironman posts with a hashtag, like #cpanmodules and clicking on the hashtag would return the results?

Ironman: #cpanmodules #fork

Read Full Post »

Wanted: A guide to CPAN

The other day I was looking at a script that ran a bunch of more or less independent jobs in batches of four.

I’ve reproduced the core of the script as best as I can remember it.

Job

It has a class to represent the jobs themselves.

package Job;

use Moose;

has identifier => (
    is => 'ro',
    required => 1,
);

has cmd => (
    is => 'ro',
    required => 1,
);

no Moose;
__PACKAGE__->meta->make_immutable;

Job Manager

and a class that tries to ensure that 4 jobs are running in parallel wherever possible.

package JobManager;

use Moose;

use POSIX 'strftime';

has identifier => (
    is => 'ro',
    default => sub { strftime('%H%M%S', localtime(time())); },
);

has max_processes => (
    is => 'ro',
    default => 4,
);

has _job_id => (
    is => 'ro',
    writer => '_set_job_id',
    init_arg => undef,
    default => 1,
);

has queued_jobs => (
    is => 'ro',
    traits => ['Array'],
    isa => 'ArrayRef[Job]',
    default => sub { [] },
    handles => {
        enqueue_job => 'push',
        dequeue_job => 'shift',
        exist_queued_jobs => 'count',
    },
);

has running_jobs => (
    is => 'ro',
    traits => ['Hash'],
    isa => 'HashRef[Job]',
    default => sub { {} },
    handles => {
        add_running_job => 'set',
        delete_running_job => 'delete',
        num_jobs => 'count',
    },
);

sub next_job_id
{
    my $self = shift;
    my $job_id = $self->_job_id();
    $self->_set_job_id($job_id + 1);
    return sprintf "%02d", $job_id;
}

sub run_job
{
    my ($self, $job) = @_;

    my ($identifier, $cmd) = ($job->identifier(), $job->cmd());
    my $pid = fork();
    if (! defined($pid)) {
        say "Failed to run job $identifier";
    } elsif ($pid) {
        say "Running job $identifier ($pid)";
        $self->add_running_job($pid, $job);
    } else {
        system("$cmd > /tmp/$identifier.output 2>&1");
        exit;
    }
}

sub add_job
{
    my ($self, $name, $cmd) = @_;

    my $job = Job->new(
        identifier => (sprintf "%s_${name}_%s",
                               $self->next_job_id(), $self->identifier()),
        cmd => $cmd);

    if ($self->num_jobs() > $self->max_processes()) {
        $self->enqueue_job($job);
    } else {
        $self->run_job($job);
    }
}

sub main_loop
{
    my $self = shift;

    while (1) {
        my $pid = wait();
        last if ($pid < 0);
        say "Child $pid has exited";

        $self->delete_running_job($pid);
        while ($self->num_jobs() < $self->max_processes()) {
            last unless $self->exist_queued_jobs();
            $self->run_job($self->dequeue_job());
        }
    }
}

no Moose;
__PACKAGE__->meta->make_immutable;

Test Code

My test code to check if I got the code more or less correct.

my $manager = JobManager->new();

$manager->add_job('echo', 'sleep 10 ; echo hello');
for (1..9) {
    $manager->add_job('echo', 'sleep 2 ; echo hello');
}

$manager->main_loop();
jared@localhost $ ls -ltr /tmp/*echo*
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/05_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/04_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/03_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/02_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/08_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/07_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/06_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/10_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/09_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/01_echo_193228.output

Conclusion

I took two lessons away.


Parallel::Queue would have greatly simplified the core of this script. How many CPAN modules could my code benefit from equally if only I knew about them?


fork() is nice and easy to deal with. The code to manage the processes isn’t hugely complicated and seems pretty robust (careful, I may not have duplicated the robustness here).

Read Full Post »

Perl 6

The latest from chromatic (emphasis mine):

"If you think people don’t like Perl because the Perl 6 project started almost ten years ago, you haven’t been paying attention.

(Think Python has better marketing? Guido announced Python 3000 before Larry announced Perl 6, and it still took the better part of eight years for the Python developers to produce Python 3, and people are still upset that Python 3 is a wholesale replacement for Python 2, and there’s still a debate over when – and in some cases, if – major projects using Python will embrace Python 3 and abandon Python 2. Think about that.)"

Okay, I didn’t see any regret for shafting Perl 5 for the last 10 years, but great! I’m so happy that the opposition made the same stupid mistake that we did. </sarcasm>

Read Full Post »

Easy Unicode in Perl

Lots of discussion on Unicode in Perl recently.

(I didn’t see any retraction from Nelson, respectful or otherwise, though)

I can’t see what is wrong with the (brand newish) utf8 pragma approach especially when all the issues are ironed out. You don’t want to break backwards compatibility and give folks any more reason not to upgrade to later versions of Perl.

Read Full Post »

Does Perl need a JIT?

Casey Randberger dropped by to point out that Squeak Smalltalk now has a JIT.

I don’t care about Squeak1, but very occasionally I think it would be nice if Perl got a JIT.

However, I don’t think it would make sense to develop a JIT for Perl.

  • People don’t choose Perl for its speed
  • The reason folks switch from Perl to a competitor2 is not because the competitor is faster (because it isn’t)
  • Even for Python, everyone uses CPython rather than speedy implementation

There’s a lot of reasons I choose Perl, but a JIT would not increase the number of projects where I reach for Perl over C++.


1. It’s not pragmatic enough for me – who would employ me to write Squeak for example

2. Python and Ruby

Read Full Post »

Streaming Plack

Thanks to Douglas and Jakub for pointing me at the appropriate part of the PSGI spec for streaming.

Miyagawa mentions that plack streaming can be blocking, and running this with Starman demonstrates that nicely.

my $app = sub {
    my $env = shift;

    return sub {
        my $respond = shift;
        my $writer = $respond->([200, ['Content-Type', 'text/html']]);

        for (1..5) {
            my $dt = localtime;
            $writer->write("[ $dt ]
\n"
); sleep 2; } $writer->close(); }; };

Starman attributes include:

'psgi.streaming' => 1,
'psgi.nonblocking' => ''

Firing off two requests at the same time, the second doesn’t start until the first completes.

[ Mon May 30 18:56:16 2011 ]
[ Mon May 30 18:56:18 2011 ]
[ Mon May 30 18:56:20 2011 ]
[ Mon May 30 18:56:22 2011 ]
[ Mon May 30 18:56:24 2011 ]
[ Mon May 30 18:56:26 2011 ]
[ Mon May 30 18:56:28 2011 ]
[ Mon May 30 18:56:30 2011 ]
[ Mon May 30 18:56:32 2011 ]
[ Mon May 30 18:56:34 2011 ]

Read Full Post »

Even for a curmudgeon like me who doesn’t make much use of object orientation, Moose offers something which supports my programming with types style: a degree of run-time type checking and the ability to create a range of convenience functions with very little code.

Say I have a person who makes a bunch of orders consisting of a bunch of items. My hashref might look something like this:

my $data = {
    fred => {
        orders => [
            {
            order_id => 'fred1',
            items => [
                { description => 'roses' },
            ],
        },
            {
            order_id => 'fred1',
            items => [
                { description => 'one true ring' },
            ],

        },
        ],
    },
};

Then I’ll probably need a bunch of convenience functions to make sure I’m adding items to orders rather than people.

In Moose, including the type checking, that looks like:

Item

package Item;
use Moose;

has description => (
    is => 'ro',
    isa => 'Str',
);

no Moose;
__PACKAGE__->meta->make_immutable;

Order

package Order;
use Moose;

has order_id => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

has items => (
    is => 'rw',
    isa => 'ArrayRef[Item]',
    default => sub { [] },
    traits  => ['Array'],
    handles => {
        add_item  => 'push',
        get_items => 'elements',
    },
);

no Moose;
__PACKAGE__->meta->make_immutable;

Person

package Person;
use Moose;

has name => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

has orders =>
    is => 'rw',
    isa => 'ArrayRef[Order]',
    default => sub { [] },
    traits  => ['Array'],
    handles => {
        add_order  => 'push',
        get_orders => 'elements',
    },


no Moose;
__PACKAGE__->meta->make_immutable;

Adding an item as an order gives a nice error message:

my $fred = Person->new(name => 'fred');
my $item = Item->new(description => 'One true ring');
$fred->add_order($item);
$ perl moose-arrays.pl
A new member value for orders does not pass its type constraint because:
Validation failed for 'Order' with value Item=HASH(0x9ad3ec8)
(not isa Order) at moose-arrays.pl line 63

Moose ensures you pass an order to add_order(…).

my $order1 = Order->new(order_id => 'fred1');
my $item = Item->new(description => 'One true ring');
$order1->add_item($item);
$fred->add_order($order1);

my $order2 = Order->new(order_id => 'fred2');
$fred->add_order($order2);

use Data::Dumper;
print Dumper($fred);
$VAR1 = bless( {
                 'orders' => [
                               bless( {
                                        'order_id' => 'fred1',
                                        'items' => [
                                                     bless( {
                                                              'description' => 'One true ring'
                                                            }, 'Item' )
                                                   ]
                                      }, 'Order' ),
                               bless( {
                                        'order_id' => 'fred2',
                                        'items' => []
                                      }, 'Order' )
                             ],
                 'name' => 'fred'
               }, 'Person' );

I like it when someone else has already written the code I would otherwise need to write myself.

Read Full Post »

« Newer Posts - Older Posts »

Follow

Get every new post delivered to your Inbox.