Feeds:
Posts
Comments

Archive for the ‘Perl’ Category

Stop me if you’ve heard this one:

A Perl programmer and a Python programmer walk into a bar.

Python dev says “why are you using Perl, Python is much clearer”

“What do you mean”, says Perl dev, “how is it clearer?”

“It’s obvious innit,” says Python dev. It’s cleaner and better It’s got, er, objects and stuff.

C/C++ integration aside, has anyone got anything more meaningful than “cleaner and better” ? I’m genuinely curious.

Read Full Post »

Parallel::Iterator

While looking at the Job Manager script from last week, I omitted the section where each job section of the batch returns the result to the manager.

The job serialises a hash containing the results to disk using Storable. When the jobs have all finished, the manager retrieves the data using the identifier.

my $results = {};
my $id = $manager->identifier();
foreach (>/tmp/*_$id.result<) {
    if (! m{^/tmp/(\d+)_}) {
        say "Error: unable to retrieve id from $_";
        next;
    }
    $results->{$1} = retrieve($_);
}

use Data::Dumper;
print Dumper($results);

Now it turns out, there is yet another handy cpan module called parallel::iterator, which can return the output of each job in an output list. (Under the covers, it has pipes between the processes and serialises the data between them using Storable).

And I was going to say, it would be nice if folks on Ironman talked about useful modules they came across from time to time.

Except they do already. dagolden already spoke about parallel::iterator here.

Wouldn’t it be handy if you could tag your ironman posts with a hashtag, like #cpanmodules and clicking on the hashtag would return the results?

Ironman: #cpanmodules #fork

Read Full Post »

Wanted: A guide to CPAN

The other day I was looking at a script that ran a bunch of more or less independent jobs in batches of four.

I’ve reproduced the core of the script as best as I can remember it.

Job

It has a class to represent the jobs themselves.

package Job;

use Moose;

has identifier => (
    is => 'ro',
    required => 1,
);

has cmd => (
    is => 'ro',
    required => 1,
);

no Moose;
__PACKAGE__->meta->make_immutable;

Job Manager

and a class that tries to ensure that 4 jobs are running in parallel wherever possible.

package JobManager;

use Moose;

use POSIX 'strftime';

has identifier => (
    is => 'ro',
    default => sub { strftime('%H%M%S', localtime(time())); },
);

has max_processes => (
    is => 'ro',
    default => 4,
);

has _job_id => (
    is => 'ro',
    writer => '_set_job_id',
    init_arg => undef,
    default => 1,
);

has queued_jobs => (
    is => 'ro',
    traits => ['Array'],
    isa => 'ArrayRef[Job]',
    default => sub { [] },
    handles => {
        enqueue_job => 'push',
        dequeue_job => 'shift',
        exist_queued_jobs => 'count',
    },
);

has running_jobs => (
    is => 'ro',
    traits => ['Hash'],
    isa => 'HashRef[Job]',
    default => sub { {} },
    handles => {
        add_running_job => 'set',
        delete_running_job => 'delete',
        num_jobs => 'count',
    },
);

sub next_job_id
{
    my $self = shift;
    my $job_id = $self->_job_id();
    $self->_set_job_id($job_id + 1);
    return sprintf "%02d", $job_id;
}

sub run_job
{
    my ($self, $job) = @_;

    my ($identifier, $cmd) = ($job->identifier(), $job->cmd());
    my $pid = fork();
    if (! defined($pid)) {
        say "Failed to run job $identifier";
    } elsif ($pid) {
        say "Running job $identifier ($pid)";
        $self->add_running_job($pid, $job);
    } else {
        system("$cmd > /tmp/$identifier.output 2>&1");
        exit;
    }
}

sub add_job
{
    my ($self, $name, $cmd) = @_;

    my $job = Job->new(
        identifier => (sprintf "%s_${name}_%s",
                               $self->next_job_id(), $self->identifier()),
        cmd => $cmd);

    if ($self->num_jobs() > $self->max_processes()) {
        $self->enqueue_job($job);
    } else {
        $self->run_job($job);
    }
}

sub main_loop
{
    my $self = shift;

    while (1) {
        my $pid = wait();
        last if ($pid < 0);
        say "Child $pid has exited";

        $self->delete_running_job($pid);
        while ($self->num_jobs() < $self->max_processes()) {
            last unless $self->exist_queued_jobs();
            $self->run_job($self->dequeue_job());
        }
    }
}

no Moose;
__PACKAGE__->meta->make_immutable;

Test Code

My test code to check if I got the code more or less correct.

my $manager = JobManager->new();

$manager->add_job('echo', 'sleep 10 ; echo hello');
for (1..9) {
    $manager->add_job('echo', 'sleep 2 ; echo hello');
}

$manager->main_loop();
jared@localhost $ ls -ltr /tmp/*echo*
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/05_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/04_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/03_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/02_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/08_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/07_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/06_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/10_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/09_echo_193228.output
-rw-r--r-- 1 jared jared 6 2011-07-03 19:32 /tmp/01_echo_193228.output

Conclusion

I took two lessons away.


Parallel::Queue would have greatly simplified the core of this script. How many CPAN modules could my code benefit from equally if only I knew about them?


fork() is nice and easy to deal with. The code to manage the processes isn’t hugely complicated and seems pretty robust (careful, I may not have duplicated the robustness here).

Read Full Post »

Perl 6

The latest from chromatic (emphasis mine):

"If you think people don’t like Perl because the Perl 6 project started almost ten years ago, you haven’t been paying attention.

(Think Python has better marketing? Guido announced Python 3000 before Larry announced Perl 6, and it still took the better part of eight years for the Python developers to produce Python 3, and people are still upset that Python 3 is a wholesale replacement for Python 2, and there’s still a debate over when – and in some cases, if – major projects using Python will embrace Python 3 and abandon Python 2. Think about that.)"

Okay, I didn’t see any regret for shafting Perl 5 for the last 10 years, but great! I’m so happy that the opposition made the same stupid mistake that we did. </sarcasm>

Read Full Post »

Easy Unicode in Perl

Lots of discussion on Unicode in Perl recently.

(I didn’t see any retraction from Nelson, respectful or otherwise, though)

I can’t see what is wrong with the (brand newish) utf8 pragma approach especially when all the issues are ironed out. You don’t want to break backwards compatibility and give folks any more reason not to upgrade to later versions of Perl.

Read Full Post »

Does Perl need a JIT?

Casey Randberger dropped by to point out that Squeak Smalltalk now has a JIT.

I don’t care about Squeak1, but very occasionally I think it would be nice if Perl got a JIT.

However, I don’t think it would make sense to develop a JIT for Perl.

  • People don’t choose Perl for its speed
  • The reason folks switch from Perl to a competitor2 is not because the competitor is faster (because it isn’t)
  • Even for Python, everyone uses CPython rather than speedy implementation

There’s a lot of reasons I choose Perl, but a JIT would not increase the number of projects where I reach for Perl over C++.


1. It’s not pragmatic enough for me – who would employ me to write Squeak for example

2. Python and Ruby

Read Full Post »

Streaming Plack

Thanks to Douglas and Jakub for pointing me at the appropriate part of the PSGI spec for streaming.

Miyagawa mentions that plack streaming can be blocking, and running this with Starman demonstrates that nicely.

my $app = sub {
    my $env = shift;

    return sub {
        my $respond = shift;
        my $writer = $respond->([200, ['Content-Type', 'text/html']]);

        for (1..5) {
            my $dt = localtime;
            $writer->write("[ $dt ]
\n"
); sleep 2; } $writer->close(); }; };

Starman attributes include:

'psgi.streaming' => 1,
'psgi.nonblocking' => ''

Firing off two requests at the same time, the second doesn’t start until the first completes.

[ Mon May 30 18:56:16 2011 ]
[ Mon May 30 18:56:18 2011 ]
[ Mon May 30 18:56:20 2011 ]
[ Mon May 30 18:56:22 2011 ]
[ Mon May 30 18:56:24 2011 ]
[ Mon May 30 18:56:26 2011 ]
[ Mon May 30 18:56:28 2011 ]
[ Mon May 30 18:56:30 2011 ]
[ Mon May 30 18:56:32 2011 ]
[ Mon May 30 18:56:34 2011 ]

Read Full Post »

Even for a curmudgeon like me who doesn’t make much use of object orientation, Moose offers something which supports my programming with types style: a degree of run-time type checking and the ability to create a range of convenience functions with very little code.

Say I have a person who makes a bunch of orders consisting of a bunch of items. My hashref might look something like this:

my $data = {
    fred => {
        orders => [
            {
            order_id => 'fred1',
            items => [
                { description => 'roses' },
            ],
        },
            {
            order_id => 'fred1',
            items => [
                { description => 'one true ring' },
            ],

        },
        ],
    },
};

Then I’ll probably need a bunch of convenience functions to make sure I’m adding items to orders rather than people.

In Moose, including the type checking, that looks like:

Item

package Item;
use Moose;

has description => (
    is => 'ro',
    isa => 'Str',
);

no Moose;
__PACKAGE__->meta->make_immutable;

Order

package Order;
use Moose;

has order_id => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

has items => (
    is => 'rw',
    isa => 'ArrayRef[Item]',
    default => sub { [] },
    traits  => ['Array'],
    handles => {
        add_item  => 'push',
        get_items => 'elements',
    },
);

no Moose;
__PACKAGE__->meta->make_immutable;

Person

package Person;
use Moose;

has name => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

has orders =>
    is => 'rw',
    isa => 'ArrayRef[Order]',
    default => sub { [] },
    traits  => ['Array'],
    handles => {
        add_order  => 'push',
        get_orders => 'elements',
    },


no Moose;
__PACKAGE__->meta->make_immutable;

Adding an item as an order gives a nice error message:

my $fred = Person->new(name => 'fred');
my $item = Item->new(description => 'One true ring');
$fred->add_order($item);
$ perl moose-arrays.pl
A new member value for orders does not pass its type constraint because:
Validation failed for 'Order' with value Item=HASH(0x9ad3ec8)
(not isa Order) at moose-arrays.pl line 63

Moose ensures you pass an order to add_order(…).

my $order1 = Order->new(order_id => 'fred1');
my $item = Item->new(description => 'One true ring');
$order1->add_item($item);
$fred->add_order($order1);

my $order2 = Order->new(order_id => 'fred2');
$fred->add_order($order2);

use Data::Dumper;
print Dumper($fred);
$VAR1 = bless( {
                 'orders' => [
                               bless( {
                                        'order_id' => 'fred1',
                                        'items' => [
                                                     bless( {
                                                              'description' => 'One true ring'
                                                            }, 'Item' )
                                                   ]
                                      }, 'Order' ),
                               bless( {
                                        'order_id' => 'fred2',
                                        'items' => []
                                      }, 'Order' )
                             ],
                 'name' => 'fred'
               }, 'Person' );

I like it when someone else has already written the code I would otherwise need to write myself.

Read Full Post »

Plack Vs FastCGI

I’ve been sold on Plack for a while. This clarifies a part of the reason nicely1.

If you design a fibonacci() function, would you make it print the result to STDOUT, or return the result as a return value?

(It’s referring to the PSGI spec rather than Plack, but I figure the latter needs press a bit more)

Quick thought though – if a page takes a long time to generate, the stdout technique can deliver a bit at a time. Is there a plack plugin for that, or is the answer to go for ajax these day and deliver the page outline asap?


1. Namely that printing to stdout always seemed like a horrible and inefficient hack.

Read Full Post »

One thing I like about developing the emacs environment (i.e. writing emacs lisp) is programming in the image. In contrast when I run a perl script, any data loaded into the script vanishes when the script exits.

my $image = {};
populate_particular_set_of_people($image);
populate_particular_set_of_orders($image);
# ... more populate functions ...
produce_report($image);

If the populate functions take a long time to run and there is a syntax error in one of them, I might waste a lot of time waiting for data to be loaded over and over and potentially annoy my DBA.

Hence my thinking about REPLs earlier in the year.

Instead, I can mitigate this problem and support exploratory programming by serialising my data structure to disk after each section has been loaded.

use Storable;

use constant IMAGE_FILE => '/tmp/.jareds-image-RANDOMTEXT';

sub populate_particular_set_of_people
{
    my $image = shift;

    return if (exists $image->{'store'}{'people'});

    # Get the set of people

    store $image->{'store'}, IMAGE_FILE;
}

sub populate_particular_set_of_orders
{
    my $image = shift;

    return if (exists $image->{'store'}{'orders'});

    # Get the set of orders

    store $image->{'store'}, IMAGE_FILE;
}

my $image = {};
$image = retrieve(IMAGE_FILE) if -f IMAGE_FILE;

populate_particular_set_of_people($image);
populate_particular_set_of_orders($image);
# more populate functions
produce_report($image);

Error handling has been elided.

I’m only storing part of the hashref, $image->{'store'} in this example. This is fairly typical of my standard use case for this technique – quick one off reports. Some of the data is slow to load in from the database so I persist it to disk. The rest is calculated based on the loaded data so I don’t persist it. I keep all of the data together so I can pass it to each subroutine as a single parameter.

Of course, I wouldn’t recommend this technique for a production system.

Read Full Post »

« Newer Posts - Older Posts »

Follow

Get every new post delivered to your Inbox.