
Archive for August, 2010

Perl Compiler Speed – It’s Fast!

As a developer who is not responsible for many infrastructure modules, I shouldn’t really waste so much time micro-benchmarking. Profiling my code after the fact should be sufficient, if even that is necessary.

But it seems I’ve got into a bad habit.


I’m not the only person who thinks that MooseX loading all those prerequisites is what takes so much time. Dave Rolsky says in the latest Moose blog post:

What’s a bit sad, however, is that this only appears to save 6-10% of the compilation time in real usage. I’m not sure why that is, but I think the issue is that the perl core’s compilation time may be a big factor. As I said, loading that one Markdent module loads a lot of modules (203 individual .pm files), and the raw overhead of simply reading all those files and compiling them may be a significant chunk of the compilation time.

That got me thinking about the speed of the perl compilation step. Although a string eval doesn’t incur the same disk IO as use, it still exercises the compiler.

Loop Unrolling with Eval

Does anyone remember back in the day when we used to unroll loops? For example, if we had an instruction sequence do_something, add, jne and do_something was relatively short, you could save some time by unrolling it to do_something, add, do_something, add, jne – you would only execute approximately half as many jne instructions over the loop’s lifetime.

That is something I never worry about in perl. Unless I have to come up with contrived examples for my blog.

sub eval_loop
{
    my $count = 0;
    my $s = '';
    for (my $i = 0; $i < 100_000; ++$i) {
        $s .= '$count += ' . $i . ";\n";
    }
    eval $s;
    $count;
}

sub vanilla_loop
{
    my $count = 0;
    for (my $i = 0; $i < 100_000; ++$i) {
        $count += $i;
    }
    $count;
}

my $code = '';
for (my $i = 0; $i < 100_000; ++$i) {
    $code .= '$count += ' . $i . ";\n";
}

$code = '
sub totally_unrolled {
    my $count = 0;' . "\n" . $code . '
    $count;
};';

eval $code;

eval_loop compiles the unrolled loop every time the subroutine is called, whereas totally_unrolled is compiled before the benchmark runs.

I also wanted some partially unrolled loops for comparison’s sake. Obviously these examples are not suitable for production use. For example, consider what would happen if unrolled_loop_4 needed to execute a number of iterations that was not a multiple of 4.

sub unrolled_loop_2
{
    my $count = 0;
    for (my $i = 0; $i < 100_000; ++$i) {
        $count += $i;
        $count += ++$i;
    }
    $count;
}

sub unrolled_loop_4
{
    my $count = 0;
    for (my $i = 0; $i < 100_000; ++$i) {
        $count += $i;
        $count += ++$i;
        $count += ++$i;
        $count += ++$i;
    }
    $count;
}
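For completeness, here is a sketch of how an unrolled loop could cope with a limit that is not a multiple of 4: run the unrolled body while at least four iterations remain, then mop up the remainder one at a time. (The sub name and the $limit parameter are illustrative, not part of the benchmark.)

```perl
use strict;
use warnings;

# Sketch: unroll by 4, with an explicit remainder loop so any
# $limit works -- not just multiples of 4.
sub unrolled_loop_safe {
    my ($limit) = @_;
    my $count = 0;
    my $i = 0;
    # Unrolled body: four additions per pass.
    while ($i + 4 <= $limit) {
        $count += $i;
        $count += $i + 1;
        $count += $i + 2;
        $count += $i + 3;
        $i += 4;
    }
    # Remainder loop: handles the last 0-3 iterations.
    $count += $i++ while $i < $limit;
    return $count;
}

print unrolled_loop_safe(100_000), "\n";  # 4999950000 -- same as vanilla_loop
print unrolled_loop_safe(10), "\n";       # 45 -- a limit that is not a multiple of 4
```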

Benchmarking

The full benchmarking code is as follows. I test the result of each function to ensure I’m getting the same answer.

use Benchmark qw(:hireswallclock);

use strict;
use warnings;

sub vanilla_loop
{
    my $count = 0;
    for (my $i = 0; $i < 100_000; ++$i) {
        $count += $i;
    }
    $count;
}

sub unrolled_loop_2
{
    my $count = 0;
    for (my $i = 0; $i < 100_000; ++$i) {
        $count += $i;
        $count += ++$i;
    }
    $count;
}

sub unrolled_loop_4
{
    my $count = 0;
    for (my $i = 0; $i < 100_000; ++$i) {
        $count += $i;
        $count += ++$i;
        $count += ++$i;
        $count += ++$i;
    }
    $count;
}


sub eval_loop
{
    my $count = 0;
    my $s = '';
    for (my $i = 0; $i < 100_000; ++$i) {
        $s .= '$count += ' . $i . ";\n";
    }
    eval $s;
    $count;
}

my $code = '';
for (my $i = 0; $i < 100_000; ++$i) {
    $code .= '$count += ' . $i . ";\n";
}

$code = '
sub totally_unrolled {
    my $count = 0;' . "\n" . $code . '
    $count;
};';

eval $code;

print vanilla_loop(), "\n";
print unrolled_loop_2(), "\n";
print unrolled_loop_4(), "\n";
print eval_loop(), "\n";
print totally_unrolled(), "\n";

Benchmark::cmpthese(-1, {
    'vanilla' => \&vanilla_loop,
    'unrolled-2' => \&unrolled_loop_2,
    'unrolled-4' => \&unrolled_loop_4,
    'eval' => \&eval_loop,
    'unrolled-max' => \&totally_unrolled,
});

And the results are:

$ perl unrolling.pl
4999950000
4999950000
4999950000
4999950000
4999950000
               Rate        eval  unrolled-2  unrolled-4     vanilla unrolled-max
eval         3.25/s          --        -93%        -95%        -95%         -96%
unrolled-2   48.1/s       1381%          --        -25%        -26%         -45%
unrolled-4   63.9/s       1865%         33%          --         -2%         -27%
vanilla      65.1/s       1902%         35%          2%          --         -26%
unrolled-max 87.7/s       2598%         82%         37%         35%           --

Conclusions

I am actually surprised that eval is as fast as it is. It runs only about 20 times slower than the vanilla loop, yet it has to construct a string by appending to it 100,000 times and then run the compiler on the result. To put it another way, it compiles and runs 100,000 lines of code in under a third of a second. Good job perl compiler guys! Or did you put something in there for pathologically ridiculous examples like this?

And I was right never to worry about unrolling my loops: unrolled-2 is significantly slower than vanilla. As I can’t think of any good reason for that, it makes me worry that I’ve done something wrong.

Even fully unrolling the loop, which is quite ridiculous, only gives a relatively modest 35% speed increase.


Read Full Post »

Spurious Benchmark.pm Warnings

My most recent post has a few comments, but because I’ve been on holiday I didn’t reply to them in a timely manner. Sorry about that, folks – I did read all of them.

And time has moved on, so if I reply to those comments in the comments section, I’m guessing the original commenters won’t read the response. Having said that, they might not read this post either, but I think that is less likely.

Therefore this is a post to respond to the commenters, and to keep up with the IronMan schedule, even though I’m not sure I’m still being measured.

First of all, Roman, thanks for the comment. The figures you provide are interesting, and yours is a second complaint about opaque error messages. That would bother me as well – I get enough of them from emacs macros, and for those I at least have tools (well, a tool – macroexpand) to debug them.

@Max – Hi. I’m not sure how you got the impression that I was interested in object creation time. We have primarily been talking about Moose start-up overhead here. I think I’m measuring the right thing.

@Anonymous – when you say "Run each benchmark item for 5 seconds or more", I think you missed this bit of the post:

Benchmark: timing 200 iterations of 1just-moose, 2all-includes, 3nocreate, 4create...
  1just-moose: 100.196 wallclock secs ( 0.05 usr +  0.25 sys =  0.30 CPU) @ 673.40/s (n=200)
2all-includes: 279.252 wallclock secs ( 0.08 usr +  0.34 sys =  0.42 CPU) @ 475.06/s (n=200)
    3nocreate: 318.256 wallclock secs ( 0.06 usr +  0.28 sys =  0.34 CPU) @ 581.40/s (n=200)
      4create: 320.979 wallclock secs ( 0.06 usr +  0.27 sys =  0.33 CPU) @ 611.62/s (n=200)

Not one of my benchmarks runs for less than a minute and a half; most of them are around the five-minute mark. And I have deadlines!

So where is this message coming from? I had a look in Benchmark.pm.

    print "            (warning: too few iterations for a reliable count)\n"
        if     $n < $Min_Count # 4
            || ($t->real < 1 && $n < 1000)
            || $t->cpu_a < $Min_CPU; # 0.4

My iteration count is 200, so $Min_Count is not the problem. $t->real is at least 100, so that leaves $Min_CPU. Hmmm… yeah, I would have had to double the number of iterations to be sure. Look at 4create – 321 wallclock seconds, of which only 0.33 is CPU, below the 0.4 threshold. No worries, I’m good – the warnings were spurious, as I originally thought.
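Plugging the 4create figures into Benchmark.pm’s condition confirms which clause fires (a quick sketch; the variable names are mine, and the thresholds are the $Min_Count and $Min_CPU values noted in the comments above):

```perl
use strict;
use warnings;

# The 4create run: n = 200 iterations, ~321 wallclock secs, 0.33 CPU secs.
my ( $n, $real, $cpu ) = ( 200, 320.979, 0.33 );

# Mirror of Benchmark.pm's warning condition ($Min_Count = 4, $Min_CPU = 0.4).
my $warns = $n < 4
         || ( $real < 1 && $n < 1000 )
         || $cpu < 0.4;

# Only the final clause is true: the CPU time is below the 0.4 threshold.
print $warns ? "warning fires (CPU below 0.4)\n" : "no warning\n";
```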

Read Full Post »

The debate over whether or not Moose is too slow rumbles on. On one side are the folks using it in production (and presumably happy with it). On the other are folks saying it adds multiple seconds to start-up time.

Following on from my benchmarking Moose startup overhead post, Sue D. Nymme provided code that actually created an object. Thanks Sue!

I took a look, and Sue’s code uses MooseX::Declare, which I hadn’t installed. One magic CPAN invocation and, oh my gosh, more than 20 minutes later, MooseX::Declare was finally installed. Now I’m thinking that the benchmark is actually measuring the time for perl to load and parse MooseX::Declare and its tens of thousands of lines of prerequisites, rather than how slow Moose itself is.

I’ll compare just loading the modules Moose vs MooseX::Declare.

Then I’ll add the rest of Sue’s code.

And finally I’ll actually create a Moose object.

just-moose.pl

use Modern::Perl;

use Moose;
use Moose::Util::TypeConstraints;
use DateTime;

1;

all-includes.pl

use Modern::Perl;

use MooseX::Declare;
use Moose::Util::TypeConstraints;
use DateTime;

1;

And moose.pl (taken from Sue D. Nymme’s comment here).

bench.pl

use Benchmark qw(:hireswallclock);

use strict;
use warnings;

my $perl = 'c:/strawberry/perl/bin/perl';

Benchmark::timethese(200, {
    '1just-moose' =>
        sub { system(qq{$perl -e "require 'just-moose.pl';"}) },
    '2all-includes' =>
        sub { system(qq{$perl -e "require 'all-includes.pl';"}) },
    '3nocreate' => sub { system(qq{$perl -e "require 'moose.pl';"}) },
    '4create' =>
        sub { system(qq/$perl -e "require 'moose.pl'; MooseObject->new({ name => 'Bob' })"/) },
});

The Results

I’ve elided the "too few iterations" warnings.

Benchmark: timing 200 iterations of 1just-moose, 2all-includes, 3nocreate, 4create...
  1just-moose: 100.196 wallclock secs ( 0.05 usr +  0.25 sys =  0.30 CPU) @ 673.40/s (n=200)
2all-includes: 279.252 wallclock secs ( 0.08 usr +  0.34 sys =  0.42 CPU) @ 475.06/s (n=200)
    3nocreate: 318.256 wallclock secs ( 0.06 usr +  0.28 sys =  0.34 CPU) @ 581.40/s (n=200)
      4create: 320.979 wallclock secs ( 0.06 usr +  0.27 sys =  0.33 CPU) @ 611.62/s (n=200)

Conclusions

The bulk of the time goes to loading the MooseX::Declare library. On the other hand, declaring a single object class did take a significant amount of time (approximately 40 seconds over 200 runs). I can now believe that declaring a lot of objects would be prohibitively expensive.

But that is using MooseX::Declare. I’m sure it is nowhere near as bad if we used Plain Ole Moose.

Read Full Post »

Lisp Macros and Types

(message ...) already accepts a format string, but if it didn’t it would be easy to add that functionality with a macro.

(defmacro message* (str &rest vars)
  `(message (format ,str ,@vars)))

But it only accepts a single format string. Perhaps I would like to concatenate a few strings or variables together – but then, if I pass in only a string, I want that to work correctly too. Macros can generate different code depending on the types you pass in. (I fix the underlying format* so I can use it in message, error, etc.)

(defmacro format* (str &rest vars)
  (if (stringp str)
      `(format ,str ,@vars)
    `(format (concat ,@(mapcar #'identity str)) ,@vars)))

The result looks a little strange: normally, when I see an unquoted list, the first element has to be the name of the function being called.

(defvar x "xxx")
(format* (x " " "%s") 1) ;; --> "xxx 1"

Read Full Post »

…anytime soon at least.

Searching with Google

So, I’m looking for something I’ve written on my blog. This happens quite frequently – a lot of the posts I write are where I’ve figured out something useful to me. I write it down in case I need the information again. Blogs are not generally well indexed, so I reach for everyone’s favourite search engine and type…

curiousprogrammer sysread

I’m looking for the post I did on unbuffered reading with perl (back in May it turns out).

And the top link is:

Did you mean: curious programmer sysread  
Search Results

   1.
      Read Stream of Input With Perl << A Curious Programmer
      14 May 2010 ... A Curious Programmer. Leveraging Perl and Emacs ... Okay, quite cool, but there is a better solution. sysread is unbuffered. ...
      curiousprogrammer.wordpress.com/2010/05/14/read-stream/ - Cached

Bingo!

Let’s try Bing.

Searching With Bing

We didn't find any results for curiousprogrammer sysread.
We're showing results for curious programmer sysread.

      Sysread - Blogs, Pictures, and more on WordPress

      Blogs about: Sysread ... output irregularly. First it sends "hello ", then waits 5 ... more   A Curious Programmer
      en.wordpress.com/tag/sysread . Cached page
    *
      Perl Streams - Blogs, Pictures, and more on WordPress

      ... output irregularly. First it sends "hello ", then waits 5 ...
 more   A Curious Programmer ... p ... more   Tags: Perl Programming, unbuffered input, input stream, sysread
      en.wordpress.com/tag/perl-streams . Cached page

Okay, not really. It’s only picking up the wordpress.com links because I tagged the post so well. Maybe it needs a bit more help.

site:curiousprogrammer.wordpress.com sysread

Er, no.

We did not find any results for site:curiousprogrammer.wordpress.com sysread.
Were you looking for: sys read site:curiousprogrammer.wordpress.com
Search tips:

    * Ensure words are spelled correctly.
    * Try rephrasing keywords or using synonyms.
    * Try less specific keywords.
    * Make your queries as concise as possible.

Why Does it Matter?

My website is pretty small. I get 3000-6000 readers per month, and I doubt that many of them are uniques. I probably have a core of around 200 regular readers. I don’t get a lot of referrals from google. To be honest, I think I’m okay with that – I like the conversation from regulars and I can’t be bothered to moderate a lot of spam or put the effort into getting popular.

But, and it’s a big but, my website has a lot of useful stuff on. And there are tens of thousands of other small ‘unpopular’ sites with lots of useful stuff on. If I’m searching with Bing, what am I missing?

You might say that this only happens because sites like wordpress notify google when there has been an update, so it knows to come along and update its cache. Well, sucks to be Bing, but in that case it doesn’t even look like the gap is going to close.

I’m going to stick with google.

Read Full Post »

Bad Hash Tables

Quick pop quiz. In emacs lisp what does this expression evaluate to?

(let ((%h (make-hash-table)))
  (puthash "key" "value" %h)
  (gethash "key" %h))

Okay, put your hands down, the lispers at the back. Anyone else?

That’s right, the answer is nil.

"Eh, nil? Is that the same as false? Why’s that?" you may be asking. The answer is that, by default, hash tables use eql as the comparison function: (eql "x" "x") evaluates to nil, whereas (equal "x" "x") evaluates to t.

What you actually want is this.

(let ((%h (make-hash-table :test 'equal)))
  (puthash "key" "value" %h)
  (gethash "key" %h))

Quite why you would want a hash where you can’t retrieve a string key is beyond me. Okay, I guess it possibly makes sense if you consider symbols, but damn, it’s not right.

Anyway, no worries, I can fix it. I’ll only ever use my _h macro rather than using make-hash-table, puthash or gethash directly, and I’ll extend it to create hash tables using :test 'equal.

(the original title for this page was Why do Emacs Lisp Hash Tables use Eql)

Read Full Post »

A while ago, I was trying to work around the emacs maintainers’ curious decision to always load a .elc file in preference to a .el file, even if the .el file is newer.

I didn’t want to settle for that. My solution was to recompile my emacs lisp files prior to running emacs.

I see now that there is a better alternative: replace the built-in require with my own version. And it is not just because it is open source that I can do this. I can fix it as a user too.

(require feature &optional filename noerror)

require performs the following steps:

  • does nothing if the feature has already been loaded (this protects against recursive loads)
  • finds the appropriate file in load-path
  • loads either the .el or the .elc file
  • throws an error if the required feature is not provided
  • provides a mechanism for suppressing errors
  • provides a mechanism for loading a feature from a specific file
If I wanted a drop-in replacement, I would have to do all of those things and replicate the function signature. However, I don’t need the error suppression or loading from a specific file, and I also like to be able to force a load even if a library has been loaded previously. If this doesn’t match your requirements, it should be easy to adapt the code below.

(defun require* (feature &optional force)
  (when (or force (not (featurep feature)))
    (setq feature (symbol-name feature))
    (let ((path load-path)
          (found-filename nil)
          head el-attribs elc-attribs)
      (while (and (not found-filename) path)
        (setq head (pop path))
        (let ((el-filename (format "%s/%s.el" head feature))
              (elc-filename (format "%s/%s.elc" head feature)))
          ;; if .el and .elc both exist, pick the newest
          ;; otherwise pick the one that exists if any
          (cond ((and (file-exists-p el-filename)
                      (file-exists-p elc-filename))
                 (if (file-newer-than-file-p el-filename elc-filename)
                     (setq found-filename el-filename)
                   (setq found-filename elc-filename)))
                ((file-exists-p el-filename)
                 (setq found-filename el-filename))
                ((file-exists-p elc-filename)
                 (setq found-filename elc-filename)))
          ;; load file if found
          (when found-filename
            ;; pass the filename as an argument so any % in it
            ;; is not treated as a format spec
            (message "Found: [%s]" found-filename)
            (let ((load-suffixes ()))
              (load found-filename)))))
      (unless found-filename (error "Unable to find %s" feature)))))

The brave might choose to do (defun require ...) instead of (defun require* ...).

Read Full Post »
