Archive for June, 2010

Abstraction and Subroutines

For me, the subroutine is the unit of abstraction. Give me subroutines (or methods or functions) and I can change the world.

Fixing Hashes

If you try to retrieve a non-existent key from a perl hash, it returns an undefined value. Often, that isn’t what I want.

No worries. With a simple subroutine, I can fix it.

sub get_hash_value (\%$) {
    my ($hash_ref, $key) = @_;
    if (! exists $hash_ref->{$key}) {
        die "KeyNotExists: $key\n";
    }
    return $hash_ref->{$key};
}

my %hash;
print $hash{'hardtospellkey'};
print get_hash_value(%hash, 'hardtospeelkey');

With warnings turned on, this is easily caught. Plus with emacs dabbrev, it is never a problem for me anyway – my mispellings are consistent.

But with a less enlightened language than perl, if you have this problem then as long as you also have subroutines, you can fix it.

Use of uninitialized value $hash{"hardtospellkey"} in print at t.pl line 14.
KeyNotExists: hardtospeelkey

Fixing Weakly Typed Numbers

use Scalar::Util 'looks_like_number';

sub ensure_number {
    my $val = shift;
    if (! looks_like_number($val)) {
        die "NotNumber: $val\n";
    }
    return $val;
}

my $x = '1';
my $y = 'rhubarb';

print ensure_number($x) + ensure_number(2), "\n";
print ensure_number($x) + ensure_number($y), "\n";

In production code I’m likely to call that subroutine something more like _n.

NotNumber: rhubarb

Subroutine Call Speed

This is why I care about perl subroutine call speed – I have so many little routines stating exactly what I expect from my code. And it seems like it is quick enough.

It would be nice though to have something like emacs’ defsubst. Say a new keyword like inline_sub { ... }. So I can invent the syntax I want, secure in the knowledge that, code bloating aside, I’m not paying for it.

Read Full Post »

More Subroutine Benchmarking

Some folks requested a few more benchmarks. In this case, I’m happy to oblige. Perl does not come off well in these benchmarks.

Perl Code

package class;

sub new {
    return bless {}, $_[0];
}

sub f1 {
}

sub f2 {
    my ($self, $x, $y) = @_;
    return ($x, $y);
}

sub f2a {
    my $self = shift;
    my $x = shift;
    my $y = shift;
    return ($x, $y);
}

package main;

my $obj = class->new();

for ($i = 0; $i < 10_000_000; ++$i) {
    my ($x, $y) = $obj->f2(1, 2);
}

Python Code

class myClass:

    def f1(self):
        pass

    def f2(self, a, b):
        return a, b

x = myClass()

for i in xrange (1, 10000000):
    a,b = x.f2('hello', 'world')

Perl Results

$ perl -v

This is perl, v5.10.1 (*) built for i686-linux-thread-multi

$ time perl ./func.pl # f1()

real    0m5.052s
user    0m5.044s
sys     0m0.008s

$ time perl ./func.pl # f2(1, 2)

real    0m11.598s
user    0m11.585s
sys     0m0.000s

real    0m10.838s
user    0m10.833s
sys     0m0.000s

$ time perl ./func.pl # f2a(1, 2)

real    0m10.740s
user    0m10.713s
sys     0m0.004s

real    0m12.014s
user    0m11.993s
sys     0m0.012s

$ time perl ./func.pl # f2a('hello', 'world')

real    0m16.524s
user    0m16.505s
sys     0m0.008s

real    0m16.521s
user    0m16.489s
sys     0m0.000s

Python Results

$ time python ./func.py # f1()

real    0m3.840s
user    0m3.828s
sys     0m0.004s

$ time python ./func.py # f2(1, 2)

real    0m4.546s
user    0m4.504s
sys     0m0.040s

real    0m5.887s
user    0m5.860s
sys     0m0.016s

$ time python ./func.py # f2('hello', 'world')

real    0m4.548s
user    0m4.540s
sys     0m0.008s

real    0m5.907s
user    0m5.904s
sys     0m0.004s

Read Full Post »

For a long time now I have suspected that calling perl subroutines is slow. And I couldn’t figure out from the language shootout which benchmark tested subroutine calls (did it use to be Ackermann?) so I made up my own benchmark.

Perl in comparison to its closest rival – CPython.

I tested a few things:

First, a loop that does nothing, to see how much is loop overhead
Next a zero parameter function
Then a function called with two integers
And finally, declaring those two integers inline.

I’m assuming that an optimiser doesn’t come along and remove code that does nothing. From the results, it seems like a safe assumption.

And, I’m not running benchmarks multiple times or with many iterations because, frankly, I don’t care that much. I just want to get an idea as to how Perl stacks up.
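If the noise from a single run ever starts to matter, Python's standard timeit module can repeat a measurement and hand back every run. What follows is only a minimal sketch: it assumes Python 3 (the listings in this post are Python 2), and the helper name best_of and the small iteration count are my own choices for illustration, not what was used for the numbers in this post.

```python
import timeit

def f1():
    pass

def f2(a, b):
    return a, b

# Names the timed statements are allowed to see.
NAMESPACE = {"f1": f1, "f2": f2}

def best_of(stmt, number=100000, repeat=5):
    """Best time in seconds over `repeat` runs of `number` iterations each."""
    runs = timeit.repeat(stmt, number=number, repeat=repeat, globals=NAMESPACE)
    return min(runs)

# The empty statement gives the loop overhead to compare against.
baseline = best_of("pass")

timings = {
    "f1()": best_of("f1()"),
    "f2(1, 2)": best_of("f2(1, 2)"),
    "x, y = 1, 2": best_of("x, y = 1, 2"),
}

for label, seconds in timings.items():
    print("%-12s %.4fs (empty loop: %.4fs)" % (label, seconds, baseline))
```

Taking the minimum of several runs is the usual way to read timeit output, since the slower runs mostly reflect whatever else the machine was doing.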

Python Code

def f1():
    pass

def f2(a, b):
    pass # return a, b

for i in xrange (1, 10000000):
    pass
    # f1()
    # f2(1, 2)
    # x,y=1,2

Python Results

$ python -V
Python 2.6.5

$ time python ./func.py # (pass)

real    0m0.722s
user    0m0.720s
sys     0m0.004s

$ time python ./func.py # (x,y = 1,2)

real    0m2.030s
user    0m2.024s
sys     0m0.004s

$ time python ./func.py # (f1)

real    0m2.265s
user    0m2.244s
sys     0m0.012s

$ time python ./func.py # (f2 - pass)

real    0m2.885s
user    0m2.880s
sys     0m0.004s

$ time python ./func.py # (f2 - return a, b)

real    0m3.190s
user    0m3.144s
sys     0m0.012s

Perl Code

sub f1 {
}

sub f2 {
    my ($x, $y) = @_;
}

for ($i = 0; $i < 10_000_000; ++$i) {
    # f1();
    # f2(1, 2);
    # my ($x, $y) = (1, 2);
}

Perl Results

$ time perl ./func.pl # (1)

real    0m0.893s
user    0m0.888s
sys     0m0.004s

$ time perl ./func.pl # (f1)

real    0m2.932s
user    0m2.924s
sys     0m0.004s

$ time perl ./func.pl # (f2)

real    0m5.607s
user    0m5.580s
sys     0m0.004s

$ time perl ./func.pl # (my ($x, $y) ...)

real    0m2.687s
user    0m2.672s
sys     0m0.008s


A few things jump out at me.

1. 10 million subroutine calls take around a couple of seconds. Would speeding up calls actually affect any real program much?

2. Declaring the variables and assigning the values takes almost as long as the empty function call.

3. Python function calls are faster than perl function calls but it’s not by enough to worry about.
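To put rough numbers on those points, the per-call cost can be backed out of the timings above by subtracting the empty-loop time and dividing by the iteration count. This is nothing more than arithmetic on the single `real` figures reported in this post (the function and variable names are made up for the sketch), so treat the output as a ballpark and not a measurement.

```python
# Rough per-call overhead from the `real` timings reported above.
# Single runs only, so these are ballpark figures.

PERL_ITERATIONS = 10000000    # for ($i = 0; $i < 10_000_000; ++$i)
PYTHON_ITERATIONS = 9999999   # xrange(1, 10000000)

def per_call_ns(with_call_s, empty_loop_s, iterations):
    """Nanoseconds per call after subtracting the empty-loop time."""
    return (with_call_s - empty_loop_s) / iterations * 1e9

perl_f1 = per_call_ns(2.932, 0.893, PERL_ITERATIONS)
perl_f2 = per_call_ns(5.607, 0.893, PERL_ITERATIONS)
python_f1 = per_call_ns(2.265, 0.722, PYTHON_ITERATIONS)
python_f2 = per_call_ns(2.885, 0.722, PYTHON_ITERATIONS)

print("perl   f1: %.0f ns  f2: %.0f ns" % (perl_f1, perl_f2))
print("python f1: %.0f ns  f2: %.0f ns" % (python_f1, python_f2))
```

By this arithmetic an empty perl call costs roughly 200 ns against roughly 150 ns for Python on this machine, and unpacking two arguments a little more than doubles the perl figure.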

My next post should hopefully clarify why I was thinking about this.

Read Full Post »

Disappointed with Desire

That’s the Android Smartphone of course, nothing to do with my personal life.

So, I got a HTC Desire a couple of weeks ago, thinking it would help me out in my role as a content creator blogger.

Writing notes is difficult

Typing on the 3.7" touch screen is difficult. When composing text messages or emails, the excellent auto-correction removes a lot of the pain, but obviously it doesn’t recognise a lot of tech jargon.

The default keyboard has no arrow keys, and key chording is practically impossible. This is probably worse for me as an emacs power user.

No access to wordpress admin

I have not been able to log in to the wordpress admin screens using the standard android browser. Admittedly I haven’t put much effort into fixing this problem.

Short Battery Life

This gives yet another resource to conserve on top of the data download limit and free plan minutes. I miss the days when I could go 6 days between phone charges.

Phone Calls

It might be my imagination, but I’m having a lot of problems with basic phone functionality – dropped calls, one party being unable to hear the other and that type of thing.

Everything goes invisible in sunlight

and we’re talking about UK sunlight here. When the sun is shining, any task that requires the screen becomes impossible.

I think I would have been better off with a cheap 7" netbook and a dongle for accessing the internet. At least I would have been able to run Perl, Apache, Emacs and use a real, if miniature, keyboard.

Read Full Post »

Recently I have been adding more and more functionality into emacs using comint. And as there are more and better perl libraries for hooking into other systems (such as DBI, SOAP, HTTP…), I often start with a script that reads commands from stdin and writes the result to stdout. That is the comint way.

use strict;
use warnings;

use 5.010;

init(); # -- some initialization that takes a while

# ...

while (defined(my $command = <STDIN>)) {
    chomp $command;

    given (lc($command)) {
        when (/^\s*exit\s*$/)    { exit 0; }
        when (/^\s*send\s*$/)    { say 'send' }
        when (/^\s*receive\s*$/) { say 'receive' }
        default { say "Error: unrecognized command $command"; }
    }
}

If initialisation takes a long time, it is a big saving to only do it once for many commands.

As emacs has full control over the process, I can control what gets sent and filter what is received before display. However, I prefer to be a little bit flexible with what the perl process receives for when I am testing it from the command line. That is why I allow surrounding spaces.

when (/^\s*exit\s*$/)    { ... }
when (/^\s*send\s*$/)    { ... }
when (/^\s*receive\s*$/) { ... }

Those matching regexes look pretty similar. I almost feel like I’m violating DRY.

Unfortunately, the subroutine references only take a single parameter.

sub cexit { return $_[0] =~ /^\s*exit\s*$/ }
sub csend { return $_[0] =~ /^\s*send\s*$/ }
sub creceive { return $_[0] =~ /^\s*receive\s*$/ }

# ...

when (\&cexit)    { exit 0; }
when (\&csend)    { say 'send' }
when (\&creceive) { say 'receive' }

But there is a way of storing data along with a subroutine – a closure. However, the following strangely doesn’t work.

sub match_command {
    my $command = shift;
    return sub {
        my $input = shift;
        say "Input is [$input]";
        return $input =~ /^\s*$command\s*$/;
    };
}

while (defined(my $command = <STDIN>)) {
    chomp $command;

    given (lc($command)) {
        when (match_command('send')) { say 'send' }
        when (match_command('exit')) { exit 0; }
        when (match_command('receive')) { say 'receive' }
        default { say "Error: unrecognized command $command"; }
    }
}

Whereas storing the closures in a variable before smart matching does work. Unfortunately it looks like smart matching isn’t smart enough for my needs.

my $send = match_command('send');
my $exit = match_command('exit');
my $receive = match_command('receive');

# ...

when ($send) { say 'send' }
when ($exit) { exit 0; }
when ($receive) { say 'receive' }

And the result:

c:\home\juang>perl t.pl
Input is [send]
Input is [exit]
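For comparison, here is the same closure idea sketched in Python, the other language that turns up on this blog. It says nothing about how given/when treats its argument. It only shows a pattern where each matcher is built once, carries its command name with it, and sits next to its handler in a table. The names DISPATCH and handle are invented for the sketch, and the exit command is left out to keep it short.

```python
import re

def match_command(command):
    """Return a closure that tests input against `command`, allowing surrounding spaces."""
    pattern = re.compile(r"^\s*" + re.escape(command) + r"\s*$")

    def matcher(text):
        return pattern.match(text) is not None

    return matcher

# Each matcher is built once, up front, and paired with its handler.
DISPATCH = [
    (match_command("send"), lambda: "send"),
    (match_command("receive"), lambda: "receive"),
]

def handle(line):
    """Return the reply for one line of input."""
    command = line.rstrip("\n")
    for matches, handler in DISPATCH:
        if matches(command.lower()):
            return handler()
    return "Error: unrecognized command " + command

print(handle("  send \n"))
print(handle("bogus\n"))
```

The regex is written in exactly one place, which is the DRY property the perl version was after.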

Read Full Post »

Super Languages – Redux

The great thing about the internet is that you can always find someone to disagree with the most self-evident things. It is an enduring mystery.

1+1 is not equal to 2

And I guess if I can’t come up with a counter-argument, your assertion is proved correct.

The unreasonable burden of proof

So, some guy came up with a bunch of cases where apparently language choice makes no difference. If I can summarise his cases, I think they were

  • interaction with [poorly designed] legacy systems
  • algorithmic research
  • limited platforms such as embedded systems, GPUs, parallel computing

And going back to the point I was trying to make

  • for a given problem, some languages will be better than others
  • libraries are (almost always) more important than languages, but selecting the language will determine the libraries

Legacy Systems

I’ve glued together plenty of legacy systems in my time. Some were written in Cobol, some were Pascal auto-translated into C, some were, shudder, "modern" C++. I’ve gotta tell ya, I was seriously pleased to have libraries available.

Back in the day we were doing it with CORBA. These days I might hack something RESTful up with HTTP/ Twiggy. Or I could pass data around via the file system, or a database.

Anyway, this is stuff that is unlikely to be built in to the language. It is going to come as a library, right? But for sure, if I pick a language that can interact with the file system, the database or sockets, I’m going to have an easier job than if I don’t.

Brand New Micro-Processors

Does this even happen these days? I thought everything was a variant of Intel or ARM.

If I understand correctly, the argument is that if there is only a single choice of language (because, for example, it is a new micro-processor without a C compiler yet), then language doesn’t make a difference.

You know, I can’t agree with that. I can think of two possibilities.

Option 1.

Assembler (or assembly, whatever), is the best language for that microprocessor. Because otherwise I have to first implement the other language in assembly.

Option 2.

Another language, let’s say C, is better than assembler. So first of all I implement C in assembler.

But wait a minute. Then I’ve admitted that one language is better than another.

Embedded Systems

And as for embedded. Why does everyone with a moderately complex embedded system start with Linux? Is it a) because Linux provides loads of libraries that are great for implementing embedded systems, or b) there is no b.

Algorithmic Research

I don’t have much experience in this area, but given the obvious flaws in the rest of the guy’s argument, I have no doubt that a few well chosen libraries can be very helpful. After all, in traditional sciences, progress is made by standing on the shoulders of giants. Why should it not be the same in computer science?

Yes, much of that work is now available in just about any decent language, but I think my point is plain.

No, no it isn’t plain. Sorry. What are you talking about? If languages now implement much of that work, that shows that libraries help, no?

Linking To Supporting Research

At the risk of making the same mistake, I can’t believe you (Jared) linked to that function point analysis as support. I don’t think much of the technique to begin with and that was a great example of why no one should. They included HTML (Why not include photoshop while you’re at it?) and SQL Forms beat everything else by nearly a factor of 2. Ugh.

I often see articles that begin with a line such as Numerous studies have shown that… and then strangely, the article doesn’t link to a single study. Kinda like the original article that I was refuting.

I linked to a study (I also saw a similar table in one of McConnell’s books) and you think that weakens my argument? So let’s see what their methodology was…

Where does the data come from? The gearing factors in this table were drawn from 2786 completed function point projects in the QSM database. As mixed-language projects are not a reliable source of gearing factors, only single-language projects are used. As an additional resource, the David Consulting Group has graciously allowed QSM to include their data.

Okay, that seems at least somewhat rigorous. But a handwavy "but but, HTML, SQL Forms" has convinced me they don’t know what they are talking about. Thanks.

I don’t know why they included HTML, or why SQL Forms is so good (I haven’t used it), but in specialised areas, I can easily believe a language (+ its libraries) can beat others by a factor of 2.

Read Full Post »

Terse Hashes Miscellany

This is part 4 in my terse hashes in emacs lisp series

Okay, time to wrap up with the terse hashes.

The built-in hash tables have equivalents to the perl exists and delete, but again, the argument order does not feel right to me.

Having looked at ruby a bit over the past few months, I’ve decided I like predicates ending with a question mark [1] rather than a p.

(defsubst exists? (hash key)
  (not (eq (gethash key hash 'missing-key) 'missing-key)))

(defsubst del-hash! (hash key) (remhash key hash))

And the other thing I rely on a lot in perl is keys so I can choose the traversal order with sort.

(defun keys (hash)
  (let (keys)
    (maphash (lambda (k v) (push k keys)) hash)
    keys))

I need a decent less than method

(defun generic/less (val1 val2)
  (if (and (numberp val1) (numberp val2))
      (< val1 val2)
    (let ((v1 (prin1-to-string val1))
          (v2 (prin1-to-string val2)))
      (string< v1 v2))))

and then this sort of thing just works

(defvar %hash (_h { 1 2 'a 'x 5 6 10 "lemon" }))
(sort (keys %hash) #'generic/less) ;; --> (1 5 10 a)
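The ordering rule is easy to try out in other languages too. As a rough sketch in Python (assuming Python 3, and using str() where the lisp uses prin1-to-string, so quoted strings would sort slightly differently), with by_generic_less being a hypothetical adapter name:

```python
from functools import cmp_to_key

def generic_less(val1, val2):
    """True if val1 sorts before val2: numerically for two numbers, else by printed form."""
    if isinstance(val1, (int, float)) and isinstance(val2, (int, float)):
        return val1 < val2
    return str(val1) < str(val2)

def by_generic_less(a, b):
    """Adapt the less-than predicate to the -1/0/1 comparator that sorted() expects."""
    if generic_less(a, b):
        return -1
    if generic_less(b, a):
        return 1
    return 0

keys = [10, "a", 1, 5]
print(sorted(keys, key=cmp_to_key(by_generic_less)))  # [1, 5, 10, 'a']
```

The adapter is needed only because Python's sorted() wants a key or a three-way comparator, where the lisp sort takes the predicate directly.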

1. Was this first in scheme?

Read Full Post »
