Archive for July, 2010

chromatic says that using Moose or a similar abstraction mechanism is an important indicator of Perl ability.


I’m a 6th level Perlic User. I’m quite capable of constructing my own objects[1], thanks very much.

This is chromatic:

Here’s a quick checklist to help those of you writing Perl to determine if you’re capable of writing Perl well:

  • Do you do this?
  • Do you do that?
  • Do you use Moose or another abstraction mechanism from the CPAN?

Okay you got me. I’m quoting out of context. Here’s chromatic again:

You don’t have to answer all of those questions in the correct way to write good and maintainable Perl, but if you answer most of those questions in the wrong way, of course you’ll write bad code.

I don’t know what these 18 items actually indicate (how well integrated you are with the perlective, perhaps, or how much you code like chromatic), but for sure only 6 or 7 at best are decent indicators of ability to write Perl well. I answered 9 of them in the wrong way and I’m certainly not rushing to correct the deficiency.

1. Having said that, I have used Moose before, and for the right project, I would use it again.


Read Full Post »

Where to Cache Stuff?

A fairly common task I need to do is:

  • query a remote service which returns a large amount of data
  • extract just the bits I need from that data
  • do something with the extracted bits of data

Often the initial query will take a few seconds to run and I’ll be thinking: I can’t be bothered to wait for this, why don’t I just cache the data?

If I decide it is worth it, the next question is where and how to cache.

Perl AnyEvent

And what I generally think of first is an AnyEvent-based Proxy Server. Just as quickly, I discard that option as I can’t be bothered to figure out how to ensure the proxy is up when I need it. For example, what happens when the physical host where the proxy is running reboots? What happens if the sysadmin kills my process? Etc.

Storable and Freeze / Thaw

So, next I think: the filesystem is always available (hopefully!), I’ll use that. So then I’m thinking about Cache::File, Storable and Freeze/Thaw.

This often leads to a couple of issues too. Where should I store my data files? Should I store them somewhere in $HOME, so that if multiple people want to cache the data they each hit the remote service, or should I find a shared area? And should that shared area be on a network drive, or local to the box?
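The filesystem option can be sketched roughly as follows. This is only an illustration in C++ (C++17, for std::filesystem), not the Perl modules named above; cached_fetch, the cache file path and the age limit are all hypothetical names, and the fetch callback stands in for the slow remote query.

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: return the text stored in cache_file if the file
// exists and is younger than max_age; otherwise call fetch (the slow remote
// query), store its result in cache_file, and return it.
std::string cached_fetch(const fs::path& cache_file,
                         std::chrono::seconds max_age,
                         const std::function<std::string()>& fetch) {
    std::error_code ec;
    if (fs::exists(cache_file, ec)) {
        auto age = fs::file_time_type::clock::now() -
                   fs::last_write_time(cache_file, ec);
        if (!ec && age < max_age) {
            // Fresh enough: read the whole file back.
            std::ifstream in(cache_file, std::ios::binary);
            std::ostringstream buf;
            buf << in.rdbuf();
            return buf.str();
        }
    }
    // Missing or stale: query again and overwrite the cache file.
    std::string data = fetch();
    std::ofstream out(cache_file, std::ios::binary | std::ios::trunc);
    out << data;
    return data;
}
```

A caller would pass something like a path under $HOME or a shared directory, a limit such as std::chrono::seconds(3600), and a lambda that performs the query. The sketch does nothing about concurrent writers, which matters once the cache lives in a shared area.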


So, finally, I get to thinking about caching the data in the database. I know for sure that will always be available.

Can you guess which option I generally choose? Does anyone else have any thoughts on where and how to cache data?

Read Full Post »

Why Not Hashes

Someone asked me why, with all of the associated problems, emacs lisp uses vectors rather than hashes to represent objects.

There are three reasons that come immediately to mind.

  • speed efficiency
  • space efficiency
  • you generally need to know the type anyway

Speed Efficiency

(person-age dave) ultimately expands to (aref dave 1), which is an array access. Array access and hash access should both be O(1) operations. However, the constant factor k for hash access will be much greater.

First of all, you need to call the hash function. Even if that is just mod [size of hash] that will probably more than halve the speed. And then you have collision handling on top of that.
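To make the difference concrete, here is a rough sketch in C++ rather than emacs lisp. The names kAgeSlot, age_from_vector, bucket_for and age_from_hash are invented for illustration, and bucket_for only spells out the "hash, then mod the table size" step by hand; it is not how any particular hash table is implemented.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Vector-style record: the accessor boils down to a fixed slot number,
// much as (person-age dave) becomes (aref dave 1). Slot 0 is left for a tag.
constexpr std::size_t kAgeSlot = 1;

int age_from_vector(const std::array<int, 2>& person) {
    return person[kAgeSlot];  // one indexed load, no hashing
}

// Hash-style record: the key has to be hashed and reduced to a bucket
// before any lookup can start. This hypothetical helper shows that step.
std::size_t bucket_for(const std::string& key, std::size_t n_buckets) {
    return std::hash<std::string>{}(key) % n_buckets;
}

int age_from_hash(const std::unordered_map<std::string, int>& person) {
    return person.at("age");  // hash "age", find its bucket, compare keys
}
```

Both accessors return the same value; the second simply does more work per call. No timings are claimed here.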

Space Efficiency

A default hash table starts with 65 slots.

(make-hash-table) ;; --> #<hash-table 'eql nil 0/65 0x298a880>

That is quite a lot of overhead for an object with just a few fields. You can select a different initial size, but as you might expect, a hash does take more space than a similarly sized vector.

You Need to Know the Type Anyway

In practice, you need to know the type of an object before you can do anything useful with it. Say I have a variable called dave. How do I know if I can fire dave unless I know that dave is of type employee?

Can you think of any operation you can do on a variable without knowing the type? Stringify perhaps.

So, why was I making a big deal before?

Not having to know the exact type can give you a degree of genericity. I can have a whole bunch of things related to people that I can call age on. Maybe I’ll have an iterator field in a bunch of containers called iterator.

But really, I don’t care about that. I’m into aesthetics and I just like a terse language. I find dave.age, or even (*s dave.age), much nicer than (person-age dave).

Read Full Post »

Of the languages I know, emacs-lisp is unusual in that when you need to access the field of a structure, you need to know the name of the type[1].

(defstruct person age name)
(defvar dave (make-person))
(setf (person-age dave) 20) ;; getter specifies the type!
(setf (person-name dave) "David Jones")
(message (person-name dave)) ;; -- David Jones

Notice I can’t say (age dave) here, it has to be (person-age ...).

The underlying reason for this is that structures are really vectors. (person-age object) translates into (aref object 1) which should hopefully be pretty fast. Another structure doesn’t need to keep its age member at an offset of 1, so I can’t say it is (generic-structure-age object).

In basic perl objects, because they are really just a hash reference, I don’t need to specify the type. When it is needed, the VM figures it out at runtime.

package Person;

sub new {
    bless {}, $_[0];
}

package main;

my $dave = Person->new();        # Type specified here
$dave->{'age'} = 20;             # but not here
$dave->{'name'} = 'David Jones';

print $dave->{'name'}, "\n";

$ perl t.pl
David Jones

In many statically typed languages, the fact that the compiler knows what the type is enables it to generate efficient code. I still don’t need to specify it explicitly.


#include <iostream>
#include <string>

using namespace std;

struct person {
    int age;
    string name;
};

int main() {
    person dave;

    dave.age = 20;             // No type explicitly specified here
    dave.name = "David Jones"; // or here

    cout << "Name: " << dave.name << "\n";
}

$ g++ t.cpp
$ ./a.exe
Name: David Jones

For objects created using the new version of defstruct* I’ve defunned a get-field and set-field that don’t require the type to be specified. Error checking is, as usual, elided.

(defsubst get-index (object field)
  (cdr (assoc field (symbol-value (aref object 1)))))

(defun get-field (object field)
  (aref object (get-index object field)))

(defun set-field (object field value)
  (setf (aref object (get-index object field)) value))

(get-field dave 'name)
(set-field dave 'name "Simon Smith")
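For comparison, the same idea can be sketched in C++. This is an illustration under assumptions, not a translation of defstruct*: Layout and Record are invented names, every slot is a string for simplicity, and the layout pointer sits beside the slots rather than inside the vector, so indices start at 0 here instead of 2.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Field name -> slot index, playing the role of the alist behind get-index.
using Layout = std::vector<std::pair<std::string, std::size_t>>;

// A record carries a pointer to its own layout, so the accessors below
// never need to be told which type of record they were given.
struct Record {
    const Layout* layout;
    std::vector<std::string> slots;
};

std::size_t get_index(const Record& r, const std::string& field) {
    for (const auto& entry : *r.layout)
        if (entry.first == field) return entry.second;
    throw std::out_of_range("no such field: " + field);
}

const std::string& get_field(const Record& r, const std::string& field) {
    return r.slots.at(get_index(r, field));
}

void set_field(Record& r, const std::string& field, std::string value) {
    r.slots.at(get_index(r, field)) = std::move(value);
}
```

Two records whose layouts keep name at different offsets can both be passed to get_field(r, "name"). The price is a linear search of the layout on every access, which is the cost that the generated (person-name dave) accessor avoids.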

Although by this time we’re probably both wondering why I didn’t just use a hash like all the (other) scripting languages.

1. To be more accurate, you need to tell the computer what the type is. Obviously, if you want to do something useful with an object, you generally need to know what type it is anyway.

Read Full Post »

I made a complete mess of my earlier recursive macros post. I hadn’t surrounded the two function calls with progn so only the last one was called. Not only that, but I hadn’t fixed up defstruct* to work with the function instead of the macro.

It is all fixed up now. Take a look.

What happened?

Obviously I tested it before I posted (although I admit it was rushed). What went wrong?

I’ve actually been caught out by the convenience of the REPL.

During testing, one of my iterations managed to create a struct--params variable and subsequent tests just worked.

Oops, I’ll need to take more care in the future.

Read Full Post »

and why you probably won’t need them

I was thinking it would be nice to have a type of structure that would know in what index each of its parameters is stored. Why might this be useful? Because then you don’t need to know the name of the type to access its data.

It is simple enough to make a wrapper around defstruct. First define a constant that stores a map of parameter name to index, and then keep a reference to that constant at a fixed position in the structure.

(defmacro defstruct* (struct-name &rest params)
  (let ((constant (intern (format "%s--params" struct-name))))
    ;; progn so that both the defconst and the defstruct are evaluated
    `(progn
       (defconst ,constant (_params-indices-map 2 ,@params))
       (defstruct ,struct-name (param-indices ,(list 'quote constant)) ,@params))))

For the map, we’re going to use an association list, and something like a b c should expand to ((a . 2) (b . 3) (c . 4)). As the first two slots in the vector are taken up by the structure name and the map itself, indices start at 2.

(defmacro _params-indices-map (counter &rest params)
  ;; With no params left the macro expands to nil, which ends the list.
  (if params
      `(cons (cons ,(list 'quote (car params)) ,counter)
             (_params-indices-map (1+ ,counter) ,@(cdr params)))))

But we didn’t really need a recursive macro here at all. And they are painful to debug.

(macroexpand '(_params-indices-map 2 a b c)) ;; expands to
(cons (cons (quote a) 2) (_params-indices-map (1+ 2) b c))

We could have used a function much more easily.

(defun _params-indices-map (params)
  (let ((counter 2) retval)
    (dolist (var params)
      (push (cons var counter) retval)
      (setq counter (1+ counter)))
    ;; push builds the list backwards; reverse it to get ((a . 2) (b . 3) ...)
    (nreverse retval)))

defstruct* needs to be fixed up slightly. We no longer need to pass the counter in to _params-indices-map.

(defmacro defstruct* (struct-name &rest params)
  (let ((constant (intern (format "%s--params" struct-name))))
    `(progn
       (defconst ,constant (_params-indices-map '(,@params)))
       (defstruct ,struct-name (param-indices ,(list 'quote constant)) ,@params))))

(defstruct* struct a b c)
(cdr (assoc 'a struct--params)) ;; --> 2
(make-struct) ;; --> [cl-struct-struct struct--params nil nil nil]

Now of course, this won’t work correctly for more complex structures that initialise their parameters. Fixing that is left as an exercise for the reader 🙂

Read Full Post »

YAML Is Not Dead

Nor is it pining for the fjords.

First sign a technology is dying: supporters start writing articles stating [technology] is not dead.

Ron Savage asserts on my blog:

Whether you like YAML or not, it’s effectively dead and gone. Accept this reality and forget it.

  • No supporting argument
  • No Link
  • No alternative evidence

It’s the first time I’ve heard anyone say this, and normally I’d dismiss a naked assertion like this out of hand. But I’m not too invested in YAML at this point (although getting more invested all the time) and it is probably worth a few minutes digging to avoid several months of time wasted. Whether it is worth the extra minutes to write this blog post is debatable however 🙂

So first of all I search at duckduckgo.com for "YAML is dead". None of the first 30 links I glance at are talking about YAML’s moribundity.

Next I look at Stack Overflow: 170 questions tagged YAML vs 3,554 tagged JSON (and 10,978 tagged xml). But I’m not particularly worried about popularity, otherwise I’d be using Java over Perl, right? A lot of the YAML questions are fairly recent and have responses.

That’s enough for me. It has parsers, a perl module and it looks far nicer than JSON. I’m going to stick with it.

Read Full Post »
