Archive for the ‘Emacs’ Category

Bad Hash Tables

Quick pop quiz. In emacs lisp what does this expression evaluate to?

(let ((%h (make-hash-table)))
  (puthash "key" "value" %h)
  (gethash "key" %h))

Okay, put your hands down, the lispers at the back. Anyone else?

That’s right, the answer is nil.

"Eh nil? Is that the same as false? Why’s that?" you may be asking. The answer is that by default hash tables use eql as the comparison function. (eql "x" "x") evaluates to nil. (equal "x" "x") evaluates to t.

What you actually want is this.

(let ((%h (make-hash-table :test 'equal)))
  (puthash "key" "value" %h)
  (gethash "key" %h))

Quite why you would want a hash where you can’t retrieve a string key is beyond me. Okay, I guess it possibly makes sense if you consider symbols, but damn, it’s not right.

Anyway, no worries, I can fix it. I’ll only ever use my _h macro rather than using make-hash-table, puthash or gethash directly, and I’ll extend it to create hash tables using :test 'equal.

(the original title for this page was Why do Emacs Lisp Hash Tables use Eql)


Read Full Post »

A while ago, I was trying to work around the emacs maintainers’ curious decision to always load a .elc file in preference to a .el file, even if the .el file is newer.

I didn’t want to settle for that. My solution was to recompile my emacs lisp files prior to running emacs.

I see now that there is a better alternative: replace the built-in require with my own version. And it is not just because it is open source that I can do this. I can fix it as a user too.

(require feature &optional filename noerror)

require performs the following steps:

  • do nothing if the feature has already been loaded (this protects against recursive loads)
  • finds the appropriate file in load-path
  • loads either the .el or .elc file
  • throws an error if the required feature is not provided
  • provides a mechanism for supressing errors
  • provides a mechanism to load a feature from a specific file

If I wanted a drop-in replacement, I should do all of those things, and replicate the function signature. However, I don’t need the error suppression or loading from a specific file. I also like to be able to force a load even if a library has been loaded previously. If this doesn’t match your requirements, it should be easy to adapt the code below.

(defun require* (feature &optional force)
  (when (or force (not (featurep feature)))
    (setq feature (symbol-name feature))
    (let ((path load-path)
          (found-filename nil)
          head el-attribs elc-attribs)
      (while (and (not found-filename) path)
        (setq head (pop path))
        (let ((el-filename (format "%s/%s.el" head feature))
              (elc-filename (format "%s/%s.elc" head feature)))
          ;; if .el and .elc both exist, pick the newest
          ;; otherwise pick the one that exists if any
          (cond ((and (file-exists-p el-filename)
                      (file-exists-p elc-filename))
                 (if (file-newer-than-file-p el-filename elc-filename)
                     (setq found-filename el-filename)
                   (setq found-filename elc-filename)))
                ((file-exists-p el-filename)
                 (setq found-filename el-filename))
                ((file-exists-p elc-filename)
                 (setq found-filename elc-filename)))
          ;; load file if found
          (when found-filename
            (message (format "Found: [%s]" found-filename))
            (let ((load-suffixes ()))
              (load found-filename)))))
      (unless found-filename (error "Unable to find %s" feature)))))

The brave might choose to do (defun require ...) instead of (defun require* ...)

Read Full Post »

Why Not Hashes

Someone asked me, why with all of the associated problems does emacs lisp use vectors rather than hashes to represent objects.

There are three reasons that come immediately to mind.

  • speed efficiency
  • space efficiency
  • you generally need to know the type anyway

Speed Efficiency

(person-age dave) ultimately expands to (aref dave 1) which is an array access. Array access and hash access should both be O(1) operations. However, the k for hash access will be much greater.

First of all, you need to call the hash function. Even if that is just mod [size of hash] that will probably more than halve the speed. And then you have collision handling on top of that.

Space Efficiency

A default hash table starts with 65 slots.

(make-hash-table) ;; --> #<hash-table 'eql nil 0/65 0x298a880>

That is quite a lot of overhead for an object with just a few fields. You can select a different initial size, but as you might expact, a hash does take more space than a similarly sized vector.

You Need to Know the Type Anyway

In practice, you need to know the type of an object before you can do anything useful with it. Say I have a variable called dave. How do I know if I can fire dave unless I know that dave is of type employee?

Can you think of any operation you can do on a variable without knowing the type? Stringify perhaps.

So, why was I making a big deal before?

Not having to know the exact type can give you a degree of genericity. I can have a whole bunch of things related to people that I can call age on. Maybe I’ll have an iterator field in a bunch of containers called iterator.

But really, I don’t care about that. I’m into aesthetics and I just like a terse language. I just find, dave.age, or even (*s dave.age) is much nicer than (person-age dave).

Read Full Post »

Of the languages I know, emacs-lisp is unusual in that when you need to access the field of a structure, you need to know the name of the type1.

(defstruct person age name)
(defvar dave (make-person))
(setf (person-age dave) 20) ;; getter specifies the type!
(setf (person-name dave) "David Jones")
(message (person-name dave)) ;; -- David Jones

Notice I can’t say (age dave) here, it has to be (person-age ...).

The underlying reason for this is that structures are really vectors. (person-age object) translates into (aref object 1) which should hopefully be pretty fast. Another structure doesn’t need to keep its age member at an offset of 1, so I can’t say it is (generic-structure-age object).

In basic perl objects, because they are really just a hash reference, I don’t need to specify the type. When it is needed, the VM figures it out at runtime.

package Person;

sub new
    bless {}, $_[0];

package main;

my $dave = Person->new(); # Type specified here
$dave{'age'} = 20;        # but not here
$dave{'name'} = 'David Jones';

print $dave{'name'}, "\n";
$ perl t.pl
David Jones

In many statically typed languages, the fact that the compiler knows what the type is enables it to generate efficient code. I still don’t need to specify it explicitly.


using namespace std;

struct person
    int age;
    string name;

int main()
    person dave;

    dave.age = 20;             // No type explicitly specified here
    dave.name = "David Jones"; // or here

    cout << "Name: " << dave.name << "\n";
$ g++ t.cpp
$ ./a.exe
Name: David Jones

For objects created using the new version of defstruct* I’ve defunned a get-field and set-field that don’t require the type to be specified. Error checking is, as usual, elided

(defsubst get-index (object field)
  (cdr (assoc field (symbol-value (aref object 1)))))

(defun get-field (object field)
  (aref object (get-index object field)))

(defun set-field (object field value)
  (setf (aref object (get-index object field)) value))

(get-field dave 'name)
(set-field dave 'name "Simon Smith")

Although by this time we’re probably both wondering why I didn’t just use a hash like all the (other) scripting languages.

1. To be more accurate, you need to tell the computer what the type is. Obviously, if you want to do something useful with an object, you generally need to know what type it is anyway.

Read Full Post »

I made a complete mess of my earlier recursive macros post. I hadn’t surrounded the two function calls with progn so only the last one was called. Not only that, but I hadn’t fixed up defstruct* to work with the function instead of the macro.

It is all fixed up now. Take a look.

What happened?

Obviously I tested it before I posted (although I admit it was rushed). What went wrong?

I’ve actually been caught out by the convenience of the REPL.

During testing, one of my iterations managed to create a struct--params variable and subsequent tests just worked.

Oops, I’ll need to take more care in the future.

Read Full Post »

and why you probably won’t need them

I was thinking it would be nice to have a type of structure that would know in what index each of its parameters is stored. Why might this be useful? Because then you don’t need to know the name of the type to access its data.

It is simple enough to make a wrapper around defstruct. First define a constant that stores a map of parameter name to index, and then keep a reference to that constant at a fixed position in the structure.

(defmacro defstruct* (struct-name &rest params)
  (let ((constant (intern (format "%s--params" struct-name))))
       (defconst ,constant (_params-indices-map 2 ,@params))
       (defstruct ,struct-name (param-indices ,(list 'quote constant)) ,@params))))

For the map, we’re going to use an association list, and something like a b c should expand to ((a . 2) (b . 3) (c . 4)). As the first two slots in the vector are taken up by the structure name and the map itself, indices start at 2.

(defmacro _params-indices-map (counter &rest params)
  (if params
      `(cons (cons ,(list 'quote (car forms)) ,counter)
             (_params-indices-map (1+ ,counter) ,@(cdr params)))

But we didn’t really need a recursive macro here at all. And they are painful to debug.

(macroexpand '(_params-indices-map 2 a b c)) ;; expands to
(cons (cons (quote a) 2) (_params-indices-map (1+ 2) b c))

We could have used a function much more easily.

(defun _params-indices-map (params)
  (let ((counter 2) retval)
    (dolist (var params)
      (push (cons var counter) retval)
      (setq counter (1+ counter)))

defstruct* needs to be fixed up slightly. We no longer need to pass the counter in to _params-indices-map.

(defmacro defstruct* (struct-name &rest params)
  (let ((constant (intern (format "%s--params" struct-name))))
       (defconst ,constant (_params-indices-map '(,@params)))
       (defstruct ,struct-name (param-indices ,(list 'quote constant)) ,@params))))
(defstruct* struct a b c)
(cdr (assoc 'a struct--params)) ;; --> 2
(make-struct) ;; --> [cl-struct-struct struct--params nil nil nil]

Now of course, this won’t work correctly for more complex structures that initialise their parameters. Fixing that is left as an exercise for the reader 🙂

Read Full Post »

Recently I have been adding more and more functionality into emacs using comint. And as there are more and better perl libraries for hooking into other systems (such as DBI, SOAP, HTTP…), I often start with a script that reads commands from stdin and writes the result to stdout. That is the comint way.

use strict;
use warnings;

use 5.010;

init(); # -- some initialization that takes a while

# ...

while (defined(my $command = <STDIN>)) {
    chomp $command;

    given (lc($command)) {
        when (/^\s*exit\s*$/)    { exit 0; }
        when (/^\s*send\s*$/)    { say 'send' }
        when (/^\s*receive\s*$/) { say 'receive' }
        default { say "Error: unrecognized command $command"; }

If initialisation takes a long time, it is a big saving to only do it once for many commands.

As emacs has full control over the process, I can control what gets sent and filter what is received before display. However, I prefer to be a little bit flexible with what the perl process receives for when I am testing it from the command line. That is why I allow surrounding spaces.

when (/^\s*exit\s*$/)    { ... }
when (/^\s*send\s*$/)    { ... }
when (/^\s*receive\s*$/) { ... }

Those matching regexes look pretty similar. I almost feel like I’m violating DRY.

Unfortunately, the subroutine references only take a single parameter.

sub cexit { return $_[0] =~ /^\s*exit\s*$/ }
sub csend { return $_[0] =~ /^\s*send\s*$/ }
sub creceive { return $_[0] =~ /^\s*receive\s*$/ }

# ...

when (\&cexit)    { exit 0; }
when (\&csend)    { say 'send' }
when (\&creceive) { say 'receive' }

But there is a way of storing data along with a subroutine – a closure. However, the following strangely doesn’t work.

sub match_command
    my $command = shift;
    return sub {
        my $input = shift;
        say "Input is [$input]";
        return $input =~ /^\s*$command\s*$/;

while (defined(my $command = <STDIN>)) {
    chomp $command;

    given (lc($command)) {
        when (match_command('send')) { say 'send' }
        when (match_command('exit')) { exit 0; }
        when (match_command('receive')) { say 'receive' }
        default { say "Error: unrecognized command $command"; }

Whereas storing the closures in a variable before smart matching does work. Unfortunately it looks like smart matching isn’t smart enough for my needs.

my $send = match_command('send');
my $exit = match_command('exit');
my $receive = match_command('receive');

# ...

when ($send) { say 'send' }
when ($exit) { exit 0; }
when ($receive) { say 'receive' }

And the result:

c:\home\juang>perl t.pl
Input is [send]
Input is [exit]

Read Full Post »

« Newer Posts - Older Posts »