Archive for November, 2006

Data Structures in (Mz)Scheme

I sometimes forget how useful the STL is in C++. It provides a good range of data structures that fit the majority of my programming needs. One that I use very frequently is map<T>. Map provides a nice ordered collection which I can insert an element into and delete an element from, in order log(N) time where N is the number of elements. Out of the box, neither Perl, nor R5RS scheme give me the same structure. I’m sure it is available from CPAN for Perl and there are countless examples of implementing similar structures on top of scheme’s wonderful heterogenous trees. However, I’m far to lazy to write my own code that is at such a low level so it was very nice to find the galore library of data structures by Jens Axel S√łgaard.

I was curious to do some comparisons in performance between C++ and scheme here so I quickly coded up some tests to insert a million elements into a leftist heap compared to inserting a million elements into a map. Admittedly that isn’t an apples to apples comparison but that isn’t much of a consideration for me!

The C++ code was nice and concise:

#include <map>
#include <string>
#include <iostream>

using namespace std;

string s()
    char buffer[6];

    for (int i = 0; i < 5; ++i) {
        buffer[i] = static_cast<char>((rand() % 26) + 97);
    buffer[5] = 0;

    return string(buffer);

int main()
    cout << s() << endl;
    map<string, int> m;
    for(int i = 0; i < 1000000; ++i) {
        m[s()] = 1;

The scheme was a bit longer but I had tried to do it a different way and I think it is quite elegant too:

(module test-heap mzscheme

  (require (prefix s: (lib "67.ss" "srfi")))
  (require (prefix s: (lib "69.ss" "srfi")))

  (require (prefix heap: "heap.ss"))

  (define (print-ln . args)
    (for-each display args)

  (define compare-fn s:string-compare)

  (define (random-string len)
     (let loop ((i len) (l '()))
       (if (= i 0) l
           (loop (- i 1)
                 (cons (integer->char
                        (+ (random 26) 97)) l))))))

  (define random-string-gen-reset! #f)

  (define random-string-gen
    (let ((max-strings 1000))
      (set! random-string-gen-reset!
            (lambda (max) (set! max-strings max)))
      (lambda (len)
        (if (= max-strings 0) #f
              (set! max-strings (- max-strings 1))
              (random-string len))))))

  (define (populate-heap n)
    (random-string-gen-reset! n)
    (let loop ((h (heap:empty compare-fn))
               (s (random-string 5)))
      (if s (loop (heap:insert s h) (random-string-gen 5)) h)))

  (define (test:performance)
     (random-seed 17)
     (print-ln "Starting...")
     (print-ln (heap:find-min (populate-heap 1000000)))
     (print-ln "Finished!")))

  (provide (all-defined)))

;; (require test-heap)
;; (test:performance)

I always put a couple of lines at the bottom of my scheme modules that are commented out so that I can execute them within emacs and find out if the whole module works. Does anyone else do this?

Initially when testing I benchmarked the C++ at a bit more than 7 seconds and the scheme at around 21 seconds which I thought was pretty good. However, now I come back to it the scheme is taking closer to a minute which I suspect means that somehow it isn’t triggering the new JIT compiler for the 350 series.


Read Full Post »