Archive for February, 2009

Pretty Printing XML

During the course of a typical workday, in order to track down a problem, I often need to look at logfiles containing xml. The xml is usually poorly formatted and sometimes it is all on one line, making it very difficult to read. Ideally, I could extract the xml I am interested in into a buffer and pretty-print it with very little effort. The steps would be something like:

  1. create a new buffer called *xml* in another window (C-x 4 b)
  2. delete anything that exists in that buffer already
  3. pretty print the xml into the new buffer

Did I gloss over step 3? That sounds pretty complicated, right? Well, I can call a simple perl script from emacs. I’m pretty pragmatic and I don’t feel the need to code absolutely everything in emacs-lisp.


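use strict;
use warnings;
use XML::Twig;
use XML::Parser;

my $xml = XML::Twig->new(pretty_print => 'indented');

# assumed: parse the named file if one is given, otherwise standard input
if ($ARGV[0]) {
    $xml->parsefile($ARGV[0]);
} else {
    $xml->parse(\*STDIN);
}
$xml->print;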

(defun xml-pretty-print-region (start end)
  (interactive "r")
  (let ((b (get-buffer-create "*xml*")))
    (switch-to-buffer-other-window b)
    (erase-buffer)                  ; step 2: clear out any old output
    (other-window -1)
    (goto-char end)
    (let ((e (point-marker)))       ; a marker, so it follows the joins
      (join-broken-lines start end)
      (call-process-region start e "xml_pretty_print.pl" nil b))))

Actually, sgml-mode (xml-mode is just an alias) has a function called sgml-pretty-print, but firstly I prefer the output from XML::Twig and secondly it is nice to see how easy emacs makes it to call out to an external process and collect the results. Anyone without perl installed might prefer to replace the external call with a call to (sgml-pretty-print ...).

(defun xml-pretty-print-region (start end)
  (interactive "r")
  (let ((buf (get-buffer-create "*xml*")))
    ;; copy-to-buffer replaces whatever was already in *xml*
    (copy-to-buffer buf start end)
    (switch-to-buffer-other-window buf)
    ;; loads sgml-pretty-print and sets up the indentation it relies on
    (sgml-mode)
    (join-broken-lines (point-min) (point-max))
    (sgml-pretty-print (point-min) (point-max))
    (other-window -1)))
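
The same call-out pattern works with other formatters. As a rough sketch only, and assuming the xmllint program (from libxml2) is installed and on the PATH, something like the following could stand in for the perl script. The function name is made up for illustration, and it does not join broken lines first.

```elisp
(defun my-xml-pretty-print-with-xmllint (start end)
  "Pretty-print the region from START to END into the *xml* buffer.
Hypothetical variant: assumes the xmllint program is on the PATH."
  (interactive "r")
  (let ((buf (get-buffer-create "*xml*")))
    (with-current-buffer buf
      (erase-buffer))
    ;; DELETE is nil, so the region is left alone; output goes to BUF.
    ;; "--format" reindents and "-" makes xmllint read standard input.
    (call-process-region start end "xmllint" nil buf nil "--format" "-")
    (display-buffer buf)))
```

Everything after the DISPLAY argument of call-process-region is passed to the program as command-line arguments, which is how "--format" and "-" reach xmllint.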

What is that call to (join-broken-lines ...)? When I cut and paste from putty it breaks lines at the width of my window.

<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0"><xsl:output
 method="html"/><xsl:template match="/"><H2>Customer Listing (in Alternating ro
w colors)</H2><table border="1"><xsl:for-each select="/customers/customer"><tr>
<xsl:choose><xsl:when test="position() mod 2 = 1"><xsl:attribute name="class">c
lsOdd</xsl:attribute></xsl:when><xsl:otherwise><xsl:attribute name="class">clsE
ven</xsl:attribute></xsl:otherwise></xsl:choose><xsl:for-each select="@*"><td><
xsl:value-of select="."/></td></xsl:for-each></tr></xsl:for-each></table><H3>To
tal Customers<xsl:value-of select="count(customers/customer)"/></H3></xsl:templ

This invalidates the xml so I need to fix this before passing the result to the xml parser.

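(defconst cr (string ?\n))
(defconst *broken-line-regex* cr)

(defun join-broken-lines (start end)
  (interactive "r")
  (save-excursion
    ;; use a marker for the limit: every join shrinks the region
    (let ((limit (copy-marker end)))
      (goto-char start)
      (while (re-search-forward *broken-line-regex* limit t)
        (replace-match "" nil nil))
      (set-marker limit nil))))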

This is the output from the version which calls (sgml-pretty-print ...)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <H2>Customer Listing (in Alternating row colors)
    <table border="1">
      <xsl:for-each select="/customers/customer">
            <xsl:when test="position() mod 2 = 1">
              <xsl:attribute name="class">clsOdd
              <xsl:attribute name="class">clsEven
          <xsl:for-each select="@*">
              <xsl:value-of select="."/>
    <H3>Total Customers
      <xsl:value-of select="count(customers/customer)"/>

We probably want a function to handle the common case where all the xml is on one line with some junk before the xml and some junk afterwards. E.g.

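asdflkjnalkjnasdf <?xml version="1.0" encoding="UTF-8"?><a><b>c</b></a> xskldfgjnskldf

(defun xml-pretty-print-current ()
  (interactive)
  (save-excursion
    (end-of-line nil)
    ;; last ">" on the current line marks the end of the xml
    (re-search-backward ">" (line-beginning-position))
    (let ((e (+ 1 (point))))
      (beginning-of-line nil)
      ;; skip past the xml declaration; "?" has to be escaped
      (re-search-forward "<\\?xml[^>]*>" e)
      (xml-pretty-print-region (point) e))))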

And, er, I guess that’s all I’ve got to say about that.


I really appreciate all of the comments that people leave on this blog. Occasionally, someone drops by and leaves a great comment with an incredibly useful tip.

shell-command and prefix argument

The first tip I’d like to highlight was at the end of a comment by Steve Vinoski. I didn’t realise how useful it would be when I first saw it but now I find it indispensable.

The tip was that giving a prefix argument (i.e. pressing C-u first) to the synchronous shell commands, such as M-! (shell-command), inserts the output at point. Without the prefix, the output is (far less usefully) shown in the echo area, or in the *Shell Command Output* buffer if it is too long to fit.

One thing I use this for is creating Makefiles where I need to pull in the list of headers and source files. I can now type C-u M-! ls *.H<RET>

             b.H \
             c.H \
             ... \
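
From lisp, the prefix argument corresponds to a non-nil second argument, as in (shell-command "ls *.H" t). If the trailing backslashes are wanted straight away, a small helper could build the list without the shell at all. This is only a sketch; the function name and the amount of indentation are my own choices.

```elisp
(defun my-insert-file-list (pattern)
  "Insert the files matching PATTERN at point, one per line.
Each line ends in a backslash, ready for a Makefile variable."
  (interactive "sWildcard: ")
  ;; file-expand-wildcards expands PATTERN relative to default-directory
  (dolist (file (file-expand-wildcards pattern))
    (insert "             " file " \\\n")))
```

M-x my-insert-file-list with *.H as the wildcard inserts one continuation line per header in the current directory.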

Filtering ido results

Another thing I didn’t know about was that it is possible to filter ido results. Thanks very much to Reynaldo for pointing this out.


Project Euler and Calc Mode

What makes a real programmer?¹ Well, apart from the ability to write a blackjack program that always wins, I think it is probably someone who actually programs for fun outside of work (novel concept, I know), not necessarily someone who is good at it. I’m not a real programmer, but I do play one occasionally on various blogs.

One of the places I look for fun problems to solve is Project Euler. Actually that isn’t true. I’ve only solved two problems there which gives me a smartness rating of 1%. But what is Project Euler? Fortunately, that is one of the FAQs.

Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

I think problem 26 looks pretty interesting.

Problem 26: Find the value of d < 1000 for which 1/d contains the longest recurring cycle.

How would you solve this problem?

(/ 1.0 7) ;; 0.14285714285714285
(length (format "%s" (/ 1.0 7))) ;; 19

It looks like standard floating point can do around 17 significant figures (the 19 characters include the leading "0."). That probably isn’t going to be enough, but fortunately emacs has an arbitrary precision calculator built in called calc.

(progn
  (calc-precision 30)
  (calc-eval "1/7")) ;; "0.142857142857142857142857142857"
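
If changing the global precision is a worry, calc-eval also accepts a list whose first element is the formula and whose remaining elements are variable/value pairs bound for that one call. A minimal sketch, with a helper name of my own invention:

```elisp
(require 'calc)

(defun my-reciprocal-digits (d precision)
  "Return 1/D as a decimal string with PRECISION significant digits."
  ;; The list form of `calc-eval' binds `calc-internal-prec' for this
  ;; call only, leaving the precision set elsewhere untouched.
  (calc-eval (list (format "1/%d" d) 'calc-internal-prec precision)))
```

(my-reciprocal-digits 7 30) should give the same 30 digit string as the example above.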

Now all I need to do is find a way to measure how long the recurring cycle is. Having solved it prior to writing the article, I can say a couple of things which I discovered by experimentation at the REPL.

  • There is no 1/d for d < 1000 where the cycle is longer than d - 1.
  • The long cycles are all where d is a prime number.

The second of these means that I only need to process the prime numbers, which will save a lot of calculation effort. The first leads to a strategy to calculate the cycle.

If I set the precision to, say, 1050, there will always be a cycle. Therefore, all I have to do is take digits 1020 to 1040 or so and figure out where they occurred previously.

(let* ((where (- l 30))
       (pos (string-match (substring r where (+ where 20)) r))
       (diff (- where pos)))
  diff)

If r contains a string representation of the floating-point ratio, then diff will contain the length of the cycle. This assumes the cycle only occurs once in the string, which isn’t necessarily the case. I therefore add a check that an arbitrary 20 digit sequence in the middle of the string occurs at that point for the first time².

(when (and (> (length r) 600)
           (> (string-match (substring r 501 520) r) 400))
  ...)

Putting it all together along with a method to get a list of prime numbers gets me this entirely adequate solution. And d is therefore… actually, no I’m not going to tell you. You’ll just have to run the code for yourself.

(require 'calc)
(require 'calc-comb)

(defun primes (from to)
  (let ((p (calcFunc-nextprime (1- from))))
    (if (> p to) '()
      (cons p (primes (1+ p) to)))))

(defun find-ratios ()
  (toggle-read-only 0)
  (toggle-truncate-lines 0)
  (calc-precision 1050)
  (insert "\n\n")
  (dolist (i (primes 900 1000))
    (let* ((r (calc-eval (format "1/%s" i)))
           (l (length r)))
      (when (and (> (length r) 600)
                 (> (string-match (substring r 501 520) r) 400))
        (let* ((where (- l 30))
               (pos (string-match (substring r where (+ where 20)) r)))
          (insert (format "%3d : %s (%s)\n\n" i r (- where pos))))))))
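
For comparison, here is a different technique from the string matching above: ordinary long division, stopping as soon as a remainder repeats. It is not what the code above does, just a sketch of the textbook approach, and the function name is my own. It uses plain integers, so no calc is needed.

```elisp
(defun recurring-cycle-length (d)
  "Return the length of the recurring cycle in the decimal form of 1/D.
Return 0 if the expansion terminates."
  (let ((seen (make-hash-table))   ; remainder -> step where it appeared
        (rem 1)
        (step 0))
    ;; Long division: each step produces one digit and a new remainder.
    ;; The digits start repeating when a remainder comes round again.
    (while (and (/= rem 0) (not (gethash rem seen)))
      (puthash rem step seen)
      (setq rem (% (* rem 10) d)
            step (1+ step)))
    (if (= rem 0)
        0
      (- step (gethash rem seen)))))
```

(recurring-cycle-length 7) returns 6, matching the 142857 cycle seen earlier, and (recurring-cycle-length 8) returns 0 because 1/8 terminates.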

1. Actually, for me, a real programmer is someone real (as opposed to Bugs Bunny for example) who writes programs but it’s always fun to redefine terms isn’t it?

2. Yes, I know that first of all, that is not exactly what the code is doing and second of all it isn’t precise in the least, but hey, I’m an engineer, not a mathematician.


An Extensive, But Not Comprehensive, Look At Finding Files

The Problem

Finding files can be a tricky business. If you have a number of different directories where your files can be stored, how can you open a particular file as quickly as possible? The standard find-file interface is pretty good if you’re already in the correct directory, as it will do tab completion on anything you have already typed in, much as a decent shell does.

ido improves this interface. If you enter part of a filename that has already been seen, even if it is in a different directory it will be offered to you. I enable flex-matching so that ido will match files containing the characters which have been entered anywhere so long as they are in the correct sequence. e.g. eod will match hello-world.c.

(ido-mode t)
(setq ido-enable-flex-matching t)

(setq ido-create-new-buffer 'always)
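
To make the flex-matching rule concrete, the idea can be approximated by turning the input into a regexp with gaps allowed between the characters. This is only an illustration of the rule as described above, not ido’s actual implementation, and the function name is hypothetical.

```elisp
(defun my-flex-regexp (input)
  "Return a regexp matching the characters of INPUT in order, with gaps."
  (mapconcat (lambda (c) (regexp-quote (char-to-string c)))
             input
             ".*"))

;; "eod" becomes "e.*o.*d"
(string-match-p (my-flex-regexp "eod") "hello-world.c") ; non-nil: a match
(string-match-p (my-flex-regexp "doe") "hello-world.c") ; nil: wrong order
```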

I understand that icicles does even better here, enabling you to filter out matches but as I’m not (yet) using it myself, I can’t say much more about it.

If you need more help in finding your files, what other options are there?


I hadn’t heard of Bookmarks before I saw a post on them recently. I use them in conjunction with directory aliases¹.

From the emacs manual:

Bookmarks are somewhat like registers in that they record positions you can jump to. Unlike registers, they have long names, and they persist automatically from one Emacs session to the next. The prototypical use of bookmarks is to record “where you were reading” in various files.


Sacha Chua wrote a post about navigating your source tree using ido and filecache. I’m not sure what additional benefits filecache offers over vanilla ido as ido finds files in other directories too.


The traditional way for finding function definitions is with Emacs Tags. Once you have generated the tags file you can go from call site to function definition with M-. (find-tag).

When I go on holiday I leave a link to the current method I am working on in an orgfile. Then when I come back, I can click on the current work link and press M-. and I’ll be right back where I left off.

[[SomeClass::SomeMethod][Current Work]]

* Work Projects
*** A Specific Project
Some notes about what I was doing...

I set the tags file name in my configuration so that emacs doesn’t have to ask me where the tags file is.

(setq tags-file-name "/path/to/tags/file")

1. Directory Aliases

(require 'ido)
(require 'dired)

(defconst *web-top* "~/websites")

(defun web (path)
  (concat *web-top* path))

*dired-dirs* is an association list mapping aliases to directories. I need to use (list ...) to construct it so that (web ...) expands correctly.

(defconst *dired-dirs*
  (list (cons "mrc-theme" (web "/blog/mrc/wp-content/themes/mrc"))
        (cons "mrc-pages" (web "/pages/mrc"))
        (cons "webtest"   (web "/test"))))

(defconst *dired-aliases*
  (mapcar (lambda (e) (car e)) *dired-dirs*))

dired-open-alias uses ido-completing-read to choose between the aliases I defined earlier.

(defun dired-open-alias (&optional alias)
  (interactive)
  (unless alias
    (setq alias
          ;; offer the alias names; the final t insists on a match
          (ido-completing-read "Alias: "
                               *dired-aliases*
                               nil t)))
  (if (and (stringp alias) (> (length alias) 0 ))
      (let ((pair (assoc alias *dired-dirs*)))
        (if pair
            (dired (cdr pair))
          (error "Invalid alias %s" alias)))
    (error "Invalid alias %s" alias)))

I really like f2 as a prefix key. The default binding is related to two-column mode, which I never use, and you can use it without pressing shift, ctrl or alt. I have this defined in my-defaults.el (which probably isn’t strictly necessary but at least it names the prefix map).

(defvar f2-prefix-map nil)
(setq f2-prefix-map (make-sparse-keymap))
(global-set-key [f2] f2-prefix-map)

<f2> d calls up the list of directory aliases.

(global-set-key (kbd "<f2> d") 'dired-open-alias)


Emacs Links – 2009 Week 7

This week was pretty quiet on the emacs subreddit with fewer submissions from gst than normal.


Multiple Build Commands For Emacs

Alex Bennee left a comment on my dollar editor post:

I then open a number of separate emacs processes for each project I’m actively working on. I mainly do this as each project typically has it’s own make invocation.

I have a similar issue myself where my work projects require different commands but I prefer to avoid running multiple emacs instances if possible. It is not too difficult to fix. The idea is to detect the name of the file and run a different make command depending on which directory the file is in. This can be extended to as many different directories as necessary.

(setq compilation-scroll-output t)

(defun make-lib ()
  (compile "make-lib-cmd"))

(defun make-app ()
  (compile "make-app-cmd"))

(defun run-compile ()
  (interactive)                     ; needed so it can be bound to a key
  (let ((file (buffer-file-name (current-buffer))))
    (cond ((not (stringp file))
           (error "[%s] is not a string.  Invalid buffer %s ?"
                  file (buffer-name)))
          ((string-match "/src/lib/" file) (make-lib))
          ((string-match "/src/app/" file) (make-app))
          (t (error "Invalid location %s" file)))))

(define-key c-mode-base-map [f7] 'run-compile)
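
As the number of directories grows, one option is to keep the mapping in an alist and look the command up, rather than adding a function and a cond clause per project. The following is a sketch of that idea; the my- names are hypothetical and the commands are the same placeholders as above.

```elisp
(defvar my-compile-commands
  '(("/src/lib/" . "make-lib-cmd")
    ("/src/app/" . "make-app-cmd"))
  "Alist of (DIRECTORY-REGEXP . COMMAND); the first match wins.")

(defun my-compile-command-for (file)
  "Return the build command for FILE, or nil if no entry matches."
  ;; assoc-default calls (string-match REGEXP FILE) for each entry
  (assoc-default file my-compile-commands #'string-match))

(defun my-run-compile ()
  "Run the build command that matches the current buffer's file."
  (interactive)
  (let* ((file (buffer-file-name))
         (command (and file (my-compile-command-for file))))
    (if command
        (compile command)
      (error "No build command for %s" (or file (buffer-name))))))
```

Supporting another project is then a one line addition to my-compile-commands.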


If there is something you find yourself doing over and over again in emacs, you can create a shortcut. For example, something I find myself doing repeatedly is duplicating a line and making a modification on the second line.

    int v1 = 1;
    int v2 = 2;

The way I used to do this is:

  • I moved to the beginning of the line
  • I set the marker
  • I moved to the end of the line
  • I copied the marked region
  • I pressed enter
  • and finally I yanked the copied line

There may be a quicker way, even in vanilla emacs. This doesn’t take long, but it is still better to have it available with a single keypress.

(defun duplicate-current-line ()
  (interactive)
  (beginning-of-line nil)
  (let ((b (point)))
    (end-of-line nil)
    (copy-region-as-kill b (point)))
  (beginning-of-line 2)
  (open-line 1)
  (yank))                           ; paste the copy onto the new line

(global-set-key "\C-cd" 'duplicate-current-line)

This does have a few minor problems – e.g. it doesn’t work if we are on the last line of a buffer. Fixing this is left as an exercise for the reader.
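
For anyone who would rather skip the exercise, here is one possible approach. It is a sketch under a different design (it inserts the text directly instead of going through the kill ring), and the function name is made up.

```elisp
(defun my-duplicate-line ()
  "Insert a copy of the current line directly below it.
Works on the last line of the buffer and leaves the kill ring alone."
  (interactive)
  (let ((text (buffer-substring (line-beginning-position)
                                (line-end-position))))
    (save-excursion
      (end-of-line)
      ;; inserting our own newline means a missing final newline is fine
      (insert "\n" text))))
```

Point stays on the original line, which may or may not be what you want if the next step is to edit the copy.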


One thing I don’t understand is what is useful about electric indentation. This is when entering a special character such as a single or double quote or a semi-colon will automatically cause the current line to be re-indented.

I configure my programming modes to do an automatic indentation when I press the return key. For example, in my cc-config.el file I have this line.

(define-key c-mode-base-map (kbd "RET") 'newline-and-indent)

A Small Criticism of Yegge’s Javascript Mode

Steve Yegge’s javascript mode has a particularly unfortunate interaction with electric indentation. He wrote:

[Automatic indentation] turns out to be, oh, about fifty times harder than incremental parsing. Surprise!

I put a few tweaks into Karl’s original indenter code to handle JavaScript 1.7 situations such as array comprehensions, and then wrote a "bounce indenter" that cycles among N precomputed indentation points.

This moved the accuracy, at least for my own JavaScript editing, from 90% accurate with Karl’s mode up to 99.x% accurate, assuming you’re willing to hit TAB multiple times for certain syntactic contexts.

So, how does this interact with electric indentation? Pretty badly. Consider if you have the following line and the default indentation isn’t what you want.

if (...) {
    var x = "some string";

When you begin to enter your line, you tab it into your preferred location, then when you enter the quote for the beginning of the string, it resets the position back to the wrong place. No problem, you fix it again, but then it breaks once again when you enter the closing quote. Suppressing your rising fury, you make the mistake of retabbing it for a third time, then the semicolon at the end of the line causes another reindent and almost guarantees that you throw your PC through the window.

But, is it fixable? Well, Yegge did include a flag called js2-auto-indent-flag but unfortunately it has to be set before the code is loaded otherwise the definition of the electric keys has already been trashed.

(defvar js2-mode-map
  (let ((map (make-sparse-keymap))
    (when js2-auto-indent-flag
      (mapc (lambda (key)
              (define-key map key #'js2-insert-and-indent))

The default definition of js2-insert-and-indent doesn’t check this flag so I modified my copy to enable me to change the behaviour at runtime.

(defun js2-insert-and-indent (key)
  "Run command bound to key and indent current line. Runs the command
bound to KEY in the global keymap and indents the current line."
  (interactive (list (this-command-keys)))
  (let ((cmd (lookup-key (current-global-map) key)))
    (if (commandp cmd)
        (call-interactively cmd)))
  ;; don't do the electric keys inside comments or strings,
  ;; and don't do bounce-indent with them.
  (let ((parse-state (parse-partial-sexp (point-min) (point)))
        (js2-bounce-indent-flag (js2-code-at-bol-p)))
    (unless (or (not js2-auto-indent-flag)
                (nth 3 parse-state)
                (nth 4 parse-state))
      ;; assumed final step: reindent, as the docstring says
      (indent-according-to-mode))))

This is the diff:

<    (unless (or (not js2-auto-indent-flag)
<                (nth 3 parse-state)
<                (nth 4 parse-state))
---
>    (unless (or (nth 3 parse-state)
>                (nth 4 parse-state))

So, back to the initial question: Does anyone use/like electric indentation and if so, how do you get around the problem described above (and why is auto-indent on carriage return not sufficient)?


Emacs has a few mechanisms for choosing which major mode is selected when you find a file.

auto-mode-alist determines the mode from the file extension, e.g. you might want to load a c++ mode for a file named a.cc.

interpreter-mode-alist chooses a mode depending on the first line of a file. If a file begins with #!/bin/sh you probably want to choose shell-script-mode.
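
As a concrete illustration of the two lists, entries for those examples would look roughly like this. Emacs ships with equivalent defaults, so this is only to show the shape of the entries.

```elisp
;; file name ends in .cc -> c++-mode
(add-to-list 'auto-mode-alist '("\\.cc\\'" . c++-mode))

;; script whose #! line names sh as the interpreter -> sh-mode
(add-to-list 'interpreter-mode-alist '("sh" . sh-mode))
```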

I worked on a code base where the extensions used a variety of capitalization. The perl scripts could be .Perl, .perl, .PL, .Pl or .pl and there were numerous other extensions. It is fairly easy to make a regular expression to match all of these but I thought it warranted a helper function.

I want '(pl perl) to transform to \.\([Pp][Ll]\|[Pp][Ee][Rr][Ll]\)\'. The basic technique is to make a character class with the upper case and lower case version of each character in the string. We also accept a symbol on the input in order that the caller doesn’t have to add double quotes to every element.

(mapconcat (lambda (c)
             (let ((c (upcase (char-to-string c))))
               (concat "[" c (downcase c) "]")))
           (symbol-name s) "")

This code converts 'pl to [Pp][Ll].

We want to handle a list of extensions so we can cope with permutations of .pl or .perl. We therefore surround the previous mapconcat with the following.

(mapconcat (lambda (s) ...) l "\\|")

The final part of the function adds the appropriate prefix and suffix to make the regex work in auto-mode-alist.

(defun file-extensions (l)
  (concat "\\.\\("
          (mapconcat
           (lambda (s)
             (mapconcat (lambda (c)
                          (let ((c (upcase (char-to-string c))))
                            (concat "[" c (downcase c) "]")))
                        (symbol-name s) ""))
           l "\\|")
          "\\)\\'"))

I added a wrapper function to marry the regex to the major mode.

(defun ext-mode-map (extensions mode)
  (cons (file-extensions extensions) mode))

And now you can simply specify your mode mappings like this:

(add-to-list 'auto-mode-alist (ext-mode-map '(pl perl) 'cperl-mode))
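
A quick way to convince yourself the regex behaves is to try it against a few file names. The sketch below carries its own copy of the helper under the hypothetical name my-file-extensions so that it can be evaluated on its own.

```elisp
(defun my-file-extensions (extensions)
  "Return a regexp matching any capitalisation of the symbols in EXTENSIONS."
  (concat "\\.\\("
          (mapconcat
           (lambda (s)
             (mapconcat (lambda (c)
                          (let ((c (upcase (char-to-string c))))
                            (concat "[" c (downcase c) "]")))
                        (symbol-name s) ""))
           extensions "\\|")
          "\\)\\'"))

;; Match case-sensitively, so only the character classes do the work.
(let ((case-fold-search nil)
      (re (my-file-extensions '(pl perl))))
  (mapcar (lambda (name) (and (string-match-p re name) t))
          '("a.pl" "a.PL" "a.Perl" "a.plx")))
;; => (t t t nil)
```

The last name fails because \' anchors the match to the end of the file name.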


