Posts Tagged ‘emacs tips’

Pretty Printing XML

During a typical workday, I often need to look at logfiles containing xml in order to track down a problem. The xml is usually poorly formatted, and sometimes it is all on one line, making it very difficult to read. Ideally, I could extract the xml I am interested in into a buffer and pretty-print it with very little effort. The steps would be something like:

  1. create a new buffer called *xml* in another window (C-x 4 b)
  2. delete anything that exists in that buffer already
  3. pretty print the xml into the new buffer

Did I gloss over step 3? That sounds pretty complicated, right? Well, I can call a simple perl script from emacs. I’m pretty pragmatic and I don’t feel the need to code absolutely everything in emacs-lisp.


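use strict;
use warnings;
use XML::Twig;
use XML::Parser;

my $xml = XML::Twig->new(pretty_print => 'indented');

if ($ARGV[0]) {
    $xml->parsefile($ARGV[0]);    # pretty-print the named file
} else {
    $xml->parse(\*STDIN);         # or read the xml from standard input
}
$xml->print;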

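(defun xml-pretty-print-region (start end)
  "Pretty-print the xml between START and END into the *xml* buffer."
  (interactive "r")
  (let ((b (get-buffer-create "*xml*")))
    ;; Show *xml* in the other window and clear any previous output.
    (switch-to-buffer-other-window b)
    (erase-buffer)
    (other-window -1)
    (goto-char end)
    ;; A marker tracks the end of the region as line breaks are deleted.
    (let ((e (point-marker)))
      (join-broken-lines start e)
      (call-process-region start e "xml_pretty_print.pl" nil b))))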

Actually, sgml-mode (xml-mode is just an alias) has a function called sgml-pretty-print, but firstly I prefer the output from XML::Twig, and secondly it is nice to see how easy emacs makes it to call out to an external process and return the results. Anyone without perl installed might prefer to replace the external call with a call to (sgml-pretty-print ...).

(require 'sgml-mode)  ; provides sgml-pretty-print

(defun xml-pretty-print-region (start end)
  (interactive "r")
  (let ((buf (get-buffer-create "*xml*")))
    ;; copy-to-buffer replaces any text already in BUF.
    (copy-to-buffer buf start end)
    (switch-to-buffer-other-window buf)
    (join-broken-lines (point-min) (point-max))
    (sgml-pretty-print (point-min) (point-max))
    (other-window -1)))

What is that call to (join-broken-lines ...)? When I cut and paste from PuTTY, it breaks lines at the width of my window:

<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0"><xsl:output
 method="html"/><xsl:template match="/"><H2>Customer Listing (in Alternating ro
w colors)</H2><table border="1"><xsl:for-each select="/customers/customer"><tr>
<xsl:choose><xsl:when test="position() mod 2 = 1"><xsl:attribute name="class">c
lsOdd</xsl:attribute></xsl:when><xsl:otherwise><xsl:attribute name="class">clsE
ven</xsl:attribute></xsl:otherwise></xsl:choose><xsl:for-each select="@*"><td><
xsl:value-of select="."/></td></xsl:for-each></tr></xsl:for-each></table><H3>To
tal Customers<xsl:value-of select="count(customers/customer)"/></H3></xsl:templ

This invalidates the xml so I need to fix this before passing the result to the xml parser.

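(defconst cr (string ?\n))
(defconst *broken-line-regex* cr)

(defun join-broken-lines (start end)
  "Delete every line break between START and END."
  (interactive "r")
  ;; Use a marker so the bound follows the region as it shrinks.
  (let ((end (copy-marker end)))
    (goto-char start)
    (while (re-search-forward *broken-line-regex* end t)
      (replace-match "" nil nil))))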

This is the output from the version which calls (sgml-pretty-print ...):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <H2>Customer Listing (in Alternating row colors)
    <table border="1">
      <xsl:for-each select="/customers/customer">
            <xsl:when test="position() mod 2 = 1">
              <xsl:attribute name="class">clsOdd
              <xsl:attribute name="class">clsEven
          <xsl:for-each select="@*">
              <xsl:value-of select="."/>
    <H3>Total Customers
      <xsl:value-of select="count(customers/customer)"/>

We probably want a function to handle the common case where all the xml is on one line, with some junk before the xml and some junk afterwards. E.g.

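asdflkjnalkjnasdf <?xml version="1.0" encoding="UTF-8"?><a><b>c</b></a> xskldfgjnskldf

(defun xml-pretty-print-current ()
  "Pretty-print the xml document embedded in the current line."
  (interactive)
  (save-excursion
    (end-of-line nil)
    ;; Find the last ">" on this line; the xml ends just after it.
    (re-search-backward ">" (line-beginning-position))
    (let ((e (+ 1 (point))))
      (beginning-of-line nil)
      ;; Skip past the xml declaration; the region starts right after it.
      (re-search-forward "<\\?xml[^>]*>" e)
      (xml-pretty-print-region (point) e))))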

And, er, I guess that’s all I’ve got to say about that.


I really appreciate all of the comments that people leave on this blog. Occasionally, someone drops by and leaves a great comment with an incredibly useful tip.

shell-command and prefix argument

The first tip I’d like to highlight was at the end of a comment by Steve Vinoski. I didn’t realise how useful it would be when I first saw it but now I find it indispensable.

The tip was that giving a prefix argument (i.e. pressing C-u first) to a synchronous shell command such as M-! (shell-command) inserts the output at point. Without the prefix, the output is (far less usefully) shown in the echo area, or in the *Shell Command Output* buffer when it is too long to fit.

One thing I use this for is creating Makefiles, where I need to pull in the list of headers and source files. I can now type C-u M-! ls *.H<RET>

             b.H \
             c.H \
             ... \

Filtering ido results

Another thing I didn’t know about was that it is possible to filter ido results. Thanks very much to Reynaldo for pointing this out.


Emacs has a few mechanisms for choosing which major mode is selected when you find a file.

auto-mode-alist determines the mode from the file extension, e.g. you might want to load a c++ mode for a file named a.cc.

interpreter-mode-alist chooses a mode depending on the first line of a file. If a file begins with #!/bin/sh you probably want to choose shell-script-mode.
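For illustration, entries in the two lists might look like the following. This is only a sketch for the two examples just mentioned; I am assuming c++-mode and sh-mode (shell-script-mode is an alias for it) are the modes you want.

```elisp
;; File names ending in .cc open in c++-mode.
(add-to-list 'auto-mode-alist '("\\.cc\\'" . c++-mode))

;; Files whose #! line names the "sh" interpreter open in sh-mode.
(add-to-list 'interpreter-mode-alist '("sh" . sh-mode))
```

In auto-mode-alist the car of each entry is a regular expression matched against the file name, which is why the extension is written as "\\.cc\\'" rather than plain ".cc".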

I worked on a code base where the extensions used a variety of capitalization. The perl scripts could be .Perl, .perl, .PL, .Pl or .pl and there were numerous other extensions. It is fairly easy to make a regular expression to match all of these but I thought it warranted a helper function.

I want '(pl perl) to transform to \.\([Pp][Ll]\|[Pp][Ee][Rr][Ll]\)\'. The basic technique is to make a character class with the upper case and lower case version of each character in the string. We also accept a symbol on the input in order that the caller doesn’t have to add double quotes to every element.

(mapconcat (lambda (c)
             (let ((c (upcase (char-to-string c))))
               (concat "[" c (downcase c) "]")))
           (symbol-name s) "")

This code converts 'pl to [Pp][Ll].

We want to handle a list of extensions so we can cope with permutations of .pl or .perl. We therefore surround the previous mapconcat with the following.

(mapconcat (lambda (s) ...) l "\\|")

The final part of the function adds the appropriate prefix and suffix to make the regex work in auto-mode-alist.

(defun file-extensions (l)
  (concat "\\.\\("
          (mapconcat
           (lambda (s)
             (mapconcat (lambda (c)
                          (let ((c (upcase (char-to-string c))))
                            (concat "[" c (downcase c) "]")))
                        (symbol-name s) ""))
           l "\\|")
          "\\)\\'"))

I added a wrapper function to marry the regex to the major mode.

(defun ext-mode-map (extensions mode)
  (cons (file-extensions extensions) mode))

And now you can simply specify your mode mappings like this:

(add-to-list 'auto-mode-alist (ext-mode-map '(pl perl) 'cperl-mode))
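As a quick sanity check, here is a sketch that tries the target regex for '(pl perl) quoted earlier against a few file names. The names my-perl-extension-regex and my-perl-file-name-p are hypothetical and exist only for this check.

```elisp
(defconst my-perl-extension-regex
  "\\.\\([Pp][Ll]\\|[Pp][Ee][Rr][Ll]\\)\\'"
  "The regex that (file-extensions '(pl perl)) is meant to produce.")

(defun my-perl-file-name-p (name)
  "Return t if NAME ends in a Perl extension of any capitalization."
  ;; Bind case-fold-search to nil so that only the character classes
  ;; decide which capitalizations are accepted.
  (let ((case-fold-search nil))
    (and (string-match-p my-perl-extension-regex name) t)))
```

With this, "a.PL" and "a.Perl" are accepted, while "a.plx" is rejected because \' anchors the match at the end of the name.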


How To Disable Indent-Tabs-Mode

Emacs has a feature where it can automatically convert leading spaces to tabs. This is controlled by the indent-tabs-mode variable. I find mixing tabs and spaces extremely annoying as graphical diff tools will often display a tab and an equivalent number of spaces differently. Therefore I like to disable this behaviour. However, the obvious solution doesn’t work.

(setq indent-tabs-mode nil)

The documentation explains the reason

indent-tabs-mode is a variable defined in `C source code’.
Its value is nil

Automatically becomes buffer-local when set in any fashion.
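To see what that last line means in practice, here is a small sketch (my-setq-stays-local-p is a hypothetical name, only for illustration). It does a plain setq in a scratch buffer and reports whether the binding is local and whether the default was left alone:

```elisp
(defun my-setq-stays-local-p ()
  "Return (LOCAL-P DEFAULT-UNCHANGED-P) after a plain setq.
Hypothetical helper, only for illustration."
  (let ((default-before (default-value 'indent-tabs-mode)))
    (with-temp-buffer
      ;; setq on an automatically buffer-local variable only creates
      ;; a binding in the current buffer.
      (setq indent-tabs-mode (not default-before))
      (list (local-variable-p 'indent-tabs-mode)
            (eq (default-value 'indent-tabs-mode) default-before)))))
```

Evaluating (my-setq-stays-local-p) should return (t t): the setq took effect in the temporary buffer only, and the value every other buffer sees is unchanged.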

I’m surprised it works like this. Fortunately, there is a simple fix:

(customize-set-variable 'indent-tabs-mode nil)

If it wasn’t a customizable variable [1], you might use something similar to the following.

(add-hook 'first-change-hook
          (lambda () (setq indent-tabs-mode nil)))

Update: (setq-default ...) looks like a better solution. Thanks to Ron for the tip.

(setq-default indent-tabs-mode nil)

1. Is this even possible for a variable that automatically becomes buffer-local when set?

