During a typical workday I often need to look at logfiles containing xml in order to track down a problem. The xml is usually poorly formatted, and sometimes it is all on one line, which makes it very difficult to read. Ideally, I could extract the xml I am interested in into a buffer and pretty-print it with very little effort. The steps would be something like:
- create a new buffer called *xml* in another window (C-x 4 b)
- delete anything that exists in that buffer already
- pretty print the xml into the new buffer
Did I gloss over step 3? That sounds pretty complicated, right? Well, I can call a simple perl script from emacs. I’m pretty pragmatic and I don’t feel the need to code absolutely everything in emacs-lisp.
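#!/usr/bin/perl
use XML::Twig;
use XML::Parser;

# Indent nested elements when printing.
my $xml = XML::Twig->new(pretty_print => 'indented');

# Parse the xml passed as the first argument, otherwise read it from STDIN.
if ($ARGV[0]) {
    $xml->parse($ARGV[0]);
} else {
    $xml->parse(\*STDIN);
}
$xml->print();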
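(defun xml-pretty-print-region (start end)
  (interactive "r")
  (let ((b (get-buffer-create "*xml*")))
    ;; Show an empty *xml* buffer in the other window, then come back.
    (switch-to-buffer-other-window b)
    (xml-mode)
    (erase-buffer)
    (other-window -1)
    (goto-char end)
    ;; A marker keeps tracking the end of the region while
    ;; join-broken-lines deletes newlines before it.
    (let ((e (point-marker)))
      (join-broken-lines start end)
      ;; Pipe the region through the perl script; output goes to *xml*.
      (call-process-region start e "xml_pretty_print.pl" nil b))))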
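The external-process step can also be tried on its own. The following is only a sketch: the helper name my-filter-string-through-command is made up for this illustration, and the example assumes a Unix-like system where tr is available. It sends a string to a program's standard input with call-process-region and returns whatever the program prints.

```elisp
;; Hypothetical helper, not part of the original post.
(defun my-filter-string-through-command (text program &rest args)
  "Send TEXT to PROGRAM's standard input and return its standard output.
ARGS are passed to PROGRAM as command-line arguments."
  (with-temp-buffer
    (insert text)
    ;; DELETE = t removes the input text, DESTINATION = t inserts the
    ;; program's output into the current (temporary) buffer.
    (let ((status (apply #'call-process-region
                         (point-min) (point-max) program t t nil args)))
      (unless (eq status 0)
        (error "%s exited with status %s" program status))
      (buffer-string))))

;; Example: upper-case some markup with tr (assumed to be installed).
(my-filter-string-through-command "<a>c</a>" "tr" "a-z" "A-Z")
```

Passing "xml_pretty_print.pl" as the program (assuming the script is executable and on the exec-path) should return the pretty-printed xml as a string instead of putting it in the *xml* buffer.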
Actually, sgml-mode (xml-mode is just an alias) has a function called sgml-pretty-print, but firstly I prefer the output from XML::Twig and secondly it is nice to see how easy emacs makes it to call out to an external process and return the results. Anyone without perl installed might prefer to replace the external call with a call to (sgml-pretty-print ...).
(defun xml-pretty-print-region (start end)
  (interactive "r")
  (let ((cb (current-buffer))
        (buf (get-buffer-create "*xml*")))
    (set-buffer buf)
    (erase-buffer)
    (set-buffer cb)
    (copy-to-buffer buf start end)
    (switch-to-buffer-other-window buf)
    (xml-mode)
    (join-broken-lines (point-min) (point-max))
    (sgml-pretty-print (point-min) (point-max))
    (other-window -1)))
What is that call to (join-broken-lines ...)? When I cut and paste from PuTTY, it breaks lines at the width of my window:
<?xml version="1.0" encoding="UTF-8"?><xsl:stylesheet version="1.0"><xsl:output
 method="html"/><xsl:template match="/"><H2>Customer Listing (in Alternating ro
w colors)</H2><table border="1"><xsl:for-each select="/customers/customer"><tr>
<xsl:choose><xsl:when test="position() mod 2 = 1"><xsl:attribute name="class">c
lsOdd</xsl:attribute></xsl:when><xsl:otherwise><xsl:attribute name="class">clsE
ven</xsl:attribute></xsl:otherwise></xsl:choose><xsl:for-each select="@*"><td><
xsl:value-of select="."/></td></xsl:for-each></tr></xsl:for-each></table><H3>To
tal Customers<xsl:value-of select="count(customers/customer)"/></H3></xsl:templ
ate></xsl:stylesheet>
This invalidates the xml so I need to fix this before passing the result to the xml parser.
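(defconst cr (string ?\n))
(defconst *broken-line-regex* cr)

(defun join-broken-lines (start end)
  (interactive "r")
  ;; Use a marker for the bound: each deletion shifts the end of the
  ;; region, and a plain number would let the search run past it.
  (let ((bound (copy-marker end)))
    (goto-char start)
    (while (re-search-forward *broken-line-regex* bound t)
      (replace-match "" nil nil))))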
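To see the effect of joining broken lines without touching a real buffer, here is a minimal sketch. The helper name my-join-broken-lines-in-string is made up for this illustration; it applies the same idea (delete every newline) to a string inside a temporary buffer.

```elisp
;; Hypothetical helper, not part of the original post.
(defun my-join-broken-lines-in-string (text)
  "Return TEXT with every newline removed."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Replace each newline with the empty string.
    (while (search-forward "\n" nil t)
      (replace-match "" nil t))
    (buffer-string)))

;; A tag name split across two lines becomes whole again:
(my-join-broken-lines-in-string "<ab\ncd>text</ab\ncd>")
;; => "<abcd>text</abcd>"
```

Note that this removes every newline, including ones that were legitimately part of the xml, which is fine when the result is about to be re-indented anyway.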
This is the output from the version which calls (sgml-pretty-print ...):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <H2>Customer Listing (in Alternating row colors)
    </H2>
    <table border="1">
      <xsl:for-each select="/customers/customer">
        <tr>
          <xsl:choose>
            <xsl:when test="position() mod 2 = 1">
              <xsl:attribute name="class">clsOdd
              </xsl:attribute>
            </xsl:when>
            <xsl:otherwise>
              <xsl:attribute name="class">clsEven
              </xsl:attribute>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:for-each select="@*">
            <td>
              <xsl:value-of select="."/>
            </td>
          </xsl:for-each>
        </tr>
      </xsl:for-each>
    </table>
    <H3>Total Customers
      <xsl:value-of select="count(customers/customer)"/>
    </H3>
  </xsl:template>
</xsl:stylesheet>
We probably want a function to handle the common case where all the xml is on one line, with some junk before the xml and some junk afterwards. E.g.
asdflkjnalkjnasdf <?xml version="1.0" encoding="UTF-8"?><a><b>c</b></a> xskldfgjnskldf
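(defun xml-pretty-print-current ()
  (interactive)
  (save-excursion
    ;; Search backward from the end of the line for the closing ">".
    (end-of-line nil)
    (re-search-backward ">" 1)
    (let ((e (+ 1 (point))))
      ;; Skip past the xml declaration; the region starts right after it.
      ;; The "?" is escaped so that it matches a literal question mark.
      (beginning-of-line nil)
      (re-search-forward "<\\?xml[^>]*>" e)
      (xml-pretty-print-region (point) e))))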
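The idea of trimming the junk on either side can be sketched on a plain string. The helper name my-extract-xml-from-line is made up for this illustration, and unlike the function above it keeps the xml declaration: it simply returns everything from the first "<" to the last ">".

```elisp
;; Hypothetical helper, not part of the original post.
(defun my-extract-xml-from-line (line)
  "Return the part of LINE from the first \"<\" to the last \">\", or nil."
  (let ((start (string-match "<" line))
        ;; A ">" followed only by non-">" characters up to the end of
        ;; the string is the last ">" in LINE.
        (end (string-match ">[^>]*\\'" line)))
    (when (and start end (< start end))
      (substring line start (1+ end)))))

(my-extract-xml-from-line
 "asdflkjnalkjnasdf <?xml version=\"1.0\" encoding=\"UTF-8\"?><a><b>c</b></a> xskldfgjnskldf")
;; => "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a><b>c</b></a>"
```

This assumes the junk itself contains no angle brackets; a line with a stray "<" or ">" outside the xml would need a stricter match.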
And, er, I guess that’s all I’ve got to say about that.
Have you seen http://blog.bookworm.at/2007/03/pretty-print-xml-with-emacs.html
Hi Keegan,
Yes, I did see that article when I was searching for similar posts. I think (bf-pretty-print-xml-region …) has an equivalent in sgml mode now – (sgml-pretty-print …)
I use the following in my .emacs with nxml. Note that I need to select the entire buffer first before invoking it.
(defun indent-xml-region (begin end)
  "Pretty format XML markup in region. You need to have nxml-mode
http://www.emacswiki.org/cgi-bin/wiki/NxmlMode installed to do
this. The function inserts linebreaks to separate tags that have
nothing but whitespace between them. It then indents the markup
by using nxml's indentation rules."
  (interactive "r")
  (save-excursion
    (nxml-mode)
    (goto-char begin)
    ;; "[ \t]" matches a space or a tab; with "\\t" the bracket would
    ;; match a backslash or the letter t instead.
    (while (search-forward-regexp ">[ \t]*<" nil t)
      (backward-char) (insert "\n")
      )
    (mark-whole-buffer)
    (indent-region begin end)
    ;(indent-region point-min point-max)
    )
  (message "Ah, much better!"))
Hi Matt,
It looks like that snippet is quite widely used – it is the same one as at http://blog.bookworm.at/2007/03/pretty-print-xml-with-emacs.html.
I tested it and I do prefer the indentation slightly over (sgml-pretty-print …) but I still need join-broken-lines for when my cut and paste breaks a tag in the middle. E.g.
<ab
cd>
….
</abcd>
I like the Twig output, but needed a subroutine version and couldn’t assume availability of XML::Twig for my end-users. Here’s a light-weight xml pretty printer implemented as a subroutine. It accepts a single string, possibly with embedded newlines, and returns a single string, certainly with embedded newlines. Any embedded “less than” in the input must be properly escaped.
sub pretty{
$_ = shift;
# Put a newline before every ‘open’ tag.
s”(<[^/])"\n$1"g;
# Eliminate duplicate newline.
s"[\n]{2,}(\n])\n</"$1</"g;
# Now we have lines the way we like them.
# It remains only to calculate indentations.
my @lines = split('\n',$_);
my $depth = -1;
foreach my $line (@lines){
# If line starts with an open tag, increase depth.
$depth+=1 if $line=~'^)|(</)';
}
$_ = join "\n", @lines;
$_ = $_ . "\n";
$_
}
pretty("text“)
… gives …
text
(Apologies for the mangled example above. Subroutine looks mostly ok except for a fancy double quote that should be plain.)