Posts Tagged ‘regular expressions’

Zawinski notwithstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions¹.

Why am I thinking about this now?

So, I was asked to review a Perl script and it was pretty good, apart from the fact that the main regex² didn’t quite match all the expected patterns.

The line to match was:

<some_tag>quickbrownfox</some_tag>

Other valid values within the tags included quick,brown,fox or quick brown fox.

And the regex in the script was:

m{\<some_tag\>(\w+)\<\/some_tag\>}

As you can see, there were also a few unnecessary backslashes: < and > are not regex metacharacters, and / doesn’t need escaping when the delimiters are m{...}.

I suggested [^<]+ instead of \w+ and explained why: \w doesn’t match the commas or spaces in the other valid values.

The developer changed the regex to:

m{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>}

Okay, slightly better, but it still didn’t match all the possibilities.

I pointed out a valid pattern it didn’t match and asked again for [^<]+.

This was the result:

m{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>}

Okay, fine. The \s* whitespace matchers are redundant, since [^<] already matches whitespace, but at least it covers everything.
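To see the difference, here is a minimal sketch that runs the three valid values through each version of the pattern. It uses Python’s re module rather than Perl, on the assumption that \w, \s and [^<] behave the same way for these plain ASCII inputs. The names samples, patterns and matched_values are made up for the illustration, and the unnecessary backslashes have been dropped.

```python
import re

# The three values the post lists as valid inside the tags.
samples = [
    "<some_tag>quickbrownfox</some_tag>",
    "<some_tag>quick,brown,fox</some_tag>",
    "<some_tag>quick brown fox</some_tag>",
]

# Each version of the regex from the review, without the extra backslashes.
patterns = {
    "original": r"<some_tag>(\w+)</some_tag>",
    "second attempt": r"<some_tag>\s*(\w+)\s*</some_tag>",
    "suggested": r"<some_tag>([^<]+)</some_tag>",
    "as merged": r"<some_tag>(\s*[^<]+\s*)</some_tag>",
}


def matched_values(pattern):
    """Return the captured tag contents for every sample the pattern matches."""
    found = []
    for line in samples:
        m = re.search(pattern, line)
        if m:
            found.append(m.group(1))
    return found


for name, pattern in patterns.items():
    print(f"{name}: {matched_values(pattern)}")
```

Only the two [^<]+ versions capture all three values, and the merged pattern captures exactly what the plain [^<]+ does, which is why the \s* on either side adds nothing.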

So basically, if you find yourself writing regexes regularly and you don’t really understand them, you could do worse than read Friedl’s book.


1. Not an affiliate link

2. I know… "parsing" XML with regex will unleash Cthulhu or something. It’s not my script.


Some time ago, I wrote about batch processing text by running it through an external process such as Perl. Emacs Lisp itself also has a lot of functionality for manipulating text.

The Problem

I have a file like this:

John James,Admin,other data,...
Dave Jones,Sales,...
Lisa Sims,IT,...
...

I want to convert it into the following¹:

AND name IN ("Dave Jones", "John James", "Lisa Sims")
AND dept IN ("Admin", "IT", "Sales")

The Solution

First of all, I need a helper function that converts a Lisp list into a string of quoted, comma-separated values.

(defun make-csv (seq)
  (mapconcat (lambda (e) (format "\"%s\"" e)) seq ", "))

And then I can iterate over the text with re-search-forward, collecting the matched strings. At the end, I’ll output the collected strings in a SQL clause fragment.

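(defun process-lines (&optional begin end)
  "Collect names and departments in the region and insert a SQL fragment."
  (interactive "r")
  (goto-char begin)
  (let (names depts)
    ;; Anchor at line start and exclude newlines so a match never spans lines.
    (while (re-search-forward "^\\([^,\n]+\\),\\([^,\n]+\\)" end t)
      (push (match-string 1) names)
      (push (match-string 2) depts)
      ;; `next-line' keeps the current column and is meant for interactive
      ;; use; `forward-line' moves to the start of the next line.
      (forward-line 1))
    (insert (format (concat "\n"
                            "AND name IN (%s)\n"
                            "AND dept IN (%s)\n")
                    (make-csv (sort names #'string-lessp))
                    (make-csv (sort depts #'string-lessp))))))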
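If you want to check the collect-and-sort logic outside Emacs, the same idea can be sketched in Python. This is an illustration under stated assumptions, not a port of the Emacs command: sql_fragment and the sample text are hypothetical, the trailing fields in the sample are placeholders for the elided data, and each line is assumed to start with a name and a department separated by a comma.

```python
import re


def make_csv(seq):
    """Quote each item and join the items with commas."""
    return ", ".join(f'"{e}"' for e in seq)


def sql_fragment(text):
    """Collect the first two fields of every line and build the two IN clauses."""
    names, depts = [], []
    # ^ with re.MULTILINE anchors each match at the start of a line;
    # excluding \n keeps a field from running onto the next line.
    for m in re.finditer(r"^([^,\n]+),([^,\n]+)", text, flags=re.MULTILINE):
        names.append(m.group(1))
        depts.append(m.group(2))
    return (f"AND name IN ({make_csv(sorted(names))})\n"
            f"AND dept IN ({make_csv(sorted(depts))})\n")


# Placeholder data in the same shape as the file above.
sample = (
    "John James,Admin,other data\n"
    "Dave Jones,Sales,placeholder\n"
    "Lisa Sims,IT,placeholder\n"
)

print(sql_fragment(sample))
```

Like the Emacs version, this keeps duplicates, so a department shared by several people would appear more than once in the list.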

If you liked this post, why not subscribe to my RSS feed?


1. Okay, you got me, I don’t really want to convert it into this. But for the purpose of the example, this will do. Exercise for the reader: how can I convert it into SQL that will efficiently extract just the lines I want?
