Posts Tagged ‘regex’

Zawinski notwithstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions [1].

Why am I thinking about this now?

So, I was asked to review a Perl script and it was pretty good, apart from the fact that the main regex [2] didn’t quite match all the expected patterns.

The line to match was:


Other valid values within the tags included quick,brown,fox or quick brown fox.

And the regex in the script was:


As you can see, there were also a few unnecessary backslashes.

I suggested [^<]+ instead of \w+ with my reasons.
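To make the difference concrete, here is a minimal sketch in Python rather than Perl (both character classes behave the same way for these inputs). The tag name `tag` and the sample lines are placeholders I made up for illustration; they are not the ones from the script under review.

```python
import re

# Hypothetical tag name: the real one from the reviewed script is not shown here.
WORD_ONLY = re.compile(r"<tag>(\w+)</tag>")   # \w+ : letters, digits, underscore only
NOT_LT = re.compile(r"<tag>([^<]+)</tag>")    # [^<]+ : anything up to the next '<'

samples = [
    "<tag>fox</tag>",
    "<tag>quick,brown,fox</tag>",
    "<tag>quick brown fox</tag>",
]


def captured(pattern, line):
    """Return the text captured between the tags, or None if there is no match."""
    m = pattern.search(line)
    return m.group(1) if m else None


for line in samples:
    print(f"{line!r}: \\w+ -> {captured(WORD_ONLY, line)!r}, [^<]+ -> {captured(NOT_LT, line)!r}")
```

The `\w+` version gives up at the first comma or space, while `[^<]+` simply runs until it meets the `<` of the closing tag, so it handles all three values.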

The developer changed the regex to:


Okay, slightly better, but it still didn’t match all the possibilities.

I pointed out a valid pattern it didn’t match and asked again for [^<]+.

This was the result:


Okay, fine. The \s space matchers are redundant, but at least it covers everything.
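Why redundant? Whitespace is not `<`, so `[^<]` already matches it. The sketch below (Python again) compares a plain `[^<]+` pattern with a variant that has `\s*` on either side. Both patterns and the tag name `tag` are my own stand-ins, assumed purely for illustration, and not the developer's actual regex.

```python
import re

# Hypothetical patterns, assumed for illustration only.
PLAIN = re.compile(r"<tag>[^<]+</tag>")
WITH_SPACE = re.compile(r"<tag>\s*[^<]+\s*</tag>")

samples = [
    "<tag>fox</tag>",
    "<tag>quick,brown,fox</tag>",
    "<tag> quick brown fox </tag>",
    "<tag> </tag>",
    "<tag></tag>",
]


def matches(pattern, line):
    """True if the whole line matches the pattern."""
    return pattern.fullmatch(line) is not None


# A single space is matched by [^<], which is why the extra \s* adds nothing.
space_is_covered = re.fullmatch(r"[^<]", " ") is not None

# Both patterns accept and reject exactly the same sample lines.
agree = all(matches(PLAIN, s) == matches(WITH_SPACE, s) for s in samples)
print(space_is_covered, agree)
```

Note that this only shows the two patterns accept the same lines. If the middle part were captured, the `\s*` version would leave surrounding whitespace out of the capture, so "redundant" here means redundant for matching.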

So basically, if you find yourself writing a bunch of regex regularly, and you don’t really understand it, you could do worse than read Friedl’s book.

1. Not an affiliate link

2. I know… "parsing" XML with regex will unleash Cthulhu or something. It’s not my script.

