Zawinski notwithstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions¹.
Why am I thinking about this now?
So, I was asked to review a Perl script, and it was pretty good apart from the fact that the main regex² didn’t quite match all the expected values.
The line to match was:
<some_tag>quickbrownfox</some_tag>
Other valid values within the tags included quick,brown,fox or quick brown fox.
And the regex in the script was:
m{\<some_tag\>(\w+)\<\/some_tag\>}
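As you can see, there were also a few unnecessary backslashes: with braces as the m{} delimiters, / needs no escaping, and < and > are never special in a Perl regex.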
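A small sketch to check that the escapes really do nothing: it compiles the script's pattern with and without the backslashes and prints what each one captures. The test lines are made up for illustration, and I've used qr{} rather than m{} only so the patterns can be kept in variables.

```perl
use strict;
use warnings;

# The pattern from the script, with <, > and / all escaped.
my $escaped = qr{\<some_tag\>(\w+)\<\/some_tag\>};

# The same pattern without the escapes. With braces as the delimiters, /
# needs no escaping, and < and > are ordinary characters in a Perl regex.
my $plain = qr{<some_tag>(\w+)</some_tag>};

for my $line ('<some_tag>quickbrownfox</some_tag>',
              '<some_tag>quick brown fox</some_tag>') {
    my ($x) = $line =~ $escaped;    # capture, or undef if no match
    my ($y) = $line =~ $plain;
    printf "%s => escaped: %s, plain: %s\n",
        $line, $x // '(no match)', $y // '(no match)';
}
```

Both columns should always agree; the backslashes only make the pattern harder to read.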
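I suggested [^<]+ instead of \w+, with my reasons: \w only matches word characters (letters, digits and underscore), so it can’t cope with the commas or the spaces.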
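Those reasons are easier to see than to explain. This sketch (the variable names are my own) runs both versions, without the redundant escapes, against the three values above. Only [^<]+ should match all of them: \w+ stops at the first comma or space and then can’t find the closing tag.

```perl
use strict;
use warnings;

my @lines = (
    '<some_tag>quickbrownfox</some_tag>',
    '<some_tag>quick,brown,fox</some_tag>',
    '<some_tag>quick brown fox</some_tag>',
);

# \w+ only accepts word characters (letters, digits, underscore) ...
my $word_only = qr{<some_tag>(\w+)</some_tag>};
# ... whereas [^<]+ accepts anything up to the next '<'.
my $not_lt = qr{<some_tag>([^<]+)</some_tag>};

for my $line (@lines) {
    my ($w) = $line =~ $word_only;
    my ($n) = $line =~ $not_lt;
    printf "%-38s \\w+: %-16s [^<]+: %s\n", $line,
        $w // '(no match)', $n // '(no match)';
}
```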
The developer changed the regex to:
m{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>}
Okay, slightly better: whitespace around the value was now allowed, but it still didn’t match all the possibilities.
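A quick sketch of which inputs the revised pattern accepts (again without the backslashes). The padded line is hypothetical, just to show what the added \s* buys; the comma and space lines are the values from above, and both still fail because \w+ can’t contain a comma or a space.

```perl
use strict;
use warnings;

# The developer's second attempt: optional whitespace is allowed
# around the value, but the value itself is still \w+.
my $padded = qr{<some_tag>\s*(\w+)\s*</some_tag>};

my %cases = (
    padded => '<some_tag>  quickbrownfox  </some_tag>',   # hypothetical
    commas => '<some_tag>quick,brown,fox</some_tag>',
    spaces => '<some_tag>quick brown fox</some_tag>',
);

for my $name (sort keys %cases) {
    my ($value) = $cases{$name} =~ $padded;
    printf "%-7s %s\n", $name,
        defined $value ? "matched '$value'" : 'no match';
}
```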
I pointed out a valid pattern it didn’t match and asked again for [^<]+.
This was the result:
m{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>}
Okay, fine. The \s* matchers are redundant, since [^<]+ already matches whitespace, but at least it covers everything.
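To convince yourself the \s* really add nothing, this sketch compares the final pattern with a plain ([^<]+). Both \s* sit inside the capture group and whitespace is already covered by [^<], so the captures should be identical, surrounding spaces included; neither version trims anything. The padded test line is my own addition.

```perl
use strict;
use warnings;

# The final pattern from the review, minus the unnecessary backslashes ...
my $final  = qr{<some_tag>(\s*[^<]+\s*)</some_tag>};
# ... and the simpler version that was suggested.
my $simple = qr{<some_tag>([^<]+)</some_tag>};

my @lines = (
    '<some_tag>quickbrownfox</some_tag>',
    '<some_tag>quick,brown,fox</some_tag>',
    '<some_tag>quick brown fox</some_tag>',
    '<some_tag>  quick brown fox  </some_tag>',   # hypothetical padding
);

for my $line (@lines) {
    my ($f) = $line =~ $final;
    my ($s) = $line =~ $simple;
    # Every line above matches both patterns, so $f and $s are defined.
    print +($f eq $s ? 'same' : 'DIFFERENT'), ": '$s'\n";
}
```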
So basically, if you find yourself writing regexes regularly and you don’t really understand them, you could do worse than read Friedl’s book.
1. Not an affiliate link
2. I know… "parsing" XML with regex will unleash Cthulhu or something. It’s not my script.