Jamie Zawinski’s famous “now they have two problems” quip notwithstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions.[1]
Why am I thinking about this now?
So, I was asked to review a Perl script, and it was pretty good, apart from the fact that the main regex[2] didn’t quite match all the expected inputs.
The line to match was:
<some_tag>quickbrownfox</some_tag>
Other valid values within the tags included "quick,brown,fox" and "quick brown fox".
And the regex in the script was:
m{\<some_tag\>(\w+)\<\/some_tag\>}
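As you can see, there were also a few unnecessary backslashes: < and > are not regex metacharacters, and / doesn’t need escaping when the pattern is delimited with braces.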
I suggested [^<]+ instead of \w+, with my reasons.
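A minimal Perl sketch of the difference, assuming the three sample values above are representative. \w only matches word characters (letters, digits and underscore), so it stops at the comma or space, whereas [^<]+ takes everything up to the next <. The variable names are illustrative, and the second pattern also drops the unnecessary backslashes.

```perl
use strict;
use warnings;

# The three sample values from the post, each wrapped in the tag.
my @lines = (
    '<some_tag>quickbrownfox</some_tag>',
    '<some_tag>quick,brown,fox</some_tag>',
    '<some_tag>quick brown fox</some_tag>',
);

# Original pattern: \w+ only matches a run of word characters
# (letters, digits and underscore), so commas and spaces break it.
my $original = qr{\<some_tag\>(\w+)\<\/some_tag\>};

# Suggested pattern: [^<]+ takes everything up to the next '<'.
# With braces as delimiters, '<', '>' and '/' need no backslashes.
my $suggested = qr{<some_tag>([^<]+)</some_tag>};

# Captured value for each line, or undef when the pattern fails.
my @orig_caps = map { $_ =~ $original  ? $1 : undef } @lines;
my @sugg_caps = map { $_ =~ $suggested ? $1 : undef } @lines;

for my $i (0 .. $#lines) {
    printf "%-38s original: %-15s suggested: %s\n",
        $lines[$i],
        $orig_caps[$i] // 'no match',
        $sugg_caps[$i] // 'no match';
}
```

Only the first line matches the original pattern; the suggested one matches all three.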
The developer changed the regex to:
m{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>}
Okay, slightly better (it now tolerates whitespace around the value), but it still didn’t match all the possibilities.
I pointed out a valid value it didn’t match and asked again for [^<]+.
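A small Perl check of the second attempt, using a padded value as a hypothetical example of what the added \s* buys. Whitespace around a single word is now accepted, but a comma or an inner space still stops \w+ before the closing tag can match.

```perl
use strict;
use warnings;

# The developer's second attempt: optional whitespace around a \w+ run.
my $second = qr{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>};

# Illustrative values: padded with spaces, comma-separated, space-separated.
my @values = (' quickbrownfox ', 'quick,brown,fox', 'quick brown fox');

my %results;
for my $value (@values) {
    my $line = "<some_tag>$value</some_tag>";
    # Store the capture, or undef when the pattern fails to match.
    $results{$value} = $line =~ $second ? $1 : undef;
    printf "%-18s => %s\n", "'$value'", $results{$value} // 'no match';
}
```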
This was the result:
m{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>}
Okay, fine. The \s* whitespace matchers are redundant ([^<]+ already matches whitespace), but at least it covers everything.
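A rough Perl sketch to confirm the redundancy, assuming the aim is to compare what each pattern captures. The sample values are illustrative, including padded and whitespace-only content.

```perl
use strict;
use warnings;

# Final pattern from the post, and the same pattern with the \s* removed.
my $final  = qr{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>};
my $simple = qr{<some_tag>([^<]+)</some_tag>};

# Illustrative values, including surrounding and whitespace-only content.
my @values = ('quickbrownfox', 'quick,brown,fox', ' quick brown fox ', '   ');

my $all_same = 1;
for my $value (@values) {
    my $line = "<some_tag>$value</some_tag>";
    # In list context a successful match returns the capture group.
    my ($with_ws)    = $line =~ $final;
    my ($without_ws) = $line =~ $simple;
    $all_same = 0
        unless defined $with_ws && defined $without_ws && $with_ws eq $without_ws;
    printf "%-20s final: '%s'  simple: '%s'\n",
        "'$value'", $with_ws // 'no match', $without_ws // 'no match';
}
print $all_same ? "captures identical\n" : "captures differ\n";
```

Because [^<] already matches whitespace and the \s* sit inside the capture group, the captured text is the same either way. If trimming had been the goal, the \s* would need to sit outside the parentheses, with a lazy [^<]+? inside.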
So basically, if you find yourself writing regexes regularly and don’t really understand them, you could do worse than read Friedl’s book.
1. Not an affiliate link
2. I know… "parsing" XML with regex will unleash Cthulhu or something. It’s not my script.
Hi, Jared,
there is no need for ‘\s*’ in your last regexp.
@zloyrusskiy: Did you read what Jared wrote? Or did you just look at the regexen? Jared already said that the whitespace matchers weren’t needed and that it’s not his code.
Hi Matthew,
Thanks – that’s what I came here to say…
It would have been a bit embarrassing if I made such a basic error when exhorting others to learn regexes, eh? 🙂
Ok, I was too hasty with my conclusions.