
Posts Tagged ‘regex’

Zawinski notwithstanding, regular expressions are a hugely useful addition to a programmer’s toolbox. I learnt about them from Jeffrey Friedl’s excellent Mastering Regular Expressions [1].

Why am I thinking about this now?

So, I was asked to review a Perl script and it was pretty good, apart from the fact that the main regex [2] didn’t quite match all the expected patterns.

The line to match was:

<some_tag>quickbrownfox</some_tag>

Other valid values within the tags included quick,brown,fox or quick brown fox.

And the regex in the script was:

m{\<some_tag\>(\w+)\<\/some_tag\>}

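As you can see, there were also a few unnecessary backslashes: < and > are not special in a Perl regex, and / only needs escaping when it is the delimiter, which it isn’t with m{}.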

I suggested [^<]+ instead of \w+ with my reasons.

The developer changed the regex to:

m{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>}

Okay, slightly better, but it still didn’t match all the possibilities.

I pointed out a valid pattern it didn’t match and asked again for [^<]+.

This was the result:

m{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>}

Okay, fine. The \s space matchers are redundant, but at least it covers everything.
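
If you want to check this for yourself, here is a rough sketch that runs each attempt against the three sample values. The labels and the test harness are mine, not part of the reviewed script, and the "trimmed" pattern is only my guess at what the regex looks like with the redundant backslashes and \s* removed.

```perl
use strict;
use warnings;

# The three sample lines mentioned above.
my @lines = (
    '<some_tag>quickbrownfox</some_tag>',
    '<some_tag>quick,brown,fox</some_tag>',
    '<some_tag>quick brown fox</some_tag>',
);

# Each attempt, copied verbatim, plus a trimmed version (my assumption)
# without the backslashes or the \s* that [^<]+ already covers.
my @attempts = (
    [ original => qr{\<some_tag\>(\w+)\<\/some_tag\>} ],
    [ second   => qr{\<some_tag\>\s*(\w+)\s*\<\/some_tag\>} ],
    [ final    => qr{\<some_tag\>(\s*[^<]+\s*)\<\/some_tag\>} ],
    [ trimmed  => qr{<some_tag>([^<]+)</some_tag>} ],
);

for my $attempt (@attempts) {
    my ($label, $re) = @$attempt;
    for my $line (@lines) {
        # Report what was captured, or that the line did not match.
        my $result = $line =~ $re ? "captured '$1'" : 'no match';
        printf "%-8s %-40s %s\n", $label, $line, $result;
    }
}
```

The first two attempts only match the quickbrownfox line, because \w+ stops at the first comma or space and nothing in the pattern can consume the rest. The last two match all three lines and capture the same text.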

So basically, if you find yourself writing regexes regularly and you don’t really understand them, you could do worse than read Friedl’s book.


1. Not an affiliate link

2. I know… "parsing" XML with regex will unleash Cthulhu or something. It’s not my script.
