
Practical Regex #1

Regex looks scary but it doesn't have to be. Learn by example from noob to guru.
August 2020


Summary

Regex is search on steroids. This post demonstrates how to create powerful searches by example.

Examples start basic and build up to more complex expressions. They are designed for devs who search via their IDE.

We’ll use this text throughout:

Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759

(state, id, largest city, founding date, population)


How to use this guide

Setup

Follow along here or via your IDE:


Basic matches

Letters [a-zA-Z]


Words [a-zA-Z]+
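
If you'd rather try these outside an IDE, here is a minimal sketch using Python's `re` module as a stand-in for the search box (an assumption on my part, the examples in this post were written for IDE search). It runs the two patterns above against one line of the sample text.

```python
import re

# One line of the sample text used throughout this post.
line = "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872"

# [a-zA-Z] matches one letter at a time.
letters = re.findall(r"[a-zA-Z]", line)

# [a-zA-Z]+ matches each unbroken run of letters, i.e. whole words.
words = re.findall(r"[a-zA-Z]+", line)

print(len(letters))  # 19
print(words)         # ['Hawaii', 'HI', 'Honolulu', 'Aug']
```

The only difference is the `+`, which turns 19 single-letter matches into 4 word matches.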


Specific words (Jan|Jul|Dec)
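
A quick sketch of the alternation above, again assuming Python's `re` module in place of the IDE search. `re.finditer` is used so that each result is the full match.

```python
import re

TEXT = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872",
    "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857",
    "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062",
    "Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759",
])

# The | means "or": match any one of the listed words.
months = [m.group(0) for m in re.finditer(r"(Jan|Jul|Dec)", TEXT)]

print(months)  # ['Dec', 'Jan', 'Jul']
```

`Aug` and `Nov` are skipped because they are not in the list.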


2 digits [0-9]{2}


4 digits [0-9]{4}


2-3 letters [a-z]{2,3}


6+ letters [a-z]{6,}
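
The repeat counts are easier to see side by side. This is a sketch in Python's `re` module; the `re.IGNORECASE` flag is my assumption to mimic an IDE search with case matching turned off, which is what lets `[a-z]` match capital letters here.

```python
import re

line = "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857"

# {2}: exactly two digits per match, so longer numbers are cut into pairs.
two_digits = re.findall(r"[0-9]{2}", line)

# {4}: exactly four digits, only the year is long enough.
four_digits = re.findall(r"[0-9]{4}", line)

# {2,3}: two or three letters, taking three whenever possible.
two_to_three = re.findall(r"[a-z]{2,3}", line, re.IGNORECASE)

# {6,}: six letters or more.
six_plus = re.findall(r"[a-z]{6,}", line, re.IGNORECASE)

print(two_digits)    # ['26', '18', '37', '98', '85']
print(four_digits)   # ['1837']
print(two_to_three)  # ['Mic', 'hig', 'an', 'MI', 'Det', 'roi', 'Jan']
print(six_plus)      # ['Michigan', 'Detroit']
```

Note how `{2}` and `{2,3}` happily match pieces of longer words and numbers. The word break examples further down fix that.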


3 letters/numbers \w{3}


Whole 3-character words (letters/numbers) \b\w{3}\b


Whole 3-letter words \b[a-z]{3}\b
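
Here is a sketch of what `\b` changes, using Python's `re` module (my assumption, with `re.IGNORECASE` standing in for a case-insensitive IDE search).

```python
import re

line = "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185"

# Without \b, any three consecutive word characters match,
# including chunks of longer words and numbers.
any_three = re.findall(r"\w{3}", line)

# With \b on both sides, the three characters must be a whole word.
whole_three = re.findall(r"\b\w{3}\b", line)

# Swapping \w for [a-z] drops the numbers.
whole_three_letters = re.findall(r"\b[a-z]{3}\b", line, re.IGNORECASE)

print(any_three)            # ['Ala', 'bam', 'Bir', 'min', 'gha', 'Dec', '181', '903', '185']
print(whole_three)          # ['Dec', '903', '185']
print(whole_three_letters)  # ['Dec']
```

`903` and `185` count as whole words because the commas around them are not word characters.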


Two words [a-zA-Z]+\s[a-zA-Z]+


One or two words [a-zA-Z]+(\s[a-zA-Z]+)?
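
A sketch of the optional group in Python's `re` module (an assumption, as before). `re.finditer` with `group(0)` is used because `re.findall` would return only the contents of the bracketed group.

```python
import re

line = "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062"

def full_matches(pattern, text):
    """Return the full text of every match (hypothetical helper for this sketch)."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# Two words separated by exactly one whitespace character.
two_words = full_matches(r"[a-zA-Z]+\s[a-zA-Z]+", line)

# The trailing ? makes the second word optional.
one_or_two = full_matches(r"[a-zA-Z]+(\s[a-zA-Z]+)?", line)

print(two_words)   # ['North Dakota']
print(one_or_two)  # ['North Dakota', 'ND', 'Fargo', 'Nov']
```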


Wildcards

Everything in brackets (greedy) \(.*\)


Everything in brackets (non-greedy) \(.*?\)
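
The greedy vs non-greedy difference is easiest to see on a single line. A minimal sketch, assuming Python's `re` module in place of the IDE search:

```python
import re

line = "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185"

# Greedy: .* grabs as much as it can, so the match runs from the
# first opening bracket to the LAST closing bracket on the line.
greedy = re.findall(r"\(.*\)", line)

# Non-greedy: .*? grabs as little as it can, so each match stops
# at the first closing bracket it meets.
non_greedy = re.findall(r"\(.*?\)", line)

print(greedy)      # ['(AL)  Birmingham  (Dec 14, 1819)']
print(non_greedy)  # ['(AL)', '(Dec 14, 1819)']
```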


Lines with the * character ^.*\*.*$


Lines without the * character ^[^\*]+$


All lines with the e character ^.*[e].*$


All lines without the e character ^[^e]+$
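
The four whole-line patterns above can be sketched in Python's `re` module (an assumption, the post targets IDE search). Two adjustments are mine: `re.MULTILINE` makes `^` and `$` work per line, and the negated sets get an extra `\n` because in Python `[^e]` also matches a new line, which would let one match spill across several lines. The `states` helper is hypothetical and only shortens the output.

```python
import re

TEXT = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872",
    "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857",
    "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062",
    "Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759",
])

def states(pattern):
    """First word of every matched line (hypothetical helper for this sketch)."""
    matches = re.finditer(pattern, TEXT, re.MULTILINE)
    return [m.group(0).split()[0] for m in matches]

with_star = states(r"^.*\*.*$")
without_star = states(r"^[^*\n]+$")   # \n added so a match stays on one line
with_e = states(r"^.*[e].*$")
without_e = states(r"^[^e\n]+$")      # \n added so a match stays on one line

print(with_star)     # ['Hawaii*', 'Wyoming*']
print(without_star)  # ['Alabama', 'Michigan', 'North']
print(with_e)        # ['Alabama', 'Michigan', 'Wyoming*']
print(without_e)     # ['Hawaii*', 'North']
```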


Brackets starting with certain words \((Jan|Jul|Dec).*\)
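
A short sketch of this one in Python's `re` module (assumed stand-in for the IDE search). The greedy `.*` is safe here because `.` does not cross line breaks and the date is the last bracket on each line.

```python
import re

TEXT = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872",
    "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857",
    "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062",
    "Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759",
])

# An opening bracket, one of the three months, then anything up to a closing bracket.
pattern = r"\((Jan|Jul|Dec).*\)"
brackets = [m.group(0) for m in re.finditer(pattern, TEXT)]

print(brackets)  # ['(Dec 14, 1819)', '(Jan 26, 1837)', '(Jul 10, 1890)']
```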


Mixed matches

The short date in brackets [a-z]{3}\s+[0-9]+


The date in brackets [a-z]{3}\s+[0-9]+,\s[0-9]+
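
Here is a sketch of the two date patterns in Python's `re` module (my assumption, with `re.IGNORECASE` mimicking a case-insensitive IDE search). The second line is included because its date is padded with two spaces, which is why the pattern uses `\s+` rather than a single `\s`.

```python
import re

text = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062",
])

# Three letters, one or more whitespace characters, then the day.
short_dates = re.findall(r"[a-z]{3}\s+[0-9]+", text, re.IGNORECASE)

# The same, followed by a comma, one whitespace character, and the year.
full_dates = re.findall(r"[a-z]{3}\s+[0-9]+,\s[0-9]+", text, re.IGNORECASE)

print(short_dates)  # ['Dec 14', 'Nov  2']
print(full_dates)   # ['Dec 14, 1819', 'Nov  2, 1889']
```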


Words with m (in the middle) [a-z]+[m][a-z]+


Words with m (anywhere) ([a-z]+)?[m]([a-z]+)?
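
The difference between the two `m` patterns shows up on words that start with `m`. A sketch in Python's `re` module, assuming `re.IGNORECASE` to mimic a case-insensitive IDE search:

```python
import re

text = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857",
])

def full_matches(pattern):
    """Return the full text of every match (hypothetical helper for this sketch)."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]

# At least one letter is required on BOTH sides of the m.
m_in_middle = full_matches(r"[a-z]+[m][a-z]+")

# The letters on either side are optional, so a leading m also counts.
m_anywhere = full_matches(r"([a-z]+)?[m]([a-z]+)?")

print(m_in_middle)  # ['Alabama', 'Birmingham']
print(m_anywhere)   # ['Alabama', 'Birmingham', 'Michigan', 'MI']
```

As a side note, `[a-z]*m[a-z]*` should behave the same as the second pattern, since `*` means "zero or more".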


Exclusive matches

Match on the surrounding text but exclude it from the result. These are officially known as ‘lookarounds’.

Word in brackets (inclusive) \([a-z]+\)


Word in brackets (exclusive) (?<=\()[a-z]+(?=\))


Everything in brackets (exclusive) (?<=\().*?(?=\))
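
A sketch comparing the inclusive and exclusive versions in Python's `re` module (my assumption, with `re.IGNORECASE` mimicking a case-insensitive IDE search). The brackets are still required for a match, they just don't end up in the result.

```python
import re

line = "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185"

# Inclusive: the brackets are part of the match.
inclusive = re.findall(r"\([a-z]+\)", line, re.IGNORECASE)

# Exclusive: (?<=\() checks for "(" before the match,
# (?=\)) checks for ")" after it, and neither is returned.
exclusive_word = re.findall(r"(?<=\()[a-z]+(?=\))", line, re.IGNORECASE)

# Same idea with a non-greedy wildcard instead of letters only.
exclusive_all = re.findall(r"(?<=\().*?(?=\))", line)

print(inclusive)       # ['(AL)']
print(exclusive_word)  # ['AL']
print(exclusive_all)   # ['AL', 'Dec 14, 1819']
```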


Everything in brackets on lines with * (exclusive)

(?<=\*.*\().*?(?=\))
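
One caveat: the look behind here contains `.*`, so its length is not fixed. JavaScript and .NET regex engines accept that, but some engines do not. Python's `re` module is one that rejects it, so the sketch below swaps in a different technique to get the same result: cut each line at its first `*`, then run the fixed-length look behind from the previous example on what follows. The `bracketed_after_star` name is hypothetical.

```python
import re

TEXT = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872",
    "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857",
    "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062",
    "Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759",
])

# Python refuses to compile the original pattern.
try:
    re.compile(r"(?<=\*.*\().*?(?=\))")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

def bracketed_after_star(text):
    """Workaround: bracket contents that appear after a * on the same line."""
    results = []
    for line in text.splitlines():
        _, star, rest = line.partition("*")
        if star:  # only lines that actually contain a *
            results.extend(re.findall(r"(?<=\().*?(?=\))", rest))
    return results

starred = bracketed_after_star(TEXT)
print(starred)  # ['HI', 'Aug 21, 1959', 'WY', 'Jul 10, 1890']
```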


Everything up to * (exclusive) ^.*(?=\*)
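
A last sketch for the look ahead on its own, assuming Python's `re` module with `re.MULTILINE` so that `^` matches at the start of every line.

```python
import re

TEXT = "\n".join([
    "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185",
    "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872",
    "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857",
    "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062",
    "Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759",
])

# From the start of the line up to, but not including, the *.
# Lines without a * produce no match at all.
before_star = re.findall(r"^.*(?=\*)", TEXT, re.MULTILINE)

print(before_star)  # ['Hawaii', 'Wyoming']
```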


Cheat sheet

. ^ $ * + ? ( ) [ { \ | reserved characters

[a-zA-Z] letters (both cases, for case-sensitive searches)

[0-9] or \d matches digits

[a-c1-3#] matches characters a b c 1 2 3 #

.* greedy wildcard. .*? non-greedy wildcard.

^ start of line. $ end of line.

\s whitespace (space, tab, new line). \t tab. \n new line.

\w letters, numbers, and underscore. \W anything else.

\b word boundary. \B not a word boundary.

+ repeat match 1+ times. ? match 0 or 1 times.

{3} repeat match exactly thrice

{1,3} repeat match 1, 2, or 3 times

{3,} repeat match 3+ times

[^ ... ] match all but given characters

(?<= ... ) match only if preceded by the given pattern, which is excluded from the result (look behind)

(?= ... ) match only if followed by the given pattern, which is excluded from the result (look ahead)