Regex is search on steroids. This post demonstrates how to create powerful searches by example.
Examples start basic and build up to more complex expressions. They are designed for devs who search via their IDE.
We’ll use this text throughout (demo):
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
(state, id, largest city, founding date, population)
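If you would rather follow along in a script than in an IDE, here is a minimal sketch using Python's built-in `re` module. The `search` helper is hypothetical (it is not part of any IDE), and the two flags are assumptions meant to approximate an IDE search with "match case" switched off.

```python
import re

# The five demo lines from this post, copied verbatim.
DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""


def search(pattern, text=DEMO):
    """Return the full text of every match (hypothetical helper)."""
    # IGNORECASE: assumed stand-in for leaving "match case" off in the IDE.
    # MULTILINE: assumed so that ^ and $ anchor at every line, not just the text ends.
    flags = re.IGNORECASE | re.MULTILINE
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


four_digits = search(r"[0-9]{4}")
print(four_digits)  # ['1819', '1959', '1837', '1889', '1890']
```

Any pattern from this post can be passed to `search` as a raw string (`r"..."`), which stops Python from interpreting the backslashes itself.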
Follow along here or via your IDE: open search (ctrl + f or cmd + f) and enable regex mode (the .* icon).

Letters [a-zA-Z]
[a-z]: lowercase letters
[A-Z]: uppercase letters
Case only matters when the match case (Aa) option is enabled in VSCode
Matches a-z characters

Words [a-zA-Z]+
[a-zA-Z]: letters
+: repeats the match for consecutive characters
Matches a-z words

Specific words (Jan|Jul|Dec)
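To try `[a-zA-Z]+` and `(Jan|Jul|Dec)` outside the IDE, here is a small sketch with Python's built-in `re` module. The `DEMO` variable name is my own; the text is the demo text from this post.

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""

# One or more consecutive letters: every word in the text.
words = re.findall(r"[a-zA-Z]+", DEMO)
print(words[:4])  # ['Alabama', 'AL', 'Birmingham', 'Dec']

# Alternation: only these three words, in the order they appear.
months = re.findall(r"(Jan|Jul|Dec)", DEMO)
print(months)  # ['Dec', 'Jan', 'Jul']
```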
(Jan|Jul|Dec): matches Jan, Jul, or Dec specifically

2 numbers [0-9]{2}
[0-9]: numbers
{2}: match twice

4 numbers [0-9]{4}
[0-9]: numbers
{4}: match 4 times

2-3 letters [a-z]{2,3}
[a-z]: letters
{2,3}: match 2 to 3 times (inclusive)

6+ letters [a-z]{6,}
[a-z]: letters
{6,}: match 6 or more times (inclusive)

3 letters/numbers \w{3}
\w: letters and numbers, plus the underscore (see special chars)
{3}: match 3 times

3 whole letters/numbers \b\w{3}\b
\w{3}: match 3 letters and numbers
\b: word boundaries (see special chars)

3 whole letter words \b[a-z]{3}\b
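The effect of `\b` is easiest to see by running `\w{3}`, `\b\w{3}\b`, and `\b[a-z]{3}\b` against the same line. A minimal Python sketch, assuming `re.IGNORECASE` as the equivalent of leaving "match case" off in the IDE:

```python
import re

line = "Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185"

# Without boundaries, longer words are chopped into 3-character chunks.
chunks = re.findall(r"\w{3}", line)
print(chunks)  # ['Ala', 'bam', 'Bir', 'min', 'gha', 'Dec', '181', '903', '185']

# With boundaries, only standalone 3-character words survive.
whole = re.findall(r"\b\w{3}\b", line)
print(whole)  # ['Dec', '903', '185']

# Letters only; IGNORECASE is needed because [a-z] alone is lowercase.
letters_only = re.findall(r"\b[a-z]{3}\b", line, re.IGNORECASE)
print(letters_only)  # ['Dec']

# The same pattern with case sensitivity on finds nothing in this line.
case_sensitive = re.findall(r"\b[a-z]{3}\b", line)
print(case_sensitive)  # []
```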
[a-z]{3}: match 3 letters
\b: word boundaries

Two words [a-zA-Z]+\s[a-zA-Z]+
word space word
[a-zA-Z]+: word
\s: space (see special chars)

One or two words [a-zA-Z]+(\s[a-zA-Z]+)?
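Here is a quick comparison of `[a-zA-Z]+\s[a-zA-Z]+` and `[a-zA-Z]+(\s[a-zA-Z]+)?` in Python. One caveat that is specific to Python rather than to the IDE: `re.findall` returns the contents of the group when a pattern contains one, so this sketch uses `re.finditer` and reads the whole match instead.

```python
import re

line = "North Dakota (ND) Fargo (Nov 2, 1889) 762,062"

# Exactly two words separated by one whitespace character.
two_words = [m.group(0) for m in re.finditer(r"[a-zA-Z]+\s[a-zA-Z]+", line)]
print(two_words)  # ['North Dakota']

# The second word is optional, so single words match too.
one_or_two = [m.group(0) for m in re.finditer(r"[a-zA-Z]+(\s[a-zA-Z]+)?", line)]
print(one_or_two)  # ['North Dakota', 'ND', 'Fargo', 'Nov']

# findall reports only the group, which is rarely what you want here.
groups_only = re.findall(r"[a-zA-Z]+(\s[a-zA-Z]+)?", line)
print(groups_only)  # [' Dakota', '', '', '']
```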
word (space word)?
[a-zA-Z]+: word
\s: space
( ... )?: optional
North Dakota is considered one match now

Everything in brackets (greedy) \(.*\)
\( and \): match brackets (see special chars)
.*: greedy wildcard
Runs to the last ) bracket on the line

Everything in brackets (non-greedy) \(.*?\)
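Greedy `\(.*\)` and non-greedy `\(.*?\)` are easiest to tell apart side by side. A minimal Python sketch on the first demo line:

```python
import re

line = "Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185"

# Greedy: .* grabs as much as it can, so the match runs to the LAST ")".
greedy = re.findall(r"\(.*\)", line)
print(greedy)  # ['(AL) Birmingham (Dec 14, 1819)']

# Non-greedy: .*? grabs as little as it can, so each match stops at the FIRST ")".
lazy = re.findall(r"\(.*?\)", line)
print(lazy)  # ['(AL)', '(Dec 14, 1819)']
```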
\( and \): match brackets
.*?: non-greedy wildcard
Stops at the first ) bracket

Lines with the * character ^.*\*.*$
^ and $: match the start/end of the line (optional)
.*: wildcard
\*: the star * character (see special chars)

Lines without the * character ^[^\*]+$
^ and $: match the start/end of the line
[^ ... ]: matches anything not in the brackets
\*: the star * character
[^\*]: matches anything not a * character
+: repeats the match for consecutive characters

All lines with the e character ^.*[e].*$
^ and $: match the start/end of the line
.*: wildcard
[e]: the letter e
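The line filters `^.*\*.*$`, `^[^\*]+$`, and `^.*[e].*$` can be sketched in Python by testing one line at a time. Going line by line is a deliberate choice for this sketch: in Python a negated class such as `[^\*]` also matches a newline, so searching the whole text at once could glue neighbouring lines together. The `names` helper is hypothetical and only trims each matching line down to its state name.

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""


def names(pattern):
    """State names of the lines that match `pattern` (hypothetical helper)."""
    return [
        line.split(" (")[0]  # keep the text before the first " ("
        for line in DEMO.splitlines()
        if re.search(pattern, line, re.IGNORECASE)
    ]


with_star = names(r"^.*\*.*$")
without_star = names(r"^[^\*]+$")
with_e = names(r"^.*[e].*$")

print(with_star)     # ['Hawaii*', 'Wyoming*']
print(without_star)  # ['Alabama', 'Michigan', 'North Dakota']
print(with_e)        # ['Alabama', 'Michigan', 'Wyoming*']
```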

All lines without the e character ^[^e]+$
^ and $: match the start/end of the line
[^ ... ]: matches anything not in the brackets
[^e]: matches anything not an e character
+: repeats the match for consecutive characters

Brackets starting with certain words \((Jan|Jul|Dec).*\)
\( and \): match brackets
(Jan|Jul|Dec): matches Jan, Jul, or Dec words
.*: wildcard

The short date in brackets [a-z]{3}\s+[0-9]+
[a-z]{3}: 3 letters exactly
\s+: one or more spaces
[0-9]+: one or more numbers

The date in brackets [a-z]{3}\s+[0-9]+,\s[0-9]+
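Both date patterns, `[a-z]{3}\s+[0-9]+` and `[a-z]{3}\s+[0-9]+,\s[0-9]+`, can be checked against the demo text with a short Python sketch. `re.IGNORECASE` is an assumption that stands in for an IDE search with "match case" off; without it `[a-z]{3}` would not match "Dec".

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""

# Month and day only.
short_dates = re.findall(r"[a-z]{3}\s+[0-9]+", DEMO, re.IGNORECASE)
print(short_dates)  # ['Dec 14', 'Aug 21', 'Jan 26', 'Nov 2', 'Jul 10']

# Month, day, comma, year.
full_dates = re.findall(r"[a-z]{3}\s+[0-9]+,\s[0-9]+", DEMO, re.IGNORECASE)
print(full_dates)
# ['Dec 14, 1819', 'Aug 21, 1959', 'Jan 26, 1837', 'Nov 2, 1889', 'Jul 10, 1890']
```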
word number, number
[a-z]{3}: 3 letters exactly
\s+: one or more spaces
,: comma
[0-9]+: one or more numbers

Words with m (in the middle) [a-z]+[m][a-z]+
[a-z]+: one or more letters
[m]: the letter m
Does not match Michigan because m is at the start of the word

Words with m (anywhere) ([a-z]+)?[m]([a-z]+)?
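To see what making the surrounding letters optional changes, here is a small Python comparison of `[a-z]+[m][a-z]+` and `([a-z]+)?[m]([a-z]+)?`. It reads the whole match via `re.finditer`, because `re.findall` would return the groups instead; `re.IGNORECASE` is assumed to mirror a case-insensitive IDE search.

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""


def whole_matches(pattern):
    """Full text of each match, ignoring case (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, DEMO, re.IGNORECASE)]


# m needs at least one letter on each side.
m_in_middle = whole_matches(r"[a-z]+[m][a-z]+")
print(m_in_middle)  # ['Alabama', 'Birmingham', 'Wyoming']

# Letters on either side are optional, so a leading m counts too.
m_anywhere = whole_matches(r"([a-z]+)?[m]([a-z]+)?")
print(m_anywhere)  # ['Alabama', 'Birmingham', 'Michigan', 'MI', 'Wyoming']
```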
(word)? m (word)?
( ... )?: optional
[a-z]+: a word
([a-z]+)?: an optional word
[m]: the letter m
m can be anywhere in the word so Michigan is matched now

You can also match expressions but exclude them from the result. These are officially known as ‘lookarounds’.
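As a first taste, this Python sketch runs the same bracket search twice: once with the brackets as part of the match, and once with them moved into a lookbehind `(?<= ... )` and a lookahead `(?= ... )`. `re.IGNORECASE` is assumed to mirror a case-insensitive IDE search.

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""

# Brackets are part of the pattern, so they are part of the result.
inclusive = re.findall(r"\([a-z]+\)", DEMO, re.IGNORECASE)
print(inclusive)  # ['(AL)', '(HI)', '(MI)', '(ND)', '(WY)']

# Brackets must still be there, but the lookarounds keep them out of the result.
exclusive = re.findall(r"(?<=\()[a-z]+(?=\))", DEMO, re.IGNORECASE)
print(exclusive)  # ['AL', 'HI', 'MI', 'ND', 'WY']
```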

Word in brackets (inclusive) \([a-z]+\)
\( and \): match brackets
[a-z]+: a word

Word in brackets (exclusive) (?<=\()[a-z]+(?=\))
[a-z]+: a word
(?<= ... ): starts a match but excludes it from the result
\(: the bracket ( character
(?<=\(): matches from bracket ( without including it
(?= ... ): ends a match but excludes it from the result
\): the bracket ) character
(?=\)): matches up to bracket ) without including it

Everything in brackets (exclusive) (?<=\().*?(?=\))
(?<=\(): matches from bracket ( without including it
.*?: non-greedy wildcard
(?=\)): matches up to bracket ) without including it

Everything in brackets on lines with * (exclusive) (?<=\*.*\().*?(?=\))
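A portability note if you take `(?<=\*.*\().*?(?=\))` outside the IDE: the lookbehind contains `.*`, so it has no fixed width, and Python's built-in `re` module only accepts fixed-width lookbehinds. The sketch below first shows the pattern being rejected, then gets the same result a different way (this is a swapped-in technique, not the lookbehind itself): it keeps only the text after the `*` on each line and captures the bracket contents with a group.

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759"""

# Python's re refuses lookbehinds whose length can vary.
try:
    re.compile(r"(?<=\*.*\().*?(?=\))")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
print(variable_lookbehind_ok)  # False

# Workaround: look only at the text after the first "*" on each line,
# then capture what sits between "(" and the next ")".
inside = []
for line in DEMO.splitlines():
    _, star, after_star = line.partition("*")
    if star:  # empty string when the line has no "*"
        inside.extend(re.findall(r"\((.*?)\)", after_star))
print(inside)  # ['HI', 'Aug 21, 1959', 'WY', 'Jul 10, 1890']
```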
(?<= ... ): starts a match but excludes it
\*: the star * character
.*: wildcard
\(: the bracket ( character
(?<=\*.*\(): wildcard from * to ( without including them
.*?: non-greedy wildcard
(?=\)): matches up to ) without including it

Everything up to * (exclusive) ^.*(?=\*)
^: start of a line
.*: wildcard
(?=\*): matches up to * without including it

Special chars
. ^ $ * + ? ( ) [ { \ |: reserved characters (escape them with \ to match them literally)
(abc): matches abc (in a regex group)
\(abc\): matches (abc) (with brackets)
[a-zA-Z]: letters (case-sensitive)
[0-9] or \d: match numbers
[a-c1-3#]: matches characters a b c 1 2 3 #
.*: greedy wildcard
.*?: non-greedy wildcard
^: start of line
$: end of line
\s: space (any whitespace character)
\t: tab
\n: new line
\w: letters and numbers (plus the underscore)
\W: not letters and numbers
\b: word break
\B: not word break
+: repeat match one or more times
{3}: repeat match exactly thrice
{1,3}: repeat match 1, 2, or 3 times
{3,}: repeat match 3+ times
[^ ... ]: match all but given characters
(?<= ... ): start match with given characters and exclude them (look behind)
(?= ... ): end match with given characters and exclude them (look ahead)
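Reserved characters are the usual source of surprises when searching for literal text, so here is one last sketch. It assumes Python's built-in `re` module, whose `re.escape` function adds the backslashes for you; in an IDE search box you would type them by hand.

```python
import re

DEMO = """Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872"""

literal = "Hawaii* (HI)"

# Unescaped, * means "repeat the previous i" and ( ) form a group,
# so the pattern no longer describes the literal text and finds nothing.
unescaped = re.search(literal, DEMO)
print(unescaped)  # None

# Escaped, every reserved character is matched literally.
escaped = re.search(re.escape(literal), DEMO)
print(escaped.group(0))  # Hawaii* (HI)
```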