Techalicious Academy / 2026-01-15-regex-therapy

REGEX FUNDAMENTALS - COMBINING PATTERNS

Now we get powerful. Quantifiers let you match repeated patterns.

Quantifiers - How Many?

These modify the thing that comes before them:

*     Zero or more (greedy)
+     One or more (greedy)
?     Zero or one (optional)
{n}   Exactly n times
{n,}  At least n times
{n,m} Between n and m times

Examples:

Pattern:  ab*c
Matches:  ac, abc, abbc, abbbc (zero or more b's)

Pattern:  ab+c
Matches:  abc, abbc, abbbc (one or more b's, NOT ac)

Pattern:  ab?c
Matches:  ac, abc (zero or one b only)

Pattern:  a{3}
Matches:  aaa (exactly three a's)

Pattern:  a{2,4}
Matches:  aa, aaa, aaaa (two to four a's)
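
To try the quantifier examples yourself, here is a minimal sketch in Python using the standard re module. The helper name matches_whole and the test strings are made up for illustration. It uses re.fullmatch so the pattern has to cover the entire string, which is what the "Matches:" lines above describe.

```python
import re

# Each pattern from the examples above, paired with strings to try.
cases = {
    r"ab*c": ["ac", "abc", "abbbc"],
    r"ab+c": ["ac", "abc", "abbbc"],
    r"ab?c": ["ac", "abc", "abbc"],
    r"a{3}": ["aa", "aaa", "aaaa"],
    r"a{2,4}": ["a", "aa", "aaaa", "aaaaa"],
}


def matches_whole(pattern, text):
    """Return True when the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


# Keep only the strings each pattern accepts.
results = {
    pattern: [text for text in texts if matches_whole(pattern, text)]
    for pattern, texts in cases.items()
}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Note that a search (rather than a full match) with a{3} would still find the first three letters inside "aaaa". Quantifiers limit what one match can contain, not what surrounds it.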

Real World Quantifier Examples

Phone numbers (simple):

Pattern:  \d{3}-\d{3}-\d{4}
Matches:  123-456-7890

Variable length words:

Pattern:  \w+
Matches:  Any run of one or more word characters (letters, digits, underscore)

Optional country code:

Pattern:  \+?\d{10,11}
Matches:  1234567890 or +12345678901

The backslash makes the plus sign a literal character instead of a
quantifier, and the question mark makes that plus sign optional.
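
Here is a short Python sketch of these three patterns, assuming the standard re module. The sample sentences and variable names are invented for the demo.

```python
import re

phone = re.compile(r"\d{3}-\d{3}-\d{4}")
word = re.compile(r"\w+")
intl = re.compile(r"\+?\d{10,11}")

# Only the full 3-3-4 shape matches; "555-0100" is too short.
text = "Call 123-456-7890 or 555-0100 today."
found_phones = phone.findall(text)

# \w+ pulls out each run of word characters and skips punctuation.
found_words = word.findall("Regex is fun, right?")

# The leading plus is optional; the digit count is still enforced.
with_plus = intl.fullmatch("+12345678901") is not None
without_plus = intl.fullmatch("1234567890") is not None
too_short = intl.fullmatch("12345") is not None

print(found_phones)   # ['123-456-7890']
print(found_words)    # ['Regex', 'is', 'fun', 'right']
print(with_plus, without_plus, too_short)   # True True False
```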

Greedy vs Non-Greedy

By default, quantifiers are "greedy" - they match as much as possible.

Given text: <title>Hello</title>

Pattern:  <.*>
Matches:  <title>Hello</title> (the whole thing!)

That's often not what you want. Add ? to make it non-greedy:

Pattern:  <.*?>
Matches:  <title> (stops at first >)

Non-greedy quantifiers:

*?    Zero or more (lazy)
+?    One or more (lazy)
??    Zero or one (lazy)

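The difference matters when you're extracting data: a greedy pattern can
swallow everything between the first and last delimiter, while a lazy one
stops at the nearest delimiter.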
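
A quick way to see greedy and lazy side by side, sketched in Python with the standard re module. The variable names are ours.

```python
import re

text = "<title>Hello</title>"

# Greedy: .* runs to the end, then backs up to the LAST ">".
greedy = re.search(r"<.*>", text).group()

# Lazy: .*? grows one character at a time and stops at the FIRST ">".
lazy = re.search(r"<.*?>", text).group()

# With the lazy version, findall picks up each tag separately.
all_tags = re.findall(r"<.*?>", text)

print(greedy)     # <title>Hello</title>
print(lazy)       # <title>
print(all_tags)   # ['<title>', '</title>']
```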

Alternation - OR Logic

The pipe character means "or":

Pattern:  cat|dog
Matches:  "cat" or "dog"

Pattern:  gray|grey
Matches:  Both spellings

You can have multiple options:

Pattern:  red|green|blue
Matches:  Any of the three colors
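
The alternation patterns above can be tried like this in Python (standard re module; the sample sentence is made up):

```python
import re

text = "The gray cat chased the grey dog past a red and blue sign."

# findall returns every non-overlapping match, left to right.
pets = re.findall(r"cat|dog", text)
spellings = re.findall(r"gray|grey", text)
colors = re.findall(r"red|green|blue", text)

print(pets)        # ['cat', 'dog']
print(spellings)   # ['gray', 'grey']
print(colors)      # ['red', 'blue']
```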

Grouping with Parentheses

Parentheses group patterns together. This is useful for:

1. Applying quantifiers to groups:

Pattern:  (ab)+
Matches:  ab, abab, ababab (one or more "ab")

2. Creating alternation groups:

Pattern:  gr(a|e)y
Matches:  gray or grey

3. Capturing for later use (more on this in the rename section):

Pattern:  (hello) (world)
Captures: "hello" in group 1, "world" in group 2
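
The three uses of parentheses can be sketched in Python as follows. This assumes the standard re module; the variable names and sample strings are only for illustration.

```python
import re

# 1. Quantifier applied to a group: the whole "ab" unit repeats.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None
broken = re.fullmatch(r"(ab)+", "aba") is not None

# 2. Alternation limited to the group, so "gr" and "y" are shared.
#    finditer + group() gives the full matched text for each hit.
grays = [m.group() for m in re.finditer(r"gr(a|e)y", "gray or grey")]

# 3. Capturing: each pair of parentheses is numbered from 1.
m = re.search(r"(hello) (world)", "say hello world")
first, second = m.group(1), m.group(2)

print(repeated, broken)   # True False
print(grays)              # ['gray', 'grey']
print(first, second)      # hello world
```

One Python-specific detail: re.findall returns the captured groups rather than the whole match when the pattern contains groups, which is why the sketch uses finditer for the gray/grey case.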

Non-Capturing Groups

Sometimes you need grouping but don't need to capture:

Pattern:  (?:ab)+
Groups "ab" but doesn't capture it.

Use (?: ) when you only need grouping for logic, not extraction.
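
The practical difference shows up in what you get back. A small Python sketch (standard re module, made-up sample text):

```python
import re

text = "abab xyz ab"

# With a capturing group, findall returns what the group captured
# (the last repetition), not the whole match.
capturing = re.findall(r"(ab)+", text)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?:ab)+", text)

# A non-capturing group leaves no numbered groups behind.
group_count = len(re.search(r"(?:ab)+", text).groups())

print(capturing)       # ['ab', 'ab']
print(non_capturing)   # ['abab', 'ab']
print(group_count)     # 0
```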

Word Boundaries

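The \b anchor matches the position between a word character and a non-word
character, including the very start or end of the text. It matches a
position, not a character, so it adds nothing to the matched text.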

Pattern:  \bcat\b
Matches:  "cat" but NOT "category" or "concatenate"

Without word boundaries:

Pattern:  cat
Matches:  cat, category, concatenate (anywhere "cat" appears)

Use word boundaries whenever you need a whole word rather than a fragment
inside a longer word.
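
To see the effect of \b, here is a minimal Python sketch (standard re module; the sample sentence and variable names are invented):

```python
import re

text = "The cat sat near the category list; do not concatenate."

# With boundaries: only the standalone word.
whole_word = re.findall(r"\bcat\b", text)

# Without boundaries: every occurrence of the letters c-a-t.
anywhere = re.findall(r"cat", text)

# Which words contained those occurrences?
containing = re.findall(r"\w*cat\w*", text)

print(whole_word)   # ['cat']
print(anywhere)     # ['cat', 'cat', 'cat']
print(containing)   # ['cat', 'category', 'concatenate']
```

In Python, write the pattern as a raw string (r"..."). In a normal string, "\b" means a backspace character, not a word boundary.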

Putting It Together

Complex pattern example - email-like strings:

Pattern:  \w+@\w+\.\w+
Matches:  Simple email patterns

Let's break it down:

\w+     One or more word characters (username)
@       Literal @ sign
\w+     One or more word characters (domain)
\.      Literal dot
\w+     One or more word characters (TLD)

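This isn't a production email validator. Since \w does not match dots or
hyphens, an address like first.last@example.com only matches partially.
It does illustrate how patterns combine.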
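
A short Python sketch of the email-like pattern in action, using the standard re module. The addresses are invented, and the second half shows one of the pattern's limits.

```python
import re

email_like = re.compile(r"\w+@\w+\.\w+")

text = "Write to alice@example.com or bob_99@mail.org, not carol@localhost."
found = email_like.findall(text)

# Limitation: \w does not match "." so the match starts mid-username
# and stops after the first dot-separated part of the domain.
partial = email_like.findall("first.last@example.co.uk")

print(found)     # ['alice@example.com', 'bob_99@mail.org']
print(partial)   # ['last@example.co']
```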

Another Example - Date Patterns

Matching YYYY-MM-DD format:

Pattern:  \d{4}-\d{2}-\d{2}
Matches:  2024-01-15, 2023-12-31

More flexible (allows single digits):

Pattern:  \d{4}-\d{1,2}-\d{1,2}
Matches:  2024-1-5, 2024-01-15
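
Both date patterns can be compared with a small Python sketch (standard re module; the sample text is made up). Keep in mind these patterns check the shape of a date, not whether the date is real.

```python
import re

strict = re.compile(r"\d{4}-\d{2}-\d{2}")
flexible = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")

text = "Released 2024-01-15, patched 2024-1-5, retired 2023-12-31."

strict_dates = strict.findall(text)
flexible_dates = flexible.findall(text)

# Shape only: month 99 and day 99 still fit the pattern.
nonsense_ok = strict.fullmatch("2024-99-99") is not None

print(strict_dates)     # ['2024-01-15', '2023-12-31']
print(flexible_dates)   # ['2024-01-15', '2024-1-5', '2023-12-31']
print(nonsense_ok)      # True
```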

Quick Reference - What We've Learned

Building Blocks:
  .           Any character
  [abc]       Character class
  [^abc]      Negated class
  \d \w \s    Digit, word char, whitespace
  ^  $        Start/end anchors
  \b          Word boundary

Quantifiers:
  *           Zero or more
  +           One or more
  ?           Zero or one
  {n}         Exactly n
  {n,}        At least n
  {n,m}       n to m times
  *? +? ??    Non-greedy versions

Grouping:
  (...)       Capture group
  (?:...)     Non-capture group
  |           Alternation (OR)