MODIFIERS AND ANCHORS

You probably know /i for case-insensitive and /g for global. There's a lot more. And anchors go deeper than ^ and $ once you factor in multiline mode.

THE MODIFIER FLAGS

The flags you'll actually run into, explained. Know what each one does and other people's patterns get a lot less confusing.

The i Flag: Case Insensitive

The one everybody knows. Makes the pattern ignore case.

Pattern:  /hello/i
Matches:  hello, Hello, HELLO, hElLo, HeLLO

Without it, regex is case-sensitive by default. Always has been.

One subtlety: inside character classes, /i affects ranges too.

Pattern:  /[a-z]+/i
Matches:  any mix of upper and lowercase letters

That's the same as writing [a-zA-Z]+ but shorter.

The m Flag: Multiline

This one causes more confusion than any other flag. Here's what it actually does.

Without /m, the ^ anchor means "start of the entire string" and $ means "end of the entire string" (or just before a single trailing newline, if there is one). The whole input is treated as one blob.

With /m, ^ means "start of any line" and $ means "end of any line." Every newline creates a new start and end.

Text (three lines):
  apple
  banana
  cherry

Pattern:  /^\w+$/
Without m: No match (^ is start of "apple\nbanana\ncherry", and
           $ is the end. \w never matches a newline, so \w+
           can't span all three lines)
With m:    Matches "apple", "banana", "cherry" (each line)

Pattern:  /^\w+$/m
Matches each line independently.

When you're processing multi-line text and want ^ and $ to work per-line, turn on /m. When you want them to mean the absolute start and end of the input, leave it off.

The s Flag: Single-Line (Dotall)

By default, the dot . matches any character EXCEPT newline. This is an old convention from when regex was always line-oriented.

The /s flag removes that exception. With /s, dot matches everything including newline characters.

Text:
  <div>
  hello
  </div>

Pattern:  /<div>.*<\/div>/
Without s: No match (the .* stops at the first newline)
With s:    Matches the entire block across all three lines

Pattern:  /<div>.*?<\/div>/s
Matches:  <div>\nhello\n</div>

The name "single-line" is misleading. It doesn't make your text one line. It makes the engine TREAT the text as if newlines are just regular characters. "Dotall" is a better name for it, which is what Python and other languages call it.

+-----------------------------------------------------------+
|  /m changes what ^ and $ mean (per-line vs whole string)  |
|  /s changes what . matches (with or without newlines)     |
|                                                           |
|  They're independent. You can use both at once: /ms       |
+-----------------------------------------------------------+

The x Flag: Extended / Verbose

This is the readability flag. It makes the regex engine ignore unescaped whitespace and treat # as a comment character.

Without /x:

/^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})$/

Hard to read. What does each group capture? You have to count parentheses and mentally parse the whole thing.

With /x:

/
  ^
  (\d{4})     # year
  -
  (\d{2})     # month
  -
  (\d{2})     # day
  T
  (\d{2})     # hour
  :
  (\d{2})     # minute
  :
  (\d{2})     # second
  $
/x

Same pattern. Completely readable. The whitespace is ignored. The comments explain each piece.

If you need to match an actual space with /x, use \ (backslash space) or put it in a character class [ ]. \s also works, but it matches any whitespace (tabs and newlines too), not just a space.

Pattern:  /hello\ world/x     matches "hello world"
Pattern:  /hello[ ]world/x    also matches "hello world"
Pattern:  /hello world/x      matches "helloworld" (space ignored!)

Use /x for any pattern longer than about 30 characters. Your future self will thank you. Your teammates will thank you. Anyone reading your code will thank you.

The g Flag: Global

Without /g, the engine finds the first match and stops. With /g, it finds all matches.

perl -e '$t = "cat dog cat"; @m = ($t =~ /cat/g); print "@m\n"'
Output: cat cat

In grep, this is the default behavior (grep shows all matching lines). In programming languages and substitutions, you need /g explicitly when you want to replace or match ALL occurrences.

perl -pe 's/cat/CAT/' <<< "cat dog cat"
Output: CAT dog cat   (only first replaced)

perl -pe 's/cat/CAT/g' <<< "cat dog cat"
Output: CAT dog CAT   (all replaced)

The U Flag: Ungreedy

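This is the "opposite world" flag. It flips the default greediness of all quantifiers. It comes from PCRE (PHP's preg functions, for example), not from Perl: Perl itself rejects /U as an unknown modifier.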

Normally, * and + are greedy (match as much as possible). With /U, they become non-greedy by default (match as little as possible). And adding ? to a quantifier makes it greedy again. Everything inverted.

Without U:  .*   is greedy,   .*?  is non-greedy
With U:     .*   is non-greedy, .*? is greedy

Pattern:  /<.+>/U
Text:     <b>bold</b>
Matches:  <b>   (non-greedy by default because of /U)

Pattern:  /<.+?>/U
Text:     <b>bold</b>
Matches:  <b>bold</b>   (? makes it greedy under /U)

This flag is rarely used because it confuses everyone. Most people prefer to be explicit with *? and +? for non-greedy where needed. But you should know it exists because you'll encounter it in other people's code and wonder why the quantifiers seem backward.

INLINE MODIFIERS

You don't have to apply modifiers to the whole pattern. You can turn them on and off mid-pattern.

Turn on case-insensitive for the rest of the pattern:

Pattern:  CASE(?i)insensitive
Matches:  CASEinsensitive, CASEInsensitive, CASEINSENSITIVE
Won't:    caseinsensitive (the CASE part is still case-sensitive)

Turn off a modifier:

Pattern:  (?i)hello(?-i)WORLD
Matches:  helloWORLD, HeLLoWORLD
Won't:    helloworld (WORLD part is case-sensitive again)

Scoped modifiers apply only within a group:

Pattern:  (?i:hello) WORLD
Matches:  hello WORLD, HELLO WORLD, HeLLo WORLD
Won't:    hello world (WORLD stays case-sensitive)

Combine multiple modifiers:

(?im)     Turn on case-insensitive AND multiline
(?i-m)    Turn on case-insensitive, turn OFF multiline
(?ims)    Turn on all three at once

This is particularly useful in programming where you can't always pass flags to the regex engine directly. You embed the flags in the pattern itself.

ADVANCED ANCHORS

You know ^ for start and $ for end. But there's more to the story, especially once multiline mode enters the picture.

\A: Absolute Start of String

\A always means the very beginning of the input string. Always. Regardless of whether /m is on or off.

Pattern:  \Ahello
Text:     hello world
Matches:  hello

Pattern:  \Ahello  (with /m flag)
Text:     "goodbye\nhello"
Matches:  nothing (hello is on line 2, not the absolute start)

Pattern:  ^hello   (with /m flag)
Text:     "goodbye\nhello"
Matches:  hello (^ now means start of any line)

See the difference? When /m is on, ^ shifts meaning. \A never does. If you always mean "the very start of the entire input," use \A.

\Z and \z: End of String

Two flavors of "end of string" that differ in a subtle way.

\Z matches at the end of the string, but it allows a trailing newline. Most text files and strings end with \n, and \Z tolerates that.

\z matches at the absolute, final, nothing-after-this-point end. If there's a trailing newline, \z is AFTER it.

Text:  "hello\n"

/hello\Z/    Matches (allows the trailing newline)
/hello\z/    Doesn't match (the \n is still after "hello")
/hello\n\z/  Matches (explicitly includes the newline)

For line-oriented input, \Z is usually what you want. It's more forgiving. Use \z when you need to be absolutely certain nothing follows, as in input validation.

\b and \B: Word Boundaries

You probably used \b in the basics class. It matches the boundary between a word character (\w) and a non-word character (\W), or between a word character and the start/end of the string.

Pattern:  \bcat\b
Text:     the cat sat on the caterpillar
Matches:  cat (the standalone word, not inside caterpillar)

\B is the opposite. It matches where there is NOT a word boundary.

Pattern:  \Bcat\B
Text:     the cat sat on the concatenation
Matches:  cat (inside concatenation, where both sides are word chars)

Pattern:  \Bcat
Text:     the cat sat on the caterpillar scat
Matches:  cat (in scat, where a word char precedes it)

\B is less commonly used, but it's handy when you specifically want to find a pattern INSIDE a word rather than as a standalone match.

Multiline and Anchors: The Full Picture

Here's a summary of how anchors behave with and without /m:

Without /m:
  ^     start of string (same as \A)
  $     end of string (same as \Z)

With /m:
  ^     start of any line
  $     end of any line
  \A    still start of string (unchanged)
  \Z    still end of string (unchanged)
  \z    still absolute end of string (unchanged)

The safe rule: if you always mean the whole string and your pattern might be used with /m, use \A and \z. If you want per-line behavior, use ^ and $ with /m. Be intentional about which one you pick.

Putting It Together

A real pattern using multiple modifiers and anchors:

/
  \A                     # absolute start of string
  (?=.*[A-Z])            # must contain uppercase
  (?=.*\d)               # must contain digit
  [A-Za-z\d@#\$%!]{8,}  # allowed chars, 8+ length
  \z                     # absolute end of string
/x

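That's a password validator using the /x flag for readability, \A and \z for absolute boundaries, and lookaheads for requirements. Several of tonight's features in one pattern. In Perl the $ inside the character class has to be written \$, otherwise $% is read as a variable.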

perl -e 'print "valid\n" if "Secret42!" =~ /\A(?=.*[A-Z])(?=.*\d)[A-Za-z\d@#\$%!]{8,}\z/x'
Output: valid