Techalicious Academy / 2026-02-19-advanced-regex

PRACTICE EXERCISES

Time to put everything together. These exercises use every technique from tonight. They're ordered roughly from moderate to hard. Work through them at your own pace. Every exercise includes sample input, what you should match, and the solution with a full explanation.

For all of these, test with grep -P (a GNU grep option) or perl -ne on the command line:

echo "test string" | grep -P 'pattern'
echo "test string" | perl -ne 'print if /pattern/'

EXERCISE 1: Extract Prices

Challenge: Match all prices with a dollar sign and optional cents.

Sample input:  "Items cost $19.99 and $5 and $1,234.56 but not 3.50"

Expected matches: $19.99, $5, $1,234.56

Solution:

\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?

Breaking it down:

  \$              literal dollar sign
  \d{1,3}         one to three digits (first group)
  (?:,\d{3})*     zero or more comma-plus-three-digit groups (thousands)
  (?:\.\d{2})?    optionally followed by a dot and exactly two cents

Test it:

echo "Items cost \$19.99 and \$5 and \$1,234.56 but not 3.50" | grep -oP '\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?'
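
The pattern is not tied to grep. Here is a minimal sketch of the same extraction in Python, assuming the standard re module (a different engine from the PCRE one behind grep -P, though it accepts this pattern unchanged). The helper name find_prices is made up for illustration.

```python
import re

# Same pattern as the solution above; find_prices is a hypothetical helper name.
PRICE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")

def find_prices(text):
    """Return every dollar amount found in text, in order of appearance."""
    # No capturing groups in the pattern, so findall returns whole matches.
    return PRICE.findall(text)

print(find_prices("Items cost $19.99 and $5 and $1,234.56 but not 3.50"))
```

One thing the sketch makes easy to check: an amount written without commas, such as $1234, comes back as $123, because \d{1,3} stops after three digits.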

EXERCISE 2: Valid IPv4 Addresses

Challenge: Match valid IPv4 addresses where each octet is 0 to 255. This is the classic "hard" regex problem.

Sample input: "10.0.0.1 and 192.168.1.100 and 999.999.999.999 and 256.1.1.1"

Expected: 10.0.0.1 and 192.168.1.100 match. The others do not.

Solution:

\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b

Breaking it down (one octet):

  25[0-5]       250 through 255
  2[0-4]\d      200 through 249
  1\d\d         100 through 199
  [1-9]?\d      0 through 99 (optional leading digit)

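The octet pattern repeats: three times with a dot, once without.
Word boundaries keep a match from starting or ending inside a longer
run of digits, so 10.0.0.256 cannot sneak through as 10.0.0.25. They
do not stop at dots: 1.2.3.4.5 still yields 1.2.3.4.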

Test it:

echo "10.0.0.1 192.168.1.100 999.999.999.999 256.1.1.1" | grep -oP '\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b'
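
Typing the octet four times invites typos. One way around that, sketched here in Python on the assumption that your language has string formatting, is to define the octet once and assemble the full pattern from it. OCTET, IPV4, and find_ipv4 are names invented for this example.

```python
import re

# One octet, 0 through 255, exactly as in the breakdown above.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# In an f-string, {{3}} produces the literal quantifier {3}.
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

def find_ipv4(text):
    """Return every substring of text that matches the IPv4 pattern."""
    return IPV4.findall(text)

sample = "10.0.0.1 and 192.168.1.100 and 999.999.999.999 and 256.1.1.1"
print(find_ipv4(sample))
```

The assembled pattern is character-for-character the one in the solution, so it should behave the same way under grep -P.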

EXERCISE 3: Find Duplicated Words

Challenge: Find repeated words in text using a backreference.

Sample input: "The the quick brown fox fox jumped over the the lazy dog"

Expected: "The the", "fox fox", "the the"

Solution:

\b(\w+)\s+\1\b

Breaking it down:

  \b         word boundary
  (\w+)      capture a word (group 1)
  \s+        one or more whitespace characters
  \1         backreference: same text as group 1
  \b         word boundary

As written, the pattern is case-sensitive, so "The the" matches only with the case-insensitive flag (/i in Perl, -i in grep):

echo "The the quick brown fox fox jumped" | grep -oiP '\b(\w+)\s+\1\b'
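
To see the effect of the case flag side by side, here is a small Python sketch. It assumes the standard re module, where re.IGNORECASE plays the role of /i; find_duplicates and its ignore_case parameter are hypothetical names.

```python
import re

def find_duplicates(text, ignore_case=True):
    """Return each repeated-word pair found in text as one string."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"\b(\w+)\s+\1\b", flags)
    # group(0) is the whole "word word" span; findall would return only
    # the captured first word because the pattern has a capturing group.
    return [m.group(0) for m in pattern.finditer(text)]

sample = "The the quick brown fox fox jumped over the the lazy dog"
print(find_duplicates(sample))                     # with the flag
print(find_duplicates(sample, ignore_case=False))  # without it
```

Without the flag, the capitalized pair at the start of the sentence drops out of the results.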

EXERCISE 4: Named Group Key-Value Extraction

Challenge: Extract key-value pairs from config-style text using named capture groups.

Sample input:
  host=localhost
  port=5432
  dbname=myapp
  timeout=30

Expected: capture each key and value separately by name.

Solution:

(?P<key>\w+)=(?P<value>\S+)

Breaking it down:

  (?P<key>\w+)     named group "key": one or more word characters
  =                literal equals sign
  (?P<value>\S+)   named group "value": one or more non-whitespace

Test it with Perl to see the named captures:

echo "host=localhost" | perl -ne 'print "key=$+{key} value=$+{value}\n" if /(?P<key>\w+)=(?P<value>\S+)/'
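
Named groups pay off most when you feed them into a data structure. A minimal sketch in Python, whose re module accepts the same (?P<name>...) syntax; parse_config is an invented helper, and the behavior on repeated keys (last one wins) is a choice made for this example.

```python
import re

PAIR = re.compile(r"(?P<key>\w+)=(?P<value>\S+)")

def parse_config(text):
    """Collect key=value pairs into a dict; a repeated key keeps its last value."""
    return {m["key"]: m["value"] for m in PAIR.finditer(text)}

config = """host=localhost
port=5432
dbname=myapp
timeout=30"""
print(parse_config(config))
```

Every value comes back as a string, so port is "5432", not the number 5432.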

EXERCISE 5: Multiple Date Formats

Challenge: Match dates in three formats: YYYY-MM-DD, MM/DD/YYYY, and DD.MM.YYYY.

Sample input: "Dates: 2026-02-19, 02/19/2026, 19.02.2026, and not 13/32/2026"

Expected: all three valid dates match. The format-only pattern also matches 13/32/2026; the note below explains why.

Solution:

(?:\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}|\d{2}\.\d{2}\.\d{4})

Breaking it down:

  \d{4}-\d{2}-\d{2}     YYYY-MM-DD
  |                      or
  \d{2}/\d{2}/\d{4}     MM/DD/YYYY
  |                      or
  \d{2}\.\d{2}\.\d{4}   DD.MM.YYYY

Note: this checks format, not calendar validity. 13/32/2026
matches the shape of MM/DD/YYYY even though it's nonsense. Rejecting
it requires the octet-style range checking from Exercise 2
applied to month and day ranges.

Test it:

echo "2026-02-19, 02/19/2026, 19.02.2026" | grep -oP '(?:\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}|\d{2}\.\d{2}\.\d{4})'
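
Here is one way the range checking mentioned in the note could look, sketched in Python with the standard re module. The MONTH and DAY fragments, the added word boundaries, and the name DATE are all assumptions of this sketch, not part of the solution above.

```python
import re

MONTH = r"(?:0[1-9]|1[0-2])"        # 01 through 12
DAY = r"(?:0[1-9]|[12]\d|3[01])"    # 01 through 31
YEAR = r"\d{4}"

DATE = re.compile(
    rf"\b(?:{YEAR}-{MONTH}-{DAY}"    # YYYY-MM-DD
    rf"|{MONTH}/{DAY}/{YEAR}"        # MM/DD/YYYY
    rf"|{DAY}\.{MONTH}\.{YEAR})\b"   # DD.MM.YYYY
)

sample = "Dates: 2026-02-19, 02/19/2026, 19.02.2026, and not 13/32/2026"
print(DATE.findall(sample))
```

Range checks still let 02/30/2026 through. Ruling that out takes real date parsing rather than a bigger pattern.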

EXERCISE 6: Password Strength Validator

Challenge: Match passwords that have ALL of these: at least 8 characters, one uppercase letter, one lowercase letter, one digit, and one special character from !@#$%^&*.

Sample input:
  "password"       (fail: no upper, digit, special)
  "Password1"      (fail: no special)
  "P@ssw0rd"       (pass: has everything)
  "Sh0rt!"         (fail: only 6 characters)

Solution:

^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$

Breaking it down:

  ^                  start of string
  (?=.*[A-Z])        lookahead: somewhere there's an uppercase letter
  (?=.*[a-z])        lookahead: somewhere there's a lowercase letter
  (?=.*\d)           lookahead: somewhere there's a digit
  (?=.*[!@#$%^&*])   lookahead: somewhere there's a special char
  .{8,}              main match: any 8 or more characters
  $                  end of string

Each lookahead checks its condition independently from the start.
They don't consume characters, so they all fire from position 0.

Test it:

echo "P@ssw0rd" | grep -P '^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$'
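
Because each lookahead is an independent check from position 0, you can also run them one at a time to report which rule failed. A sketch in Python under that idea; the rule labels and the helper names is_strong and failed_rules are made up, and the separate length test stands in for .{8,}.

```python
import re

# Each entry is one lookahead from the solution; the labels are invented.
LOOKAHEADS = {
    "uppercase": r"(?=.*[A-Z])",
    "lowercase": r"(?=.*[a-z])",
    "digit": r"(?=.*\d)",
    "special": r"(?=.*[!@#$%^&*])",
}

# Joining the pieces rebuilds the full pattern from the solution.
STRONG = re.compile("^" + "".join(LOOKAHEADS.values()) + r".{8,}$")

def is_strong(password):
    return STRONG.match(password) is not None

def failed_rules(password):
    """Run each lookahead alone at position 0 and list the ones that fail."""
    failed = [name for name, rule in LOOKAHEADS.items()
              if re.match(rule, password) is None]
    if len(password) < 8:
        failed.append("length")
    return failed

for candidate in ["password", "Password1", "P@ssw0rd", "Sh0rt!"]:
    print(candidate, is_strong(candidate), failed_rules(candidate))
```

The per-rule output lines up with the failure reasons listed next to the sample inputs.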

EXERCISE 7: Extract URLs

Challenge: Match URLs from mixed text, including those with query strings and fragments.

Sample input: "Visit https://example.com/path?key=val&foo=bar#section or
               http://test.org/page and not just random.text"

Expected: https://example.com/path?key=val&foo=bar#section and
          http://test.org/page

Solution:

https?://[a-zA-Z0-9][\w.-]*(?:/[^\s]*)?

Breaking it down:

  https?://                 http:// or https://
  [a-zA-Z0-9]              domain starts with alphanumeric
  [\w.-]*                  rest of domain (word characters, dots, hyphens)
  (?:/[^\s]*)?             optional path: slash then anything non-space

Test it:

echo "Visit https://example.com/path?key=val&foo=bar#section or http://test.org/page" | grep -oP 'https?://[a-zA-Z0-9][\w.-]*(?:/[^\s]*)?'
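
In running prose a URL often sits right before a period or comma, and [^\s]* will happily swallow it. The Python sketch below applies the same pattern and then trims trailing punctuation. That trimming step, the set of characters trimmed, and the name find_urls are assumptions added for illustration; they would also strip a trailing ? that genuinely belongs to a URL.

```python
import re

URL = re.compile(r"https?://[a-zA-Z0-9][\w.-]*(?:/[^\s]*)?")

def find_urls(text):
    """Return URLs found in text, minus trailing sentence punctuation."""
    # rstrip is an extra step assumed here; it is not part of the pattern.
    return [m.group(0).rstrip(".,;:!?") for m in URL.finditer(text)]

sample = ("Visit https://example.com/path?key=val&foo=bar#section or "
          "http://test.org/page and not just random.text")
print(find_urls(sample))
print(find_urls("See http://test.org/page."))
```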

EXERCISE 8: Parse Log Lines Into Named Groups

Challenge: Parse Apache/Nginx-style log lines into named groups for IP address, date, HTTP method, path, status code, and response size.

Sample input:
  192.168.1.1 - - [19/Feb/2026:10:30:22 +0000] "GET /index.html HTTP/1.1" 200 1234

Expected: each component captured by name.

Solution:

(?P<ip>\S+) - - \[(?P<date>[^\]]+)\] "(?P<method>\w+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)

Breaking it down (each "\ " is an escaped space, the form you would
use under /x; the one-line solution uses plain spaces):

  (?P<ip>\S+)            non-whitespace block for IP
  \ -\ -\                literal " - - "
  \[(?P<date>[^\]]+)\]   date in square brackets
  \ "(?P<method>\w+)     space, quote, HTTP method (GET, POST, etc.)
  \ (?P<path>\S+)        the request path
  \ \S+"                 the HTTP version (not captured)
  \ (?P<status>\d{3})    three-digit status code
  \ (?P<size>\d+)        response size in bytes

Test it:

echo '192.168.1.1 - - [19/Feb/2026:10:30:22 +0000] "GET /index.html HTTP/1.1" 200 1234' | perl -ne 'if (/(?P<ip>\S+) - - \[(?P<date>[^\]]+)\] "(?P<method>\w+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)/) { print "IP: $+{ip}\nDate: $+{date}\nMethod: $+{method}\nPath: $+{path}\nStatus: $+{status}\nSize: $+{size}\n" }'
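
A pattern this long is a good candidate for verbose mode. The sketch below writes it out in Python, assuming re.VERBOSE as the counterpart of /x, with every literal space escaped so it survives the whitespace stripping. LOG_LINE and parse_log_line are hypothetical names.

```python
import re

LOG_LINE = re.compile(r"""
    (?P<ip>\S+)              # client address
    \ -\ -\                  # literal space, dash, space, dash, space
    \[(?P<date>[^\]]+)\]     # timestamp inside square brackets
    \ "(?P<method>\w+)       # space, opening quote, HTTP method
    \ (?P<path>\S+)          # request path
    \ \S+"                   # HTTP version and closing quote, not captured
    \ (?P<status>\d{3})      # three-digit status code
    \ (?P<size>\d+)          # response size in bytes
""", re.VERBOSE)

def parse_log_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = LOG_LINE.search(line)
    return m.groupdict() if m else None

line = '192.168.1.1 - - [19/Feb/2026:10:30:22 +0000] "GET /index.html HTTP/1.1" 200 1234'
print(parse_log_line(line))
```

All fields come back as strings; convert status and size to integers afterward if you need to do arithmetic on them.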

EXERCISE 9: Non-Self-Closing HTML Tags

Challenge: Match opening HTML tags but NOT self-closing ones. Match <div> and <p> but not <br/> or <img />.

Sample input: "<div> <br/> <p class='x'> <img src='pic.jpg' /> <span>"

Expected: <div>, <p class='x'>, <span>

Solution:

<[a-zA-Z][\w]*(?:\s[^>]*?)?(?<!/)>

Breaking it down:

  <                    opening angle bracket
  [a-zA-Z][\w]*        tag name starting with a letter
  (?:\s[^>]*?)?        optional attributes (whitespace then stuff)
  (?<!/)               negative lookbehind: NOT preceded by /
  >                    closing angle bracket

The lookbehind checks that the character just before > is not a
slash. Self-closing tags like <br/> have a / right before >.

Test it:

echo "<div> <br/> <p class='x'> <img src='pic.jpg' /> <span>" | grep -oP '<[a-zA-Z][\w]*(?:\s[^>]*?)?(?<!/)>'
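
To see exactly what the lookbehind contributes, compare the pattern with and without it. This is a Python sketch using the standard re module, which supports fixed-width lookbehind like this one; OPEN_TAG and ANY_TAG are names invented here.

```python
import re

OPEN_TAG = re.compile(r"<[a-zA-Z][\w]*(?:\s[^>]*?)?(?<!/)>")   # the solution
ANY_TAG = re.compile(r"<[a-zA-Z][\w]*(?:\s[^>]*?)?>")          # lookbehind removed

html = "<div> <br/> <p class='x'> <img src='pic.jpg' /> <span>"
print(OPEN_TAG.findall(html))
print(ANY_TAG.findall(html))
```

In this sample the lookbehind is what rejects the img tag. The br tag fails even without it, because the pattern has no way to consume a slash that directly follows the tag name.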

EXERCISE 10: Extract Function Calls

Challenge: Match function calls from code, capturing the function name and its arguments separately.

Sample input:
  result = calculate(42, "hello")
  data = transform(x + y)
  nothing = my_func()

Expected: capture function name and arguments for each.

Solution:

(\w+)\(([^)]*)\)

Breaking it down:

  (\w+)        capture group 1: function name
  \(           literal opening paren
  ([^)]*)      capture group 2: everything inside parens (no nested)
  \)           literal closing paren

Test it:

echo 'result = calculate(42, "hello")' | perl -ne 'print "func: $1, args: $2\n" if /(\w+)\(([^)]*)\)/'

Limitation: this doesn't handle nested parentheses. For
func(a(b)), it matches func(a(b) and captures a(b as the arguments,
because [^)]* stops at the first closing paren. Handling arbitrary
nesting is where you need recursive patterns, which is beyond
what most people need day to day.
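
You can watch the nesting limitation happen. A short Python sketch, assuming the standard re module; find_calls is a made-up helper that returns (name, arguments) pairs.

```python
import re

CALL = re.compile(r"(\w+)\(([^)]*)\)")

def find_calls(code):
    """Return (function_name, raw_argument_text) tuples for each call found."""
    # With two capturing groups, findall returns a tuple per match.
    return CALL.findall(code)

print(find_calls('result = calculate(42, "hello")'))
print(find_calls("nothing = my_func()"))
print(find_calls("func(a(b))"))   # nested call: the arguments get cut short
```

For the nested input, the match ends at the first closing paren, so the captured arguments are a(b with the inner paren left open.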

EXERCISE 11: Match Balanced Quotes

Challenge: Match double-quoted strings that may contain escaped quotes inside them.

Sample input: He said "hello \"world\"" and "goodbye"

Expected: "hello \"world\"" and "goodbye"

Solution:

"(?:[^"\\]|\\.)*"

Breaking it down:

  "               opening quote
  (?:             non-capturing group:
    [^"\\]           any char that's NOT a quote or backslash
    |                or
    \\.              a backslash followed by any character (escape)
  )*              zero or more times
  "               closing quote

The alternation handles two cases: normal characters that aren't
special, or escaped characters (backslash plus whatever). This
correctly skips over \" without treating it as a closing quote.

Test it:

echo 'He said "hello \"world\"" and "goodbye"' | grep -oP '"(?:[^"\\]|\\.)*"'
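
It helps to see what goes wrong without the escape branch. This Python sketch runs the solution next to a simpler pattern that ignores backslashes. NAIVE is a name invented for the contrast and is not something to use in practice.

```python
import re

QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')   # the solution above
NAIVE = re.compile(r'"[^"]*"')              # ignores escapes, for contrast

# Raw string, so each backslash reaches the regex engine as typed.
text = r'He said "hello \"world\"" and "goodbye"'
print(QUOTED.findall(text))
print(NAIVE.findall(text))
```

The naive pattern treats the first escaped quote as the end of the string and breaks the sentence into three wrong pieces.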

EXERCISE 12: DEFINE Block Timestamp Parser

Challenge: Build a pattern using (?(DEFINE)...) that parses a full ISO 8601 timestamp with timezone.

Sample input: "2026-02-19T22:30:45+05:00"

Expected: the full timestamp matches, with year, month, day, hour, minute, second, and timezone each checked by its own named sub-pattern.

Solution:

(?(DEFINE)
  (?P<year>\d{4})
  (?P<month>(?:0[1-9]|1[0-2]))
  (?P<day>(?:0[1-9]|[12]\d|3[01]))
  (?P<hour>(?:[01]\d|2[0-3]))
  (?P<min>[0-5]\d)
  (?P<sec>[0-5]\d)
  (?P<tz>(?:Z|[+-]\d{2}:\d{2}))
)
(?P>year)-(?P>month)-(?P>day)T(?P>hour):(?P>min):(?P>sec)(?P>tz)

Breaking it down:

  The DEFINE block creates named patterns that don't match on their
  own. They're like function definitions.

  (?P<year>\d{4})      4 digits for year
  (?P<month>...)       01 through 12
  (?P<day>...)         01 through 31
  (?P<hour>...)        00 through 23
  (?P<min>...)         00 through 59
  (?P<sec>...)         00 through 59
  (?P<tz>...)          Z or +/-HH:MM

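  Then (?P>year) calls each pattern by name. A call reuses the
  pattern but does not keep a capture: $+{year} is empty afterward.
  To extract a field, wrap the call in its own group, for example
  (?<y>(?P>year)).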

Test it (as a one-liner, you'd collapse the whitespace or use /x):

echo "2026-02-19T22:30:45+05:00" | perl -ne 'print "matched: $&\n" if /(?(DEFINE)(?P<year>\d{4})(?P<month>(?:0[1-9]|1[0-2]))(?P<day>(?:0[1-9]|[12]\d|3[01]))(?P<hour>(?:[01]\d|2[0-3]))(?P<min>[0-5]\d)(?P<sec>[0-5]\d)(?P<tz>(?:Z|[+-]\d{2}:\d{2})))(?P>year)-(?P>month)-(?P>day)T(?P>hour):(?P>min):(?P>sec)(?P>tz)/'
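
Python's built-in re module has no (?(DEFINE)...) block and no (?P>name) calls, so the sketch below swaps in a different technique: it keeps the sub-patterns in a dictionary and assembles the final pattern with string formatting. The reuse idea is the same, but this is not a DEFINE block. PARTS, TIMESTAMP, and parse_timestamp are hypothetical names.

```python
import re

# These fragments play the role of the DEFINE block's definitions.
PARTS = {
    "year": r"\d{4}",
    "month": r"(?:0[1-9]|1[0-2])",
    "day": r"(?:0[1-9]|[12]\d|3[01])",
    "hour": r"(?:[01]\d|2[0-3])",
    "min": r"[0-5]\d",
    "sec": r"[0-5]\d",
    "tz": r"(?:Z|[+-]\d{2}:\d{2})",
}

# Wrap each fragment in a named group, then drop it into the layout.
named = {name: f"(?P<{name}>{body})" for name, body in PARTS.items()}
TIMESTAMP = re.compile(
    "{year}-{month}-{day}T{hour}:{min}:{sec}{tz}".format(**named)
)

def parse_timestamp(text):
    """Return the timestamp fields as a dict, or None if text is not a match."""
    m = TIMESTAMP.fullmatch(text)
    return m.groupdict() if m else None

print(parse_timestamp("2026-02-19T22:30:45+05:00"))
```

Since every fragment ends up inside a real named group, each field is available by name after the match.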

BONUS EXERCISE: The Kitchen Sink

Challenge: Write one pattern that matches a full URL with named capture groups for protocol, domain, port (optional), path (optional), query string (optional), and fragment (optional).

Sample: https://www.example.com:8080/path/to/page?name=mike&age=42#section

Solution:

(?P<protocol>https?)://(?P<domain>[a-zA-Z0-9][\w.-]*)(?::(?P<port>\d+))?(?P<path>/[^?#\s]*)?(?:\?(?P<query>[^#\s]*))?(?:#(?P<fragment>\S*))?

This is a monster. But break it into named groups and it reads
like a spec. That's the power of everything you learned tonight.

Test it (the slashes are written \/ here because / is the Perl match delimiter):

echo "https://www.example.com:8080/path/to/page?name=mike&age=42#section" | perl -ne 'if (/(?P<protocol>https?):\/\/(?P<domain>[a-zA-Z0-9][\w.-]*)(?::(?P<port>\d+))?(?P<path>\/[^?#\s]*)?(?:\?(?P<query>[^#\s]*))?(?:#(?P<fragment>\S*))?/) { print "Protocol: $+{protocol}\nDomain: $+{domain}\nPort: $+{port}\nPath: $+{path}\nQuery: $+{query}\nFragment: $+{fragment}\n" }'
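
The optional groups are the interesting part: what do you get when a piece is missing? A Python sketch, assuming the standard re module, where a group that took no part in the match comes back as None. split_url is an invented helper name.

```python
import re

# The solution pattern, split across lines only for readability.
URL = re.compile(
    r"(?P<protocol>https?)://(?P<domain>[a-zA-Z0-9][\w.-]*)"
    r"(?::(?P<port>\d+))?(?P<path>/[^?#\s]*)?"
    r"(?:\?(?P<query>[^#\s]*))?(?:#(?P<fragment>\S*))?"
)

def split_url(url):
    """Return the URL's named parts as a dict, or None if it does not match."""
    m = URL.match(url)
    return m.groupdict() if m else None

print(split_url("https://www.example.com:8080/path/to/page?name=mike&age=42#section"))
print(split_url("http://test.org"))   # optional parts come back as None
```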

WHERE TO GO FROM HERE

Tonight gave you the tools. Now you need the reps.

1. regex101.com is your best friend. Paste a pattern, paste test
   strings, and it breaks down every piece of the match in real
   time. It supports PCRE, Python, JavaScript, and Go flavors. Use
   the PCRE2 flavor for everything you learned tonight.

2. Practice on real data. Take your actual server logs, config files,
   code, CSV exports, whatever you work with daily. Write patterns
   against real text, not contrived examples.

3. Read other people's regex. When you encounter a pattern in a
   codebase, stop and parse it. Use regex101.com if you need to.
   Every pattern you decode makes you faster at writing your own.

4. Keep the PCRE cheat sheet from tonight's materials nearby. You
   won't memorize all the syntax at first. Reference it until you
   don't need to anymore.

5. Challenge yourself to use the /x flag on every pattern longer than
   30 characters. If you can't explain each piece in a comment,
   you don't fully understand it yet. The comments are for you.

Regular expressions are one of those skills that pay dividends for your entire career. Every language supports them. Every system has text to parse. And now you know how to parse it.