Regex Quick Guide

Sara Cemal
Jul 23, 2021

Your pocket guide to learning Regular Expressions

If the photo above looks like some sort of weird hieroglyphics, then you are in the right place! Regular expressions look quite strange to the untrained eye, but with a little bit of practice, it’s easy to figure out what a pattern is asking for.

What are Regular Expressions?

A regular expression, or regex for short, is a sequence of characters that specifies a search pattern. These patterns are predominantly used for matching, locating, and managing text. As I said above, regex can look wild if it’s a foreign concept to you. But if you work with a lot of text, learning regular expressions and how to use them to your advantage can be a huge time saver. Searching for specific text is also useful for debugging and for parsing large amounts of data, and from the terminal, regular expressions can find text within a large file. So… the possibilities are endless!

The coolest and most useful part of learning regular expressions is that the syntax isn’t specific to just one language: once you learn it, you can use it in most programming languages, including JavaScript, Java, Perl, R, C#, and C++ (just to name a few).

There are many rules when it comes to regular expressions, but I will start off with the basics.

Anchors

Anchors are ^ and $

^hello          matches any string that starts with hello
world$          matches any string that ends with world
^hello world$   exact string match (starts and ends with hello world)
dog             matches any string that has the text dog in it
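
To see the anchors in action, here is a minimal sketch. The article doesn't tie itself to one language, so this assumes Python and its built-in re module; the sample strings and the names in `patterns` are made up for illustration.

```python
import re

# Sample strings invented for this sketch.
samples = ["hello world", "hello there", "big world",
           "say hello world again", "hotdog"]

patterns = {
    "starts_with_hello": r"^hello",
    "ends_with_world": r"world$",
    "exact_hello_world": r"^hello world$",
    "contains_dog": r"dog",
}

# re.search scans the whole string, so it is the anchors (not the function)
# that decide where a match is allowed to begin or end.
results = {
    name: [s for s in samples if re.search(pattern, s)]
    for name, pattern in patterns.items()
}

for name, matched in results.items():
    print(name, matched)
```

Notice that "say hello world again" contains hello world but passes none of the anchored patterns, because the text is not at the start or end of the string.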

Quantifiers

Quantifiers are *, +, ?, {}

xyz*          matches a string that has xy followed by zero or more z
xyz+          matches a string that has xy followed by one or more z
xyz?          matches a string that has xy followed by zero or one z
xyz{2}        matches a string that has xy followed by exactly 2 z
xyz{2,}       matches a string that has xy followed by 2 or more z
xyz{2,5}      matches a string that has xy followed by 2 up to 5 z
x(yz)*        matches a string that has x followed by zero or more copies of the sequence yz
x(yz){2,5}    matches a string that has x followed by 2 up to 5 copies of the sequence yz
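
The quantifiers above can be checked with a short sketch, again assuming Python's re module. It uses re.fullmatch, which requires the entire string to fit the pattern, so the upper and lower limits are easy to see; the helper name `fully_matches` and the test strings are hypothetical.

```python
import re

def fully_matches(pattern, text):
    """Return True only when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, test string) pairs; the test strings are invented for this sketch.
cases = [
    ("xyz*", "xy"),                # zero z is allowed
    ("xyz+", "xy"),                # at least one z is required
    ("xyz+", "xyzzz"),
    ("xyz?", "xyzz"),              # at most one z is allowed
    ("xyz{2}", "xyzz"),
    ("xyz{2,}", "xyz"),            # fewer than 2 z
    ("xyz{2,5}", "xy" + "z" * 6),  # more than 5 z
    ("x(yz)*", "xyzyz"),           # the group repeats as a unit
    ("x(yz){2,5}", "xyz"),         # only one copy of yz
]

outcome = {(pattern, text): fully_matches(pattern, text) for pattern, text in cases}

for (pattern, text), matched in outcome.items():
    print(f"{pattern:<12} {text:<10} {matched}")
```

With a plain search instead of a full match, a pattern like xyz{2} would still be found inside a longer run such as xyzzz, since the first two z are enough to satisfy it.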

OR

OR operators can be | and []

x(y|z)     matches a string that has x followed by y or z (and captures y or z)
x[yz]      same as previous, but without capturing y or z
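
The capturing difference between the two forms is easier to see in code. This is a small sketch assuming Python's re module; the input string "axz" and the variable names are made up.

```python
import re

# Both patterns find the same text in the sample string.
alternation = re.search(r"x(y|z)", "axz")
char_set = re.search(r"x[yz]", "axz")

# group(0) is the whole match; groups() lists what the parentheses captured.
alternation_match = alternation.group(0)
alternation_captures = alternation.groups()
char_set_match = char_set.group(0)
char_set_captures = char_set.groups()

print(alternation_match, alternation_captures)
print(char_set_match, char_set_captures)
```

Both searches match xz, but only the parenthesized version hands back the z on its own as a captured group.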

Character classes

Character classes can be \d \w \s or .

\d         matches a single character that is a digit
\w         matches a word character (alphanumeric character plus underscore)
\s         matches a whitespace character (includes tabs and line breaks)
.          matches any character (except a line break, by default)

Notice that \d, \w, and \s all use lowercase letters. There is a difference between lowercase and uppercase: the uppercase version does the inverse. For example, \D will match a single character that is NOT a digit, \W a character that is not a word character, and \S a character that is not whitespace. Something to keep in mind!
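
Here is a quick sketch of the character classes and one uppercase inverse, assuming Python's re module. The sample text is invented; re.findall returns every match as a list, and the joins are only there to make the output easier to read.

```python
import re

# Made-up sample text containing a space, a tab, digits, and an underscore.
text = "Room 42,\tfloor_3!"

digits = re.findall(r"\d", text)                # every digit
word_chars = "".join(re.findall(r"\w", text))   # letters, digits, underscore
whitespace = re.findall(r"\s", text)            # the space and the tab
non_digits = "".join(re.findall(r"\D", text))   # inverse of \d

# The dot matches any character except a line break by default.
dot_matches = re.findall(r".", "a\nb")

print(digits, word_chars, whitespace, non_digits, dot_matches)
```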

Flags

Flags are very important when it comes to constructing a regular expression!

  • g (global) does not return after the first match; each subsequent search restarts from the end of the previous match
  • m (multi-line) when enabled, ^ and $ match the start and end of each line instead of the whole string
  • i (insensitive) makes the whole expression case-insensitive (for instance, /aBc/i would match AbC)
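
The flags above are written the way JavaScript-style regex literals write them (/aBc/i). As a rough sketch of the same ideas in Python's re module, which is my assumption here and not something the article prescribes: Python has no g flag, so re.search (first match only) versus re.findall (every match) stands in for it, while re.MULTILINE and re.IGNORECASE play the roles of m and i. The sample text is made up.

```python
import re

# Three lines of invented sample text.
text = "cat hat\nCat nap\ncat"

# "g": re.search stops at the first match, re.findall keeps going.
first_match_start = re.search(r"cat", text).start()
every_match = re.findall(r"cat", text)

# "i": ignore case.
ignore_case = re.findall(r"cat", text, re.IGNORECASE)

# "m": without it, ^ only matches at the start of the whole string.
start_of_string = re.findall(r"^cat", text)
start_of_lines = re.findall(r"^cat", text, re.MULTILINE)

# Flags can be combined with |.
both_flags = re.findall(r"^cat", text, re.MULTILINE | re.IGNORECASE)

# The article's own example: aBc with the i flag matches AbC.
abc_matches = re.search(r"aBc", "AbC", re.IGNORECASE) is not None

print(every_match, ignore_case, start_of_string, start_of_lines, both_flags)
```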

Sara Cemal

Flatiron School alumni with a sociology and neuroscience background.