Regular Expression in JavaScript

In this section of the tutorial, we will discuss Regular Expressions in JavaScript, which are used to match character combinations in strings, and look at the characters, meta-characters, and flags they support, with examples for in-depth understanding.

What is Regular Expression in JavaScript?

In JavaScript, an object that describes a textual pattern is called a Regular Expression. Regular Expressions are represented by the JavaScript RegExp class.

  • Regular Expressions are used to perform pattern-matching and search/replace functions on text using the String and RegExp methods.
  • The grammar used in JavaScript Regular Expression is similar to the grammar used by other programming languages.
  • To use the RegExp API, we first need to understand how to describe patterns of text using regular expression grammar.
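As a rough illustration of the first point above, the following sketch uses one RegExp method (test()) and two String methods (search() and replace()). The sample text and the variable names are made up for this illustration.

```javascript
// Hypothetical sample text, used only for this illustration.
const text = "JavaScript is fun. Learning JavaScript takes practice.";

// A pattern that matches the word "JavaScript".
const wordPattern = /JavaScript/;

// RegExp method: test() returns true if the pattern matches anywhere in the string.
const found = wordPattern.test(text);

// String method: search() returns the index of the first match, or -1 if there is none.
const position = text.search(wordPattern);

// String method: replace() with a pattern that has no g flag replaces only the first match.
const replaced = text.replace(wordPattern, "JS");

console.log(found);    // true
console.log(position); // 0
console.log(replaced); // JS is fun. Learning JavaScript takes practice.
```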

How to define Regular Expression in JavaScript?

In JavaScript, Regular Expressions are represented by RegExp objects. Such objects can be created using the RegExp() constructor or, more commonly, using the literal syntax.

We will take an example to understand how we can use a literal to define a regular expression in JavaScript.

Example:

<html>
<head>
<title>JavaScript</title>
<script type="text/javascript">

let pattern = /s$/; // literal syntax

let samePattern = new RegExp("s$"); // constructor syntax, equivalent pattern

</script>
</head>
<body>
</body>
</html>
  • Line 6 uses the literal syntax to create a new RegExp object and assigns it to the variable pattern.
  • This RegExp object matches any string that ends with the letter “s”.
  • Line 8 uses the RegExp() constructor instead to create an equivalent RegExp object.
  • In the pattern, s is a literal character that matches itself, and $ is a meta-character that matches the end of a string.
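To see the pattern in action, here is a minimal sketch that tests a few strings against it. The word list and the variable names are assumptions chosen for this illustration.

```javascript
// Both forms describe the same pattern: the letter "s" at the end of a string.
const literalPattern = /s$/;
const constructedPattern = new RegExp("s$");

// Hypothetical sample words for this illustration.
const words = ["cats", "dog", "grass", "sun"];

// Keep only the words that end with "s".
const endingInS = words.filter((word) => literalPattern.test(word));

console.log(endingInS);                      // [ 'cats', 'grass' ]
console.log(constructedPattern.test("sun")); // false, the "s" is not at the end
console.log(literalPattern.source === constructedPattern.source); // true
```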

We will discuss more about the Characters and Meta-characters in the coming section.

Character and Meta-characters used in JavaScript

A regular expression consists of different characters to match the strings.

In the following section, we will discuss some characters and meta-characters used in JavaScript Regular Expression.

Literal Characters

Literal Characters include all alphabetic characters and digits that match themselves literally in regular expressions.

The regular expression syntax also supports certain nonalphabetic characters through escape sequences, which we covered in detail in the JavaScript Strings tutorial.

Below is a table that lists such characters and their relevant matches in a string.

Character      Matches
Alphanumeric   Itself
\0             NUL character (\u0000)
\t             Tab
\n             Newline
\v             Vertical tab
\f             Form feed
\r             Carriage return
\xnn           Latin character specified by the hexadecimal number nn
\uxxxx         Unicode character specified by the hexadecimal number xxxx
\u{n}          Unicode character specified by the codepoint n, where n is one to six hexadecimal digits (only used with the u flag)
\cX            Control character ^X (for example, \cJ is equivalent to \n)

These characters can be used in a regular expression using the escape sequences.

However, if we are using the RegExp() constructor, the backslashes in the regular expressions should be doubled since strings also use backslashes as an escape character.
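The difference between the two syntaxes can be sketched as follows. The sample string and the variable names are assumptions made for this illustration.

```javascript
// Literal syntax: a single backslash is enough.
const digitsLiteral = /\d+/;

// Constructor syntax: the backslash must be doubled inside the string.
const digitsConstructed = new RegExp("\\d+");

// Without doubling, the string "\d+" is just "d+", so the pattern
// becomes "one or more letters d" instead of "one or more digits".
const mistaken = new RegExp("\d+");

const sample = "room 42"; // hypothetical sample text

console.log(digitsLiteral.test(sample));     // true
console.log(digitsConstructed.test(sample)); // true
console.log(mistaken.source);                // d+
console.log(mistaken.test(sample));          // false, "room 42" contains no letter "d"
```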

Character Classes

A Character Class is formed by placing individual literal characters within square brackets. It matches any one of the characters contained within it.

JavaScript regular expression syntax includes special characters and escape sequences for representing the common classes.

Below is a table that lists these characters and their relevant matches in a string.

Character   Matches
[…]         Any one character within the brackets
[^…]        Any one character not within the brackets
.           Any character except newline or another Unicode line terminator
\w          Any ASCII word character
\W          Any character that is not an ASCII word character
\s          Any Unicode whitespace character
\S          Any character that is not Unicode whitespace
\d          Any ASCII digit
\D          Any character except an ASCII digit
[\b]        Literal backspace

Also, the special character-class escapes can be used within square brackets.
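A few of the classes from the table can be tried out with the sketch below. The sample string and the variable names are assumptions for this illustration. The g flag used here (explained in the Flags section) makes match() return every match.

```javascript
const sample = "Room 42, Floor 7"; // hypothetical sample text

// [aeiou] matches any one lowercase vowel.
const firstVowel = sample.match(/[aeiou]/)[0];

// [^aeiou] matches any one character that is not a lowercase vowel.
const firstNonVowel = sample.match(/[^aeiou]/)[0];

// \d matches one ASCII digit.
const digits = sample.match(/\d/g);

// Class escapes can be used inside brackets: [\s,] matches whitespace or a comma.
const parts = sample.split(/[\s,]+/);

console.log(firstVowel);    // o
console.log(firstNonVowel); // R
console.log(digits);        // [ '4', '2', '7' ]
console.log(parts);         // [ 'Room', '42', 'Floor', '7' ]
```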

Repetition

Repetition specifies how many times an element of a regular expression may be repeated.

It lets us describe more complex patterns, for example, a number with any number of digits, or a string of three letters followed by an optional digit.

Below is a table that lists such repetition characters and their relevant matches in a string.

Character   Meaning
{n,m}       Match previous item n times at least and m times at most
{n,}        Match previous item n or more times
{n}         Match previous item exactly n times
?           Match zero or one occurrences of previous item
+           Match one or more occurrences of previous item
*           Match zero or more occurrences of previous item

We must keep in mind that the * and ? repetition characters can match zero occurrences of whatever precedes them, so a pattern that relies on them is allowed to match nothing at all.
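The sketch below tries out a few of these repetition characters, including the "three letters followed by an optional digit" case and the zero-occurrence behavior of *. The variable names and sample strings are assumptions for this illustration.

```javascript
// {n,m}: between two and four digits. Repetition is greedy, so it takes four.
const twoToFourDigits = "12345".match(/\d{2,4}/)[0];

// {n} and ?: exactly three letters followed by an optional digit.
const threeLettersOptionalDigit = /^[a-zA-Z]{3}\d?$/;
const plainLetters = threeLettersOptionalDigit.test("abc");
const withOneDigit = threeLettersOptionalDigit.test("abc1");
const withTwoDigits = threeLettersOptionalDigit.test("abc12");

// *: zero or more, so it matches an empty string where there is no "a".
const starMatch = "bbb".match(/a*/)[0];

// +: one or more, so there is no match at all where there is no "a".
const plusMatch = "bbb".match(/a+/);

console.log(twoToFourDigits); // 1234
console.log(plainLetters);    // true
console.log(withOneDigit);    // true
console.log(withTwoDigits);   // false
console.log(starMatch === ""); // true
console.log(plusMatch);       // null
```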

Flags

Flags are used in the regular expression to alter its matching behavior. There can be one or more flags associated with a regular expression.

  • This tutorial covers six flags, each represented using a single letter. (Newer versions of JavaScript define additional flags, such as d and v.)
  • Flags are specified after the second / character of a regular expression literal.
  • Flags are specified as a string passed as the second argument to the RegExp() constructor.
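Both ways of specifying flags can be sketched as follows, using the g and i flags described below. The pattern and the variable names are assumptions for this illustration.

```javascript
// Literal syntax: flags follow the closing slash.
const literalWithFlags = /hello/gi;

// Constructor syntax: flags are passed as the second argument.
const constructedWithFlags = new RegExp("hello", "gi");

// The flags property reports which flags are set.
console.log(literalWithFlags.flags);          // gi
console.log(constructedWithFlags.flags);      // gi
console.log(constructedWithFlags.global);     // true
console.log(constructedWithFlags.ignoreCase); // true

// The i flag on its own: case-insensitive matching.
const caseInsensitive = /hello/i.test("HELLO world");
const caseSensitive = /hello/.test("HELLO world");
console.log(caseInsensitive); // true
console.log(caseSensitive);   // false
```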

In the following section, we are about to discuss the supported flags in the regular expression and their meanings.

The g flag
  • This flag indicates that the regular expression is global, which means that it can be used to find all the matches within a string instead of just the first match.
  • The g flag doesn’t alter the way pattern matching is done, but it does alter the behavior of the String match() method and the RegExp exec() method.
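The effect of the g flag on match() and exec() can be sketched as follows. The sample string and the variable names are assumptions for this illustration.

```javascript
const sample = "one fish, two fish, red fish"; // hypothetical sample text

// Without g, match() returns only the first match, with details such as its index.
const first = sample.match(/fish/);

// With g, match() returns an array containing every match.
const all = sample.match(/fish/g);

// With g, exec() continues from the lastIndex property on each call,
// so calling it in a loop visits every match in turn.
const globalPattern = /fish/g;
const indexes = [];
let result;
while ((result = globalPattern.exec(sample)) !== null) {
  indexes.push(result.index);
}

console.log(first[0], first.index); // fish 4
console.log(all);                   // [ 'fish', 'fish', 'fish' ]
console.log(indexes);               // [ 4, 14, 24 ]
```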
The i flag
  • This flag specifies that pattern matching should be case-insensitive.
The m flag
  • This flag specifies that matching should be done in multiline mode, that is, the RegExp will be used with multiline strings.
  • In this mode, the ^ and $ anchors match the beginning and end of the string and also the beginning and end of each individual line within it.
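A minimal sketch of multiline mode is shown below. The sample string and the variable names are assumptions for this illustration, and the g flag is added so that match() returns every match.

```javascript
const lines = "first line\nsecond line\nthird line"; // hypothetical sample text

// Without m, ^ matches only at the start of the whole string.
const withoutM = lines.match(/^\w+/g);

// With m, ^ also matches at the start of each line.
const withM = lines.match(/^\w+/gm);

// With m, $ also matches at the end of each line.
const lineEnds = lines.match(/\w+$/gm);

console.log(withoutM); // [ 'first' ]
console.log(withM);    // [ 'first', 'second', 'third' ]
console.log(lineEnds); // [ 'line', 'line', 'line' ]
```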
The s flag
  • Like the m flag, the s flag is intended to be used when working with text that includes newlines.
  • Usually, the “.” in a regular expression matches any character except a line terminator. When the s flag is used, “.” matches any character, including line terminators.
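The following sketch shows how “.” behaves with and without the s flag. The sample string and the variable names are assumptions for this illustration.

```javascript
const twoLines = "start\nend"; // hypothetical sample text containing a newline

// Without s, "." does not match the newline, so the pattern fails.
const withoutS = /start.end/.test(twoLines);

// With s, "." matches any character, including the newline.
const withS = /start.end/s.test(twoLines);

console.log(withoutS); // false
console.log(withS);    // true
```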
The u flag
  • The u flag stands for Unicode. It makes the regular expression match full Unicode codepoints instead of 16-bit values.
  • This flag makes the RegExp work seamlessly with text that contains emoji and other characters, such as some Chinese characters, that require more than 16 bits.
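As a small sketch of the difference, the example below matches a single emoji with and without the u flag. The choice of emoji (U+1F600) and the variable names are assumptions for this illustration.

```javascript
// U+1F600 (grinning face) is stored as two 16-bit values in a JavaScript string.
const emoji = "\u{1F600}";
console.log(emoji.length); // 2

// Without u, "." matches a single 16-bit value, so one "." cannot cover the emoji.
const withoutU = /^.$/.test(emoji);

// With u, "." matches a full Unicode codepoint.
const withU = /^.$/u.test(emoji);

// The \u{n} escape from the Literal Characters table only works with the u flag.
const byCodepoint = /\u{1F600}/u.test(emoji);

console.log(withoutU);    // false
console.log(withU);       // true
console.log(byCodepoint); // true
```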
The y flag
  • The y flag indicates that the regular expression is sticky, meaning that it should match at the beginning of a string or at the first character following the previous match.
  • This flag is useful with regular expressions that are used repeatedly to find all matches within a string.
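To make the sticky behavior concrete, the sketch below runs the same pattern three times with the y flag and three times with the g flag. The sample string and the variable names are assumptions for this illustration.

```javascript
const sample = "foofoo bar foo"; // hypothetical sample text

// Sticky: each test() must match exactly at lastIndex.
const stickyPattern = /foo/y;
const stickyResults = [];
const stickyPositions = [];
for (let i = 0; i < 3; i++) {
  stickyResults.push(stickyPattern.test(sample));
  stickyPositions.push(stickyPattern.lastIndex);
}

// Global: each test() searches forward from lastIndex instead.
const globalPattern = /foo/g;
const globalResults = [];
for (let i = 0; i < 3; i++) {
  globalResults.push(globalPattern.test(sample));
}

// The third sticky test fails because index 6 holds a space, not "foo",
// and a failed match resets lastIndex to 0.
console.log(stickyResults);   // [ true, true, false ]
console.log(stickyPositions); // [ 3, 6, 0 ]

// The third global test skips ahead and finds the final "foo".
console.log(globalResults);   // [ true, true, true ]
```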

All of these flags can be used with regular expressions in any combination and in any order.

Now that we have covered the basics of JavaScript, the next tutorial will put them into practice with some JavaScript problems, along with a brief use of HTML and CSS.