Python RegEx

RegEx is short for regular expression. A regular expression is a sequence of characters that defines a search pattern, and it is used to find, match, or replace the parts of a string that fit that pattern.

RegEx Module in Python

In Python, RegEx support is provided by the re module, which is part of the standard library. Import the re module to use RegEx in your code.

Example

import re

Use RegEx in Python

After importing the re module, you can use its functions to search a string for a pattern.

Example

Here is an example that searches a string for the phrase "code is":

import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."
# \b marks a word boundary, so "code is" must appear as whole words
match = re.search(r'\bcode is\b', text)
if match:
    print("Found")
else:
    print("Not Found")
Output
Found

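The \b at each end of the pattern above is a word boundary, so "code is" only matches as whole words. The following minimal sketch checks the same pattern against a few strings; the sample strings are made up for illustration.

```python
import re

# Same pattern as the example above: \b on both sides requires whole words
pattern = r"\bcode is\b"

texts = [
    "The code is 9091.",     # whole words: matches
    "The barcode is 9091.",  # "code" is part of "barcode": no boundary before "c"
    "The code isn't set.",   # "is" runs into "isn't": no boundary after "is"
]

# True if search() finds the pattern anywhere in the text, otherwise False
results = [re.search(pattern, t) is not None for t in texts]
print(results)  # [True, False, False]
```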
Example

The following example extracts the phrase "code is" together with the number that follows it:

import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."
# \d+ matches one or more digits after "code is "
match = re.search(r'\bcode is \d+\b', text)
if match:
    # group(0) is the entire matched text
    print(match.group(0))
Output
code is 9091
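To pull out just the number, you can wrap \d+ in parentheses to form a capturing group and read it with group(1). This is a sketch of one possible variation on the example above, not part of the original:

```python
import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."

# (\d+) is a capturing group: it records the digits separately from the full match
match = re.search(r"\bcode is (\d+)\b", text)
if match:
    print(match.group(0))  # the whole match: code is 9091
    print(match.group(1))  # only the captured digits: 9091
    code = int(match.group(1))  # convert the captured text to an integer
    print(code + 1)  # 9092
```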

RegEx Functions

The re module provides several functions to find the specified pattern in a string.

The compile() Function

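The compile(pattern, flags=0) function compiles a regular expression into a Pattern object. The Pattern object has its own methods, such as search(), findall(), split(), and sub(), which you can call repeatedly without passing the pattern again.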

Example

import re

text = "The code is 9091. The code is 9091."
pattern = "9091"
pattern_obj = re.compile(pattern)
print(pattern_obj.findall(text))
Output
['9091', '9091']
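A compiled Pattern object is most useful when the same pattern is applied many times, or when flags such as re.IGNORECASE should travel with the pattern. The sketch below is illustrative; the sample lines and the name code_pattern are made up:

```python
import re

# Compile once; re.IGNORECASE makes letter comparisons case-insensitive
code_pattern = re.compile(r"code is (\d+)", re.IGNORECASE)

lines = [
    "The code is 9091.",
    "CODE IS 1234",
    "No code here.",
]

codes = []
for line in lines:
    # Reuse the compiled object; its search() takes only the string
    match = code_pattern.search(line)
    if match:
        codes.append(match.group(1))

print(codes)  # ['9091', '1234']
```

The module-level functions also cache recently used patterns internally, so in small scripts compile() mainly helps readability and reuse rather than being required for speed.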

The findall() Function

The findall(pattern, string, flags=0) function returns a list of all non-overlapping matches of pattern in string.

Example

import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."
pattern = r"\d+"   # one or more consecutive digits
result = re.findall(pattern, text)
print(result)
Output
['0', '9', '9091']
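The shape of the list that findall() returns depends on whether the pattern contains capturing groups. A small sketch with a made-up key=value string:

```python
import re

text = "width=1920 height=1080 depth=32"

# No groups: findall returns each full match as a string
digits = re.findall(r"\d+", text)
print(digits)  # ['1920', '1080', '32']

# One group: findall returns only the text captured by that group
names = re.findall(r"(\w+)=", text)
print(names)  # ['width', 'height', 'depth']

# Two or more groups: findall returns one tuple per match
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('width', '1920'), ('height', '1080'), ('depth', '32')]
```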

The search() Function

The search(pattern, string, flags=0) function scans a string for the first location where the pattern matches. It returns a Match object if a match is found, or None if there is no match.

Example

import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."
pattern = r"\d+"
result = re.search(pattern, text)
print(result)
Output
<re.Match object; span=(33, 34), match='0'>
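The Match object returned by search() carries more than the printed summary. The sketch below reuses the same text to read the matched text and its position, and shows the None result that should be checked before calling methods on the result:

```python
import re

text = "It is a sequence of numbers from 0 to 9. The code is 9091."

# \d{4} matches exactly four digits in a row, so the first match is "9091"
match = re.search(r"\d{4}", text)
if match is not None:
    print(match.group())               # 9091
    print(match.start(), match.end())  # 53 57
    print(match.span())                # (53, 57)

# The text has no run of three uppercase letters, so search() returns None
missing = re.search(r"[A-Z]{3}", text)
print(missing)  # None
```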

The split() Function

The split(pattern, string, maxsplit=0, flags=0) function splits the source string at each match of the pattern and returns the resulting list of substrings.

Example

import re

text = "mango1apple2grapes3orange4pear"
pattern = r"\d+"   # one or more digits act as the separator
result = re.split(pattern, text)
print(result)
Output
['mango', 'apple', 'grapes', 'orange', 'pear']
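Two details of split() are worth seeing in code: the maxsplit parameter from the signature above, and the fact that a capturing group in the pattern keeps the separators in the result. A minimal sketch on the same string:

```python
import re

text = "mango1apple2grapes3orange4pear"

# maxsplit=2 stops after two splits; the remainder stays in the last item
limited = re.split(r"\d+", text, maxsplit=2)
print(limited)  # ['mango', 'apple', 'grapes3orange4pear']

# A capturing group in the pattern keeps the separators in the result
with_separators = re.split(r"(\d+)", text)
print(with_separators)
# ['mango', '1', 'apple', '2', 'grapes', '3', 'orange', '4', 'pear']
```

Passing maxsplit by keyword keeps the call readable and avoids confusing it with the flags argument.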

The sub() Function

The sub(pattern, repl, string, count=0, flags=0) function replaces the leftmost non-overlapping occurrences of the pattern in string with repl and returns the resulting string. If count is greater than 0, at most count occurrences are replaced. If the pattern is not found, the string is returned unchanged.

Example

import re

text = "mango1apple2grapes3orange4pear"
pattern = r"\d+"
repl = " "   # each run of digits is replaced by a single space
result = re.sub(pattern, repl, text)
print(result)
Output
mango apple grapes orange pear
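The count parameter and the repl argument can do more than the example above shows. This sketch, on the same string, limits the number of replacements, refers back to a captured group with \1, and passes a function as repl; the bracket and multiply-by-ten transformations are arbitrary choices for illustration:

```python
import re

text = "mango1apple2grapes3orange4pear"

# count=2 replaces only the first two matches
first_two = re.sub(r"\d+", " ", text, count=2)
print(first_two)  # mango apple grapes3orange4pear

# \1 in repl inserts the text captured by group 1 (here, wrapped in brackets)
bracketed = re.sub(r"(\d+)", r"[\1]", text)
print(bracketed)  # mango[1]apple[2]grapes[3]orange[4]pear

# repl may also be a function; it receives each Match object and returns the replacement
scaled = re.sub(r"\d+", lambda m: str(int(m.group()) * 10), text)
print(scaled)  # mango10apple20grapes30orange40pear
```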

RegEx Metacharacters

Metacharacters are characters with a special meaning in a pattern. The following table lists the common RegEx metacharacters in Python:

Character  Description                                                       Example
.          Matches any single character except a newline.                    "sec..t"
^          Matches the start of the string.                                  "^complete"
$          Matches the end of the string.                                    "secret$"
[ ]        Matches any one character from a set.                             "[A-Za-z]"
+          Matches 1 or more occurrences of the preceding pattern.           "[A-Za-z]+"
*          Matches 0 or more occurrences of the preceding pattern.           "[A-Za-z]*"
{ }        Matches a specified number of occurrences of the preceding        "[a-z]{3,10}"
           pattern: {n} means exactly n, {m,n} means m to n.
|          Matches either the pattern on its left or the one on its right.   "a|b"
( )        Groups a pattern and captures the text it matches.                "(hello)"
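The sketch below runs re.search() with each example pattern from the table against a made-up sample string and records the text of the leftmost match:

```python
import re

# (pattern from the table, made-up sample string)
checks = [
    (r"sec..t", "secret"),                # . stands in for "r" and "e"
    (r"^complete", "complete the task"),  # ^ anchors at the start
    (r"secret$", "top secret"),           # $ anchors at the end
    (r"[A-Za-z]", "42nd"),                # [ ] one letter from the set
    (r"[A-Za-z]+", "42nd street"),        # + one or more letters
    (r"[A-Za-z]*", "42nd"),               # * zero or more: empty match at index 0
    (r"[a-z]{3,10}", "a bc defg"),        # {3,10} between 3 and 10 letters
    (r"a|b", "cab"),                      # | either "a" or "b"
    (r"(hello)", "say hello"),            # ( ) group that captures "hello"
]

results = {}
for pattern, sample in checks:
    match = re.search(pattern, sample)
    found = match.group() if match else None  # text of the leftmost match
    results[pattern] = found
    print(f"{pattern:<14} {sample!r:<22} -> {found!r}")
```

Note the * row: zero letters is a valid match, so "[A-Za-z]*" matches the empty string at the start of "42nd". Use + when at least one character is required.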

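RegEx Special Sequences

The backslash (\) has two roles in a pattern. Placed before a metacharacter, it escapes it so that the character is matched literally; for example, \. matches a dot. Combined with certain letters, it forms a special sequence with its own meaning. The following table lists the common special sequences in Python: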

Character  Description                                                             Example
\A         Matches only at the start of the string.                                r"\Ahello"
\b         Matches the empty string at a word boundary (start or end of a word).   r"\bhello", r"hello\b"
\B         Matches the empty string only where there is no word boundary.          r"\Bhello", r"hello\B"
\d         Matches any decimal digit (equivalent to [0-9] for ASCII text).         r"\d"
\D         Matches any character that is not a decimal digit.                      r"\D"
\s         Matches any whitespace character (space, tab, newline, and so on).      r"\s"
\S         Matches any character that is not whitespace.                           r"\S"
\w         Matches any word character: a letter, a digit, or the underscore.       r"\w"
\W         Matches any character that is not a word character.                     r"\W"
\Z         Matches only at the end of the string.                                  r"buddy\Z"

The r prefix makes the pattern a raw string, so Python passes each backslash to the re module unchanged. In an ordinary string literal, "\b" would be read as a backspace character instead of a word boundary.
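The sketch below tries each special sequence from the table, along with the escaping role of the backslash, on made-up sample strings. All patterns are written as raw strings:

```python
import re

text = "Order #42 shipped to bob_smith at 9:30."

digits = re.findall(r"\d+", text)         # runs of digits
words = re.findall(r"\w+", text)          # runs of letters, digits, underscores
non_word = re.findall(r"\W", text)        # single non-word characters
spaces = re.findall(r"\s", text)          # single whitespace characters
chunks = re.findall(r"\S+", text)         # runs of non-whitespace
only_digits = re.sub(r"\D", "", "9:30")   # drop every non-digit character
print(digits)       # ['42', '9', '30']
print(words)        # ['Order', '42', 'shipped', 'to', 'bob_smith', 'at', '9', '30']
print(chunks)       # ['Order', '#42', 'shipped', 'to', 'bob_smith', 'at', '9:30.']
print(only_digits)  # 930

# \A and \Z anchor to the very start and end of the string
starts = bool(re.search(r"\Ahello", "hello buddy"))   # True
not_start = bool(re.search(r"\Ahello", "say hello"))  # False
ends = bool(re.search(r"buddy\Z", "hello buddy"))     # True

# \b needs a word boundary before "cat"; \B needs the absence of one
at_boundary = re.findall(r"\bcat", "cat concat category")  # ['cat', 'cat']
inside_word = re.findall(r"\Bcat", "cat concat category")  # ['cat'] (from "concat")

# Escaping: \. matches a literal dot, while a bare . matches any character
literal = re.findall(r"\d+\.\d+", "pi is 3.14, not 3x14")  # ['3.14']
any_char = re.findall(r"\d+.\d+", "pi is 3.14, not 3x14")  # ['3.14', '3x14']
print(re.escape("3.14"))  # 3\.14
```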