Python RegEx
A regular expression (RegEx or regex) is a sequence of characters that defines a search pattern. It lets you search for and manipulate text based on patterns rather than exact, literal matches.
Python supports regular expressions through the built-in re module. The module provides functions for searching and manipulating strings with regular expressions; the most commonly used are search(), compile(), findall(), split(), and sub().
Here is an example of how to use regex in Python:
import re
text = "Learn to code by reading and understanding the concept."
pattern = r"code"
# Search for the word "code" in the text
match = re.search(pattern, text)
if match:
    print("Found the word:", match.group())
else:
    print("Not found")
Output
Found the word: code
In this example, we start by importing the re module. The re.search() function then searches for the string "code" in the text variable. The r before the pattern marks it as a raw string, so backslashes in the pattern do not have to be escaped.
If the pattern is found, search() returns a match object; otherwise it returns None. The match object's group() method returns the matched substring.
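Besides group(), a match object also reports where the match was found. The following is a minimal sketch that reuses the text from the example above; the variable names are only illustrative.

```python
import re

text = "Learn to code by reading and understanding the concept."
match = re.search(r"code", text)

if match:
    matched_text = match.group()  # the matched substring
    start_index = match.start()   # index where the match begins
    end_index = match.end()       # index just past the end of the match
    print(matched_text, start_index, end_index, match.span())
```

Slicing the original string with text[start_index:end_index] gives back the same substring that group() returns.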
Here is another example that searches a given text for the phrase "code is" followed by a number, such as "code is 9091":
import re
text = "It is a sequence of numbers from 0 to 9. The code is 9091."
pattern = r"\bcode is \d+\b"
result = re.search(pattern, text)
if result:
    print("Found:", result.group(0))
else:
    print("Not found")
Output
Found: code is 9091
In this example, \b matches a word boundary, that is, the position at the beginning or at the end of a word, so "code" must start a word and the number must end one. The \d+ part matches one or more digits.
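To see what the word boundary contributes, the sketch below runs the same pattern with and without \b. The sample text containing "barcode" is made up for illustration.

```python
import re

text = "The barcode is 12. The code is 9091."

# Without \b, the "code" inside "barcode" is allowed to match.
loose = re.search(r"code is \d+", text)

# With \b, the match must start at a word boundary, so "barcode" is skipped.
strict = re.search(r"\bcode is \d+\b", text)

print(loose.group())
print(strict.group())
```

The first search stops at the number after "barcode", while the second one only accepts the standalone word "code".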
Here are a few more examples with complex pattern matching:
import re
text = "apple is a fruit. The boy is eating apple. sun rise in the east."
# pattern to search for words that have "p" as the second letter
pattern1 = r"\b\w[p]\w*\b"
# pattern to search for words that end with "ing"
pattern2 = r"\w+ing\b"
# pattern to search for words that start with "s"
pattern3 = r"\bs\w*"
match = re.search(pattern1, text)
print(match.group())
match = re.search(pattern2, text)
print(match.group())
match = re.search(pattern3, text)
print(match.group())
Output
apple
eating
sun
The search() Function
The search(pattern, string, flags=0) function scans a string for the first location where the pattern matches. It returns a Match object if a match is found, or None if there is no match.
Example
import re
text = "The code is 9097"
pattern = r"\d+"
result = re.search(pattern, text)
print(result.group())
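Because search() returns None when nothing matches, calling group() directly on the result raises an AttributeError in that case. A minimal sketch of a safer pattern follows; first_number is a hypothetical helper name, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    result = re.search(r"\d+", text)
    return result.group() if result else None

print(first_number("The code is 9097"))
print(first_number("No digits here"))
```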
The compile() Function
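The compile(pattern, flags=0) function compiles a regular expression and returns a Pattern object. The Pattern object has its own search(), findall(), split(), and sub() methods, so a compiled pattern can be reused without passing the pattern string each time.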
Example
import re
text = "The code is 9091. The code is 9091."
pattern = "9091"
pattern_obj = re.compile(pattern)
print(pattern_obj.findall(text))
Output
['9091', '9091']
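A compiled pattern is most useful when the same expression is applied many times. The sketch below compiles a pattern once with the re.IGNORECASE flag and reuses it across several strings; the sample lines are invented for illustration.

```python
import re

# Compile once, with a flag, then reuse the Pattern object.
word_pattern = re.compile(r"\bcode\b", re.IGNORECASE)

lines = ["The code is 9091.", "CODE red!", "Nothing to see here."]

# Keep only the lines in which the pattern matches somewhere.
matching_lines = [line for line in lines if word_pattern.search(line)]
print(matching_lines)
```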
The findall() Function
The findall(pattern, string, flags=0) function returns a list of all non-overlapping matches of pattern in string.
Example
import re
text = "It is a sequence of numbers from 0 to 9. The code is 9091."
pattern = r"\d+"
result = re.findall(pattern, text)
print(result)
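When the pattern contains groups, findall() returns the groups rather than the whole match. The following sketch contrasts the two cases, using a made-up "name=number" text.

```python
import re

text = "mango=1, apple=22, pear=333"

# No groups: each list item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# Two groups: each list item is a tuple of (name, number).
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)
print(pairs)
```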
The split() Function
The split(pattern, string, maxsplit=0, flags=0) function splits the source string at each match of the pattern and returns the resulting list of substrings.
Example
import re
text = "mango1apple2grapes3orange4pear"
pattern = r"\d+"
result = re.split(pattern, text)
print(result)
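The maxsplit argument limits how many splits are made; the remainder of the string is returned as the final list item. A short sketch using the same text:

```python
import re

text = "mango1apple2grapes3orange4pear"

# Split at every run of digits.
all_parts = re.split(r"\d+", text)

# Split at most twice; the rest of the string stays in one piece.
first_two = re.split(r"\d+", text, maxsplit=2)

print(all_parts)
print(first_two)
```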
The sub() Function
The sub(pattern, repl, string, count=0, flags=0) function replaces the leftmost non-overlapping occurrences of the pattern in string with repl and returns the resulting string. If count is nonzero, at most count occurrences are replaced. The string is returned unchanged if the pattern is not found.
Example
import re
text = "mango1apple2grapes3orange4pear"
pattern = r"\d+"
repl = " "
result = re.sub(pattern, repl, text)
print(result)
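Two common variations on sub() are limiting the number of replacements with count and reusing the matched text in the replacement through a group reference such as \1. A minimal sketch with the same text:

```python
import re

text = "mango1apple2grapes3orange4pear"

# Replace only the first run of digits.
first_only = re.sub(r"\d+", " ", text, count=1)

# \1 in the replacement refers to the text captured by the first group.
bracketed = re.sub(r"(\d+)", r"[\1]", text)

print(first_only)
print(bracketed)
```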
RegEx Metacharacters
Metacharacters are characters with a special meaning in a pattern. The following table lists the RegEx metacharacters in Python:
Character | Description | Example |
. | The dot matches any single character except a newline. | "sec..t" |
^ | The caret matches the start of the string. | "^complete" |
$ | Matches the end of the string. | "secret$" |
[ ] | Matches any one character from the set inside the brackets. | "[A-Za-z]" |
+ | Matches 1 or more occurrences of the preceding pattern. | "[A-Za-z]+" |
* | Matches 0 or more occurrences of the preceding pattern. | "[A-Za-z]*" |
{ } | Matches a specified number, or range, of occurrences of the preceding pattern. | "[a-z]{3,10}" |
| | Alternation (OR): matches the pattern on either side of the bar. | "a|b" |
(re) | Groups the enclosed pattern so that it can be captured or quantified as a unit. | "(hello)" |
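The sketch below tries several of the example patterns from the table against small, made-up strings, to show what each metacharacter does in practice.

```python
import re

# "." matches any single character, so "sec..t" needs exactly two in between.
dot_matches = re.findall(r"sec..t", "secret secant sect")

# "^" anchors at the start of the string, "$" at the end.
starts_with = bool(re.search(r"^complete", "complete the task"))
ends_with = bool(re.search(r"secret$", "top secret"))

# "{3,10}" accepts between 3 and 10 repetitions of the preceding set.
ranged = re.findall(r"[a-z]{3,10}", "ab abc abcdefghijkl")

# "|" matches either alternative; "( )" captures a group.
either = re.findall(r"a|b", "cab")
captured = re.search(r"(hello) world", "hello world").group(1)

print(dot_matches, starts_with, ends_with, ranged, either, captured)
```

Note that the 12-letter word only contributes its first 10 letters to the {3,10} match, and the 2 letters left over are too short to match again.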
RegEx Sequence Characters
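The \ (backslash) character has two roles in RegEx. Placed before a metacharacter, it lets that character be used literally, without its special meaning. Placed before certain ordinary letters, it forms a special sequence. The following table lists the RegEx special sequences in Python: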
Character | Description | Example |
\A | Matches only at the start of the string. | r"\Ahello" |
\b | Matches the empty string at a word boundary, that is, at the beginning or end of a word. Write the pattern as a raw string (r prefix); otherwise Python reads "\b" as a backspace character. | r"\bhello", r"hello\b" |
\B | Matches the empty string at any position that is not a word boundary. | r"\Bhello", r"hello\B" |
\d | Matches any digit character. | r"\d" |
\D | Matches any character that is not a digit. | r"\D" |
\s | Matches any whitespace character. | r"\s" |
\S | Matches any character that is not whitespace. | r"\S" |
\w | Matches any word character (a letter, digit, or underscore). | r"\w" |
\W | Matches any character that is not a word character. | r"\W" |
\Z | Matches only at the end of the string. | r"buddy\Z" |
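As a quick check of the sequences in the table, the following sketch applies a few of them to short, made-up strings. All patterns are written as raw strings.

```python
import re

text = "hello world 42"

digits = re.findall(r"\d", text)             # individual digit characters
non_space_runs = re.findall(r"\S+", text)    # runs of non-whitespace
at_start = bool(re.search(r"\Ahello", text)) # "hello" at the very start
at_end = bool(re.search(r"42\Z", text))      # "42" at the very end

# \b requires a word boundary before "o"; \B requires that there is none.
word_initial = re.findall(r"\bo\w*", "over boo one")
word_internal = re.findall(r"\Bo\w*", "over boo one")

print(digits, non_space_runs, at_start, at_end)
print(word_initial, word_internal)
```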