Python regular expressions

What is a Python regular expression?

A regular expression is a special sequence of characters that describes a search pattern. It helps you find, match, or replace text in strings. In Python, regular expressions are handled by the built-in re module.

Metacharacters


Character Description Example
[] A set of characters "[a-z]"
It matches any single character from a to z.
"[0-9]"
It matches any single digit from 0 to 9.
. Any character except a newline "raf...mad"
It matches "raf", then exactly three characters (any character except a newline), then "mad".
^ Starts with "^Rafsun"
It matches only if the string starts with Rafsun; it doesn't matter what comes after Rafsun.
$ Ends with "Ahmad$"
It matches only if the string ends with Ahmad; it doesn't matter what comes before Ahmad.
* Zero or more occurrences "X*"
It matches zero X, one X, or more than one X.
+ One or more occurrences "X+"
It matches at least one X, and possibly more.
{} Exactly the specified number of occurrences "X{2}"
It matches exactly two consecutive X characters in a single match.
() Capture and group "(ab)+"
It groups "ab" into one unit, so + repeats the whole group and matches "ab", "abab", "ababab", and so on. The text matched by the group can also be retrieved separately.
| Either or "YES|NO"
It matches either YES or NO.
? Zero or one occurrence "x?"
It matches x zero times or one time, but not more than once.

Special sequences


Character Description Example
\d Matches any digit, such as 0 to 9 (for Unicode text, digits from other scripts also count) "\d"
\D Matches any character that is not a digit "\D"
\s Matches any whitespace character, such as a space, tab, or newline "\s"
\S Matches any character that is not whitespace "\S"
\w Matches any word character: a letter from a to z or A to Z, a digit from 0 to 9, or the underscore (for Unicode text, letters from other scripts also count) "\w"
\W Matches any character that is not a word character "\W"
\A Matches only at the very beginning of the string "\AThe"
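The short script below is a sketch for trying the metacharacters and special sequences from the two tables above. The sample strings are made up for illustration; each variable holds the result of re.findall() or re.search() for one row of the tables.

```python
import re

# "." matches exactly one character (except a newline), so "raf...mad"
# needs exactly three characters between "raf" and "mad".
dots = re.findall(r"raf...mad", "raf123mad raf-mad")      # ['raf123mad']

# "^" ties the match to the start of the string, "$" to the end.
starts = bool(re.search(r"^Rafsun", "Rafsun Ahmad"))       # True
ends = bool(re.search(r"Ahmad$", "Rafsun Ahmad"))          # True

# "*" = zero or more, "+" = one or more, "?" = zero or one, "{2}" = exactly two.
star = re.findall(r"ab*", "a ab abbb")                     # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")                     # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")          # ['color', 'colour']
exact = re.findall(r"X{2}", "X XX XXXX")                   # ['XX', 'XX', 'XX']

# "|" means either/or; "()" groups "ab" so "+" repeats the whole group.
either = re.findall(r"YES|NO", "YES or NO or MAYBE")       # ['YES', 'NO']
grouped = re.search(r"(ab)+", "xxababab").group()          # 'ababab'

# Special sequences.
digits = re.findall(r"\d", "a1b22")                        # ['1', '2', '2']
non_digits = re.findall(r"\D", "a1b22")                    # ['a', 'b']
spaces = re.findall(r"\s", "a b\tc")                       # [' ', '\t']
words = re.findall(r"\w+", "hi_there, you!")               # ['hi_there', 'you']
non_words = re.findall(r"\W", "hi, you!")                  # [',', ' ', '!']
at_start = bool(re.search(r"\AThe", "The end"))            # True

print(dots, starts, ends, star, plus, optional, exact)
print(either, grouped, digits, non_digits, spaces, words, non_words, at_start)
```

Note how "X{2}" finds two separate matches inside "XXXX": the pattern always consumes exactly two X characters per match.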

Sets


Character Description
[abc] It returns a match where one of the specified characters (a, b, or c) is present
[a-e] It returns a match for any lower case character, alphabetically between a and e
[^abc] It returns a match for any character EXCEPT a, b, and c
[0356] It returns a match where any of the specified digits (0, 3, 5, or 6) are present
[0-9] It returns a match for any digit between 0 and 9
[0-6][0-8] It returns a match for any two digits where the first is from 0 to 6 and the second is from 0 to 8, such as 00, 35, or 68. It does not match every number from 00 to 68: 19, for example, is not matched because 9 is outside 0 to 8
[a-zA-Z] It returns a match for any single letter, lower case a to z OR upper case A to Z
[+] In sets, +, *, ., |, (), $, {} have no special meaning, so [+] returns a match for any + character present in the string
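A minimal sketch that runs each set from the table on a small, made-up sample string, so you can see which characters are picked up and which are skipped:

```python
import re

one_of = re.findall(r"[abc]", "cab dog")                   # ['c', 'a', 'b']
a_to_e = re.findall(r"[a-e]", "hello")                     # ['e']
not_abc = re.findall(r"[^abc]", "abcxyz")                  # ['x', 'y', 'z']
chosen_digits = re.findall(r"[0356]", "2035")              # ['0', '3', '5']

# First digit must be 0-6 and second digit 0-8, so "19" and "79" are skipped.
two_digit = re.findall(r"[0-6][0-8]", "00 19 35 68 79")    # ['00', '35', '68']

# "+" outside the set repeats it: one or more letters in a row.
letters = re.findall(r"[a-zA-Z]+", "Hello World 42")       # ['Hello', 'World']

# Inside a set, "+" is just a literal plus sign.
plus_signs = re.findall(r"[+]", "1+2+3")                   # ['+', '+']

print(one_of, a_to_e, not_abc, chosen_digits)
print(two_digit, letters, plus_signs)
```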

findall() function

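The findall() function returns a list of all the matches, in the order they appear in the string. We pass it two arguments: the first is the pattern and the second is the string to search (here, the variable that holds our text).
It is good practice to write the pattern as a raw string, such as r"\d{3}", so that Python does not treat the backslashes as escape sequences before the regex engine sees them.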

Example:1
Input
import re
"""
To work with the regular expression we have to import re model
"""

txtVar="Rafsun scored 100 and Ahmad scored 150 and
      Jenni scored 139 and Sophie scored 70"
re_examp_1=re.findall(r"[A-Z][a-z]*",txtVar)
print(re_examp_1)

"""

What is the pattern saying?
This pattern says that the first letters must be capital letters between A to Z and the second letter must be the lower case between a to z. After this, we use the asterisk sign. It means there can be multiple lower case letters.
"""

Output
["Rafsun","Ahmad","Jennie","Sophie"]

The pattern asks for one capital letter followed by any number of lowercase letters. In txtVar, only the names begin with a capital letter, so findall() returns exactly the four names. The other words contain no capital letters and the numbers contain no letters at all, so they are skipped.
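Because * also allows zero occurrences, [A-Z][a-z]* matches a capital letter even when no lowercase letter follows it. The sketch below uses a hypothetical sentence to compare it with [A-Z][a-z]+, which requires at least one lowercase letter after the capital:

```python
import re

sentence = "I met Ahmad in NY"

# [a-z]* allows ZERO lowercase letters, so lone capitals match too.
loose = re.findall(r"[A-Z][a-z]*", sentence)    # ['I', 'Ahmad', 'N', 'Y']

# [a-z]+ needs at least one lowercase letter after the capital.
strict = re.findall(r"[A-Z][a-z]+", sentence)   # ['Ahmad']

print(loose)
print(strict)
```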

Example:2
Input
import re
txtVar="Rafsun scored 100 and Ahmad scored 150 and
      Jenni scored 139 and Sophie scored 70"
re_examp_1=re.findall(r"\d{3}",txtVar)
print(re_examp_1)

Output
["100","150","139"]

The pattern \d{3} looks for exactly three digits in a row. txtVar contains four numbers, but only three of them appear in the output, because 70 has only two digits and therefore cannot match.
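One detail worth knowing: \d{3} matches any three digits in a row, even when they are part of a longer number. The sketch below uses a made-up string with a four-digit number, and shows one simple way (an assumption, not the only approach) to keep only numbers that are exactly three digits long:

```python
import re

text = "Scores: 70, 150, 1234"

# Three digits in a row: "123" is taken from the start of "1234".
three_in_a_row = re.findall(r"\d{3}", text)      # ['150', '123']

# Grab whole numbers first, then filter by length in Python.
all_numbers = re.findall(r"\d+", text)           # ['70', '150', '1234']
three_digit_numbers = [n for n in all_numbers if len(n) == 3]   # ['150']

print(three_in_a_row, all_numbers, three_digit_numbers)
```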

Example:3
Input
import re
txtVar="Bat Rat Cow Bird Cat"
# From this text, find Bat, Rat and Cat.
output=re.findall("[BCR]at",txtVar)
print(output)
# The square brackets [BCR] match one character that is B, C or R,
# and the letters "at" must follow it.
Output
["Bat","Cat","Rat"]

search() function

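The search() function scans the string and returns a match object for the first place where the pattern matches. If there is no match, it returns None. Because a match object counts as true and None counts as false, the result can be used directly in an if statement.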

Example:1
Input
import re
txtVar="Her name is Sophie and Sophie is a good girl"
if re.search("Sophie",txtVar):
  print("Item found")

Output
Item found
Example:2
Input
import re
txtVar="Her name is Sophie and Sophie is a good girl"
if re.search("Sophie",txtVar):
  print("Item found")

match_value=re.findall("Sophie",txtVar)
print(match_value)
Output
Item found
["Sophie","Sophie"]

finditer() function

The finditer() function works like findall(), but instead of returning a list of strings, it returns an iterator of match objects. It scans the string from left to right, and each match object holds both the matched text and its position.

Example:
Input
import re
txtVar="Her name is Sophie and Sophie is a good girl"
for i in re.finditer("Sophie",txtVar):
  index=i.span()
  #span() function used to get output in the form of tuple
  print(index)
   '''
Here we will print the starting and ending index of Sophie. Here we have Sophie two times, so we will get two starting and ending indexes in the form of a tuple.
'''
Output
(12, 18)
(23, 29)
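Since each item from finditer() is a full match object, you can get the text and the position together. In this sketch (same sample sentence as above), slicing the string with the start and end values gives back exactly the matched text, which confirms that the end index is exclusive:

```python
import re

txtVar = "Her name is Sophie and Sophie is a good girl"

for m in re.finditer("Sophie", txtVar):
    start, end = m.span()
    # Slicing with an exclusive end returns the matched word itself.
    print(m.group(), start, end, txtVar[start:end])

# Collect all positions at once.
spans = [m.span() for m in re.finditer("Sophie", txtVar)]
print(spans)
```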

compile() function

Suppose you have created a pattern and want to apply it to several strings stored in different variables. Instead of writing the same pattern again every time, you can use the compile() function. You define the pattern once in compile(), store the resulting pattern object in a variable, and reuse that variable as many times as you want.

Example:
Input
import re

#Defining the pattern
reg=re.compile("\d{3}")
txtVar="Jenny score is 255 152 222"
txtVar2=reg.findall(txtVar)
print("Jenny:",txtVar2)

txtVar3="Rafsun score is 555 333 111"
txtVar4=reg2.findall(txtVar3)
print("Rafsun:",txtVar4)

# We created the pattern once but used it two times.
Output
Jenny: ['255', '152', '222']
Rafsun: ['555', '333', '111']
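A compiled pattern object is not limited to findall(); it also has search(), finditer() and sub() methods. The sketch below (the dictionary of score strings is a made-up example) compiles the pattern once and reuses it in several ways:

```python
import re

# Compile once, then reuse the same pattern object everywhere.
three_digits = re.compile(r"\d{3}")

scores = {
    "Jenny": "Jenny's scores are 255 152 222",
    "Rafsun": "Rafsun's scores are 555 333 111",
}

# findall() on every string with the same compiled pattern.
results = {name: three_digits.findall(text) for name, text in scores.items()}

# The same object also supports search() and sub().
first = three_digits.search(scores["Jenny"]).group()     # '255'
masked = three_digits.sub("***", scores["Rafsun"])      # "Rafsun's scores are *** *** ***"

print(results)
print(first, masked)
```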

sub() function

Suppose you have created a pattern and want to replace every match with some specific text. In this case, use the sub() function. It returns a new string in which all the matches are replaced; the original string is not changed.

Example:1
Input
import re
txtVar="Bat Rat Cow Bird Cat"
#Replace Rat with Fish
reg=re.compile("[R]at")
txtVar2=reg.sub("Fish",txtVar)
print(txtVar2)
Output
Bat Fish Cow Bird Cat
Example:2
Input
import re
txtVar="(212)-456-7890"
reg=re.compile("\D")
txtVar2=reg.sub("",txtVar)
print(txtVar2)
Output
2124567890
Example:3
Input
import re
txtVar="How are you.\nI'm fine.\nHope you are also fine."
"""
After each full stop, we will get a line break. It means after each full stop we will get the sentence in a new line. Now remove the new line and give space between each line
"""
reg=re.compile("\n")
txtVar2=reg.sub(" ",txtVar)
print(txtVar2)
Output
How are you? I'm fine. Hope you are also fine.
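sub() can also reuse what a group captured and can be limited to a number of replacements. In this sketch, the groups from the () metacharacter capture the three digit blocks of the phone number from Example 2, and \1, \2, \3 in the replacement insert them in a new format (the dotted format is just an illustrative choice). The count keyword argument restricts sub() to the first match only.

```python
import re

phone = "(212)-456-7890"
# \( and \) match literal brackets; the three groups capture the digit blocks.
# In the replacement, \1, \2 and \3 insert whatever each group captured.
formatted = re.sub(r"\((\d{3})\)-(\d{3})-(\d{4})", r"\1.\2.\3", phone)

text = "How are you?\nI'm fine.\nHope you are also fine."
# count=1 replaces only the first newline and leaves the rest unchanged.
first_only = re.sub(r"\n", " ", text, count=1)

print(formatted)
print(first_only)
```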

CodersAim is created for learning and training a self learner to become a professional from beginner.

© Copyright All rights reserved www.CodersAim.com. Developed by CodersAim.