[Python Basics] Common Modules (module imports, packages, errors and exceptions, regular expressions, pymysql, processes and threads)

Author: 代码独立开发者 | 2024-02-03 13:50:30

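Common modules

1 Module imports

Module: every Python file is an independent module.

Purpose: real projects contain a lot of code. Putting code with the same function into one file, and code with different functions into different files, makes the code easier to maintain.

Modules also introduce namespaces and scopes.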

1.1 Import styles

# Import a whole module
import module

# Import a specific attribute
from module import xxx

# Import several attributes
from module import xxx, yyy

# Import under an alias
import module as alias

from module import xxx as alias1, yyy as alias2

import os
from functools import reduce
import time as tm
from random import randint, randrange
from os.path import join as os_join


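1.2 The import process

Key points:

A module is loaded when it is first imported, and its top-level code is executed during loading;
A module can be imported many times, but it is loaded only once;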
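The two points above can be checked with a small experiment. This is only a sketch: the module name demo_import_once and the temporary directory are made up for the illustration, and importlib.reload is included just to show that re-executing a module has to be requested explicitly.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_import_once"  # hypothetical module name, used only here

with tempfile.TemporaryDirectory() as tmp_dir:
    # The module body prints one line every time it is executed.
    Path(tmp_dir, f"{MODULE_NAME}.py").write_text(
        'print("module body executed")\n', encoding="utf-8"
    )
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            first = importlib.import_module(MODULE_NAME)   # loads and runs the body
            second = importlib.import_module(MODULE_NAME)  # served from sys.modules
            runs_after_imports = captured.getvalue().count("executed")
            importlib.reload(first)                        # explicit reload runs the body again
            runs_after_reload = captured.getvalue().count("executed")
    finally:
        # Undo the changes made for the experiment.
        sys.path.remove(tmp_dir)
        sys.modules.pop(MODULE_NAME, None)

print(first is second, runs_after_imports, runs_after_reload)
```

Both imports return the same module object, and the body runs once for the two imports and a second time only because of the reload.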

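Example:
Setup: in one VS Code folder, create two files, my_add.py and main_test.py. Import my_add in main_test.py and observe what happens.

Result: the code in my_add.py runs once.

Question: in practice a module usually carries some test code. How can that test code be kept from running when the module is imported?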

# my_add.py
def func_add(x,y):
    return x+y

print("test func_add(1,2)=%d"%func_add(1,2))
func_add(1,2)

test func_add(1,2)=3
3

# main_test.py
import my_add

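---------------------------------------------------------------------------

ModuleNotFoundError                       Traceback (most recent call last)

Cell In[3], line 2
      1 # main_test.py
----> 2 import my_add


ModuleNotFoundError: No module named 'my_add'

Note: ModuleNotFoundError means the interpreter could not find my_add.py in any directory on its search path when this cell ran. Section 1.3 describes that search path.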

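1.3 Import search path

Lookup order:

Search the current directory for the module
Search, in order, the directories listed in the PYTHONPATH environment variable
Search the lib directory of the Python installation

The actual list can be inspected through sys.path:

import sys
sys.path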

['d:\\study\\code\\jupyter\\PythonLearning',
 'D:\\software\\py3.11\\python311.zip',
 'D:\\software\\py3.11\\DLLs',
 'D:\\software\\py3.11\\Lib',
 'D:\\software\\py3.11',
 '',
 'C:\\Users\\26822\\AppData\\Roaming\\Python\\Python311\\site-packages',
 'D:\\software\\py3.11\\Lib\\site-packages',
 'D:\\software\\py3.11\\Lib\\site-packages\\win32',
 'D:\\software\\py3.11\\Lib\\site-packages\\win32\\lib',
 'D:\\software\\py3.11\\Lib\\site-packages\\Pythonwin']
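As a rough illustration of how sys.path drives the lookup, the sketch below writes a throwaway module into a temporary directory. The import fails while the directory is missing from sys.path and succeeds once it has been appended. The module name demo_path_lookup and its ANSWER constant are invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_path_lookup"  # hypothetical module name

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, f"{MODULE_NAME}.py").write_text("ANSWER = 42\n", encoding="utf-8")

    # The directory is not on sys.path yet, so the module cannot be found.
    try:
        importlib.import_module(MODULE_NAME)
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # After the directory is added, the same import succeeds.
    sys.path.append(tmp_dir)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(MODULE_NAME)
        answer = module.ANSWER
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop(MODULE_NAME, None)

print(found_before, answer)
```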

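1.4 The __name__ variable

About __name__:

When the file is executed: __name__ is "__main__"
When the file is imported: __name__ is the module name

Requirement: run the test code when the file is executed, and skip it when the file is imported as a module:

def func_add(x, y):
    return x + y

# Use the value of __name__ to tell whether the file is being run directly or imported
if __name__ == "__main__":
    print("test func_add(1, 2)=%d"%func_add(1,2))
    func_add(1, 2)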
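To see both values of __name__ side by side, one option is to load the same file twice: once with a normal import and once with runpy.run_path, which executes a file roughly the way "python file.py" does when run_name is "__main__". The file name demo_name_check.py and the SEEN_NAME variable are assumptions made for this sketch.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "demo_name_check"  # hypothetical module name

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir, f"{MODULE_NAME}.py")
    # The module simply records the value of __name__ it sees.
    path.write_text("SEEN_NAME = __name__\n", encoding="utf-8")
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        imported = importlib.import_module(MODULE_NAME)
        name_when_imported = imported.SEEN_NAME
        # run_path returns the globals of the executed file as a dict.
        executed_globals = runpy.run_path(str(path), run_name="__main__")
        name_when_executed = executed_globals["SEEN_NAME"]
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop(MODULE_NAME, None)

print(name_when_imported, name_when_executed)
```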


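2 Packages

Main topics:

The concept of a package
Relative and absolute imports

2.1 The concept of a package

Package: a directory that contains an __init__.py file.

Purpose: organize source code better.

2.2 Relative and absolute imports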

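Absolute import:

import module
from module import attribute

Relative import: used inside a package. Basic syntax:

from .module import xxx
from ..module import xxx
from . import module
# Notes:
# .   refers to the current package (directory)
# ..  refers to the parent package
# ... refers to the grandparent package, and so on
# "import .module" is not valid syntax; relative imports always use the "from" form

Points to note:

Absolute import: a module can import its own submodules, or modules at the same level as its top-level package together with their submodules;

Relative import: the importing module must belong to a package, and it can only reach modules inside its own top-level package.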
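A small package built on the fly shows how the two styles fit together. Everything here (the package name demo_pkg, its helpers and api modules, and the functions in them) is hypothetical and exists only for this sketch.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE_NAME = "demo_pkg"  # hypothetical package name

with tempfile.TemporaryDirectory() as tmp_dir:
    pkg_dir = Path(tmp_dir, PACKAGE_NAME)
    pkg_dir.mkdir()
    # __init__.py marks the directory as a package.
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "helpers.py").write_text(
        "def double(x):\n    return 2 * x\n", encoding="utf-8"
    )
    # api.py uses a relative import: ".helpers" is the helpers module of the same package.
    (pkg_dir / "api.py").write_text(
        "from .helpers import double\n\n"
        "def quadruple(x):\n    return double(double(x))\n",
        encoding="utf-8",
    )
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        # Absolute import: the module is named by its full dotted path.
        api = importlib.import_module(f"{PACKAGE_NAME}.api")
        result = api.quadruple(3)
    finally:
        sys.path.remove(tmp_dir)
        for name in [n for n in sys.modules
                     if n == PACKAGE_NAME or n.startswith(PACKAGE_NAME + ".")]:
            del sys.modules[name]

print(result)
```

Running api.py directly as a script would make the relative import fail, because the file would then not be loaded as part of a package.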

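3 Errors and exceptions

Errors:

Syntax errors: the Python interpreter reports them;
Logic errors: the program runs, but the result differs from what was expected, and you have to track the cause down yourself;

# Syntax error
1a =10

  Cell In[11], line 2
    1a =10
    ^
SyntaxError: invalid decimal literal

# Syntax error
a = 10
 b = 10

  Cell In[12], line 3
    b = 10
    ^
IndentationError: unexpected indent

Exceptions:

An error raised while the program runs: the interpreter reports it, and you locate and fix the offending code;
Runtime environment problems, for example insufficient memory or network errors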
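3.1 Exception handling

3.1.1 try…except

Purpose: catch the specified exceptions.

Basic syntax:

try:
    try_suite
except Exception as e:
    except_suite

Exception: the exception type to catch. If the type named here does not match the exception that is actually raised (it is neither the same class nor one of its base classes), the exception is not caught.

Catching several kinds of exception:

try:
    try_suite
except Exception1 as e:
    except_suite1
except Exception2 as e:
    except_suite2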
try:
    print(abc)
except Exception as e:
    print('error',e)
print("abc")

error name 'abc' is not defined
abc

try:
    print(abc)
except ValueError as e:
    print('ValueError:',e)
print("abc")

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Cell In[18], line 2
      1 try:
----> 2     print(abc)
      3 except ValueError as e:
      4     print('ValueError:',e)


NameError: name 'abc' is not defined

try:
    print(abc)
except ValueError as e:
    print('ValueError:',e)
except NameError as e:
    print('NameError:',e)
print("abc")

NameError: name 'abc' is not defined
abc

try:
    int("abc")
    print(abc)
except ValueError as e:
    print('ValueError:',e)
except NameError as e:
    print('NameError:',e)
print("abc")

ValueError: invalid literal for int() with base 10: 'abc'
abc

3.1.2 try…finally

Purpose: the statements in finally run whether or not an exception was caught.

Typical use: releasing resources.

Basic syntax:

try:
    try_suite
except Exception as e:
    except_suite
finally:
    pass

try:
    print('test')
    l = []
    print(l[10])
except ValueError as e:
    print('ValueError:',e)
except NameError as e:
    print('NameError:',e)
finally:
    print("go to finally")

test
go to finally



---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

Cell In[21], line 4
      2     print('test')
      3     l = []
----> 4     print(l[10])
      5 except ValueError as e:
      6     print('ValueError:',e)


IndexError: list index out of range

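If the program raises an exception and an except clause catches it, the except block runs first and the finally block runs afterwards.
If the program raises an exception and no except clause catches it, the finally block runs first and the exception is then passed on to the Python interpreter.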
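The ordering described above can be traced by recording each step in a list. This is a minimal sketch; the helper run_case and the outer try block (which stands in for "the interpreter" receiving the uncaught exception) are added only for the demonstration.

```python
def run_case(value):
    """Return the order in which the try/except/finally parts ran."""
    steps = []
    try:
        try:
            steps.append("try")
            int(value)                 # may raise ValueError or TypeError
        except ValueError:
            steps.append("except")     # only ValueError is handled here
        finally:
            steps.append("finally")    # runs in every case
    except TypeError:
        # Reached only when the inner block did not catch the exception.
        steps.append("propagated")
    return steps


print(run_case("abc"))  # ValueError is caught: except runs, then finally
print(run_case(None))   # TypeError is not caught: finally runs, then it propagates
print(run_case("10"))   # no exception: only try and finally run
```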

while True:
    msg = input('Input: ')
    if msg == 'q':
        break
    try:
        num = int(msg)
        print(num)
    except Exception as e:
        print('error:', e)

Input: abc
error: invalid literal for int() with base 10: 'abc'
Input: 10
10
Input: q

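3.2 The raise and assert statements

raise and assert are used to trigger an exception deliberately.

For example:

Parameter checking;
A logic error detected while the program runs, reported by raising an exception;

3.2.1 The raise statement

raise: when a check detects a problem, raise an exception explicitly.

Basic syntax:

raise Exception(args)
raise NameError('value not define')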

raise ValueError('name error')

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[29], line 1
----> 1 raise ValueError('name error')


ValueError: name error
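For the parameter-checking use mentioned at the start of this section, a sketch might look like the following. The function set_age and its 0 to 150 range are invented for the example.

```python
def set_age(age):
    """Validate an age value and return it unchanged (hypothetical helper)."""
    if not isinstance(age, int):
        raise TypeError(f"age must be int, got {type(age).__name__}")
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age


results = []
for candidate in (30, "30", -1):
    try:
        results.append(f"ok: {set_age(candidate)}")
    except (TypeError, ValueError) as e:
        # The caller decides how to react; here the error is just recorded.
        results.append(f"{type(e).__name__}: {e}")

for line in results:
    print(line)
```

Choosing a specific exception class (TypeError, ValueError) rather than a bare Exception lets callers handle each kind of failure separately.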

3.2.2 The assert statement

assert: evaluates an expression and raises AssertionError if the result is not true.

Basic syntax:

assert expression [, args]

def my_add(x,y):
    assert isinstance(x, int),"x must be int"
    assert isinstance(y, int),"y must be int"
    return x + y

my_add(1,2)

3

my_add(1, "2")

---------------------------------------------------------------------------

AssertionError                            Traceback (most recent call last)

Cell In[27], line 1
----> 1 my_add(1, "2")


Cell In[26], line 3, in my_add(x, y)
      1 def my_add(x,y):
      2     assert isinstance(x, int),"x must be int"
----> 3     assert isinstance(y, int),"y must be int"
      4     return x + y


AssertionError: y must be int

3.3 Custom exception classes

Points to note:

The class must inherit from Exception
It is triggered explicitly with the raise statement

class Net404Error(Exception):
    def __init__(self):
        args = ("The requested link does not exist", "404")
        super().__init__(*args)

net_error_404 = Net404Error()

raise net_error_404

---------------------------------------------------------------------------

Net404Error                               Traceback (most recent call last)

Cell In[33], line 1
----> 1 raise net_error_404


Net404Error: ('The requested link does not exist', '404')
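A custom exception becomes more useful when it carries data and is caught by the caller. The sketch below is one possible shape, not the original author's design: the class names HttpStatusError and NotFoundError and the function fetch are hypothetical.

```python
class HttpStatusError(Exception):
    """Base class for the hypothetical HTTP errors in this sketch."""

    def __init__(self, status, url):
        super().__init__(f"{status} for {url}")
        self.status = status
        self.url = url


class NotFoundError(HttpStatusError):
    def __init__(self, url):
        super().__init__(404, url)


def fetch(url):
    # Stand-in for real network code: it always reports "not found".
    raise NotFoundError(url)


try:
    fetch("/missing")
except HttpStatusError as e:
    # Catching the base class also catches its subclasses.
    caught = e

print(caught.status, caught.url, str(caught))
```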

3.4 The with/as statement

with/as: works with a context manager so that resources are acquired and released automatically.

3.4.1 Using with/as

Basic syntax:

with context as var:
    with_suite

Note: the context object must support the context management protocol.

Typical case: a file is opened and never closed.

File handling:

fpath = r'D:\study\code\jupyter\DATA\csv_learn\2017_data.csv'
with open(fpath) as f:
    pass
print("f closed:", f.closed)

f closed: True

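3.4.2 Context management

How a context manager works:

It supports the __enter__() and __exit__() methods
__enter__(): called on entering the context; its return value is bound to var in "as var"
__exit__(): called on leaving the context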

class TestContext:
    def __enter__(self):
        print("call __enter__")
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        print("call __exit__")
with TestContext() as tc:
    print(tc)

call __enter__
<__main__.TestContext object at 0x00000188DDEB31D0>
call __exit__

tc

<__main__.TestContext at 0x188ddeb31d0>
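The three arguments of __exit__ describe the exception, if any, that ended the with block, and a true return value suppresses it. The class SuppressKeyError below is a made-up example of that mechanism, sketched under the assumption that only KeyError should be swallowed.

```python
class SuppressKeyError:
    """Context manager that swallows KeyError and lets other exceptions through."""

    def __enter__(self):
        self.suppressed = None
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # All three arguments are None when the block ended without an exception.
        if exc_type is not None and issubclass(exc_type, KeyError):
            self.suppressed = exc_val
            return True    # a true value tells Python to swallow the exception
        return False       # anything else propagates to the caller


with SuppressKeyError() as guard:
    {}["missing"]                  # raises KeyError inside the block
    print("this line is skipped")

print("suppressed:", repr(guard.suppressed))

try:
    with SuppressKeyError():
        int("abc")                 # ValueError is not handled by __exit__
except ValueError as e:
    propagated = type(e).__name__

print("propagated:", propagated)
```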

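4 Regular expressions

4.1 Main topics

The re module and its methods
- searching
- splitting
- replacing
Regular expression syntax
- text matching
- repetition
- boundary matching
- grouping
- special matching

Learning goal: master one way of processing text

4.2 The re module

Regular expression: a pattern that describes the characteristics of a set of strings and is used to match particular strings.

Use cases:

Validation: check a string against a rule, for example the format of a user name or password;
Searching: find the strings in a text that follow a given rule;
Replacing: replace the specified text with new text;
Splitting: split a text on the specified separators;

4.2.1 The re functions in detail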

# Requirement: match a string that starts with a digit, using the match function
import re
s1 = '001_sun'
s2 = 'qimiao'
# \d matches any digit
ma = re.match(r'\d',s1)
print(ma)

<re.Match object; span=(0, 1), match='0'>

ma = re.match(r'\d',s2)
print(ma)

None

import re

pattern = r'\d+'
string = '123abc456'

'''
re.match(pattern, string):
Tries to match the pattern at the beginning of the string. Returns a match object on success and None on failure.
'''
match = re.match(pattern, string)
print(match)  # print the match object

<re.Match object; span=(0, 3), match='123'>

'''
re.search(pattern, string)
Scans the string and returns a match object for the first position where the pattern matches.
'''
search = re.search(pattern, string)
print(search)

<re.Match object; span=(0, 3), match='123'>

'''
re.findall(pattern, string)
Finds all substrings that match the pattern and returns them as a list.
'''
results = re.findall(pattern, string)
print(results)

['123', '456']

'''
re.split(pattern, string)
Splits the string at every match and returns the list of pieces.
'''
results = re.split(pattern, string)
print(results)

['', 'abc', '']

'''
re.sub(pattern, repl, string)
Replaces the parts of the string that match the pattern with repl and returns the new string.
'''
new_string = re.sub(pattern, '*NUMBER*', string)
print(new_string)

*NUMBER*abc*NUMBER*

4.2.2 The Match object

import re
s1 = "001_sun"
# \d matches any digit
ma = re.match(r'\d', s1)
print("ma:",ma)
# m.group(): the matched string
print("ma.group:", ma.group())
# m.span(): tuple of the start and end indexes of the match
print("ma.span:", ma.span())
# m.start() / m.end(): start and end index of the match
print("ma.start:%d, ma.end:%d"%(ma.start(), ma.end()))

ma: <re.Match object; span=(0, 1), match='0'>
ma.group: 0
ma.span: (0, 1)
ma.start:0, ma.end:1

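4.2.3 Compiled patterns

re.compile compiles a regular expression given as a string into a Pattern object, whose methods can then be used for matching, searching and so on.

Use case: when the same expression is applied repeatedly, for example in a loop, compile it into a Pattern object first.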

re_cmp = re.compile(r'\d')
ma = re_cmp.match("0123")
print(ma)

<re.Match object; span=(0, 1), match='0'>
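A compiled pattern pays off when the same rule is applied to many strings. The following sketch validates a list of user names; the rule itself (a letter or underscore followed by 2 to 9 word characters) and the sample names are assumptions made up for this example. It uses Pattern.fullmatch, which only succeeds when the whole string matches.

```python
import re

# Hypothetical rule: starts with a letter or underscore, 3 to 10 characters in total.
USERNAME_RE = re.compile(r'[A-Za-z_]\w{2,9}')

candidates = ["sun_01", "1abc", "ab", "valid_name", "name with space"]

# The pattern is compiled once and reused for every candidate.
valid = [name for name in candidates if USERNAME_RE.fullmatch(name)]
print(valid)
```

With match instead of fullmatch, "name with space" would also pass, because match only requires the pattern to fit at the start of the string.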

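4.3 Regular expression syntax

4.3.1 Character matching

Requirements:

The string starts with an uppercase letter;
The string starts with a digit;
The string starts with a digit or a lowercase letter;
The first character is a digit and the second is a lowercase letter;
The string starts with one of the characters ABCDE;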
import re
s1 = "Python"
s2 = "15011345578"
s3 = "AB_test"
s4 = "test"

# The string starts with an uppercase letter
re.match(r'[A-Z]', s1)

<re.Match object; span=(0, 1), match='P'>

# The string starts with a digit
re.match(r'\d', s2)

<re.Match object; span=(0, 1), match='1'>

# The string starts with a digit or a lowercase letter
re.match(r'[0-9a-z]', s4)

<re.Match object; span=(0, 1), match='t'>

s5 = "1aabc"
# The first character is a digit and the second is a lowercase letter
re.match(r'\d[a-z]', s5)

<re.Match object; span=(0, 2), match='1a'>

# The string starts with one of the characters ABCDE
re.match(r'[ABCDE]', s3)

<re.Match object; span=(0, 1), match='A'>

4.3.2 Repetition

Requirements:

The string starts with a lowercase letter followed by a digit, or with a digit;
Check whether a string is a valid number below 100;
A valid QQ number, 6 to 15 digits long;

# * matches the preceding item zero or more times
s0 = 'c'
s1 = "AAAc"
print(re.match(r'A*', s1))
print(re.match(r'A*', s0))

<re.Match object; span=(0, 3), match='AAA'>
<re.Match object; span=(0, 0), match=''>

# + matches the preceding item one or more times
s2 = "AAc"
print(re.match(r'A+', s2))
print(re.match(r'A+', s0))

<re.Match object; span=(0, 2), match='AA'>
None

# ? matches the preceding item zero times or once
s3 = '1ab'
print(re.match(r'\d?', s3))
print(re.match(r'\d?', s0))

<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(0, 0), match=''>

# *? matches as few times as possible, at least 0
s4 = "AAC"
re.match(r'A*?', s4)

<re.Match object; span=(0, 0), match=''>

# +? matches as few times as possible, at least 1
s4 = "AAC"
re.match(r'A+?', s4)

<re.Match object; span=(0, 1), match='A'>

# {m,n} matches the preceding item m to n times
s5 = "123456abc"
re.match(r'\d{3,5}', s5)

<re.Match object; span=(0, 5), match='12345'>

s6 = "my age is 10cm"
ma = re.search(r'\d+', s6)
ma.group()

'10'

# The string starts with a lowercase letter followed by a digit, or with a digit
s7 = 'a1abc'
re.match(r'[a-z]?\d', s7)

<re.Match object; span=(0, 2), match='a1'>

# Check whether a string is a valid number below 100 (0-99)
s8 = '10'
s8_1 = '0'
s8_2 = '100'
print(re.match(r'[1-9]?\d$', s8))
print(re.match(r'[1-9]?\d$', s8_1))
print(re.match(r'[1-9]?\d$', s8_2))

<re.Match object; span=(0, 2), match='10'>
<re.Match object; span=(0, 1), match='0'>
None

# A valid QQ number, 6 to 15 digits long
# {6,15} sets the length and $ rejects strings with anything after the digits
s9 = '123458888888'
re.match(r'\d{6,15}$', s9)

<re.Match object; span=(0, 12), match='123458888888'>

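4.3.3 Boundary matching

Requirements:

Match a valid e-mail address. Format: the name consists of digits, letters and underscores and is 6 to 15 characters long; the suffix is @xxx.com;
Find the words that end with t;
Find the words that start with t;

s1 = 'AAAAc'
# $ requires the match to end at the end of the string
print(re.match(r'A+',s1))
print(re.match(r'A+$',s1))
print(re.match(r'A+c$',s1))

<re.Match object; span=(0, 4), match='AAAA'>
None
<re.Match object; span=(0, 5), match='AAAAc'>

# Match a valid e-mail address: the name consists of digits, letters and underscores, 6 to 15 characters long; the suffix is @xxx.com
# The dot is escaped so that it only matches a literal "."
mail = 'testbcd@qq.com'
re.match(r'[\da-zA-Z_]{6,15}@qq\.com$', mail)

<re.Match object; span=(0, 14), match='testbcd@qq.com'>

# Find the words that end with t
s = "where what hat the this that thtot"
# \w matches letters, digits and the underscore; for ASCII text it is equivalent to the character set [A-Za-z0-9_]
# \b matches a word boundary
print(re.findall(r'\w+?t',s))
print(re.findall(r'\w+?t\b',s))
# Find the words that start with t
re.findall(r'\bt\w+?\b',s)

['what', 'hat', 'that', 'tht', 'ot']
['what', 'hat', 'that', 'thtot']

['the', 'this', 'that', 'thtot']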

4.3.4 Grouping

Requirements:

Match a valid number string below 100 (0-99);
Given the string "apple:8, pear:20, banana:10", extract the names and the numbers;
Extract all URLs from an HTML text;
The text starts and ends with the same digit;

# Match a valid number string below 100 (0-99)
snum = '100'
snum2 = '99'
# | matches either the expression on its left or the one on its right
print(re.match(r'\d$|[1-9]\d$', snum))
print(re.match(r'\d$|[1-9]\d$', snum2))

None
<re.Match object; span=(0, 2), match='99'>

items = ["01", "100", "10", "9", "99"]
re_cmp = re.compile(r"^\d$|[1-9]\d$")
for item in items:
    ma = re_cmp.match(item)
    print(ma)

None
None
<re.Match object; span=(0, 2), match='10'>
<re.Match object; span=(0, 1), match='9'>
<re.Match object; span=(0, 2), match='99'>

# Given the string "apple:8, pear:20, banana:10", extract the names and the numbers
s = "apple:8, pear:20, banana:10"
# () creates a group
print(re.findall(r'[a-z]+:\d+', s))
print(re.findall(r'([a-z]+):(\d+)', s))
dict(re.findall(r'([a-z]+):(\d+)', s))

['apple:8', 'pear:20', 'banana:10']
[('apple', '8'), ('pear', '20'), ('banana', '10')]

{'apple': '8', 'pear': '20', 'banana': '10'}

html = """<a href="https://movie.douban.com/subject/6786002/"><img width="100" alt="触不可及" src="https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1454261925.webp" class=""></a>"""
# .*? matches any number of arbitrary characters, but as few as possible (non-greedy mode).
# This keeps one match from running past its closing double quote into later quoted content.
re.findall(r'"(https:.*?)"', html)

['https://movie.douban.com/subject/6786002/',
 'https://img9.doubanio.com/view/photo/s_ratio_poster/public/p1454261925.webp']

# The text starts and ends with the same digit
text = '1021'
# \1 refers back to group 1
re.match(r'(\d).*?(\1)$', text)

<re.Match object; span=(0, 4), match='1021'>

# Using a numbered group reference
texts = ['101', "2223", '1omyhat', '5abc6']
for text in texts:
    print(re.match(r'(\d).*?(\1)', text))

<re.Match object; span=(0, 3), match='101'>
<re.Match object; span=(0, 2), match='22'>
None
None

# Using a named group
text = "1234541"
ma = re.match(r'(?P<start>.*).*?(?P=start)', text)
ma.groupdict()

{'start': '1'}

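4.3.5 The split and sub methods

Splitting with split

split: splits a text according to a rule and returns a list;

Requirements:

Given an English sentence, count the number of words;
Given the text "python/c\C++/Java/Php/Nodejs", split it into a list of programming languages;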

import re
s = "When someone walk out your life, let them. They are just making more room for someone else better to walk in."
words = re.split(r'\W', s)
words = [word for word in words if word.strip()]
words

['When',
 'someone',
 'walk',
 'out',
 'your',
 'life',
 'let',
 'them',
 'They',
 'are',
 'just',
 'making',
 'more',
 'room',
 'for',
 'someone',
 'else',
 'better',
 'to',
 'walk',
 'in']

len(words)

21

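# A raw string keeps the backslash in the data as a literal character
s = r"python/c\C++/Java/Php/Nodejs"
# The backslash is itself the escape character, so matching a literal \ takes \\
re.split(r'[/\\]', s)

['python', 'c', 'C++', 'Java', 'Php', 'Nodejs']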

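Replacing with sub

Function signature:

re.sub(pattern, repl, string, count=0, flags=0)

Main parameters:

pattern: the pattern to match;
repl: the replacement, a string or a function; if it is a function, each match is replaced with the string the function returns;
string: the string in which the replacement is made;

Requirements:

Replace every number with four asterisks;
Given a performance-review text, replace scores greater than or equal to 6 with "A" and the others with "B";
Given three results for each of several athletes, keep only the best one;

s1 = "name:sun, pwd:123456, name:zhang,pwd:667788"
re.sub(r'\d+', "****", s1)

'name:sun, pwd:****, name:zhang,pwd:****'

# Given a performance-review text, replace scores >= 6 with "A" and the others with "B"
def replace_ab(ma):
    value = ma.group()
    value = int(value)
    if value >= 6:
        return "A"
    return "B"
s2 = "sun:5, li:10, zhao:7, gao:8, wang:5"
re.sub(r'\d+', replace_ab, s2)

'sun:B, li:A, zhao:A, gao:A, wang:B'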

# Given three results for each of several athletes, keep only the best one
def replace_max(ma):
    # ma.group() is one run of comma-separated scores, e.g. "9.8,9.7,9.6"
    values = [float(value) for value in ma.group().split(',')]
    return str(max(values))
s3 = "谷爱凌:9.8,9.7,9.6,高梨沙罗:9.88,9.6,9.7"
# Match numbers joined by commas; the comma in front of the next name is not part of the match,
# so the separator between the two athletes is kept
re.sub(r'\d+(?:\.\d+)?(?:,\d+(?:\.\d+)?)*', replace_max, s3)

'谷爱凌:9.8,高梨沙罗:9.88'

4.3.6 Exercises

Matching XML
XML syntax:

<tag>content</tag>

s = '<li>tushu</li>'
re.match(r'<(.*?)>.+?</\1>', s)

<re.Match object; span=(0, 14), match='<li>tushu</li>'>

ma = re.match(r'<(?P<tag>.*?)>.+?</(?P=tag)>', s)
print(ma.groups())
print(ma.groupdict())

('li',)
{'tag': 'li'}

s = '<li>xxx</li>'
re.match(r'<([\w]+)>.+</\1>', s)

<re.Match object; span=(0, 12), match='<li>xxx</li>'>

Extracting the src link

html = '<img class="main_img" data-imgurl="https://ss0.bdstatic.com/0.jpg" src="https://ss0.bdstatic.com/=0.jpg" style="background-color: rgb(182, 173, 173); width: 263px; height: 164.495px;'
re.findall(r'src="(http.*?)"', html)

['https://ss0.bdstatic.com/=0.jpg']

Counting the words that start with th

s = 'that this,theme father/this teeth'
# The result is named words so that the built-in list is not shadowed
words = re.findall(r'\bth[a-zA-Z]*?\b', s)
print(f'{words},{len(words)}')

['that', 'this', 'theme', 'this'],4

Extracting all numbers

info = 'apple:21, banana:8, pear:7'
result = re.findall(r'\d+', info)
result

['21', '8', '7']

Counting words

info = 'Things turned out quite nicely after four years of hard work in college.With a GPA of 3.9,I currently rank the top 3% among 540 peers in my grade.'
# Note: splitting on every non-word character also yields empty strings and splits "3.9" in two,
# so the result is only a rough count
len(re.split(r'\W', info))

33

Replacing failing scores with xx

scores = '90,100,66,77,33,80,27'
def replace_failed(ma):
    values = ma.group()
    v = int(values)
    if v < 60:
        return 'xx'
    return values

re.sub(r'\d+', replace_failed, scores)

'90,100,66,77,xx,80,xx'

Matching a valid 163 mailbox

# Rule: the address starts with a letter, consists of underscores, digits and letters, is 8 to 13 characters long, and ends with @163.com
mail = 'qimao1234@163.com'
mail_wrong = 'a123456789abcd@163.com'
# The dot is escaped so that it only matches a literal "."
print(re.match(r'[a-zA-Z]\w{7,12}@163\.com$',mail))
print(re.match(r'[a-zA-Z]\w{7,12}@163\.com$',mail_wrong))

<re.Match object; span=(0, 17), match='qimao1234@163.com'>
None

re.I

# Find the words that start with th, ignoring case
s = 'This that the who'
print(re.findall(r'th[a-zA-Z]*', s, flags=re.I))
print(re.findall(r'th[a-zA-Z]*', s))

['This', 'that', 'the']
['that', 'the']

re.M

# Multi-line matching: count the functions in a piece of code
code = '''
def func1():
    pass
 
Def func2():
    pass
class t:
    def func():
    pass
'''
print(re.findall(r'^def ', code))
print(re.findall(r'^def ', code, flags=re.M))
print(re.findall(r'^def ', code, flags=re.M | re.I))

[]
['def ']
['def ', 'Def ']

5 The pymysql module in detail

Workflow:

Connect to the database;
Create a cursor;
Execute SQL statements: insert, delete, update, query;
Commit;
Close the connection;

5.1 Connecting to the database

import pymysql
# Connect to the database
db = pymysql.connect(host = "localhost",user="root",password = "",database="test")

config = {
    'user':'root', # user name
    'password':'', # password
    'host':'localhost', # address of the MySQL server
    'port':3306, # port, 3306 by default
    'database':'test' # database name: test
}
db = pymysql.connect(**config)

5.2 Getting a cursor

# Get a cursor
cursor = db.cursor()

5.3 Executing SQL statements

# List the tables
f = cursor.execute("show tables;")
# Fetch all rows
data = cursor.fetchall()
# Print the rows
for item in data:
    print(item)

('user_info',)

5.4 Inserting data

# SQL statement with one placeholder per column
sql = 'insert into user_info (user_name, user_id, channel) values(%s,%s,%s)'
# Insert one row
cursor.execute(sql, ('何同学', "10001", "B站"))
# Insert several rows
cursor.executemany(sql, [('张同学', "10002", "抖音"),('奇猫', "10003", "抖音")])
db.commit()

5.5 Querying data

sql = 'select * from user_info'
cursor.execute(sql)

3

# Fetch all rows
data = cursor.fetchall()
# Print the rows
for item in data:
    print(item)

('何同学', '10001', 'B站')
('张同学', '10002', '抖音')
('奇猫', '10003', '抖音')

5.6 Closing the connection

cursor.close()
# Close the connection
db.close()
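The connect, cursor, execute, commit and close steps are usually wrapped so that a failed statement rolls the transaction back and the cursor is always closed. The sketch below shows that shape. To stay runnable without a MySQL server it uses the standard-library sqlite3 module as a stand-in, which is an assumption of this example and not part of the original article. The helper insert_rows is hypothetical, and sqlite3 writes placeholders as ? where pymysql uses %s.

```python
import sqlite3  # stand-in for pymysql so the sketch runs without a MySQL server


def insert_rows(conn, sql, rows):
    """Insert all rows in one transaction; roll back if any row fails."""
    cursor = conn.cursor()
    try:
        cursor.executemany(sql, rows)
        conn.commit()
        return True
    except Exception:
        conn.rollback()   # undo the partial insert
        return False
    finally:
        cursor.close()    # runs whether or not the insert succeeded


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_info (user_name text, user_id text primary key, channel text)"
)
sql = "insert into user_info (user_name, user_id, channel) values (?, ?, ?)"

ok = insert_rows(conn, sql, [("A", "10001", "x"), ("B", "10002", "y")])
# The second row repeats a primary key, so the whole batch is rolled back.
dup = insert_rows(conn, sql, [("C", "10003", "z"), ("D", "10001", "w")])
count = conn.execute("select count(*) from user_info").fetchone()[0]
conn.close()

print(ok, dup, count)
```

The same function body should apply to a pymysql connection, since it only relies on cursor(), executemany(), commit(), rollback() and close().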

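6 Processes and threads

6.1 Processes in detail

Process: a running instance of a program, that is, the program in execution. It is the basic unit of scheduling and resource allocation in the operating system.

Examples:

A phone app: WeChat, Douyin, a browser, Taobao, games, etc.;
A PC application: a browser, office software, games, etc.;

6.1.1 Process basics

Process ID: the unique identifier of a running program.

Getting process IDs in Python:

os.getpid(): get the ID of the current process
os.getppid(): get the ID of the parent of the current process

Python module for working with processes: multiprocessing

import os
# Get the ID of this process
os.getpid()

10588

# Get the ID of the parent process
os.getppid()

3712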

6.1.2 Creating a process

# Import the modules
import multiprocessing
import os
# Function run by the child process
def func(*args, **kwargs):
    print("subProcess pid:%d ppid:%d"%(os.getpid(), os.getppid()))

if __name__ == "__main__":
    # Create the process object
    p = multiprocessing.Process(target=func)
    # Start the process, which runs the target function
    p.start()
    # Wait for the child process to finish
    p.join()
    print("main process pid:%d"%os.getpid())

main process pid:10588

6.1.3 Parent and child processes

A child process is a copy of its parent: it starts with the parent's data but works on its own copy, so changes made in the child do not affect the parent;

import multiprocessing
import os
import time
tmp = 10
def work():
    global tmp
    tmp = 100
    print('work pid:', os.getpid(), os.getppid())
    print("tmp in work:", tmp)

if __name__ == '__main__':
    # Create the process
    p = multiprocessing.Process(target=work)
    # Start the process
    p.start()
    print("call main process pid:", os.getpid())
    # Wait for the child to finish
    p.join()
    # Value of tmp in the parent
    print("tmp in main:", tmp)

call main process pid: 10588
tmp in main: 10

Output:

call main process pid: 15636
work pid: 5708 15636
tmp in work: 100
tmp in main: 10

6.1.4 When to use processes

Use cases: parallel computation, a function that takes too long to run, blocking calls, etc.;

An example: a function sleeps for 1 second and is called 6 times. Compare the time taken with a single process and with multiple processes;

import multiprocessing
import os
import time
tmp = 10

def work():
    print("call work")
    time.sleep(1)
if __name__ == '__main__':
    n = 6
    plist = []
    ts = time.time()
    # The if branch uses multiple processes, the else branch does not
    if False:
        for i in range(n):
            p = multiprocessing.Process(target=work)
            p.start()
            plist.append(p)
        # Wait for every started process, not only the last one
        for p in plist:
            p.join()
    else:
        for i in range(n):
            work()
    print("run time:%.2f"%(time.time() - ts))

call work
call work
call work
call work
call work
call work
run time:6.00

With multiple processes:

call work
call work
call work
call work
call work
call work
run time:1.14

Without multiple processes:

call work
call work
call work
call work
call work
call work
run time:6.01

6.1.5 Inter-process communication

Common approaches:

Message queue: from multiprocessing import Queue
Shared memory: from multiprocessing import Value,Array

import multiprocessing
import os
import time

from multiprocessing import Queue

def work(msgq):
    while True:
        msg = msgq.get()
        if msg == "Q":
            break
        else:
            print(f"pid:{os.getpid()} recv msg:{msg}")

if __name__ == '__main__':
    msgq = Queue()
    list_p = []
    for i in range(1, 10):
        p = multiprocessing.Process(target=work, args=(msgq,))
        list_p.append(p)
        p.start()
    
    # Send the messages
    for i in range(1, 10):
        msgq.put("Test%d"%i)
    # Send one quit command per process
    for p in list_p:
        msgq.put("Q")
    # Wait for the processes to exit
    for p in list_p:
        p.join()

Result:

pid:15464 recv msg:Test1
pid:15464 recv msg:Test2
pid:7124 recv msg:Test3
pid:7124 recv msg:Test5
pid:7124 recv msg:Test6
pid:7124 recv msg:Test7
pid:7124 recv msg:Test8
pid:7124 recv msg:Test9
pid:15464 recv msg:Test4

import multiprocessing
import os
import time

from multiprocessing import Queue

def work(msgq):
    while True:
        msg = msgq.get()
        time.sleep(0.5)
        if msg == "Q":
            break
        else:
            print(f"pid:{os.getpid()} recv msg:{msg}")

if __name__ == '__main__':
    msgq = Queue()
    list_p = []
    for i in range(1, 10):
        p = multiprocessing.Process(target=work, args=(msgq,))
        list_p.append(p)
        p.start()
    
    # Send the messages
    for i in range(1, 10):
        msgq.put("Test%d"%i)
    # Send one quit command per process
    for p in list_p:
        msgq.put("Q")
    # Wait for the processes to exit
    for p in list_p:
        p.join()

Result:

pid:9776 recv msg:Test1
pid:11560 recv msg:Test2
pid:4024 recv msg:Test3
pid:10828 recv msg:Test4
pid:7696 recv msg:Test5
pid:8292 recv msg:Test6
pid:2528 recv msg:Test7
pid:10152 recv msg:Test8
pid:14476 recv msg:Test9

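With the sleep delay added, the messages in this run were spread across the worker processes and printed in order. In general, the order of output from different processes is not guaranteed.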
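The shared-memory approach listed at the start of this section (Value, Array) has no example in the article, so here is a minimal sketch of one way it could be used. The function names add_many and run_demo, the four workers and the 1000 increments are all choices made for this illustration. Each worker increments a shared integer while holding the lock that comes with the Value object.

```python
import multiprocessing


def add_many(counter, times):
    """Increment the shared counter `times` times, holding its lock for each update."""
    for _ in range(times):
        with counter.get_lock():
            counter.value += 1


def run_demo(workers=4, times=1000):
    # "i" creates a shared C int; every child process sees the same memory.
    counter = multiprocessing.Value("i", 0)
    procs = [
        multiprocessing.Process(target=add_many, args=(counter, times))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_demo())
```

Unlike the tmp variable in 6.1.3, the update made in each child is visible to the parent here, because the value lives in shared memory instead of being copied.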

6.1.6 Process pools

Process pool: a fixed number of processes created up front for the caller to use;

The pool class:

from multiprocessing import Pool

Basic steps:

from multiprocessing import Pool
# Create the pool object with 3 processes
pool = Pool(processes = 3)
# Add a task and its arguments
pool.apply_async(func, (msg, ))
# Stop accepting new tasks
pool.close()
# Wait for all tasks to finish
pool.join()

msg is a placeholder in this outline, so running it unchanged fails:

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Cell In[167], line 5
      3 pool = Pool(processes = 3)
      4 # Add a task and its arguments
----> 5 pool.apply_async(func, (msg, ))
      6 # Stop accepting new tasks
      7 pool.close()


NameError: name 'msg' is not defined

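Python's process pool is a high-level tool for parallel processing, typically used to run several functions or tasks at the same time. It lets you manage a group of worker processes and so make better use of a multi-core processor. The multiprocessing module in the standard library provides the Pool class, a commonly used process pool implementation.

The following describes the Python process pool in more detail:

Import the module:
```
import multiprocessing
```
First, import the multiprocessing module.
Create the pool:
```
pool = multiprocessing.Pool(processes=4)
```
Use the multiprocessing.Pool class to create a pool. Here the pool has at most 4 processes, which means at most 4 tasks run at the same time.
Submit a task:
```
result = pool.apply_async(function, (arg1, arg2))
```
apply_async submits a function to the pool for execution. function is the function to run and (arg1, arg2) are its arguments. The call returns an AsyncResult object that can be used to fetch the function's result.
Get the result:
```
result.get()
```
get() returns the function's result. It blocks the calling thread until the task has finished and returned.
Close the pool:
```
pool.close()
pool.join()
```
close() closes the pool and join() then waits for all tasks to finish. Once the pool is closed it accepts no new tasks.

Process pool example:

The following complete example shows how to run a function in parallel with a process pool: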

import multiprocessing

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    inputs = [1, 2, 3, 4, 5]

    results = [pool.apply_async(square, (x,)) for x in inputs]
    pool.close()
    pool.join()

    for result in results:
        print(result.get())

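This example uses a process pool to compute the squares of a list of numbers in parallel.

A process pool is very useful for running many tasks on a multi-core processor because it can noticeably improve performance. It simplifies parallel programming and takes care of the underlying process management and scheduling. Use it with care, though: make sure the pool does not create too many processes, to avoid resource contention and degraded performance.

Counting lines of code with a process pool: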

from multiprocessing import Pool
import os
import time

# Count the lines in one file
def countLine(fpath):
    linenum = 0
    if fpath.endswith('.py'):
        with open(fpath, encoding="utf-8") as f:
            linenum = len(f.readlines())
    return linenum

def scan_dir(fpath, pools):
    result = []
    # Walk every file under the given directory
    for root, subdirs, flist in os.walk(fpath):
        if flist:
            for fname in flist:
                # Only handle .py files
                if fname.endswith('.py'):
                    # Build the full path
                    path = os.path.join(root, fname)
                    # Submit the task to the pool
                    r = pools.apply_async(countLine, args=(path,))
                    # Keep the AsyncResult in result
                    result.append(r)
    # Collect the results and add them up
    total= sum([r.get() for r in result])
    return total

if __name__ == "__main__":
    total = 0
    nums = 20
    src_dir = r'E:\vscode_dir\part_7\process\django'
    
    start_time = time.time()
    pools = Pool(processes=10)
    for i in range(nums):
        total += scan_dir(src_dir, pools)
    
    # Stop accepting new tasks
    pools.close()
    # Wait for the worker processes to finish
    pools.join()
    end_time = time.time()
    # Print the totals
    print("run time:%.2f, code total nums:%d"%(end_time-start_time, total))

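6.2 Threads

6.2.1 Multithreading

Thread: the smallest unit the system schedules for execution. A thread always belongs to a process;

Multithreading: several threads are started inside one process to run tasks concurrently, and the threads share the process's global resources;

Differences between processes and threads:

A thread depends on a process;
Threads share resources with each other;
Scheduling a thread costs less than scheduling a process;

Limits of multithreading in Python
GIL (Global Interpreter Lock): a concept introduced by the implementation of CPython (the reference Python interpreter).

The GIL is essentially a mutex.

What the GIL does: it prevents several threads from executing Python bytecode at the same time, at the cost of execution efficiency.

The problem with the GIL: on a multi-core CPU, Python threads cannot run CPU-bound code in parallel, which reduces the efficiency of such tasks.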
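The GIL is released while a thread waits, for example in time.sleep or during blocking I/O, so threads can still shorten the total time of wait-heavy work. The sketch below compares five simulated I/O calls run one after another and run in five threads. The function names fake_io and timed and the 0.2 second delay are assumptions for the illustration, and exact timings will vary from machine to machine.

```python
import threading
import time


def fake_io(delay):
    # time.sleep releases the GIL, much like a blocking read or network call.
    time.sleep(delay)


def timed(use_threads, n=5, delay=0.2):
    """Run fake_io n times, sequentially or in n threads, and return the elapsed seconds."""
    start = time.perf_counter()
    if use_threads:
        threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(n):
            fake_io(delay)
    return time.perf_counter() - start


sequential = timed(use_threads=False)
threaded = timed(use_threads=True)
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

For CPU-bound work the picture is different, which is where the multiprocessing approach from section 6.1 comes in.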

6.2.2 The threading module and its use

import threading
# Thread function
def thread_func(*args, **kwargs):
    print("in thread func")

def main():
    # Create the thread object
    t = threading.Thread(target=thread_func, args=())
    # Start the thread, which runs the thread function
    t.start()
    print("in main func")
    # Wait for the thread to finish
    t.join()

if __name__ == "__main__":
    main()

in thread func
in main func

# Using several threads
import threading
import time

g_value = 1

# Thread function
def thread_func(*args, **kwargs):
    global g_value
    g_value += 1
    # Sleep for 1 second
    time.sleep(1)
    # Get the current thread object
    t = threading.current_thread()
    # Print the thread name and ident (t.name replaces the deprecated t.getName())
    print("name:%s ident:%d"%(t.name, t.ident))

def main():
    thread_num = 5
    thread_list = []
    
    # Create and start the threads
    for i in range(thread_num):
        name = "thread_%d"%i
        t = threading.Thread(name=name, target=thread_func, args=())
        thread_list.append(t)
        t.start()
    
    # Wait for the threads to finish
    for t in thread_list:
        t.join()

if __name__ == "__main__":
    main()
    print("g_value:", g_value)

name:thread_0 ident:16832
name:thread_4 ident:16416
name:thread_2 ident:3236
name:thread_1 ident:17212
name:thread_3 ident:10840
g_value: 6

The output shows that:

The order in which the threads run is not fixed;
Threads share resources (the value of g_value changed);

6.2.3 Operating on a global variable

Requirement:

Define a variable g_value = 10000,
The main thread adds 1 to the variable 500,000 times;
A child thread subtracts 1 from the variable at the same time, also 500,000 times;
Finally, check the value of the variable

from threading import Thread

g_value = 10000
nums = 500000

def sub_func():
    # Subtract 1
    global g_value
    for i in range(nums):
        g_value -= 1

def add_func():
    # Add 1
    global g_value
    for i in range(nums):
        g_value += 1

if __name__ == "__main__":
    # Create the thread object
    t = Thread(target=sub_func, name='test')
    # Start the thread
    t.start()
    add_func()
    # Wait for the thread to finish
    t.join()
    print(f'g_value={g_value}')

g_value=10000
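The run above happened to print 10000, but g_value += 1 is a read-modify-write sequence that is not guaranteed to be atomic across threads, so whether updates get lost depends on the interpreter version and on timing. A minimal sketch of one common remedy, assuming a threading.Lock is acceptable here: the lock and the smaller loop count are choices made for this illustration and are not part of the original code.

```python
from threading import Lock, Thread

g_value = 10000
nums = 200000          # smaller than the original 500000 to keep the demo quick
lock = Lock()


def sub_func():
    global g_value
    for _ in range(nums):
        with lock:     # only one thread at a time may update g_value
            g_value -= 1


def add_func():
    global g_value
    for _ in range(nums):
        with lock:
            g_value += 1


t = Thread(target=sub_func, name="test")
t.start()
add_func()
t.join()
print(f"g_value={g_value}")
```

With the lock held around every update, the additions and subtractions cancel out exactly and the final value is the starting value.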
