
1. OpenCV Basics and an Introduction to the DNN Module (Python Interface)

Author: 空白诗007 | 2024-08-04 07:53:04


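I. Installing OpenCV

1.1 Installing with pip

Offline installation:
- Download the package from the Tsinghua (TUNA) mirror in China, then install it locally with pip install xxx
- Main package (opencv-python): https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple/opencv-python/
- Contrib package (opencv-contrib-python): https://pypi.tuna.tsinghua.edu.cn/simple/opencv-contrib-python/
Online installation (install only one of the two; the contrib package already contains the main modules):
- pip install -i https://pypi.tuna.tsinghua.edu.cn/simple opencv-python
- pip install -i https://pypi.tuna.tsinghua.edu.cn/simple opencv-contrib-python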

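1.2 Installing with conda

conda install -c https://conda.anaconda.org/menpo opencv3
(The menpo channel ships an old OpenCV 3 build; for a current version, conda install -c conda-forge opencv is the usual choice.)
Open ipython and run a quick test:
```
import cv2
print(cv2.__version__)
```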

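II. OpenCV Basics

2.1 Reading, displaying, and writing images

import os
import cv2
import imageio
import numpy as np
import matplotlib.pyplot as plt

# Read an image. The second argument is 1 (default, load as color; may be omitted) or 0 (load as grayscale).
# Note: cv2.imread can fail when the path contains Chinese characters (notably on Windows); see the workaround below.
im = cv2.imread('empire.jpg', 1)   # imread() returns a standard NumPy array (None if the file cannot be read)

# The array is laid out as HWC, so when cropping, index y first and then x.
nx, ny, size = 0, 0, 100  # illustrative crop origin (x, y) and side length
cropped_im = im[ny: ny + size, nx: nx + size, :]  # height and width correspond to the y and x axes of the image
# Note: caffe.io.load_image returns float data in [0, 1] with RGB channel order (equivalent to im[:, :, ::-1] / 255.0 on a cv2 image)
# Note: imageio.imread(img_path) returns uint8 RGB data in HWC layout; save with imageio.imwrite(save_path, img)

height, width, channel = im.shape
print(height, width, channel)


# Display the image. The first argument is the window name, the second the image; the window is sized to the image.
# Note: im must be in BGR order; convert it to uint8 first in case it is float32.
cv2.imshow('image', im.astype('uint8'))
cv2.waitKey(0)  # wait indefinitely for a key press so the window does not vanish immediately
cv2.destroyAllWindows()  # close all windows

# Save the image (the path must include a file extension, which selects the format).
# Note: im must be in BGR order; convert it to uint8 first in case it is float32.
cv2.imwrite('result.png', im.astype('uint8'))

# Display with matplotlib (shows pixel coordinates and values) and save the figure.
# matplotlib expects RGB, so convert from BGR first.
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # other codes: cv2.COLOR_BGR2HSV, cv2.COLOR_HSV2BGR
# Alternatives for BGR -> RGB: im = im[:, :, ::-1]; in TensorFlow, im = tf.reverse(im, axis=[-1])
# Swap color channels explicitly: im = im[:, :, (2, 1, 0)]

plt.imshow(im, cmap='gray', interpolation='bicubic')  # cmap only takes effect for single-channel images
plt.savefig('figpath.png', bbox_inches='tight')
plt.show()


# -------------------------
# Reading and saving images under paths that contain Chinese characters
# (img_path, img_save_path and img_name are assumed to be defined by the caller)
# -------------------------

# 1. Read: np.fromfile needs dtype=np.uint8, and the flags argument of imdecode (1 here) cannot be omitted as it can with imread.
img_bgr = cv2.imdecode(np.fromfile(img_path, dtype=np.uint8), 1)

# 2. Save: imencode returns (success flag, encoded buffer); [1] is the encoded image data, [0] is True or False.
cv2.imencode('.jpg', img_bgr)[1].tofile(os.path.join(img_save_path, img_name))
# Note: '.jpg' only selects the encoding; img_name must still carry the '.jpg' suffix.

# -------------------------
# Create a pure white image and save it
# -------------------------
img = np.zeros((2176, 4096), np.uint8)  # (h, w), single channel
img.fill(255)  # white background
cv2.imwrite('4k.jpg', img)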
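The channel-order notes above can be checked without any image file. The following is a minimal sketch on a tiny synthetic array (the array contents and variable names are illustrative assumptions, not part of the original post); it shows that the two BGR -> RGB idioms agree and that cropping indexes y before x.

```python
import numpy as np

# Synthetic 2x2 "BGR" image: every pixel is pure blue (B=255, G=0, R=0).
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 0] = 255

# Two equivalent ways to reverse the channel axis (BGR -> RGB).
rgb_slice = bgr[:, :, ::-1]        # reversed view of the last axis
rgb_index = bgr[:, :, (2, 1, 0)]   # explicit channel permutation (makes a copy)

# HWC layout: rows (y) are indexed first, columns (x) second.
ny, nx, size = 0, 1, 1
crop = bgr[ny:ny + size, nx:nx + size, :]

print(rgb_slice[0, 0])  # the blue value now sits in the last slot
print(crop.shape)
```

Note that the slice form returns a view with a negative stride; wrapping it in np.ascontiguousarray is a cheap safeguard before handing it to code that expects contiguous memory.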

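2.2 Color space conversion

OpenCV stores images in BGR channel order (the reverse of RGB) rather than the conventional RGB order. Images are read as BGR by default, and cvtColor() converts between color spaces.

# 1. Read with OpenCV (BGR order) and create a grayscale image
im = cv2.imread('empire.jpg')
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)

# 2. Read with matplotlib.image (RGB order) and create a grayscale image
import matplotlib.image as mpl_img
im = mpl_img.imread('empire.jpg')
gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)

# Note: 1 and 2 differ only in the color conversion code
# Commonly used codes: cv2.COLOR_BGR2RGB, cv2.COLOR_GRAY2BGR, cv2.COLOR_BGR2HSV

// C++: an HWC image (BGR) is stored as BGRBGRBGR...; C varies fastest, then W, then H
// (the three channel values B, G, R of one pixel are adjacent in memory)
for (int i = 0; i < h; i++)
	for (int j = 0; j < w; j++)
		for (int k = 0; k < c; k++)
			{ /* element (i, j, k) is at index (i * w + j) * c + k */ }

// C++: a CHW image (RGB) is stored as RRR...GGG...BBB...; W varies fastest, then H, then C
// (each channel is a complete plane, so the values of one pixel are h * w elements apart)
for (int i = 0; i < c; i++)
	for (int j = 0; j < h; j++)
		for (int k = 0; k < w; k++)
			{ /* element (i, j, k) is at index (i * h + j) * w + k */ }

// HWC -> CHW, i.e. BGRBGRBGR... -> BBB...GGG...RRR... (hw_p = h * w, the number of pixels)
for (int c = 0; c < 3; c++)
{
    for (int hw = 0; hw < hw_p; hw++)
    {
        pInputChw[c * hw_p + hw] = pInputHwc[c + hw * 3];
    }
}


# Converting a PIL Image to BGR (PIL's convert() has no BGR mode; it converts between modes such as "L", "RGB" and "CMYK")
from PIL import Image
import numpy as np

image = Image.open("rgb_image.jpg")  # open an RGB image
np_image = np.array(image)  # Image -> NumPy array (HWC, RGB)
bgr_image = np.ascontiguousarray(np_image[:, :, ::-1])  # reverse the channel axis: RGB -> BGR
bgr_image = Image.fromarray(bgr_image)  # NumPy array -> Image (PIL still interprets the data as RGB)
bgr_image.save("bgr_image.jpg")  # save the channel-swapped image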

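The HWC -> CHW index arithmetic from the C++ loop above can be cross-checked in NumPy. This is a small sketch on a synthetic array (sizes and names are assumptions chosen for illustration); it compares the flat-index loop with a plain transpose.

```python
import numpy as np

h, w, c = 2, 3, 3
# Interleaved HWC data: the three channel values of each pixel are adjacent.
hwc = np.arange(h * w * c, dtype=np.uint8).reshape(h, w, c)

# Planar CHW via transpose; ascontiguousarray materialises the new memory order.
chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))

# The same conversion written with the flat indices used in the C++ loop.
flat_hwc = hwc.ravel()
hw_p = h * w  # number of pixels per channel plane
flat_chw = np.empty_like(flat_hwc)
for ch in range(c):
    for hw in range(hw_p):
        flat_chw[ch * hw_p + hw] = flat_hwc[ch + hw * c]

print(np.array_equal(chw.ravel(), flat_chw))
```

In Python code the transpose form is normally preferable; the explicit loop is only there to mirror the memory layout described for C++.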

2.3 Drawing lines, rectangles, circles, and polygons (curves) on an image

Note: the coordinates passed in must be of type int.

Drawing a line: cv2.line()

import cv2

# Read the image (BGR order)
img = cv2.imread('empire.jpg')

# Arguments: image, start point, end point, color, thickness
# color : Color of the shape. for BGR, pass it as a tuple, eg: (255,0,0) for blue. For grayscale, just pass the scalar value.
# thickness : if -1 is passed for closed figures like circles, it will fill the shape, default thickness = 1.
img = cv2.line(img, (0, 0), (511, 511), (255, 0, 0), 5)

Drawing a rectangle: cv2.rectangle()

# Arguments: image, top-left corner, bottom-right corner, color, thickness
img = cv2.rectangle(img, (384, 0), (510, 128), (0, 255, 0), 3)

Drawing a circle: cv2.circle()

# Arguments: image, center point, radius, color, thickness
img = cv2.circle(img, (447, 63), 3, (0, 0, 255), -1)
# If -1 is passed for closed figures like circles, it will fill the shape. default thickness = 1

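Drawing a polygon (including curves): cv2.polylines()

import numpy as np

# The array must have dtype int32. If the equation of a curve is known, sample many points from it to draw the curve.
pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)

# -1 as the first dimension means its length (the number of points) is inferred from the remaining dimensions
# With 4 vertices the array becomes 4 x 1 x 2
pts = pts.reshape((-1,1,2))

# If the third argument is False, the polygon is left open (first and last points are not joined)
img = cv2.polylines(img, [pts], True, (0, 255, 255))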
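The remark about drawing a curve from its equation can be made concrete. Below is a minimal sketch that only prepares the point array; the helper name curve_to_polyline_points and the parabola are hypothetical choices for illustration, not part of the original post.

```python
import numpy as np

def curve_to_polyline_points(xs, fn):
    """Sample y = fn(x) at the given x values and pack the points for cv2.polylines."""
    ys = fn(xs)
    pts = np.stack([xs, ys], axis=1)        # shape (N, 2), floating point
    pts = np.round(pts).astype(np.int32)    # polylines expects int32 coordinates
    return pts.reshape((-1, 1, 2))          # shape (N, 1, 2)

xs = np.arange(0, 100, 10)
pts = curve_to_polyline_points(xs, lambda x: 0.01 * x ** 2)
print(pts.shape, pts.dtype)
```

The result could then be passed as cv2.polylines(img, [pts], False, color) so that the curve is left open rather than closed into a polygon.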

Adding text to an image: cv2.putText()

# plate_det is the author's own detector; i, xmin and ymin come from the surrounding detection loop
det_box, det_conf, det_cls = plate_det(img_bgr)

# Arguments 3 to 7: org (bottom-left corner where the text starts), font face, font scale, color, thickness
label = ('background', 'plate')
det_out_info = "%s:%.3f" % (label[int(det_cls[i])], det_conf[i])
cv2.putText(img_bgr, det_out_info, (xmin, ymin-10), cv2.FONT_HERSHEY_PLAIN, 2, (0, 0, 255), 2)

Drawing detected bboxes on an image

import os
import cv2
import numpy as np

# 1. Drawing bboxes with English text
PLATE_CLASSES2 = ("background", "plate")

# sp: save_path; it must include the directory, the image name, and the extension
def draw_boxes(img, boxes, conf, cls, sp):
    for i in range(len(boxes)):
        p1 = (int(boxes[i][0]), int(boxes[i][1]))  # coordinates must be int
        p2 = (int(boxes[i][2]), int(boxes[i][3]))
        cv2.rectangle(img, p1, p2, (0, 255, 0), 2)
        p3 = (max(p1[0], 15), max(p1[1]-10, 15))  # keep the label inside the image
        title = "%s:%.2f" % (PLATE_CLASSES2[int(cls[i])], conf[i])
        cv2.putText(img, title, p3, cv2.FONT_ITALIC, 0.7, (0, 0, 255), 2)
    cv2.imencode('.jpg', img)[1].tofile(sp)


# 2. Drawing bboxes with Chinese text (cv2.putText cannot render Chinese, so PIL draws the text)
from PIL import Image, ImageDraw, ImageFont

def cv2_add_cn(img, text, left, top, text_color=(255, 0, 0), text_size=17):
    if isinstance(img, np.ndarray):  # convert an OpenCV image to a PIL image
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    # Draw the text with PIL
    # Fonts are usually stored under /usr/share/fonts/opentype/noto/; use locate *.ttc to find them
    draw = ImageDraw.Draw(img)
    font_text = ImageFont.truetype("/usr/share/fonts/opentype/noto/NotoSansCJK-Bold.ttc", text_size, encoding="utf-8")
    draw.text((left, top), text, text_color, font=font_text)

    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

# Draw the bboxes and save the error image
# (plate_bbox, plate_num, img_bgr and img_name come from the author's own pipeline)
for i in range(len(plate_bbox)):
    xmin, ymin, xmax, ymax = plate_bbox[i]
    cv2.rectangle(img_bgr, (xmin - 5, ymin - 5), (xmax + 5, ymax + 5), (0, 255, 0), 2)  # extend by 5 pixels for visibility
    img_bgr = cv2_add_cn(img_bgr, plate_num[i], max(xmin, 15), max(ymin - 30, 15), (255, 0, 0), 17)
cv2.imencode('.jpg', img_bgr)[1].tofile(os.path.join('imgs/error_ori_imgs', img_name))  # save once, after all boxes are drawn

2.4 Basic operations on images

Reading and modifying pixel values

import cv2
import numpy as np

img = cv2.imread('messi5.jpg')

px = img[100, 100]
print(px)
# [57 63 68]

# accessing only blue pixel
blue = img[100, 100, 0]
print(blue)
# 57

# modify the pixel
img[100, 100] = [255, 255, 255]
print(img[100, 100])
# [255 255 255]

# set every value in channel 2 (red, in BGR order) to 0
img[:, :, 2] = 0

Reading image properties

img = cv2.imread('messi5.jpg')

print(img.shape)
# (960, 1280, 3)
print(img.size)
# 3686400
print(img.dtype)
# uint8

Selecting an image region

img = cv2.imread('messi5.jpg')

# select the ball and copy it to another region
ball = img[280:340, 330:390]  # Note: indices 340 and 390 are excluded, so add 1 to the end index when slicing out an inclusive bbox
img[273:333, 100:160] = ball
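The exclusive end index is easy to get wrong when a bbox is given with inclusive corners. A minimal sketch, assuming a hypothetical helper crop_bbox and a synthetic image (neither appears in the original post):

```python
import numpy as np

def crop_bbox(img, xmin, ymin, xmax, ymax):
    """Crop a bbox whose corners are inclusive from an HWC image (rows = y, columns = x)."""
    return img[ymin:ymax + 1, xmin:xmax + 1]

img = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
patch = crop_bbox(img, xmin=2, ymin=1, xmax=4, ymax=3)

print(patch.shape)            # 3 x 3 pixels: both corners are included
print(img[1:3, 2:4].shape)    # the plain slice stops one pixel short on each axis
```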

Normalizing to $(-1, 1)$

import numpy as np

# ***** normalize to (-1, 1) *****
def processImage(imgs):
    """
        process images before feeding to CNNs
        imgs: N x 1 x W x H
    """
    imgs = imgs.astype(np.float32)
    for i, img in enumerate(imgs):
        imgs[i] = (img - 127.5) / 128  # maps [0, 255] to about [-0.996, 0.996]
    return imgs
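Because NumPy broadcasts arithmetic over the whole array, the per-image loop is not strictly needed. A vectorized sketch of the same formula follows; the function name normalize_batch and the tiny test batch are assumptions made for illustration.

```python
import numpy as np

def normalize_batch(imgs):
    """Map uint8 pixel values in [0, 255] to roughly (-1, 1) in one vectorised step."""
    return (imgs.astype(np.float32) - 127.5) / 128.0

# A 1 x 1 x 2 x 2 batch holding the extreme and middle pixel values.
batch = np.array([0, 127, 128, 255], dtype=np.uint8).reshape(1, 1, 2, 2)
out = normalize_batch(batch)
print(out.min(), out.max())
```

The extremes land at about -0.996 and 0.996, so the output stays strictly inside the open interval (-1, 1).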

Inverting colors

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import cv2
import os
import numpy as np

def inverse_color(image):
    """Invert every channel of every pixel (255 - value)."""
    height, width = image.shape[:2]
    img = image.copy()
    for i in range(height):
        for j in range(width):
            img[i, j, :] = (255 - image[i, j, 0], 255 - image[i, j, 1], 255 - image[i, j, 2])
    return img

os.makedirs('inverse_color', exist_ok=True)  # make sure the output directory exists
for img_name in os.listdir('test_img'):
    # 1. Read: np.fromfile needs dtype=np.uint8, and the flags argument (1) cannot be omitted as it can with imread
    img_bgr = cv2.imdecode(np.fromfile('test_img/' + img_name, dtype=np.uint8), 1)

    # 2. Invert the colors
    img_bgr = inverse_color(img_bgr)

    # 3. Save: [1] is the encoded image data, [0] is True or False
    cv2.imencode('.jpg', img_bgr)[1].tofile('inverse_color/%s_inv.jpg' % os.path.splitext(img_name)[0])

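The nested per-pixel loop is slow on large images. As a sketch of an alternative, the same inversion can be written as a single array expression; the comparison below uses a small random array, and inverse_color_fast is a hypothetical name rather than the original author's function.

```python
import numpy as np

def inverse_color_loop(image):
    """Per-pixel inversion, mirroring the loop-based version."""
    out = image.copy()
    height, width = image.shape[:2]
    for i in range(height):
        for j in range(width):
            out[i, j, :] = 255 - image[i, j, :]
    return out

def inverse_color_fast(image):
    """Vectorised inversion: one subtraction over the whole uint8 array."""
    return 255 - image

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
print(np.array_equal(inverse_color_loop(image), inverse_color_fast(image)))
```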

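2.5 Video processing

Frame: a video is a sequence of images called frames, taken at a fixed time interval of 1000/FPS milliseconds.
Frame rate: the speed at which frames are captured (or played), measured in frames per second (FPS), i.e. the number of frames shown in one second. Typical videos run at 25 or 30 FPS.
Video processing: extract the individual frames, process each one with image processing methods, then write them back out as a video.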
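The relationship between frame rate and frame interval is simple arithmetic. A minimal sketch with hypothetical helper names, including a delay value that could be handed to cv2.waitKey during playback:

```python
def frame_interval_ms(fps):
    """Time between consecutive frames in milliseconds (1000 / FPS)."""
    return 1000.0 / fps

def frame_timestamp_ms(frame_index, fps):
    """Timestamp of a frame, counting the first frame as index 0."""
    return frame_index * frame_interval_ms(fps)

def wait_key_delay(fps):
    """Integer delay for cv2.waitKey; at least 1, because 0 blocks until a key is pressed."""
    return max(1, int(1000 / fps))

print(frame_interval_ms(25))        # interval at 25 FPS
print(frame_timestamp_ms(50, 25))   # when frame 50 is shown
print(wait_key_delay(30))
```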

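(1) The cv2.VideoCapture class

# 1. Construct a cv2.VideoCapture from either a camera or a local video file; the constructor returns the capture object
vc = cv2.VideoCapture(0)  # int: camera ID. -1 lets OpenCV choose any available camera; with several cameras, 0 is the first and 1 the second
vc = cv2.VideoCapture('./input/test.mp4')  # str: path and file name of a video

# 2. Query properties of the capture object (the video)
fps = round(vc.get(cv2.CAP_PROP_FPS))  # frame rate; the numeric property ID 5 also works
frame_count = round(vc.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames; property ID 7
frame_width = round(vc.get(cv2.CAP_PROP_FRAME_WIDTH))  # frame width; property ID 3
frame_height = round(vc.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height; property ID 4

# 3. Check whether initialization succeeded: True on success, False on failure
vc_ret = vc.isOpened()

# 4. After successful initialization, frames can be read from the camera or video one by one
# A successful read returns True and the current frame (BGR)
# At the end of the video the returned flag is False
ret, frame = vc.read()

# 5. Release the camera or video file
vc.release()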

(2) The cv2.VideoWriter class

# 1. Choose the codec (FOURCC) of the output video
fourcc = cv2.VideoWriter_fourcc(*'MP4V')  # MPEG-4, file extension .mp4
fourcc = cv2.VideoWriter_fourcc(*"FLV1")  # Flash video, file extension .flv
fourcc = cv2.VideoWriter_fourcc(*"I420")  # uncompressed YUV, file extension .avi (very compatible, large files)
fourcc = cv2.VideoWriter_fourcc(*'XVID')  # MPEG-4, file extension .avi (moderate file size, recommended)
fourcc = cv2.VideoWriter_fourcc(*"X264")  # or "X265"; a FOURCC is exactly four characters. Gives very small files, saved as .mp4 or .mkv

# 2. Create a VideoWriter object
# vw = cv2.VideoWriter(filename, fourcc, fps, frameSize, isColor)
# - filename: path and file name of the output video; an existing file is overwritten
# - fourcc: codec of the output video, usually the same as the input video
# - fps: frame rate of the output video, usually 25 or 30
# - frameSize: (width, height) of each output frame, usually taken from the input video
# - isColor: whether the frames are color images
vw = cv2.VideoWriter("./output/output.mp4", fourcc, 30, (frame_width, frame_height), True)


# 3. Write each frame to the video (BGR format)
vw.write(frame)  # append frame to output.mp4 through the writer object

# 4. Release the VideoWriter object
vw.release()
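A FOURCC is just four ASCII characters packed into a 32-bit integer, lowest byte first, which is also why exactly four characters are required. The sketch below reproduces that packing in plain Python; fourcc_to_int and int_to_fourcc are hypothetical helper names, and the reverse direction is assumed to be useful for decoding a value such as the one reported by cv2.CAP_PROP_FOURCC.

```python
def fourcc_to_int(code):
    """Pack a four-character code into a 32-bit integer, first character in the lowest byte."""
    if len(code) != 4:
        raise ValueError("a FOURCC must have exactly four characters")
    value = 0
    for i, ch in enumerate(code):
        value |= (ord(ch) & 0xFF) << (8 * i)
    return value

def int_to_fourcc(value):
    """Unpack a 32-bit integer back into its four-character code."""
    return "".join(chr((int(value) >> (8 * i)) & 0xFF) for i in range(4))

packed = fourcc_to_int("mp4v")
print(hex(packed), int_to_fourcc(packed))
```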

(3) Example: reading and saving a video

When playing each frame, use cv2.waitKey() to set a suitable delay. If the delay is too short the video plays very fast, and if it is too long the video plays slowly (this is one way to control the playback speed). 25 milliseconds is usually fine.

import cv2
import numpy as np

# Open the input video file; returns the capture object
vc = cv2.VideoCapture('./input/test.mp4')

# Query the properties of the input video
fps = round(vc.get(cv2.CAP_PROP_FPS))  # frame rate; property ID 5
frame_count = round(vc.get(cv2.CAP_PROP_FRAME_COUNT))  # total number of frames; property ID 7
frame_width = round(vc.get(cv2.CAP_PROP_FRAME_WIDTH))  # frame width; property ID 3
frame_height = round(vc.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height; property ID 4
print("the input video's fps is {}, total frame count is {}, frame width is {}, frame height is {}".
      format(fps, frame_count, frame_width, frame_height))

# Set the codec and other parameters of the output video
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
vw = cv2.VideoWriter("./output/output.mp4", fourcc, fps, (frame_width, frame_height), True)

# If the input video opened successfully, read it frame by frame
while vc.isOpened():
    ret, frame = vc.read()

    # If the frame was read successfully, process it
    if ret:
        frame = cv2.flip(frame, 0)  # flip vertically; flip returns a new image, so keep the result
        vw.write(frame)  # write the flipped frame
        cv2.imshow('out_video_show', frame)

        # cv2.waitKey(t): how long each frame stays on screen, in ms; usually int(1000 / fps)
        # With no key press it waits t ms and returns -1; a key press returns the key's ASCII code, and 'q' stops reading here
        # Adjusting t allows frame-by-frame inspection
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break
    else:
        print('video read done!')
        break

# Release resources
vw.release()
vc.release()
cv2.destroyAllWindows()

Frame extraction

import os
import cv2
import shutil


def get_frame_from_video(video_name, interval):
    """
    Save every interval-th frame of a video as a JPEG.

    Args:
        video_name: path of the input video
        interval: number of frames between two saved images
    Returns:
        None
    """

    # Directory for the saved images (the video path without its .mp4 suffix)
    save_path = video_name.split('.mp4')[0] + '/'
    if not os.path.exists(save_path):
        os.makedirs(save_path)
        print('created directory %s' % save_path)
    else:
        shutil.rmtree(save_path)
        os.makedirs(save_path)
        print('directory %s already existed and was recreated' % save_path)

    video_capture = cv2.VideoCapture(video_name)  # open the video file
    fps = int(video_capture.get(5))  # property 5 is the frame rate; 7 is the total frame count, 3 is W, 4 is H
    print('fps:', fps)

    i = 0  # index of the current frame
    j = 0  # number of images saved so far
    while True:
        success, frame = video_capture.read()
        if not success or frame is None:
            print('video read done!')
            break

        if i % interval == 0:
            save_name = save_path + str(j) + '_' + str(i) + '.jpg'
            cv2.imwrite(save_name, frame)
            print('image %s is saved' % save_name)
            j += 1

        i += 1
        print("i is {}".format(i))

    video_capture.release()  # release the video file


if __name__ == '__main__':
    get_frame_from_video(video_name='../video/test_traffic.mp4', interval=3)

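The sampling rule (keep a frame whenever its index is a multiple of interval) and the j_i.jpg naming scheme can be previewed without opening a video. A small sketch with hypothetical helper names and made-up frame counts:

```python
def sampled_frames(frame_count, interval):
    """Return (j, i) pairs: j counts the saved images, i is the index of the source frame."""
    return [(j, i) for j, i in enumerate(range(0, frame_count, interval))]

def save_names(save_path, frame_count, interval):
    """File names that the extraction loop would produce for a video of frame_count frames."""
    return [f"{save_path}{j}_{i}.jpg" for j, i in sampled_frames(frame_count, interval)]

print(sampled_frames(10, 3))
print(save_names('test_traffic/', 7, 3))
```

Listing the expected names first is a cheap way to estimate how many images an interval will produce before running the extraction on a long video.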

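III. The DNN Module in OpenCV

3.1 DNN overview

Since version 3.3, OpenCV has included support for deep learning networks (the DNN module). It can load and run inference on models produced and exported by the mainstream deep learning frameworks.
The DNN module supports several model formats, so models can be used directly without extra conversion. The supported architectures cover common tasks such as image classification, object detection, and image segmentation.
The DNN module supports many layer types, covering most common network operations.

3.2 Overview of common DNN functions

Import: import cv2; from cv2 import dnn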

(1) dnn.blobFromImage, dnn.blobFromImages

import cv2
from cv2 import dnn

img_cv2 = cv2.imread("test.jpeg")
print("original image size: ", img_cv2.shape)  # original image size: 960,640,3

inWidth = 256
inHeight = 256
outBlob1 = cv2.dnn.blobFromImage(img_cv2,
                                scalefactor=1.0 / 255,
                                size=(inWidth, inHeight),
                                mean=(0, 0, 0),
                                swapRB=False,
                                crop=False)
print("output without cropping: ", outBlob1.shape)  # output without cropping: 1,3,256,256
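To see what the blob contains, the non-resizing part of the preprocessing can be imitated in NumPy: an optional B/R swap, mean subtraction, scaling, and the move from HWC to NCHW. This is only a conceptual sketch under those assumptions, not OpenCV's implementation; resizing and cropping are left out, and blob_from_image_sketch is a hypothetical name.

```python
import numpy as np

def blob_from_image_sketch(img_hwc, scalefactor=1.0, mean=(0, 0, 0), swap_rb=False):
    """Swap B/R if requested, subtract the mean, scale, then reorder HWC -> NCHW."""
    img = img_hwc.astype(np.float32)
    if swap_rb:
        img = img[:, :, ::-1]                       # reverse the channel axis
    img = (img - np.array(mean, dtype=np.float32)) * scalefactor
    chw = np.ascontiguousarray(img.transpose(2, 0, 1))
    return chw[np.newaxis, ...]                     # add the batch dimension

# Synthetic 4x6 image with distinct channel values (10, 20, 30).
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, :, 0], img[:, :, 1], img[:, :, 2] = 10, 20, 30

blob = blob_from_image_sketch(img, scalefactor=1.0 / 255, swap_rb=True)
print(blob.shape, blob.dtype)
```

The output shape is (1, 3, H, W), which matches the 1,3,256,256 layout printed in the example once the image has been resized.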

(2) dnn.NMSBoxes
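cv2.dnn.NMSBoxes takes boxes, their scores, a score threshold, and an NMS (overlap) threshold, and returns the indices of the boxes to keep. The idea behind it, greedy non-maximum suppression, can be sketched in plain NumPy. The version below is an illustration under the assumption that boxes are given as (left, top, width, height); it is not OpenCV's implementation, and nms_sketch is a hypothetical name.

```python
import numpy as np

def nms_sketch(boxes, scores, score_threshold, nms_threshold):
    """Greedy NMS over (left, top, width, height) boxes; returns the kept indices."""
    if len(boxes) == 0:
        return []
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]

    # Candidates above the score threshold, best score first.
    order = [int(i) for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        remaining = []
        for i in order:
            iw = max(0.0, float(min(x2[best], x2[i]) - max(x1[best], x1[i])))
            ih = max(0.0, float(min(y2[best], y2[i]) - max(y1[best], y1[i])))
            inter = iw * ih
            iou = inter / float(areas[best] + areas[i] - inter)
            if iou <= nms_threshold:        # keep boxes that overlap only a little
                remaining.append(i)
        order = remaining
    return keep

boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [200, 200, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores, score_threshold=0.3, nms_threshold=0.5))
```

The second box overlaps the first almost completely and has a lower score, so it is suppressed, while the distant third box survives.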

(3) dnn.readNet

(4) YOLOv3/v4 detection in practice

Simple version: OpenCV 4.5.2. Building from source enables GPU acceleration; CPU acceleration requires configuring OpenVINO.

import cv2
import argparse
import numpy as np

net_cfg = [
    {'net_name': 'yolov3', 'use_gpu': False,
     'conf_threshold': 0.3, 'nms_threshold': 0.3,
     'input_width': 416, 'input_height': 416,
     'cls_file': 'models/yolov3/coco.names',
     'model_weights': 'models/yolov3/yolov3.weights',
     'model_config': 'models/yolov3/yolov3.cfg'
     },
    {'net_name': 'yolov4', 'use_gpu': False,
     'conf_threshold': 0.3, 'nms_threshold': 0.3,
     'input_width': 608, 'input_height': 608,
     'cls_file': 'models/yolov4/coco.names',
     'model_weights': 'models/yolov4/yolov4.weights',
     'model_config': 'models/yolov4/yolov4.cfg'
     }
]


class YOLO(object):
    def __init__(self, cfg):
        print('Net use', cfg['net_name'])
        self.use_gpu = cfg['use_gpu']
        self.conf_threshold = cfg['conf_threshold']
        self.nms_threshold = cfg['nms_threshold']
        self.input_width = cfg['input_width']
        self.input_height = cfg['input_height']

        np.random.seed(666)
        with open(cfg['cls_file']) as f:
            self.classes = f.read().strip().split("\n")  # names of the classes that can be detected
        self.colors = np.random.randint(0, 255, size=(len(self.classes), 3), dtype='uint8')  # one color per class
        self.net = cv2.dnn_DetectionModel(model=cfg['model_weights'],
                                          config=cfg['model_config'])  # load the YOLO config and weights and build the network

        if self.use_gpu:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
        else:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    def detect(self, img_bgr):
        self.net.setInputParams(scale=1 / 255.0, size=(self.input_width, self.input_height),
                                mean=(0, 0, 0), swapRB=True, crop=False)  # explicit defaults instead of None
        classes, scores, boxes = self.net.detect(img_bgr, self.conf_threshold, self.nms_threshold)

        # Draw every box with its score
        for class_id, score, box in zip(classes, scores, boxes):
            color = self.colors[int(class_id)].tolist()
            text = "{}: {:.2f}".format(self.classes[int(class_id)], float(score))
            cv2.rectangle(img_bgr, box, color, 1, lineType=cv2.LINE_AA)  # (left_x, top_y, w, h)
            cv2.putText(img_bgr, text, (box[0], box[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.3, color, 1,
                        lineType=cv2.LINE_AA)  # 0.3 is the font scale, 1 the line thickness

        # Show the result
        cv2.imshow("Object Detection using YOLO and OpenCV4", img_bgr)
        cv2.waitKey(0)

        return np.array(boxes)  # np.concatenate((boxes, scores, classes), axis=-1)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Object Detection using YOLO and OpenCV4')
    parser.add_argument('--img_path', type=str, default='test_imgs/car.jpg', help='image path')
    parser.add_argument('--net_type', type=int, default=1, choices=[0, 1])
    args = parser.parse_args()

    net_yolo = YOLO(net_cfg[args.net_type])
    img_bgr = cv2.imread(args.img_path)
    dets = net_yolo.detect(img_bgr)
    print(dets)


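Detailed version with the post-processing written out: OpenCV 4.5.2. Building from source enables GPU acceleration; CPU acceleration requires configuring OpenVINO.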

import cv2
import argparse
import numpy as np

net_cfg = [
    {'net_name': 'yolov3', 'use_gpu': False,
     'conf_threshold': 0.3, 'nms_threshold': 0.3,
     'input_width': 416, 'input_height': 416,
     'cls_file': 'models/yolov3/coco.names',
     'model_weights': 'models/yolov3/yolov3.weights',
     'model_config': 'models/yolov3/yolov3.cfg'
     },
    {'net_name': 'yolov4', 'use_gpu': False,
     'conf_threshold': 0.3, 'nms_threshold': 0.3,
     'input_width': 608, 'input_height': 608,
     'cls_file': 'models/yolov4/coco.names',
     'model_weights': 'models/yolov4/yolov4.weights',
     'model_config': 'models/yolov4/yolov4.cfg'
     }
]


class YOLO(object):
    def __init__(self, cfg):
        print('Net use', cfg['net_name'])
        self.use_gpu = cfg['use_gpu']
        self.conf_threshold = cfg['conf_threshold']
        self.nms_threshold = cfg['nms_threshold']
        self.input_width = cfg['input_width']
        self.input_height = cfg['input_height']

        np.random.seed(666)
        with open(cfg['cls_file']) as f:
            self.classes = f.read().strip().split("\n")  # names of the classes that can be detected
        self.colors = np.random.randint(0, 255, size=(len(self.classes), 3), dtype='uint8')  # one color per class
        self.net = cv2.dnn.readNet(model=cfg['model_weights'], config=cfg['model_config'])  # load the YOLO config and weights and build the network

        if self.use_gpu:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
        else:
            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    def postprocess(self, frame, layer_outputs):
        H, W = frame.shape[:2]

        # Post-processing: score filtering, NMS, top-k
        boxes = []  # detection boxes
        confidences = []  # confidence scores
        cls_ids = []  # class IDs

        # Loop over each output layer
        for output in layer_outputs:
            # Loop over each candidate box
            for detection in output:
                # Each detection is a 1 x 85 row: [0:4] is the bbox, [4] the objectness confidence, [5:] the class scores
                scores = detection[5:]
                cls_id = np.argmax(scores)
                confidence = scores[cls_id]

                if confidence > self.conf_threshold:
                    # Scale the box back to the image size; YOLO returns the box center plus width and height
                    box = detection[0:4] * np.array([W, H, W, H])
                    centerX, centerY, width, height = box.astype("int")

                    # Convert to the top-left corner of the box
                    left_x = int(centerX - width / 2)
                    top_y = int(centerY - height / 2)

                    # Record the box, confidence, and class
                    boxes.append([left_x, top_y, int(width), int(height)])
                    confidences.append(float(confidence))
                    cls_ids.append(int(cls_id))

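        # Non-maximum suppression keeps one box per object; boxes are given as (left_x, top_y, w, h)
        indices = cv2.dnn.NMSBoxes(boxes, confidences, self.conf_threshold, self.nms_threshold)

        # Only draw and return results when at least one box survives
        dets = []
        if len(indices) > 0:
            for i in np.array(indices).flatten():  # works whether NMSBoxes returns an N x 1 or a flat array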
                # if labels[cls_ids[i]] == "car":
                left_x, top_y = boxes[i][0], boxes[i][1]
                width, height = boxes[i][2], boxes[i][3]
                dets.append([left_x, top_y, left_x + width, top_y + height, confidences[i]])

                color = [int(c) for c in self.colors[cls_ids[i]]]
                text = "{}: {:.4f}".format(self.classes[cls_ids[i]], confidences[i])
                cv2.rectangle(frame, (left_x, top_y), (left_x + width, top_y + height), color, thickness=1,
                              lineType=cv2.LINE_AA)
                cv2.putText(frame, text, (left_x, top_y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.3, color, thickness=1,
                            lineType=cv2.LINE_AA)  # 0.3 is the font scale, 1 the line thickness
        cv2.imshow("image", frame)
        cv2.waitKey(0)

        return np.array(dets)

    def detect(self, img_bgr):
        # 1. Preprocess into a blob of shape (1, 3, input_height, input_width)
        blob = cv2.dnn.blobFromImage(img_bgr, 1 / 255.0, (self.input_width, self.input_height),
                                     mean=(0, 0, 0), swapRB=True, crop=False)
        self.net.setInput(blob)  # feed the blob to the network

        # 2. Forward pass
        # all_layers = self.net.getLayerNames()  # names of all layers in the network
        output_layers = self.net.getUnconnectedOutLayersNames()  # names of the unconnected (output) layers
        outputs = self.net.forward(output_layers)  # run inference; returns the boxes and their class probabilities

        # 3. Post-processing
        dets = self.postprocess(img_bgr, outputs)

        return dets


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--img_path', type=str, default='test_imgs/car.jpg', help='image path')
    parser.add_argument('--net_type', type=int, default=1, choices=[0, 1])
    args = parser.parse_args()

    net_yolo = YOLO(net_cfg[args.net_type])
    img_bgr = cv2.imread(args.img_path)
    dets = net_yolo.detect(img_bgr)
    print(dets)

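The box arithmetic inside postprocess can be tried on a single made-up output row without loading a network. The sketch below assumes the row layout described in the code comments (center x, center y, width, height, objectness, then class scores, all normalized to the image size); decode_detection and the sample values are hypothetical.

```python
import numpy as np

def decode_detection(detection, img_w, img_h, conf_threshold):
    """Turn one YOLO output row into (left, top, width, height, class_id, confidence),
    or return None when the best class score does not exceed the threshold."""
    scores = detection[5:]
    cls_id = int(np.argmax(scores))
    confidence = float(scores[cls_id])
    if confidence <= conf_threshold:
        return None
    # Scale the normalized center/size back to pixels, then truncate to int.
    box = detection[0:4] * np.array([img_w, img_h, img_w, img_h])
    cx, cy, bw, bh = box.astype(int)
    left = int(cx - bw / 2)
    top = int(cy - bh / 2)
    return left, top, int(bw), int(bh), cls_id, confidence

# Made-up row with three classes: centered box covering 20% x 40% of the image.
row = np.array([0.5, 0.5, 0.2, 0.4, 0.9, 0.1, 0.8, 0.05])
result = decode_detection(row, img_w=640, img_h=480, conf_threshold=0.3)
print(result)
```

The (left, top, width, height) form produced here is the same form that the boxes list holds before it is passed to NMS.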

3.3 Acceleration inside the DNN module

(1) Layer fusion

(2) Memory reuse

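3.4 Accelerating DNN with OpenVINO

(1) About OpenVINO

OpenVINO provides a complete solution for deep learning inference on Intel hardware. It supports Intel CPUs, GPUs, FPGAs, Movidius compute sticks, and other devices.
The main component of the OpenVINO toolkit is the Deep Learning Deployment Toolkit (DLDT), which consists of two parts: the Model Optimizer and the Inference Engine (IE).
- Model Optimizer: converts deep neural network models from various formats into one unified custom format (a network configuration file .xml and a model parameter file .bin) and optimizes the model during conversion
- Inference Engine: takes the network converted and optimized by the Model Optimizer, runs high-performance neural network inference on Intel devices, and returns the results to the user application

(2) How does OpenCV use OpenVINO?

OpenCV DNN can call OpenVINO in two ways:
- Model Optimizer mode: inference runs directly on a network in OpenVINO format (.xml and .bin) produced by the DLDT Model Optimizer. In this mode the network is loaded straight into the Inference Engine, which creates an Inference Engine network object
- Builder mode: the DNN module converts the network layer by layer into its internal representation and builds an internal Inference Engine network through the Inference Engine backend
- Trade-offs: compared with builder mode, Model Optimizer mode supports every layer in the network and does not need to build the DNN network layer by layer. It loads the OpenVINO model directly into the Inference Engine, which reduces errors during network loading and inference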

# 1. GPU mode of DNN: OpenCV must be built from source before the GPU can be used
# - Edit the CMake configuration: add add_definitions(-DHAVE_CUDA=1) to opencv\modules\dnn\CMakeLists.txt
#   (configuring the build with WITH_CUDA=ON and OPENCV_DNN_CUDA=ON is the more common route)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)


# 2. CPU mode of DNN accelerated by OpenVINO: OpenCV must be built from source
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)  # select the Inference Engine backend
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)  # select the compute device


// 3. Enum of computation backends supported by layers
enum Backend {
  DNN_BACKEND_DEFAULT = 0,
  DNN_BACKEND_HALIDE,
  DNN_BACKEND_INFERENCE_ENGINE,  // Intel's Inference Engine computational backend
  DNN_BACKEND_OPENCV,
  DNN_BACKEND_VKCOM,
  DNN_BACKEND_CUDA
}

// 4. Enum of target devices for computations
enum Target {
  DNN_TARGET_CPU = 0,
  DNN_TARGET_OPENCL,
  DNN_TARGET_OPENCL_FP16,
  DNN_TARGET_MYRIAD,
  DNN_TARGET_VULKAN,
  DNN_TARGET_FPGA,
  DNN_TARGET_CUDA,
  DNN_TARGET_CUDA_FP16,
  DNN_TARGET_HDDL
}


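Disclaimer: This article was contributed by a community user and does not represent the views of wpsshop博客. Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/空白诗007/article/detail/926829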
