
Deploying a YOLOv5 model on the Jetson Nano


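Contents

1. Check that CUDA is installed

2. Install pip3 (the system already ships with Python 3.6.9)

3. Check that the GPU build of TensorFlow is installed

4. Install pycuda

5. Download the tensorrtx source code

6. Model testing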


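1. Check that CUDA is installed

Run nvcc -V:

    ljx@ljx-desktop:~/pycuda2/tensorrtx-yolov5-v5.0/yolov5$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Feb_28_22:34:44_PST_2021
    Cuda compilation tools, release 10.2, V10.2.300
    Build cuda_10.2_r440.TC440_70.29663091_0
    ljx@ljx-desktop:~/pycuda2/tensorrtx-yolov5-v5.0/yolov5$

If the shell reports that nvcc is not found, CUDA may still be installed under /usr/local/cuda with /usr/local/cuda/bin missing from PATH. In that case, add it to PATH, for example in ~/.bashrc.

To check cuDNN as well, build and run the mnistCUDNN sample:

    cd /usr/src/cudnn_samples_v8/mnistCUDNN
    sudo make
    sudo chmod a+x mnistCUDNN
    ./mnistCUDNN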
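If you want a script to verify the CUDA version rather than reading the output by eye, you can parse the "release X.Y" field of `nvcc -V`. The sketch below is a minimal example; the function names are made up. It uses `stdout=PIPE` and `universal_newlines=True` rather than `capture_output`, so it also runs on the Nano's Python 3.6.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' field of `nvcc -V`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def query_nvcc_release(nvcc="nvcc"):
    """Run `nvcc -V` and parse it; None if nvcc cannot be executed (e.g. not on PATH)."""
    try:
        proc = subprocess.run(
            [nvcc, "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text output, Python 3.6 compatible
        )
    except OSError:  # includes FileNotFoundError
        return None
    return parse_nvcc_release(proc.stdout)


SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0"""

if __name__ == "__main__":
    print(parse_nvcc_release(SAMPLE))  # (10, 2)
    # On the Nano you may need nvcc="/usr/local/cuda/bin/nvcc"
    print(query_nvcc_release())
```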

2. Install pip3 (the system already ships with Python 3.6.9)

    sudo apt-get install python3-pip python3-dev

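3. Check that the GPU build of TensorFlow is installed

(1) My earlier article covers installation. Here is an example:

    sudo apt-get install libhdf5-serial-dev hdf5-tools
    pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v46 tensorflow-gpu==2.6.0+nv19.3 --user

The package name and version after "==" must match a wheel that is actually published on that index for your JetPack release (jp/v46 corresponds to JetPack 4.6). If pip cannot find the version, browse the index and adjust the command to match.

(2) Two ways to check it: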

    ljx@ljx-desktop:~/pycuda2/pycuda-2021.1$ python3
    Python 3.6.9 (default, Dec 8 2021, 21:08:43)
    [GCC 8.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torchvision
    >>> print(torchvision.__version__)
    0.11.1
    >>> import tensorflow as tf
    >>> a = tf.constant(1.)
    2022-02-21 21:25:38.178350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:25:38.179671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:25:38.180036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:25:38.194555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:25:38.196004: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:25:38.197011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:26:58.812460: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:26:58.873885: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:26:58.909564: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:26:59.039953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 41 MB memory: -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
    >>> import os
    >>> os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    >>> a = tf.constant(1.)
    >>> b = tf.constant(2.)
    >>> print(a+b)
    tf.Tensor(3.0, shape=(), dtype=float32)
    >>> print('GPU:', tf.test.is_gpu_available())
    WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.config.list_physical_devices('GPU')` instead.
    2022-02-21 21:32:15.515633: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:32:15.517432: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:32:15.518313: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:32:15.527565: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:32:15.528595: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-21 21:32:15.529327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /device:GPU:0 with 41 MB memory: -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
    GPU: True
    >>>
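The deprecation warning above recommends `tf.config.list_physical_devices('GPU')` instead of `tf.test.is_gpu_available()`. The sketch below uses that call. `TF_CPP_MIN_LOG_LEVEL` appears to be read when TensorFlow's native runtime loads, so setting it after `import tensorflow` may not silence the INFO lines. That would fit the transcript, where NUMA messages still appear after the variable is set. For that reason the sketch sets the variable before the import. The function name is made up, and it returns an empty list when TensorFlow is not installed.

```python
import os


def list_tf_gpus():
    """Names of the GPUs TensorFlow can see, or [] if TensorFlow is not installed."""
    # Only takes effect if TensorFlow has not been imported yet in this process
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # Non-deprecated replacement for tf.test.is_gpu_available()
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_tf_gpus()
    print("GPU:", bool(gpus), gpus)
```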
    ljx@ljx-desktop:~/pycuda2$ cat demo3.py
    import tensorflow as tf

    tf.compat.v1.disable_eager_execution()
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
        b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')
    # The Nano has a single GPU (GPU:0); allow_soft_placement below would also
    # fall back to it if a non-existent device such as '/gpu:1' were requested
    with tf.device('/gpu:0'):
        c = a + b
    sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True))
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(c))
    ljx@ljx-desktop:~/pycuda2$ python3 demo3.py
    2022-02-24 13:36:43.842123: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-24 13:36:45.249622: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-24 13:36:45.251626: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-24 13:38:19.897324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-24 13:38:20.908341: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-24 13:38:20.941767: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1017] ARM64 does not support NUMA - returning NUMA node zero
    2022-02-24 13:38:21.589736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 39 MB memory: -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
    2022-02-24 13:38:22.835843: I tensorflow/core/common_runtime/direct_session.cc:361] Device mapping:
    /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3
    add: (AddV2): /job:localhost/replica:0/task:0/device:GPU:0
    2022-02-24 13:38:28.950709: I tensorflow/core/common_runtime/placer.cc:114] add: (AddV2): /job:localhost/replica:0/task:0/device:GPU:0
    init: (NoOp): /job:localhost/replica:0/task:0/device:GPU:0
    2022-02-24 13:38:28.988627: I tensorflow/core/common_runtime/placer.cc:114] init: (NoOp): /job:localhost/replica:0/task:0/device:GPU:0
    a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
    2022-02-24 13:38:28.988931: I tensorflow/core/common_runtime/placer.cc:114] a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
    b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
    2022-02-24 13:38:28.989207: I tensorflow/core/common_runtime/placer.cc:114] b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
    [2. 4. 6.]

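4. Install pycuda

The official documentation covers the installation in detail. If you don't want to read it, download the source from the link below and install it with the following steps:

pycuda · PyPI

    tar zxvf pycuda-2021.1.tar.gz
    cd pycuda-2021.1/
    python3 configure.py --cuda-root=/usr/local/cuda-10.2
    sudo python3 setup.py install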
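Before running the demo below, you can quickly confirm that the freshly built pycuda can open the GPU. This is a minimal sketch; the function name is made up. It returns an empty list instead of raising when pycuda or the NVIDIA driver is missing. On the Nano it should list the single integrated GPU.

```python
def pycuda_device_names():
    """Names of the CUDA devices pycuda can open, or [] if pycuda or the driver is unavailable."""
    try:
        import pycuda.driver as cuda
    except ImportError:
        return []
    try:
        cuda.init()  # initialize the CUDA driver API
        return [cuda.Device(i).name() for i in range(cuda.Device.count())]
    except Exception:  # e.g. no NVIDIA driver on this machine
        return []


if __name__ == "__main__":
    print(pycuda_device_names())
```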

Demo test. demo2.py (listed below) first prints np.dot(a, b) computed on the CPU, then the product computed by the CUDA kernel. The two agree up to float32 rounding:

    ljx@ljx-desktop:~/pycuda2$ python3 demo2.py
    [[ 19.436962 39.908886 20.68723 ... -8.1019335 -15.546103
    -17.154585 ]
    [-19.714169 -0.6291714 9.462954 ... -15.174974 -4.1439514
    18.460089 ]
    [-17.491064 -34.86578 -12.999788 ... -17.18811 10.867537
    0.05436563]
    ...
    [ 45.716812 -32.27492 -0.5752983 ... -31.032787 -4.8378153
    7.907672 ]
    [ 6.989045 -13.123575 -2.8372145 ... 21.856476 5.0534296
    -15.905795 ]
    [ 17.042442 0.354123 -7.9831614 ... -11.882836 20.23512
    -19.761951 ]]
    [[ 19.436964 39.908894 20.687223 ... -8.101934 -15.54609
    -17.154581 ]
    [-19.71417 -0.62916106 9.46296 ... -15.174983 -4.1439533
    18.460089 ]
    [-17.491072 -34.86579 -12.999789 ... -17.188126 10.867537
    0.05437115]
    ...
    [ 45.716824 -32.27491 -0.57529545 ... -31.03278 -4.8378134
    7.907671 ]
    [ 6.989043 -13.123584 -2.8372157 ... 21.856468 5.053428
    -15.905798 ]
    [ 17.042446 0.35412684 -7.98316 ... -11.882843 20.23511
    -19.761948 ]]

    ljx@ljx-desktop:~/pycuda2$ cat demo2.py
    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    #define BLOCK_SIZE 16

    typedef struct {
        int width;
        int height;
        int stride;
        int __padding;    // keeps the 64-bit elements pointer aligned
        float* elements;
    } Matrix;

    // Read a matrix element
    __device__ float GetElement(const Matrix A, int row, int col)
    {
        return A.elements[row * A.stride + col];
    }

    // Write a matrix element
    __device__ void SetElement(Matrix A, int row, int col, float value)
    {
        A.elements[row * A.stride + col] = value;
    }

    // Get the 16x16 (BLOCK_SIZE x BLOCK_SIZE) sub-matrix at tile (row, col)
    __device__ Matrix GetSubMatrix(Matrix A, int row, int col)
    {
        Matrix Asub;
        Asub.width = BLOCK_SIZE;
        Asub.height = BLOCK_SIZE;
        Asub.stride = A.stride;
        Asub.elements = &A.elements[A.stride * BLOCK_SIZE * row + BLOCK_SIZE * col];
        return Asub;
    }

    __global__ void matrix_mul(Matrix *A, Matrix *B, Matrix *C)
    {
        int blockRow = blockIdx.y;
        int blockCol = blockIdx.x;
        int row = threadIdx.y;
        int col = threadIdx.x;
        Matrix Csub = GetSubMatrix(*C, blockRow, blockCol);
        // Each thread computes one element of Csub by accumulating into Cvalue
        float Cvalue = 0;
        // Loop over all the Asub and Bsub tiles needed to compute Csub
        for (int m = 0; m < (A->width / BLOCK_SIZE); ++m)
        {
            Matrix Asub = GetSubMatrix(*A, blockRow, m);
            Matrix Bsub = GetSubMatrix(*B, m, blockCol);
            __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
            __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];
            As[row][col] = GetElement(Asub, row, col);
            Bs[row][col] = GetElement(Bsub, row, col);
            __syncthreads();
            for (int e = 0; e < BLOCK_SIZE; ++e)
                Cvalue += As[row][e] * Bs[e][col];
            __syncthreads();
        }
        SetElement(Csub, row, col, Cvalue);
    }
    """)


    class MatrixStruct(object):
        def __init__(self, array):
            self._cptr = None
            self.shape, self.dtype = array.shape, array.dtype
            self.width = np.int32(self.shape[1])
            self.height = np.int32(self.shape[0])
            self.stride = self.width
            # Allocate device memory, copy the array into it and keep its address
            self.elements = cuda.to_device(array)

        def send_to_gpu(self):
            self._cptr = cuda.mem_alloc(self.nbytes())  # memory for one C Matrix struct
            # Copy each field to its offset in the struct
            cuda.memcpy_htod(int(self._cptr), self.width.tobytes())
            cuda.memcpy_htod(int(self._cptr) + 4, self.height.tobytes())
            cuda.memcpy_htod(int(self._cptr) + 8, self.stride.tobytes())
            cuda.memcpy_htod(int(self._cptr) + 16, np.intp(int(self.elements)).tobytes())

        def get_from_gpu(self):
            # Copy the array data back from the device
            return cuda.from_device(self.elements, self.shape, self.dtype)

        def nbytes(self):
            # 4 ints (width, height, stride, padding) + one pointer
            return self.width.nbytes * 4 + np.intp(0).nbytes


    # 400 = 25 * 16, so the matrix splits evenly into 16x16 tiles
    a = np.random.randn(400, 400).astype(np.float32)
    b = np.random.randn(400, 400).astype(np.float32)
    c = np.zeros_like(a)
    A = MatrixStruct(a)
    B = MatrixStruct(b)
    C = MatrixStruct(c)
    A.send_to_gpu()
    B.send_to_gpu()
    C.send_to_gpu()
    matrix_mul = mod.get_function("matrix_mul")
    matrix_mul(A._cptr, B._cptr, C._cptr, block=(16, 16, 1), grid=(25, 25))
    result = C.get_from_gpu()
    print(np.dot(a, b))
    print(result)
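Two details of demo2.py can be checked on the CPU: why send_to_gpu() writes at offsets 0, 4, 8 and 16, and what the tiled kernel actually computes. The sketch below assumes a little-endian 64-bit target, which is true for the Nano's ARM64 CPU. The function names are made up. It re-enacts matrix_mul in NumPy: one output tile per (blockRow, blockCol), accumulated over BLOCK_SIZE-wide slices. Because the summation order differs from np.dot, small float32 differences like those in the printed output above are expected.

```python
import struct

import numpy as np

BLOCK_SIZE = 16  # same tile size as the kernel's #define


def pack_matrix_header(width, height, stride, elements_ptr):
    """Bytes of the kernel's Matrix struct: 4 ints (incl. padding), then a 64-bit pointer."""
    # '<' = little-endian with no implicit alignment, so the padding int is explicit
    return struct.pack("<iiiiQ", width, height, stride, 0, elements_ptr)


def tiled_matmul(a, b, block=BLOCK_SIZE):
    """CPU re-enactment of matrix_mul: one output tile per (blockRow, blockCol)."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions differ")
    # Like the kernel, require every dimension to be a multiple of the tile size
    if n % block or k % block or m % block:
        raise ValueError("dimensions must be multiples of the tile size")
    c = np.zeros((n, m), dtype=np.float32)
    for block_row in range(n // block):          # blockIdx.y
        for block_col in range(m // block):      # blockIdx.x
            rows = slice(block_row * block, (block_row + 1) * block)
            cols = slice(block_col * block, (block_col + 1) * block)
            acc = np.zeros((block, block), dtype=np.float32)  # one Cvalue per thread
            for t in range(k // block):          # the kernel's loop over m
                inner = slice(t * block, (t + 1) * block)
                # a[rows, inner] plays the role of As and b[inner, cols] of Bs;
                # the tile product does every thread's `for e` loop at once
                acc += a[rows, inner] @ b[inner, cols]
            c[rows, cols] = acc
    return c


if __name__ == "__main__":
    header = pack_matrix_header(400, 400, 400, 0)
    print(len(header), struct.calcsize("<iiiiQ"))  # 24 24, same as MatrixStruct.nbytes()
    a = np.random.randn(400, 400).astype(np.float32)
    b = np.random.randn(400, 400).astype(np.float32)
    print(np.abs(tiled_matmul(a, b) - a @ b).max())
```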

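5. Download the tensorrtx source code

Go to the tensorrtx repository and download the version that matches the YOLOv5 release you trained with. To find it, click the branch button (master) at the top left, then Tags, then the yolov5 tag.

Once the download finishes, go to the download directory and unzip it with the command below. I'm using v5.0:

    unzip tensorrtx-yolov5-v5.0.zip

Put the .wts weights file generated from your trained model into tensorrtx's yolov5 folder. tensorrtx's gen_wts.py converts a trained .pt file into .wts.

If you don't have a .wts file and just want to try out the Jetson Nano, you can first download a five-class garbage-classification weights file (see https://blog.csdn.net/xiaoyuan2157):

Link: https://pan.baidu.com/s/1nciB7Xn1vXj9ZfBAoj39Bw  Extraction code: r74h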

In tensorrtx's yolov5 folder, open yololayer.h and set CLASS_NUM to the number of classes in your dataset (5 for the garbage-classification weights).
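If you rebuild for different datasets often, you can script the CLASS_NUM edit. The sketch below assumes that yololayer.h in your checkout declares the constant as `static constexpr int CLASS_NUM = 80;`. Check your copy first: the helper raises an error instead of editing anything if it does not find exactly one such line. The function name is made up.

```python
import re

# Assumed declaration form: "static constexpr int CLASS_NUM = <n>;"
_CLASS_NUM_RE = re.compile(r"(static\s+constexpr\s+int\s+CLASS_NUM\s*=\s*)\d+(\s*;)")


def set_class_num(header_text, num_classes):
    """Return header_text with the CLASS_NUM value replaced by num_classes."""
    new_text, count = _CLASS_NUM_RE.subn(
        r"\g<1>{}\g<2>".format(int(num_classes)), header_text
    )
    if count != 1:
        raise ValueError("expected exactly one CLASS_NUM definition, found {}".format(count))
    return new_text


if __name__ == "__main__":
    path = "yololayer.h"  # run from tensorrtx's yolov5 folder
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(set_class_num(text, 5))
```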

Create and enter the build folder, then run cmake .. and make:

    mkdir build
    cd build
    cmake ..
    make

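Generate the engine file:

    sudo ./yolov5 -s ../best.wts best.engine s

The arguments of this engine-generation command are:

    sudo ./yolov5 -s [.wts file path] [.engine file name] [s/m/l/x/s6/m6/l6/x6 or c/c6 gd gw]

The last argument selects the model size. It must match the model you trained (s for YOLOv5s).

After a short wait, the message "Build engine successfully!" means the build is complete, and a new best.engine file appears in the build folder.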
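To script this step, for example when rebuilding engines for several models, you can encode the argument rules above in a small helper. This is a sketch; the function name is made up. gd and gw are assumed to be the depth_multiple and width_multiple values from your model's yaml.

```python
import shlex

FIXED_VARIANTS = ("s", "m", "l", "x", "s6", "m6", "l6", "x6")
CUSTOM_VARIANTS = ("c", "c6")


def build_engine_cmd(wts_path, engine_name, variant="s", gd=None, gw=None, binary="./yolov5"):
    """Argument list for `yolov5 -s [.wts] [.engine] [s/m/l/x/s6/m6/l6/x6 or c/c6 gd gw]`."""
    cmd = [binary, "-s", wts_path, engine_name, variant]
    if variant in FIXED_VARIANTS:
        if gd is not None or gw is not None:
            raise ValueError("gd/gw are only passed with the custom variants c/c6")
    elif variant in CUSTOM_VARIANTS:
        if gd is None or gw is None:
            raise ValueError("c/c6 need both gd (depth multiple) and gw (width multiple)")
        cmd += [str(gd), str(gw)]
    else:
        raise ValueError("unknown model variant: {!r}".format(variant))
    return cmd


if __name__ == "__main__":
    cmd = build_engine_cmd("../best.wts", "best.engine", "s")
    print(" ".join(shlex.quote(part) for part in cmd))  # ./yolov5 -s ../best.wts best.engine s
    # To actually build, run from inside tensorrtx/yolov5/build:
    #   import subprocess; subprocess.run(cmd, check=True)
```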

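6. Model testing

Test it with a script adapted from the official yolov5_trt.py. It loads the plugin library libmyplugins.so and the engine from the build folder, reads frames from a USB camera, and sends the result over the serial port: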

    ljx@ljx-desktop:~/pycuda2/tensorrtx-yolov5-v5.0/yolov5$ cat yolov5_trt2.py
    """
    # YOLOv5 is based on PyTorch, which makes it quicker and easier to modify;
    # YOLOv5 has a built-in anchor generator that computes optimized anchors for your dataset;
    # YOLOv5's overall AP is higher than YOLOv4's.
    """
    import ctypes
    import random
    from time import sleep

    # Serial-port library: sudo pip3 install pyserial
    import serial
    import cv2
    import numpy as np  # ndarray construction
    import pycuda.autoinit  # creates a default CUDA context
    import pycuda.driver as cuda
    import tensorrt as trt

    # from jetcam.csi_camera import CSICamera
    # torch and torchvision are not used: they are awkward to install on the Nano
    # (torchvision especially), so NMS is implemented with NumPy below.

    INPUT_W = 640  # must match the input size the engine was built with
    INPUT_H = 640
    CONF_THRESH = 0.8  # confidence threshold
    IOU_THRESHOLD = 0.1  # NMS IoU threshold

    se = None  # serial port, opened in __main__ and written by draw_boxes()
    categories = []  # class labels, set in __main__


    def plot_one_box(x, img, color=None, label=None, line_thickness=None):
        """
        description: Plots one bounding box on image img and returns its center.
                     This function comes from the YOLOv5 project.
        param:
            x:      a box like [x1, y1, x2, y2]
            img:    an OpenCV image object
            color:  BGR color, random if None
            label:  str
            line_thickness: int
        return:
            (cx, cy): integer center of the box
        """
        tl = (
            line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1
        )  # line/font thickness
        color = color or [random.randint(0, 255) for _ in range(3)]
        c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
        cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
        # Center of the box, sent over the serial port by draw_boxes()
        cx = int((c1[0] + c2[0]) / 2)
        cy = int((c1[1] + c2[1]) / 2)
        if label:
            tf = max(tl - 1, 1)  # font thickness
            t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
            c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
            cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
            cv2.putText(
                img,
                label,
                (c1[0], c1[1] - 2),
                0,
                tl / 3,
                [225, 255, 255],
                thickness=tf,
                lineType=cv2.LINE_AA,
            )
        return cx, cy


    def draw_boxes(image_raw, result_boxes, result_scores, result_classid):
        """Draw every box, then send "[x,y,class]" of the highest-scoring box over serial."""
        max_score = -1
        max_index = -1
        max_x, max_y = -1, -1
        for i in range(len(result_boxes)):
            x, y = plot_one_box(
                result_boxes[i],
                image_raw,
                label="{}:{:.2f}".format(
                    categories[int(result_classid[i])], result_scores[i]
                ),
            )
            # Track the most confident detection
            if result_scores[i] > max_score:
                max_score = result_scores[i]
                max_index = i
                max_x, max_y = x, y
        if max_index != -1:
            c = int(result_classid[max_index])
            output_str = "[{},{},{}]\r\n".format(max_x, max_y, c)
            print(output_str)
            se.write(output_str.encode())
            sleep(0.0009)
        return image_raw


    # Run the YOLOv5 model with TensorRT
    class YoLov5TRT(object):
        """
        description: A YOLOv5 class that wraps TensorRT ops, preprocess and postprocess ops.
        """

        def __init__(self, engine_file_path):
            # Create a context on this device
            self.ctx = cuda.Device(0).make_context()
            stream = cuda.Stream()
            TRT_LOGGER = trt.Logger(trt.Logger.INFO)
            runtime = trt.Runtime(TRT_LOGGER)

            # Deserialize the engine from file
            with open(engine_file_path, "rb") as f:
                engine = runtime.deserialize_cuda_engine(f.read())
            context = engine.create_execution_context()

            host_inputs = []
            cuda_inputs = []
            host_outputs = []
            cuda_outputs = []
            bindings = []

            for binding in engine:
                size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
                dtype = trt.nptype(engine.get_binding_dtype(binding))
                # Allocate host and device buffers
                host_mem = cuda.pagelocked_empty(size, dtype)
                cuda_mem = cuda.mem_alloc(host_mem.nbytes)
                # Append the device buffer to device bindings.
                bindings.append(int(cuda_mem))
                # Append to the appropriate list.
                if engine.binding_is_input(binding):
                    host_inputs.append(host_mem)
                    cuda_inputs.append(cuda_mem)
                else:
                    host_outputs.append(host_mem)
                    cuda_outputs.append(cuda_mem)

            # Store
            self.stream = stream
            self.context = context
            self.engine = engine
            self.host_inputs = host_inputs
            self.cuda_inputs = cuda_inputs
            self.host_outputs = host_outputs
            self.cuda_outputs = cuda_outputs
            self.bindings = bindings

        def __del__(self):
            # Buffers are freed when the object is garbage-collected; destroy() pops the context
            print("delete object to release memory")

        def infer(self, image_raw):
            # Make self the active context, pushing it on top of the context stack.
            self.ctx.push()
            # Restore
            stream = self.stream
            context = self.context
            host_inputs = self.host_inputs
            cuda_inputs = self.cuda_inputs
            host_outputs = self.host_outputs
            cuda_outputs = self.cuda_outputs
            bindings = self.bindings
            # Do image preprocess
            input_image, image_raw, origin_h, origin_w = self.preprocess_image(image_raw)
            # Copy input image to host buffer
            np.copyto(host_inputs[0], input_image.ravel())
            # Transfer input data to the GPU.
            cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
            # Run inference.
            context.execute_async(bindings=bindings, stream_handle=stream.handle)
            # Transfer predictions back from the GPU.
            cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
            # Synchronize the stream
            stream.synchronize()
            # Remove any context from the top of the context stack, deactivating it.
            self.ctx.pop()
            # Here we use the first row of output in that batch_size = 1
            output = host_outputs[0]
            # Do postprocess
            result_boxes, result_scores, result_classid = self.post_process(
                output, origin_h, origin_w
            )
            return image_raw, result_boxes, result_scores, result_classid

        def destroy(self):
            # Remove any context from the top of the context stack, deactivating it.
            self.ctx.pop()

        def preprocess_image(self, image_raw):
            """
            description: Convert a BGR frame to RGB, resize and pad it to the target size,
                         normalize to [0,1], transform to NCHW format.
            param:
                image_raw: the original BGR frame (numpy array)
            return:
                image: the processed image
                image_raw: the original image
                h: original height
                w: original width
            """
            h, w, c = image_raw.shape
            image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)
            # Calculate width, height and paddings
            r_w = INPUT_W / w
            r_h = INPUT_H / h
            if r_h > r_w:
                tw = INPUT_W
                th = int(r_w * h)
                tx1 = tx2 = 0
                ty1 = int((INPUT_H - th) / 2)
                ty2 = INPUT_H - th - ty1
            else:
                tw = int(r_h * w)
                th = INPUT_H
                tx1 = int((INPUT_W - tw) / 2)
                tx2 = INPUT_W - tw - tx1
                ty1 = ty2 = 0
            # Resize the image with long side while maintaining ratio
            image = cv2.resize(image, (tw, th))
            # Pad the short side with (128,128,128)
            image = cv2.copyMakeBorder(
                image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, (128, 128, 128)
            )
            image = image.astype(np.float32)
            # Normalize to [0,1]
            image /= 255.0
            # HWC to CHW format:
            image = np.transpose(image, [2, 0, 1])
            # CHW to NCHW format
            image = np.expand_dims(image, axis=0)
            # Convert the image to row-major order, also known as "C order":
            image = np.ascontiguousarray(image)
            return image, image_raw, h, w

        def xywh2xyxy(self, origin_h, origin_w, x):
            """
            description: Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
            param:
                origin_h: height of original image
                origin_w: width of original image
                x: A boxes array, each row is a box [center_x, center_y, w, h]
            return:
                y: A boxes array, each row is a box [x1, y1, x2, y2]
            """
            y = np.zeros_like(x)
            r_w = INPUT_W / origin_w
            r_h = INPUT_H / origin_h
            if r_h > r_w:
                y[:, 0] = x[:, 0] - x[:, 2] / 2
                y[:, 2] = x[:, 0] + x[:, 2] / 2
                y[:, 1] = x[:, 1] - x[:, 3] / 2 - (INPUT_H - r_w * origin_h) / 2
                y[:, 3] = x[:, 1] + x[:, 3] / 2 - (INPUT_H - r_w * origin_h) / 2
                y /= r_w
            else:
                y[:, 0] = x[:, 0] - x[:, 2] / 2 - (INPUT_W - r_h * origin_w) / 2
                y[:, 2] = x[:, 0] + x[:, 2] / 2 - (INPUT_W - r_h * origin_w) / 2
                y[:, 1] = x[:, 1] - x[:, 3] / 2
                y[:, 3] = x[:, 1] + x[:, 3] / 2
                y /= r_h
            return y

        def nms(self, boxes, scores, iou_threshold=IOU_THRESHOLD):
            # Non-maximum suppression in NumPy (class-agnostic): keep the highest-scoring
            # box, drop remaining boxes whose IoU with it exceeds iou_threshold, repeat
            x1 = boxes[:, 0]
            y1 = boxes[:, 1]
            x2 = boxes[:, 2]
            y2 = boxes[:, 3]
            areas = (y2 - y1 + 1) * (x2 - x1 + 1)
            keep = []
            index = scores.argsort()[::-1]  # indices sorted by descending score
            while index.size > 0:
                i = index[0]  # the first index always has the highest remaining score
                keep.append(i)
                # Corners of the overlap between box i and every remaining box
                x11 = np.maximum(x1[i], x1[index[1:]])
                y11 = np.maximum(y1[i], y1[index[1:]])
                x22 = np.minimum(x2[i], x2[index[1:]])
                y22 = np.minimum(y2[i], y2[index[1:]])
                w = np.maximum(0, x22 - x11 + 1)  # width of the overlap
                h = np.maximum(0, y22 - y11 + 1)  # height of the overlap
                overlaps = w * h
                ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)
                idx = np.where(ious <= iou_threshold)[0]
                index = index[idx + 1]  # +1 because ious was computed against index[1:]
            return keep

        def post_process(self, output, origin_h, origin_w):
            """
            description: postprocess the prediction
            param:
                output: An array like [num_boxes,cx,cy,w,h,conf,cls_id, cx,cy,w,h,conf,cls_id, ...]
                origin_h: height of original image
                origin_w: width of original image
            return:
                result_boxes: final boxes, each row is a box [x1, y1, x2, y2]
                result_scores: final scores, each element is the score of the corresponding box
                result_classid: final class ids, each element is the class id of the corresponding box
            """
            # Get the number of boxes detected
            num = int(output[0])
            # Reshape to a two-dimensional ndarray
            pred = np.reshape(output[1:], (-1, 6))[:num, :]
            # Get the boxes, scores and class ids
            boxes = pred[:, :4]
            scores = pred[:, 4]
            classid = pred[:, 5]
            # Choose those boxes that score > CONF_THRESH
            si = scores > CONF_THRESH
            boxes = boxes[si, :]
            scores = scores[si]
            classid = classid[si]
            # Transform bbox from [center_x, center_y, w, h] to [x1, y1, x2, y2]
            boxes = self.xywh2xyxy(origin_h, origin_w, boxes)
            # The official script converts pred to a torch tensor and calls
            # torchvision.ops.nms; this version stays in NumPy and uses self.nms.
            indices = self.nms(boxes, scores, IOU_THRESHOLD)
            result_boxes = boxes[indices, :]
            result_scores = scores[indices]
            result_classid = classid[indices]
            return result_boxes, result_scores, result_classid


    def detect_camera(camera, yolov5_wrapper):
        """Grab frames, run detection and show the result until 'q' is pressed."""
        while True:
            # img = camera.read()  # CSI camera (jetcam)
            ret, img = camera.read()  # USB camera
            if not ret:
                print("failed to read a frame from the camera")
                break
            img, result_boxes, result_scores, result_classid = yolov5_wrapper.infer(img)
            img = draw_boxes(img, result_boxes, result_scores, result_classid)
            cv2.imshow("result", img)  # show the result
            if cv2.waitKey(1) == ord('q'):
                break


    def main_camera(engine_file_path):
        camera = cv2.VideoCapture(0)  # USB camera
        # camera = CSICamera(capture_device=0, width=640, height=480)
        camera.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
        camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
        yolov5_wrapper = YoLov5TRT(engine_file_path)
        try:
            print("start detection!")
            detect_camera(camera, yolov5_wrapper)
        finally:
            camera.release()  # needed because the camera was opened with OpenCV
            yolov5_wrapper.destroy()
            cv2.destroyAllWindows()
        print("\nfinish!")


    if __name__ == "__main__":
        # Load the custom plugins; change these to the relative paths of what you built
        PLUGIN_LIBRARY = "build/libmyplugins.so"
        ctypes.CDLL(PLUGIN_LIBRARY)
        engine_file_path = "build/best.engine"  # the engine generated in step 5
        # Serial device, baud rate and timeout; Nano pin 8 -> peer's RX, pin 10 -> peer's TX
        se = serial.Serial('/dev/ttyTHS1', 115200, timeout=0.5)
        # Class labels (garbage types)
        # categories = ['battery', 'orange', 'bottle', 'paper_cup', 'spitball']
        categories = ['0', '1', '2', '3', '4']
        main_camera(engine_file_path)
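You can test the post-processing in post_process() and xywh2xyxy() without a camera or GPU. The sketch below is a minimal NumPy version. It assumes the output layout described in the post_process docstring (a detection count followed by groups of six floats: cx, cy, w, h, conf, cls_id) and the same 640x640 letterbox. The function names and the synthetic buffer are made up for illustration.

```python
import numpy as np

INPUT_H = 640  # engine input size used by the script above
INPUT_W = 640


def decode_output(output, conf_thresh):
    """Split a flat [num, cx, cy, w, h, conf, cls_id, ...] buffer into boxes, scores, class ids."""
    num = int(output[0])  # number of valid detections
    pred = np.reshape(output[1:], (-1, 6))[:num, :]
    keep = pred[:, 4] > conf_thresh  # confidence filter
    return pred[keep, :4], pred[keep, 4], pred[keep, 5]


def letterbox_xywh_to_xyxy(boxes, origin_h, origin_w):
    """Map [cx, cy, w, h] in the padded input back to [x1, y1, x2, y2] on the original frame."""
    r_w = INPUT_W / origin_w
    r_h = INPUT_H / origin_h
    out = np.zeros_like(boxes)
    if r_h > r_w:  # frame was scaled by r_w and padded top/bottom
        pad = (INPUT_H - r_w * origin_h) / 2
        out[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
        out[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
        out[:, 1] = boxes[:, 1] - boxes[:, 3] / 2 - pad
        out[:, 3] = boxes[:, 1] + boxes[:, 3] / 2 - pad
        out /= r_w
    else:  # frame was scaled by r_h and padded left/right
        pad = (INPUT_W - r_h * origin_w) / 2
        out[:, 0] = boxes[:, 0] - boxes[:, 2] / 2 - pad
        out[:, 2] = boxes[:, 0] + boxes[:, 2] / 2 - pad
        out[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
        out[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
        out /= r_h
    return out


# Synthetic buffer: 2 valid detections, then a stale slot that must be ignored
buf = np.zeros(1 + 6 * 3, dtype=np.float32)
buf[0] = 2
buf[1:7] = [320, 320, 100, 50, 0.9, 3]
buf[7:13] = [100, 100, 20, 20, 0.5, 1]  # below the 0.8 threshold
buf[13:19] = [1, 1, 1, 1, 0.99, 0]      # beyond num, never read

if __name__ == "__main__":
    boxes, scores, classid = decode_output(buf, 0.8)
    # A 640x480 camera frame is padded by 80 px at the top and bottom
    print(letterbox_xywh_to_xyxy(boxes, 480, 640), classid)
```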

Everything here was learned by following other experts' blog posts, which is a little embarrassing, haha.
