MMDetection is an object detection toolbox containing a rich collection of object detection, instance segmentation, and panoptic segmentation algorithms, along with their components and modules (see the GitHub project page).

- Object Detection models (including recent SOTA models): DAB-DETR, RTMDet, GLIP, Detic, DINO
- Instance Segmentation models (including recent SOTA models): Mask2Former, BoxInst, SparseInst, RTMDet
- Panoptic Segmentation models: Panoptic FPN, MaskFormer, Mask2Former

This post walks through MMDetection's training and testing workflow: we fine-tune the RTMDet model on the Dog and Cat Detection dataset, break down the RTMDet model, and reach a final bbox_mAP of 0.952.
达到了0.952。import IPython.display as display !pip install openmim !mim install mmengine==0.7.2 # 构建wheel,需要30分钟,构建好以后将whl文件放入单独的文件夹 # !git clone https://github.com/open-mmlab/mmcv.git # !cd mmcv && CUDA_HOME=/usr/local/cuda-11.8 MMCV_WITH_OPS=1 pip wheel --wheel-dir=/kaggle/working . !pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl !rm -rf mmdetection !git clone https://github.com/open-mmlab/mmdetection.git !git clone https://github.com/open-mmlab/mmyolo.git %cd mmdetection %pip install -e . !pip install wandb display.clear_output()
First install openmim, the package manager for open-mmlab projects, then use it to install the mmengine library:

!pip install openmim
!mim install mmengine==0.7.2
On Kaggle, mmcv cannot be installed directly through mim (training errors out later if you try); the only option is to build a wheel:

!git clone https://github.com/open-mmlab/mmcv.git
!cd mmcv && CUDA_HOME=/usr/local/cuda-11.8 MMCV_WITH_OPS=1 pip wheel --wheel-dir=/kaggle/working .
After the build finishes you will find mmcv-2.0.1-cp310-cp310-linux_x86_64.whl in the /kaggle/working directory, and installing it is just pip install -q /kaggle/working/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl. To avoid the long build on every run, I downloaded the built wheel and uploaded it to Kaggle Datasets (dataset link), so installing only requires attaching the dataset. The install command therefore becomes:

!pip install -q /kaggle/input/frozen-packages-mmdetection/mmcv-2.0.1-cp310-cp310-linux_x86_64.whl
Install mmdetection from source with git clone. The dataset's annotations use the .xml format and we will later need a tool from mmyolo to convert them, so clone mmyolo at the same time, but do not install it.

!rm -rf mmdetection
!git clone https://github.com/open-mmlab/mmdetection.git
!git clone https://github.com/open-mmlab/mmyolo.git
# enter the mmdetection project folder
%cd mmdetection
# install mmdetection
%pip install -e .
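As an optional sanity check (a minimal sketch), confirm that both packages import and report the expected versions:

# Optional: confirm the wheel and the editable install are both importable
import mmcv
import mmdet
print(mmcv.__version__)   # expected: 2.0.1 (the wheel installed above)
print(mmdet.__version__)  # version reported by the cloned source tree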
Install the wandb package and log in.

!pip install wandb
import wandb
wandb.login()
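On Kaggle you can avoid pasting the API key by hand by storing it as a notebook secret; a sketch, assuming a secret named wandb_api_key was created under Add-ons -> Secrets:

# Sketch: read the W&B API key from Kaggle's secret store
# (the secret name "wandb_api_key" is an assumption; use the name you saved)
from kaggle_secrets import UserSecretsClient
import wandb

wandb.login(key=UserSecretsClient().get_secret("wandb_api_key"))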
Create a checkpoints folder to hold the pretrained weights. Since we chose the RTMDet model, download the corresponding weights. On the mmdetection GitHub page, open the configs/rtmdet directory; its README.md lists the pretrained weights in detail. In general, the more parameters (Params) a model has, the higher its accuracy (box AP). We pick a model with a moderate parameter count, RTMDet-l, whose config file is named rtmdet_l_8xb32-300e_coco.py, meaning: an RTMDet-l model trained on the coco dataset for 300 epochs on 8 GPUs with a batch size of 32 per GPU. Download the weights into the checkpoints folder:

!mkdir ./checkpoints
!mim download mmdet --config rtmdet_l_8xb32-300e_coco --dest ./checkpoints
Run a quick inference with the pretrained weights on the demo image to verify the setup:

from mmdet.apis import DetInferencer

model_name = 'rtmdet_l_8xb32-300e_coco'
checkpoint = './checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth'
device = 'cuda:0'
inferencer = DetInferencer(model_name, checkpoint, device)

img = './demo/demo.jpg'
result = inferencer(img, out_dir='./output')
display.clear_output()

from PIL import Image
Image.open('./output/vis/demo.jpg')
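If you want the raw detections rather than just the rendered image, the return value can be inspected directly; a sketch based on the mmdet 3.x DetInferencer output format ('predictions' with 'labels', 'scores', 'bboxes'):

# Sketch: print detections above an arbitrary 0.5 score threshold
pred = result['predictions'][0]  # first (and only) input image
for label, score, bbox in zip(pred['labels'], pred['scores'], pred['bboxes']):
    if score >= 0.5:
        print(label, round(score, 3), [round(v, 1) for v in bbox])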
The Dog and Cat Detection dataset is organized as follows:
- Dog-and-Cat-Detection
- annotations
- Cats_Test0.xml
- Cats_Test1.xml
- Cats_Test2.xml
- ...
- images
- Cats_Test0.png
- Cats_Test1.png
- Cats_Test2.png
- ...
On Kaggle, datasets under the input path are read-only and cannot be modified, and the annotation files are in .xml format and need converting. First, copy the images to the ./data/images directory:

import shutil
# copy the images into the working directory
shutil.copytree('/kaggle/input/dog-and-cat-detection/images', './data/images')
MMDetection expects COCO-style .json annotations, so we convert the .xml files in the dog-and-cat-detection/annotations folder into a single .json file.

import xml.etree.ElementTree as ET
import os
import json

# COCO-style containers
coco = dict()
coco['images'] = []
coco['type'] = 'instances'
coco['annotations'] = []
coco['categories'] = []

category_set = dict()
image_set = set()

category_item_id = -1
image_id = 0
annotation_id = 0


def addCatItem(name):
    """Register a new category and return its id."""
    global category_item_id
    category_item = dict()
    category_item['supercategory'] = 'none'
    category_item_id += 1
    category_item['id'] = category_item_id
    category_item['name'] = name
    coco['categories'].append(category_item)
    category_set[name] = category_item_id
    return category_item_id


def addImgItem(file_name, size):
    """Register a new image entry and return its id."""
    global image_id
    if file_name is None:
        raise Exception('Could not find filename tag in xml file.')
    if size['width'] is None:
        raise Exception('Could not find width tag in xml file.')
    if size['height'] is None:
        raise Exception('Could not find height tag in xml file.')
    image_id += 1
    image_item = dict()
    image_item['id'] = image_id
    image_item['file_name'] = file_name + ".png"
    image_item['width'] = size['width']
    image_item['height'] = size['height']
    coco['images'].append(image_item)
    image_set.add(file_name)
    return image_id


def addAnnoItem(object_name, image_id, category_id, bbox):
    """Add one box annotation; segmentation is the box polygon."""
    global annotation_id
    annotation_item = dict()
    annotation_item['segmentation'] = []
    seg = []
    seg.append(bbox[0])
    seg.append(bbox[1])
    seg.append(bbox[0])
    seg.append(bbox[1] + bbox[3])
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1] + bbox[3])
    seg.append(bbox[0] + bbox[2])
    seg.append(bbox[1])
    annotation_item['segmentation'].append(seg)
    annotation_item['area'] = bbox[2] * bbox[3]
    annotation_item['iscrowd'] = 0
    annotation_item['ignore'] = 0
    annotation_item['image_id'] = image_id
    annotation_item['bbox'] = bbox
    annotation_item['category_id'] = category_id
    annotation_id += 1
    annotation_item['id'] = annotation_id
    coco['annotations'].append(annotation_item)


def parseXmlFiles(xml_path):
    """Walk every .xml file in xml_path and fill the coco dict."""
    for f in os.listdir(xml_path):
        if not f.endswith('.xml'):
            continue

        xmlname = f.split('.xml')[0]

        bndbox = dict()
        size = dict()
        current_image_id = None
        current_category_id = None
        file_name = None
        size['width'] = None
        size['height'] = None
        size['depth'] = None

        xml_file = os.path.join(xml_path, f)

        tree = ET.parse(xml_file)
        root = tree.getroot()
        if root.tag != 'annotation':
            raise Exception('pascal voc xml root element should be annotation, rather than {}'.format(root.tag))

        for elem in root:
            current_parent = elem.tag
            current_sub = None
            object_name = None

            if elem.tag == 'folder':
                continue

            if elem.tag == 'filename':
                file_name = xmlname
                if file_name in category_set:
                    raise Exception('file_name duplicated')

            # add the image once filename and size have both been seen
            elif current_image_id is None and file_name is not None and size['width'] is not None:
                if file_name not in image_set:
                    current_image_id = addImgItem(file_name, size)
                else:
                    raise Exception('duplicated image: {}'.format(file_name))

            for subelem in elem:
                bndbox['xmin'] = None
                bndbox['xmax'] = None
                bndbox['ymin'] = None
                bndbox['ymax'] = None

                current_sub = subelem.tag
                if current_parent == 'object' and subelem.tag == 'name':
                    object_name = subelem.text
                    if object_name not in category_set:
                        current_category_id = addCatItem(object_name)
                    else:
                        current_category_id = category_set[object_name]

                elif current_parent == 'size':
                    if size[subelem.tag] is not None:
                        raise Exception('xml structure broken at size tag.')
                    size[subelem.tag] = int(subelem.text)

                for option in subelem:
                    if current_sub == 'bndbox':
                        if bndbox[option.tag] is not None:
                            raise Exception('xml structure corrupted at bndbox tag.')
                        bndbox[option.tag] = int(float(option.text))

                # convert VOC (xmin, ymin, xmax, ymax) to COCO (x, y, w, h)
                if bndbox['xmin'] is not None:
                    if object_name is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_image_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    if current_category_id is None:
                        raise Exception('xml structure broken at bndbox tag')
                    bbox = []
                    bbox.append(bndbox['xmin'])
                    bbox.append(bndbox['ymin'])
                    bbox.append(bndbox['xmax'] - bndbox['xmin'])
                    bbox.append(bndbox['ymax'] - bndbox['ymin'])
                    addAnnoItem(object_name, current_image_id, current_category_id, bbox)


os.makedirs('./data/annotations')
xml_path = '/kaggle/input/dog-and-cat-detection/annotations'
json_file = './data/annotations/annotations_all.json'
parseXmlFiles(xml_path)
json.dump(coco, open(json_file, 'w'))
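A quick optional check (a sketch using only the standard library) that the conversion produced what we expect:

# Optional: sanity-check the generated COCO-style annotation file
import json

with open('./data/annotations/annotations_all.json') as f:
    data = json.load(f)

print(len(data['images']), 'images')            # should match the number of .xml files
print(len(data['annotations']), 'boxes')
print([c['name'] for c in data['categories']])  # expect ['cat', 'dog'] (order may vary)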
The mmdetection project folder now contains:
- mmdetection
- data
- annotations
- annotations_all.json
- images
- Cats_Test0.png
- Cats_Test1.png
- Cats_Test2.png
- ....
- ...
We use a script from the mmyolo project to split the data into training and test sets. First, enter the mmyolo project folder:

# switch to the mmyolo project folder
%cd /kaggle/working/mmyolo
Run tools/misc/coco_split.py; its arguments, from top to bottom, are: --json (path of the generated .json file); --out-dir (output folder for the split .json files); --ratios 0.8 0.2 (train/test proportions); --shuffle (shuffle before splitting); --seed (random seed).

# split into training and test sets
!python tools/misc/coco_split.py --json /kaggle/working/mmdetection/data/annotations/annotations_all.json \
--out-dir /kaggle/working/mmdetection/data/annotations \
--ratios 0.8 0.2 \
--shuffle \
--seed 2023
Split info: ======
Train ratio = 0.8, number = 2949
Val ratio = 0, number = 0
Test ratio = 0.2, number = 737
Set the global seed: 2023
shuffle dataset.
Saving json to /kaggle/working/mmdetection/data/annotations/trainval.json
Saving json to /kaggle/working/mmdetection/data/annotations/test.json
All done!
Switch back to the mmdetection project folder:

%cd /kaggle/working/mmdetection
The directory structure is now:
- mmdetection
- data
- annotations
- test.json
- trainval.json
- annotations_all.json
- images
- Cats_Test0.png
- Cats_Test1.png
- Cats_Test2.png
- ....
- ...
The RTMDet architecture diagram can be found in the README.md inside the model's config folder.
Open the configs/rtmdet/rtmdet_l_8xb32-300e_coco.py config file on GitHub (check its _base_ value; if there is an inheritance chain, keep following it upward until you reach the root config). Here the RTMDet-l config is already the root file, so it can be read directly.
The settings we mainly need to change are:

- _base_ (the parent config to inherit from)
- data_root (folder where the data is stored)
- train_batch_size_per_gpu (training batch size per GPU)
- train_num_workers (number of worker processes, usually n GPU x 4)
- max_epochs (maximum number of epochs)
- base_lr (base learning rate)
- metainfo (class names and the palette color for each class)
- train_dataloader (image path and training-set annotation file)
- val_dataloader (image path and validation-set annotation file)
- val_evaluator (validation-set annotation file)
- model (number of frozen backbone stages, number of classes)
- param_scheduler (learning-rate decay schedule)
- optim_wrapper (where the learning rate is set)
- default_hooks (checkpoint saving policy)
- custom_hooks (data-pipeline switching)
- load_from (path of the pretrained weights to load)
- train_cfg (where max_epochs is applied, plus the validation strategy)
- randomness (fix the random seed)
- visualizer (choice of visualization backend)
The most important parts of the config are the metainfo and model parameters: always check that the number of classes is correct and that the palette has the same number of entries. Note: even with only one class, metainfo must be written as 'classes': ('cat', ), with the trailing comma inside the parentheses, otherwise an error is raised. The num_classes in the model's bbox_head must also match the number of classes.
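The trailing comma matters because of Python's tuple syntax, not anything MMDetection-specific; a minimal illustration:

# Why the trailing comma is required: parentheses alone don't make a tuple
print(type(('cat')))    # <class 'str'>   -- just a parenthesized string
print(type(('cat',)))   # <class 'tuple'> -- what metainfo['classes'] expects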
Learning-rate scaling generally follows the rule of thumb base_lr_default * (your_bs / default_bs). The architecture diagram above shows that RTMDet has 4 backbone stages, so dict(backbone=dict(frozen_stages=4), bbox_head=dict(num_classes=2)) in the model config freezes all 4 stages, i.e. the entire backbone.
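Applying the rule here (a sketch; the default base_lr of 0.004 and the 8x32 batch layout come from the inherited rtmdet_l_8xb32-300e_coco.py config):

# Linear LR scaling rule: base_lr_default * (your_bs / default_bs)
default_lr = 0.004          # base_lr in rtmdet_l_8xb32-300e_coco.py
default_bs = 8 * 32         # 8 GPUs x batch size 32 per GPU
our_bs = 24                 # 1 GPU x batch size 24
print(default_lr * our_bs / default_bs)  # 0.000375, the base_lr used below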
Write the custom config to configs/rtmdet/rtmdet_l_1xb4-100e_animals.py:

config_animals = """
# Inherit and overwrite part of the config based on this config
_base_ = './rtmdet_l_8xb32-300e_coco.py'

data_root = './data/'  # dataset root

train_batch_size_per_gpu = 24
train_num_workers = 4

max_epochs = 50
stage2_num_epochs = 6
base_lr = 0.000375

metainfo = {
    'classes': ('cat', 'dog', ),
    'palette': [
        (252, 215, 99), (153, 197, 252),
    ]
}

train_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        data_prefix=dict(img='images/'),
        ann_file='annotations/trainval.json'))

val_dataloader = dict(
    batch_size=train_batch_size_per_gpu,
    num_workers=train_num_workers,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        data_prefix=dict(img='images/'),
        ann_file='annotations/trainval.json'))

test_dataloader = val_dataloader

val_evaluator = dict(ann_file=data_root + 'annotations/trainval.json')

test_evaluator = val_evaluator

model = dict(backbone=dict(frozen_stages=4), bbox_head=dict(num_classes=2))

# learning rate
param_scheduler = [
    dict(
        type='LinearLR',
        start_factor=1.0e-5,
        by_epoch=False,
        begin=0,
        end=1000),
    dict(
        # use cosine lr from half of max_epochs to the end
        type='CosineAnnealingLR',
        eta_min=base_lr * 0.05,
        begin=max_epochs // 2,
        end=max_epochs,
        T_max=max_epochs // 2,
        by_epoch=True,
        convert_to_iter_based=True),
]

train_pipeline_stage2 = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='RandomResize',
        scale=(640, 640),
        ratio_range=(0.1, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=(640, 640)),
    dict(type='YOLOXHSVRandomAug'),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Pad', size=(640, 640), pad_val=dict(img=(114, 114, 114))),
    dict(type='PackDetInputs')
]

# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.05),
    paramwise_cfg=dict(
        norm_decay_mult=0, bias_decay_mult=0, bypass_duplicate=True))

default_hooks = dict(
    checkpoint=dict(
        interval=5,
        max_keep_ckpts=2,  # only keep latest 2 checkpoints
        save_best='auto'
    ),
    logger=dict(type='LoggerHook', interval=20))

custom_hooks = [
    dict(
        type='PipelineSwitchHook',
        switch_epoch=max_epochs - stage2_num_epochs,
        switch_pipeline=train_pipeline_stage2)
]

# load COCO pre-trained weight
load_from = './checkpoints/rtmdet_l_8xb32-300e_coco_20220719_112030-5a0be7c4.pth'

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=max_epochs, val_begin=20, val_interval=1)

randomness = dict(seed=2023, deterministic=True, diff_rank_seed=False)

visualizer = dict(vis_backends=[dict(type='LocalVisBackend'), dict(type='WandbVisBackend')])
"""

with open('./configs/rtmdet/rtmdet_l_1xb4-100e_animals.py', 'w') as f:
    f.write(config_animals)
Start training:

!python tools/train.py configs/rtmdet/rtmdet_l_1xb4-100e_animals.py
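If the Kaggle session is interrupted, training can be resumed from the latest checkpoint in the work dir; a sketch using the standard --resume flag of mmdet 3.x's tools/train.py:

# Sketch: resume from the most recent checkpoint under work_dirs/
!python tools/train.py configs/rtmdet/rtmdet_l_1xb4-100e_animals.py --resume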
Accuracy at epoch = 50:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.952
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.995
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.919
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.959
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.964
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.965
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.965
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.800
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.939
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.970
07/10 07:35:26 - mmengine - INFO - bbox_mAP_copypaste: 0.952 1.000 0.995 0.800 0.919 0.959
07/10 07:35:27 - mmengine - INFO - Epoch(val) [50][123/123] coco/bbox_mAP: 0.9520 coco/bbox_mAP_50: 1.0000 coco/bbox_mAP_75: 0.9950 coco/bbox_mAP_s: 0.8000 coco/bbox_mAP_m: 0.9190 coco/bbox_mAP_l: 0.9590 data_time: 0.0532 time: 0.8068
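The trained model can also be scored with mmdetection's standard test script; a sketch (it evaluates whatever test_dataloader and test_evaluator point to in the config, and the shell glob picks up the auto-named best checkpoint):

# Sketch: score a checkpoint with the standard evaluation script
!python tools/test.py configs/rtmdet/rtmdet_l_1xb4-100e_animals.py \
    work_dirs/rtmdet_l_1xb4-100e_animals/best_coco*.pth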
Training was also tracked on the wandb platform, which visualizes each metric.

Finally, load the best checkpoint and run inference on a few test images:

from mmdet.apis import DetInferencer
import glob
config = 'configs/rtmdet/rtmdet_l_1xb4-100e_animals.py'
checkpoint = glob.glob('./work_dirs/rtmdet_l_1xb4-100e_animals/best_coco*.pth')[0]
device = 'cuda:0'
inferencer = DetInferencer(config, checkpoint, device)
img = './data/images/Cats_Test1011.png'
result = inferencer(img, out_dir='./output', pred_score_thr=0.6)
display.clear_output()
Image.open('./output/vis/Cats_Test1011.png')
img = './data/images/Cats_Test1035.png'
result = inferencer(img, out_dir='./output', pred_score_thr=0.6)
display.clear_output()
Image.open('./output/vis/Cats_Test1035.png')
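DetInferencer also accepts a list of image paths, so several images can be processed in one call; a sketch (the glob pattern is arbitrary):

# Sketch: batch inference; all visualizations are written to out_dir
imgs = glob.glob('./data/images/Cats_Test10*.png')
results = inferencer(imgs, out_dir='./output', pred_score_thr=0.6)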