To effectively monitor driver fatigue and help prevent traffic accidents, this post presents a real-time fatigue-detection method based on facial landmarks. The driver's face is monitored in real time: first the face is detected and its landmarks are located with the ERT algorithm; then the eye aspect ratio (EAR), computed from the landmark coordinates of the eye regions, describes how far the eyes are open, and a suitable EAR threshold separates the open-eye and closed-eye states; finally, the EAR measurements and the EAR threshold are used to compute the PERCLOS value (the proportion of time the eyes are closed) over the monitoring video as a measure of driver fatigue, and comparing it against a preset fatigue threshold decides whether the driver is drowsy.
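As a quick illustration (my own sketch, not this project's exact code), this is the standard six-landmark EAR computation, where p1..p6 are one eye's contour points in the usual order (corners at p1/p4, upper lid at p2/p3, lower lid at p6/p5):
- import numpy as np
-
-
- def eye_aspect_ratio(eye):
-     # eye: (6, 2) array of one eye's landmarks p1..p6
-     # EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)
-     a = np.linalg.norm(eye[1] - eye[5])  # vertical distance p2-p6
-     b = np.linalg.norm(eye[2] - eye[4])  # vertical distance p3-p5
-     c = np.linalg.norm(eye[0] - eye[3])  # horizontal distance p1-p4
-     return (a + b) / (2.0 * c)
-
-
- EAR_THRESHOLD = 0.2  # illustrative value; tune it on your own data
- # closed = eye_aspect_ratio(eye_points) < EAR_THRESHOLD
EAR stays roughly constant while the eye is open and drops toward zero as it closes, which is why a single threshold is enough to separate the two states.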
The facial-landmark detection here is built on this library (https://github.com/610265158/Peppa_Pig_Face_Engine). I tried many open-source frameworks, including dlib, openface, pfld, and clnf, and none of them performed particularly well on closed-eye detection. Then I found this library and it was a real eye-opener: detection is excellent and very stable. On an 8th-gen i7 CPU it averages roughly 40 ms per frame (the project machine has no GPU, so I have not measured GPU speed, but it should be fast). For details see the author's article: 人脸关键点检测 face keypoint detect_小羊苏西的博客-CSDN博客_keypoint 检测 — genuinely well written, take my word for it.
Closed-eye detection results on my own dataset:
Because the open-source datasets contain too few closed-eye samples, we have to add our own by hand; here I used dlib's annotation tool (imglab).
1. Annotate the dataset. Since the goal is mainly eye and head-pose detection, I annotated 37 points per face (a little over 5,000 images in total).
** Annotation tool: (annotation mistakes are frequent, so I added undo and similar features)
2. Annotation produces an xml file containing all the annotation information (it is worth verifying that no image has too few or too many points; see the sanity-check sketch after this list). Then do the following:
1. Shuffle the order
imglab --shuffle dataset_0402.xml
2. Split the dataset (training set and test set)
imglab --split-train-test 0.95 dataset_0402.xml
3. You can also flip the dataset, remove near-duplicate samples, and so on
- imglab --rmdupes xml/mydataset.xml ## remove near-duplicate samples
- imglab --flip xml/mydataset.xml ## flip the images
4. For more details see: 【AI】dlib中图像标注工具 imglab 详细说明_郭老二-CSDN博客_imglab
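A small sanity-check sketch (my own helper, not part of imglab) that parses the annotation xml and reports every image whose point count differs from 37:
- from xml.dom.minidom import parse
-
-
- def check_part_count(xml_path, expected=37):
-     # report images whose annotated point count differs from `expected`
-     root = parse(xml_path).documentElement
-     for image in root.getElementsByTagName("image"):
-         file_path = image.getAttribute("file")
-         boxes = image.getElementsByTagName("box")
-         if not boxes:
-             print("no box: " + file_path)
-             continue
-         parts = boxes[0].getElementsByTagName("part")
-         if len(parts) != expected:
-             print("%s: %d points" % (file_path, len(parts)))
-
-
- check_part_count("dataset_0402.xml")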
3. Convert the training and test sets into the format the author's code expects
- import json
- from xml.dom.minidom import parse
- from tqdm import tqdm
-
-
- def json_to_txt(json_file, txt_file):
-     # convert the intermediate json annotations into "path|x1 y1 x2 y2 ..." lines
-     with open(json_file, 'r') as f:
-         data = json.load(f)
-     tmp_str = ""
-     for sub_data in data:
-         file_name = sub_data['image_path']
-         tmp_str += file_name + '|'
-         key_points = sub_data['keypoints']
-         for points in key_points:
-             tmp_str = tmp_str + str(points[0]) + ' ' + str(points[1]) + ' '
-         tmp_str = tmp_str + '\n'
-     with open(txt_file, mode='w') as f:
-         f.write(tmp_str)
-
-
- def read_xml_to_json(path, out_file_path):
-     domTree = parse(path)
-     # document root element
-     rootNode = domTree.documentElement
-     images = rootNode.getElementsByTagName("image")
-     with open(out_file_path, 'w') as f:
-         train_json_list = []
-         for image in tqdm(images):
-             one_image_ann = {}
-             if image.hasAttribute("file"):
-                 # file path
-                 file_path = image.getAttribute("file")
-
-                 one_image_ann['image_path'] = file_path
-
-                 box = image.getElementsByTagName("box")
-
-                 top = box[0].getAttribute("top")
-                 left = box[0].getAttribute("left")
-                 width = box[0].getAttribute("width")
-                 height = box[0].getAttribute("height")
-
-                 # bbox order follows the imglab attributes: top, left, width, height
-                 bbox = [float(top), float(left), float(width), float(height)]
-
-                 parts = box[0].getElementsByTagName("part")
-
-                 # skip images that have a box but no annotated points
-                 if len(parts) == 0:
-                     continue
-                 key = []
-                 for part in parts:
-                     key.append([float(part.getAttribute("x")), float(part.getAttribute("y"))])
-
-                 one_image_ann['keypoints'] = key
-                 one_image_ann['bbox'] = bbox
-                 one_image_ann['attr'] = None
-
-                 train_json_list.append(one_image_ann)
-
-         json.dump(train_json_list, f, indent=2)
-
-
- def read_xml_to_txt(path, out_txt_file_path):
-     domTree = parse(path)
-     # document root element
-     rootNode = domTree.documentElement
-     images = rootNode.getElementsByTagName("image")
-     with open(out_txt_file_path, 'w') as f:
-         txt_str = ""
-         for image in tqdm(images):
-             if image.hasAttribute("file"):
-                 # file path
-                 file_path = image.getAttribute("file")
-
-                 box = image.getElementsByTagName("box")
-                 parts = box[0].getElementsByTagName("part")
-
-                 # check the parts before writing the path, so that an empty
-                 # image does not leave a dangling "path|" fragment behind
-                 if len(parts) == 0:
-                     continue
-
-                 txt_str += file_path + '|'
-                 for part in parts:
-                     txt_str = txt_str + str(float(part.getAttribute("x"))) + ' ' + str(
-                         float(part.getAttribute("y"))) + ' '
-
-                 txt_str = txt_str + '\n'
-
-         f.write(txt_str)
-
-
- if __name__ == '__main__':
-     data_path = ["data/test.xml", "data/train.xml"]
-     out_path = ["data/test.txt", "data/train.txt"]
-     for path, out in zip(data_path, out_path):
-         read_xml_to_txt(path, out)

4. Configure the training parameters
1. Modify the landmark indices. The author's dataset is based on 68 points, so the indices will certainly differ; if your annotation order happens to match the dataset the author used, no change is needed (an illustrative sketch follows this list).
2. Configure the training and test dataset paths and the other parameters; see train_config.py.
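Purely as an illustration of what "modifying the indices" means (every name and index range below is hypothetical, not the project's real config), the change amounts to re-pointing the landmark slices at your own annotation order:
- # Hypothetical illustration only. In the iBUG 68-point scheme the eyes are:
- #   left eye: points 36-41, right eye: points 42-47
- # With a custom 37-point scheme, replace those slices with your own order, e.g.:
- LEFT_EYE_IDX = list(range(13, 19))   # hypothetical positions in a 37-point layout
- RIGHT_EYE_IDX = list(range(19, 25))  # hypothetical positions in a 37-point layout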
5. Modify the data-reading approach
1. If you start training directly, you hit this error: Can't pickle local object 'DataFromGenerator.init..'. The author trained on Linux while I trained on Windows, and Windows apparently cannot handle this multi-process prefetching. If we comment out ds = MultiProcessPrefetchData(ds, self.prefetch_size, self.process_num) and retrain, training becomes very slow: 10 iterations take about 1 min with GPU utilization around 2%. My guess is that without prefetching, most of the time goes into reading data.
2. Reading the data through tfrecord instead speeds training up enormously: on an MX150 GPU, 10 iterations take about 5 s, at least a 10x speedup. The only pain point is that the generated record files are huge (see further below for the fix). The implementation follows:
1. Generate the tfrecord files
- # When I wrote this code, only God and I knew what it did
- # Now, only God knows
- # @File : generate_tfrecord.py
- # @Time : 2020/4/24 14:10
- # @Author : J.
- # @desc : generate the tfrecord files
-
- import tensorflow as tf
- from lib.dataset.dataietr import FaceKeypointDataIter
- from train_config import config as cfg
- from tqdm import tqdm
- import argparse
- import sys
-
-
- def int64_feature(value):
-     return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
-
-
- def bytes_feature(value):
-     return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
-
-
- def create_tf_example(image_file, is_train):
-     # reuse the data iterator's preprocessing to get the final (image, label) pair
-     crop_image, label = _train_data._map_func(image_file, is_train)
-     tf_example = tf.train.Example(
-         features=tf.train.Features(
-             feature={
-                 'image': bytes_feature(crop_image.tobytes()),
-                 'label': bytes_feature(label.tobytes())
-             }
-         ))
-     return tf_example
-
-
- def generate_tfrecord(images_files, record_path, is_train=True):
-     num_tf_example = 0
-     writer = tf.python_io.TFRecordWriter(record_path)
-     with tqdm(images_files, ncols=100) as files:
-         for image in files:
-             tf_example = create_tf_example(image, is_train)
-             writer.write(tf_example.SerializeToString())
-             num_tf_example += 1
-     writer.close()
-     print("{} tf_examples have been created successfully, saved in {}".format(num_tf_example, record_path))
-
-
- def main(_):
-     global _train_data
-     global _val_data
-     _train_data = FaceKeypointDataIter(cfg.TRAIN.batch_size, cfg.TRAIN.epoch, cfg.DATA.root_path,
-                                        FLAGS.train_data,
-                                        True)
-
-     _val_data = FaceKeypointDataIter(cfg.TRAIN.batch_size, cfg.TRAIN.epoch, cfg.DATA.root_path,
-                                      FLAGS.val_data,
-                                      False)
-
-     print("================== generate train tf_record start ===================")
-     train_images_files = _train_data.get_parse_file()
-     generate_tfrecord(train_images_files, FLAGS.train_save_path, True)
-     print("================== generate train tf_record end ===================")
-
-     print("================== generate val tf_record start ===================")
-     val_images_files = _val_data.get_parse_file()
-     generate_tfrecord(val_images_files, FLAGS.val_save_path, False)
-     print("================== generate val tf_record end ===================")
-
-
- if __name__ == '__main__':
-     parser = argparse.ArgumentParser()
-     parser.add_argument(
-         '--train_data',
-         type=str,
-         default='data/train.txt',
-         help='Training data txt.')
-
-     parser.add_argument(
-         '--val_data',
-         type=str,
-         default='data/test.txt',
-         help='Validation data txt.')
-
-     parser.add_argument(
-         '--train_save_path',
-         type=str,
-         default='record/train.record',
-         help='Output path for the training records.')
-
-     parser.add_argument(
-         '--val_save_path',
-         type=str,
-         default='record/val.record',
-         help='Output path for the validation records.')
-
-     FLAGS, unparsed = parser.parse_known_args()
-     tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
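Run it with the argparse defaults shown above, or override the paths:
python generate_tfrecord.py --train_data data/train.txt --val_data data/test.txt --train_save_path record/train.record --val_save_path record/val.record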

2. Read the record data
- class read_face_data():
-
-     def __init__(self, tfrecord_path, batch_size, out_channel, win, hin, num_threads):
-         self.tfrecord_path = tfrecord_path
-         self.batch_size = batch_size
-         self.win = win
-         self.hin = hin
-         self.num_threads = num_threads
-         self.out_channel = out_channel
-
-     def read_and_decode(self):
-         filename_queue = tf.train.string_input_producer([self.tfrecord_path], shuffle=False)
-         reader = tf.TFRecordReader()
-         _, serialized_example = reader.read(filename_queue)
-         features = tf.parse_single_example(serialized_example,
-                                            features={
-                                                'image': tf.FixedLenFeature([], tf.string),
-                                                'label': tf.FixedLenFeature([], tf.string)
-                                            })
-
-         images = tf.decode_raw(features['image'], tf.float32)
-         images = tf.reshape(images, [self.win, self.hin, 3])
-
-         labels = tf.decode_raw(features['label'], tf.float32)
-         labels = tf.reshape(labels, [self.out_channel])
-
-         # capacity: maximum number of elements in the queue
-         # min_after_dequeue: minimum number left after a dequeue, which controls the shuffling level
-         _images, _labels = tf.train.shuffle_batch([images, labels],
-                                                   num_threads=self.num_threads,
-                                                   batch_size=self.batch_size,
-                                                   capacity=self.batch_size * 2,
-                                                   min_after_dequeue=self.batch_size)
-
-         return _images, _labels
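Before wiring this into training, a quick standalone check (my own debugging snippet, not part of the project) that the records decode correctly; the sizes are taken from the comments in the training code below (160x160 input, 143 label values), so adapt them to your own train_config:
- import tensorflow as tf
-
- reader = read_face_data('record/train.record', batch_size=4,
-                         out_channel=143, win=160, hin=160, num_threads=2)
- images, labels = reader.read_and_decode()
-
- with tf.Session() as sess:
-     sess.run(tf.global_variables_initializer())
-     coord = tf.train.Coordinator()
-     threads = tf.train.start_queue_runners(sess=sess, coord=coord)
-     imgs, lbls = sess.run([images, labels])
-     print(imgs.shape, lbls.shape)  # expect (4, 160, 160, 3) and (4, 143)
-     coord.request_stop()
-     coord.join(threads)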

3. Modify net_work.py
Modify the loop, _train, and _val methods:
- def loop(self, ):
-
-     self.build()
-     self.load_weight()
-
-     sess = tf.Session()
-
-     train_face_data = read_face_data(cfg.DATA.train_txt_path, cfg.TRAIN.batch_size, cfg.MODEL.out_channel,
-                                      cfg.MODEL.win, cfg.MODEL.hin, 8)
-     self.train_image_data, self.train_label_data = train_face_data.read_and_decode()
-
-     val_face_data = read_face_data(cfg.DATA.val_txt_path, cfg.TRAIN.batch_size, cfg.MODEL.out_channel,
-                                    cfg.MODEL.win, cfg.MODEL.hin, 8)
-     self.val_image_data, self.val_label_data = val_face_data.read_and_decode()
-     init_op = tf.global_variables_initializer()
-     sess.run(init_op)
-     ## start the queue-runner threads that feed the input data
-     coord = tf.train.Coordinator()
-     threads = tf.train.start_queue_runners(sess=sess, coord=coord)
-     with self._graph.as_default():
-         # Create a saver.
-         self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)
-
-         # Build the summary operation from the last tower summaries.
-         self.summary_op = tf.summary.merge(self.summaries)
-
-         self.summary_writer = tf.summary.FileWriter(cfg.MODEL.model_path, self._sess.graph)
-
-         # epoch 2000
-         min_loss_control = 1000.
-         for epoch in range(cfg.TRAIN.epoch):
-             self._train(epoch, sess)
-             val_loss = self._val(epoch, sess)
-             logger.info('**************'
-                         'val_loss %f ' % (val_loss))
-
-             # only save a checkpoint when the validation loss improves
-             if val_loss < min_loss_control:
-                 min_loss_control = val_loss
-                 low_loss_model_name = cfg.MODEL.model_path + \
-                                       'epoch_' + str(epoch) + \
-                                       'L2_' + str(cfg.TRAIN.weight_decay_factor) + '.ckpt'
-                 logger.info('A new low loss model saved as %s \n' % low_loss_model_name)
-                 self.saver.save(self._sess, save_path=low_loss_model_name)
-
-         # stop the input threads before closing the sessions
-         coord.request_stop()
-         coord.join(threads)
-         self._sess.close()
-         sess.close()
-
- def _train(self, _epoch, sess):
-     # config.TRAIN.train_set_size // config.TRAIN.num_gpu // config.TRAIN.batch_size
-     for step in range(cfg.TRAIN.iter_num_per_epoch):
-         self.ite_num += 1
-         start_time = time.time()
-         # shapes: images 64 x 160 x 160 x 3, labels 64 x 143
-         example_images, example_labels = sess.run([self.train_image_data, self.train_label_data])
-
-         ######## show_flag: visually check the data
-         if cfg.TRAIN.vis:
-             for i in range(cfg.TRAIN.batch_size):
-                 example_image = example_images[i, :, :, :] / 255.
-                 example_label = example_labels[i, :]
-
-                 Landmark = example_label[0:136]
-                 cla = example_label[136:]
-
-                 print(cla)
-                 Landmark = Landmark.reshape([-1, 2])
-                 _h, _w, _ = example_image.shape
-                 for _index in range(Landmark.shape[0]):
-                     x_y = Landmark[_index]
-                     # note: scale y by the image height, not the width
-                     cv2.circle(example_image, center=(int(x_y[0] * _w), int(x_y[1] * _h)), color=(122, 122, 122),
-                                radius=1, thickness=1)
-
-                 cv2.namedWindow('img', 0)
-                 cv2.imshow('img', example_image)
-                 cv2.waitKey(0)
-
-         fetch_duration = time.time() - start_time
-
-         feed_dict = {}
-         for n in range(cfg.TRAIN.num_gpu):
-             feed_dict[self.inputs[0][n]] = example_images[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
-                                            :, :, :]
-             feed_dict[self.inputs[1][n]] = example_labels[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
-                                            :]
-
-         feed_dict[self.inputs[2]] = True
-         _, total_loss_value, loss_value, leye_loss_value, reye_loss_value, mouth_loss_value, \
-         leye_cla_accuracy_value, reye_cla_accuracy_value, mouth_cla_accuracy_value, l2_loss_value, learn_rate, = \
-             self._sess.run([*self.outputs],
-                            feed_dict=feed_dict)
-
-         duration = time.time() - start_time
-         run_duration = duration - fetch_duration
-         if self.ite_num % cfg.TRAIN.log_interval == 0:
-             num_examples_per_step = cfg.TRAIN.batch_size * cfg.TRAIN.num_gpu
-             examples_per_sec = num_examples_per_step / duration
-             sec_per_batch = duration / cfg.TRAIN.num_gpu
-
-             format_str = ('epoch %d: iter %d, '
-                           'total_loss=%.6f '
-                           'loss=%.6f '
-                           'leye_loss=%.6f '
-                           'reye_loss=%.6f '
-                           'mouth_loss=%.6f '
-                           'leye_acc=%.6f '
-                           'reye_acc=%.6f '
-                           'mouth_acc=%.6f '
-                           'l2_loss=%.6f '
-                           'learn_rate =%e '
-                           '(%.1f examples/sec; %.3f sec/batch) '
-                           'fetch data time = %.6f '
-                           'run time = %.6f')
-             logger.info(format_str % (_epoch,
-                                       self.ite_num,
-                                       total_loss_value,
-                                       loss_value,
-                                       leye_loss_value,
-                                       reye_loss_value,
-                                       mouth_loss_value,
-                                       leye_cla_accuracy_value,
-                                       reye_cla_accuracy_value,
-                                       mouth_cla_accuracy_value,
-                                       l2_loss_value,
-                                       learn_rate,
-                                       examples_per_sec,
-                                       sec_per_batch,
-                                       fetch_duration,
-                                       run_duration))
-
-         if self.ite_num % 100 == 0:
-             summary_str = self._sess.run(self.summary_op, feed_dict=feed_dict)
-             self.summary_writer.add_summary(summary_str, self.ite_num)
-
- def _val(self, _epoch, sess):
-     all_total_loss = 0
-     for step in range(cfg.TRAIN.val_iter):
-
-         # fetch an image/label batch from the session
-         example_images, example_labels = sess.run([self.val_image_data, self.val_label_data])
-
-         feed_dict = {}
-         for n in range(cfg.TRAIN.num_gpu):
-             feed_dict[self.inputs[0][n]] = example_images[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
-                                            :, :, :]
-             feed_dict[self.inputs[1][n]] = example_labels[n * cfg.TRAIN.batch_size:(n + 1) * cfg.TRAIN.batch_size,
-                                            :]
-         feed_dict[self.inputs[2]] = False
-         total_loss_value, loss_value, leye_loss_value, reye_loss_value, mouth_loss_value, \
-         leye_cla_accuracy_value, reye_cla_accuracy_value, mouth_cla_accuracy_value, l2_loss_value, learn_rate = \
-             self._sess.run([*self.val_outputs],
-                            feed_dict=feed_dict)
-
-         # exclude the L2 term so epochs are compared on the data loss alone
-         all_total_loss += total_loss_value - l2_loss_value
-
-     return all_total_loss / cfg.TRAIN.val_iter

4. Restart training and the speed absolutely flies: GPU utilization reaches 95%, and 10 iterations take about 5 s.
5. My final training loss settled around 5.5 while the author reports roughly 3, so the training parameters probably still have room for tuning. Then convert the resulting checkpoint into a pb file, grab a few images or videos, and verify the recognition results.
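A minimal sketch of freezing a TF1 checkpoint into a pb file; the checkpoint and output-node names below are hypothetical, so check your own graph for the real ones:
- import tensorflow as tf
- from tensorflow.python.framework import graph_util
-
- ckpt_path = 'model/epoch_100L2_5e-05.ckpt'  # hypothetical checkpoint name
- output_nodes = ['tower_0/prediction']       # hypothetical output node name
-
- with tf.Session() as sess:
-     saver = tf.train.import_meta_graph(ckpt_path + '.meta', clear_devices=True)
-     saver.restore(sess, ckpt_path)
-     # bake the variables into constants so the graph is self-contained
-     frozen = graph_util.convert_variables_to_constants(
-         sess, sess.graph_def, output_nodes)
-     with tf.gfile.GFile('model/keypoints.pb', 'wb') as f:
-         f.write(frozen.SerializeToString())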
** On dealing with oversized tfrecord files:
Once the dataset is augmented, the tfrecord file can become enormous: I once wrote a little over 1,000,000 samples, and the tfrecord file came to nearly 270 GB.
Solutions:
1. Compress the data
- # compress when writing
- writer_options = tf.python_io.TFRecordOptions(
-     tf.python_io.TFRecordCompressionType.ZLIB)
- writer = tf.python_io.TFRecordWriter(record_path, options=writer_options)
-
-
- # decompress when reading: pass the same options to the reader
- tfrecord_options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.ZLIB)
- reader = tf.TFRecordReader(options=tfrecord_options)
2. Split the data across multiple record files. When reading, simply hand the list of record-file paths to tf.train.string_input_producer; a writing-side sketch follows the reading snippet below.
Reference: TensorFlow高效读取数据的方法——TFRecord的学习 (efficient data reading in TensorFlow with TFRecord)
- # self.tfrecord_path now points to a text file listing one record file per line
- with open(self.tfrecord_path, "r") as f:
-     lines = f.readlines()
- files_list = []
- for line in lines:
-     files_list.append(line.rstrip())
-
- filename_queue = tf.train.string_input_producer(files_list, shuffle=False)
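And a sketch of the writing side (a variant of the generate_tfrecord function above; the shard count and file-name pattern are my own assumptions), distributing examples round-robin across several record files:
- def generate_sharded_tfrecord(images_files, record_dir, num_shards=10, is_train=True):
-     # one writer per shard file; example i goes to shard i % num_shards
-     writers = [tf.python_io.TFRecordWriter('%s/train-%03d.record' % (record_dir, i))
-                for i in range(num_shards)]
-     for idx, image in enumerate(images_files):
-         tf_example = create_tf_example(image, is_train)
-         writers[idx % num_shards].write(tf_example.SerializeToString())
-     for writer in writers:
-         writer.close()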
*** After compression, 10,000 samples come to roughly 700+ MB.
From the detected landmarks we compute the eye opening (the minimal vertical eye distance) to decide whether the eyes are closed, then measure the fraction of time within a unit interval (usually 1 minute or 30 seconds) during which the eyes are closed beyond a given degree (70% or 80% closure). This fraction is the PERCLOS value, and comparing it against a threshold decides whether the driver is dozing off. A minimal sketch follows.
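A minimal PERCLOS sketch over a sliding window of per-frame EAR values, reusing the eye_aspect_ratio helper from the start of this post; the thresholds and frame rate are assumptions to tune on real footage (classical PERCLOS uses 70%/80% eyelid closure, which an EAR threshold only approximates):
- from collections import deque
-
- EAR_THRESHOLD = 0.2        # assumed: below this the eyes count as closed
- PERCLOS_THRESHOLD = 0.4    # assumed fatigue threshold; tune on real data
- WINDOW_SIZE = 30 * 25      # 30 s at an assumed 25 fps
-
- ear_window = deque(maxlen=WINDOW_SIZE)
-
-
- def update_perclos(ear):
-     # push the current frame's EAR, return (perclos, is_drowsy)
-     ear_window.append(ear)
-     closed = sum(1 for e in ear_window if e < EAR_THRESHOLD)
-     perclos = closed / float(len(ear_window))
-     return perclos, perclos > PERCLOS_THRESHOLD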
Adapted training source code (tf1): face_landmark-tf1.rar-互联网文档类资源-CSDN下载
*** If the tfrecord file is too large, compression and splitting the dataset (as above) solve it.
Inference (see the author's source): https://github.com/610265158/Peppa_Pig_Face_Engine
(dlib) annotation tool: FeatureTool.rar-互联网文档类资源-CSDN下载
*** Adds undo, delete-current-image, and other features to make annotation easier.
I am still pretty much a beginner and there is a lot I do not know well; I just like tinkering. If you have suggestions or ideas, please leave a comment below, and feel free to correct anything I got wrong.
A few detection result images:
** Closed-eye detection performance is excellent.