
[Keras] Training one network model on multiple GPUs in a single machine

References:


[简述-zzw] Training a network on multiple GPUs at once with Keras

[Zhihu] How can Keras use two GPUs when training a deep network?


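With TensorFlow as the backend, there are two ways to run a single model on multiple GPUs: data parallelism and device parallelism (see the Keras Chinese documentation).

Data parallelism:


Data parallelism replicates the target model once on each device, and each replica processes a different slice of the input data. Keras provides a built-in function, keras.utils.multi_gpu_model, that produces a data-parallel version of any model and achieves quasi-linear speedup on up to 8 GPUs. See the multi_gpu_model entry in the utils documentation. Here is an example: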

    from keras.utils import multi_gpu_model

    # Replicates `model` on 8 GPUs.
    # This assumes that your machine has 8 available GPUs.
    parallel_model = multi_gpu_model(model, gpus=8)
    parallel_model.compile(loss='categorical_crossentropy',
                           optimizer='rmsprop')

    # This `fit` call will be distributed on 8 GPUs.
    # Since the batch size is 256, each GPU will process 32 samples.
    parallel_model.fit(x, y, epochs=20, batch_size=256)
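The "256 samples over 8 GPUs gives 32 each" arithmetic can be illustrated without any GPU. The following is a minimal pure-Python sketch of the slicing idea only, not the Keras implementation; the helper names `slice_bounds` and `data_parallel_apply` are hypothetical, and the choice to give any remainder to the last replica is an assumption made for the sketch.

```python
def slice_bounds(batch_size, n_gpus):
    """Return (start, stop) index pairs, one per replica, covering the batch."""
    step = batch_size // n_gpus
    bounds = []
    for i in range(n_gpus):
        start = i * step
        # Assumption for this sketch: the last replica also takes any remainder.
        stop = batch_size if i == n_gpus - 1 else start + step
        bounds.append((start, stop))
    return bounds


def data_parallel_apply(fn, batch, n_gpus):
    """Apply `fn` to each sub-batch and concatenate the results in order."""
    outputs = []
    for start, stop in slice_bounds(len(batch), n_gpus):
        outputs.extend(fn(batch[start:stop]))  # one replica's share of the work
    return outputs


sizes = [stop - start for start, stop in slice_bounds(256, 8)]
print(sizes)  # [32, 32, 32, 32, 32, 32, 32, 32]
```

Because each replica only sees its own slice, the batch size passed to `fit` is the global batch size, and the per-GPU batch size is that value divided by the number of GPUs.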


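Device parallelism:

Device parallelism runs different parts of the same model on different devices. It works best for models with parallel structure, for example a model with two branches. It can be implemented with TensorFlow device scopes. Here is an example: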

    import keras
    import tensorflow as tf

    # Model where a shared LSTM is used to encode two different sequences in parallel
    input_a = keras.Input(shape=(140, 256))
    input_b = keras.Input(shape=(140, 256))

    shared_lstm = keras.layers.LSTM(64)

    # Process the first sequence on one GPU
    with tf.device('/gpu:0'):
        encoded_a = shared_lstm(input_a)
    # Process the next sequence on another GPU
    with tf.device('/gpu:1'):
        encoded_b = shared_lstm(input_b)

    # Concatenate results on CPU
    with tf.device('/cpu:0'):
        merged_vector = keras.layers.concatenate([encoded_a, encoded_b],
                                                 axis=-1)
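Device names such as '/gpu:0' and '/gpu:1' refer to the GPUs that CUDA exposes to the process, so on a machine with more cards it can help to choose which physical GPUs are visible. The sketch below uses the standard CUDA_VISIBLE_DEVICES environment variable; the helper name `select_gpus` and the ids 2 and 3 are hypothetical, and the variable has to be set before TensorFlow initializes CUDA.

```python
import os


def select_gpus(ids):
    """Expose only the given physical GPU ids to CUDA.

    Call this before importing or initializing TensorFlow,
    otherwise the setting has no effect on the running process.
    """
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(str(i) for i in ids)
    return os.environ['CUDA_VISIBLE_DEVICES']


# Hypothetical choice: use physical GPUs 2 and 3.
print(select_gpus([2, 3]))  # 2,3
```

With this setting, '/gpu:0' maps to physical GPU 2 and '/gpu:1' to physical GPU 3, because the visible devices are renumbered from zero.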


Example: training an inception_v4 model on two GPUs with Keras:

    # -*- coding: utf-8 -*-
    import numpy as np

    from keras.models import Sequential
    from keras.layers import Input, Dense, Convolution2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D, Dropout, Flatten, merge, Reshape, Activation
    from keras.layers.normalization import BatchNormalization
    from keras.models import Model
    from keras import backend as K

    from sklearn.metrics import log_loss

    # from load_cifar10 import load_cifar10_data
    from keras.preprocessing.image import ImageDataGenerator
    from keras import optimizers
    import keras
    import tensorflow as tf
    from keras.utils import multi_gpu_model

    def conv2d_bn(x, nb_filter, nb_row, nb_col,
                  border_mode='same', subsample=(1, 1), bias=False):
        """
        Utility function to apply conv + BN.
        (Slightly modified from https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py)
        """
        # The channel axis depends on the backend's image data format.
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1
        # Keras 2 argument names: kernel size tuple, strides, padding, use_bias.
        x = Convolution2D(nb_filter, (nb_row, nb_col),
                          strides=subsample,
                          padding=border_mode,
                          use_bias=bias)(x)
        x = BatchNormalization(axis=channel_axis)(x)
        x = Activation('relu')(x)
        return x

    def block_inception_a(input):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        branch_0 = conv2d_bn(input, 96, 1, 1)

        branch_1 = conv2d_bn(input, 64, 1, 1)
        branch_1 = conv2d_bn(branch_1, 96, 3, 3)

        branch_2 = conv2d_bn(input, 64, 1, 1)
        branch_2 = conv2d_bn(branch_2, 96, 3, 3)
        branch_2 = conv2d_bn(branch_2, 96, 3, 3)

        branch_3 = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(input)
        branch_3 = conv2d_bn(branch_3, 96, 1, 1)

        # Concatenate the four branches along the channel axis.
        x = keras.layers.concatenate([branch_0, branch_1, branch_2, branch_3], axis=channel_axis)
        return x

    def block_reduction_a(input):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        branch_0 = conv2d_bn(input, 384, 3, 3, subsample=(2, 2), border_mode='valid')

        branch_1 = conv2d_bn(input, 192, 1, 1)
        branch_1 = conv2d_bn(branch_1, 224, 3, 3)
        branch_1 = conv2d_bn(branch_1, 256, 3, 3, subsample=(2, 2), border_mode='valid')

        branch_2 = MaxPooling2D((3, 3), strides=(2, 2), padding='valid')(input)

        x = keras.layers.concatenate([branch_0, branch_1, branch_2], axis=channel_axis)
        return x

    def block_inception_b(input):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        branch_0 = conv2d_bn(input, 384, 1, 1)

        branch_1 = conv2d_bn(input, 192, 1, 1)
        branch_1 = conv2d_bn(branch_1, 224, 1, 7)
        branch_1 = conv2d_bn(branch_1, 256, 7, 1)

        branch_2 = conv2d_bn(input, 192, 1, 1)
        branch_2 = conv2d_bn(branch_2, 192, 7, 1)
        branch_2 = conv2d_bn(branch_2, 224, 1, 7)
        branch_2 = conv2d_bn(branch_2, 224, 7, 1)
        branch_2 = conv2d_bn(branch_2, 256, 1, 7)

        branch_3 = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(input)
        branch_3 = conv2d_bn(branch_3, 128, 1, 1)

        x = keras.layers.concatenate([branch_0, branch_1, branch_2, branch_3], axis=channel_axis)
        return x

    def block_reduction_b(input):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        branch_0 = conv2d_bn(input, 192, 1, 1)
        branch_0 = conv2d_bn(branch_0, 192, 3, 3, subsample=(2, 2), border_mode='valid')

        branch_1 = conv2d_bn(input, 256, 1, 1)
        branch_1 = conv2d_bn(branch_1, 256, 1, 7)
        branch_1 = conv2d_bn(branch_1, 320, 7, 1)
        branch_1 = conv2d_bn(branch_1, 320, 3, 3, subsample=(2, 2), border_mode='valid')

        branch_2 = MaxPooling2D((3, 3), strides=(2, 2), padding='valid')(input)

        x = keras.layers.concatenate([branch_0, branch_1, branch_2], axis=channel_axis)
        return x

    def block_inception_c(input):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        branch_0 = conv2d_bn(input, 256, 1, 1)

        branch_1 = conv2d_bn(input, 384, 1, 1)
        branch_10 = conv2d_bn(branch_1, 256, 1, 3)
        branch_11 = conv2d_bn(branch_1, 256, 3, 1)
        branch_1 = keras.layers.concatenate([branch_10, branch_11], axis=channel_axis)

        branch_2 = conv2d_bn(input, 384, 1, 1)
        branch_2 = conv2d_bn(branch_2, 448, 3, 1)
        branch_2 = conv2d_bn(branch_2, 512, 1, 3)
        branch_20 = conv2d_bn(branch_2, 256, 1, 3)
        branch_21 = conv2d_bn(branch_2, 256, 3, 1)
        branch_2 = keras.layers.concatenate([branch_20, branch_21], axis=channel_axis)

        branch_3 = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(input)
        branch_3 = conv2d_bn(branch_3, 256, 1, 1)

        x = keras.layers.concatenate([branch_0, branch_1, branch_2, branch_3], axis=channel_axis)
        return x

    def inception_v4_base(input):
        if K.image_data_format() == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        # Input Shape is 299 x 299 x 3 (tf) or 3 x 299 x 299 (th)
        net = conv2d_bn(input, 32, 3, 3, subsample=(2, 2), border_mode='valid')
        net = conv2d_bn(net, 32, 3, 3, border_mode='valid')
        net = conv2d_bn(net, 64, 3, 3)

        branch_0 = MaxPooling2D((3, 3), strides=(2, 2), padding='valid')(net)
        branch_1 = conv2d_bn(net, 96, 3, 3, subsample=(2, 2), border_mode='valid')
        net = keras.layers.concatenate([branch_0, branch_1], axis=channel_axis)

        branch_0 = conv2d_bn(net, 64, 1, 1)
        branch_0 = conv2d_bn(branch_0, 96, 3, 3, border_mode='valid')

        branch_1 = conv2d_bn(net, 64, 1, 1)
        branch_1 = conv2d_bn(branch_1, 64, 1, 7)
        branch_1 = conv2d_bn(branch_1, 64, 7, 1)
        branch_1 = conv2d_bn(branch_1, 96, 3, 3, border_mode='valid')
        net = keras.layers.concatenate([branch_0, branch_1], axis=channel_axis)

        branch_0 = conv2d_bn(net, 192, 3, 3, subsample=(2, 2), border_mode='valid')
        branch_1 = MaxPooling2D((3, 3), strides=(2, 2), padding='valid')(net)
        net = keras.layers.concatenate([branch_0, branch_1], axis=channel_axis)

        # 35 x 35 x 384
        # 4 x Inception-A blocks
        for idx in range(4):
            net = block_inception_a(net)

        # 35 x 35 x 384
        # Reduction-A block
        net = block_reduction_a(net)

        # 17 x 17 x 1024
        # 7 x Inception-B blocks
        for idx in range(7):
            net = block_inception_b(net)

        # 17 x 17 x 1024
        # Reduction-B block
        net = block_reduction_b(net)

        # 8 x 8 x 1536
        # 3 x Inception-C blocks
        for idx in range(3):
            net = block_inception_c(net)

        return net

    def inception_v4_model(img_rows, img_cols, color_type=1, num_classes=None, dropout_keep_prob=0.2):
        '''
        Inception V4 Model for Keras
        Model Schema is based on
        https://github.com/kentsommer/keras-inceptionV4
        ImageNet Pretrained Weights
        Theano: https://github.com/kentsommer/keras-inceptionV4/releases/download/2.0/inception-v4_weights_th_dim_ordering_th_kernels.h5
        TensorFlow: https://github.com/kentsommer/keras-inceptionV4/releases/download/2.0/inception-v4_weights_tf_dim_ordering_tf_kernels.h5
        Parameters:
          img_rows, img_cols - resolution of inputs
          color_type - 1 for grayscale, 3 for color
          num_classes - number of class labels for our classification task
          dropout_keep_prob - passed directly to Dropout, so it acts as the drop rate
        '''
        # Input Shape is 299 x 299 x 3 (tf) or 3 x 299 x 299 (th).
        # Note: the input shape is fixed here; img_rows, img_cols and color_type are not used.
        if K.image_data_format() == 'channels_first':
            inputs = Input((3, 299, 299))
        else:
            inputs = Input((299, 299, 3))

        # Make inception base
        net = inception_v4_base(inputs)

        # Final pooling and prediction
        # 8 x 8 x 1536
        net_old = AveragePooling2D((8, 8), padding='valid')(net)
        # 1 x 1 x 1536
        net_old = Dropout(dropout_keep_prob)(net_old)
        net_old = Flatten()(net_old)
        # 1536
        predictions = Dense(units=1001, activation='softmax')(net_old)

        model = Model(inputs, predictions, name='inception_v4')

        if K.image_data_format() == 'channels_first':
            # Use pre-trained weights for Theano backend
            weights_path = 'imagenet_models/inception-v4_weights_th_dim_ordering_th_kernels.h5'
        else:
            # Use pre-trained weights for Tensorflow backend
            weights_path = 'imagenet_models/inception-v4_weights_tf_dim_ordering_tf_kernels.h5'
        # weights_path = './InceptionV4_model_fold_01.h5'
        model.load_weights(weights_path, by_name=True)

        # Truncate and replace softmax layer for transfer learning
        # Cannot use model.layers.pop() since model is not of Sequential() type
        # The method below works since pre-trained weights are stored in layers but not in the model
        net_ft = AveragePooling2D((8, 8), padding='valid')(net)
        net_ft = Dropout(dropout_keep_prob)(net_ft)
        net_ft = Flatten()(net_ft)
        predictions_ft = Dense(units=num_classes, activation='softmax')(net_ft)

        model = Model(inputs, predictions_ft, name='inception_v4')
        return model

    if __name__ == '__main__':
        # import os
        # os.environ['CUDA_VISIBLE_DEVICES']='0'

        # dimensions of our images.
        # ADNI GM
        # X: 121*145
        # Y: 121*121
        # Z: 145*121
        # OASIS GM MRI
        # 176*208
        ### data_fold_01_train_val_test_entropy_keep_SliceNum_33
        img_width, img_height = 299, 299
        fold_name = "fold_01"  ## data_fold_01_entropy_keep_SliceNum_33

        ## single_subject_data_fold_01_train_val_test_entropy_keep_SliceNum_81
        train_data_dir = 'single_subject_data_' + fold_name + '_train_val_test_entropy_keep_SliceNum_81/train'
        validation_data_dir = 'single_subject_data_' + fold_name + '_train_val_test_entropy_keep_SliceNum_81/validation'
        filepath = "model_single_subject_InceptionV4_" + fold_name + "_train_val_test_entropy_keep_SliceNum_81_best.h5"

        # train num (AD+NC) = 36207 + 41796 = 78003
        # validation num (AD+NC) = 9477 + 11178 = 20655
        # test num (AD+NC) = 2673 + 2916 = 5589
        # train_samples_AD = len(os.listdir(path))
        nb_train_samples = 78003
        nb_validation_samples = 20655
        epochs = 120
        batch_size = 64  #10 #40
        channel = 3
        num_classes = 2

        print("=== parameters info ===")
        print("epochs = {}.".format(epochs))
        print("batch_size = {}.".format(batch_size))
        print("nb_train_samples = {}.".format(nb_train_samples))
        print("nb_validation_samples = {}.".format(nb_validation_samples))

        #if K.image_data_format() == 'channels_first':
        #    input_shape = (3, img_width, img_height)
        #else:
        #    input_shape = (img_width, img_height, 3)

        # this is the augmentation configuration we will use for training
        train_datagen = ImageDataGenerator(
            rescale=1. / 255,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True)

        # this is the augmentation configuration we will use for testing:
        # only rescaling
        test_datagen = ImageDataGenerator(rescale=1. / 255)

        ### class_mode: one of "categorical", "binary", "sparse" or None.
        ### Default is "categorical". It determines the form of the returned label array:
        ### "categorical" returns 2D one-hot encoded labels,
        ### "binary" returns 1D binary labels,
        ### "sparse" returns 1D integer labels.
        ### If None, no labels are returned and the generator only yields batches of data;
        ### this is used with model.predict_generator(), model.evaluate_generator() and similar functions.
        train_generator = train_datagen.flow_from_directory(
            train_data_dir,
            target_size=(img_width, img_height),
            batch_size=batch_size,
            class_mode='binary')

        validation_generator = test_datagen.flow_from_directory(
            validation_data_dir,
            target_size=(img_width, img_height),
            batch_size=batch_size,
            class_mode='binary')

        # Load our model
        model = inception_v4_model(img_height, img_width, channel, num_classes, dropout_keep_prob=0.2)

        # Replicate the model on 2 GPUs; each GPU processes half of every batch.
        parallel_model = multi_gpu_model(model, gpus=2)

        # Learning rate is changed to 0.001
        sgd = optimizers.SGD(lr=1e-3, decay=1e-6, momentum=0.9, nesterov=True)
        parallel_model.compile(optimizer=sgd, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

        checkpoint = keras.callbacks.ModelCheckpoint(
            filepath=filepath,
            monitor='val_acc',
            verbose=1,
            save_best_only=True,
            # save_weights_only=False,
            mode='max',
            period=1
        )
        callbacks_list = [checkpoint]

        ### verbose: logging mode. 0 = no log output to stdout, 1 = progress bar, 2 = one line per epoch.
        ###
        # Integer division keeps the step counts whole on Python 3 as well.
        parallel_model.fit_generator(
            train_generator,
            steps_per_epoch=nb_train_samples // batch_size,
            epochs=epochs,
            verbose=2,
            validation_data=validation_generator,
            validation_steps=nb_validation_samples // batch_size,
            callbacks=callbacks_list)
        # model.save('InceptionV4_model_fold_01.h5')

        # Make predictions
        #predictions_valid = model.predict(X_valid, batch_size=batch_size, verbose=1)

        # Cross-entropy loss score
        #score = log_loss(Y_valid, predictions_valid)

    ### CUDA_VISIBLE_DEVICES=0 python inception_v4_train_val_test_entropy_keep_SliceNum_81_fold_01_single_subject.py > acc_inception_v4_train_val_test_entropy_keep_SliceNum_81_fold_01_single_subject.txt
    ### python inception_v4_train_val_test_entropy_keep_SliceNum_81_fold_01_single_subject.py > acc_single_subject_inception_v4_train_val_test_entropy_keep_SliceNum_81_fold_01.txt
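The script derives steps_per_epoch and validation_steps by dividing the sample counts by the batch size. Both arguments are meant to be whole numbers of batches: floor division drops the final partial batch, while rounding up covers every sample once per epoch. The sketch below shows the arithmetic with the counts from the script; `steps_for` is a hypothetical helper name, not part of Keras.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be partial)."""
    return int(math.ceil(n_samples / float(batch_size)))


print(steps_for(78003, 64))  # 1219 training steps when rounding up
print(steps_for(20655, 64))  # 323 validation steps when rounding up
print(78003 // 64)           # 1218 training steps with floor division
```

Which variant to use is a judgment call: rounding up sees all 78003 training images per epoch, while floor division keeps every batch at the full size of 64.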





Note:

Running the code above with the callbacks argument, that is:

    # parallel_model.fit_generator(
    #     train_generator,
    #     steps_per_epoch=nb_train_samples // batch_size,
    #     epochs=epochs,
    #     verbose=2,
    #     validation_data=validation_generator,
    #     validation_steps=nb_validation_samples // batch_size,
    #     callbacks=callbacks_list)

raises this error:

TypeError: can't pickle NotImplementedType objects

The only callback in the list is the ModelCheckpoint, which saves parallel_model whenever val_acc improves, so the error most likely comes from serializing the multi-GPU model. Removing the callbacks argument avoids it (at the cost of no longer saving checkpoints), as shown below:

    parallel_model.fit_generator(
        train_generator,
        steps_per_epoch=nb_train_samples // batch_size,
        epochs=epochs,
        verbose=2,
        validation_data=validation_generator,
        validation_steps=nb_validation_samples // batch_size)
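Dropping the callback also drops best-model checkpointing. The Keras documentation for multi_gpu_model recommends saving through the template model (the one passed to multi_gpu_model) rather than the model it returns. The following is a minimal sketch of that idea under stated assumptions: `make_best_saver` is a hypothetical helper, the monitored metric is assumed to be named 'val_acc' as in the script, and higher values are assumed to be better.

```python
def make_best_saver(template_model, filepath, monitor='val_acc'):
    """Build an on_epoch_end hook that saves `template_model` when `monitor` improves."""
    state = {'best': float('-inf')}

    def on_epoch_end(epoch, logs=None):
        current = (logs or {}).get(monitor)
        if current is None:
            return  # the metric was not reported for this epoch
        if current > state['best']:
            state['best'] = current
            # Save the single-GPU template, not the multi_gpu_model wrapper.
            template_model.save(filepath)

    return on_epoch_end
```

In the training script, the hook could be wrapped as keras.callbacks.LambdaCallback(on_epoch_end=make_best_saver(model, filepath)) and passed in the callbacks list, where model is the template given to multi_gpu_model. The template shares its weights with the parallel model, so the saved file reflects the training progress.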





