
Methods of Transfer Learning

Transfer Learning

Contents

1. Introduction to Transfer Learning

2. Transfer Learning Approaches

2.1 Loading all parameters

2.2 Loading parameters and training only the final layers

2.3 Loading, adding a new layer, and training only the new layer

2.3.1 Inspecting the network structure

2.3.2 Modifying the classification layer

2.3.3 Demonstration with a custom resnet

3. Inspecting the network's parameter values


1. Introduction to Transfer Learning

Transfer learning helps us get a working network up quickly.

By loading a pre-trained model and fine-tuning it for the project at hand, the network converges much faster than one trained from scratch.

The common approaches are the following three:

1. Load all of the pre-trained parameters and continue training the whole network.
2. Load the parameters, then train only the final layers.
3. Load the parameters, add (or replace) a new final layer, and train only that new layer.

2. Transfer Learning Approaches

Below, each of the three methods above is implemented.

Note on transfer learning: the custom model must match the structure of the pre-trained model, otherwise the loaded weights will not line up with the right positions.

2.1 Loading all parameters

The test code uses a resnet network:

import torch
from model import resnet34  # the custom resnet implementation used in this post

# build the network
net = resnet34()  # no arguments needed here
pre_model = './resnet_pre_model.pth'  # pre-trained weights
missing_keys, unexpected_keys = net.load_state_dict(torch.load(pre_model), strict=False)

This loads the pre-trained model, and training can then proceed as usual.

In essence this is just weight initialization: instead of the usual random initialization, the network starts from weights that someone else has already trained to a usable state.
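As an optional sanity check (a small sketch using the two lists returned above), you can print which checkpoint entries were skipped by strict=False; they should be empty when the architectures match exactly:

# Optional check: with strict=False, weights whose names or shapes do not
# match the model are silently skipped and reported in these two lists.
print('missing keys:   ', missing_keys)
print('unexpected keys:', unexpected_keys)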

2.2 Loading parameters and training only the final layers

Here torchvision is used for the demonstration:

from torchvision.models import AlexNet

# Alex
model = AlexNet(num_classes=10)
print(model)
'''
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=10, bias=True)
  )
)
'''

Whether PyTorch's backpropagation updates a parameter is controlled by that parameter's requires_grad attribute: it is only updated when requires_grad is True. To train only the final parameters, simply set requires_grad = False on the earlier parameters, i.e. freeze them.

Displaying the network's parameters:

for name, para in model.named_parameters():
    print(name)
'''
features.0.weight
features.0.bias
features.3.weight
features.3.bias
features.6.weight
features.6.bias
features.8.weight
features.8.bias
features.10.weight
features.10.bias
classifier.1.weight
classifier.1.bias
classifier.4.weight
classifier.4.bias
classifier.6.weight
classifier.6.bias
'''

Here we only need to freeze the feature-extraction part, i.e. the convolutional layers, and leave the fully connected layers untouched.

This is done as follows:

for name, para in model.named_parameters():
    if 'features' in name:
        para.requires_grad = False

Comparing the parameters before and after freezing: before freezing, every parameter reports requires_grad=True; after freezing, the features.* parameters report False while the classifier.* parameters remain True.
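A small sketch makes the comparison concrete: print requires_grad for every parameter, run the freezing loop above, then print again.

for name, para in model.named_parameters():
    print(name, para.requires_grad)
# before the freezing loop: every parameter prints True
# after the freezing loop:  features.* print False, classifier.* still print True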

 

The optimizer also has to change: the parameters that no longer require gradients should be filtered out.

Filtering the frozen parameters out of the optimizer:

from torchvision.models import AlexNet
from torch import optim

# Alex
model = AlexNet(num_classes=10)
for name, para in model.named_parameters():
    if 'features' in name:
        para.requires_grad = False
para = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.Adam(para, lr=0.001)
print(para)
'''
[Parameter containing:
tensor([[ 0.0024, 0.0097, -0.0038, ..., 0.0044, 0.0014, 0.0096],
        [-0.0078, -0.0014, 0.0031, ..., -0.0010, 0.0051, 0.0012],
        [-0.0088, 0.0054, -0.0075, ..., -0.0026, 0.0085, -0.0103],
        ...,
        [-0.0090, 0.0065, 0.0046, ..., -0.0077, 0.0068, -0.0092],
        [ 0.0051, 0.0075, 0.0015, ..., -0.0072, -0.0044, 0.0077],
        [ 0.0060, 0.0079, 0.0010, ..., 0.0066, 0.0044, -0.0006]],
       requires_grad=True), Parameter containing:
tensor([-0.0072, 0.0021, -0.0079, ..., -0.0045, -0.0031, 0.0052],
       requires_grad=True), Parameter containing:
tensor([[-7.3999e-03, 7.5480e-03, -7.0330e-03, ..., -2.3227e-03,
         -1.0509e-03, -1.0634e-02],
        [ 1.4005e-02, 1.0355e-02, 4.3921e-03, ..., 5.6021e-03,
         -5.4067e-03, 8.2123e-03],
        [ 1.1953e-02, 7.0178e-03, 6.5284e-05, ..., 9.9544e-03,
          1.2050e-02, -2.8193e-03],
        ...,
        [-1.2271e-02, 2.8609e-03, 1.5023e-02, ..., -1.2590e-02,
          3.6282e-03, -1.5037e-03],
        [-1.1178e-02, -6.8283e-03, -1.5380e-02, ..., 9.1631e-03,
         -8.2415e-04, -1.0820e-02],
        [ 3.5226e-03, 8.1489e-04, 1.4744e-02, ..., 3.8180e-03,
          7.2305e-03, -4.8745e-03]], requires_grad=True), Parameter containing:
tensor([-0.0024, 0.0106, 0.0019, ..., -0.0047, -0.0113, 0.0155],
       requires_grad=True), Parameter containing:
tensor([[-8.0021e-03, -5.3036e-03, -7.7326e-03, ..., -6.2924e-03,
          1.0251e-02, -1.3929e-02],
        [-1.0562e-02, 1.5300e-02, -1.3457e-02, ..., 4.8542e-03,
         -1.2721e-02, -2.1716e-03],
        [ 1.2303e-02, 3.4304e-03, -1.3099e-02, ..., -6.2512e-03,
         -7.1608e-03, -5.3249e-03],
        ...,
        [ 1.4954e-02, -9.6393e-03, 1.3907e-02, ..., 2.4139e-03,
         -2.5765e-03, -4.9496e-05],
        [ 7.0794e-03, -5.5391e-03, -1.1280e-02, ..., 2.3952e-03,
          4.2578e-03, 7.0075e-03],
        [ 1.0447e-02, -8.3530e-03, 5.4398e-03, ..., -1.4187e-03,
          1.2113e-02, 1.0778e-02]], requires_grad=True), Parameter containing:
tensor([-0.0004, -0.0154, -0.0008, -0.0101, 0.0106, 0.0130, -0.0051, 0.0056,
        -0.0152, -0.0006], requires_grad=True)]
'''
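As a quick usage check (a sketch with random dummy data, purely illustrative), a single optimizer step should leave the frozen convolution weights untouched while the classifier weights change:

import torch
from torch import nn

x = torch.randn(2, 3, 224, 224)   # dummy image batch (hypothetical data)
y = torch.randint(0, 10, (2,))    # dummy labels for the 10 classes
loss_fn = nn.CrossEntropyLoss()

conv_before = model.features[0].weight.clone()
fc_before = model.classifier[6].weight.clone()

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

print(torch.equal(conv_before, model.features[0].weight))  # True: frozen conv layer unchanged
print(torch.equal(fc_before, model.classifier[6].weight))  # False: classifier layer was updated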

2.3 Loading, adding a new layer, and training only the new layer

The pre-trained model's classification head may not produce what we need: for example, the pre-trained model outputs 1000 classes while our task has 10.

So the last layer needs to be changed.

Note: here the final classification layer is replaced; no new layer is stacked on top of the original network.

2.3.1 Inspecting the network structure

To change anything, we obviously need to look at the existing network structure first.

from torchvision.models import AlexNet

# Alex
model = AlexNet(num_classes=10)
print(model)
'''
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=10, bias=True)
  )
)
'''

Finding a specific layer in the network is just a matter of walking down the printed hierarchy step by step.

For example, to locate the final classification layer: it sits at index 6 inside classifier, and classifier in turn sits inside AlexNet.

Remember that AlexNet here is the network we instantiated ourselves, i.e. model.

So the last Linear layer is reached with model.classifier[6].
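A quick check (sketch) that this indexing really lands on the last Linear layer:

print(model.classifier[6])
# Linear(in_features=4096, out_features=10, bias=True)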

2.3.2 Modifying the classification layer

The most direct-looking change is to assign the attribute, i.e. model.classifier[6].out_features = 10. Note, however, that this only updates the stored attribute; the layer's weight tensor keeps its original shape.

 

It is therefore better to replace the layer with a new nn.Linear, which rebuilds the weights for the new output size; the replacement is sketched below.
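A minimal sketch of the replacement approach (assuming a 10-class target task):

from torch import nn

in_feats = model.classifier[6].in_features       # 4096 for AlexNet
model.classifier[6] = nn.Linear(in_feats, 10)    # fresh, randomly initialized 10-class head
print(model.classifier[6])
# Linear(in_features=4096, out_features=10, bias=True)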

 

Freezing on top of this works exactly as before, so it is not demonstrated again here.

2.3.3 Demonstration with a custom resnet

First of all, always inspect the network structure, because the same network may be implemented in different ways.

from model import resnet34
net = resnet34()
print(net)
'''
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (4): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (5): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU()
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
'''

Notice that the last layer here is called fc, not classifier as before.

So changing the last layer has to be done accordingly, as in the sketch below.
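A minimal sketch (again assuming a 10-class target task):

from torch import nn

net.fc = nn.Linear(net.fc.in_features, 10)   # 512 -> 10, replacing the original 1000-class head
print(net.fc)
# Linear(in_features=512, out_features=10, bias=True)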

 

To change a convolution layer in the middle of the network:

from torch import nn

net = resnet34()
net.layer2[0].conv1 = nn.Conv2d(64, 256, kernel_size=3, stride=1)

Before the change, the structure printed above lists layer2[0].conv1 as Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False); printing net again afterwards shows Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1)) in its place. (If such a modified network is actually to be trained, the layers that follow in the block, e.g. its bn1 and conv2, have to be adjusted to the new channel count as well.)

3. Inspecting the network's parameter values

Simply append .weight or .bias to the indexing expressions used above, for example as in the sketch below.
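For example (a sketch using the models defined above):

print(model.classifier[6].weight)   # weight tensor of AlexNet's last Linear layer
print(model.classifier[6].bias)     # bias tensor of the same layer
print(net.fc.weight.shape)          # shape of the resnet's fc weight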

 

Alternatively, the named_parameters() loop used above for freezing also works, as sketched below.
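A sketch of that approach:

for name, para in model.named_parameters():
    print(name, para.shape)   # use print(name, para) to dump the full tensor values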

 

If you are interested, try these out yourself.
