PyTorch Official Getting-Started Demo: Building an Image Classifier

References:

Bilibili: PyTorch official demo (LeNet)
PyTorch official website demo (Chinese version available)
Convolution operations in PyTorch explained in detail
Fan's CSDN notes
Code

Model: building the network
```python
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = x.view(-1, 32*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```
Train: training (downloads the official CIFAR-10 training and test sets)
```python
import torch
import torchvision
import torch.nn as nn
from model import LeNet
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np


def main():
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=False, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=36,
                                               shuffle=True, num_workers=0)

    val_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=5000,
                                             shuffle=False, num_workers=0)
    val_data_iter = iter(val_loader)
    val_image, val_label = next(val_data_iter)

    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    net = LeNet()
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.001)

    for epoch in range(5):

        running_loss = 0.0
        for step, data in enumerate(train_loader, start=0):
            inputs, labels = data

            optimizer.zero_grad()
            outputs = net(inputs)
            loss = loss_function(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if step % 500 == 499:
                with torch.no_grad():
                    outputs = net(val_image)
                    predict_y = torch.max(outputs, dim=1)[1]
                    accuracy = torch.eq(predict_y, val_label).sum().item() / val_label.size(0)

                    print('[%d, %5d] train_loss: %.3f  test_accuracy: %.3f' %
                          (epoch + 1, step + 1, running_loss / 500, accuracy))
                    running_loss = 0.0

    print('Finished Training')

    save_path = './Lenet.pth'
    torch.save(net.state_dict(), save_path)


if __name__ == '__main__':
    main()
```
Output:
```
Files already downloaded and verified
[1,   500] train_loss: 1.747  test_accuracy: 0.459
[1,  1000] train_loss: 1.445  test_accuracy: 0.510
[2,   500] train_loss: 1.230  test_accuracy: 0.575
[2,  1000] train_loss: 1.173  test_accuracy: 0.601
[3,   500] train_loss: 1.034  test_accuracy: 0.612
[3,  1000] train_loss: 1.035  test_accuracy: 0.629
[4,   500] train_loss: 0.941  test_accuracy: 0.645
[4,  1000] train_loss: 0.928  test_accuracy: 0.649
[5,   500] train_loss: 0.846  test_accuracy: 0.666
[5,  1000] train_loss: 0.866  test_accuracy: 0.670
Finished Training
```
Prediction module

Input image: totest.jpg
```python
import torch
import torchvision.transforms as transforms
from PIL import Image

from model import LeNet


def main():
    transform = transforms.Compose(
        [transforms.Resize((32, 32)),
         transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')

    net = LeNet()
    net.load_state_dict(torch.load('Lenet.pth'))

    im = Image.open('totest.jpg')
    im = transform(im)
    im = torch.unsqueeze(im, dim=0)

    with torch.no_grad():
        outputs = net(im)
        predict = torch.max(outputs, dim=1)[1].numpy()
    print(classes[int(predict)])


if __name__ == '__main__':
    main()
```
Output:
plane
Notes

Demo workflow
model.py: defines the LeNet network model
train.py: loads the dataset and trains the network, computing the loss on the training set and the accuracy on the test set, then saves the trained weights
predict.py: loads the trained weights and classifies an image of your own choosing (see the layout sketch below)
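A possible project layout for this demo (the data/ directory is created automatically by torchvision when download=True; Lenet.pth is written by train.py):

```
.
├── model.py     # LeNet definition
├── train.py     # training / evaluation loop, saves Lenet.pth
├── predict.py   # loads Lenet.pth and classifies a single image
└── data/        # CIFAR-10, downloaded by torchvision
```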
model.py

```python
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))    # input(3, 32, 32)   output(16, 28, 28)
        x = self.pool1(x)            # output(16, 14, 14)
        x = F.relu(self.conv2(x))    # output(32, 10, 10)
        x = self.pool2(x)            # output(32, 5, 5)
        x = x.view(-1, 32*5*5)       # output(32*5*5)
        x = F.relu(self.fc1(x))      # output(120)
        x = F.relu(self.fc2(x))      # output(84)
        x = self.fc3(x)              # output(10)
        return x
```
Tips:

The meaning and placement of the parameters of the convolution, pooling, and input/output layers in PyTorch are explained below (the original post pairs this explanation with a diagram of the LeNet architecture):
Convolution: Conv2d

The 2D convolution we commonly use corresponds to the following class in PyTorch:
```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
                dilation=1, groups=1, bias=True, padding_mode='zeros')
```
In everyday use only the following parameters matter:

- in_channels: depth of the input feature map. For an RGB color image, in_channels=3.
- out_channels: depth of the output feature map. It equals the number of convolution kernels: using n kernels produces an output feature map of depth n.
- kernel_size: size of the convolution kernel. It can be an int, e.g. 3 means height = width = 3, or a tuple, e.g. (3, 5) means height = 3 and width = 5.
- stride: stride of the convolution, default 1. Like kernel_size it can be an int or a tuple.
- padding: zero padding, default 0. An int such as 1 pads one ring of zeros around the input; a tuple such as (2, 1) pads 2 rows on the top and bottom and 1 column on the left and right.
The output-size formula from the PyTorch documentation:
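The formula appears as an image in the original post; for reference, this is the Conv2d output-height formula from the PyTorch docs (the width formula is analogous):

$$
H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor
$$

With dilation = 1 this reduces to the familiar N = (W − F + 2P) / S + 1 used in the note below.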
Note: when N = (W − F + 2P) / S + 1 does not come out to an integer, the extra rows and columns are discarded (the division is floored) so that the convolution output size is an integer.
Pooling: MaxPool2d

Max pooling corresponds to the following class in PyTorch:
```python
MaxPool2d(kernel_size, stride)
```
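A quick check of what MaxPool2d(2, 2), as used in LeNet above, does to a feature map; the shapes in the comments are what PyTorch produces:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)        # 2x2 window, stride 2
x = torch.randn(1, 16, 28, 28)   # e.g. the output of conv1 for a single image
print(pool(x).shape)             # torch.Size([1, 16, 14, 14]): height and width are halved
```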
Flattening a tensor: view()

Note that after the second pooling layer the data is still a 3-D tensor of shape (32, 5, 5); it has to be flattened into a vector of length 32*5*5 before being passed to the fully connected layers:
```python
x = self.pool2(x)            # output(32, 5, 5)
x = x.view(-1, 32*5*5)       # output(32*5*5)
x = F.relu(self.fc1(x))      # output(120)
```
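To double-check where 32*5*5 comes from, here is a small shape-tracing sanity check; it assumes the LeNet class from model.py above is importable:

```python
import torch
from model import LeNet   # the model definition shown above

net = LeNet()
x = torch.randn(1, 3, 32, 32)              # one fake CIFAR-10 image
x = net.pool1(torch.relu(net.conv1(x)))    # (32-5)+1 = 28, pooled to 14 -> [1, 16, 14, 14]
x = net.pool2(torch.relu(net.conv2(x)))    # (14-5)+1 = 10, pooled to 5  -> [1, 32, 5, 5]
print(x.shape)                             # torch.Size([1, 32, 5, 5])
print(x.view(-1, 32*5*5).shape)            # torch.Size([1, 800])
```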
Fully connected: Linear

A fully connected layer corresponds to the following class in PyTorch:
```python
Linear(in_features, out_features, bias=True)
```
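For example, fc1 in the model above maps the 800-dimensional flattened feature vector to 120 features; a minimal check:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(32 * 5 * 5, 120)    # 800 inputs -> 120 outputs
x = torch.randn(4, 32 * 5 * 5)      # a batch of 4 flattened feature maps
print(fc1(x).shape)                 # torch.Size([4, 120])
```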
Train.py

Loading the dataset

Import the packages:
```python
import torch
import torchvision
import torch.nn as nn
from model import LeNet
import torch.optim as optim
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import time
```
Data preprocessing
Preprocess the input images: ToTensor() converts them from shape (H x W x C) in the range [0, 255] to shape (C x H x W) in the range [0.0, 1.0], and Normalize() with mean 0.5 and std 0.5 then maps each channel to the range [-1.0, 1.0].
```python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
```
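train.py imports matplotlib and numpy but the snippets here never use them; in the official tutorial they are used to preview a few training images. A minimal sketch along those lines, assuming the transform above plus the train_set and classes objects from the full train.py listing (the imshow helper and the batch size of 4 are my own choices):

```python
import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt


def imshow(img):
    img = img / 2 + 0.5                          # undo Normalize((0.5, ...), (0.5, ...))
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))   # (C, H, W) -> (H, W, C)
    plt.show()


# show the first 4 training images with their labels
preview_loader = torch.utils.data.DataLoader(train_set, batch_size=4, shuffle=True)
images, labels = next(iter(preview_loader))
imshow(torchvision.utils.make_grid(images))
print(' '.join(classes[labels[j]] for j in range(4)))
```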
Datasets

The torchvision.datasets package can download and load a number of common datasets (MNIST, etc.) directly. This demo uses CIFAR-10, a classic image-classification dataset compiled by Alex Krizhevsky and Ilya Sutskever (students of Hinton) for recognizing everyday objects; it contains RGB color images in 10 classes.
Import and load the training set:

```python
# 50,000 training images; download=True fetches CIFAR-10 into ./data on the first run
train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=50,
                                           shuffle=False, num_workers=0)
```
Import and load the test set:

```python
# 10,000 test images
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=False, transform=transform)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=10000,
                                          shuffle=False, num_workers=0)
test_data_iter = iter(test_loader)
test_image, test_label = next(test_data_iter)   # pull all 10,000 test images in a single batch
```
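Pulling the whole test set out of the loader in one batch keeps the evaluation code short, but it is memory-hungry. If that becomes a problem, a batch-by-batch accuracy loop is a drop-in alternative; a sketch, reusing the test_loader above with a smaller batch_size and the imports already at the top of train.py:

```python
def evaluate(net, loader):
    """Accumulate accuracy over the whole test set, one batch at a time."""
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            outputs = net(images)
            predicted = torch.max(outputs, dim=1)[1]
            correct += torch.eq(predicted, labels).sum().item()
            total += labels.size(0)
    return correct / total
```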
Training process
| Term | Definition |
| --- | --- |
| epoch | One complete pass through the entire training set. |
| batch | Because compute is limited, the training set is split into batches for training; each batch holds batch_size samples. |
| iteration / step | Training on one batch of data is one iteration, also called one step. |
Taking this demo as an example: the training set has 50,000 samples and batch_size = 50, so one complete pass over the data takes 1,000 iterations (steps), i.e. 1 epoch.
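This is easy to confirm from the objects defined above (a quick check, assuming the train_set and train_loader from the loading snippet):

```python
print(len(train_set))      # 50000 samples
print(len(train_loader))   # 1000 batches of 50 per epoch
```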
```python
net = LeNet()
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(5):  # loop over the dataset multiple times

    running_loss = 0.0
    time_start = time.perf_counter()

    for step, data in enumerate(train_loader, start=0):
        inputs, labels = data           # data is a list of [inputs, labels]

        optimizer.zero_grad()           # clear the gradients from the previous batch
        outputs = net(inputs)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if step % 1000 == 999:          # evaluate on the test set every 1000 steps
            with torch.no_grad():
                outputs = net(test_image)
                predict_y = torch.max(outputs, dim=1)[1]
                accuracy = (predict_y == test_label).sum().item() / test_label.size(0)

                # report the average loss over the last 1000 steps
                print('[%d, %5d] train_loss: %.3f  test_accuracy: %.3f' %
                      (epoch + 1, step + 1, running_loss / 1000, accuracy))
                print('%f s' % (time.perf_counter() - time_start))
                running_loss = 0.0

print('Finished Training')

save_path = './Lenet.pth'
torch.save(net.state_dict(), save_path)
```
The printed output:
```
[1,  1000] train_loss: 1.537  test_accuracy: 0.541
35.345407 s
[2,  1000] train_loss: 1.198  test_accuracy: 0.605
40.532376 s
[3,  1000] train_loss: 1.048  test_accuracy: 0.641
44.144097 s
[4,  1000] train_loss: 0.954  test_accuracy: 0.647
41.313228 s
[5,  1000] train_loss: 0.882  test_accuracy: 0.662
41.860646 s
Finished Training
```
Training on GPU/CPU

The following line picks the GPU when one is available and falls back to the CPU otherwise:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
```
You can also specify the device explicitly:
```python
device = torch.device("cuda")
```
Correspondingly, the to() method is used to move tensors (and the model) between the CPU and the GPU, so that computation runs on the chosen device:
```python
net = LeNet()
net.to(device)  # move the network to the chosen device
loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(5):

    running_loss = 0.0
    time_start = time.perf_counter()

    for step, data in enumerate(train_loader, start=0):
        inputs, labels = data

        optimizer.zero_grad()
        outputs = net(inputs.to(device))                   # move the batch to the device
        loss = loss_function(outputs, labels.to(device))
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if step % 1000 == 999:
            with torch.no_grad():
                outputs = net(test_image.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                accuracy = (predict_y == test_label.to(device)).sum().item() / test_label.size(0)

                print('[%d, %5d] train_loss: %.3f  test_accuracy: %.3f' %
                      (epoch + 1, step + 1, running_loss / 1000, accuracy))
                print('%f s' % (time.perf_counter() - time_start))
                running_loss = 0.0

print('Finished Training')

save_path = './Lenet.pth'
torch.save(net.state_dict(), save_path)
```
The printed output:
```
cuda
[1,  1000] train_loss: 1.569  test_accuracy: 0.527
18.727597 s
[2,  1000] train_loss: 1.235  test_accuracy: 0.595
17.367685 s
[3,  1000] train_loss: 1.076  test_accuracy: 0.623
17.654908 s
[4,  1000] train_loss: 0.984  test_accuracy: 0.639
17.861825 s
[5,  1000] train_loss: 0.917  test_accuracy: 0.649
17.733115 s
Finished Training
```
As the timings show, training on the GPU is clearly faster: each epoch takes roughly half as long as on the CPU.
Predict.py

```python
import torch
import torchvision.transforms as transforms
from PIL import Image

from model import LeNet

transform = transforms.Compose(
    [transforms.Resize((32, 32)),      # resize to the 32x32 input LeNet expects
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

im = Image.open('horse.jpg')
im = transform(im)                     # [C, H, W]
im = torch.unsqueeze(im, dim=0)        # [N, C, H, W]

net = LeNet()
net.load_state_dict(torch.load('Lenet.pth'))

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

with torch.no_grad():
    outputs = net(im)
    predict = torch.max(outputs, dim=1)[1].data.numpy()
print(classes[int(predict)])
```
The output is the predicted class label.
The prediction can also be expressed with softmax, which outputs the probabilities of the 10 classes:
```python
with torch.no_grad():
    outputs = net(im)
    predict = torch.softmax(outputs, dim=1)
print(predict)
```
The index of the largest probability in the output is the index of the predicted label.
```
tensor([[2.2782e-06, 2.1008e-07, 1.0098e-04, 9.5135e-05, 9.3220e-04,
         2.1398e-04, 3.2954e-08, 9.9865e-01, 2.8895e-08, 2.8820e-07]])
```
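Picking that index out of the tensor and mapping it back to a class name, using the predict tensor and classes tuple from the script above:

```python
# index 7 holds the largest probability (~0.9987) in the tensor above, and classes[7] == 'horse'
idx = torch.argmax(predict, dim=1).item()
print(classes[idx])   # horse
```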