
- 1. Building the Experiment Class
- 2. A Brief Introduction to AlexNet
- 3. Implementing AlexNet
- 4. The CIFAR-10 Dataset
- 5. Training and Testing AlexNet
- Appendix: Full Code
1. Building the Experiment Class

The training and testing code for convolutional neural networks is largely the same from model to model. To keep later code concise, we first build an Experiment class and save it in Experiment.py.

Experiment.py:
import torch
import time
import numpy as np
import matplotlib.pyplot as plt


class Experiment:
    def __init__(self, train_loader, test_loader, model, num_epochs, lr, optimizer='SGD', quiet=False):
        self.train_loader = train_loader
        self.test_loader = test_loader
        self.model = model.cuda()
        self.num_epochs = num_epochs
        self.lr = lr
        self.loss_fn = torch.nn.CrossEntropyLoss()
        self.quiet = quiet
        self.speed = 0
        self.train_loss, self.train_acc = [], []
        self.test_loss, self.test_acc = [], []
        if optimizer == 'SGD':
            self.optimizer = torch.optim.SGD(self.model.parameters(), lr=self.lr)
        elif optimizer == 'Adam':
            self.optimizer = torch.optim.Adam(self.model.parameters(), lr=self.lr)
        else:
            raise ValueError("optimizer must be 'SGD' or 'Adam'")

    def train(self):
        self.model.train()  # enable Dropout during training
        correct, avg_loss = 0, 0
        for batch_idx, (X, y) in enumerate(self.train_loader):
            tic = time.time()
            X, y = X.cuda(), y.cuda()
            pred = self.model(X)
            loss = self.loss_fn(pred, y)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            toc = time.time()
            self.speed += toc - tic
            correct += (pred.argmax(dim=1) == y).sum().item()
            avg_loss += loss.item()  # .item() avoids keeping the autograd graph alive across batches
        avg_loss /= (batch_idx + 1)
        correct /= len(self.train_loader.dataset)
        self.train_loss.append(avg_loss)
        self.train_acc.append(correct)
        if not self.quiet:
            print('Train Avg Loss: {:.6f},'.format(avg_loss), end=' ')
            print('Train Accuracy: {:.6f}'.format(correct))

    def test(self):
        self.model.eval()  # disable Dropout during evaluation
        correct, avg_loss = 0, 0
        with torch.no_grad():
            for batch_idx, (X, y) in enumerate(self.test_loader):
                X, y = X.cuda(), y.cuda()
                pred = self.model(X)
                loss = self.loss_fn(pred, y)
                correct += (pred.argmax(dim=1) == y).sum().item()
                avg_loss += loss.item()
        avg_loss /= (batch_idx + 1)
        correct /= len(self.test_loader.dataset)
        self.test_loss.append(avg_loss)
        self.test_acc.append(correct)
        if not self.quiet:
            print('Test Avg Loss: {:.6f},'.format(avg_loss), end=' ')
            print('Test Accuracy: {:.6f}\n'.format(correct))

    def main(self):
        for epoch in range(self.num_epochs):
            if not self.quiet:
                print('Epoch {}\n'.format(epoch + 1) + '-' * 50)
            self.train()
            self.test()
        self.speed = self.num_epochs * len(self.train_loader.dataset) / self.speed
        print('-' * 50)
        print('{:.1f} samples/sec'.format(self.speed))
        print('-' * 50 + '\n\n' + 'Done!')

    def show(self):
        x = np.arange(1, self.num_epochs + 1)
        plt.plot(x, self.train_loss, c='royalblue', label='train loss')
        plt.plot(x, self.train_acc, c='seagreen', label='train acc', ls='dashed')
        plt.plot(x, self.test_loss, c='darkorchid', label='test loss')
        plt.plot(x, self.test_acc, c='firebrick', label='test acc', ls='dashed')
        plt.legend(loc='best')
        plt.xlabel('epoch')
        plt.grid()
        plt.show()
Usage:

from Experiment import Experiment as E

net = Net()      # instantiate your network (no need to move it to the GPU; Experiment does that for you)
num_epochs = 20  # number of epochs
lr = 1e-2        # learning rate
e = E(train_loader, test_loader, net, num_epochs, lr)
e.main()         # run training and testing
e.show()         # plot the loss and accuracy curves
Experiment only supports the SGD and Adam optimizers, with SGD as the default. To switch optimizers, pass the optimizer argument:

e = E(train_loader, test_loader, net, num_epochs, lr, optimizer='Adam')

By default, e.main() prints the results of every epoch. If you only want the final curves without this per-epoch output, set quiet to True:

e = E(train_loader, test_loader, net, num_epochs, lr, quiet=True)

Note also that Experiment always trains on the GPU.
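Because the class calls .cuda() unconditionally, it fails on CPU-only machines. A minimal device-agnostic sketch of the same pattern (the device variable is my addition, not part of Experiment):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# .to(device) replaces the unconditional .cuda() calls in Experiment;
# the same pattern applies to the model and to each batch.
x = torch.randn(2, 3).to(device)
print(x.device.type)
```

Swapping every `.cuda()` in the class for `.to(device)` would make it run anywhere without other changes.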
2. A Brief Introduction to AlexNet

AlexNet and LeNet share very similar design principles (AlexNet can be roughly viewed as a bigger, deeper LeNet), but there are notable differences. Compared with LeNet, AlexNet uses ReLU as its activation function, replaces average pooling with max pooling, adds Dropout to the fully connected layers for regularization, and applies image augmentation to the input data.
Setting aside the parameters of each layer, AlexNet adds three convolutional layers and one pooling layer relative to LeNet; a structural comparison of the two appears in the figure below (omitted here). As the comparison shows, AlexNet targets RGB color images of size 224×224.
3. Implementing AlexNet

To keep training and testing manageable, we only consider a 10-class task here: the final layer has 10 neurons rather than the 1000 of the original paper (otherwise training would take far too long).
import torch
from torch import nn


class AlexNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.fc = nn.Sequential(
            nn.Linear(6400, 4096), nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
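As a sanity check on the 6400 in the first Linear layer, we can trace a dummy 224×224 input through the same convolutional stack (a standalone sketch that rebuilds the stack so it runs on its own):

```python
import torch
from torch import nn

# The same conv stack as AlexNet above, rebuilt standalone.
conv = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

with torch.no_grad():
    out = conv(torch.randn(1, 3, 224, 224))
print(out.shape)                 # torch.Size([1, 256, 5, 5])
print(out.flatten(1).shape[1])   # 6400 = 256 * 5 * 5
```

The 224×224 input shrinks to 54×54 after the first conv, then 26, 12, and finally 5 through the pooling stages, so the flattened feature has 256 × 5 × 5 = 6400 elements, matching nn.Linear(6400, 4096).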
4. The CIFAR-10 Dataset

The CIFAR-10 dataset (see its official website) contains 60,000 images of 32×32 pixels in 10 classes, with 6,000 images per class: 50,000 for training and 10,000 for testing.

Note that AlexNet expects 224×224 inputs, so we must resize the images (upscaling them this far is not really a wise choice; we do it here only so that AlexNet can be applied).

Data preprocessing code:
import torchvision
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor, Resize
transformer = torchvision.transforms.Compose([Resize(224), ToTensor()])
train_data = torchvision.datasets.CIFAR10('/mnt/mydataset', train=True, transform=transformer, download=True)
test_data = torchvision.datasets.CIFAR10('/mnt/mydataset', train=False, transform=transformer, download=True)
train_loader = DataLoader(train_data, batch_size=100, shuffle=True, num_workers=4)
test_loader = DataLoader(test_data, batch_size=100, num_workers=4)
5. Training and Testing AlexNet

Set the learning rate to 0.05 and train for 50 epochs:
alexnet = AlexNet()
e = E(train_loader, test_loader, alexnet, 50, 0.05)
e.main()
e.show()
Results on an NVIDIA GeForce RTX 3080 Ti are shown below (only the 50th epoch and the overall curves):
Epoch 50
--------------------------------------------------
Train Avg Loss: 0.011900, Train Accuracy: 0.995940
Test Avg Loss: 1.381492, Test Accuracy: 0.792600
--------------------------------------------------
4657.1 samples/sec
--------------------------------------------------
Done!
Starting at roughly epoch 15, the test loss rises steadily and the test accuracy stops improving, while the training accuracy approaches 1. One likely cause is that the learning rate is held constant throughout; decaying it after epoch 15 might further improve the test accuracy.
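One way to implement such a decay, sketched with torch.optim.lr_scheduler.StepLR on a dummy parameter (the Experiment class above has no scheduler; this only illustrates the mechanism):

```python
import torch

# A throwaway parameter so the optimizer has something to manage.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.05)

# Multiply the learning rate by 0.1 every 15 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)

for epoch in range(50):
    optimizer.step()   # stands in for one epoch of training
    scheduler.step()   # lr: 0.05 -> 0.005 after epoch 15 -> 0.0005 after 30 -> 0.00005 after 45
```

Wiring this into Experiment would mean creating the scheduler in __init__ and calling scheduler.step() once per epoch inside main().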
Appendix: Full Code

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor, Resize
from Experiment import Experiment as E


class AlexNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.fc = nn.Sequential(
            nn.Linear(6400, 4096), nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 10)
        )

    def forward(self, x):
        x = self.conv(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x


transformer = torchvision.transforms.Compose([Resize(224), ToTensor()])
train_data = torchvision.datasets.CIFAR10('/mnt/mydataset', train=True, transform=transformer, download=True)
test_data = torchvision.datasets.CIFAR10('/mnt/mydataset', train=False, transform=transformer, download=True)
train_loader = DataLoader(train_data, batch_size=100, shuffle=True, num_workers=4)
test_loader = DataLoader(test_data, batch_size=100, num_workers=4)

alexnet = AlexNet()
e = E(train_loader, test_loader, alexnet, 50, 0.05)
e.main()
e.show()