【pytorch】Deploying a model to production: optimizing and deploying code with TensorRT's Python and C++ APIs

(1) Introduction to TensorRT:
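TensorRT is a C++ library (with a Python API as well) for high-performance inference on NVIDIA graphics processing units (GPUs). It is dedicated to running network inference quickly and efficiently on the GPU.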

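TensorRT can compress and optimize a network and deploy it at runtime without the overhead of a training framework. This improves the network's latency, throughput, and efficiency.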

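TensorRT is usually used asynchronously. When input data arrives, the program calls an enqueue function and passes it two buffers: the input buffer and the buffer into which TensorRT should place the results.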
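
The enqueue pattern can be illustrated without a GPU. The sketch below is a pure-Python analogy and does not use the TensorRT API. `ToyAsyncEngine` is a hypothetical class, and a single worker thread stands in for a CUDA stream. `enqueue` returns immediately, and the result is written later into a buffer that the caller supplied. In TensorRT 8.5 and later, the real counterparts are roughly `IExecutionContext.set_tensor_address()` plus `execute_async_v3()` in Python, and `enqueueV3()` in C++.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class ToyAsyncEngine:
    """Hypothetical stand-in for an inference engine (an analogy, not TensorRT)."""

    def __init__(self, fn: Callable[[float], float]) -> None:
        self._fn = fn
        # A single worker thread plays the role of one CUDA stream:
        # enqueued jobs run one after another, in submission order.
        self._stream = ThreadPoolExecutor(max_workers=1)

    def enqueue(self, inputs: List[float], outputs: List[float]) -> Future:
        """Schedule inference and return at once; results are written into `outputs`."""
        def run() -> None:
            for i, value in enumerate(inputs):
                outputs[i] = self._fn(value)
        return self._stream.submit(run)

    def close(self) -> None:
        self._stream.shutdown(wait=True)


engine = ToyAsyncEngine(lambda v: 2.0 * v)
input_buffer = [1.0, 2.0, 3.0]
output_buffer = [0.0] * len(input_buffer)            # the caller owns the output buffer
done = engine.enqueue(input_buffer, output_buffer)   # returns immediately
# ... the host is free to prepare the next batch here ...
done.result()                                        # wait, like synchronizing the stream
engine.close()
print(output_buffer)
```

The caller must not read `output_buffer` until the returned future has completed. The same rule applies to a real engine: the stream has to be synchronized before the output buffer is read.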

The main components of the TensorRT architecture are:

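Network Definition: the network definition interface gives the application methods for specifying a network. You can declare the network's input and output tensors and add layers. In practice, networks are rarely built layer by layer with TensorRT; they are usually imported from another format such as ONNX, as in this article.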

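Builder Configuration: the builder configuration interface specifies the details used to create an engine. It lets the application specify:

- optimization profiles;
- the maximum workspace size;
- the minimum acceptable precision level;
- the number of timing iterations used for auto-tuning;
- an interface for quantizing the network to run in 8-bit (INT8) precision.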

Builder: the builder interface creates an optimized engine from a network definition and a builder configuration.

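Engine: the engine interface lets the application run inference. It supports synchronous and asynchronous execution and profiling. It also supports enumerating and querying the bindings for the engine's inputs and outputs.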
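
The four components above map onto the TensorRT Python API roughly as follows:

- an `OnnxParser` fills in the network definition;
- the builder configuration carries the workspace limit, optional FP16, and an optimization profile for the dynamic batch;
- the engine runs through an execution context.

This is a minimal sketch written against the TensorRT 8.5/8.6 Python bindings; newer releases deprecate some of these calls, such as the `EXPLICIT_BATCH` flag. It assumes:

- FP32 input and output tensors named `input` and `output` (the names used in this article's ONNX export code);
- PyTorch with CUDA, used to allocate device buffers;
- an example batch range of 1/8/32.

The helper names are hypothetical. `tensorrt` and `torch` are imported inside the functions, so the file loads without them.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def batch_profile(chw: Sequence[int], min_b: int, opt_b: int, max_b: int) -> Tuple[Shape, Shape, Shape]:
    """(min, opt, max) shapes for an input whose batch dimension is dynamic."""
    if not 1 <= min_b <= opt_b <= max_b:
        raise ValueError("batch sizes must satisfy 1 <= min <= opt <= max")
    return (min_b, *chw), (opt_b, *chw), (max_b, *chw)


def build_serialized_engine(onnx_path: str, input_name: str = "input",
                            chw: Sequence[int] = (3, 32, 32), batches=(1, 8, 32),
                            fp16: bool = False, workspace_bytes: int = 1 << 30):
    import tensorrt as trt  # lazy import: only needed when actually building

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)                                      # Builder
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)                            # Network Definition
    parser = trt.OnnxParser(network, logger)                           # populated from ONNX
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errs = "\n".join(str(parser.get_error(i)) for i in range(parser.num_errors))
            raise RuntimeError(f"ONNX parsing failed:\n{errs}")

    config = builder.create_builder_config()                          # Builder Configuration
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_bytes)
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)                          # allow FP16 kernels
    profile = builder.create_optimization_profile()
    profile.set_shape(input_name, *batch_profile(chw, *batches))
    config.add_optimization_profile(profile)

    plan = builder.build_serialized_network(network, config)          # serialized engine
    if plan is None:
        raise RuntimeError("engine build failed")
    return plan


def infer(engine, x, input_name: str = "input", output_name: str = "output"):
    """Enqueue one inference on the current CUDA stream, wait, and return the output."""
    import torch  # device buffers are allocated with PyTorch here

    names = {engine.get_tensor_name(i) for i in range(engine.num_io_tensors)}  # query I/O
    if not {input_name, output_name} <= names:
        raise KeyError(f"engine I/O tensors are {names}")
    context = engine.create_execution_context()
    x = x.to(device="cuda", dtype=torch.float32).contiguous()
    context.set_input_shape(input_name, tuple(x.shape))     # pick the actual batch size
    y = torch.empty(tuple(context.get_tensor_shape(output_name)),
                    dtype=torch.float32, device="cuda")
    context.set_tensor_address(input_name, x.data_ptr())    # where TensorRT reads input
    context.set_tensor_address(output_name, y.data_ptr())   # where it writes results
    stream = torch.cuda.current_stream()
    if not context.execute_async_v3(stream.cuda_stream):    # asynchronous enqueue
        raise RuntimeError("execute_async_v3 failed")
    stream.synchronize()                                    # wait before reading y
    return y
```

The plan returned by `build_serialized_engine` can be written to disk with `f.write(plan)` in binary mode. In real code, the execution context would be created once and reused rather than rebuilt on every call.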

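TensorRT optimizes the network according to its definition, including platform-specific optimizations, and generates an inference engine. This step is called the build phase. Because building can take a long time, a typical application builds the engine only once and then serializes it to a plan file for later use.

(Note: a generated plan file cannot be moved across platforms or TensorRT versions. It is built for a specific GPU model, so running it on a different GPU requires rebuilding the engine on that GPU.)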
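
A plan file is tied to the TensorRT version and GPU model it was built with. One option is to record both next to the file and check them before loading. The sidecar-JSON convention and the helper names below are hypothetical and are not a TensorRT feature. The version string could come from `tensorrt.__version__`, and the GPU name from, for example, `torch.cuda.get_device_name(0)`. Loading uses `trt.Runtime(...).deserialize_cuda_engine(...)`, with `tensorrt` imported lazily.

```python
import json
from pathlib import Path


def _meta_path(plan_path) -> Path:
    return Path(str(plan_path) + ".json")   # e.g. model.plan -> model.plan.json


def save_plan(plan, plan_path, trt_version: str, gpu_name: str) -> None:
    """Write the serialized engine plus a small JSON record of where it was built."""
    with open(plan_path, "wb") as f:
        f.write(plan)                        # bytes, or TensorRT's IHostMemory
    meta = {"tensorrt": trt_version, "gpu": gpu_name}
    _meta_path(plan_path).write_text(json.dumps(meta))


def plan_matches(plan_path, trt_version: str, gpu_name: str) -> bool:
    """True only if the plan was built with this TensorRT version on this GPU model."""
    meta = json.loads(_meta_path(plan_path).read_text())
    return meta.get("tensorrt") == trt_version and meta.get("gpu") == gpu_name


def load_engine(plan_path):
    import tensorrt as trt  # lazy import: only needed on the deployment machine

    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    with open(plan_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError("deserialization failed (TensorRT version or GPU mismatch?)")
    return engine
```

If `plan_matches` returns False, rebuild the engine on the target machine instead of loading the old plan.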

(2) Conversion approach and model preparation:
The conversion path is: PyTorch -> ONNX -> onnx2trt -> TensorRT
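
onnx2trt comes from the onnx-tensorrt project. TensorRT itself also ships the `trtexec` command-line tool, which can perform the same ONNX -> engine step. The sketch below only assembles a `trtexec` command for the dynamic-batch model in this article. `trtexec_cmd` is a hypothetical helper. The input name `input` matches the export code, while the 1/8/32 batch range and the file names are example values.

```python
import shlex
from typing import List, Sequence


def trtexec_cmd(onnx_path: str, engine_path: str, input_name: str = "input",
                chw: Sequence[int] = (3, 32, 32),
                batches: Sequence[int] = (1, 8, 32), fp16: bool = False) -> List[str]:
    """Build a trtexec argument list that converts ONNX to a serialized engine."""
    dims = "x".join(str(d) for d in chw)
    min_b, opt_b, max_b = batches
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        # Shape range for the dynamic batch dimension (becomes an optimization profile).
        f"--minShapes={input_name}:{min_b}x{dims}",
        f"--optShapes={input_name}:{opt_b}x{dims}",
        f"--maxShapes={input_name}:{max_b}x{dims}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = trtexec_cmd("modelForTensorRT.onnx", "modelForTensorRT.plan", fp16=True)
print(shlex.join(cmd))
# On a machine with TensorRT installed: subprocess.run(cmd, check=True)
```

Keeping the arguments as a list, rather than a single string, avoids shell-quoting problems when the command is passed to `subprocess.run`.
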
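For the PyTorch -> ONNX step, the model is scripted (the "recording" approach, as opposed to tracing) and then exported to ONNX. The code is as follows: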

import numpy as np
import torch
import cv2
from torch import nn
# The __init__/forward boilerplate can be generated with PyCharm's Code > Generate feature.
# The model subclasses torch.jit.ScriptModule and adds explicit TorchScript annotations (scripting):
class myCustomerNetWork(torch.jit.ScriptModule):
    def __init__(self):
        super().__init__()
        # Feature extractor: 3 input channels (RGB) -> 64 -> 128 -> 256, then global average pooling
        self.features=nn.Sequential(nn.Conv2d(3, 64, (3, 3)),nn.ReLU(),nn.Conv2d(64,128,(3,3)),
                                    nn.ReLU(),nn.Conv2d(128,256,(3,3)),nn.ReLU(),nn.AdaptiveAvgPool2d(1))

        self.classfired=nn.Sequential(nn.Flatten(),nn.Linear(256,80),nn.Dropout(),nn.Linear(80,10))

    @torch.jit.script_method
    def forward(self,x):
        return self.classfired(self.features(x))
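# Expected input layout: torch.Size([32, 3, 32, 32]), i.e. (N, C, H, W); the batch size N may vary.
# myCustomerNetWork subclasses torch.jit.ScriptModule, so the instance is already scripted.
myNet = myCustomerNetWork()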
pthfile = r'D:\flask_pytorch\saveTextOnlyParams.pth'
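# With strict=False, parameters whose names match are loaded; the rest keep their default initialization.
# map_location="cpu" lets a checkpoint saved on a GPU load on any machine; the model is moved to the GPU below.
myNet.load_state_dict(torch.load(pthfile, map_location="cpu"), strict=False)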
if torch.cuda.is_available():
    myNet=myNet.cuda()
myNet.eval()

if __name__ == '__main__':
    imagePath = r"C:\Users360\Desktop\monodepth.jpeg"
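    # np.fromfile + imdecode also handles non-ASCII Windows paths; IMREAD_COLOR forces 3-channel BGR
    img = cv2.imdecode(np.fromfile(imagePath, np.uint8), cv2.IMREAD_COLOR)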
    img = cv2.resize(img, (32, 32))
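    # BGR -> RGB
    img = img[:, :, ::-1].copy()
    # HWC uint8 array -> float32 tensor, on the GPU when one is available (like the model)
    inputX = torch.from_numpy(img).float().to("cuda" if torch.cuda.is_available() else "cpu")
    # HWC -> CHW, then add a batch dimension: shape (1, 3, 32, 32)
    inputX = inputX.permute(2, 0, 1).contiguous()
    inputX = inputX.unsqueeze(0)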
    #torch_out=myNet(inputX)
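    # Serialize the TorchScript model (optional):
    #myNet.save('jit_model2.pth')
    # When torch.onnx.export runs, it first checks whether the model is a ScriptModule. If it is not,
    # export traces it with torch.jit.trace, which is why export needs an example input (random values are fine).
    # Older PyTorch versions also required example_outputs when a ScriptModule was passed, to obtain
    # the output shape and dtype without running the model.
    # Here the model was obtained by scripting, so it does not have to be run, but the input and output shapes
    # must still be provided. Unless there is a special reason (such as data-dependent control flow,
    # which tracing cannot capture), tracing is used more often.
    dynamic_axes = {'input': {0: 'batch'}, 'output': {0: 'batch'}}  # make dim 0 (batch) dynamic for input and output
    # In recent PyTorch versions, scripted export still needs inputX, but only to infer output shapes.
    # It is not stored in the final model, which is why the example_outputs parameter was removed.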
    torch.onnx.export(myNet, inputX, r'./modelForTensorRT.onnx', input_names=['input'], output_names=['output'], dynamic_axes=dynamic_axes)
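
Before handing the ONNX file to TensorRT, it is worth checking two things: that the graph is valid, and that ONNX Runtime reproduces the PyTorch output. Because the batch axis was declared dynamic, the check should also cover batch sizes other than the one used for export. This is a minimal sketch, assuming the `onnx`, `onnxruntime`, `numpy` and `torch` packages are installed. `verify_onnx` is a hypothetical helper, and its tolerances are arbitrary; GPU kernels such as TF32 convolutions may need looser values.

```python
def verify_onnx(onnx_path, torch_model, example_input, batch_sizes=(1, 4),
                rtol=1e-3, atol=1e-4):
    """Compare an exported ONNX model (run on CPU) against the PyTorch model."""
    import numpy as np
    import onnx
    import onnxruntime
    import torch

    onnx.checker.check_model(onnx.load(onnx_path))    # structural validity
    session = onnxruntime.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name         # "input" in the export call
    torch_model = torch_model.eval()
    for n in batch_sizes:
        # Repeat one example along dim 0 to exercise the dynamic batch axis.
        x = example_input[:1].repeat(n, 1, 1, 1)
        with torch.no_grad():
            expected = torch_model(x).cpu().numpy()
        (actual,) = session.run(None, {input_name: x.cpu().numpy()})  # single output assumed
        np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)
```

For example, `verify_onnx('./modelForTensorRT.onnx', myNet, inputX)` could be called inside the `__main__` block after the export call.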

To be continued.


Original article: https://54852.com/langs/876036.html
