
YOLO-V1 Paper and PyTorch Implementation Explained

Paper: https://paperswithcode.com/paper/you-only-look-once-unified-real-time-object
PyTorch reference implementation: https://github.com/ivanwhaf/yolov1-pytorch.git

1. Object Detection Overview

In image classification, we assume an image contains a single main object and only care about recognizing its category. In practice, however, an image often contains multiple objects of interest, and we want not only their categories but also their locations within the image. In computer vision, this task is called object detection.
Current object detection methods fall into two families: two-stage and one-stage. Two-stage methods, represented by the Faster R-CNN family, first use an RPN (Region Proposal Network) to generate proposal regions (regions likely to contain objects), then feed these regions into subsequent networks for classification and box refinement; they tend to be more accurate but slower to run and harder to train. One-stage methods, represented by the YOLO family covered in this series, originally framed detection as a single regression problem, predicting object locations and classes directly.

2. YOLO-V1 Highlights

  • Fast: runs at over 45 frames per second;
  • Whole-image inference: no sliding windows or region proposals; the network looks at the entire image at once.

3. Core Idea of the Paper

[Figure: the YOLO detection pipeline from the paper]

4. Network Architecture

[Figure: YOLO-V1 model structure]
The YOLO-V1 network is very simple and easy to build, essentially a straight-through stack: the first 24 convolutional layers extract features, downsampling via strided convolution and max pooling and changing channel counts with 1x1 convolution modules. The last two layers are fully connected and predict location and class information. YOLO-V1 has no sliding-window or region-proposal mechanism; its predictions come from a single look at the whole image.
Take training on the VOC dataset as an example. It is a 20-class problem, and the paper divides the input into a 7x7 grid (grid cells). Each grid cell predicts 2 bounding boxes (bbox), each with 5 parameters (x, y, w, h, c): x and y are the bbox center, expressed as an offset relative to the owning grid cell, in [0, 1]; w and h are the bbox size relative to the whole image, also in [0, 1]; and c is the confidence that the bbox contains an object (rather than background). Each grid cell also predicts 20 class confidences, so each cell outputs 20 + 2 x (4 + 1) = 30 values.
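A quick sanity check of these numbers (a minimal sketch, not repo code):

S, B, C = 7, 2, 20        # grid size, boxes per cell, VOC classes
per_cell = B * 5 + C      # (x, y, w, h, c) per box plus class scores
print(per_cell)           # 30
print(S * S * per_cell)   # 1470, the size of the network's final output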

5. Training and Inference

5.1 Pretraining on ImageNet-1k

Take the network's first 20 convolutional layers (a 224x224 input is downsampled to a 7x7 feature map), append average pooling and one fully connected layer, and train this classifier on the ImageNet-1k dataset.
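A minimal sketch of this pretraining head, assuming backbone20 holds the first 20 convolutional layers with 1024 output channels (the post does not show this code):

import torch.nn as nn

class PretrainNet(nn.Module):
    def __init__(self, backbone20):
        super().__init__()
        self.features = backbone20           # first 20 conv layers of the detector
        self.pool = nn.AdaptiveAvgPool2d(1)  # average pooling: 7x7 -> 1x1
        self.fc = nn.Linear(1024, 1000)      # 1000-way ImageNet classifier

    def forward(self, x):                    # x: (N, 3, 224, 224)
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(x)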

5.2 Training on the VOC Dataset

Training procedure and loss function

Because object detection usually needs finer detail, higher-resolution images are used: in the paper, the input resolution for detection training is 448x448. The pretrained model's average pooling and fully connected layer are removed, 4 convolutional layers and 2 fully connected layers are appended, the newly added layers are randomly initialized, and the network is then trained on the VOC dataset. Next, let's look closely at YOLO-V1's loss function.
YOLO-V1's loss is a sum of squared errors, which is convenient to optimize. It can be viewed as three parts, detailed in the list below.
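For reference, the full loss from the paper:

$$
\begin{aligned}
\mathcal{L} = {}& \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c\,\in\,classes}(p_i(c)-\hat{p}_i(c))^2
\end{aligned}
$$

with $\lambda_{coord}=5$ and $\lambda_{noobj}=0.5$.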

  • Localization loss for the bbox coordinates (x, y, w, h);
    (1) $\lambda_{coord}$ is the coefficient balancing the localization loss against the classification loss, set to 5 in the paper;
    (2) w and h enter the loss through their square roots to narrow the gap between large- and small-object losses. For example, if a large object has $W_{pred} - W_{gt} = 0.16$ and a small object has $W_{pred} - W_{gt} = 0.01$, then without the square root the large object's loss is 256 times the small object's; with the square root the gap shrinks considerably, easing the bias toward detecting large objects (see the quick numeric check after this list).
  • Confidence loss for each bbox's objectness score C (whether it is foreground or background);
  • Class probability loss for each grid cell.
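A quick numeric check of the square-root trick (the box widths here are assumed purely for illustration):

# large object: |dW| = 0.16; small object: |dW| = 0.01
w_gt_large, w_pred_large = 0.80, 0.96
w_gt_small, w_pred_small = 0.02, 0.03

no_sqrt = (w_pred_large - w_gt_large) ** 2 / (w_pred_small - w_gt_small) ** 2
with_sqrt = (w_pred_large ** 0.5 - w_gt_large ** 0.5) ** 2 \
          / (w_pred_small ** 0.5 - w_gt_small ** 0.5) ** 2
print(round(no_sqrt))    # 256: the large box dominates the loss
print(round(with_sqrt))  # 7: the gap shrinks considerably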

Training hyperparameters

  • Learning rate: 135 epochs in total; 0.01 for 75 epochs, 0.001 for 30 epochs, 0.0001 for 30 epochs (a scheduler sketch follows this list).
  • Dropout (rate 0.5) on the last fully connected layer to prevent overfitting.
  • Data augmentation: random scaling, random exposure and saturation adjustment (in HSV).
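A sketch of this schedule using PyTorch's MultiStepLR, with milestones chosen to match the numbers above (the paper also warms the rate up from 0.001 to 0.01 over the first epochs, omitted here):

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)
scheduler = MultiStepLR(optimizer, milestones=[75, 105], gamma=0.1)  # 0.01 -> 0.001 -> 0.0001

for epoch in range(135):
    # ... train one epoch ...
    scheduler.step()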

5.3 Inference

At inference time, each class-conditional probability is multiplied by the bbox's objectness confidence, giving a score that reflects both the probability that the bbox contains that class and how well the box fits. Since the model predicts 7 * 7 * 2 = 98 bboxes, NMS (non-maximum suppression) is then applied so that only the best boxes remain as the final detections. For an introduction to NMS, see: https://blog.csdn.net/m0_37605642/article/details/98358864
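The class-specific confidence score from the paper:

$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}_{pred}^{truth} = \Pr(\text{Class}_i) \cdot \text{IOU}_{pred}^{truth}$$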

6. Strengths and Weaknesses

Strengths

  • Simple model, easy to build and train
  • Fast detection

Weaknesses

  • Poor at detecting small objects: only the topmost features are used, and a given coordinate error produces the same loss for a large box as for a small one;

7. PyTorch Implementation

7.1 Project Layout

YOLOV1-pytorch

	- cfg
	- - dataset.yaml  config file holding dataset paths and class information
	- - yolov1.yaml  config file holding model parameters
	- data  sample images
	- models  model-building scripts
	- utils
	- - datasets.py  custom dataset script
	- - draw_gt.py  script for drawing bounding boxes
	- - loss.py  loss computation script
	- - modify_label.py  converts the source annotations into txt files of (x, y, w, h, c)
	- - util.py  NMS, IoU computation, coordinate conversion, and other helpers
	- train.py  training script
	- detect.py  inference script

7.2 Data Preparation

7.2.1 Annotation Conversion

Example of the VOC annotation format before conversion (2007_000027.xml):

<?xml version="1.0"?>

<annotation>
	<folder>VOC2012</folder>
	<filename>2007_000027.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
	</source>
	<size>
		<width>486</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>person</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>174</xmin>
			<ymin>101</ymin>
			<xmax>349</xmax>
			<ymax>351</ymax>
		</bndbox>
		<part>
			<name>head</name>
			<bndbox>
				<xmin>169</xmin>
				<ymin>104</ymin>
				<xmax>209</xmax>
				<ymax>146</ymax>
			</bndbox>
		</part>
		<part>
			<name>hand</name>
			<bndbox>
				<xmin>278</xmin>
				<ymin>210</ymin>
				<xmax>297</xmax>
				<ymax>233</ymax>
			</bndbox>
		</part>
		<part>
			<name>foot</name>
			<bndbox>
				<xmin>273</xmin>
				<ymin>333</ymin>
				<xmax>297</xmax>
				<ymax>354</ymax>
			</bndbox>
		</part>
		<part>
			<name>foot</name>
			<bndbox>
				<xmin>319</xmin>
				<ymin>307</ymin>
				<xmax>340</xmax>
				<ymax>326</ymax>
			</bndbox>
		</part>
	</object>
</annotation>

modify_label.py

import os
import xml.etree.ElementTree as ET

def modify_voc2007_label():
    # VOC2007 dataset xml -> label.txt
    # note: VOC class names have no spaces ('diningtable', 'pottedplant'),
    # otherwise those objects would be silently skipped below
    CLASSES = ['person', 'bird', 'cat', 'cow', 'dog', 'horse', 'sheep', 'aeroplane', 'bicycle', 'boat', 'bus', 'car',
               'motorbike', 'train', 'bottle', 'chair', 'diningtable', 'pottedplant', 'sofa', 'tvmonitor']

    xml_path = '../dataset/VOC2007/Annotations'
    xml_names = os.listdir(xml_path)

    label_path = '../dataset/VOC2007/labels'
    os.makedirs(label_path, exist_ok=True)  # make sure the output folder exists

    for xml_name in xml_names:
        f = open(os.path.join(xml_path, xml_name), 'r')
        tree = ET.parse(f)
        root = tree.getroot()

        # image width and height, used to normalize the coordinates
        size = root.find('size')
        width, height = int(size.find('width').text), int(size.find('height').text)

        # 'w' (not 'a') so that re-running the script does not append duplicate lines
        f2 = open(os.path.join(label_path, xml_name.split('.')[0] + '.txt'), 'w')

        for obj in root.iter('object'):
            c = obj.find('name').text
            difficult = obj.find('difficult').text  # read but not used here
            if c not in CLASSES:
                continue
            box = obj.find('bndbox')
            x1, y1 = int(box.find('xmin').text), int(box.find('ymin').text)
            x2, y2 = int(box.find('xmax').text), int(box.find('ymax').text)

            # corner coordinates -> center/size, then normalize to [0, 1]
            x, y, w, h = (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1
            x, y, w, h = x / width, y / height, w / width, h / height

            f2.write('{} {} {} {} {}\n'.format(str(round(x, 8)), str(round(y, 8)), str(round(w, 8)), str(round(h, 8)),
                                               str(CLASSES.index(c))))

        f2.close()
        f.close()

if __name__ == '__main__':
    modify_voc2007_label()

After conversion, the annotation becomes a .txt file, 2007_000027.txt (for example, x = ((174 + 349) / 2) / 486 ≈ 0.53806584):

0.53806584 0.45200000 0.36008230 0.50000000 0

7.2.2 Custom Dataset

The datasets.py script:

A custom PyTorch dataset must subclass Dataset and implement at least __len__ (returns the number of samples) and __getitem__ (returns one sample and its label).

import os
import cv2
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset, random_split
from torchvision import transforms
from .util import xywhc2label

class YOLODataset(Dataset):
    def __init__(self, img_path, label_path, S, B, num_classes, transforms=None):
        self.img_path = img_path  # images folder path
        self.label_path = label_path  # labels folder path
        self.transforms = transforms
        self.filenames = os.listdir(img_path)
        self.filenames.sort()  # sort filenames so ordering is consistent across operating systems
        self.S = S
        self.B = B
        self.num_classes = num_classes

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        # read image
        img = cv2.imread(os.path.join(self.img_path, self.filenames[idx]))  # BGR
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        # ori_width, ori_height = img.shape[1], img.shape[0]  # image's original width and height

        img = Image.fromarray(img).convert('RGB')
        img = self.transforms(img)  # resize and to tensor

        # read each image's corresponding label(.txt)
        xywhc = []
        with open(os.path.join(self.label_path, self.filenames[idx].split('.')[0] + '.txt'), 'r') as f:
            lines = f.readlines()
            for line in lines:
                if line == '\n':
                    continue
                line = line.strip().split(' ')
                # convert xywh str to float, class str to int
                x, y, w, h, c = float(line[0]), float(line[1]), float(line[2]), float(line[3]), int(line[4])
                xywhc.append((x, y, w, h, c))

        label = xywhc2label(xywhc, self.S, self.B, self.num_classes)  # convert xywhc list to label
        label = torch.Tensor(label)
        return img, label

The xywhc2label() function converts the annotation info (the txt file holding each object's x, y, w, h, c) into the (S, S, 5 * B + num_classes) target used when computing the loss:

import numpy as np

def xywhc2label(bboxs, S, B, num_classes):
    # bboxs is a xywhc list: [(x,y,w,h,c),(x,y,w,h,c),....]
    label = np.zeros((S, S, 5 * B + num_classes))
    for x, y, w, h, c in bboxs:
        # x, y, w, h are all in [0, 1]; this finds the grid cell that owns the bbox
        x_grid = int(x // (1.0 / S))
        y_grid = int(y // (1.0 / S))
        # note: x, y stay relative to the whole image here, not the
        # cell-relative offsets used in the paper
        xx, yy = x, y
        label[y_grid, x_grid, 0:5] = np.array([xx, yy, w, h, 1])  # bbox1 slot
        label[y_grid, x_grid, 5:10] = np.array([xx, yy, w, h, 1])  # bbox2 slot
        label[y_grid, x_grid, 10 + c] = 1  # one-hot class
    return label
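A quick check of the target encoding, using the converted label from above:

label = xywhc2label([(0.53806584, 0.452, 0.3600823, 0.5, 0)], 7, 2, 20)
print(label.shape)       # (7, 7, 30)
print(label[3, 3, 0:5])  # x=0.538, y=0.452 land in grid cell (3, 3): [x, y, w, h, 1]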

Split the dataset and create the data-loading iterators:

def create_dataloader(img_path, label_path, train_proportion, val_proportion, test_proportion, batch_size, input_size,
                      S, B, num_classes):
    transform = transforms.Compose([
        transforms.Resize((input_size, input_size)),
        transforms.ToTensor()
    ])

    # create yolo dataset
    dataset = YOLODataset(img_path, label_path, S, B, num_classes, transforms=transform)

    dataset_size = len(dataset)
    train_size = int(dataset_size * train_proportion)
    val_size = int(dataset_size * val_proportion)
    test_size = dataset_size - train_size - val_size  # may be 0

    # split dataset to train set, val set and test set three parts
    train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

    # create data loader
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=False)  # consider shuffle=True for training
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    return train_loader, val_loader, test_loader
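An assumed usage example (the paths must point at your converted dataset):

train_loader, val_loader, test_loader = create_dataloader(
    '../dataset/VOC2007/JPEGImages', '../dataset/VOC2007/labels',
    0.8, 0.1, 0.1, batch_size=32, input_size=448, S=7, B=2, num_classes=20)

imgs, labels = next(iter(train_loader))
print(imgs.shape, labels.shape)  # torch.Size([32, 3, 448, 448]) torch.Size([32, 7, 7, 30])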

7.3 Model Construction

yolov1.py: build the model layer by layer following the architecture diagram. The implementation here differs from the reference repo's (the repo author's model has only 6 convolutional layers); to use this version, just override the repo's YOLOv1 class.

import torch
import torch.nn as nn

# downsamples by a factor of 64: input 448 x 448, output 7 x 7 x 30 = 1470 values
architecture_config = [
    # tuple = (kernel size, number of filters of output, stride, padding)
    (7, 64, 2, 3),
    "M",  # max-pooling 2x2 stride = 2   448 - > 112
    (3, 192, 1, 1),
    "M",  # max-pooling 2x2 stride = 2   112 -> 56
    (1, 128, 1, 0),
    (3, 256, 1, 1),
    (1, 256, 1, 0),
    (3, 512, 1, 1),
    "M",  # max-pooling 2x2 stride = 2   56 - > 28
    # [tuple, tuple, repeat times]
    [(1, 256, 1, 0), (3, 512, 1, 1), 4],
    (1, 512, 1, 0),
    (3, 1024, 1, 1),
    "M",  # max-pooling 2x2 stride = 2   28 - > 14
    # [tuple, tuple, repeat times]
    [(1, 512, 1, 0), (3, 1024, 1, 1), 2],
    (3, 1024, 1, 1),
    (3, 1024, 2, 1),  # 14 -> 7
    (3, 1024, 1, 1),
    (3, 1024, 1, 1),
]   # 7 * 7 * 1024

# a conv block: convolution, batch norm and activation (note: the original paper uses no batch norm; it is added here)
class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(CNNBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.batchnorm = nn.BatchNorm2d(out_channels)
        self.leakyrelu = nn.LeakyReLU(0.1)

    def forward(self, x):
        x = self.conv(x)
        x = self.batchnorm(x)
        x = self.leakyrelu(x)
        return x


class YOLOv1(nn.Module):
    def __init__(self, in_channels=3, **kwargs):
        super(YOLOv1, self).__init__()
        self.architecture = architecture_config
        self.in_channels = in_channels
        self.darknet = self._create_conv_layers(self.architecture)
        self.fcs = self._create_fcs(**kwargs)

    def forward(self, x):
        x = self.darknet(x)
        x = torch.flatten(x, start_dim=1)  # 7 * 7 * 1024 = 50176
        x = self.fcs(x)
        return x

    def _create_conv_layers(self, architecture):
        layers = []
        in_channels = self.in_channels

        for x in architecture:
            if type(x) == tuple:   # a single conv block, no repeats
                layers += [
                    CNNBlock(
                        in_channels, x[1], kernel_size=x[0], stride=x[2], padding=x[3],
                )  # keyword arguments are collected into a dict and passed through
                ]

                in_channels = x[1]

            elif type(x) == str:  # max-pooling layer
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]

            elif type(x) == list:
                conv1 = x[0]  # tuple
                conv2 = x[1]  # tuple
                num_repeats = x[2]  # integer

                for _ in range(num_repeats):
                    layers += [
                        CNNBlock(
                            in_channels,
                            conv1[1],
                            kernel_size=conv1[0],
                            stride=conv1[2],
                            padding=conv1[3],
                        )
                    ]

                    layers += [
                        CNNBlock(
                            conv1[1],
                            conv2[1],
                            kernel_size=conv2[0],
                            stride=conv2[2],
                            padding=conv2[3],
                        )
                    ]

                    in_channels = conv2[1]

        return nn.Sequential(*layers)  # unpack the list into separate arguments

    def _create_fcs(self, split_size, num_boxes, num_classes):
        S, B, C = split_size, num_boxes, num_classes
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(1024*S*S, 4096),
            nn.Dropout(0.5),
            nn.LeakyReLU(0.1),
            nn.Linear(4096, S*S*(C+B*5)),  # (S,S,30)
        )


def test(S=7, B=2, C=20):
    model = YOLOv1(split_size=S, num_boxes=B, num_classes=C)
    x = torch.randn((2, 3, 448, 448))
    print(model(x).shape)  # torch.Size([2, 1470]), i.e. 2 x (7 * 7 * 30)


if __name__ == "__main__":
    test()

7.4 Training

Custom loss computation: fairly straightforward to implement against the formula above.

import torch
from torch import nn
from .util import calculate_iou


class YOLOv1Loss(nn.Module):
    def __init__(self, S, B):
        super().__init__()
        self.S = S
        self.B = B

    def forward(self, preds, labels):
        batch_size = labels.size(0)

        loss_coord_xy = 0.  # coord xy loss
        loss_coord_wh = 0.  # coord wh loss
        loss_obj = 0.  # obj loss
        loss_no_obj = 0.  # no obj loss
        loss_class = 0.  # class loss

        for i in range(batch_size):
            for y in range(self.S):
                for x in range(self.S):
                    # this region has object
                    if labels[i, y, x, 4] == 1:
                        pred_bbox1 = torch.Tensor(
                            [preds[i, y, x, 0], preds[i, y, x, 1], preds[i, y, x, 2], preds[i, y, x, 3]])
                        pred_bbox2 = torch.Tensor(
                            [preds[i, y, x, 5], preds[i, y, x, 6], preds[i, y, x, 7], preds[i, y, x, 8]])
                        label_bbox = torch.Tensor(
                            [labels[i, y, x, 0], labels[i, y, x, 1], labels[i, y, x, 2], labels[i, y, x, 3]])

                        # calculate iou of two bbox
                        iou1 = calculate_iou(pred_bbox1, label_bbox)
                        iou2 = calculate_iou(pred_bbox2, label_bbox)

                        # judge responsible box
                        if iou1 > iou2:
                            # calculate coord xy loss
                            loss_coord_xy += 5 * torch.sum((labels[i, y, x, 0:2] - preds[i, y, x, 0:2]) ** 2)

                            # coord wh loss (clamp preds at 0 so sqrt cannot produce NaN)
                            loss_coord_wh += torch.sum(
                                (labels[i, y, x, 2:4].sqrt() - preds[i, y, x, 2:4].clamp(min=0).sqrt()) ** 2)

                            # obj confidence loss
                            loss_obj += (iou1 - preds[i, y, x, 4]) ** 2
                            # loss_obj += (preds[i, y, x, 4] - 1) ** 2

                            # no obj confidence loss
                            loss_no_obj += 0.5 * ((0 - preds[i, y, x, 9]) ** 2)
                            # loss_no_obj += 0.5 * ((preds[i, y, x, 9] - 0) ** 2)
                        else:
                            # coord xy loss
                            loss_coord_xy += 5 * torch.sum((labels[i, y, x, 5:7] - preds[i, y, x, 5:7]) ** 2)

                            # coord wh loss (clamp preds at 0 so sqrt cannot produce NaN)
                            loss_coord_wh += torch.sum(
                                (labels[i, y, x, 7:9].sqrt() - preds[i, y, x, 7:9].clamp(min=0).sqrt()) ** 2)

                            # obj confidence loss
                            loss_obj += (iou2 - preds[i, y, x, 9]) ** 2
                            # loss_obj += (preds[i, y, x, 9] - 1) ** 2

                            # no obj confidence loss
                            loss_no_obj += 0.5 * ((0 - preds[i, y, x, 4]) ** 2)
                            # loss_no_obj += 0.5 * ((preds[i, y, x, 4] - 0) ** 2)

                        # class loss
                        loss_class += torch.sum((labels[i, y, x, 10:] - preds[i, y, x, 10:]) ** 2)

                    # this region has no object (background)
                    else:
                        loss_no_obj += 0.5 * torch.sum((0 - preds[i, y, x, [4, 9]]) ** 2)

        loss = loss_coord_xy + loss_coord_wh + loss_obj + loss_no_obj + loss_class  # five loss terms
        return loss / batch_size
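The loss above imports calculate_iou from util.py, which this post does not show. A minimal sketch of such a function, assuming boxes are (x, y, w, h) center coordinates relative to the whole image (matching the label encoding above):

def calculate_iou(box1, box2):
    # convert center/size to corner coordinates
    x1min, x1max = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
    y1min, y1max = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
    x2min, x2max = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
    y2min, y2max = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2

    # intersection rectangle (clamped at zero when the boxes do not overlap)
    inter_w = max(0.0, min(x1max, x2max) - max(x1min, x2min))
    inter_h = max(0.0, min(y1max, y2max) - max(y1min, y2min))
    inter = inter_w * inter_h

    union = box1[2] * box1[3] + box2[2] * box2[3] - inter
    return inter / union if union > 0 else 0.0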

train.py

import argparse
import os
import time

import matplotlib.pyplot as plt
import torch
from torch.optim import SGD
from torchvision import utils

from utils import create_dataloader, YOLOv1Loss, parse_cfg, build_model

# from torchviz import make_dot

parser = argparse.ArgumentParser(description='YOLOv1-pytorch')
parser.add_argument("--cfg", "-c", default="cfg/yolov1.yaml", help="Yolov1 config file path", type=str)
parser.add_argument("--dataset_cfg", "-d", default="cfg/dataset.yaml", help="Dataset config file path", type=str)
parser.add_argument("--weights", "-w", default="", help="Pretrained model weights path", type=str)
parser.add_argument("--output", "-o", default="output", help="Output path", type=str)
parser.add_argument("--epochs", "-e", default=100, help="Training epochs", type=int)
parser.add_argument("--lr", "-lr", default=0.002, help="Training learning rate", type=float)
parser.add_argument("--batch_size", "-bs", default=32, help="Training batch size", type=int)
parser.add_argument("--save_freq", "-sf", default=10, help="Frequency of saving model checkpoint when training",
                    type=int)
args = parser.parse_args()


def train(model, train_loader, optimizer, epoch, device, S, B, train_loss_lst):
    model.train()  # Set the module in training mode
    train_loss = 0
    for batch_idx, (inputs, labels) in enumerate(train_loader):
        t_start = time.time()
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)

        # back prop
        criterion = YOLOv1Loss(S, B)  # instantiate the loss module (could be hoisted out of the loop)
        loss = criterion(outputs, labels)  # forward pass of the loss
        optimizer.zero_grad()  # clear accumulated gradients each iteration
        loss.backward()  # back-propagate
        optimizer.step()  # update the parameters
        train_loss += loss.item()  # accumulate the epoch's total loss
        t_batch = time.time() - t_start

        # print loss and accuracy
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.1f}%)]  Time: {:.4f}s  Loss: {:.6f}'
                  .format(epoch, batch_idx * len(inputs), len(train_loader.dataset),
                          100. * batch_idx / len(train_loader), t_batch, loss.item()))

    # record training loss
    train_loss /= len(train_loader)  # average loss of this epoch
    train_loss_lst.append(train_loss)  # per-epoch loss history
    return train_loss_lst


def validate(model, val_loader, device, S, B, val_loss_lst):
    # could also log COCO metrics or mAP@0.5 here
    model.eval()  # Sets the module in evaluation mode
    val_loss = 0
    # no need to calculate gradients
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)

            # add one batch loss
            criterion = YOLOv1Loss(S, B)
            val_loss += criterion(output, target).item()

    val_loss /= len(val_loader)
    print('Val set: Average loss: {:.4f}\n'.format(val_loss))

    # record validating loss
    val_loss_lst.append(val_loss)
    return val_loss_lst


def test(model, test_loader, device, S, B):
    model.eval()  # Sets the module in evaluation mode
    test_loss = 0
    # no need to calculate gradients
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)

            # add one batch loss
            criterion = YOLOv1Loss(S, B)
            test_loss += criterion(output, target).item()

    # record testing loss
    test_loss /= len(test_loader)
    print('Test set: Average loss: {:.4f}'.format(test_loss))


if __name__ == "__main__":
    cfg = parse_cfg(args.cfg)
    dataset_cfg = parse_cfg(args.dataset_cfg)
    img_path, label_path = dataset_cfg['images'], dataset_cfg['labels']
    S, B, num_classes, input_size = cfg['S'], cfg['B'], cfg['num_classes'], cfg['input_size']

    # create output file folder
    start = time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime(time.time()))
    output_path = os.path.join(args.output, start)
    os.makedirs(output_path)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # build model
    model = build_model(args.weights, S, B, num_classes).to(device)
    train_loader, val_loader, test_loader = create_dataloader(img_path, label_path, 0.8, 0.1, 0.1, args.batch_size,
                                                              input_size, S, B, num_classes)

    optimizer = SGD(model.parameters(), lr=args.lr, momentum=0.9, weight_decay=0.0005)  # a dynamic learning-rate schedule could be added
    # optimizer = Adam(model.parameters(), lr=lr)

    train_loss_lst, val_loss_lst = [], []

    # train epoch
    for epoch in range(args.epochs):
        train_loss_lst = train(model, train_loader, optimizer, epoch, device, S, B, train_loss_lst)
        val_loss_lst = validate(model, val_loader, device, S, B, val_loss_lst)

        # save model weight every save_freq epoch
        if epoch % args.save_freq == 0 and epoch >= args.epochs / 2:
            torch.save(model.state_dict(), os.path.join(output_path, 'epoch' + str(epoch) + '.pth'))

    test(model, test_loader, device, S, B)

    # save model
    torch.save(model.state_dict(), os.path.join(output_path, 'last.pth'))

    # plot loss, save params change
    fig = plt.figure()
    plt.plot(range(args.epochs), train_loss_lst, 'g', label='train loss')
    plt.plot(range(args.epochs), val_loss_lst, 'k', label='val loss')
    plt.grid(True)
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(loc="upper right")
    plt.savefig(os.path.join(output_path, 'loss_curve.jpg'))
    plt.show()
    plt.close(fig)

7.5 Detection

The general flow: feed the image to be detected into the trained model to get predictions of shape (S, S, B * 5 + num_classes), run NMS to obtain the final bboxes, then draw the rectangles and save the result.
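pred2xywhcc itself is also not shown in this post; a minimal sketch of the decode-plus-NMS step it performs could look like this (the function and variable names here are assumptions and the repo's version may differ; calculate_iou is the sketch from section 7.4):

import torch

def nms(boxes, iou_thresh):
    # boxes: (N, 6) tensor of (x, y, w, h, score, class)
    keep = []
    order = boxes[:, 4].argsort(descending=True)  # highest score first
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        ious = torch.tensor([float(calculate_iou(boxes[i, :4], boxes[j, :4])) for j in order[1:]])
        order = order[1:][ious <= iou_thresh]  # drop boxes overlapping the kept one
    return boxes[keep]

def pred2xywhcc_sketch(pred, S, B, num_classes, conf_thresh, iou_thresh):
    pred = pred.view(S, S, B * 5 + num_classes)
    candidates = []
    for y in range(S):
        for x in range(S):
            for b in range(B):
                bx, by, bw, bh, conf = pred[y, x, b * 5:b * 5 + 5]
                # class-specific confidence = box confidence * class probability
                score, cls = (conf * pred[y, x, B * 5:]).max(0)
                if score > conf_thresh:
                    candidates.append([float(bx), float(by), float(bw), float(bh),
                                       float(score), float(cls)])
    if not candidates:
        return torch.empty(0, 6)
    return nms(torch.tensor(candidates), iou_thresh)

detect.py: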

import argparse
import os
import random
import shutil

import cv2
import numpy as np
from PIL import Image
from torchvision import transforms

from utils import parse_cfg, pred2xywhcc, build_model

parser = argparse.ArgumentParser(description='YOLOv1 Pytorch Implementation')
parser.add_argument("--weights", "-w", default="weights/last.pth", help="Path of model weight", type=str)
parser.add_argument("--source", "-s", default="dataset/VOC2007/JPEGImages",
                    help="Path of your input file source,0 for webcam", type=str)
parser.add_argument('--output', "-o", default='output', help='Output folder', type=str)
parser.add_argument("--cfg", "-c", default="cfg/yolov1.yaml", help="Your model config path", type=str)
parser.add_argument("--dataset_cfg", "-d", default="cfg/dataset.yaml", help="Your dataset config path", type=str)
parser.add_argument('--cam_width', "-cw", default=848, help='camera width', type=int)
parser.add_argument('--cam_height', "-ch", default=480, help='camera height', type=int)
parser.add_argument('--conf_thresh', "-ct", default=0.1, help='prediction confidence thresh', type=float)
parser.add_argument('--iou_thresh', "-it", default=0.3, help='prediction iou thresh', type=float)
args = parser.parse_args()

# random colors
COLORS = [[random.randint(0, 255) for _ in range(3)] for _ in range(100)]


def draw_bbox(img, bboxs, class_names):
    h, w = img.shape[0:2]
    n = bboxs.size()[0]
    bboxs = bboxs.detach().numpy()
    print(bboxs)
    for i in range(n):
        p1 = (int((bboxs[i, 0] - bboxs[i, 2] / 2) * w), int((bboxs[i, 1] - bboxs[i, 3] / 2) * h))
        p2 = (int((bboxs[i, 0] + bboxs[i, 2] / 2) * w), int((bboxs[i, 1] + bboxs[i, 3] / 2) * h))
        class_name = class_names[int(bboxs[i, 5])]
        # confidence = bboxs[i, 4]
        cv2.rectangle(img, p1, p2, color=COLORS[int(bboxs[i, 5])], thickness=2)
        cv2.putText(img, class_name, p1, cv2.FONT_HERSHEY_SIMPLEX, 0.8, COLORS[int(bboxs[i, 5])])
    return img


def predict_img(img, model, input_size, S, B, num_classes, conf_thresh, iou_thresh):
    """get model prediction of one image

    Args:
        img: image ndarray
        model: pytorch trained model
        input_size: input size
    Returns:
        xywhcc: predict image bbox
    """
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    pred_img = Image.fromarray(img).convert('RGB')

    transform = transforms.Compose([
        transforms.Resize((input_size, input_size)),
        transforms.ToTensor()
    ])
    pred_img = transform(pred_img)
    pred_img.unsqueeze_(0)

    pred = model(pred_img)[0].detach().cpu()
    xywhcc = pred2xywhcc(pred, S, B, num_classes, conf_thresh, iou_thresh)

    return xywhcc


if __name__ == "__main__":
    # load configs from config file
    cfg = parse_cfg(args.cfg)
    input_size = cfg['input_size']
    dataset_cfg = parse_cfg(args.dataset_cfg)
    class_names = dataset_cfg['class_names']
    print('Class names:', class_names)
    S, B, num_classes = cfg['S'], cfg['B'], cfg['num_classes']
    conf_thresh, iou_thresh, source = args.conf_thresh, args.iou_thresh, args.source

    # load model
    model = build_model(args.weights, S, B, num_classes)
    print('Model loaded successfully!')

    # create output folder
    if not os.path.exists(args.output):
        os.makedirs(args.output)

    # Image
    if source.split('.')[-1] in ['jpg', 'png', 'jpeg', 'bmp', 'tif', 'tiff', 'gif', 'webp']:
        img = cv2.imread(source)
        img_name = os.path.basename(source)

        xywhcc = predict_img(img, model, input_size, S, B, num_classes, conf_thresh, iou_thresh)
        if xywhcc.size()[0] != 0:
            img = draw_bbox(img, xywhcc, class_names)
            # save output img
            cv2.imwrite(os.path.join(args.output, img_name), img)

    # Folder (no '.' in the path, so treat the source as a directory)
    elif source == source.split('.')[-1]:
        # create output folder
        output = os.path.join(args.output, source.split('/')[-1])
        if os.path.exists(output):
            shutil.rmtree(output)
            # os.removedirs(output)
        os.makedirs(output)

        imgs = os.listdir(source)
        for img_name in imgs:
            # img = cv2.imread(os.path.join(source, img_name))
            img = cv2.imdecode(np.fromfile(os.path.join(
                source, img_name), dtype=np.uint8), cv2.IMREAD_COLOR)
            # predict
            xywhcc = predict_img(img, model, input_size, S, B, num_classes, conf_thresh, iou_thresh)
            if xywhcc.size()[0] != 0:
                img = draw_bbox(img.copy(), xywhcc, class_names)
                # save output img
                cv2.imwrite(os.path.join(output, img_name), img)
            print(img_name)

8. References

  • https://github.com/ivanwhaf/yolov1-pytorch.git
  • You Only Look Once: Unified, Real-Time Object Detection