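Method 1: pure Python implementation: concise and convenient, but slow
Method 2: compile the Python code directly with the Cython module
Method 3: first declare all variables with static types, then compile with the Cython module
Method 4: building on Method 3, add a CUDA acceleration module and then compile with the Cython module, i.e. use the GPU for acceleration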
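I. A few notes
1. A brief introduction to Cython:
Cython is a tool for quickly building Python extension modules. Syntactically it is a hybrid of Python and C. When Python performance becomes a bottleneck, Cython compiles the code to C so that it can run at close to native C speed. The program does not have to be rewritten in C, and existing Python code can be integrated quickly, which improves both development efficiency and execution efficiency. Cython takes care of the intermediate steps for us.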
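2. A brief introduction to NMS:
Faster-RCNN uses NMS in two places. The first is during both training and prediction, when ProposalCreator generates proposals: only a subset of the proposals is needed, so NMS is used to filter them. The second is at prediction time only: once the 300 classification and coordinate-offset results have been obtained, non-maximum suppression is applied to each class in turn. One might ask why we do not simply take the highest-confidence box for each class. The reason is that an image may contain more than one instance of a class, for example several people. Taking only the highest-confidence box would detect just one of them, whereas with NMS, ideally every person (every individual instance of every class) ends up with exactly one bbox.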
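The point about multiple instances can be illustrated with a minimal sketch. The detections below are made-up values for a hypothetical "person" class (two people, each detected twice), and iou and greedy_nms are illustrative helper names, not Faster-RCNN code. The +1 in the width and height follows the pixel convention used by the code in this post.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2, ...), inclusive pixel coordinates."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def greedy_nms(dets, thresh):
    """Return indices of kept detections; dets are (x1, y1, x2, y2, score) tuples."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score is always kept
        keep.append(best)
        # drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(dets[best], dets[i]) <= thresh]
    return keep


# Hypothetical detections for one class: two people, each with a near-duplicate box.
person_dets = [
    (10, 10, 110, 210, 0.95),   # person A
    (14, 12, 112, 208, 0.90),   # near-duplicate of person A
    (300, 20, 400, 220, 0.85),  # person B
    (296, 24, 398, 216, 0.60),  # near-duplicate of person B
]

# Taking only the top score finds a single person ...
top1 = max(range(len(person_dets)), key=lambda i: person_dets[i][4])
# ... while NMS keeps one box per person.
kept = greedy_nms(person_dets, thresh=0.5)

print("top-1 only:", [top1])
print("after NMS :", kept)
```

In this toy case the top-1 rule returns only the box for person A, while NMS returns one box for each of the two people.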
II. The four implementations
1. Pure Python implementation: nms_py.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon May  7 21:45:37 2018

@author: lps
"""

import numpy as np
import matplotlib.pyplot as plt

boxes = np.array([[100, 100, 210, 210, 0.72],
                  [250, 250, 420, 420, 0.8],
                  [220, 220, 320, 330, 0.92],
                  [100, 100, 210, 210, 0.72],
                  [230, 240, 325, 330, 0.81],
                  [220, 230, 315, 340, 0.9]])


def py_cpu_nms(dets, thresh):
    # dets: (m, 5) array of [x1, y1, x2, y2, score]; thresh: scalar IoU threshold
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]

    areas = (y2 - y1 + 1) * (x2 - x1 + 1)
    scores = dets[:, 4]
    keep = []

    index = scores.argsort()[::-1]  # box indices sorted by descending score

    while index.size > 0:
        i = index[0]  # the first entry always has the highest remaining score, so keep it directly
        keep.append(i)

        # corner points of the overlap between box i and every remaining box
        x11 = np.maximum(x1[i], x1[index[1:]])
        y11 = np.maximum(y1[i], y1[index[1:]])
        x22 = np.minimum(x2[i], x2[index[1:]])
        y22 = np.minimum(y2[i], y2[index[1:]])

        w = np.maximum(0, x22 - x11 + 1)  # the width of the overlap
        h = np.maximum(0, y22 - y11 + 1)  # the height of the overlap

        overlaps = w * h
        ious = overlaps / (areas[i] + areas[index[1:]] - overlaps)

        idx = np.where(ious <= thresh)[0]
        index = index[idx + 1]  # +1 because ious was computed against index[1:]

    return keep


def plot_bbox(dets, c='k'):
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]

    # draw the four edges of every box
    plt.plot([x1, x2], [y1, y1], c)
    plt.plot([x1, x1], [y1, y2], c)
    plt.plot([x1, x2], [y2, y2], c)
    plt.plot([x2, x2], [y1, y2], c)
    plt.title("after nms")


# guarded so that importing py_cpu_nms from another script does not trigger the plot
if __name__ == "__main__":
    plot_bbox(boxes, 'k')  # before nms

    keep = py_cpu_nms(boxes, thresh=0.7)
    plot_bbox(boxes[keep], 'r')  # after nms
    plt.show()
The result looks roughly like this:
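As a cross-check on the plot, the IoU values that decide the outcome at thresh=0.7 can be computed by hand. The sketch below is not part of nms_py.py. The helper iou_inclusive is a hypothetical name that repeats the same +1 arithmetic on plain tuples, using the coordinates from the boxes array above.

```python
def iou_inclusive(a, b):
    """IoU of two (x1, y1, x2, y2) boxes with the +1 inclusive-pixel convention."""
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


best = (220, 220, 320, 330)  # score 0.92, picked first by NMS
others = {
    "score 0.90": (220, 230, 315, 340),
    "score 0.81": (230, 240, 325, 330),
    "score 0.80": (250, 250, 420, 420),
    "score 0.72": (100, 100, 210, 210),
}
thresh = 0.7

# IoU of every other box against the highest-scoring box
results = {name: iou_inclusive(best, box) for name, box in others.items()}
for name, value in results.items():
    verdict = "suppressed" if value > thresh else "survives this round"
    print(f"{name}: IoU={value:.3f} -> {verdict}")
```

With these numbers, the 0.90 and 0.81 boxes overlap the 0.92 box with IoU of about 0.80 and 0.71, so both are removed, while the 0.80 box (about 0.17) and the 0.72 box (no overlap) survive. One of the two identical 0.72 boxes then suppresses the other, so three boxes remain after NMS.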
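Create a folder named nms and put nms_py.py and an empty __init__.py inside it, so that it becomes an importable package. Then, outside the nms folder, create a script for measuring the running time, test_num.py: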
import numpy as np
import time
from nms.nms_py import py_cpu_nms  # for cpu (module name matches nms_py.py)
# from nms.gpu_nms import gpu_nms  # for gpu

np.random.seed(1)  # keep fixed
num_rois = 6000
minxy = np.random.randint(50, 145, size=(num_rois, 2))
maxxy = np.random.randint(150, 200, size=(num_rois, 2))
score = 0.8 * np.random.random_sample((num_rois, 1)) + 0.2

boxes_new = np.concatenate((minxy, maxxy, score), axis=1).astype(np.float32)


def nms_test_time(boxes_new):
    thresh = [0.7, 0.8, 0.9]
    T = 50  # number of repetitions to average over
    for i in range(len(thresh)):
        since = time.time()
        for t in range(T):
            keep = py_cpu_nms(boxes_new, thresh=thresh[i])  # for cpu
            # keep = gpu_nms(boxes_new, thresh=thresh[i])  # for gpu
        # average time per call for this threshold
        print("thresh={:.1f}, time wastes:{:.4f}".format(thresh[i], (time.time() - since) / T))

    return keep


if __name__ == "__main__":
    nms_test_time(boxes_new)