Literature Review
In terms of visual feature representation, deep convolutional neural networks are more robust than shallow convolutional networks, and target detection based on deep convolutional neural networks, covering both two-stage and one-stage models [1-3], has therefore been under continuous development. The two-stage approach is represented by the R-CNN (Region-based Convolutional Neural Network) series [24-27] developed by Girshick et al. As the name implies, it is a candidate-region-based method and proceeds in two steps: the first step collects a set of candidate boxes, and the second step performs target classification and position refinement on the selected candidates to obtain more precise results.

As shown in Figure 1.1 (R-CNN algorithm flow) [24], the algorithm first selects approximately 2K candidate regions from the input image using Selective Search [28], then classifies each region with an SVM (Support Vector Machine) classifier, and finally adjusts the size of the target box through bounding-box regression. Although R-CNN yields good detection results and can refine the target box, it is slow and consumes a large amount of storage, which has motivated many improved algorithms: SPP-Net (Spatial Pyramid Pooling Network) [29] was introduced, followed by Fast R-CNN [25] and, by Ren et al., Faster R-CNN [26]. When extracting region-of-interest features from the full-image feature map, Faster R-CNN [26] uses an ROI Pooling layer [30] and, instead of Selective Search, generates the regions to be detected with a Region Proposal Network (RPN). In 2017, He Kaiming et al. presented Mask R-CNN [27], which handles tasks such as target detection, semantic segmentation and instance segmentation. To compensate for the quantization (rounding and truncation) error introduced during ROI pooling, Mask R-CNN replaces ROI Pooling with ROI Align, which uses bilinear interpolation to obtain feature values at non-integer positions.
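To make the ROI Align idea concrete, the following is a minimal sketch (not the original authors' implementation) of bilinear interpolation on a single-channel NumPy feature map; the function name and the example coordinates are illustrative assumptions only.

import numpy as np

def bilinear_sample(feature_map, y, x):
    """Sample a 2-D feature map at a non-integer (y, x) location.

    This is the sampling primitive that ROI Align relies on instead of the
    coordinate rounding performed by ROI Pooling.
    """
    h, w = feature_map.shape
    # The four integer grid points surrounding (y, x), clipped to the map.
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    # Fractional offsets inside the cell.
    dy, dx = y - y0, x - x0
    # Weighted sum of the four neighbouring feature values.
    return (feature_map[y0, x0] * (1 - dy) * (1 - dx)
            + feature_map[y0, x1] * (1 - dy) * dx
            + feature_map[y1, x0] * dy * (1 - dx)
            + feature_map[y1, x1] * dy * dx)

# Example: sample a 5x5 feature map at a fractional position inside an ROI bin.
fmap = np.arange(25, dtype=np.float32).reshape(5, 5)
print(bilinear_sample(fmap, 1.3, 2.7))  # 9.2, interpolated from four neighbours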
The two-stage models achieve acceptable detection accuracy, but their real-time performance is poor. Some researchers therefore removed the region-proposal step and performed target localization and recognition directly. In 2016, Redmon et al. proposed the YOLO (You Only Look Once) method, a one-stage deep learning model. As shown in Figure 1.2 (YOLO algorithm flow), its core idea is to divide the image into K x K grid cells and use these cells as the basis for target detection and localization: if the center of a target falls inside a cell, that cell is responsible for detecting the target. Each cell predicts several bounding boxes together with conditional class probabilities for C categories, and each bounding box carries five parameters: the box center (x, y) relative to its grid cell, the box width and height (w, h), and a confidence score. However, because the first-generation YOLO algorithm predicts only two bounding boxes and a single category per cell, its missed-detection rate for dense small objects is quite high.

The SSD (Single Shot MultiBox Detector) algorithm, proposed by Liu et al. in the same year, uses the VGG16 [35] network as the backbone and performs detection on multi-scale feature maps through convolution operations. To further improve detection, Redmon et al. developed YOLOv2, which uses Darknet-19 as the backbone and adds a BN (Batch Normalization) layer after each convolutional layer; this not only speeds up convergence but also helps prevent overfitting. Compared with YOLOv2, YOLOv3 divides the feature map into 13 x 13 cells, each of which predicts three bounding boxes, a significant improvement. At the same time, the Softmax function is no longer used for classification; instead, each independent classifier only judges whether the target in the box matches the current label, which improves detection efficiency and makes multi-label classification possible, as seen in Figure 1. Later versions include the YOLOv4 and YOLOv5 algorithms. The YOLO family has since seen numerous improvements and applications, including low-altitude UAV detection and the detection of small targets.
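To illustrate the grid-based prediction format described above, the sketch below decodes a YOLOv3-style output tensor under simplifying assumptions: the tensor layout (S, S, B, 5 + C), the 416-pixel input size and the omission of anchor-box priors are choices made here for illustration, not the exact formulation of the original papers.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_yolo_grid(raw, img_size=416, conf_thresh=0.5):
    """Decode a YOLOv3-style prediction tensor of shape (S, S, B, 5 + C).

    Each of the B boxes in a grid cell carries (tx, ty, tw, th, objectness)
    plus C class logits; each class uses its own logistic (sigmoid) score
    rather than a softmax, so multi-label prediction is possible.
    """
    S = raw.shape[0]
    B = raw.shape[2]
    cell = img_size / S                       # side length of one grid cell in pixels
    detections = []
    for row in range(S):
        for col in range(S):
            for b in range(B):
                tx, ty, tw, th, tobj = raw[row, col, b, :5]
                conf = sigmoid(tobj)          # objectness confidence
                if conf < conf_thresh:
                    continue
                # The box centre is predicted relative to its own grid cell.
                cx = (col + sigmoid(tx)) * cell
                cy = (row + sigmoid(ty)) * cell
                # Width/height sketch: exponential of the raw values scaled by the
                # cell size (a real YOLOv3 head would multiply anchor priors here).
                w = np.exp(tw) * cell
                h = np.exp(th) * cell
                class_scores = sigmoid(raw[row, col, b, 5:])
                detections.append((cx, cy, w, h, conf, class_scores))
    return detections

# Example: random logits for a 13 x 13 grid, 3 boxes per cell, 20 classes.
boxes = decode_yolo_grid(np.random.randn(13, 13, 3, 25))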
With the continuing development of deep-learning-based target detection algorithms and the widespread use of target detection technology across industries, target detection still faces a number of new challenges and problems:

(1) How to further improve detection accuracy on the basis of mainstream methods. Practical applications place high demands on accuracy, and only when detection accuracy reaches a sufficient level can the technology be widely applied in real-world scenarios and in production.

(2) How to strike a balance between detection accuracy and detection speed. In the target detection task these two evaluation indicators pull in opposite directions: the two-stage R-CNN series offers high detection accuracy, while the one-stage YOLO series offers fast detection. Balancing the two remains a significant task.

(3) How to improve the fluency of real-time detection built on conventional still-image detection techniques. A conventional detector that performs well on single images often performs poorly and lacks smoothness when applied directly, frame by frame, to video.

Based on the third- and fourth-generation YOLO algorithms (YOLOv3 and YOLOv4), this paper focuses on the three problems above: improving the network structure of the algorithm to raise the detector's accuracy, and exploiting the rich contextual information in video to improve the detection mechanism and optimize real-time object detection performance.

References
[1] Peng Jishen, Sun Lixin, Wang Kai, et al. ED-YOLO power inspection UAV obstacle avoidance target detection algorithm based on model compression [J]. Journal of Instrumentation, 2021, 42(10): 10.
[2] Liu Xinrou, Li Yang, Song Wenjun. Object detection algorithm in industrial scenes based on SlimYOLOv3 [J]. Computer Application Research, 2021.
[3] Tang Yue, Wu Ge, Pu Yan. Improved GDT-YOLOv3 target detection algorithm [J]. Liquid Crystal and Display, 2020, 35(8): 9.
[4] Ma Linlin, Ma Jianxin, Han Jiafang, et al. Research on target detection algorithm based on YOLOv5s [J]. Computer Knowledge and Technology: Academic Edition, 2021, 17(23): 4.
[5] Jiang Wenzhi, Li Bingzhen, Gu Jiaojiao, et al. Ship target detection algorithm based on improved YOLO V3 [J]. Electro-Optics and Control, 2021, 28(6): 6.
[6] Liang Qinjia, Liu Huai, Lu Fei. Research on traffic video target detection algorithm based on improved YOLOv3 model [J]. Journal of Nanjing Normal University: Engineering Technology Edition, 2021, 21(2): 7.
[7] Sheng Mingwei, Li Jun, Qin Hongde, et al. Ship target detection algorithm based on improved YOLOv3 [J]. Navigation and Control, 2021, 20(2): 15.
[8] Zhang Taoning. Research on fast target detection algorithm based on improved YOLOv3 model.
[9] Chen Jun. Research and implementation of target detection based on YOLOv3 algorithm [D]. University of Electronic Science and Technology of China.
[10] Yang Fan. Research on remote sensing image target detection algorithm based on YOLO [D]. Chengdu University of Technology.
[11] Tang Songyan. Research and application of aerial target detection algorithm based on YOLOv3 [D]. Huazhong University of Science and Technology.
[12] Xu Rong. Research on small target detection algorithm based on YOLOv3 [D]. Nanjing University of Posts and Telecommunications.
[13] Sun Jia. Real-time target detection based on improved YOLO algorithm [D]. Shanxi University.
[14] Zheng Jiahui. Pedestrian video target detection method based on YOLOv3 [D]. Xidian University, 2019.
[15] Chen Jun. Research on fusion detection algorithm of multi-source remote sensing image sea surface target based on R-YOLO [D]. Huazhong University of Science and Technology.