Vehicle Color Recognition (VCR) plays a vital role in intelligent traffic management and criminal investigation assistance. However, existing vehicle color datasets cover only 13 classes, which cannot meet current practical demands. Moreover, although considerable effort has been devoted to VCR, existing methods suffer from class imbalance in the datasets.
To address these problems, a research team led by Mingdi HU published their new research in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposes a novel VCR method based on a Smooth Modulation Neural Network with Multi-Scale Feature Fusion (SMNN-MSFF). They present a new VCR dataset with 24 vehicle color classes, Vehicle Color-24, and propose the SMNN-MSFF model with multi-scale feature fusion and smooth modulation. The former extracts feature information from local to global scales, and the latter increases the loss assigned to images of tail-class instances when training under class imbalance. Extensive ablation studies demonstrate that each module of the proposed method is effective; in particular, smooth modulation efficiently aids feature learning for the minority, or tail, classes. Comprehensive experimental evaluation on Vehicle Color-24 and three previous representative datasets demonstrates that the proposed SMNN-MSFF outperforms state-of-the-art VCR methods.
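The article does not reproduce the smooth-modulation formula, but the underlying idea, up-weighting the loss contribution of tail-class samples relative to a focal-loss-style baseline, can be illustrated with a minimal sketch. The function name `smooth_modulated_loss`, the hyperparameters `gamma` and `tau`, and the per-class frequency input below are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def smooth_modulated_loss(logits, targets, class_freq, gamma=2.0, tau=1.0):
    """Illustrative sketch of a smoothly modulated cross-entropy loss.

    Samples from rare (tail) classes receive a larger weight, so their
    gradient contribution is not drowned out by the head classes.
    `class_freq`, `gamma`, and `tau` are assumed hyperparameters, not
    the exact formulation used in the SMNN-MSFF paper.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")      # per-sample CE
    pt = torch.exp(-ce)                                          # model confidence
    # Focal-style term: down-weight easy, confident samples.
    focal_term = (1.0 - pt) ** gamma
    # Class-frequency term: tail classes (low frequency) get larger weights.
    freq = class_freq[targets].float()
    class_term = (freq.max() / freq) ** tau
    class_term = class_term / class_term.mean()                  # normalize scale
    return (focal_term * class_term * ce).mean()

# Usage: logits from the network, integer labels, and per-class sample counts.
logits = torch.randn(8, 24)                  # batch of 8, 24 color classes
targets = torch.randint(0, 24, (8,))
class_freq = torch.randint(50, 5000, (24,))  # e.g., images per class in training
loss = smooth_modulated_loss(logits, targets, class_freq)
```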
In the research, they built a new dataset covering 24 vehicle colors, called Vehicle Color-24. The 24 color classes are red, dark-red, pink, orange, dark-orange, red-orange, yellow, lemon-yellow, earthy-yellow, green, dark-green, grass-green, cyan, blue, dark-blue, purple, black, white, silver-gray, gray, dark-gray, champagne, brown and dark-brown. Vehicle Color-24 meets the current needs of practical vehicle traffic management and criminal vehicle tracking applications.
They then propose a novel vehicle color recognition method based on SMNN-MSFF. First, the algorithm explicitly accounts for the imbalanced color distribution present in any real-world dataset: ablation experiments show that the smooth-modulation loss fine-tunes the network so that it captures the characteristics of small (tail) classes better than focal loss does. Second, the network adds an FPN module to extract edge and corner information, which helps capture vehicle shape features and local location information to assist recognition. Third, the backbone network is designed with only 42 layers, making it a lightweight network that relieves storage pressure and increases the feasibility of deployment in practical applications.
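The multi-scale fusion described above follows the general FPN pattern of combining high-level semantic features with higher-resolution low-level features through a top-down pathway. The sketch below shows that generic pattern in PyTorch; the channel sizes, number of levels, and class name `SimpleFPNFusion` are assumptions for illustration, not the SMNN-MSFF architecture itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPNFusion(nn.Module):
    """Minimal FPN-style top-down fusion of three backbone feature maps.

    Channel counts and the number of pyramid levels are illustrative
    assumptions, not the exact SMNN-MSFF configuration.
    """
    def __init__(self, in_channels=(128, 256, 512), out_channels=128):
        super().__init__()
        # 1x1 convs project each backbone stage to a common channel width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # 3x3 convs smooth the fused maps.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels
        )

    def forward(self, feats):
        # feats: feature maps ordered from shallow (high-res) to deep (low-res).
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample deeper maps and add them to shallower ones.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest"
            )
        return [s(l) for s, l in zip(self.smooth, laterals)]

# Usage with dummy feature maps at three resolutions.
c3 = torch.randn(1, 128, 56, 56)
c4 = torch.randn(1, 256, 28, 28)
c5 = torch.randn(1, 512, 14, 14)
fused = SimpleFPNFusion()([c3, c4, c5])   # three fused maps, each with 128 channels
```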
The experimental results show that the proposed method achieves an mAP of 94.96% in recognizing the 24 color classes. SMNN-MSFF outperforms state-of-the-art VCR methods and better meets the requirements for fine-grained classification of vehicle colors.
However, since real-world environments are affected by unpredictable factors and vehicle color distributions exhibit a long-tail effect, further efforts are still required to improve fine-grained vehicle color recognition. Future work will continue to study solutions to class imbalance, because vehicle colors are diverse and any vehicle color dataset inevitably exhibits a long-tailed distribution.