TY - JOUR
T1 - FSDA
T2 - Frequency re-scaling in data augmentation for corruption-robust image classification
AU - Nam, Ju Hyeon
AU - Lee, Sang Chul
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Modern convolutional neural networks (CNNs) are used in various applications, including computer vision, speech recognition, and robotics. However, practical usage in various applications requires large-scale datasets, and real-world data contains various corruptions that degrade the model's performance owing to the inconsistencies in the training and testing distributions. In this study, we propose Frequency re-Scaling Data Augmentation (FSDA) to improve the classification performance, robustness against corruption, and localizability of classifiers trained on various image classification datasets. Our method consists of two processes: mask generation process (MGP) and pattern re-scaling process (PSP). MGP clusters the frequency domain spectra to produce similar frequency patterns, and then PSP scales frequency by learning rescaling parameters from frequency patterns. Because the CNN classifies images by focusing on their structural features highlighted with FSDA, CNN trained with the proposed method has more robustness against corruption than that with the other data augmentations (DAs). Our technique outperforms the existing DAs on four public image classification datasets, including the CIFAR-10/100, STL-10, and ImageNet. Particularly, our strategy increases the robustness of the classifier against the different corruption errors by an average of 5.04% over the baseline.
AB - Modern convolutional neural networks (CNNs) are used in various applications, including computer vision, speech recognition, and robotics. However, practical usage in various applications requires large-scale datasets, and real-world data contains various corruptions that degrade the model's performance owing to the inconsistencies in the training and testing distributions. In this study, we propose Frequency re-Scaling Data Augmentation (FSDA) to improve the classification performance, robustness against corruption, and localizability of classifiers trained on various image classification datasets. Our method consists of two processes: mask generation process (MGP) and pattern re-scaling process (PSP). MGP clusters the frequency domain spectra to produce similar frequency patterns, and then PSP scales frequency by learning rescaling parameters from frequency patterns. Because the CNN classifies images by focusing on their structural features highlighted with FSDA, CNN trained with the proposed method has more robustness against corruption than that with the other data augmentations (DAs). Our technique outperforms the existing DAs on four public image classification datasets, including the CIFAR-10/100, STL-10, and ImageNet. Particularly, our strategy increases the robustness of the classifier against the different corruption errors by an average of 5.04% over the baseline.
KW - Convolutional neural networks
KW - Data augmentation
KW - Deep learning
KW - Frequency domain
KW - Image classification
UR - http://www.scopus.com/inward/record.url?scp=85184838830&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2024.110332
DO - 10.1016/j.patcog.2024.110332
M3 - Article
AN - SCOPUS:85184838830
SN - 0031-3203
VL - 150
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 110332
ER -