Optimization of CNN and Vision Transformer Models in Addressing Long-Tailed Data Imbalance for Satellite Cloud Image Classification


Authors

    Revatta Manggala Nandivadhano (1), Aditiya Hermawan (2), Lidya Lunardi (3)

    (1) Buddhi Dharma University | Indonesia
    (2) Buddhi Dharma University | Indonesia
    (3) Buddhi Dharma University | Indonesia

Abstract

This study investigates long-tailed satellite cloud image classification by comparing convolutional neural network (CNN) and Vision Transformer (ViT) models built upon vision-language foundation models. A large-scale satellite cloud dataset with 11 highly imbalanced classes, including a dominant non-phenomenon category, is used to represent realistic atmospheric variability. The data are split using stratified sampling and standardized to a fixed resolution, and CLIP-based backbones from RemoteCLIP and GeoRSCLIP are then fine-tuned through parameter-efficient adaptation. Several loss functions (Cross Entropy, Logit Adjustment, Focal, Class-Balanced, and label-distribution-aware variants) are evaluated, along with experiments examining majority-class removal and adapter bottleneck adjustments. Initial results show that Logit Adjustment causes majority-class collapse under default settings. After optimization, ViT-based models consistently outperform CNN models, achieving higher accuracy and more balanced macro-level performance. Class-Balanced loss emerges as the most effective objective, offering a strong trade-off between overall accuracy and per-class fairness. Increasing the adapter bottleneck dimension further boosts ViT performance, enabling the best configuration to match or exceed prior benchmarks while improving minority-class recognition. The final optimized model is deployed in a web-based prediction system, demonstrating the practical potential of foundation-model approaches for satellite-driven weather analysis.
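
To make the idea of a class-balanced objective concrete, the following is a minimal sketch rather than the study's implementation. It assumes the commonly used "effective number of samples" weighting, in which class c with n_c training samples receives a weight proportional to (1 - beta) / (1 - beta^n_c). The abstract does not state which variant the authors used. The function names, the value of beta, and the class counts are hypothetical and chosen only for illustration.

```python
import math


def class_balanced_weights(class_counts, beta=0.999):
    """Per-class weights from the 'effective number of samples' heuristic.

    Assumption: this weighting scheme is illustrative; the study's exact
    Class-Balanced loss configuration is not given in the abstract.
    """
    # Effective number E_c = (1 - beta^n_c) / (1 - beta); the weight is 1 / E_c.
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    # Rescale so the weights sum to the number of classes.
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]


def weighted_cross_entropy(logits, target, weights):
    """Weighted cross entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -weights[target] * (logits[target] - log_z)


# Hypothetical counts for a long-tailed 11-class dataset (not from the study).
counts = [50000, 8000, 5000, 3000, 2000, 1200, 800, 500, 300, 150, 60]
weights = class_balanced_weights(counts)

# The same uniform prediction costs more when the true class is rare.
uniform_logits = [0.0] * len(counts)
loss_majority = weighted_cross_entropy(uniform_logits, 0, weights)
loss_minority = weighted_cross_entropy(uniform_logits, 10, weights)
```

Under these assumed counts, the rarest class receives the largest weight, so errors on minority classes contribute more to the training loss than errors on the dominant class.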


