Enhancing Sundanese News Articles Classification: A Comparative Study of Models and Feature Extraction Techniques

Authors

    Yadhi A. Permana (1), Irwan Setiawan (2), Fitri Diani (3), Suprihanto (4)

    (1) Politeknik Negeri Bandung | Indonesia
    (2) Politeknik Negeri Bandung | Indonesia
    (3) Politeknik Negeri Bandung | Indonesia
    (4) Politeknik Negeri Bandung | Indonesia

Abstract

This paper investigates the classification of Sundanese news articles by comparing classification models and feature extraction methods. Using a dataset collected from Sundanese news websites, the study systematically compares Naive Bayes and Logistic Regression classifiers, each combined with TF-IDF and Bag-of-Words feature extraction. The research process covers data preprocessing, model training, hyperparameter optimization, and performance assessment using standard metrics: accuracy, precision, recall, and F1-score. All four combinations achieve high accuracy, with Logistic Regression using Bag-of-Words features performing best at 98.20%. Beyond model evaluation, the study includes a qualitative analysis of the data: word clouds and TF-IDF weighting are used to surface prominent themes and topics in the news articles and to highlight recurring patterns in the Sundanese language. The study identifies key challenges, including the scarcity of annotated datasets for low-resource languages such as Sundanese and the limitations of traditional models in capturing complex linguistic structures. Future work could apply deep learning models, including transformers, to improve classification performance and address these limitations; ensemble methods and domain-specific adaptations could further improve accuracy. Overall, this research contributes to Sundanese language processing and outlines directions for future work in text classification and natural language processing.
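
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, covers only the Naive Bayes classifier (Logistic Regression is omitted for brevity), and runs on a handful of made-up English placeholder sentences rather than the Sundanese dataset. All function and variable names (`bow`, `fit_idf`, `tfidf`, `MultinomialNB`, `scores`, `results`) are hypothetical, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
from collections import Counter

def bow(tokens):
    """Bag-of-Words: raw term counts for one document."""
    return dict(Counter(tokens))

def fit_idf(docs):
    """Smoothed inverse document frequency learned from training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(tokens, idf):
    """TF-IDF: term counts scaled by IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = {t for x in X for t in x}
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = Counter()
            for x in rows:
                for t, w in x.items():
                    totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_like[c] = {t: math.log((totals[t] + self.alpha) / denom)
                                for t in self.vocab}
        return self

    def predict(self, X):
        def score(x, c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][t] for t, w in x.items() if t in self.vocab)
        return [max(self.classes, key=lambda c: score(x, c)) for x in X]

def scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
    k = len(per_class)
    return {"accuracy": sum(t == p for t, p in pairs) / len(pairs),
            "precision": sum(s[0] for s in per_class) / k,
            "recall": sum(s[1] for s in per_class) / k,
            "f1": sum(s[2] for s in per_class) / k}

# Toy placeholder data (NOT the study's Sundanese corpus).
train = [("team wins match goal", "sports"),
         ("player scores goal match", "sports"),
         ("bank raises market rate", "economy"),
         ("market price bank loan", "economy")]
test = [("goal team player", "sports"), ("bank loan rate", "economy")]

train_tokens = [text.split() for text, _ in train]
test_tokens = [text.split() for text, _ in test]
y_train = [label for _, label in train]
y_test = [label for _, label in test]
idf = fit_idf(train_tokens)

# Same classifier, two feature extraction methods.
extractors = {"bow": bow, "tfidf": lambda tokens: tfidf(tokens, idf)}
results = {}
for name, extract in extractors.items():
    model = MultinomialNB().fit([extract(d) for d in train_tokens], y_train)
    predictions = model.predict([extract(d) for d in test_tokens])
    results[name] = scores(y_test, predictions)
    print(name, results[name])
```

In this sketch the IDF weights are learned from the training documents only and then reused on the test documents, so that no information from the test set leaks into the features. Scores on such a tiny toy set say nothing about the results reported in the study.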
