Enhancing Sundanese News Articles Classification: A Comparative Study of Models and Feature Extraction Techniques
Abstract
This paper presents a comprehensive investigation into the classification of Sundanese news articles, evaluating several classification models and feature extraction methods. Using a dataset collected from Sundanese news websites, the study systematically compares Naive Bayes and Logistic Regression classifiers, each combined with TF-IDF and Bag-of-Words feature extraction. The research process comprises data preprocessing, model training, hyperparameter optimization, and performance assessment using standard metrics: accuracy, precision, recall, and F1-score. All combinations achieve high accuracy. Logistic Regression with Bag-of-Words features performs best, reaching 98.20% accuracy. Beyond model evaluation, the study includes a qualitative analysis of the data. Word clouds and TF-IDF weighting are used to uncover prominent themes and topics in the news articles and to highlight recurring patterns in the Sundanese language. The study identifies key challenges, including the scarcity of annotated datasets for low-resource languages such as Sundanese and the limited ability of traditional models to capture complex linguistic structures. It also outlines future opportunities, such as using deep learning models, including transformers, to improve classification performance and address these limitations. Ensemble methods and domain-specific adaptations could further improve accuracy. Overall, this research advances Sundanese language processing and provides a roadmap for future work in text classification and natural language processing.
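The feature-extraction and classification comparison described above can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code. The following are all assumptions made for illustration: the whitespace tokenizer, the smoothed IDF formula (borrowed from scikit-learn's default), the Laplace smoothing value `alpha=1.0`, and the four toy Sundanese-like snippets with their hypothetical labels. Only multinomial Naive Bayes is shown. Logistic Regression, hyperparameter search, and the real dataset are omitted.

```python
import math

def tokenize(text):
    # Assumed preprocessing: lowercase + whitespace split only; the study's
    # actual Sundanese preprocessing steps are not reproduced here.
    return text.lower().split()

def build_vocab(docs):
    return sorted({tok for d in docs for tok in tokenize(d)})

def bow_vectors(docs, vocab):
    # Bag-of-Words: raw term counts over a fixed vocabulary.
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for tok in tokenize(d):
            if tok in index:  # ignore out-of-vocabulary tokens
                row[index[tok]] += 1
        rows.append(row)
    return rows

def fit_idf(train_counts):
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1 (no row L2 normalization here).
    n = len(train_counts)
    df = [sum(1 for row in train_counts if row[j] > 0) for j in range(len(train_counts[0]))]
    return [math.log((1 + n) / (1 + d)) + 1 for d in df]

def apply_idf(counts, idf):
    # TF-IDF: term count multiplied by the term's IDF weight.
    return [[c * w for c, w in zip(row, idf)] for row in counts]

def train_nb(X, y, alpha=1.0):
    # Multinomial Naive Bayes with Laplace smoothing (alpha is an assumed value).
    classes = sorted(set(y))
    n_features = len(X[0])
    priors, likelihoods = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = math.log(len(rows) / len(X))
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * n_features
        likelihoods[c] = [math.log((t + alpha) / denom) for t in totals]
    return classes, priors, likelihoods

def predict_nb(model, x):
    classes, priors, likelihoods = model
    scores = {c: priors[c] + sum(v * l for v, l in zip(x, likelihoods[c])) for c in classes}
    return max(scores, key=scores.get)

def evaluate(y_true, y_pred):
    # Returns (accuracy, macro-averaged F1).
    labels = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, sum(f1s) / len(f1s)

# Hypothetical toy snippets and labels, invented for illustration only.
train = [
    ("persib meunang dina maén bal", "olahraga"),
    ("pamaréntah ngumumkeun kawijakan anyar", "pulitik"),
    ("pamaén bal persib latihan", "olahraga"),
    ("déwan ngabahas kawijakan pamaréntah", "pulitik"),
]
test = [
    ("persib latihan maén bal", "olahraga"),
    ("kawijakan pamaréntah anyar", "pulitik"),
]

vocab = build_vocab([d for d, _ in train])
train_counts = bow_vectors([d for d, _ in train], vocab)
test_counts = bow_vectors([d for d, _ in test], vocab)
idf = fit_idf(train_counts)
y_train = [label for _, label in train]
y_test = [label for _, label in test]

results = {}
for name, X_tr, X_te in [
    ("Bag-of-Words", train_counts, test_counts),
    ("TF-IDF", apply_idf(train_counts, idf), apply_idf(test_counts, idf)),
]:
    model = train_nb(X_tr, y_train)
    preds = [predict_nb(model, x) for x in X_te]
    results[name] = evaluate(y_test, preds)
    print(name, results[name])
```

On a real corpus, library components such as scikit-learn's `CountVectorizer`, `TfidfVectorizer`, `MultinomialNB`, `LogisticRegression`, and `GridSearchCV` would typically replace these hand-written pieces. They also make it straightforward to add the Logistic Regression arm and the hyperparameter search the study describes.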

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors submitting a manuscript do so on the understanding that, if the article is accepted for publication, its copyright shall be assigned to the journal Tech-E, Universitas Buddhi Dharma, as publisher of the journal.
Copyright encompasses exclusive rights to reproduce and deliver the article in all forms and media, including reprints, photographs, microfilms, and any other similar reproductions, as well as translations. The reproduction of any part of this journal, its storage in databases, and its transmission in any form or medium, such as electronic, electrostatic, and mechanical copies, photocopies, recordings, magnetic media, etc., will be allowed only with written permission from the journal Tech-E.
The journal Tech-E, the Editors, and the Advisory Editorial Board make every effort to ensure that no incorrect or misleading data, opinions, or statements are published in the journal. In any case, the contents of the articles and advertisements published in the journal Tech-E, Universitas Buddhi Dharma, are the sole and exclusive responsibility of their respective authors and advertisers.

