IDENTIFYING STROKE RISK FACTORS IN MEDICAL DATASETS USING THE C4.5 ALGORITHM
Keywords:
C4.5, Database, Data Mining, Information Technology, Stroke
Abstract
Stroke is a significant health problem worldwide. According to data from the World Health Organization (WHO), there are 13.7 million new stroke cases each year, resulting in 5.5 million deaths. Data mining is the process of collecting, processing, and analyzing data to extract important information. The C4.5 algorithm is one of the most widely used data mining algorithms for building decision trees from labeled data. This study addresses the difficulty and time-consuming nature of processing health data to determine stroke risk, as well as the public's difficulty in obtaining correct and reliable information about stroke. Data were collected from the Stroke Prediction dataset on the Kaggle site, which contains 5,110 records and 12 attributes, and through a literature study that gathered information from the internet, books, journals, and other media to support applying the C4.5 data mining algorithm to detecting the causes of stroke. The decision tree algorithm in data mining can be applied to detect the causes of stroke in a person. Using both manual calculations and the RapidMiner software, with prediction accuracy serving as the benchmark of how effectively the algorithm identifies the causes of stroke, the resulting classification model achieved an accuracy of 94.89% and an AUC of 0.709. Eleven respondents completed a questionnaire; the responses were 45.44% "strongly agree", 44.56% "agree", and 10% "neutral".
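The "manual calculations" mentioned above centre on how C4.5 chooses which attribute to split on: it ranks attributes by gain ratio (information gain divided by split information) and grows the tree recursively. The following is a minimal Python sketch of that selection step under stated assumptions: it handles only categorical attributes and omits C4.5's threshold splits for continuous attributes, missing-value handling, and pruning. The function names `entropy`, `gain_ratio`, and `build_tree`, the attribute names `hypertension` and `smoker`, and the `TOY_ROWS` records are all hypothetical illustrations, not taken from the Kaggle dataset or from the study's RapidMiner model.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Gain ratio of a categorical attribute, the criterion C4.5 uses to pick splits."""
    n = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes that fragment the data into many values.
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attrs, target):
    """Recursively grow a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes are left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    # Stop when even the best attribute carries no information.
    if gain_ratio(rows, best, target) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {best: branches}


# Hypothetical toy records (NOT taken from the Kaggle Stroke Prediction dataset).
TOY_ROWS = [
    {"hypertension": "yes", "smoker": "yes", "stroke": 1},
    {"hypertension": "yes", "smoker": "no", "stroke": 1},
    {"hypertension": "no", "smoker": "yes", "stroke": 0},
    {"hypertension": "no", "smoker": "no", "stroke": 0},
]

if __name__ == "__main__":
    for attr in ("hypertension", "smoker"):
        print(attr, gain_ratio(TOY_ROWS, attr, "stroke"))
    print(build_tree(TOY_ROWS, ["hypertension", "smoker"], "stroke"))
```

On these toy rows, `hypertension` separates the two classes perfectly (gain ratio 1.0) while `smoker` carries no information (gain ratio 0.0), so the sketch's tree splits only on `hypertension`. Dividing by split information is what distinguishes C4.5's criterion from plain information gain: it keeps attributes with many distinct values from being favoured merely because they fragment the data.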