RESEARCH PAPER
Efficient Voice-Based Parkinson Classification via Algorithm-Level Class Balancing.
AI Summary
The paper compares data-level and algorithm-level class balancing, together with feature selection, for voice-based machine learning diagnosis of Parkinson's disease (PD). It reports that Fisher-score feature selection with a finely tuned CatBoost model achieves the best classification metrics (accuracy 97%, AUC 0.96, F1 0.98).
Why It Matters
The work improves noninvasive diagnostic classification and data-handling practices useful for cohort selection and biomarker studies, but it does not investigate disease mechanisms or therapeutic targets and thus has limited direct value for Parkinson's therapeutic discovery.
Abstract
Parkinson's disease (PD) is a progressive neurodegenerative disorder, and timely, accurate diagnosis is crucial for improving patient outcomes. Machine learning methods have shown promising results in identifying PD; however, voice-feature biomedical datasets from Parkinson's patients are often small and strongly imbalanced, with substantially more patient samples than healthy ones, which limits model generalization and stability. This study was motivated by the need for an objective evaluation of how data preprocessing and model-level approaches affect performance under these constraints. To this end, three scenarios were developed. In the first two scenarios, Recursive Feature Elimination was first used to reduce the feature set to 18 salient features, and data-level resampling methods were then applied: Instance Hardness Threshold undersampling and a hybrid oversampling regime comprising K-means Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, and SMOTE-Tomek. Although these approaches reduced the class imbalance, they introduced side effects such as loss of information and distortion of decision boundaries, which were especially acute given the small size and sensitivity of the PD data. To address these limitations, the third scenario combined Fisher-score feature selection, which proved beneficial in reducing feature redundancy, with algorithm-level imbalance handling using extensively fine-tuned CatBoost and Support Vector Machine models, which were trained and evaluated to classify PD cases. This strategy exploited the discriminative power of the refined feature set while preserving data integrity.
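The third scenario can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the exact Fisher score variant or the weighting scheme, so the code below assumes the common between-class over within-class variance ratio, and the inverse-frequency heuristic that scikit-learn documents for class_weight="balanced". The function names (fisher_scores, top_k_features, balanced_class_weights) and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature column (assumed variant).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # spread of class means around the overall mean
        within = 0.0   # spread of samples inside each class
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class variance: constant feature scores 0,
            # a perfectly separating feature scores infinity.
            scores.append(0.0 if between == 0 else float("inf"))
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * n_c)."""
    classes = sorted(set(y))
    return {c: len(y) / (len(classes) * y.count(c)) for c in classes}


# Toy data (illustrative only): label 1 = PD (majority), 0 = healthy.
X = [
    [0.9, 5.0],
    [1.1, 3.0],
    [1.0, 4.0],
    [1.0, 6.0],
    [0.1, 4.0],
    [0.2, 5.0],
]
y = [1, 1, 1, 1, 0, 0]

print(top_k_features(X, y, 1))    # [0]
print(balanced_class_weights(y))  # {0: 1.5, 1: 0.75}
```

In this sketch no samples are removed or synthesized: the minority (healthy) class simply receives a larger weight, which a classifier that accepts per-class weights could then use during training. This is the sense in which algorithm-level balancing leaves the original data intact.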
The results indicate that CatBoost in the third scenario achieved the best performance metrics (accuracy = 97%, area under the curve = 0.96, F1 = 0.98), which supports the conclusion that combining feature-level refinement with algorithmic adaptation yields comparatively stronger performance for PD diagnosis under benchmark evaluation conditions. The study is distinguished by its comprehensive design: it analyzes all scenarios systematically under identical experimental conditions and across multiple train-test splits.
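For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, F1, and area under the ROC curve are typically computed for a binary classifier. It assumes that PD is treated as the positive class and that a 0.5 decision threshold is used; neither is stated in the abstract. The labels and scores are toy values for illustration, not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half). Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative only): 6 PD samples (1), 2 healthy (0).
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.45, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, round(f1, 3), round(auc, 3))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the three are commonly reported together on imbalanced data.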