RESEARCH PAPER
Machine Learning Approaches to Early Detection of Parkinson's Disease Using Speech Analysis Technique
Abstract
BACKGROUND: Parkinson's disease (PD) is a progressive neurodegenerative disorder that affects millions of people worldwide, particularly older adults. Several occupational exposures typical of maritime environments are recognized or suspected risk factors for PD, warranting attention within occupational health frameworks. The disease is characterized by motor symptoms such as tremor, rigidity, and bradykinesia, as well as non-motor impairments including speech abnormalities.
OBJECTIVE: Early diagnosis is crucial for effective disease management but remains challenging due to symptoms overlapping with normal aging and other neurological conditions. This study presents a machine learning (ML)-based approach for the early diagnosis of PD using speech signal analysis.
METHODS: We employed six supervised ML classifiers to differentiate between PD patients and healthy controls based on vocal features. The experimental dataset, MDVR-KCL, consists of speech recordings from both reading tasks and spontaneous dialogs, collected via mobile devices. From these recordings, we extracted Mel-Frequency Cepstral Coefficients (MFCCs), Gammatone Frequency Cepstral Coefficients (GTCCs), and acoustic features such as jitter, shimmer, and harmonics-to-noise ratio. These features capture a broad range of prosodic, spectral, and articulatory characteristics associated with PD-related speech impairments. Speaker diarization was applied to the spontaneous dialog recordings to isolate the participant's speech. Hyperparameter tuning was performed using GridSearchCV with 10-fold cross-validation, while final model evaluation was conducted using Leave-One-Subject-Out Cross-Validation (LOSOCV) to ensure subject-independent performance assessment.
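The subject-independent evaluation named above can be illustrated with a minimal sketch of leave-one-subject-out splitting. It assumes that every recording segment carries a subject identifier. The function name `leave_one_subject_out` and the toy IDs are hypothetical and are not taken from the study; in practice, scikit-learn's `LeaveOneGroupOut` splitter provides equivalent behavior.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject.

    All segments of the held-out subject go to the test fold, so no
    speaker ever appears in both the training and the test data.
    """
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = sorted(
            i
            for sid, idxs in by_subject.items()
            if sid != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical example: five segments recorded from three subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by subject rather than by segment matters here because several segments from the same speaker are highly correlated; a segment-level split would let a classifier recognize the speaker instead of the disease and inflate the reported scores.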
RESULTS: In the read-text task, the SVM model using GTCCs combined with the acoustic features performed exceptionally well, yielding 95.45% accuracy, 94.62% sensitivity, 95.97% specificity, an F1-score of 94.12%, an AUC of 0.98, and an MCC of 0.90. In the spontaneous dialog task, the XGB model achieved the highest performance across all metrics, with a test accuracy of 83.7%, a sensitivity of 76.3.9%, a specificity of 88.9%, an F1-score of 79.5%, an AUC of 0.88, and an MCC of 0.66.
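For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, F1-score, and MCC follow from the four cells of a binary confusion matrix, with PD treated as the positive class. The function name `binary_metrics` and the counts are hypothetical illustrations, not the study's data; AUC is omitted because it requires continuous classifier scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute common binary classification metrics from confusion-matrix counts.

    Assumes at least one actual positive, one actual negative, and one
    predicted positive, so the ratios below are well defined.
    """
    sensitivity = tp / (tp + fn)          # recall on the positive (PD) class
    specificity = tn / (tn + fp)          # recall on the negative (control) class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a useful complement when the PD and control groups differ in size.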
CONCLUSIONS: Comparable results were obtained on the spontaneous dialog and read-speech subsets, demonstrating the robustness of the approach across different speaking contexts. These results demonstrate the effectiveness of integrating cepstral and acoustic features with machine learning models for non-invasive PD classification. The findings support the use of speech-based digital biomarkers in early PD detection and highlight their potential for scalable diagnostic tools that support clinical decision-making and improve patient outcomes.