Assessing Internal Reproducibility Within a Parkinson’s Disease Cohort by Leveraging an Independent Larger Dataset
Abstract
Background: Parkinson’s disease (PD) is a complex and heterogeneous disorder that is likely composed of several phenotypic subgroups with distinct clinical features and patterns of disease progression. Cluster analysis, which categorizes subjects into groups of “maximal similarity”, is a valuable statistical tool for characterizing phenotypic variability in clinical cohorts and for correlating phenotypes with specific biomarkers. However, data collection methods often differ between clinical and research settings, which limits the ability to obtain statistically significant results from smaller or less well-characterized cohorts and to compare results across studies. Establishing that clinical cluster analyses are reproducible across studies and centers would support generalizing findings between them. The goal of this study was to use cluster analysis of clinical traits to establish reproducibility of clinical phenotypes between a cohort of patients with PD at local centers (Discovery cohort) and the large PD bioregistry Parkinson’s Progression Markers Initiative (PPMI cohort).
Methods: Subjects in the Discovery (n = 179) and PPMI (n = 368) cohorts were clustered by phenotype using nonhierarchical k-means clustering in combination with principal component analysis (cohort-based clusters). The eigenvectors underlying the clustering of the PPMI cohort were identified and used to re-cluster the Discovery cohort (PPMI-based clusters). Overlap in cluster membership between the cohort-based clusters and the PPMI-based clusters of the Discovery cohort was then assessed.
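The re-clustering step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, the six features are placeholders, and the function names (`standardize`, `pca_components`, `kmeans`, `best_overlap`) are hypothetical. It also assumes that each cohort is standardized with its own mean and standard deviation, and that two clusters are requested for the Discovery cohort in both runs; the abstract does not specify either choice.

```python
import numpy as np

def standardize(X):
    """Z-score each column; constant columns are left at zero."""
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (X - X.mean(axis=0)) / std

def pca_components(Z, n_components):
    """Leading eigenvectors (as rows) of the correlation matrix, via SVD."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[:n_components]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def best_overlap(a, b):
    """Share of subjects with matching labels for k = 2.

    Cluster labels are arbitrary, so the swapped labelling is also tried.
    """
    same = np.mean(a == b)
    return max(same, 1.0 - same)

# Synthetic stand-ins for the two cohorts (NOT real patient data).
rng = np.random.default_rng(42)

def make_cohort(n, shift=4.0, n_features=6):
    half = n // 2
    group_a = rng.normal(0.0, 1.0, size=(half, n_features))
    group_b = rng.normal(shift, 1.0, size=(n - half, n_features))
    return np.vstack([group_a, group_b])

reference = standardize(make_cohort(368))   # plays the role of PPMI
discovery = standardize(make_cohort(179))   # plays the role of Discovery

# Cohort-based clusters: Discovery projected onto its own components.
own_components = pca_components(discovery, 4)
own_labels, _ = kmeans(discovery @ own_components.T, k=2)

# Reference-based clusters: Discovery projected onto the reference eigenvectors.
ref_components = pca_components(reference, 4)
ref_labels, _ = kmeans(discovery @ ref_components.T, k=2)

overlap = best_overlap(own_labels, ref_labels)
print(f"Share of subjects in the same cluster: {overlap:.2f}")
```

The key design point is that the second run reuses the reference cohort's eigenvectors unchanged, so any disagreement between `own_labels` and `ref_labels` reflects a difference in the feature space rather than a difference in the subjects.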
Results: Clustering of subjects revealed two clusters in the Discovery cohort and three clusters in the PPMI cohort. The first four principal components for clustering of the PPMI cohort, accounting for 43% of the variability, were driven by depression, anxiety, age at symptom onset, gender, and a tremor-dominant phenotype. After re-clustering the Discovery cohort based on these traits, 89% of subjects remained in their original cluster (κ = 0.776, P < 0.01).
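The agreement statistic reported in the Results (κ) corrects raw overlap for the agreement expected by chance. The following sketch shows how Cohen's kappa can be computed from two label vectors. The function name `cohens_kappa` and the toy labels are illustrative assumptions, not the study data, and the sketch assumes that the cluster labels of the two runs have already been matched to each other.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of cluster labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of subjects with the same label in both runs.
    p_observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both runs put every subject in the same single cluster.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)

# Toy example: 10 subjects, one of whom changes cluster after re-clustering.
cohort_based = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
reference_based = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
kappa = cohens_kappa(cohort_based, reference_based)
print(f"kappa = {kappa:.3f}")
```

In this toy case the raw overlap is 0.9 and the chance agreement is 0.54, so kappa is lower than the raw overlap, which is why both figures are worth reporting together.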
Conclusions: We successfully leveraged cluster analysis of clinical traits in PD patients from the larger, standardized PPMI cohort to validate the reproducibility of clustering in our smaller Discovery cohort. We propose combining nonhierarchical cluster analysis with re-clustering as a test of generalizability to establish clustering reproducibility. This method can be adapted for use in a wide range of clinical scenarios, allowing for analysis of cohorts that are less extensively characterized or that have low statistical power because of small sample size.
J Neurol Res. 2024;14(2):49-58
doi: https://doi.org/10.14740/jnr761