Validating the impact of sample size for modeling brain and behaviour interactions with canonical correlation analysis.
The reproducibility crisis in neuroimaging has led studies to increase their sample sizes in order to improve statistical power. This increase has coincided with deep phenotyping of the individuals in a sample, including behavioural and genetic data. These rich datasets make it possible to study relationships between domains, such as brain and behaviour. One method for relating neuroimaging data to behavioural phenotypes is canonical correlation analysis (CCA), a multivariate technique that describes how measures from different domains vary together. However, recent work suggests that this type of multivariate analysis requires thousands of individuals to obtain consistently reproducible results. We aim to further investigate the effects of sample size on brain-behaviour CCA. We use imaging-derived phenotypes and cognitive measures from around 40,000 individuals in the UK Biobank repository. Specifically, we focus on diffusion magnetic resonance imaging (dMRI) data and cognitive function, which have previously been shown to covary strongly. We plan to assess the replicability of canonical axis correlations as a function of sample size using bootstrapped subsets of the full dataset. We will also compare results from different CCA pipelines to assess whether some pipelines reduce effect size inflation. We expect our study to clarify how sample size and analysis pipeline affect the replicability of multivariate methods.
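
The planned sample-size analysis can be illustrated with a minimal sketch on synthetic data. This is not the study's pipeline: the helper names (`fit_cca`, `simulate`, `subsample_curve`), the simulated dimensions, and the true correlation of 0.3 are all assumptions made for illustration, and random subsets drawn without replacement stand in for the bootstrapped subsets. The sketch fits the first canonical pair on each subset and compares the in-sample canonical correlation with the correlation obtained when the same weights are applied to held-out individuals.

```python
import numpy as np


def fit_cca(X, Y):
    """First canonical pair via QR + SVD (assumes n > p, q and full column rank)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X - mx)
    Qy, Ry = np.linalg.qr(Y - my)
    # Singular values of Qx^T Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    wx = np.linalg.solve(Rx, U[:, 0])   # weights for the X variables
    wy = np.linalg.solve(Ry, Vt[0])     # weights for the Y variables
    return s[0], wx, wy, mx, my


def simulate(n, p=20, q=5, rho=0.3, seed=0):
    """Synthetic 'brain' (X) and 'behaviour' (Y) data (hypothetical).

    Only X[:, 0] and Y[:, 0] are related, so the true first canonical
    correlation equals rho.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    Y[:, 0] = rho * X[:, 0] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    return X, Y


def subsample_curve(X, Y, sizes, n_rep=20, seed=0):
    """Mean in-sample and out-of-sample first canonical correlation per subset size."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = {}
    for m in sizes:
        if 2 * m > n:
            raise ValueError("need 2 * m <= n for disjoint train/test subsets")
        r_in, r_out = [], []
        for _ in range(n_rep):
            perm = rng.permutation(n)
            train, test = perm[:m], perm[m:2 * m]
            r, wx, wy, mx, my = fit_cca(X[train], Y[train])
            # Apply the training weights to held-out individuals.
            u = (X[test] - mx) @ wx
            v = (Y[test] - my) @ wy
            r_in.append(r)
            r_out.append(np.corrcoef(u, v)[0, 1])
        out[m] = (float(np.mean(r_in)), float(np.mean(r_out)))
    return out


if __name__ == "__main__":
    X, Y = simulate(4000)
    for m, (r_in, r_out) in subsample_curve(X, Y, [50, 200, 1000]).items():
        print(f"n={m:5d}  in-sample r={r_in:.2f}  out-of-sample r={r_out:.2f}")
```

In this toy setting, the gap between the in-sample and out-of-sample correlations gives a simple measure of effect size inflation at each subset size. The same loop could wrap any alternative CCA pipeline in place of `fit_cca` to compare how that gap changes.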