Confounds: a Python library to mitigate the effect of covariates and for [multi-site] data harmonization
Pradeep Reddy Raamana, Future Contributors
Presenting author:
Pradeep Reddy Raamana
With the advent of large multi-site datasets, there is an active focus on methods to handle confounds and covariates in neuroimaging analyses. Several methods and comparative studies have been published recently, e.g. for multi-site data harmonization to mitigate the effects of pooling data across different sites, as well as for the more routine task of regressing out the effects of age and gender. However, the corresponding implementations have been sparse and/or fragmented, and most often not properly tested and validated. Moreover, there are many open questions in the context of "deconfounding", e.g. a lack of consensus on 1) what really constitutes a confound, 2) when we should try to deconfound, and 3) how to properly assess the impact of confounds. These open questions underscore the need for a common, open and tested library that the community can rely on. Some functions of this library could be 1) offering reference implementations of key methods, 2) enabling the community to test and validate these implementations independently for their own use cases, 3) providing utilities to help researchers visualize and establish the presence of confounds (e.g. quantifying confound-to-target relationships), and 4) providing methods to analyze the effect of deconfounding on the processed data (e.g. checking whether the methods worked at all, or whether they introduced new or unwanted biases).

Towards these goals, an open source Python library called confounds has been designed to help users apply deconfounding correctly in machine learning and predictive analysis applications, e.g. by maintaining a strict separation between the training and test (reporting) sets. It is available at GitHub.com/raamana/confounds. It currently offers partially scikit-learn-compatible estimator classes to linearly regress out covariates (the most common deconfounding method, typically used to mitigate the effects of age and gender).
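The idea of linearly regressing out covariates while keeping training and test sets strictly separate can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the confounds library's API: the class name `LinearResidualizer`, its methods, and the synthetic age/sex data are all hypothetical and chosen only for this example. The regression coefficients are estimated on the training set alone and then reused, unchanged, on the test set.

```python
import numpy as np


class LinearResidualizer:
    """Hypothetical helper (NOT the confounds API): linearly regress
    covariates out of features, fitting on training data only."""

    @staticmethod
    def _design(C):
        # Build the design matrix: an intercept column plus the covariates.
        C = np.asarray(C, dtype=float)
        if C.ndim == 1:
            C = C[:, None]
        return np.column_stack([np.ones(len(C)), C])

    def fit(self, X, C):
        # Ordinary least squares of each feature on the covariates.
        self.coef_, *_ = np.linalg.lstsq(self._design(C), X, rcond=None)
        return self

    def transform(self, X, C):
        # Subtract the covariate-predicted part, keeping the residuals.
        return X - self._design(C) @ self.coef_


# Synthetic data (an assumption for illustration): 3 features that
# depend linearly on age and sex, plus independent noise.
rng = np.random.default_rng(0)
n_train, n_test = 200, 50
n_total = n_train + n_test
age = rng.uniform(20, 80, size=n_total)
sex = rng.integers(0, 2, size=n_total)
C = np.column_stack([age, sex])
X = rng.normal(size=(n_total, 3)) + 0.05 * age[:, None] + 0.5 * sex[:, None]

X_train, X_test = X[:n_train], X[n_train:]
C_train, C_test = C[:n_train], C[n_train:]

# Fit on the training set only, then apply the same model to both sets.
model = LinearResidualizer().fit(X_train, C_train)
X_train_clean = model.transform(X_train, C_train)
X_test_clean = model.transform(X_test, C_test)

# Correlation between the first residualized feature and age (training set).
train_corr_with_age = np.corrcoef(X_train_clean[:, 0], C_train[:, 0])[0, 1]
```

On the training set the residuals are uncorrelated with the covariates by construction, so the more informative check is on the held-out set, where any remaining association reflects how well the training-set fit generalizes.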