Comprehensive data and metadata management in a multi-lab collaboration through integration of open-source solutions
Recent technological advancements in electrophysiology facilitate the simultaneous recording of hundreds of channels while capturing complex behaviors. However, these extensive and complex datasets often remain underutilized due to insufficient solutions for data management and sharing. We address these challenges within the multi-lab collaboration In2PrimateBrains [1], an EU-funded international training network to investigate brain networks in the non-human primate. We utilize a modular and integrated set of open-source tools, methods, and services to achieve user-friendly solutions that balance standardization with adaptability.
Metadata collection is facilitated by the lightweight odML [2] metadata format, which enables automated data annotation and provides context in a human- and machine-readable manner [3]. Standardization is achieved through a set of controlled terminologies and metadata schemas built upon existing community efforts, including the BIDS extension proposal BEP032 [4] and EBRAINS openMINDS [5].
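As an illustration of how controlled terminologies can support automated annotation, the following minimal sketch checks hierarchically organized key-value metadata (sections containing properties, in the spirit of the odML data model) against a list of allowed terms. It uses only the Python standard library and does not use the odML API; the term lists, keys, and the `validate` helper are hypothetical and are not taken from the consortium's actual schemas.

```python
# Hypothetical controlled terminology: allowed values per metadata key.
# These terms are illustrative assumptions, not the consortium's schema.
CONTROLLED_TERMS = {
    "species": {"Macaca mulatta", "Macaca fascicularis"},
    "recording_type": {"extracellular", "intracellular"},
}


def validate(section, terms=CONTROLLED_TERMS, path=""):
    """Return (path, value) pairs whose value is not an allowed term."""
    problems = []
    for key, value in section.items():
        here = f"{path}/{key}"
        if isinstance(value, dict):
            # Nested dict = subsection: descend recursively.
            problems.extend(validate(value, terms, here))
        elif key in terms and value not in terms[key]:
            # Key is under terminology control and the value is unknown.
            problems.append((here, value))
    return problems


# Example session metadata with one misspelled controlled term.
session = {
    "subject": {"species": "Macaca mulatta", "name": "monkey_A"},
    "recording": {"recording_type": "extracelular", "channel_count": 192},
}

problems = validate(session)
for location, value in problems:
    print(f"{location}: '{value}' is not a controlled term")
```

Keys without an entry in the terminology (here `name` and `channel_count`) are left unconstrained, which is one simple way to balance standardization with adaptability.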
We utilize Neo [6] to achieve a common data representation and as an I/O bridge to interface with the variety of data formats used in participating labs. To ensure data accessibility, we employ the NIX data format [7] for storage, which enables comprehensive organization and integration of metadata alongside various types of data.
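The idea of an I/O bridge can be sketched as a registry of format-specific readers that all return one common in-memory representation. The example below is a standard-library illustration of that pattern only, not Neo's API (Neo provides much richer objects such as blocks, segments, and analog signals). The `Signal` class, the two file layouts, and the `load` helper are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Signal:
    """Hypothetical common representation shared by all readers."""
    sampling_rate_hz: float
    samples: list
    annotations: dict = field(default_factory=dict)


def read_json(text):
    # Assumed layout for one lab: {"fs": <Hz>, "data": [...], "unit": "..."}
    raw = json.loads(text)
    samples = [float(x) for x in raw["data"]]
    return Signal(float(raw["fs"]), samples, {"unit": raw["unit"]})


def read_csv(text):
    # Assumed layout for another lab: header "time_s,value_uV", one row per sample.
    rows = list(csv.DictReader(io.StringIO(text)))
    times = [float(r["time_s"]) for r in rows]
    # Assumes uniform sampling and at least two rows.
    rate = 1.0 / (times[1] - times[0])
    return Signal(rate, [float(r["value_uV"]) for r in rows], {"unit": "uV"})


# Registry mapping file extensions to reader functions.
READERS = {".json": read_json, ".csv": read_csv}


def load(filename, text):
    """Pick a reader by file extension and return the common representation."""
    for ext, reader in READERS.items():
        if filename.endswith(ext):
            return reader(text)
    raise ValueError(f"no reader registered for {filename}")


# The same recording in two lab-specific layouts yields the same object.
lab_a = load("session.json", '{"fs": 2.0, "data": [1.0, 2.0, 3.0], "unit": "uV"}')
lab_b = load("session.csv", "time_s,value_uV\n0.0,1.0\n0.5,2.0\n1.0,3.0\n")
print(lab_a == lab_b)
```

Downstream code then depends only on the common representation, so supporting a new lab format means registering one additional reader.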
For data sharing, we use GIN [8], a collaborative data platform. GIN offers version control for research data, fine-grained access control, collaborative features, and data publication services.
Standardized representation of data in open formats, together with comprehensive metadata and programmatic access, ensures that the data is interoperable for further processing by generic analysis scripts, tools, and services [9,10,11]. In conclusion, we present a case study of multimodal data and metadata management in a multi-lab collaboration. Our approach is to build upon and employ an ecosystem of open tools, methods, and services that collectively work towards making research data more FAIR [12].
References:
1. In2PrimateBrains Consortium - http://In2PrimateBrains.eu
2. odML (RRID:SCR_001376) - https://doi.org/10.3389/fninf.2011.00016
3. Zehl et al (2016) Handling metadata in a neurophysiology laboratory. Front. Neuroinform. 10:26, https://doi.org/10.3389/fninf.2016.00026
4. BIDS BEP032 - https://bids.neuroimaging.io/bep032
5. EBRAINS openMINDS (RRID:SCR_023173) - https://openminds.ebrains.eu/
6. Neo (RRID:SCR_000634) - https://neuralensemble.org/neo
7. NIX (RRID:SCR_016196) - http://www.g-node.org/nix
8. GIN (RRID:SCR_015864) - https://gin.g-node.org/
9. Elephant (RRID:SCR_003833) - http://neuralensemble.org/elephant
10. SpikeInterface (RRID:SCR_021150) - https://spikeinterface.readthedocs.io
11. DataLad (RRID:SCR_003931) - https://www.datalad.org/
12. Wilkinson et al (2016) The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3:160018, https://doi.org/10.1038/sdata.2016.18
Acknowledgements:
Supported by the European Union’s Horizon 2020 research and innovation programme (Grant agreement No 956669)