Distributed data management for large collaborative projects: DataLad ecosystem in Collaborative Research Center 1451
Multi-site research projects offer a unique opportunity for scientific insight based on data collected across different modalities, paradigms, and species. Yet, they also pose unique research data management challenges. Here, we present software developments and lessons learned from the information management project of CRC1451 (crc1451.uni-koeln.de).
Because research data management (RDM) demands vary widely across the more than 20 CRC member projects, we opted for a decentralized approach: projects retain full control over key data management decisions (standards, storage, sharing), while the findability, accessibility, interoperability, and reusability of their data are achieved with DataLad (datalad.org) as an overlay structure for all distributed datasets. We use DataLad Catalog to generate an online data portal from metadata. Metadata are extracted with MetaLad, following an iterative "capture immediately, curate perpetually" approach. To mitigate DataLad’s limited adoption outside central projects, we are developing two solutions. First, DataLad Gooey is a graphical user interface for basic data management operations. Second, DataLad Tabby is a format specification and a collection of tools for dataset descriptions that can be created and provided as spreadsheets, use well-defined terms, and can be translated to catalog records and linked data objects.
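To illustrate the general idea of a spreadsheet-based dataset description being translated into a structured record, the following minimal Python sketch may help. It does not use DataLad Tabby and does not implement its format specification. The sheet layout (one property per row, property name in the first column, one or more values in the following columns), the function name sheet_to_record, and the property names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json


def sheet_to_record(tsv_text):
    """Turn a tab-separated key/value sheet into a plain dict record.

    Assumed (hypothetical) layout: first column holds the property name,
    remaining columns hold one or more values for that property.
    """
    record = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank rows and rows without a property name.
        if not row or not row[0].strip():
            continue
        key = row[0].strip()
        values = [cell.strip() for cell in row[1:] if cell.strip()]
        if not values:
            continue
        # A single value stays a scalar; several values become a list.
        record[key] = values[0] if len(values) == 1 else values
    return record


# Hypothetical sheet content, as it might be exported from a spreadsheet.
example_sheet = (
    "name\texample-dataset\n"
    "title\tExample dataset title\n"
    "keyword\tfirst keyword\tsecond keyword\n"
)

record = sheet_to_record(example_sheet)
print(json.dumps(record, indent=2))
```

A record of this kind could then serve as input to further steps, such as mapping property names to well-defined terms; that mapping is outside the scope of this sketch.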
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under grant SFB 1451 (431549029, INF project).