Translational Medicine Data Catalogue – A comprehensive and informative list of datasets
The ELIXIR-LU/eTRIKS Data Catalogue (TDMC) is designed to list the wide variety of clinical and translational medicine datasets that will become available from academic and industry research projects, as well as from clinical cohorts and genome consortia. It also provides important additional information about these datasets, namely curated and standardised metadata. As a result, datasets can be screened by many relevant criteria (e.g. cohort size), and the results include detailed information about each study, including where the dataset is hosted. Through this service, TDMC users in Europe (and beyond) can discover resources that are likely to suit their research and, where the data are sensitive, request access to them. They can also search for specific datasets. In short, the TDMC service is an important step toward implementing the FAIR Data Principles (Findable, Accessible, Interoperable and Re-usable) for translational medicine data.
ELIXIR-LU currently hosts the catalogue's metadata and query interface, and is responsible for content updates and further development of the system. The Node also supports data providers with metadata curation and provides recommendations and protocols to assist their efforts to curate and standardise the primary data.
A further feature of the current catalogue is that it is designed to hold curated and standardised metadata from ongoing and past Innovative Medicines Initiative (IMI) projects. ELIXIR-LU worked closely with IMI and eTRIKS on the initial setup of the catalogue and on collecting metadata from IMI projects.
Integrated data storage and archiving – Repositories for translational datasets
Infrastructure services for sustainable, accessible data
The Node provides integrated storage and archiving for curated Translational Medicine data. Data are stored on tranSMART* servers or other suitable servers hosted by ELIXIR-LU. The aim of this service is to keep the datasets easy to access and easy to explore.
Integration and curation of multidimensional data
Before they can be integrated into systems such as tranSMART, datasets, including clinical/pre-clinical data (anonymised patient data, information associated with biological samples), multi-omics data (high-throughput molecular readouts from the samples) and imaging data, must be curated by the data provider. The curation must then be validated by the Node's data quality control team before the data are stored on the Node's service platforms.
* tranSMART is a database service for handling large biomedical datasets, maintained by the tranSMART Foundation.
Data sharing platforms – Exploration platforms to make the most of available data
ELIXIR-LU provides platforms and protocols for users who wish to access the data hosted at the Node. These solutions offer different ways to explore the datasets and to obtain different depths of information about the data. ELIXIR generally supports three access levels: open access, registered access (requiring ELIXIR bona fide researcher status) and controlled access for sensitive data. Access conditions depend on the data access policy of each dataset and on regulatory constraints (e.g. for human data). For controlled access, a data access committee may have to grant final approval before the user can obtain the dataset. In line with ELIXIR policy, the data hosted by the Node are accessible to all, but they can remain restricted to the project (and stored on dedicated servers) for a limited time during the project's development phase. This period normally lasts 18 months at most.
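As a rough illustration of how the three access levels differ, the following Python sketch encodes them as a simple decision function. The names `AccessLevel`, `AccessRequest` and `may_obtain` are hypothetical and not part of any ELIXIR-LU service. The rule that controlled access requires bona fide status in addition to committee approval is an assumption; in practice the conditions are set by each dataset's access policy and by regulatory constraints.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """The three access levels generally supported by ELIXIR."""
    OPEN = "open"
    REGISTERED = "registered"  # requires ELIXIR bona fide researcher status
    CONTROLLED = "controlled"  # sensitive data; a data access committee may decide


@dataclass
class AccessRequest:
    # Hypothetical fields, for illustration only.
    is_bona_fide: bool = False  # user holds ELIXIR bona fide researcher status
    dac_approved: bool = False  # data access committee has granted approval


def may_obtain(level: AccessLevel, request: AccessRequest) -> bool:
    """Return True if the request meets the (assumed) conditions for `level`."""
    if level is AccessLevel.OPEN:
        # Open data: no conditions.
        return True
    if level is AccessLevel.REGISTERED:
        # Registered data: bona fide researcher status is sufficient.
        return request.is_bona_fide
    # Controlled data (assumption): bona fide status plus final DAC approval.
    return request.is_bona_fide and request.dac_approved
```

For example, a registered bona fide researcher would pass `may_obtain(AccessLevel.REGISTERED, AccessRequest(is_bona_fide=True))` but would still be refused controlled data until `dac_approved` is set.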
The ELIXIR-LU set-up currently works with three mechanisms that help users explore the datasets to which they have access.
Support for Luxembourg Translational Medicine research programmes
ELIXIR-LU aims to offer IT solutions to common challenges faced by researchers in Luxembourg. Examples include:
- Electronic data capture systems to digitalise and standardise data generated by clinical/translational studies (e.g. electronic case report forms built with REDCap)
- Hosting infrastructure
- Training for data curation, integration and analysis