Sustainability of data

Translational Medicine Data Catalogue – A comprehensive and informative list of datasets

The ELIXIR-LU/eTRIKS Data Catalogue (TDMC) lists the large variety of clinical and translational medicine datasets that will become available from academic and industry research projects, clinical cohorts and genome consortia. For each dataset it provides curated and standardised metadata. Datasets can therefore be screened by many relevant criteria (e.g. cohort size), and the results include detailed information about the study, including where each dataset is hosted. Through this service, TDMC users in Europe (and beyond) can look for specific datasets or discover resources that are likely to suit their research, and then request access where the data are sensitive. In a nutshell, the TDMC service is an important step toward implementing the FAIR Data Principles (Findable, Accessible, Interoperable and Re-usable) for translational medicine data.
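
As an illustration of screening by criteria, the following minimal sketch filters a few catalogue-style metadata records by cohort size and hosting location. The record fields (`title`, `cohort_size`, `hosted_at`, `sensitive`), the function `screen` and the sample entries are hypothetical; they do not reflect the actual TDMC schema or query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    # Hypothetical metadata fields; the real catalogue schema may differ.
    title: str
    cohort_size: int
    hosted_at: str
    sensitive: bool


def screen(records, min_cohort_size=0, hosted_at=None):
    """Return the records that satisfy every supplied criterion."""
    matches = []
    for record in records:
        if record.cohort_size < min_cohort_size:
            continue
        if hosted_at is not None and record.hosted_at != hosted_at:
            continue
        matches.append(record)
    return matches


# Made-up entries, for illustration only.
CATALOGUE = [
    DatasetRecord("Example cohort A", 1200, "Host X", True),
    DatasetRecord("Example study B", 85, "Host Y", False),
    DatasetRecord("Example cohort C", 450, "Host X", True),
]

for record in screen(CATALOGUE, min_cohort_size=400):
    # Each hit reports where the dataset is hosted and whether
    # access would have to be requested.
    print(record.title, record.hosted_at, record.sensitive)
```

Keeping the hosting location and a sensitivity flag in each record mirrors the idea that a search result tells the user where the data live and whether an access request is needed.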

ELIXIR-LU currently hosts the metadata and the querying interface for the catalogue, and is responsible for content updates and further development of the system. The Node also supports data providers in curating metadata, and offers recommendations and protocols to assist their curation and standardisation efforts on the primary data.

Another specialty of the current catalogue is that it is designed to hold curated and standardised metadata from ongoing and past Innovative Medicines Initiative (IMI) projects. ELIXIR-LU worked in close collaboration with the IMI and eTRIKS on the initial setup and on the collection of metadata from IMI projects.

Integrated data storage and archiving – Important for repositories of translational datasets

Infrastructure services for sustainable, accessible data

The Node provides integrated storage and archiving for curated Translational Medicine data. Data are stored on tranSMART* servers or other suitable servers hosted by ELIXIR-LU. This service aims to keep datasets easy to access and easy to explore.

Integration and curation of multidimensional data

To be integrated into systems like tranSMART, datasets need to be curated by the data provider. This applies to clinical/pre-clinical data (anonymised patient data, information associated with biological samples), multi-omics data (high-throughput molecular readouts from the samples) and imaging data. The curation must be validated by the Node's data quality control team before the data are stored on the Node's service platforms.

* tranSMART is a database service for handling large biomedical datasets, maintained by the tranSMART Foundation.

Data sharing platforms – Exploration platforms to make the most of available data

ELIXIR-LU provides platforms and protocols for users wishing to gain access to the data hosted at the Node. These solutions offer different ways to explore the datasets and to obtain different depths of information about the data. ELIXIR generally supports three access levels: open access, registered access (ELIXIR bona fide researcher status) and controlled access for sensitive data. Access conditions depend on the data access policy for the dataset and on regulatory constraints (e.g. for human data). For controlled access, a data access committee may have to grant final approval before the user can obtain the dataset. As per ELIXIR policy, the data hosted by the Node are accessible to all but can remain privileged (and stored on dedicated servers) for a short time during the development phase of a project. This privileged period normally lasts 18 months at most.
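
The three access levels can be pictured as a simple decision rule. The sketch below is only an illustration: the names `AccessLevel` and `may_access` are hypothetical, and it simplifies the text by assuming that controlled access always requires data access committee approval, whereas in practice the conditions depend on each dataset's access policy.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"
    REGISTERED = "registered"
    CONTROLLED = "controlled"


def may_access(level, is_bona_fide=False, dac_approved=False):
    """Decide whether a user may obtain a dataset at the given level.

    Simplifying assumption: controlled access always needs approval
    from a data access committee (DAC).
    """
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.REGISTERED:
        # Registered access requires bona fide researcher status.
        return is_bona_fide
    # Controlled access for sensitive data.
    return dac_approved


print(may_access(AccessLevel.OPEN))
print(may_access(AccessLevel.REGISTERED, is_bona_fide=True))
print(may_access(AccessLevel.CONTROLLED, is_bona_fide=True))
```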

The ELIXIR-LU set-up currently works with three mechanisms that help users explore the datasets to which they have access.

  1. Users can explore, analyse and visualise the available datasets in the data management system (e.g. tranSMART) in which they are stored. Thanks to easy browsing, selection and visualisation tools, they can choose subsets of data relevant to their research questions, both within a study and across different projects or cohorts. Built-in analytical tools and the capability to use external workflows allow for various exploratory analyses.
  2. The European Genome-phenome Archive (EGA) is a European platform that allows users to explore datasets from genomic studies (personally identifiable genetic and phenotypic data resulting from biomedical research projects). The public EGA website lists the studies and datasets and gives more information on which experiments and data are available. ELIXIR-LU will soon host a national instance of EGA. It follows the EGA framework but is geared to enabling users to explore locally hosted datasets through the Node’s platform, or via the federated metadata search provided by the ELIXIR Hub. ELIXIR-LU is also working toward offering a national mirror of other EGA instances to help researchers access datasets from other countries.
  3. With the help of a beacon, i.e. an open web service which provides consent-based access to aggregated genomic data (or derived data), researchers can screen databases to look for genomes with specific characteristics, e.g. a particular aberration on a particular chromosome. The ELIXIR Beacon implementation is especially designed for finding variants in genomic and clinical data stored at the Node. Ultimately, the beacon idea (GA4GH) seeks to mitigate the risks associated with sharing human genomic data (data protection and privacy issues) and aims to be easy to implement. Each user should be granted access rights depending on their approval status, in accordance with the need for data protection in this field.
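
In its simplest form, a beacon-style query answers only whether a given variant is present in the underlying data, without exposing individual records. The following minimal sketch illustrates that idea; the in-memory index `VARIANTS`, its made-up entries and the function `beacon_exists` are hypothetical and are not the ELIXIR Beacon implementation or the GA4GH API.

```python
# Hypothetical aggregated index of observed variants, keyed by
# (chromosome, position, alternate base). Entries are made up.
VARIANTS = {
    ("1", 12345, "A"),
    ("17", 43045712, "T"),
}


def beacon_exists(chromosome, position, alternate_base):
    """Answer yes/no: has this variant been observed in the data?

    Only a boolean is returned, so no sample-level information
    leaves the service.
    """
    return (chromosome, position, alternate_base.upper()) in VARIANTS


print(beacon_exists("17", 43045712, "t"))
print(beacon_exists("2", 1, "G"))
```

Returning only presence or absence is what keeps the amount of disclosed information small; richer answers would be tied to the user's access rights.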

Support for Luxembourgish Translational Medicine research programmes

ELIXIR-LU aims to offer IT solutions for national researchers faced with commonly encountered challenges. Examples are given below:

  1. Electronic data capture systems to digitalise and standardise data generated by clinical/translational studies (e.g. electronic case report forms utilising REDCap)
  2. Hosting infrastructure
  3. Training for data curation, integration and analysis