The INODE service offering is being developed by, and will initially respond to the needs of, the large and diverse scientific communities represented by our three use case providers: (a) Cancer Biomarker Research - SIB Swiss Institute of Bioinformatics, Switzerland, (b) Research and Innovation Policy Making - SIRIS, Spain, and (c) Astrophysics - Max Planck Institute for Extraterrestrial Physics, Germany.
The main purpose of the INODE project in the context of the cancer biomarker research use case is to make it easier to ask, and to precisely answer, questions over multiple cancer-related datasets from the OncoMX project. These questions are written in natural language. Moreover, the difficulty the system faces in answering such questions and in “understanding” the user's intent will be tackled by an information discovery functionality that interactively guides the user through the available data and metadata (e.g., ontologies).
For Cancer Biomarker Research, the users are scientists, including biologists, bioinformaticians, data scientists and medical doctors. It is important that scientists without prior training in any technical aspect of computer systems can perform powerful queries across several datasets in ways that cannot be anticipated. These requirements go far beyond the query functionality of existing search interfaces. INODE will accelerate the extraction of useful knowledge from these data by enabling sophisticated semantic queries. For example, INODE will be able to answer questions such as “obtain a list of the cancer types that a given gene is differentially expressed in with a p-value cutoff of < 0.01”, which lets a researcher explore significant changes in gene expression between healthy and diseased tissue and identify candidate cancer types for experimental investigation.
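To make the example question concrete, the sketch below shows the kind of structured query that a scientist would otherwise have to write by hand. It is an illustration only and does not describe how INODE translates questions: the table `differential_expression`, its columns, the helper functions and every row of data are hypothetical placeholders, since the actual OncoMX data model is not described here.

```python
import sqlite3

# Hypothetical schema, invented for this illustration; it is NOT the
# real OncoMX data model.
SCHEMA = """
CREATE TABLE differential_expression (
    gene_symbol TEXT NOT NULL,
    cancer_type TEXT NOT NULL,
    p_value     REAL NOT NULL
)
"""

# Synthetic placeholder rows, not real measurements.
TOY_ROWS = [
    ("GENE_A", "cancer_type_1", 0.001),
    ("GENE_A", "cancer_type_2", 0.2),
    ("GENE_A", "cancer_type_3", 0.009),
    ("GENE_B", "cancer_type_1", 0.0005),
]


def build_toy_database():
    """Create an in-memory SQLite database holding the toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO differential_expression VALUES (?, ?, ?)", TOY_ROWS
    )
    return conn


def cancer_types_for_gene(conn, gene_symbol, p_cutoff=0.01):
    """Return the cancer types in which the gene has p_value < p_cutoff."""
    query = """
        SELECT DISTINCT cancer_type
        FROM differential_expression
        WHERE gene_symbol = ? AND p_value < ?
        ORDER BY cancer_type
    """
    return [row[0] for row in conn.execute(query, (gene_symbol, p_cutoff))]


conn = build_toy_database()
print(cancer_types_for_gene(conn, "GENE_A"))
```

Even in this toy setting, the user has to know the table name, the column names and the SQL syntax, which is exactly the burden that natural language querying is meant to remove.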
Furthermore, there is no consensus representation for some of the data integrated in OncoMX (e.g., biomarker information, provenance information). These representations vary between sources and are likely to evolve, which makes extracting knowledge from OncoMX more challenging for users. For instance, several efforts aim at identifying relevant cancer biomarkers. While OncoMX integrates data from the Early Detection Research Network (EDRN), it is necessary to link this information to the list of FDA-approved biomarkers, or to the biomarker qualification opinions from the European Medicines Agency. INODE will allow scientists to combine the information present in OncoMX with data provided by different sources in an easier and more automated way. There is currently no resource linking all the available data, a difficulty that strongly impacts all cancer research, on which the EU spends about 3 billion euros per year.
Over the last decades, more and more telescopes have been devoted to large-scale surveys of the sky. Today scientists are no longer constrained to the study of a couple of astronomical objects from individual observations, and instead have access to tens of thousands of objects of interest from various astronomical surveys. SDSS (Sloan Digital Sky Survey) is one such survey that brought astronomy into the big data era. Even though thousands of studies have already been carried out and published using SDSS, its data are still not fully explored. Beyond SDSS, many recent and future surveys are waiting to be explored in depth, such as DES, MaNGA, 4MOST, PFS, DESI and the Euclid space mission. It is becoming increasingly difficult for scientists to keep up with the ever-changing landscape of new databases, in particular if they want to carry out queries across multiple such databases.
The main challenge is that such databases must be queried with several long, iterative queries written in a database query language, SQL. This requires a high level of domain and technical knowledge, which largely restricts such querying to experts. Most importantly, every survey has its own database, designed specifically for it. With a large number of such databases, users have to keep up with the ever-changing data model of each one in order to carry out queries across multiple databases.
INODE will allow users to efficiently explore and access enormous databases and will enable the retrieval of information by means of natural language querying. INODE is meant for a wide range of users, ranging from scientists with strong domain-specific skills to members of the general public with only a limited astronomical background and limited technical knowledge. INODE will help users identify the right datasets for their objects of interest by guiding them in understanding the data model, by offering suggestions, and by assisting in the formulation of new and more complex queries. Unlike any other existing database system, INODE will eventually let users link unstructured data from the various open databases that will be available under INODE, in order to derive new scientific insights from astronomical datasets.
Evidence-based policy making is increasingly seen as a means to make policies more effective and as a way to maximise the positive impact of those policies on society. In most practical cases, evidence-based policy making translates to taking a data-driven approach to the design, implementation and monitoring of policies. Research and Innovation (R&I) policy making is no exception to this paradigm: a data-driven, evidence-based approach can help policy makers design instruments that best fit the potential of their local R&I ecosystems and maximise the impact of the research they fund.
When it comes to exploiting data to design, implement and monitor policies, however, R&I policy makers today face technical difficulties that become strategic challenges. Indeed, for their purposes, data is often not granular enough (i.e., often only aggregations and/or statistical distributions are available), nor recent enough (i.e., updates do not keep pace with their strategic demands). Moreover, data is often scattered across different sources, owned by different organisations that use different formats and nomenclatures as well as heterogeneous classification systems and metadata. On top of these difficulties, the gap between experts and stakeholders, and between the providers of data and analyses and their consumers (due to different vocabularies, knowledge and experiences), further jeopardises the process of embedding data-driven tasks into the policy-making process.
INODE aims to address these challenges by offering single, interconnected units to answer strategic R&I policy questions via Linked Data, and by developing frameworks that allow less tech-savvy users to interact with data more intuitively. Indeed, INODE will on the one hand integrate different datasets relevant for R&I policy making within a single knowledge graph, and on the other hand provide novel means to query data via tools based on, e.g., natural language querying or interactive data exploration.