What is INODE?
Intelligent Open Data Exploration
Data growth and availability as well as data democratization have radically changed data exploration in the last 10 years. Many different data sets, generated by users, systems and sensors, are continuously being collected. These data sets contain information about scientific experiments, health, energy, education etc., and they are highly heterogeneous in nature, ranging from highly structured data in tabular form to unstructured text, images or videos.
Furthermore, data, especially online content, is no longer the purview of large organizations. Open data repositories are made public and can benefit many types of users, from analysts exploring data sets for insight and scientists looking for patterns to dashboard interactors and consumers looking for information. As a result, the benefits of data exploration are becoming increasingly prominent.
However, the volume and complexity of data make it difficult for most users to access data in an easy way.
In this project we propose INODE: Intelligent Open Data Exploration. The core principle of INODE is that users should interact with data in a more dialectic and intuitive way, similar to a dialog with a human. To realize this principle, INODE will offer a suite of agile, fit-for-purpose and sustainable services for the exploration of open data sets that help users (a) link and leverage multiple datasets, (b) access and search data using natural language, examples and analytics, (c) get guidance from the system in understanding the data and formulating the right queries, and (d) explore data and discover new insights through visualizations.
Our service offering is shaped by, and will initially respond to, the needs of the large and diverse scientific communities represented by our three use case providers:
(a) Cancer Biomarker Research – SIB Swiss Institute of Bioinformatics, Switzerland,
(b) Research and Innovation Policy Making – SIRIS, Spain, and
(c) Astrophysics – Max Planck Institute for Extraterrestrial Physics, Germany.
Cancer Biomarker Research - SIB Swiss Institute of Bioinformatics, Switzerland
Cancer genomics studies generate highly heterogeneous data sets with variable characteristics. These data sets differ in file formats, attribute names, positional coordinates, reference data, processing pipelines, and more. To accelerate the extraction of useful knowledge, sophisticated semantic queries should be enabled across large data sets via an intuitive interface.
Astrophysics - Max Planck Institute for Extraterrestrial Physics, Germany
Since the Sloan Digital Sky Survey (SDSS), which collected spectra of thousands of galaxies across vast distances in space, we have been amassing large databases of astronomical data. Storing, transporting and querying this data is now a challenge because of its volume and heterogeneity. Instead of requiring users to design nested SQL statements several thousand characters long and to develop routines that match them against imaging or spectroscopic data, INODE can enable information retrieval by means of natural language querying.
Research and Innovation Policy Making - SIRIS, Spain
R&I data is fragmented, highly heterogeneous and disconnected. It is often neither in a structured format nor systematically shared across organisations. To leverage the full potential of R&I open data platforms, ontology-based solutions are needed to help bring together inputs from a variety of sources in an interoperable fashion and to enable powerful queries even for non-technical users, who are the majority in R&I use cases.