By Elodie Bugnicourt (Ph.D.) and David F. Nettleton (Ph.D.)
In our last blog post, we saw how data mining of past research studies can be used to define emerging scientific areas and shape certain funding policies. Let’s now focus on how it can help to identify future innovations.
Based on 43 million scientific papers, Boyack and Klavans generated a map of research and its interconnections using the OpenOrd algorithm, with the specific aim of helping to identify trends in the different areas of science (Fig. 1). [OpenOrd is a state-of-the-art algorithm developed at the prestigious Sandia National Laboratories (United States). It aims to solve the “hairball” problem that arises when displaying graph network data, using a simulated annealing approach with a five-stage cooling schedule.]
Fig. 1. Map of science
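To give a feel for what “simulated annealing with a five-stage cooling schedule” means for graph layout, here is a minimal sketch in plain Python. It is not OpenOrd: the energy function (springs along edges plus pairwise repulsion), the temperatures, move sizes and iteration counts are all illustrative assumptions, and it omits many details of the real algorithm, such as its edge cutting. Only the stage names follow OpenOrd’s terminology. The hypothetical function `anneal_layout` moves one node at a time and accepts worse positions with a probability that shrinks as the temperature drops from stage to stage.

```python
import math
import random

# Illustrative five-stage schedule: (stage, iterations, temperature, max move).
# Stage names follow OpenOrd's terminology; the numbers are made up.
SCHEDULE = [
    ("liquid", 300, 2.0, 1.0),
    ("expansion", 300, 1.0, 0.5),
    ("cooldown", 300, 0.5, 0.25),
    ("crunch", 200, 0.1, 0.1),
    ("simmer", 200, 0.02, 0.05),
]


def node_energy(i, pos, adj):
    """Energy terms involving node i: springs to neighbours plus repulsion from all nodes."""
    xi, yi = pos[i]
    energy = 0.0
    for j, (xj, yj) in enumerate(pos):
        if j == i:
            continue
        d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + 1e-9
        if j in adj[i]:
            energy += d2                  # edges pull connected nodes together
        energy += 1.0 / math.sqrt(d2)     # every pair pushes apart
    return energy


def anneal_layout(n, edges, seed=0):
    """Place n nodes in 2D by simulated annealing under a staged cooling schedule."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    pos = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    for _stage, iterations, temperature, max_move in SCHEDULE:
        for _ in range(iterations):
            i = rng.randrange(n)
            old = pos[i]
            before = node_energy(i, pos, adj)
            pos[i] = (old[0] + rng.uniform(-max_move, max_move),
                      old[1] + rng.uniform(-max_move, max_move))
            delta = node_energy(i, pos, adj) - before
            # Metropolis rule: keep improvements, keep worse moves with prob. exp(-delta / T)
            if delta > 0 and rng.random() >= math.exp(-delta / temperature):
                pos[i] = old  # reject the move
    return pos
```

For example, `anneal_layout(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])` lays out two triangles joined by one edge. The design idea is that high early temperatures and large moves let nodes escape poor initial placements, while the cold final stages only fine-tune local structure.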
This map can help, for example, to identify research topics with greater longevity as well as those with the greatest novelty, and to establish the degree of interdisciplinarity of the studies in the data set according to their main scientific field. As can be seen in Fig. 1, computer science, an enabling field, is the most widespread across the research map, showing relationships with all other disciplines. It is therefore no surprise that the big data field also uses a similar knowledge mining approach to determine its own emerging areas (Fig. 2).
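The post does not say how the degree of interdisciplinarity is measured, so the following is only a sketch of two common bibliometric indicators, assuming each paper has a main field and a list of fields for its cited references. The hypothetical function `interdisciplinarity` returns the share of references falling outside the paper’s main field and the Shannon entropy (in bits) of the reference-field mix; neither is claimed to be the measure used by Boyack and Klavans.

```python
import math
from collections import Counter


def interdisciplinarity(main_field, cited_fields):
    """Return (share of references outside main_field, Shannon entropy in bits of the field mix)."""
    if not cited_fields:
        return 0.0, 0.0
    counts = Counter(cited_fields)
    total = len(cited_fields)
    # Fraction of references that point outside the paper's own field
    outside = 1.0 - counts.get(main_field, 0) / total
    # Entropy is 0 for a single-field mix and grows as references spread over more fields
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return outside, entropy
```

With the made-up input `interdisciplinarity("computer science", ["computer science", "biology", "physics", "computer science"])`, half of the references lie outside the main field and the entropy is 1.5 bits; a paper citing only its own field scores 0 on both.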
When moving away from purely scientific publications into the patent world, the innovative fields with a higher market exploitation potential can also be mapped with the same or other data-mining-based tools, with the aim of forecasting innovation pathways. Comparing such maps between countries or competitors may also help to identify distinctive innovation ecosystems in given fields, as well as to reveal gaps that may represent opportunities.
In fact, one of the key questions for any scientist, industrial technologist, innovation manager, investor or even curious mind is where technology is going! Having predictive models to analyse science and technology knowledge could help in defining systematic approaches to innovation! As Voltaire said: “the present is pregnant with the future”, and it is coming within our reach to decode it using constantly improving data science techniques.
While the potential applications of data mining are clearly countless, and indeed already in use in a number of sectors such as finance and marketing, among scientific fields the use of data to predict the future may be closer to becoming a broader reality in the medical sciences. For example, applying data analytics to genotypes through massive data processing could make it possible to predict at birth the diseases that an individual may suffer in the future, and therefore to mitigate some of the related risks. This is one of 10 predictions of innovations expected by 2025 in a Thomson Reuters report based on the analysis of its scientific databases, which describe emerging research over the previous 2 years. To select the top 10 fields in the scientific and patent literature, the most active research fields were identified by ranking them by citations per paper and by assessing the number of core papers per field.
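The ranking step described above can be sketched as follows. This is a hypothetical illustration, not the report’s actual method: it assumes each field is summarised by its total citations, its number of papers and its number of core papers, and it combines the two criteria by sorting on citations per paper and breaking ties on the number of core papers. The report may weight them differently.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    citations: int    # total citations received by the field's papers
    papers: int       # number of papers in the field
    core_papers: int  # number of highly cited "core" papers

    @property
    def citations_per_paper(self) -> float:
        return self.citations / self.papers if self.papers else 0.0


def top_fields(fields, k=10):
    """Rank by citations per paper, breaking ties by number of core papers; keep the top k."""
    ranked = sorted(fields, key=lambda f: (f.citations_per_paper, f.core_papers), reverse=True)
    return ranked[:k]
```

With toy values (not real data) such as `Field("field A", 500, 50, 5)`, `Field("field B", 300, 20, 2)` and `Field("field C", 200, 20, 8)`, `top_fields(fields, k=2)` keeps field B (15 citations per paper) and then field C, which ties with field A on citations per paper but has more core papers.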
At IRIS we are involved in a surprisingly diverse yet highly specialized, inter-related web of applied research areas in science and technology, the result of participating in over 50 European applied research projects (forty FP7 and eleven H2020 projects to date). In Fig. 3 we show how we used the ForceAtlas2 graph layout algorithm to make a preliminary research map of our own projects, using four major “innovation lines” as hub nodes, the H2020 and FP7 projects as secondary nodes, and a couple of keyword nodes. This can of course be refined with further parameters such as ‘Technology Readiness Level’, additional keywords and so on.
Fig. 3. Mapping of IRIS’s applied research projects
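A map like this is essentially a graph of hub, project and keyword nodes passed to a force-directed layout. The sketch below is a simplified, pure-Python version of the ForceAtlas2 force model: degree-weighted repulsion k_r·(deg+1)·(deg+1)/d between every pair of nodes, attraction proportional to distance along edges, and degree-weighted gravity towards the centre. It replaces ForceAtlas2’s adaptive speed with a fixed, capped step and omits the Barnes-Hut approximation, so it is not the implementation used in tools such as Gephi. The function name, parameters and the toy graph (two hypothetical innovation lines, four projects and one shared keyword) are all illustrative assumptions.

```python
import math
import random


def forceatlas2_sketch(edges, iterations=300, k_r=1.0, k_g=1.0,
                       speed=0.01, max_step=0.1, seed=0):
    """Simplified ForceAtlas2-style layout with a fixed, capped step (no adaptive speed, no Barnes-Hut)."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    deg = {n: 0 for n in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair: k_r * (deg_a + 1) * (deg_b + 1) / d, so hubs push apart harder
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) + 1e-9
                f = k_r * (deg[a] + 1) * (deg[b] + 1) / d
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        # Attraction along each edge, proportional to the distance between its end points
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= dx
            force[a][1] -= dy
            force[b][0] += dx
            force[b][1] += dy
        # Gravity towards the origin, stronger for high-degree nodes: k_g * (deg + 1)
        for n in nodes:
            x, y = pos[n]
            r = math.hypot(x, y) + 1e-9
            force[n][0] -= k_g * (deg[n] + 1) * x / r
            force[n][1] -= k_g * (deg[n] + 1) * y / r
        # Move each node, capping the displacement to keep this fixed-speed sketch stable
        for n in nodes:
            fx, fy = force[n]
            magnitude = math.hypot(fx, fy)
            scale = speed if speed * magnitude <= max_step else max_step / magnitude
            pos[n][0] += scale * fx
            pos[n][1] += scale * fy
    return {n: tuple(p) for n, p in pos.items()}


# Hypothetical toy graph: two innovation-line hubs, four projects, one shared keyword.
EXAMPLE_EDGES = [
    ("line:A", "project:P1"), ("line:A", "project:P2"),
    ("line:B", "project:P3"), ("line:B", "project:P4"),
    ("project:P2", "kw:sensors"), ("project:P3", "kw:sensors"),
]
layout = forceatlas2_sketch(EXAMPLE_EDGES)
```

The degree-weighted repulsion is what makes ForceAtlas2 suit hub-and-spoke maps: well-connected hubs such as the innovation lines push each other apart, while their low-degree projects are pulled in around them. Extra attributes such as a Technology Readiness Level could be added as node properties without changing the layout code.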
Figuring out how to fully exploit large data sets will affect the speed of scientific progress, industrial productivity and many other aspects, and it will give a competitive advantage to the early adopters of such an innovative approach. We can therefore expect its ultimate purpose to be supporting humans in orienting research and technological development. Nevertheless, a lot of ground still has to be covered before data-mining-based tools are user-friendly and specific enough to allow in-depth forecasting of innovation trends, but we predict that their time will come!
Boyack, K.W., Klavans, R. (2013). Creation of a highly detailed, dynamic, global model and map of science. Journal of the Association for Information Science and Technology 65(4): 670-685. DOI 10.1002/asi.22990.
Thomson Reuters. The World in 2025: 10 Predictions of Innovation.