Fujitsu fuses Deep Tensor with knowledge graph to explain reason and basis behind AI-generated findings
Sep 21, 2017
Fujitsu Laboratories Ltd., Fujitsu Limited
Tokyo and Kawasaki, Japan, September 20, 2017
Fujitsu Limited and Fujitsu Laboratories Ltd. today announced that they have developed technology that shows the reason and academic basis for findings from AI that has been trained on large volumes of data. This is done by connecting the proprietary AI technology Deep Tensor(1), which performs machine learning on graph-structured data, with a graph-structured knowledge base called a knowledge graph(2), which brings together expert knowledge such as academic literature.
The increased prevalence of machine learning technologies such as deep learning, in which a machine finds characteristics of data on its own after being trained on large volumes of data, has led to issues with the application of this technology to mission-critical fields such as medicine and finance. This is because there are questions of accountability regarding experts' AI-based findings, as it is difficult for humans to evaluate the reason behind the findings gained using these technologies.
Fujitsu and Fujitsu Laboratories have now successfully developed technology that shows the reason and basis for an AI inference by connecting the results of Deep Tensor findings with knowledge stored in a knowledge graph.
With this technology, experts can judge whether the AI's findings merit trust by checking them against the expert knowledge, such as academic literature, that is presented as the reason and basis for those findings. The results can also be used as clues to gain new insights, creating a world where experts cooperate with AI to resolve problems.
This technology will be commercialized as part of Fujitsu Human Centric AI Zinrai, the company’s AI framework, in fiscal 2018.
The development of machine learning technology in recent years has been remarkable, producing results that surpass humans in certain areas. For example, deep learning modeled on the human biological neural network offers high performance in recognition and classification, but because even experts and the developers themselves cannot explain why the AI produced a certain result, it has been called a black-box AI. In light of this property, there are concerns that it will prevent application of this technology to mission-critical areas which require accountability with regard to conclusions from experts using AI, so there has been anticipation of the development of technology that could add an explanatory capability to black-box AI.
Fujitsu Laboratories developed Deep Tensor, which learns from graph-structured data capable of describing complicated phenomena and is based on deep learning, a kind of machine learning. It has achieved highly accurate inferences in fields such as security. In addition, Fujitsu Laboratories have developed natural language processing technology that extracts knowledge from text through text data analysis, as well as Linked Open Data (LOD)(3) technology, which creates a knowledgebase from data on the web, offering a free service called LOD4ALL(4).
Through systematization of these technologies, Fujitsu Laboratories have built a knowledge graph, which is a graph-structured knowledgebase with which a computer can handle the meaning of data and surrounding knowledge.
A major feature of black-box AI is its ability to automatically infer classification for unknown input data just by being trained on a large volume of data. At the same time, however, the inability to explain the reason behind inferences from the learning algorithm is a significant issue. In recent years, research has been conducted around the world to identify parts of the input data that have a significant impact on inferences, but it has only reached the level of being able to explain whether a particular part of the image influenced the recognition results in image recognition.
In addition, for experts to work with AI to resolve problems, they have had to check sources such as academic literature to see whether the AI's findings were correct. This is particularly true of phenomena whose relationships are only partially understood, which require experts to find the basis supporting these findings and link that information together to understand it.
About the Newly Developed Technology
Now, Fujitsu and Fujitsu Laboratories have developed technology to show the reason and basis for Deep Tensor findings by fusing Deep Tensor with a knowledge graph built from a variety of outside data (Figure 1). This technology identifies the factors (partial graphs) that had a significant influence on an inference and coordinates these with partial graphs from a knowledge graph, building a series of pieces of information in the form of connections in the knowledge graph as the basis for the findings.
Figure 1: Summary of the newly developed technology
This newly developed technology is composed of the following two components:
1. Inference factor identification
Deep Tensor enables highly accurate machine learning from graph-structured data, which was previously difficult to use in machine learning because the same data can be expressed in many ways. It does so by conducting training simultaneously on both a method that converts graph-structured data into a form of mathematical expression called a tensor(5) and a traditional deep learning method.
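The point that the same graph data can be expressed in many ways can be illustrated with a minimal Python sketch. It is not Fujitsu's implementation: the helper `adjacency` and the node names are hypothetical, and the adjacency matrix is used only as the simplest example of a two-dimensional tensor.

```python
def adjacency(nodes, edges):
    """Return the adjacency matrix (a 2-D tensor) of an undirected graph,
    with rows and columns following the given node ordering."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


# One graph (hypothetical nodes A, B, C), encoded under two node orderings.
edges = [("A", "B"), ("B", "C")]
first = adjacency(["A", "B", "C"], edges)
second = adjacency(["B", "A", "C"], edges)

# The two matrices differ element by element even though the graph is the same.
same_encoding = first == second

# An ordering-independent summary, such as the sorted node degrees, agrees.
same_degrees = sorted(map(sum, first)) == sorted(map(sum, second))
```

A learning method that consumed such matrices directly would see two different inputs for one graph, which is the difficulty that a learned graph-to-tensor conversion is meant to address.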
Fujitsu Laboratories have now developed technology to perform a reverse search of the output deep learning inference results for each piece of input data, and to identify factors that had a significant impact on the inference results in the form of a partial graph within the input data. This technology extracts each decisive element in the inference result based on the similarities between the tensors that were input into the deep learning system, and then reverses the conversion from tensor back to graph-structured data to identify the partial graph of the input data that corresponds to the extracted elements.
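The general idea of tracing an inference back to a partial graph of the input can be sketched in Python as follows. This sketch swaps in a simpler perturbation (occlusion) approach and is not the tensor-similarity reverse search described above; the function `edge_influence`, the stand-in model `toy_score`, its weights, and the node names are all hypothetical.

```python
def edge_influence(edges, score):
    """Rank edges by how much the model score drops when each is removed."""
    baseline = score(edges)
    influence = {}
    for edge in edges:
        remaining = [e for e in edges if e != edge]
        influence[edge] = baseline - score(remaining)
    # Largest drop first: these edges mattered most to the inference.
    return sorted(influence.items(), key=lambda item: item[1], reverse=True)


def toy_score(edges):
    """Hypothetical stand-in for a trained model: a weighted sum over edges."""
    weights = {("gene", "mutation"): 0.6, ("mutation", "protein"): 0.3}
    return sum(weights.get(edge, 0.05) for edge in edges)


graph = [("gene", "mutation"), ("mutation", "protein"), ("protein", "tissue")]
ranking = edge_influence(graph, toy_score)
top_edge = ranking[0][0]
```

The highest-ranked edges form the partial graph that would be presented as the factors behind the inference. A perturbation approach needs one extra model evaluation per edge, whereas the approach described in this release works back from the tensors that were input into the deep learning system.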
2. Basis formation
By associating the elements that had a significant impact on an inference with a knowledge graph, it is possible to identify information related to those elements. From those identified portions, it is possible to get related knowledge by tracing the graph structure. A knowledge graph, however, stores a variety of relationships between a wide range of information in graph form, so simply tracing the graph structure can associate the reason for an inference with unrelated information.
By using the factors in an inference as clues when searching through the graph structure, this technology extracts only the information that is highly related to the identified factors, forming the basis of the findings.
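A minimal Python sketch of a factor-guided search is shown below, assuming the knowledge graph is held as a plain adjacency dictionary. The function `basis_paths`, the hop limit, and every node name are hypothetical, and the filtering rule (keep only paths that pass through an identified factor) is a simplification chosen for illustration rather than the method used by Fujitsu.

```python
from collections import deque


def basis_paths(graph, start, factors, max_hops=3):
    """Breadth-first search from `start`, keeping only the paths that
    pass through at least one node in the `factors` set."""
    kept = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) > 1 and factors & set(path[1:]):
            kept.append(path)
        if len(path) - 1 < max_hops:
            for neighbor in graph.get(path[-1], []):
                if neighbor not in path:  # do not revisit nodes on this path
                    queue.append(path + [neighbor])
    return kept


# Hypothetical knowledge graph: one branch is related to the factor, one is not.
knowledge = {
    "variant": ["gene_x", "population_db"],
    "gene_x": ["paper_1"],
    "population_db": ["paper_2"],
    "paper_1": ["disease_y"],
}
paths = basis_paths(knowledge, "variant", {"gene_x"})
```

Plain traversal from `variant` would also reach `population_db` and `paper_2`. Restricting the result to paths that contain the factor `gene_x` drops that branch and leaves only the chain leading to `paper_1` and `disease_y`.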
This technology was deployed in a simulation to improve the efficiency of investigatory work for experts in genomic medicine(6), utilizing training data and a knowledge graph that made use of public databases and medical literature databases in the field of bioinformatics(7). It was then evaluated to validate that it was possible to find and link the basis supporting findings with regard to phenomena whose interrelationships are only partially understood (Figure 2).
First, AI was trained on the relationships, elicited from a public database, between genetic mutations and the causes of diseases. Then, academic papers and associated information were extracted for the factors that had a significant impact on the inference and their basis. Regarding the genetic mutations that were the subject of the inference (red), it was possible to get a simultaneous overview of the multiple factors that had a significant impact on the inference results (blue), the academic basis for the findings from sources such as medical papers obtained from the knowledge graph (yellow), and also a candidate illness (purple).
Figure 2: Application to data for genomic medicine
Going forward, with the cooperation of research institutions involved in medicine, Fujitsu and Fujitsu Laboratories will evaluate whether the academic bases shown by this technology are meaningful to experts, and whether they are sufficiently easy to understand. In addition, Fujitsu and Fujitsu Laboratories plan to apply this technology to other fields, such as finance, in order to confirm the validity of automatic loan evaluations using knowledge of rules and regulations.
Fujitsu and Fujitsu Laboratories plan to continue conducting proofs of concept and expanding the knowledge graph for various fields, with the goal of commercializing this technology as part of Zinrai, Fujitsu’s AI framework, in fiscal 2018.
A portion of the data used in evaluating the effectiveness of this technology consisted of results obtained from joint development with Kyoto University, as part of the Project to Create an Integrated Database for Clinical Genome Information under the Japan Agency for Medical Research and Development (AMED).
(1) Deep Tensor - Fujitsu Technology to Elicit New Insights from Graph Data that Expresses Ties between People and Things (press release, October 20, 2016) http://www.fujitsu.com/global/about/resources/news/press-releases/2016/1020-01.html
(2) Knowledge graph - A dataset that uses connections representing relationships between information collected from a variety of information sources.
(3) Linked Open Data (LOD) - A dataset published in the Linked Data format, and a type of knowledge graph. Currently, more than 900 major data publication sites exist, which in total have published more than 10,000 datasets. Linked Data is an online data publication format which makes machine processing easy without relying on any specific application, recommended by the World Wide Web Consortium (W3C), an organization promoting the standardization of all sorts of technologies and regulations relating to the web.
(4) LOD4ALL - A search service for the use of LOD made public by Fujitsu Laboratories since 2013.
(5) Tensor - Data representing multidimensional arrays, a generalization of the concepts of vectors and matrices.
(6) Genomic medicine - A treatment method in which cells are analyzed at the genetic level and appropriate medications are provided at the individual level, being used as part of movements such as precision medicine in the US.
(7) Training data and a knowledge graph that made use of public databases and medical literature databases in the field of bioinformatics - Training data was drawn from ClinVar, a database collecting pathogenic genetic mutations, while the knowledge graph was composed of sources such as PubMed, a repository of medical papers, and Gene Ontology, a genetic catalog. A portion of the data was from the results of joint development between Fujitsu and Kyoto University.