This is the ninth in a series of blogs on the “AI is here.” podcasts. Each blog in the series will highlight the insights from a specific industry leader as they describe how their organization is deriving significant value from AI today.
In this edition of the “AI is here.” podcast, Dan Faggella, Founder and CEO of market research and publishing company Emerj, speaks with Krishna Bulusu, Director, Early Computational Oncology at AstraZeneca. Dan and Krishna discuss how AI and knowledge graphs are transforming life sciences and accelerating the process of drug discovery.
According to Bulusu, knowledge graphs are new to the biomedical space. He describes them as collections of data points and the relationships between them, which can be used to make non-intuitive connections.
He makes an analogy with Netflix. When Netflix recommends a movie, it does so based on both your past viewing history and what people around the world with similar tastes are watching. Instead of recommending what to watch next, AstraZeneca is using the same approach to recommend the next new drug target and to identify the next new patient population.
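The recommendation analogy can be made concrete with a toy sketch. This is not AstraZeneca's method: the triples, node names, and the shared-neighbor scoring below are all assumptions chosen only to illustrate how a graph of entities and relationships can surface candidates that are not directly linked to a starting point.

```python
from collections import Counter

# Toy (subject, relation, object) triples. Every name here is hypothetical.
TRIPLES = [
    ("drug_A", "inhibits", "gene_1"),
    ("drug_B", "inhibits", "gene_1"),
    ("drug_B", "inhibits", "gene_2"),
    ("gene_1", "associated_with", "disease_X"),
    ("gene_2", "associated_with", "disease_X"),
    ("gene_2", "associated_with", "disease_Y"),
    ("gene_3", "associated_with", "disease_Y"),
]


def neighbors(node):
    """Return every node directly connected to `node`, ignoring edge direction."""
    out = set()
    for subject, _, obj in TRIPLES:
        if subject == node:
            out.add(obj)
        elif obj == node:
            out.add(subject)
    return out


def recommend(node):
    """Rank nodes two hops away by how many neighbors they share with `node`."""
    direct = neighbors(node)
    scores = Counter()
    for middle in direct:
        for candidate in neighbors(middle):
            # Skip the starting node and anything already directly linked to it.
            if candidate != node and candidate not in direct:
                scores[candidate] += 1
    return scores.most_common()


print(recommend("disease_X"))
```

In this toy graph, `drug_B` ranks first for `disease_X` because it connects through both associated genes, which loosely mirrors the "people who like what you like" logic of a movie recommender.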
Drug discovery starts with identifying what the next new drug target should be, then identifying the next new small molecule, which is in turn made into a drug. Next comes marketing the drug and delivering it to the patient.
Knowledge graphs fit in at every stage of that journey. At the beginning, which is the target discovery stage, there is a taxonomy that the graph needs to learn from. The taxonomy goes across biology, chemistry, pharmacology and disease ontology. This is a huge, complex volume of data, all coming into a single platform.
Bulusu uses the acronym FAIR, which stands for findable, accessible, interoperable, and reusable. The idea is that every piece of data they generate should be:
- Findable and accessible by other people
- Interoperable, with the right kind of cross references
- Reusable, meaning the data can be used for a question other than the one for which it was generated, where its inferences will still be valuable
This complex world of taxonomies needs to come together into one single platform that can help answer a range of questions across the drug discovery pipeline.
He then discusses the challenges of data curation. Bulusu says that there is a saying that 80% of a data scientist’s life is putting data into the right format and the other 20% is analysis. He says that it could not be more true with knowledge graphs.
Part of this is because what is “good enough” data curation for one person or problem may not be the same for another. That brings in a significant amount of bias in the way information is captured, let alone made interoperable.
He gives an example: chemistry follows one set of principles, similar to mathematics, in that things can only be named in a particular way. In biology, a gene can be described in five different ways and all five would be true. This becomes a problem when trying to define a true positive training set, unless every gene is named consistently or at least tied to one common identifier that can connect everything.
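A minimal sketch of the "one common identifier" idea follows. The small alias table is an illustrative assumption, not an authoritative nomenclature source, and the function names are hypothetical. The point is only that differently named records collapse to one key before they are used as training labels, and that unknown names are set aside rather than guessed.

```python
# Illustrative alias table: each known name maps to one canonical symbol.
# A real pipeline would draw this from a curated nomenclature resource.
ALIASES = {
    "TP53": "TP53",
    "P53": "TP53",
    "ERBB2": "ERBB2",
    "HER2": "ERBB2",
    "NEU": "ERBB2",
}


def canonical(name):
    """Return the canonical symbol for a gene name, or None if it is unknown."""
    return ALIASES.get(name.strip().upper())


def normalize(records):
    """Split (gene_name, label) records into resolved and unresolved lists."""
    resolved, unresolved = [], []
    for name, label in records:
        symbol = canonical(name)
        if symbol is None:
            unresolved.append((name, label))  # keep for manual curation
        else:
            resolved.append((symbol, label))
    return resolved, unresolved


raw = [("HER2", 1), ("erbb2", 1), ("p53", 0), ("mystery_gene", 1)]
resolved, unresolved = normalize(raw)
print(resolved)
print(unresolved)
```

Keeping the unresolved records separate, instead of dropping them silently, leaves room for the manual curation step the paragraph above describes.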
It is particularly challenging in life sciences as biology encompasses all of nature. Bulusu says the diseases themselves are getting more complex. He says that when he started learning cancer research, which is his particular area of specialization, there were 18 cancer types. Now there are actually about 200 known cancer types. So when trying to categorize patients based on cancer type, there can’t be just one level. There is a whole hierarchy that needs to come together and the knowledge graphs need to be robust enough to capture that hierarchy.
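To illustrate why a single level of cancer type is not enough, here is a small sketch of a hierarchy stored as child-to-parent links. The handful of terms and the patient records are assumptions for illustration only; a real disease ontology would be far larger and may allow multiple parents.

```python
# Hypothetical child -> parent edges for a tiny slice of a disease hierarchy.
PARENT = {
    "lung adenocarcinoma": "non-small cell lung cancer",
    "squamous cell lung carcinoma": "non-small cell lung cancer",
    "non-small cell lung cancer": "lung cancer",
    "small cell lung cancer": "lung cancer",
    "lung cancer": "cancer",
}


def lineage(term):
    """Return the term followed by all of its ancestors, most specific first."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def count_at_level(patients, ancestor):
    """Count patients whose diagnosis is `ancestor` or any of its descendants."""
    return sum(1 for diagnosis in patients.values() if ancestor in lineage(diagnosis))


# Hypothetical patient records keyed by patient ID.
patients = {
    "p1": "lung adenocarcinoma",
    "p2": "small cell lung cancer",
    "p3": "squamous cell lung carcinoma",
}

print(lineage("lung adenocarcinoma"))
print(count_at_level(patients, "non-small cell lung cancer"))
print(count_at_level(patients, "lung cancer"))
```

The same three patients form a group of two or a group of three depending on the level at which the question is asked, which is why the graph has to carry the whole hierarchy rather than a flat label.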
As a result of this complexity, there is a concept of “minimum information standards”, which relates to “good enough”. It helps answer two questions: what level of annotation and metadata is needed to start running an analysis, and what minimum standard must a piece of data meet before you can start using it? They have realized that there is not just one answer to this. It has to be context specific. There is no one knowledge graph. A knowledge graph that is clean enough and decipherable enough for a given question might not be enough for another question. What they have learned is to take it on a case-by-case basis and discover the information for a particular question that gives enough insight into the complex world of biology and pharmacology.
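One way to picture a context-specific minimum information standard is as a per-question checklist of required metadata fields. The sketch below is only an assumption about how such a check might look; the question names and field names are hypothetical and do not come from the podcast.

```python
# Hypothetical minimum metadata required for each kind of question.
MINIMUM_FIELDS = {
    "target_discovery": {"gene_id", "disease_id", "evidence_source"},
    "patient_stratification": {"patient_id", "cancer_type", "treatment", "response"},
}


def missing_fields(record, question):
    """Return the required fields that are absent or empty for this question."""
    required = MINIMUM_FIELDS[question]
    present = {key for key, value in record.items() if value not in (None, "")}
    return required - present


def is_usable(record, question):
    """A record is usable for a question when no required field is missing."""
    return not missing_fields(record, question)


record = {
    "gene_id": "gene_1",
    "disease_id": "disease_X",
    "evidence_source": "",  # present as a key but empty, so it counts as missing
}

print(missing_fields(record, "target_discovery"))
print(is_usable(record, "patient_stratification"))
```

The same record can be nearly ready for one question and unusable for another, which matches the point that "good enough" depends on the question being asked.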
He gives a real-world example. They are studying a treatment to which some patients do not respond. They need to identify who those patients are and discover why they did not respond. He says this is one of the most practical uses of AI within the space of biology.
When asked what business leaders can take away from this, he says that it is important to understand that there is no one single knowledge graph. Each knowledge graph has to be context specific. The accuracy of the AI model will depend on the amount of data that can be applied to a given question.
The value proposition lies in two places:
- Can we identify the same solution in a much faster way?
- Can we identify those hypotheses which as humans we would never otherwise identify?