Artificial intelligence to generate new cancer drugs on demand

Public Release: 22-Dec-2016


Towards the cornucopia of meaningful leads: applying deep adversarial autoencoders for new molecule development in oncology

InSilico Medicine, Inc.



IMAGE: This is the Architecture of the Adversarial Autoencoder (AAE).

Credit: Insilico Medicine


  • Clinical trial failure rates for small molecules in oncology exceed 94% for molecules previously tested in animals, and the cost of bringing a new drug to market exceeds $2.5 billion
  • There are around 2,000 drugs approved for therapeutic use by regulators, with very few providing complete cures
  • Advances in deep learning have demonstrated superhuman accuracy in many areas and are expected to transform industries where large amounts of training data are available
  • Generative Adversarial Networks (GANs), a new technology introduced in 2014, represent the “cutting edge” in artificial intelligence, in which new images, videos and voice can be produced on demand by deep neural networks
  • Here for the first time we demonstrate the application of Generative Adversarial Autoencoders (AAEs), a new type of GAN, for generation of molecular fingerprints of molecules that kill cancer cells at specific concentrations
  • This work is a proof of concept that opens the door to a cornucopia of meaningful molecular leads created according to given criteria
  • The study was published in Oncotarget and the open-access manuscript is available in the Advance Open Publications section
  • The authors speculate that in 2017 the conservative pharmaceutical industry will experience a transformation similar to that of the automotive industry, with deep-learned drug discovery pipelines integrated into many business processes
  • The extension of this work will be presented at the “4th Annual R&D Data Intelligence Leaders Forum” in Basel, Switzerland, Jan 24-26th, 2017

Thursday, 22nd of December, Baltimore, MD – Scientists at the Pharmaceutical Artificial Intelligence (pharma.AI) group of Insilico Medicine, Inc., today announced the publication of a seminal paper demonstrating the application of generative adversarial autoencoders (AAEs) to generating new molecular fingerprints on demand. The study was published in Oncotarget on 22nd of December, 2016, and represents a proof of concept for applying Generative Adversarial Networks (GANs) to drug discovery. The authors have significantly extended this model to generate new leads according to multiple requested characteristics, and plan to launch a comprehensive GAN-based drug discovery engine that produces promising treatments, with the aim of significantly accelerating pharmaceutical R&D and improving success rates in clinical trials.

Since 2010, deep learning systems have demonstrated unprecedented results in image, voice and text recognition, in many cases surpassing human accuracy and enabling autonomous driving, automated creation of pleasing art and even composition of music.

GANs are a fresh direction in deep learning, introduced by Ian Goodfellow and colleagues in 2014. In recent years, GANs have produced extraordinary results in generating meaningful images according to desired descriptions. Similar principles can be applied to drug discovery and biomarker development. This paper represents a proof of concept of an artificially intelligent drug discovery engine, in which AAEs are used to generate new molecular fingerprints with desired molecular properties.

“At Insilico Medicine we want to be the supplier of meaningful, high-value drug leads in many disease areas with high probability of passing the Phase I/II clinical trials. While this publication is a proof of concept and only generates the molecular fingerprints with the very basic molecular properties, internally we can now generate entire molecular structures according to a large number of parameters. These structures can be fed into our multi-modal drug discovery pipeline, which predicts therapeutic class, efficacy, side effects and many other parameters. Imagine an intelligent system, which one can instruct to produce a set of molecules with specified properties that kill certain cancer cells at a specified dose in a specific subset of the patient population, then predict the age-adjusted and specific biomarker-adjusted efficacy, predict the adverse effects and evaluate the probability of passing the human clinical trials. This is our big vision”, said Alex Zhavoronkov, PhD, CEO of Insilico Medicine, Inc.

Previously, Insilico Medicine demonstrated the predictive power of its discovery systems in the nutraceutical industry. In 2017 Life Extension will launch a range of natural products developed using Insilico Medicine’s discovery pipelines. Earlier this year the pharmaceutical artificial intelligence division of Insilico Medicine published several seminal proof of concept papers demonstrating the applications of deep learning to drug discovery, biomarker development and aging research. Recently the authors published a tool in Nature Communications, which is used for dimensionality reduction in transcriptomic data for training deep neural networks (DNNs). The paper published in Molecular Pharmaceutics demonstrating the applications of deep neural networks for predicting the therapeutic class of the molecule using the transcriptional response data received the American Chemical Society Editors’ Choice Award. Another paper demonstrating the ability to predict the chronological age of the patient using a simple blood test, published in Aging, became the second most popular paper in the journal’s history.

“Generative AAE is a radically new way to discover drugs according to the required parameters. At Pharma.AI we have a comprehensive drug discovery pipeline with reasonably accurate predictors of efficacy and adverse effects that work on the structural data and transcriptional response data and utilize the advanced signaling pathway activation analysis and deep learning. We use this pipeline to uncover the prospective uses of molecules, where these types of data are available. But the generative models allow us to generate completely new molecular structures that can be run through our pipelines and then tested in vitro and in vivo. And while it is too early to make ostentatious claims before our predictions are validated in vivo, it is clear that generative adversarial networks coupled with the more traditional deep learning tools and biomarkers are likely to transform the way drugs are discovered”, said Alex Aliper, president, European R&D at the Pharma.AI group of Insilico Medicine.

Recent advances in deep learning, and specifically in generative adversarial networks, have demonstrated surprising results in generating new images and videos upon request, even when using natural language as input. In this study the group developed a 7-layer AAE architecture with the latent middle layer serving as a discriminator. As input and output, the AAE uses a binary fingerprint vector together with the concentration of the molecule. In the latent layer the group introduced a neuron responsible for the tumor growth inhibition index; a negative value indicates a reduction in the number of tumor cells after treatment. To train the AAE, the authors used NCI-60 cell line assay data for 6252 compounds profiled on the MCF-7 cell line. The output of the AAE was used to screen 72 million compounds in PubChem and select candidate molecules with potential anti-cancer properties.
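The screening step described above can be illustrated with a small sketch. The release does not say how the generated fingerprints were compared with PubChem compounds, so the Tanimoto similarity used here is an assumption (it is a common choice for binary fingerprints), and every name and value in the example (`tanimoto`, `screen`, the toy library and its compounds) is hypothetical rather than taken from the study.

```python
from typing import Dict, FrozenSet, List, Tuple

# A binary fingerprint is stored as the set of bit positions that are 1.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as zero
    return len(a & b) / union


def screen(
    generated: Fingerprint,
    library: Dict[str, Fingerprint],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to a generated fingerprint."""
    scored = [(name, tanimoto(generated, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data standing in for a generated fingerprint and a
# compound library; real fingerprints have far more bits.
generated_fp = frozenset({1, 4, 7, 9})
toy_library = {
    "compound_A": frozenset({1, 4, 7, 9}),
    "compound_B": frozenset({1, 4, 8}),
    "compound_C": frozenset({2, 3, 5}),
}

hits = screen(generated_fp, toy_library, top_k=2)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

In this toy run the identical fingerprint ranks first and the compound with no shared bits is dropped. A screen over tens of millions of compounds would need a bit-packed representation and an indexed search rather than a full Python loop.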

“I am very happy to work alongside the Pharma.AI scientists at Insilico Medicine on getting the GANs to generate meaningful leads in cancer and, most importantly, age-related diseases and aging itself. This is humanity’s most pressing cause and everyone in machine learning and data science should be contributing. The pipelines these guys are developing will play a transformative role in the pharmaceutical industry and in extending human longevity and we will continue our collaboration and invite other scientists to follow this path”, said Artur Kadurin, the head of the segmentation group at Mail.Ru, one of the largest IT companies in Eastern Europe and the first author on the paper.


About Insilico Medicine, Inc

Insilico Medicine, Inc. is a bioinformatics company located at the Emerging Technology Centers at the Johns Hopkins University Eastern campus in Baltimore, with Research and Development (“R&D”) resources in Belgium, the UK and Russia, where it hires talent through hackathons and competitions. The company utilizes advances in genomics, big data analysis, and deep learning for in silico drug discovery and drug repurposing for aging and age-related diseases. The company pursues internal drug discovery programs in cancer, Parkinson’s Disease, Alzheimer’s Disease, sarcopenia, and geroprotector discovery. Through its Pharma.AI division, the company provides advanced machine learning services to biotechnology, pharmaceutical, and skin care companies.