By Alex Zhavoronkov
First published in 2016, predictors of chronological and biological age developed using deep learning (DL) are rapidly gaining popularity in the aging research community…
These deep aging clocks can be used in a broad range of applications in the pharmaceutical industry, spanning target identification, drug discovery, data economics, and synthetic patient data generation. We provide here a brief overview of recent advances in this important subset, or perhaps superset, of aging clocks that have been developed using artificial intelligence (AI).
Advances in AI
Recent advances in machine learning (ML) (Box 1), coupled with increases in computational power and the availability of large public datasets, have led to a renaissance in AI. These advances have generated substantial investment and hype, and many data scientists and companies are exploiting the hype for promotional purposes. This has sown confusion in the market and drawn criticism from scientists in the pharmaceutical industry, where the ultimate measures of success are clinical trial outcomes and regulatory approval.
Applied AI Algorithms
Machine learning (ML) refers to data analysis tools that can learn dependencies from data without being explicitly programmed to do so. This makes ML an attractive alternative to other approaches in areas where little or no prior knowledge of those dependencies is available, or where the dependencies are too complex to specify by hand.
Deep machine learning, or deep learning (DL), comprises a set of methods that rely on deep architectures with cascades of multiple layers, including deep neural networks (DNNs), generative adversarial networks (GANs), deep reinforcement learning, and others.
DNNs are models with multiple hidden layers between the input and output layers. Their multilayer structure, combined with non-linear activation functions, gives DNNs an exceptional ability to extract complex dependencies from data and to automatically select the features most relevant to a prediction. For age prediction, a network is trained to take biological data as input and predict age as accurately as possible.
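To make this concrete, here is a minimal sketch of a DNN age regressor in Python with NumPy. It is an illustration under stated assumptions and not the architecture of any published clock. The cohort is synthetic, with features that are made-up non-linear functions of age plus noise. The network has two ReLU hidden layers of arbitrarily chosen width and is trained with plain full-batch gradient descent on mean squared error.

```python
import numpy as np

def make_synthetic_cohort(rng, n_samples=1000, n_features=20):
    """Hypothetical cohort: each feature is a non-linear function of age plus noise."""
    age = rng.uniform(20, 90, size=n_samples)
    t = (age - 55.0) / 20.0                        # rescaled age
    c = rng.normal(size=(3, n_features))           # random age-response profiles
    X = np.outer(t, c[0]) + np.outer(t**2, c[1]) + np.outer(np.tanh(2 * t), c[2])
    X += 0.5 * rng.normal(size=X.shape)            # measurement noise
    return X, age

def train_mlp(X, y, rng, hidden=(32, 16), lr=0.01, epochs=2000):
    """Fit a ReLU multilayer perceptron to y by full-batch gradient descent on MSE."""
    x_mu, x_sd = X.mean(axis=0), X.std(axis=0)
    y_mu, y_sd = y.mean(), y.std()
    Xn = (X - x_mu) / x_sd                         # standardize inputs
    yn = ((y - y_mu) / y_sd)[:, None]              # standardize target
    sizes = [X.shape[1], *hidden, 1]
    # He initialization, suited to ReLU layers
    params = [(rng.normal(0, np.sqrt(2 / a), size=(a, b)), np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(h):
        """Return the activations of every layer, input first, output last."""
        acts = [h]
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            if i < len(params) - 1:
                h = np.maximum(h, 0.0)             # non-linear activation
            acts.append(h)
        return acts

    for _ in range(epochs):
        acts = forward(Xn)
        grad = 2 * (acts[-1] - yn) / len(Xn)       # d(MSE)/d(output)
        for i in reversed(range(len(params))):     # backpropagation
            W, b = params[i]
            gW, gb = acts[i].T @ grad, grad.sum(axis=0)
            if i > 0:
                grad = (grad @ W.T) * (acts[i] > 0)  # gradient through the ReLU
            params[i] = (W - lr * gW, b - lr * gb)

    return lambda X_new: forward((X_new - x_mu) / x_sd)[-1][:, 0] * y_sd + y_mu

rng = np.random.default_rng(0)
X, age = make_synthetic_cohort(rng)
predict_age = train_mlp(X[:800], age[:800], rng)
mae = np.mean(np.abs(predict_age(X[800:]) - age[800:]))
print(f"held-out MAE: {mae:.1f} years")
```

Held-out mean absolute error (MAE) is the usual yardstick for such clocks. On this toy data it should fall well below the error of always predicting the training-set mean age. A real clock would typically use mini-batches, an adaptive optimizer, and early stopping on a validation set.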
GANs are a type of DL model that comprises a generator network and a discriminator network. The generator produces candidate vectors of synthetic data, and the discriminator checks whether each vector looks like real data; the two networks are trained against each other. Such data generation has been explored extensively for designing new pharmacological agents, and can also be used to generate synthetic patient data.
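For reference, the adversarial training described above is usually written as the two-player objective from the original GAN formulation. Here G maps a random noise vector z to a synthetic sample, and D(x) is the discriminator's estimated probability that x is real:

```latex
\min_{G}\max_{D}\; V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

D is updated to increase V, which means separating real from synthetic vectors. G is updated to decrease it, which means producing vectors that D accepts as real. In practice G is often trained with the non-saturating variant, which maximizes log D(G(z)) instead.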
Reinforcement learning (RL) refers to goal-oriented algorithms that learn to attain a complex objective over many steps. In the case of drug discovery, such an objective could include the drug-likeness of the generated molecules, their ease of synthesis, and other desired properties. RL algorithms can also be deep, with a multilayered architecture.
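The sketch below is a hedged illustration of how several drug-discovery goals can be folded into one RL objective. It combines hypothetical, precomputed property scores into a scalar reward. It then computes discounted returns for an episode in which a molecule is built step by step and only the finished molecule is scored. The property names, weights, and drug-likeness cut-off are assumptions for illustration; no particular RL library or scoring tool is implied.

```python
from dataclasses import dataclass

@dataclass
class MoleculeScores:
    """Hypothetical precomputed properties of a finished molecule, each scaled to [0, 1]."""
    drug_likeness: float      # higher is more drug-like (e.g. a QED-style score)
    synthesizability: float   # higher is easier to synthesize
    novelty: float            # higher is less similar to known compounds

# Assumed trade-off between objectives (illustrative values only)
WEIGHTS = {"drug_likeness": 0.5, "synthesizability": 0.3, "novelty": 0.2}

def composite_reward(scores, weights=WEIGHTS, min_drug_likeness=0.3):
    """Weighted sum of objectives, with a fixed penalty below a drug-likeness floor."""
    if scores.drug_likeness < min_drug_likeness:
        return -1.0
    return sum(w * getattr(scores, name) for name, w in weights.items())

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

# A molecule assembled in four steps; only the finished molecule is scored.
final = composite_reward(MoleculeScores(drug_likeness=0.8, synthesizability=0.5, novelty=0.5))
print(discounted_returns([0.0, 0.0, 0.0, final]))
```

The reward arrives only at the last step. Discounting propagates it back to the earlier construction steps, which lets an agent learn which early choices tend to lead to well-scored molecules.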
Most of the credible advances in the field have been in DL and RL (Box 1). Since 2013, DL systems have surpassed human performance in multiple applications, including strategy games as well as image and text recognition. In healthcare, DL systems have outperformed human dermatologists, ophthalmologists, and radiologists in specific diagnostic tasks. DL has also shown significant improvements over conventional ML methods in biomedical data analysis.
Revolution in Biomarkers of Aging
During the same period of DL progress, aging research has also experienced a renaissance, and new breakthroughs are rapidly emerging. Multiple data types, including methylation, gene expression, microbiome, and imaging data, can be used to predict age and to associate the prediction with mortality, disease, general wellbeing, or other biological processes. Since Steven Horvath published the first multitissue methylation aging clock in 2013, multiple methylation aging clocks have been developed and applied in humans and mice. Even though these clocks were built with traditional ML approaches – notably regularized linear regression trained on a limited number of samples – the results suggest that the gradual changes that occur during aging can be tracked with reasonable accuracy using various data types.
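The regularized linear regression behind these early clocks can be sketched in a few lines. The example below is a minimal illustration on synthetic methylation-like beta values, assuming 500 features of which only 20 drift with age. It uses a ridge (L2) penalty for brevity. Horvath's clock used an elastic net, which adds an L1 term that also selects a sparse set of CpG sites.

```python
import numpy as np

def fit_ridge_clock(X, age, alpha=1.0):
    """Fit age ~ X @ w + b with an L2 (ridge) penalty on w, in closed form.

    The penalty keeps the problem well posed even when there are more
    features (e.g. CpG sites) than samples.
    """
    x_mu, y_mu = X.mean(axis=0), age.mean()
    Xc = X - x_mu                                  # center features
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (age - y_mu))
    b = y_mu - x_mu @ w                            # intercept on the original scale
    return lambda X_new: X_new @ w + b

# Synthetic methylation-like data (hypothetical): beta values in [0, 1],
# of which only the first 20 sites drift linearly with age.
rng = np.random.default_rng(1)
n, p, k = 200, 500, 20
age = rng.uniform(20, 80, size=n)
slopes = np.zeros(p)
slopes[:k] = rng.choice([-1.0, 1.0], size=k) * 0.004   # change in beta per year
baseline = rng.uniform(0.2, 0.8, size=p)
X = np.clip(baseline + np.outer(age - 50, slopes) + 0.02 * rng.normal(size=(n, p)), 0, 1)

clock = fit_ridge_clock(X[:120], age[:120], alpha=0.1)
mae = np.mean(np.abs(clock(X[120:]) - age[120:]))
print(f"held-out MAE: {mae:.1f} years")
```

Here the penalty is what makes the fit possible at all. With 500 features and only 120 training samples, ordinary least squares has infinitely many exact solutions, and alpha selects a stable one.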
Age as a Universal Feature of Every Organism
In 2015, advances in DL and aging biomarker development began to converge. AI researchers at Insilico Medicine recognized that DL requires very large datasets, and that the feature most commonly shared across varied and otherwise incompatible biological datasets is age. Age is a universal feature of every living organism and object on the planet. It is also the most biologically relevant feature, because it is most strongly correlated with mortality, a broad range of diseases, and remaining quality-adjusted life years (QALYs). Although any individual feature may correlate only weakly with age, a combination of many features can be highly predictive. A human, for example, can guess another person's age with reasonable accuracy from low- or high-resolution images, movement patterns, or even scent and touch.
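The claim that many weakly informative features can be jointly predictive is easy to check numerically. This toy sketch relies on simplifying assumptions: synthetic data, equal weights, and independent noise. Each of 200 markers equals age plus large independent noise, so any single marker correlates only modestly with age (about 0.4 in expectation). Their average correlates strongly (close to 0.99 in expectation), because averaging k independent noise terms divides the noise variance by k.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_markers = 1000, 200
age = rng.uniform(20, 80, size=n_people)           # age SD is about 17 years

# Hypothetical markers: each tracks age but carries independent noise with SD 40.
markers = age[:, None] + rng.normal(0, 40, size=(n_people, n_markers))

single_r = np.corrcoef(markers[:, 0], age)[0, 1]
combined_r = np.corrcoef(markers.mean(axis=1), age)[0, 1]
print(f"one marker: r = {single_r:.2f}; mean of {n_markers} markers: r = {combined_r:.2f}")
```

Real biomarkers are correlated with one another and relate to age non-linearly. That is why learned weightings, such as regression models or DNNs, are used instead of a plain average.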
Cross-Species Aging Research
Surprisingly, many of these patterns are similar in other species and can be used for cross-species analysis, where transfer learning techniques can be very helpful. For example, a person shown a macaque for the first time is often able to classify it correctly into one of three age brackets: young, middle-aged, or old. In ML, making predictions in a new domain without any labeled examples from it is called ‘zero-shot learning’. After seeing 100 macaques of varying ages, the same person can achieve much better accuracy. Learning from a small number of labeled examples in this way is called ‘few-shot learning’, or ‘one-shot learning’ when only a single example is available. The same techniques may be applicable to learning biological processes. A model can first be trained with age as the target and then retrained on diseases for which few datasets are available, or used for cross-species comparisons. There are many challenges in classifying aging as a disease within the traditional biomedical paradigm. However, treating aging as a process with more than 100 stages when developing deep age predictors helps to capture a broad set of biological processes holistically. Although studies of the classical methylation aging clocks did not uncover many similarities between mice and humans, applying AI to multiple data types may help cross-species research. A crowdfunded and crowdsourced project called MouseAge attempts to develop a photographic biomarker of aging in mice, with the aim of applying transfer learning to other animals and possibly to humans. The effectiveness of this approach remains to be seen. Nevertheless, many features common to rodents and even humans can clearly be observed with the naked eye, and DL may help to uncover these similarities.
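Below is a deliberately simple stand-in for this transfer-learning idea. It is not MouseAge's method, and it uses a linear model instead of a DNN. Weights are first fitted on abundant data from a source species. They are then refitted on only ten target-species samples while being pulled toward the pretrained weights, a penalty sometimes called L2-SP. The species, the data, and the assumption that the two species share most of their age signature are all hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha, w_prior=None):
    """Minimize ||y - X w||^2 + alpha * ||w - w_prior||^2 (no intercept, for brevity).

    With w_prior=None this is ordinary ridge regression (shrinkage toward zero).
    """
    p = X.shape[1]
    w0 = np.zeros(p) if w_prior is None else w_prior
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y + alpha * w0)

rng = np.random.default_rng(3)
p = 50
w_source = rng.normal(size=p)                      # hypothetical source-species age signature
w_target = w_source + 0.1 * rng.normal(size=p)     # target species: mostly shared, slightly shifted

def simulate(w, n):
    """Features and a standardized age score generated from signature w."""
    X = rng.normal(size=(n, p))
    return X, X @ w + 0.5 * rng.normal(size=n)

X_src, y_src = simulate(w_source, 5000)            # abundant source-species data
X_tgt, y_tgt = simulate(w_target, 10)              # only 10 labeled target animals
X_test, y_test = simulate(w_target, 1000)

w_pre = fit_ridge(X_src, y_src, alpha=1.0)                       # pretrain
w_scratch = fit_ridge(X_tgt, y_tgt, alpha=1.0)                   # target data only
w_transfer = fit_ridge(X_tgt, y_tgt, alpha=10.0, w_prior=w_pre)  # fine-tune toward prior

mae = {name: np.mean(np.abs(X_test @ w - y_test))
       for name, w in [("scratch", w_scratch), ("transfer", w_transfer)]}
print(mae)
```

With ten samples and 50 features, fitting from scratch cannot recover the target weights, whereas the pretrained prior supplies most of them. On this toy data the transferred model's error should therefore be far lower. How much transfer helps in practice depends entirely on how much the two species actually share.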
Deep Aging Clocks
Deep Blood Biochemistry and Cell-Count Aging Clocks
The realization that changes during aging can be tracked led to a search for a biologically relevant data type with abundant historical datasets and a small number of highly variable but standardized features that can be easily anonymized. In 2016, the laboratory of Zhavoronkov published the first aging clock study to use deep neural networks (DNNs) (Box 1). The study was based on one of the broadest panels of routine blood tests, performed in a standardized way in multiple countries. From over one million clinical blood tests (blood biochemistry and cell count) obtained in routine screening, the researchers assembled a dataset of over 60 000 reasonably healthy subjects annotated with sex and age. This proof-of-concept study also showed how to evaluate the relative contribution of each individual feature to the predictor's accuracy. The abundance of blood biochemistry data allowed various ML models to be compared, and the DNNs clearly outperformed the other models in every test.
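The study's exact feature-attribution procedure is not described here. Permutation importance, sketched below, is one common, model-agnostic way to measure a feature's contribution to a trained age predictor. The marker names, effect sizes, and the linear stand-in predictor are hypothetical. In practice `predict` would wrap the trained DNN, and `X` would be a held-out set of blood-test results.

```python
import numpy as np

def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Average increase in mean absolute error when each feature column is shuffled.

    Shuffling a column breaks its link to age while keeping its distribution,
    so a large increase marks a feature the predictor relies on.
    """
    base_mae = np.mean(np.abs(predict(X) - y))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean(np.abs(predict(X_perm) - y)) - base_mae
    return scores / n_repeats

# Hypothetical standardized blood markers with made-up effects on age.
rng = np.random.default_rng(4)
names = [f"marker_{k}" for k in range(1, 6)]
true_w = np.array([-6.0, 4.0, 3.0, -2.0, 0.0])
X = rng.normal(size=(2000, len(names)))
age = 50 + X @ true_w + rng.normal(0, 3, size=2000)

def predict(X_new):
    """Stand-in for a trained age predictor (here, the true linear relationship)."""
    return 50 + X_new @ true_w

importance = permutation_importance(predict, X, age, rng)
ranking = [names[j] for j in np.argsort(importance)[::-1]]
print(ranking)
```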
The deep hematological aging clock study was later extended to several million subject records. The aim was to evaluate the population specificity and biological relevance of these clocks in multiple populations, as well as the association of predicted age with mortality. In this study, three DNNs were trained on anonymized Korean, Canadian, and Eastern European blood test samples annotated with age. Korean and Eastern European data were then tested with the DNN trained on Canadian data. Koreans on average appeared younger than their chronological age, whereas Eastern Europeans appeared significantly older, revealing population-specific differences. In addition, testing on an independent dataset showed that people predicted to be older than their chronological age had higher mortality rates than those whose predicted age was in line with their chronological age. This supports the biological relevance of the clock and suggests its clinical relevance.
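The two analyses in this paragraph reduce to simple bookkeeping on a clock's output. The first is the "age gap" (predicted minus chronological age) averaged over a population. The second compares mortality rates between people predicted to be older and people whose predicted age is in line with their chronological age. The sketch below assumes a 5-year tolerance to define "in line", and the records are made up for illustration.

```python
from statistics import mean

def age_gaps(predicted, chronological):
    """Predicted minus chronological age; positive values mean 'looks older'."""
    return [p - c for p, c in zip(predicted, chronological)]

def mortality_by_gap(gaps, died, tolerance=5.0):
    """Mortality rate among people predicted > tolerance years older, and among
    people whose predicted age is within +/- tolerance years of their actual age.

    Raises statistics.StatisticsError if either group is empty.
    """
    older = [d for g, d in zip(gaps, died) if g > tolerance]
    in_line = [d for g, d in zip(gaps, died) if abs(g) <= tolerance]
    return mean(older), mean(in_line)

# Made-up records for one population, scored by a clock trained on another.
chronological = [45, 50, 62, 70, 38, 55, 66, 48]
predicted = [52, 49, 70, 71, 37, 63, 65, 47]
died = [1, 0, 0, 0, 0, 1, 1, 0]   # 1 = died during follow-up

gaps = age_gaps(predicted, chronological)
print("mean age gap:", mean(gaps))
print("mortality (older, in line):", mortality_by_gap(gaps, died))
```

Here the mean age gap is 2.5 years. Two of the three people predicted more than five years older died during follow-up, compared with one of the five whose predicted age was within five years of their actual age. A real analysis would use far larger cohorts and a survival model that accounts for follow-up time and chronological age.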
Deep Imaging Aging Clocks
Photographic imaging, a highly accessible and prevalent data type in AI applications, has been explored by the research team at Haut.AI, a company specializing in digital skin analysis. Their deep photographic aging clock, using only images of the corners of the eye, can predict an individual's age with a mean absolute error of 1.9 years. Although photographic data are not the most biologically relevant, many genetic and phenotypic disorders can be diagnosed from a picture. For many applications, images can be more valuable than genomic data, and they are more valuable still in combination with other data types. Photographs are also among the most abundant data types, and results can be validated and interpreted instantly by human experts, making images ideal for proof-of-concept experiments.
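For reference, the mean absolute error quoted above is the average absolute difference between predicted ages (ŷ_i) and chronological ages (y_i) over n individuals:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|
```

An MAE of 1.9 years therefore means that predictions miss chronological age by 1.9 years on average; individual errors can be larger.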
Deep Transcriptomic Aging Clocks
Transcriptomic data are among the most abundant but also the most variable types of data. Since 2000, the evolution of microarray and RNA sequencing technology has produced millions of gene expression profiles from multiple tissues, with varying numbers of genes measured using different equipment in diverse experimental settings. Despite this variability, transcriptomic data are among the most valuable data types because they enable identification of the genes most implicated in specific diseases, such as cancer. In 2018, the first transcriptomic aging clock was published; it was developed with DL and other ML techniques using gene expression data from muscle tissue. The work also presented several ideas for prioritizing specific genes as possible targets for pharmaceutical intervention in sarcopenia and other muscle-wasting diseases.
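The sketch below shows one routine way to cope with the platform variability described above. It is a general technique, not the 2018 study's pipeline. Only the genes measured in every dataset are kept, and each gene is log-transformed and standardized within its own dataset before pooling. The gene names and simulated expression values are hypothetical.

```python
import numpy as np

def harmonize(datasets):
    """Align expression matrices measured on different platforms.

    datasets: list of (gene_names, matrix) pairs, each matrix shaped
    (samples, genes) with non-negative expression values. Returns the shared
    gene list and one stacked matrix in which every gene is log2-transformed
    and z-scored within its source dataset.
    """
    shared = sorted(set.intersection(*(set(genes) for genes, _ in datasets)))
    blocks = []
    for genes, mat in datasets:
        col = {g: i for i, g in enumerate(genes)}
        sub = np.log2(mat[:, [col[g] for g in shared]] + 1.0)
        sd = sub.std(axis=0)
        sd[sd == 0] = 1.0                      # leave constant genes unscaled
        blocks.append((sub - sub.mean(axis=0)) / sd)
    return shared, np.vstack(blocks)

rng = np.random.default_rng(5)
microarray = (["GENE_A", "GENE_B", "GENE_C", "GENE_D"], rng.gamma(2.0, 50.0, size=(6, 4)))
rnaseq = (["GENE_B", "GENE_C", "GENE_D", "GENE_E"], rng.gamma(2.0, 500.0, size=(4, 4)))
genes, X = harmonize([microarray, rnaseq])
print(genes, X.shape)
```

Standardizing within each dataset removes study-specific offsets and scales. However, it also removes genuine differences between studies, for instance when one study enrolled only older donors, so the choice should match how the data were collected.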
Other Data Types
Wearable and mobile devices provide a vast amount of biologically relevant data. In 2018, age-associated changes in physical activity were studied for the first time in the context of age prediction using neural networks. A DL-based model trained on activity-monitor data predicted age with relatively high accuracy, but its predictions were less strongly associated with mortality than those of a less accurate age-prediction model. To address this, the authors proposed a DL mortality predictor as a tool for identifying various health risks.
Generation of Synthetic Data as a Tool for Target Identification in Aging
In addition to expanding the scope of aging clocks, neural networks can be used to generate synthetic data in large volumes. Generative adversarial networks (GANs) (Box 1) were introduced by Ian Goodfellow and colleagues in 2014 and are now commonly used in drug discovery. They enable the generation of biologically relevant synthetic data under specified conditions. Synthesizing new patient data with GANs trained on millions of samples, using age as the only generation condition, allows data to be anonymized at scale while preserving the most biologically relevant features. It may also help to identify potential targets that drive aging and disease-related processes.
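Conditioning a generator on age typically means feeding the condition to the network alongside the random noise, as in the conditional GAN formulation. The sketch below shows only that mechanism. The age brackets, layer sizes, and weights are hypothetical. The weights are random rather than trained, so the "synthetic records" it produces carry no biological meaning.

```python
import numpy as np

AGE_BRACKETS = [(20, 40), (40, 60), (60, 80), (80, 100)]   # hypothetical conditioning bins

def one_hot_age(age):
    """Encode an age as a one-hot vector over AGE_BRACKETS (the generation condition)."""
    vec = np.zeros(len(AGE_BRACKETS))
    for i, (lo, hi) in enumerate(AGE_BRACKETS):
        if lo <= age < hi:
            vec[i] = 1.0
            return vec
    raise ValueError(f"age {age} is outside the supported brackets")

class ConditionalGenerator:
    """Untrained two-layer generator G(z | age): noise + condition -> synthetic record.

    In a real conditional GAN these weights would be learned adversarially
    against a discriminator that also sees the condition.
    """

    def __init__(self, noise_dim, n_features, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        in_dim = noise_dim + len(AGE_BRACKETS)
        self.noise_dim = noise_dim
        self.W1 = self.rng.normal(0, np.sqrt(2 / in_dim), size=(in_dim, hidden))
        self.W2 = self.rng.normal(0, np.sqrt(1 / hidden), size=(hidden, n_features))

    def sample(self, age, n):
        z = self.rng.normal(size=(n, self.noise_dim))           # random noise
        cond = np.tile(one_hot_age(age), (n, 1))                # same condition on every row
        h = np.maximum(np.hstack([z, cond]) @ self.W1, 0.0)     # ReLU hidden layer
        return h @ self.W2                                      # synthetic feature vectors

gen = ConditionalGenerator(noise_dim=8, n_features=12)
synthetic_70_year_olds = gen.sample(age=70, n=5)
print(synthetic_70_year_olds.shape)
```

During training, the discriminator would receive the same age encoding alongside each real or synthetic record. The generator is therefore rewarded only for records that are plausible for the requested age.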
Use of Deep Aging Clocks
The intersection of recent advances in AI and aging research yields many new tools and applications for the pharmaceutical industry to exploit – at every step of the R&D process as well as in personalization, marketing, and real-world evidence. Over a dozen of these possible applications are summarized in Figure 1. We highlight a few of these below.
Moreover, aging research is a broad multidisciplinary field that converges with many other scientific disciplines focused on age-related diseases. Many interventions in immuno-oncology depend on the state of the patient’s immune system and general health. Aging clocks may be used to track levels of immunosenescence and to identify new interventions designed to boost the immune system in the elderly. For vaccine companies looking for near-term revenue gains from AI, aging clocks can provide a way to track response rates. Suppose a meta-analysis of clinical trials demonstrates that patients predicted to be older than their chronological age respond better to an alternative dosage or vaccination protocol. In that case, the additional doses needed could be sold.
Digital Twin for a Patient
Multimodal aging clocks blur the distinction between aging and disease status, effectively turning aging clocks into markers of an individual's health status. Because all living beings change over time, multimodal aging clocks and clock ensembles trained on all accessible data types can act as a digital twin of a patient. This digital twin can be moved forward and backward in time using GANs with multiple defined generation conditions, including lifestyle choices and interventions. These clocks may also be embedded into field-trainable mobile devices that continue to learn from the individual and help to maintain an optimal biological age.
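The sketch below shows one simple way a clock ensemble could merge per-modality estimates into a single biological age: inverse-variance weighting. Each clock's prediction is weighted by one over its squared validation MAE, which serves as a rough stand-in for its error variance. This presumes independent, unbiased errors across modalities, and the modalities and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class ClockPrediction:
    modality: str   # e.g. "blood", "photo" (illustrative labels)
    age: float      # predicted age from that modality's clock, in years
    mae: float      # that clock's validation mean absolute error, in years

def ensemble_age(predictions):
    """Combine per-modality clock outputs, weighting each by 1 / mae**2.

    Treats validation MAE as a proxy for each clock's error spread, so more
    accurate clocks count for more. Assumes the clocks' errors are independent.
    """
    weights = [1.0 / p.mae ** 2 for p in predictions]
    return sum(w * p.age for w, p in zip(weights, predictions)) / sum(weights)

preds = [ClockPrediction("blood", 52.0, 5.0), ClockPrediction("photo", 48.0, 2.5)]
print(ensemble_age(preds))
```

In this example the photographic clock has half the error of the blood clock, so it receives four times the weight and pulls the combined estimate to 48.8 years.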
In this article we highlight the convergence of AI with aging research and review some of the deep aging clocks that have been developed in the recent past. We also lay out the potential uses of these clocks in the pharmaceutical industry. In the coming years we expect the convergence of AI and aging research to accelerate, given the emergence of longevity biotechnology as a standalone industry. Many players are also entering the field, from universities and non-profits to large corporations, investment funds, and startups. The famed Silicon Valley incubation program Y Combinator recently announced a call to provide seed funding for extending longevity, and several data-driven longevity startups are likely to emerge in 2019 and beyond as a result. Further, large longevity holding companies, including Juvenescence.AI, Longevity Vision Fund, and Life Biosciences, are also growing AI-powered longevity companies. Their aim is to find longevity interventions and complementary biomarkers that can be used to evaluate the effectiveness of such therapies in a clinical setting.