A defined human aging phenome

aging phenotype


#1 Engadin


Posted 17 August 2019 - 08:22 PM








Aging is among the most complex phenotypes that occur in humans. Identifying the interplay between different age-associated features is undoubtedly critical to our understanding of aging and thus age-associated diseases. Nevertheless, what constitutes human aging is not well characterized. Towards this end, we mined millions of PubMed abstracts for age-associated terms, enabling us to generate a detailed description of the human aging phenotype. We discovered age-associated features in clusters that can be broadly associated with previously defined hallmarks of aging, consequently identifying areas where interventions could be pursued. Importantly, we validated the newly discovered features by manually verifying the prevalence of these features in combined cohorts describing 76 million individuals, allowing us to stratify features in aging that appear to be the most prominent. In conclusion, we propose a comprehensive landscape of human aging: the human aging phenome.





Aging represents the largest risk factor for chronic diseases and a significant and growing socioeconomic challenge for most societies worldwide. Nevertheless, what constitutes the human phenotype of aging is not well characterized, likely due to the highly complex and heterogeneous nature of human aging. Indeed, aging is probably caused by the stochastic failure of a myriad of different biological processes leading to increased susceptibility to disease and death [1].


Due to the role of aging in numerous diseases, interventions leading to healthy aging are being heavily investigated. Clinical trials of aging interventions are challenging because they may require long trial times and/or large cohorts. The generation of biomarkers that may predict the age and health of an individual has therefore received significant interest. Importantly, several recent breakthroughs have allowed us to discover complex biomarkers, or aging clocks, which are able to predict the age and risk of death and/or age-associated disease of individuals [2–6]. Nevertheless, it is unclear how these biomarkers predict the multitude of phenotypes associated with aging. To this end, having a well-defined phenotypical description of human aging and an understanding of how different aging phenotypes associate with each other will enable us to better understand aging, design trials and discover drugs targeting the aging process.


Herein, we used a previously incomplete list of phenotypes associated with human aging to mine millions of PubMed articles for co-occurring phenotypes, allowing us to better define what we term the human aging phenome. We used this computationally unbiased approach to generate a list of approximately a thousand terms and then manually curated this list to extract features associated with aging. We then validated these features manually against the description of more than 75 million individuals from published studies. Notably, these parameters cover all tissues in the human body and illustrate the heterogeneity of the human aging phenotype. Collectively, our results allow us to propose a description of what human aging is.





Identification of abstracts describing human aging


As a starting point for defining human aging we used 44 clinical terms that we had previously used to describe human aging [7–9]. To increase our ability to capture semantically similar age-associated terms we extracted synonyms and spelling analogues for each of these 44 clinical terms from the SNOMED CT terminology, which contains a comprehensive and validated collection of terms describing clinical features (Table S1, Figure S1A) [10]. In all subsequent analyses using the 44 clinical terms we also included their synonyms and spelling analogues. To quantitatively test whether the terms in the list are associated with human aging, we measured their enrichment in aging-related abstracts when compared to all PubMed abstracts. To that end, we mined 17,730,690 PubMed abstracts for occurrences of the 44 clinical terms and investigated whether they co-occur with the word ‘aging’. In addition to ‘aging’ we included other ‘aging keywords’ with similar semantic meaning, e.g., ‘elderly’, ‘old age’, ‘retirement’ (Figure 1A and Table S2, Figure S1B). Indeed, the 44 terms were enriched 3.1-fold (mean, p-value < 2e-16, chi-squared test) in abstracts that also contained aging keywords, suggesting that this list could be used as bait for finding other terms describing aging (Figure 1B, 1C and Figure S2).
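The enrichment test described above can be illustrated with a small sketch. This is not the authors' code: the function name `enrichment`, its parameters and the toy counts are assumptions made for illustration. It assumes fold enrichment is the term's rate among aging-keyword abstracts divided by its rate across the whole corpus, and uses a plain Pearson chi-squared test on a 2x2 table without continuity correction.

```python
import math

def enrichment(term_and_aging, term_total, aging_total, corpus_total):
    """Fold enrichment and chi-squared test (1 degree of freedom) for one term.

    All four arguments are abstract counts (hypothetical inputs):
    term_and_aging -- abstracts with the term AND an aging keyword
    term_total     -- abstracts with the term
    aging_total    -- abstracts with an aging keyword
    corpus_total   -- all abstracts
    """
    # Build the 2x2 contingency table.
    a = term_and_aging                  # term, aging keyword
    b = term_total - term_and_aging     # term, no aging keyword
    c = aging_total - term_and_aging    # no term, aging keyword
    d = corpus_total - a - b - c        # neither

    # Rate of the term in aging abstracts relative to the whole corpus.
    fold = (a / aging_total) / (term_total / corpus_total)

    # Pearson chi-squared statistic for a 2x2 table.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return fold, chi2, p_value

# Toy counts, not taken from the study.
fold, chi2, p_value = enrichment(20, 50, 100, 1000)
print(f"fold={fold:.2f} chi2={chi2:.2f} p={p_value:.2e}")
```

The study reports a mean enrichment over the 44 terms; in this sketch that would correspond to averaging `fold` over one call per term.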






To qualitatively test the algorithm’s ability to find new terms, we selected 100 random abstracts and manually picked out terms of interest to determine if the text-mining algorithm would be able to capture them. We then calculated the F-measure (F1 score) based on the precision and recall of the algorithm [11]. This score is determined by comparing the manual selection with the output of the automated algorithm and counting how many terms are captured and how many are missed. The algorithm achieved an F1 score of 0.898, suggesting that our text-mining algorithm captures the majority of terms, allowing us to interrogate the aging phenotype.
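For readers unfamiliar with the F1 score, here is a minimal sketch of how it follows from precision and recall when a manual term selection is compared with an algorithm's output. The function name `f1_score` and the example terms are hypothetical and not taken from the study.

```python
def f1_score(manual_terms, algorithm_terms):
    """Return (precision, recall, F1) for one manual-vs-algorithm comparison."""
    manual = set(manual_terms)
    found = set(algorithm_terms)
    true_positives = len(manual & found)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(found)   # share of found terms that are correct
    recall = true_positives / len(manual)     # share of manual terms that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical example: three of four manual terms are recovered,
# and one of the four terms the algorithm returns is spurious.
manual = ["frailty", "cataract", "osteoporosis", "dementia"]
mined = ["frailty", "cataract", "dementia", "patient"]
print(f1_score(manual, mined))
```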



Mining for potential aging-associated phenotype terms


We next identified 3,198,218 PubMed abstracts containing one or more of the 44 age-associated clinical terms and 431,949 abstracts containing two or more of the 44 age-associated clinical terms. We speculated that abstracts containing two or more age-associated clinical terms are more accurately associated with aging compared to abstracts containing just one term. For example, if we search for abstracts containing the single term ‘cancer’ we would possibly find terms that show only minor association with aging. We therefore compared the frequency of co-occurrence of each of the terms by dividing the number of times a term is mentioned together with any other term by the number of times it is mentioned on its own (Figure S3). Indeed, if we only considered abstracts where single clinical terms were mentioned we observed that very common terms, like ‘cancer’, skewed the entire dataset towards those terms instead of aging. We therefore only considered abstracts that contain two or more age-associated clinical terms for finding new terms that describe human aging.
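The co-occurrence ratio and the two-or-more-terms filter can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the helper names are hypothetical, matching is a simple case-insensitive whole-word search, and synonyms are ignored.

```python
import re
from collections import Counter

def terms_in(abstract, clinical_terms):
    """Set of clinical terms that appear as whole words in one abstract."""
    text = abstract.lower()
    return {t for t in clinical_terms
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", text)}

def cooccurrence_ratio(abstracts, clinical_terms):
    """Per term: abstracts shared with another term / abstracts where it is alone."""
    together, alone = Counter(), Counter()
    for abstract in abstracts:
        found = terms_in(abstract, clinical_terms)
        bucket = together if len(found) >= 2 else alone
        for term in found:
            bucket[term] += 1
    ratios = {}
    for term in clinical_terms:
        if together[term] or alone[term]:
            # A term never seen alone gets an infinite ratio.
            ratios[term] = together[term] / alone[term] if alone[term] else float("inf")
    return ratios

def multi_term_abstracts(abstracts, clinical_terms, min_terms=2):
    """Keep only abstracts that mention at least min_terms distinct terms."""
    return [a for a in abstracts if len(terms_in(a, clinical_terms)) >= min_terms]

# Toy abstracts, invented for illustration.
docs = [
    "Cancer risk in smokers.",
    "Cancer and osteoporosis in the elderly.",
    "Osteoporosis and sarcopenia with old age.",
    "Cancer screening uptake.",
]
terms = ["cancer", "osteoporosis", "sarcopenia"]
print(cooccurrence_ratio(docs, terms))
print(multi_term_abstracts(docs, terms))
```

In this toy set ‘cancer’ is mostly mentioned alone, so it gets a low ratio, which mirrors the skew the text describes for very common terms.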


Employing this approach, we identified 28,516 PubMed abstracts which contain: 1) at least two occurrences of the 44 clinical terms, and 2) at least one aging keyword. These age-associated abstracts were then used as a foundation for mining new terms associated with aging. We generated a list of the most frequent words in the age-associated abstracts. We chose a cutoff of at least 100 occurrences, including repeated occurrences of a term in an abstract, as a way to filter the number of terms identified and to make sure that only well-recognized terms are included. We discarded terms based on their semantic tags in SNOMED (e.g., “procedure”, “qualifier value”, “body structure”). This led to the identification of 994 new terms that could be considered age-associated (Table S3).
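A minimal sketch of the frequency cutoff and semantic-tag filter might look like this. The `vocabulary` mapping from term to semantic tag is a hypothetical stand-in for a SNOMED CT lookup, and `candidate_terms` is an illustrative name, not the authors' implementation.

```python
import re
from collections import Counter

# Semantic tags named in the text as grounds for discarding a term.
EXCLUDED_TAGS = {"procedure", "qualifier value", "body structure"}

def candidate_terms(abstracts, vocabulary, min_count=100):
    """Count every occurrence of each vocabulary term and apply both filters.

    vocabulary -- dict mapping a lowercase term to its semantic tag
                  (assumed format, standing in for a terminology lookup)
    """
    counts = Counter()
    for abstract in abstracts:
        text = abstract.lower()
        for term in vocabulary:
            # Repeated occurrences within one abstract all count.
            counts[term] += len(re.findall(r"\b" + re.escape(term) + r"\b", text))
    return {term: n for term, n in counts.items()
            if n >= min_count and vocabulary[term] not in EXCLUDED_TAGS}

# Toy data with a low cutoff so that the filters are visible.
vocab = {"frailty": "finding", "biopsy": "procedure", "cataract": "disorder"}
docs = ["Frailty and frailty scores; biopsy, biopsy, biopsy. Cataract."]
print(candidate_terms(docs, vocab, min_count=2))
```

Here ‘biopsy’ passes the count cutoff but is dropped for its tag, while ‘cataract’ has an acceptable tag but too few occurrences.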



Association analyses reveal tissue specific clustering in aging


To further investigate the relationships between these features, we generated a clinical term matrix reflecting the co-occurrence of terms in each abstract. To avoid bias towards terms that were more commonly or less commonly mentioned than average, we employed both standard score (z-score) and term frequency–inverse document frequency (tf-idf) normalization [12,13]. These two normalization algorithms compensate for the ways in which terms associate differently: z-score emphasizes connections between rarer co-occurrences while tf-idf emphasizes correlations between more common terms. By using these matrices, we could perform further analyses and investigate how different features associate with each other. To find large-scale patterns in the data we applied t-distributed Stochastic Neighbor Embedding (t-SNE) clustering to the matrices. This unsupervised machine-learning algorithm allowed us to identify groups of terms that appeared closely associated (Figure 2 and Figure S4). In particular, it was apparent that terms relating to specific pathologies (e.g., heart disease, neurodegeneration) associate with one another, thereby validating our normalization methods. Notably, the term ‘cancer’ appeared to associate with a cluster including ‘iron’, ‘ferritin’ and ‘anemia’, suggesting that these are possible markers for cancer identification/progression. Indeed, this may be the case [14]. In sum, these algorithms show that the results generated from our data-mining effort agree with current knowledge and suggest that our method is robust.
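To make the matrix and the two normalizations concrete, here is a small pure-Python sketch. It assumes a term-by-abstract count matrix, the textbook tf-idf weighting (count times log of N over document frequency) and a per-term z-score; the study's exact variants are not given in this excerpt, so all function names and choices here are illustrative. The t-SNE step is left out because it needs an external library.

```python
import math
import re
from statistics import mean, pstdev

def count_matrix(abstracts, terms):
    """Rows are terms, columns are abstracts, entries are whole-word counts."""
    return [[len(re.findall(r"\b" + re.escape(t) + r"\b", a.lower()))
             for a in abstracts] for t in terms]

def cooccurrence(matrix):
    """Term-by-term matrix: number of abstracts in which both terms appear."""
    present = [[1 if c else 0 for c in row] for row in matrix]
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in present]
            for r1 in present]

def tfidf(matrix):
    """Weight each count by log(N / document frequency) of its term."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        doc_freq = sum(1 for c in row if c > 0)
        idf = math.log(n_docs / doc_freq) if doc_freq else 0.0
        weighted.append([c * idf for c in row])
    return weighted

def zscore(matrix):
    """Standardize each term row to zero mean and unit (population) variance."""
    scored = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)
        scored.append([(c - mu) / sigma if sigma else 0.0 for c in row])
    return scored

# Toy abstracts, invented for illustration.
docs = ["anemia ferritin iron", "iron anemia", "dementia", "dementia anemia"]
terms = ["anemia", "iron", "dementia"]
counts = count_matrix(docs, terms)
print(cooccurrence(counts))
print(tfidf(counts))
print(zscore(counts))
```

Either normalized matrix could then be passed to an embedding routine such as scikit-learn's `TSNE` to look for clusters of associated terms.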



... / ...



For the rest of the study, please visit the source.




Edited by Engadin, 17 August 2019 - 08:23 PM.
