

Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data

telomere, telomere length, telomerecat, telomerase genotype


#1 Engadin


Posted 08 May 2019 - 03:16 PM

*** To be presented at Senescence UK Symposium of the ICSA (International Cell Senescence Association) on Friday 10th of May 2019 at Institute of Genetics & Molecular Medicine, University of Edinburgh, UK.***





Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have depended on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited to estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.




Telomeres are the ribonucleoprotein structures that shield the ends of chromosomes from DNA damage responses [1]. They are multifunctional regions of the genome that, unless actively lengthened (e.g. by telomerase), shorten with DNA duplication [2]. In this manner they both act as a molecular clock and provide a natural limit on the replicative potential of a cell, with possible pathways to apoptosis, senescence and, in cancer cells, genomic instability [3]. Telomere length is thus not only a risk factor for cancer and other diseases [4], with germline mutations near TERT (the gene encoding telomerase) being associated with several cancers [5], but also has a mechanistic role in tumour aetiology through driving instability, influencing regulation of telomere-proximal genes [6], and (through activation of telomere lengthening) provision of replicative immortality [7]. In humans, the DNA component of the telomere is an extremely repetitive region of the genome composed of the nucleotide hexamer (TTAGGG)n.


In this study we present Telomerecat, the first tool designed specifically to estimate mean telomere length from cancer whole genome sequencing (WGS) data. There have been previous approaches to using WGS data to study telomeres. Castle et al. provided a proof of concept in 2010 [8], and this was refined by the first group to use such an approach in earnest [9]. Ding et al. [10] published the first fully-fledged method for estimating length rather than just telomere content, with the accompanying tool ‘TelSeq’. Their study was also the first time a computational method had been validated against an established experimental method.


TelSeq assumes a fixed number of chromosomes when estimating telomere length and so makes no allowance for aneuploidy. Nevertheless, because it is the strongest available tool, TelSeq has been used to analyse cancer datasets in several studies [11,12]. Notably, a recent pan-cancer analysis made use of the TelSeq tool [6]. While generally sound, such analyses are vulnerable to misinterpretation in the event of systematic differences in aneuploidy (as may be the case when comparing different cancer types). Indeed, recurrent somatic copy number alterations involving the telomere were observed in all cancer types studied in a pan-cancer study of Cancer Genome Atlas data [13].


Where such changes (suggestive of aneuploidy) occur, cells will likely be left with an altered number of telomeres. Accordingly, the quantity (and proportion) of telomere sequence within the sample is altered, even if the mean length of telomeres is unaltered. Thus if we observe more telomere sequence in a cancer sample, we do not know whether this is due to longer telomeres or simply to a greater number of telomeres.
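The confound described above can be illustrated with a small arithmetic sketch. The function name, the telomere counts, and the 6,000 bp mean length below are all hypothetical values chosen for illustration; this is not TelSeq's or Telomerecat's actual computation, only a demonstration of what happens when a fixed telomere count is assumed.

```python
def mean_length_fixed_count(total_telomere_bp: float, assumed_telomeres: int = 92) -> float:
    """Mean telomere length when the telomere count is assumed rather than measured.

    92 corresponds to a diploid human cell (46 chromosomes, two ends each).
    """
    return total_telomere_bp / assumed_telomeres


# Two hypothetical samples with the SAME true mean telomere length.
true_mean_bp = 6000.0
normal_total = true_mean_bp * 92       # euploid sample: 92 telomeres
aneuploid_total = true_mean_bp * 110   # aneuploid sample: 110 telomeres (made-up count)

normal_estimate = mean_length_fixed_count(normal_total)
aneuploid_estimate = mean_length_fixed_count(aneuploid_total)

# The aneuploid sample appears to have longer telomeres purely because
# it contains more chromosome ends than the method assumes.
print(f"euploid estimate:   {normal_estimate:.0f} bp")
print(f"aneuploid estimate: {aneuploid_estimate:.0f} bp")
```

In this toy case the aneuploid sample is overestimated by the factor 110/92, even though no individual telomere is longer.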


Two other tools of note have been published: TelomereHunter and Computel. TelomereHunter [14] reports telomere content rather than telomere length, and so does not provide a direct comparison. TelomereHunter classifies reads based on their mapping location within the parent BAM file and outputs statistics relating to variations of the canonical telomere hexamer. Computel [15] does allow the user to specify the number of telomeres present, but since this is unknown (and cannot safely be inferred from copy-number profiles or ploidy statistics) it again does not provide a direct comparison. Since TelSeq is more frequently used in the literature, has greater experimental validation than Computel, and a recent comparison study [16] did not find that the greater convenience of TelSeq was at the cost of poorer performance, we take TelSeq as the representative of current methods in our comparisons.


Rather than normalizing against the entire genome, Telomerecat normalizes the telomeric content against the subtelomeric regions. In this manner it is agnostic to the ploidy of the sample, and assumes only that each telomere has a subtelomere.


Erroneous regions of apparent telomere and subtelomere can arise from other stretches of the TTAGGG repeat sequence that appear in the human genome: so-called interstitial telomeric repeats (‘ITRs’) [17]. Telomerecat estimates and corrects for the number of ITR-originating reads by assuming that the aggregate number of reads from the 3′ end of TTAGGG ITR sequences will be approximately equal to the aggregate number of reads from the 5′ end, while true telomeres only have a boundary at one end. In this manner, Telomerecat obtains an estimate of ITR contributions without having to align to these difficult-to-map regions.
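The symmetry argument above can be sketched as a simple subtraction. This is a minimal illustration under stated assumptions, not Telomerecat's code: the function name and the read counts are hypothetical, and the two inputs stand for boundary reads observed in the orientation a true telomere can produce and in the opposite orientation, which only an ITR can produce.

```python
def estimate_telomere_boundary_reads(telomere_like_orientation: int,
                                     opposite_orientation: int) -> int:
    """Estimate boundary reads that come from true telomeres.

    Assumption: an ITR has a boundary at both ends, so it contributes
    roughly equally to both orientations, whereas a true telomere
    contributes to only one. The opposite-orientation count therefore
    serves as an estimate of the ITR contribution to the first count.
    """
    corrected = telomere_like_orientation - opposite_orientation
    # Guard against negative values caused by sampling noise.
    return max(corrected, 0)


# Hypothetical counts: 500 boundary reads in the telomere-like orientation,
# 120 in the orientation that only ITRs can generate.
print(estimate_telomere_boundary_reads(500, 120))  # 380
```

Note that the correction needs only the two aggregate counts, which is why no alignment to the ITR regions themselves is required.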


A third potential hindrance for telomere estimation, after aneuploidy and ITRs, is that it is difficult to define the end of the telomere precisely, based solely on genomic sequence (explicit information about DNA secondary structures and the locations of bound proteins having been lost). The subtelomere is composed of subtelomeric repeat sequences and segmental duplications, interspersed with canonical telomere repeats [18]. These subtelomeric repeat sequences can look much like the telomere but with the addition of sequencing errors. Too strict a definition of telomere as being the region of TTAGGG repeats would be hostage to genuine variations, sequencing errors, and somatic mutations.
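To see why an exact-match definition is fragile, consider a minimal sketch that scores a read by the fraction of its bases covered by canonical hexamers on either strand. The function names and the 0.75 threshold are arbitrary choices for illustration; this is not the error model that Telomerecat uses.

```python
import re

# Canonical telomere hexamer and its reverse complement.
TELOMERE_HEXAMERS = ("TTAGGG", "CCCTAA")


def telomere_fraction(read: str) -> float:
    """Fraction of the read covered by non-overlapping canonical hexamers."""
    read = read.upper()
    if not read:
        return 0.0
    # Count matches per strand and keep the better-matching strand.
    best = max(len(re.findall(hexamer, read)) for hexamer in TELOMERE_HEXAMERS)
    return best * 6 / len(read)


def looks_telomeric(read: str, min_fraction: float = 0.75) -> bool:
    """Tolerant classification: allow some non-canonical bases."""
    return telomere_fraction(read) >= min_fraction


perfect = "TTAGGG" * 5
# Same read with one substituted base in the third hexamer (TTAGCG).
noisy = "TTAGGG" * 2 + "TTAGCG" + "TTAGGG" * 2

print(noisy == perfect)        # False: an exact-match rule rejects the read
print(looks_telomeric(noisy))  # True: the tolerant rule still accepts it
```

A single substitution is enough to defeat the strict rule, while any tolerant rule has to choose a cut-off, which is one reason the measured length depends on the method used.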


Telomere length is therefore necessarily a subjective measure, consistent only within the method used. Accordingly there may be an offset in comparisons with other methods. Even ‘gold standard’ laboratory methods for measuring telomere lengths may have their own biases in this regard [19].


Core to Telomerecat’s estimation process is the ratio between read-pairs that lie within the telomere and read-pairs that span the telomere boundary. Observing reads on the boundary between telomere and subtelomere provides a quantification of telomere numbers through which we normalize the telomere lengths. Where other methods always assume that more telomere reads mean longer telomeres, Telomerecat is able to account for the fact that there may actually be more individual telomeres.
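A toy model shows why such a ratio does not depend on how many telomeres are present. Everything in this sketch is an assumption made for illustration: the function names, the sampling density, and the 300 bp boundary window are hypothetical, and the excerpt does not describe how Telomerecat converts the ratio into a length estimate.

```python
import math


def expected_counts(n_telomeres: int, mean_length_bp: float,
                    pairs_per_bp: float = 0.01,
                    boundary_window_bp: float = 300.0) -> tuple[float, float]:
    """Expected read-pair counts under a deliberately simple model.

    Pairs lying within the telomere scale with telomere length;
    pairs spanning the boundary scale with a fixed window around it.
    Both scale linearly with the number of telomeres.
    """
    within = n_telomeres * mean_length_bp * pairs_per_bp
    spanning = n_telomeres * boundary_window_bp * pairs_per_bp
    return within, spanning


def within_to_spanning_ratio(within: float, spanning: float) -> float:
    return within / spanning


# Same mean length, different (hypothetical) telomere counts.
euploid = within_to_spanning_ratio(*expected_counts(92, 6000.0))
aneuploid = within_to_spanning_ratio(*expected_counts(110, 6000.0))
longer = within_to_spanning_ratio(*expected_counts(92, 9000.0))

# The telomere count cancels out of the ratio; the length does not.
print(math.isclose(euploid, aneuploid))  # True
print(longer > euploid)                  # True
```

In this model the raw count of within-telomere pairs rises with the number of telomeres, but the boundary-spanning count rises by the same factor, so only a change in length moves the ratio.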


Moreover, differences in patterns of sequencing error have the potential to lead to inconsistency between samples even when the same method is used. To address this, Telomerecat includes a novel method for correcting sequencing error in telomere sequencing reads. This model automatically adapts to differing error profiles across sequencing preparations.


Telomerecat is an open source tool; the code is available from https://github.com/jhrf/telomerecat. Full installation and usage documentation is available at https://telomerecat.readthedocs.io.


REST AT SOURCE

Edited by Engadin, 08 May 2019 - 03:22 PM.
