All families of transposable elements were active in the recent wheat genome evolution and polyploidy had no impact on their activity

Bread wheat (Triticum aestivum L.) is a major crop and its genome is one of the largest ever assembled at reference-quality level. It is 15 Gb, hexaploid, with 85% of transposable elements (TEs). Wheat genetic diversity was mainly focused on genes and little is known about the extent of genomic variability affecting TEs, transposition rate, and the impact of polyploidy. Multiple chromosome-scale assemblies are now available for bread wheat and for its tetraploid and diploid wild relatives. In this study, we computed base pair-resolved, gene-anchored, whole genome alignments of A, B, and D lineages at different ploidy levels in order to estimate the variability that affects the TE space. We used assembled genomes of 13 T. aestivum cultivars (6x=AABBDD), T. durum (4x=AABB), T. dicoccoides (4x=AABB), T. urartu (2x=AA), and Aegilops tauschii (2x=DD). We show that 5 to 34% of the TE fraction is variable, depending on the species divergence. Between 400 and 13,000 novel TE insertions per subgenome were detected. We found lineage-specific insertions for nearly all TE families in di- tetra- and hexaploids. No burst of transposition was observed and polyploidization did not trigger any boost of transposition. This study challenges the prevailing idea of wheat TE dynamics and is more in agreement with an equilibrium model of evolution.


INTRODUCTION
Transposable elements (TEs) are key factors of genome evolution and their contribution to plant phenotypic variations and adaptation were shown in many studies (for reviews: (Lisch, 2013;Baduel and Quadrana, 2021). They are particularly prominent in the genomes of Triticeae, a tribe of monocot plants encompassing important crops like wheat, barley, and rye, which diverged from a common ancestor ~13 MYA. Triticeae genomes contain millions of copies of TEs, making them a good model to understand the dynamics of TEs in complex genomes, their contribution to structural variations (SVs) and species adaptation. Bread wheat (Triticum aestivum) is the most widely grown crop on earth. Its genome is hexaploid and was shaped by two successive events of allo-polyploidization (reviewed recently in (Levy and Feldman, 2022) that brought together three related diploid subgenomes called A, B, and D, originated from three diploid species which common ancestor was estimated around 6 MYA (Marcussen et al., 2014;Middleton et al., 2014;Glémin et al., 2019;Avni et al., 2022;Li et al., 2022). They share a common karyotype with 7 pairs of chromosomes representing around 5 Gb each. A first interspecific hybridization event occurred ~0.8 MYA between two Triticeae diploid species from the A and B lineages, and a second event occurred between a tetraploid (AABB) and a diploid species carrying a D genome during the period of wheat domestication 9000 YA. The The large size of its genome (~5 Gb per haploid subgenome), mainly explained by a massive TE content (85%), has limited for decades our capability to assemble them and, thus, characterize their content, organization, and diversity. Before assessing a complete genome sequence, studies showed high variation and sub-genome specificity of TEs (Yaakov et al., 2013a;Yaakov et al., 2013b). Evidence of TE mobilization in newly formed polyploids was described for a few families as well as changes in the epigenetic status (Kraitshtein et al., 2010;Kashkush, 2011, 2012). These studies tended to conclude that polyploidy was a genomic shock and suggest that TEs evolve by bursts where a few families escape to silencing and expand massively in a short time period, leading to rapid genome diversification. In 2018, a reference-quality genome assembly was produced for the hexaploid cultivar Chinese Spring (IWGSC, 2018) and a deep comparative analysis of the ~4 million TE copies shaping the A, B, and D subgenomes led to unexpected conclusions about the evolutionary dynamics of TEs . Since their divergence 6 MYA, there was a near-complete TE turnover between A-B-D, so that ancestral TEs have been lost while more recent ones have amplified independently in the three diploid lineages, and no burst of TE transposition was observed after polyploidization events. Surprisingly, despite the TE turnover, the wide majority of the ~500 TE families are still present in similar proportions in each subgenome, although they evolved into subgenome-specific variants i.e., subfamilies. Abundant families are still the same while low-copy families persist at low-copy numbers. Hypotheses raised were that TE turnover is highly regulated and may follow evolutionary rules to maintain a functional genome architecture. These conclusions challenged the "burst"-centered view of plant TE dynamics and is rather in line with an alternative scenario of equilibrium as observed, for instance, in Brachypodium distachyon natural populations where TE activity is "remarkably constant" (Stritt et al., 2018) or as exemplified by the Alesia family (Stritt et al., 2021) which maintains at low-copy numbers in the Angiosperms across the generations, suggesting self-regulatory mechanisms (Cosby et al., 2019). TEs might maintain at an equilibrium between deletion and amplification in Triticeae but it is still a matter of debate (Bariah et al., 2020).
The limit with comparing A-B-D genomes is that all ancestral TEs have been erased so that it is not possible to trace recent deletion/transposition events. For that, it is necessary to compare genomes that have diverged much more recently and identify structural variations. This is what was performed by producing resequencing data on flow-sorted chromosomes 3B in a panel of 45 diverse Triticeae (De Oliveira et al., 2020). The extent of TE presence-absence variations (PAVs) was estimated to 7-8% per T. aestivum accession and up to 24% in more distant species and, again, no burst of any specific TE family was observed. In contrast, recent TE insertions were detected for a wide diversity of families, confirming that most of families were active and transposed. Recently, several Triticeae genomes have been assembled at reference-quality level, offering the opportunity to analyze TE dynamics at a resolution never reached so far using whole genome alignments. Analysis of five full-length LTR-retrotransposon families across T. aestivum assembled genomes confirmed that distinct subfamilies were active in diploids mainly, in waves lasting hundred thousand years, with only sparse evidence of recent insertions in polyploids (Wicker et al., 2022). However, the extent of variability due to TEs deletion/amplification is still unknown.
In this study, we developed a method in order to compute Triticeae whole-genome sequence alignments guided by anchor-genes and retrieve the complete landscape of TE variability originated from deletions and transpositions. We thus compared the fully assembled sequences of the A genomes, the B genomes, and also the D genomes between di-tetra-and hexaploids: T. urartu (AA; (Ling et al., 2013)), Ae. tauschii (DD;(Jia et al., 2013)), T. dicoccoides (AABB; (Avni et al., 2017)), T. durum (AABB; (Maccaferri et al., 2019)), and 13 T. aestivum (AABBDD; (IWGSC, 2018;Walkowiak et al., 2020;Aury et al., 2022)). We show that variability affects from 5 to 34% of the TEs depending on the species compared. We identified 51,928 recent transposition events involving 346 different families that were active recently in all species whatever their ploidy level. We show that transposition rate is similar in di-tetraand hexaploids, confirming the equilibrium we observed previously, and confirming that polyploidy did not disturb this equilibrium.

TE annotation and comparison of family proportions
We used the available TE annotations that we computed previously for Chinese Spring Aury et al., 2022) and Renan Aury et al., 2022). For all the other genome sequences compared in this study, we did not use the available annotation but rather produced a TE annotation using CLARITE and the ClariTeRep library (Daron et al., 2014) with the same parameters as for Chinese Spring and Renan. CLARITE uses RepeatMasker (Smit et al., 1996(Smit et al., -2004 for similarity search, applies a step of defragmentation in order to merge adjacent predictions that describe a single element, and eventually applies a step of reconstruction of nested insertions. The abundance of each TE (sub)family in a genome was calculated by cumulating the length of all fragments assigned a given (sub)family divided by the total length of TEs. To investigate differences of TE family abundance between genomes, and potential enrichment in the variable fraction of the genome, we calculated proportions of each TE family (cumulated length assigned to a given family divided by the subgenome size) and computed log2 ratios. Only families accounting for more than 100 kb in the analyzed subgenome were considered.

Estimation of the extent of genomic variability using ISBP markers
Based on CLARITE predictions of TEs, we extracted 150 bps encompassing the 5' and 3' junctions of each TE extremity with its insertion site with 75 bps on each side of the junction as previously described (De Oliveira et al., 2020). These 150 bps tags correspond to ISBP markers (Insertion Site Based Polymorphism (Paux et al., 2006)) that are expected to be unique kmers at the whole genome level. For each genome studied, we extracted all ISBPs and discarded those containing Ns. In case two ISBPs overlap by 100 bps or more, we kept only one of both. We also discarded ISBPs that are non-unique in the genome from which they were designed: mapped with Minimap2 (Li, 2018) at multiple loci with cutoff 98% identity and 100% query overlap. Presence/absence of ISBPs were searched in all compared genomes using Minimap2 option -xsr -w5. An ISBP was considered absent (PAV) if no match was found considering at least 95% identity and 95% coverage (maximum 7 bases soft-clipped). Only pseudomolecule sequences were considered for this analysis, meaning that unanchored scaffolds ("ChrUn") were excluded. To determine nucleotide positions defining the limits between distal (R1 and R3), proximal (R2), and (peri)-centromeric (C) regions of each chromosome (as previously defined in Chinese Spring (IWGSC, 2018)) in all species/accessions analyzed, we used the Chinese Spring ISBP mapping data. Hence, the chromosomal position of the ISBP that was the closest to a border (between R1 and R2 of chr1A for instance) defined in CS was used as border in the compared species/accessions.

Identification and selection of orthologous intergenic regions
We used the 105,200 High Confidence (HC) genes with a position along the 21 pseudomolecules of RefSeq v1.1 (IWGSC, 2018) as anchors to find orthologous intergenic regions in other genomes. The nucleotide sequences of the corresponding CDSs were mapped using GMAP (v18.05.11, (Wu and Watanabe, 2005), options: gmapl -cross_species) on the homeologous chromosome for each genome compared. Only best hit with at least 90% identity and 90% coverage were considered and kept for the subsequent analyses. We developed a Python script (Get_Collinear_Region.py) in order to retrieve all orthologous intergenic regions (oIGRs) that were flanked by the same pair of neighbor orthologs (collinearity) in Chinese Spring and in the compared genome. Cases of tandem duplicated gene copies that mapped at the same positions in a compared genome were dealt with by the script so that it specifically determined which copies delimitate an oIGR. Chromosome positions of these pairs of neighbor orthologs were used to extract the corresponding genomic segments (including the genes at both extremities of oIGRs) in both compared genomes. oIGRs were then aligned with BLASTN (v2.11.0+, (Camacho et al., 2009), threshold evalue: 1e-5) and we filtered out HSPs (High Scoring Pairs) with a cutoff at 90% identity. We excluded HSPs which coordinates were included into a larger one. This happened in case of lineage-specific tandem duplications, or when a novel copy of an element shared homology with another TE present within the aligned region. Coordinates of HSPs were then used to create BED files and we used "Bedtools merge" (bedtools/2.26, (Quinlan and Hall, 2010)) to merge overlapping conserved segments for each genome. We then used "Bedtools complement" to create BED files describing the variable (i.e., specific) segments between genomes i.e., sequences that are subject to presence-absence variations (PAVs) between two compared oIGRs. PAV candidates corresponding to gaps in the assembly (stretches of Ns) were identified and discarded from PAVs. We eventually used "Bedtools intersect" to retrieve TE annotation of the conserved/variable sequences.

Detection of recent TE insertions and estimation of the insertion dates
BED files describing the positions of PAVs (see above) were analyzed in the objective of finding PAV coordinates that fit nearly perfectly (i.e., with a tolerance of 10 bps at 5' and 3' extremities) with the coordinates of a TE, with status "complete" annotated by CLARITE, as evidence for recent transposition (or possible excision for class 2 elements  . Distances of newly inserted TE from the nearest predicted gene were computed with "Bedtools closest".

TE annotation and comparison of A, B, and D subgenomes between di, tetra-and hexaploid Triticeae
To avoid biases due to different TE annotation approaches, we predicted TEs in the genome sequences of T. aestivum (13 accessions), T. urartu, Ae. tauschii, T. dicoccoides and T. durum (abbreviated Tae, Tur, Aet, Tdi, Tdu, respectively) with the same method and criteria, using CLARITE (Daron et al., 2014). TEs represent between 86 and 87% for the A lineage, between 84 and 85% for B, and between 82 and 83% for D. We previously showed that A-B-D diverged ~5-6 MYA, a time during which the TE turnover has erased ancestral TEs so that there is (almost) no TE conserved between orthologous loci (homeologous in polyploids). Here, we focused our work on a much more recent evolutionary scale, by comparing A subgenomes together, B together, and D together, at different ploidy levels. TEs shaping the intergenic space are, thus, conserved because they were present in the common ancestor of the genomes compared.
The amount of TEs appeared very similar, even of each TE superfamily, when we compared A (sub)genomes, so did B, and D (Table 1; Supplemental Table S1). T. urartu A appeared the most different, which fits with its earlier divergence (1.3 MYA). To get a first flavor about the extent of variability, we started by comparing (sub)genome-wide proportions of TE families between lineages. Using Chinese Spring (CS) as a reference, we distinguished 301 TE families (those accounting for more than 100 kb in at least 1 subgenome), with the 20 most abundant representing >84% of the TE fraction, meaning that most families are present at low-copy number. We showed that 97% (292/301) of the families were present in similar proportions in the compared genomes (fold-change<2), suggesting that none of these Triticeae have experienced a massive transposition "burst" of any family since they diverged, neither before, nor after, the polyploidization events. Rare cases of differential abundance were observed for nine very low-copy families (DTM_famc13, RIX_famc25 XXX_famc10-150-46-53-57-60, DHH_famc1) which abundance varies around the 100 kb cutoff applied here and for which the extent of the difference may not be associated with TE amplification but rather to methodological limits (partial genome assembly and anchoring, TE annotation proportions may hide higher levels of structural variations between those genomes which we, thus, investigated using uniquely mappable TE-derived markers.

Extent of structural variations affecting TEs through the mapping of ISBP markers
Although TEs are repeated, each copy is inserted into a different locus, which makes the 5' and 3' junctions between TE extremities and the insertion site unique kmers at the whole genome level. We used this valuable feature to address presence/absence variations (PAVs), also called TIPs (transposon insertion polymorphisms) or ISBPs (Insertion Site-Based Polymorphisms) among the wheat research community, between orthologous loci. We extracted 150 bps encompassing each TE extremity of all genomes which provided a dataset of, on average, 1.7, 1.9, and 1,5 million uniquely mappable ISBP markers from the A, B, and D subgenomes, respectively. ISBPs mapped with at least 95% identity over 95% of their length were considered present while no match revealed the absence i.e., PAV. Proportions of markers subject to PAVs in all pairwise comparisons are presented in Figure 1. PAV levels detected here were in agreement with the existence of two phylogenetically distinct groups corresponding to the Asian (CS, Norin61, Zang1817) and European wheat genetic pools.
These global estimates at the whole subgenome level may hide strong local differences.
Chromosome extremities (distal regions) were, on average, four times more variable than the central regions of chromosomes (borders as defined in (IWGSC, 2018)). For instance, between the closely related Asian accessions CS and Norin61 (Fig. 1) We then wondered about the composition of the variable fraction of the TE space in order to investigate which families have impacted the recent Triticeae genome evolution. Indeed, recent TE amplification of active families could be detected because they are enriched in the variable fraction. Thus, we searched for families whose abundance is substantially enriched in the variable fraction of the genome compared to its genome average. We calculated enrichment ratios for the 100, 113, and 98 most abundant families of the A, B, and D subgenomes, respectively (those representing at least 1 Mb per subgenome in CS). Together they represent 99% of the TE content, the others being low-copy TEs. Fold-changes are shown as heatmaps in Figure 3. The main result here is that such enrichment is rare. The composition of the variable fraction is quite similar to the genome average. However, we found 3 TE families (2 CACTAs and 1 Gypsy) and 2 satellite repeats that were enriched (log2 fold-change>=2.0) in the variable TE fraction of some genomes: RLG_famc8 (Cereba, Quinta), DTC_famc4 (Clifford/Mandrake/Byron), DTC_famc9 (Isaac), and satellites XXX_famc1 (TaiI) and XXX_famc10 (unnamed). RLG_famc8 (Cereba/Quinta) is a centromeric gypsy family which represents on average 1% of the oIGRs but accounts for 5% of the variable sequences detected.
Centromeric TEs and telomeric satellites have been, thus, main components of the recent (intraspecific) genome structure diversification. But if these examples are striking, together they were responsible for at maximum 5-6% (in bps) of the PAVs, showing that the observed variability does not originate from amplification bursts of a few very active families. In contrast, the composition of the recently shaped TE space resembles the ancestral one.

Traces of recent transposition events confirms the equilibrium model of TE evolution
Variable regions originated from both deletions and insertions. To get deeper insights into the dynamics of transposition, we searched for PAVs (identified by oIGR pairwise BLAST alignments) which borders fit with borders of a TE, as potential traces of transposition (or potentially excision for class 2 elements). Number of such events are summarized in Table 3.
Pairwise comparisons with CS revealed between 400 and 13,000 recent TE insertions per subgenome, depending on the species/accession considered. We found presence of TSDs (target A striking result is that the numbers of newly inserted TEs were quite similar between the two genomes aligned, whatever the comparison considered ( predictions, or misclassified repeats, and we cannot find newly inserted copies for such families.
Thus, we applied a 100 kb threshold per subgenome analyzed in order to estimate the proportion of active families in wheat. This retained 301 high confidence families and 89% of them were active. We conclude that virtually all families were active recently and gave rise to newly inserted copies in the recent Triticeae evolution. Even at the intraspecific level, we found traces of transposition for 328 families. This situation cannot be explained by cycles of silencing/bursts but is rather in favor of an equilibrium model of evolution.
To go further, we wondered if the level of activity of a given family was different between species/accessions and if the rate of transposition was correlated or not with the abundance of the family. Figure  Fatima copies in CS D and Aet D . These values were also quite similar when comparing CS with tetraploids ( Fig. 4)

Proximity to genes and inconsistent estimates of insertion dates
In total, our bp-resolved alignments revealed the presence of 51,928 elements (39,966 class 1, 11,048 class 2, and 914 unclassified) in the Chinese Spring genome that are absent from the corresponding insertion site in one or more compared genomes. They represent a large and clean dataset of novel insertions, although some of the class 2 elements detected here may be traces of recent excision in the compared genome. We used this dataset in order to investigate the global distribution of new insertions and the potential preferential insertion in gene vicinity. least 100 novel insertions detected in CS. Half of them exhibit preferential insertion around genes (at least 3 times more insertions close to genes than expected randomly). They belong to class 2 transposons and LINEs retrotransposon families that were previously shown to be enriched in gene promoters . In contrast, Gypsy and Copia insertions tend to be excluded from the gene vicinity and insert preferentially in the core of large IGRs.
Finally, we estimated the age of these novel insertions with the approach that is widely used by

DISCUSSION
Assembling the wheat genome has long been a challenge but we now have reached the pangenomics area with multiple high-quality genome assemblies available. SNP diversity was intensively characterized in order to get a world-wide view of the Triticum population structure, impact of selection, introgressions (Balfourier et al., 2019;Zhou et al., 2020), and to even build haplotype maps for genotype imputations (Brinton et al., 2020;Jordan et al., 2022). SNPs are easy to discover, to genotype, because bioinformatics pipelines that handle short-reads are well established and because technology advances tackle the complexity of the genome. However, lots remain to be done to go beyond the type of diversity we could investigate with SNPs which is basically no more than allele combinations. This is the goal of pangenomics which relies on discovering the hidden diversity with loci under presence/absence variations. For that, it is important to go beyond the "uniquely mappable area" of short-read based bioinformatics, especially when studying complex genomes. Structural variations (SVs) were only poorly characterized for wheat genes and partially addressed for TEs (Montenegro et al., 2017;De Oliveira et al., 2020). SVs are, however, of major importance to understand the molecular factors responsible for phenotypic variations and to understand evolutionary rules governing TE dynamics. Here, we faced the challenge of characterizing variability affecting Gbs of repeated elements in one of the most complex genomes ever assembled. Addressing this question required dedicated tools, strategies, and expertise in TE classification and annotation.
We homogenized TE annotations of all available genome sequences using CLARITE and ClariTeRep library that were specifically developed for modeling wheat TEs (Daron et al., 2014) and previously used to comparing A-B-D TE content . TE annotations are actually very dependent on the tools and library used so that it is hazardous to compare annotations performed by different groups. Most striking difference comes from CACTA transposons because the ClariTeRep library is enriched in CACTAs manually curated (Choulet et al., 2010) that are generally absent from other wheat specific TE libraries. Indeed, several Triticeae sequencing projects concluded that CACTAs represent 5-6% of the genome (Jia et al., 2013;Ling et al., 2013), while their proportion is around 15% when using ClariTeRep.
We established a workflow in order to compute base pair-resolved whole-genome sequence alignments for Gb-sized genomes, by taking advantage of the high level of gene collinearity of Triticeae genomes to anchor the alignments. We split the subgenomes into ~30,000 intervals corresponding to individual orthologous intergenic regions flanked by pairs of collinear orthologs, thus reducing the alignment space so that most TEs are not repeated within the aligned interval. This, however, excluded noncollinear regions from the analyses and it was important to check whether such a filter introduces bias by excluding regions that may be more variable than the genome average. This is why we checked with unbiased estimators of variability: comparing genome-wide family proportions and mapping of all ISBP markers.
Results from oIGR alignments were consistent with these unbiased estimates. Firstly, TE families showing copy number differences at the whole genome level were also the ones enriched in the variable TE fraction defined by the alignments. Secondly, the extent of variability defined by oIGR alignments were fully in line with the unbiased estimates based on ISBP mapping although slightly (2% on average) lower, confirming that noncollinear (nonaligned) regions contain more SVs than the genome average. However, this quality control demonstrated that we can be confident that the conclusions reached by interval-based whole genome alignments are sound.
Our conclusions are in favor of an equilibrium model and do not fit the model of TE evolution by cycles of bursts/silencing. First pieces of evidence that wheat TE dynamics do not follow this model was brought by comparing A with B and D subgenomes . These lineages diverged millions of years ago so that TE turnover is nearly complete i.e., there are (almost) no more orthologous TEs at that scale. Although TEs evolved independently in diploids, many striking features appeared: the wide majority of TE families maintained their ancestral copy-number with abundant families remaining abundant, and low-copy families remaining at low-copy number in the three lineages. Moreover, families that tend to be enriched close to genes kept this behavior in all lineages, although novel copies did not target the same genes. All features led to conclude that TE dynamics was highly regulated and suggested evolutionary constraints to maintain global equilibrium. This is this model that we temptingly challenged here by characterizing the initial events of the ongoing TE turnover in three Triticeae lineages and the impact of polyploidy on TE activity. Our data revealed that the TE composition of the variable TE fraction resembles the ancestral conserved one. This confirms that TE turnover does not modify the global TE landscape. In contrast, it occurs while maintaining an equilibrium with unchanged TE family proportions. We did not see any burst of a given family nor families that would have decreased in proportion because of being completely silenced.
Rare cases of families that contribute more than others to recent genome diversification were observed. Interestingly, this was the case for subtelomeric satellite repeats (XXX_famc10) and centromeric retrotransposons Cereba/Quinta (RLG_famc8). This suggests that such elements that play a major structural role in shaping chromosome architecture comprise the most rapidly evolving elements of the genome.
Specific regions we identified originate from both TE deletions and insertions. An important outcome is that, whatever the genomes compared, whatever the ploidy levels, the proportion of specific sequences was similar in both the query and reference. We did not see a species that would have accumulated more novel insertions than others. The TE turnover seems to occur at a conserved rate in the different lineage which is, again, in favor of the equilibrium model with a controlled genome size. This strongly suggests that TE deletions compensate genome expansion by novel insertions over time. Since we reached a base pair-resolution by BLAST alignments, we were able to search for SVs that are traces of a recent insertion/excision: a complete TE with TSDs in the query genome versus an empty insertion site in the reference genome. Although we found thousands of such insertions, their accumulated size reached only 10% of the specific fraction (in bps), while we would expect close to 50%. The reason for this is most probably because criteria to call a novel insertion were too strict to avoid false positives.
In addition, subsequent events of deletions/rearrangements either in the query or reference may have erased molecular traces of such recent insertions.
Another interesting result is that polyploids have not experienced more transpositions than their diploid relatives. Diploids and polyploids accumulated a very similar number of TE insertions per subgenome, confirming there was no genomic shock that would have been followed by TE deregulation. In contrast, our results tend to show that the rate of TE transposition and, thus, TE turnover, is stable over time and whole genome duplications did not destabilize this equilibrium. However, this model is not in agreement with the conclusions of a differential accumulation of TE families in Triticum/Aegilops (Yaakov et al., 2013b;Keidar-Friedman et al., 2018). One of the main sources of such discrepancies is the definition of what we call a family. Previous studies observed lineage-specific accumulations of families whereas we observed an equilibrium. In fact, we agree that families evolved into variants i.e., subfamilies, that have accumulated independently in different lineages. But many families that were called by different names are actually related, sometimes can even be considered members of the same families. Deciphering these phylogenetic relations between copies is crucial to understand that, even if it is possible to see variants overrepresented in one lineage, the important and striking result is that its family remained at an equilibrium in all lineages. The importance of connecting elements was also underlined in Stritt et al. (2021) where it has been shown that a rare TE family tend to maintained vertically at low-copy number throughout angiosperms, suggesting an alternative scenario prevailing the usual view of TE dynamics. Also similar to what we found, TE insertion polymorphisms explored in 53 Brachypodium accessions did not reveal any lineage-specific TE activations but rather a homogeneous activity and a stable transposition rate among populations, suggesting a conserved regulatory mechanism (Stritt et al., 2018). TE proliferation in Triticeae does not appear to be the result of an induction by the environment but rather a highly regulated process playing a role in basic genome functioning. This challenges the previous view of TEs as "invasive" elements as it becomes more and more suggested in literature by the perspective of the genome ecology where TEs would persist by creating their own niche (Kremer et al., 2020). This has been recently commented in maize where the analysis of family-level ecology of the genome led to conclude that, like in our study, most families have been active recently, each family having its own survival strategy representing the evolution of a distinct ecological niche (Stitzer et al., 2021). Despite the short evolutionary time frame studied here, we discovered traces of novel insertions for 346 families i.e., meaning that virtually all families are active actually. Same conclusions were reached in rice where ca. 11,000 new insertions of 251 families were detected in genomes of 3000 natural rice accessions (Liu et al., 2021). This highlighted the discrepancies between artificial stressinduced transposition and activity under natural conditions which provide two different views.
Similarly, we wondered why our results are not in agreement with TE behavior observed in synthetic newly formed polyploid wheats where some TEs were mobilized in response to polyploidy (Yaakov and Kashkush, 2012;Yaakov et al., 2013b). As suggested earlier, instability following polyploidy is unlikely to be selected for under natural conditions (Zhao et al., 2011) and we support the hypothesis that only polyploids for which the equilibrium remained stable maintained across the generations.
Finally, the fact that recently inserted LTR-RTs did not carry identical LTRs made us underline some doubts about the method used to estimate dates of insertions. Similar inconsistencies were reported based on the analysis of ca. 25,000 LTR-RTs in fifteen plant species which revealed that one fourth of nested retrotransposons exhibited older insertion date than pre-existing elements in which they had been inserted (Jedlicka et al., 2020). Gene conversion was suggested to be involved in erasing divergence over time. Another possible explanation would be that replication of LTRs during the insertion mechanism is error-prone. This may have strong impact on the way the community uses the molecular clock to date insertion events.

AUTHOR APPROVALS
All authors have seen and approved the manuscript. Manuscript has not been accepted or published elsewhere.