Deciphering the demographic history of allochronic differentiation in the pine processionary moth Thaumetopoea pityocampa

Understanding the processes of adaptive divergence, which may ultimately lead to speciation, is a major question in evolutionary biology. Allochronic differentiation refers to a particular situation where gene flow is primarily impeded by temporal isolation between early and late reproducers. This process has been suggested to occur in a large array of organisms, even though it is still overlooked in the literature. We here focused on a well‐documented case of incipient allochronic speciation in the winter pine processionary moth Thaumetopoea pityocampa. This species typically reproduces in summer and larval development occurs throughout autumn and winter. A unique, phenologically shifted population (SP) was discovered in 1997 in Portugal. It was proved to be strongly differentiated from the sympatric “winter population” (WP), but its evolutionary history could only now be explored. We took advantage of the recent assembly of a draft genome and of the development of pan‐genomic RAD‐seq markers to decipher the demographic history of the differentiating populations and develop genome scans of adaptive differentiation. We showed that the SP diverged relatively recently, that is, few hundred years ago, and went through two successive bottlenecks followed by population size expansions, while the sympatric WP is currently experiencing a population decline. We identified outlier SNPs that were mapped onto the genome, but none were associated with the phenological shift or with subsequent adaptations. The strong genetic drift that occurred along the SP lineage certainly challenged our capacity to reveal functionally important loci.

occurring populations, has received growing attention in the last 12 years (Rundle & Nosil, 2005). The fate of diverging populations maintaining a certain level of gene flow, and the conditions in which speciation can still occur are central questions in evolutionary biology (Smadja & Butlin, 2011). A mechanism possibly causing sympatric speciation is allochronic differentiation, which occurs when differences in breeding time within a species lead to temporal assortative mating and limit gene flow between early and late reproducers (Alexander & Bigelow, 1960). Isolation by time can further lead to adaptation by time (Hendry & Day, 2005) when divergent selection operates between contrasting environmental conditions encountered at the different breeding times. This process remains largely unexplored in the literature, but has been suggested to occur in a large array of organisms such as plants (Devaux & Lande, 2008;Savolainen et al., 2006;Weis et al., 2005), birds (Friesen et al., 2007), fishes (Limborg, Waples, Seeb, & Seeb, 2014), corals (Rosser, 2015(Rosser, , 2016 or insects (Santos, Burban, et al., 2011;Sota et al., 2013;Yamamoto, Beljaev, & Sota, 2016;Yamamoto & Sota, 2009, 2012. Many more examples probably remain to be discovered, and only nine study cases were identified in a recent review of 200 papers as examples of "true allochronic speciation" (Taylor & Friesen, 2017). To go beyond the description of such case studies and disentangle the evolutionary scenarios underlying allochronic differentiation, much remains to be done; in particular, the initial reduction in migration rate between the diverging populations and the underlying genomic mechanisms remains to be explored in most cases.
The recent advent of high-throughput genomic techniques as well as statistical advances for analysing large-scale data sets has opened unprecedented opportunities to address ecological, evolutionary and genetic questions in nonmodel organisms (Hasselmann, Ferretti, & Zayed, 2015). Genomewide data have been proven to be powerful for estimating the age of demographic events (McCoy, Garud, Kelley, Boggs, & Petrov, 2014), retrieving fine-scale population genetic structures (Szulkin, Gagnaire, Bierne, & Charmantier, 2016) and identifying phylogeographic patterns (Derkarabetian, Burns, Starrett, & Hedin, 2016). More, even if studying wild populations of nonmodel organisms is still a major challenge, population genomic approaches have allowed identification of genomic regions underlying phenotypic characteristics or traits involved in local adaptation (Berdan, Mazzoni, Waurick, Roehr, & Mayer, 2015;Guo, DeFaveri, Sotelo, Nair, & Meril€ a, 2015;Hohenlohe, 2014). Here, we used population genomics in a nonmodel insect species to disentangle the evolutionary scenario of allochronic differentiation, followed by adaptation to new environmental conditions. We focused on one of the few cases identified by Taylor and Friesen (2017) as a well-documented example of "true incipient allochronic speciation," namely the pine processionary moth Thaumetopoea pityocampa (Dennis & Schifferm€ uller). This species is a wellstudied pest of pine trees over the Mediterranean basin. Its caterpillars bear urticating hair, causing public and animal health concern (Battisti, Holm, Fagrell, & Larsson, 2011;Battisti, Larsson, & Roques, 2017;Rodr ıguez-Mahillo et al., 2012). Briefly, T. pityocampa reproduces in summer and larval development occurs through autumn and winter all over its range. Reproduction immediately follows adult emergence, as adults have a very limited lifespan of 1-2 days. In 1997, a population of T. pityocampa showing a shift in phenology (reproduction in spring and larval development in summer) was discovered in the Mata Nacional de Leiria (MNL) in Portugal, where it co-occurred with individuals following the typical biological cycle (Pimentel et al., 2006;Santos et al., 2007). This unique shifted population is known as the "summer population" (SP) as opposed to all other known populations that are referred to as "winter populations" (WPs), in relation to the development time of the conspicuous larvae. The SP was initially restricted to a small area of the Mata Nacional and has been slowly expanding along the coast since then (Godefroid et al., 2016). Strikingly, all larval stages of the SP develop under radically different environmental conditions compared to the typical WPs, experiencing much higher temperatures that were so far supposed to be lethal to early larval stages (Huchon & D emolin, 1970;Santos, Paiva, Tavares, Kerdelhu e, & Branco, 2011). Understanding the scenario of this divergence is thus of interest in the context of current climate warming.
Previous studies have brought significant preliminary knowledge about the genetic and ecological characteristics of the peculiar SP.
Analysis of a fragment of the mitochondrial COI gene and of the ITS1 region showed a high sequence similarity between the SP and the sympatric WP, which suggested a local origin of the SP, while microsatellites revealed a high differentiation between the SP and all studied Iberian populations (Santos, Burban, et al., 2011;Santos et al., 2007). Moreover, a recent study showed that some individuals belonging to the SP genetic cluster emerge during the WP reproductive season and are referred to as "LateSP individuals" (Burban et al., 2016). This study also documented signs of rare hybridization between the two allochronic populations. Consistently, hybrids between SP and WP individuals could be obtained in laboratory conditions, and the time of adult emergence (a proxy for breeding time) was shown to be highly heritable (Branco, Paiva, Santos, Burban, & Kerdelhu e, 2017). These patterns suggested that the SP originated from the WP, following a phenological shift of a few individuals, and that gene flow between the SP and the WP is now highly reduced but not absent. Yet, the population genetic data relied on a limited number of microsatellite markers and did not allow us to characterize the successive stages of the divergence between the SP and the WP.
The objectives of the present work were to uncover major characteristics of this prime example of allochronic differentiation and significantly move towards the fulfilment of the criteria proposed by Taylor and Friesen (2017) by deciphering the evolutionary history of the SP and characterizing its different stages. In particular, we aimed at (i) inferring the timing of the divergence, (ii) measuring the migration rate between diverging populations at different stages to determine whether the differentiation occurred in the presence or absence of gene flow; (iii) determining the extent of population size changes, in particular to decipher if the SP experienced a strong bottleneck during the primary divergence step; and (iv) characterizing genomic regions possibly involved in the phenological shift and LEBLOIS ET AL.
| 265 subsequent adaptations. To achieve these aims, we took advantage of RAD-seq technology (Baird et al., 2008;Davey & Blaxter, 2011) and the recent release of a first draft genome for T. pityocampa (Gschloessl et al., Unpublished data) to obtain a large number of informative loci genotyped in the SP and in two WPs occurring in the same region. We used these loci to explore complex demographic scenarios including drift, migration and variation in population sizes, and to perform genomewide scans for signatures of selection. We could thereby successfully disentangle the main characteristics of the ongoing allochronic differentiation process.

| Biological material
A total of 180 individuals (adults or larvae) of T. pityocampa were collected in Portugal between May 2008 and September 2010 following Santos, Burban, et al., 2011. These individuals originated from three distinct populations or sampling sites: two were collected in the MNL (39°47 0 N 8°58 0 W) and corresponded to the winter and summer populations from Leiria (referred to as LWP and LSP), and one winter population was collected in the Setubal peninsula, near Apostic ßa (38°34 0 N 9°07 0 W), ca. 150 km south from Leiria, at the same elevation and longitude, and was hereafter denoted as AWP.
All individuals were sampled from the host plant Pinus pinaster Aiton.
Forty L5 larvae (i.e., fifth larval stage) belonging to the AWP were collected in December 2010; 40 males, 10 females and 20 L5 larvae belonging to the LWP were sampled in 2008-2010, and 60 males and 10 females belonging the LSP were sampled in 2008-2010. For the LSP and LWP, we used two subsamples in each case.
The first one included individuals assigned to the winter or the summer population based on the phenology observed in the field following Santos, Burban et al. (2011) (n = 40 for each population, subsamples referred to as LSP1 and LWP1). The second subsample gathered males caught with pheromone traps and previously genotyped using 17 microsatellite loci, from which we excluded the individuals assigned as LateSP, F1 and F2 following Burban et al. (2016); these subsamples (n = 30 in each population) will be referred to as LSP2 and LWP2. The exact sampling design is described in Table 1. 2.2 | RAD-sequencing and SNP calling

| RAD libraries
We carried out RAD tag sequencing (Baird et al., 2008) using both individual DNA and population pools of DNA. Individual data were used in combination with the pools to explore the possible causes of the somewhat unexpected results yielded from the LWP1 pool (see Results). In total, we constructed six PstI-digested paired-end (PE) RAD libraries as described in Gautier et al. (2013).
Twenty LSP1 (10 males and 10 females) and 20 LWP1 (10 males and 10 females) samples were barcoded and processed individually in libraries #1 to #3 (RAD Ind-Seq using 40 barcodes in total, see Table 1). The DNAs of each of the three sampled populations (all the 40 LSP1, all the 40 LWP1 and all the 40 AWP individuals, respectively) were pooled, and each population pool was barcoded with three different barcodes (nine barcodes in total, library #4). The RAD libraries #1 to #4 were then combined and PE sequenced (2 9 101 bp) on two Illumina HiSeq2000 lanes in the Edinburgh Genomics facility.
To replicate the experiment using only individuals genetically assigned to the LSP and LWP clusters following Burban et al. (2016), we further constructed 1 LSP2 and 1 LWP2 RAD libraries. Library #5 was constructed using 20 LWP2 males individually barcoded and a pool of all the 30 LWP2 DNAs that was identified with ten different barcodes. Finally, library #6 was the counterpart of library #5 for the LSP2 batch. Libraries #5 and #6 were each PE sequenced (2 9 101 pb) on a single Illumina HiSeq2000 lane on The Edinburgh Genomics facility.

| Bioinformatic analyses
Reads were first demultiplexed according to their barcode into individual and pool sequences using the default options of the process_radtags program of the STACKS package (version 0.99994) (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011;Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013), including -q to remove poorquality reads. PCR duplicates were further discarded using the clone_filter program (STACKS version 0.99994). The remaining reads were trimmed by removing the first five bases and keeping the next 90 bases for reads 1 and keeping the first 95 bases for reads 2. The reads originating from the same population identified with different barcodes or from the same individual run on different lanes were merged to increase coverage. We decided to discard three LSP1 and two LWP1 individuals from further analyses because their final coverage was too low, hence resulting in 75 genotyped individuals. The

| Generation of the Ind-Seq SNP data set (gIS)
The RAD Ind-Seq bam files were processed using the mpileup command of SAMTOOLS 0.1.19   The five RAD Pool-Seq bam files were processed using the mpileup command of SAMTOOLS with default options and -d 5000 -q 20. The resulting file was further processed using a custom awk script to compute read counts for each alternative base after discarding bases with a BAQ quality score <25. A position was then considered as variable if (i) it had a coverage of more than 20 and less than c with haploid sample size n j . We further denote similarly y ij the allele count for the reference allele in the sample and n ij the haploid sample size (n ij ≤ n j ) for SNP i in population (pool) j. The pPS data set consists of the y ij 's and n ij 's, which were computed as follows: where ∧ and ∨ stand for the maximum and the minimum, respectively. Note that formally (2) provides the maximum-likelihood estimate of the y ij 's under the assumption that the a ij 's follow a binomial distribution a ij~B in(y ij /n j .n ij ). This approximation thus amounts in assuming equal contribution of each individual of the pool to the Pool-Seq read data.

| Estimation of F ST from Pool-Seq data
Pairwise and across populations F ST were estimated using the estimator by Weir and Cockerham (1984)

| Estimation and visualization of the scaled covariance matrix of the population allele frequencies
To further assess the overall structuring of genetic diversity, we estimated the scaled covariance matrix of allele frequencies (Ω) across LEBLOIS ET AL.
| 267 the five samples using the software BAYPASS (Gautier, 2015) with default options. When applied to read count data (rPS data), the Bayesian model underlying BAYPASS provides an accurate estimate of Ω by integrating over the unobserved allele count estimation. An eigenvalue decomposition of the resulting Ω matrix was further performed using the R function svd() to represent its major axis of variation. This latter approach amounts to performing a PCA that accounts rigorously for the specificities of the Pool-Seq data in the estimation of the covariance matrix.
then computed the (observed) allele frequencies f ij = x ij /n j for each SNP i and sample j, and the overall mean weighted allele frequency missing. Finally, we computed the standardized allele frequency A weighted principal component analysis (PCA) was then carried out based on the matrix M and using w as a (row) weight vector with the dudi.pca() function of the R package ade4 (Chessel, Dufour, & Thioulouse, 2004 a iQ c iQ where for SNP i, a iP (resp. a iQ ) represents the number of reads of the reference allele and c iP (resp. c iQ ) the coverage in population P (resp. Q) with haploid sample size n p . To assess the significance of the departure of each statistic to the null hypothesis (F = 0), Z-scores were computed as the ratio of the c F 3 mean to its standard deviation both estimated over 5,000 bootstrap samples.
2.4.2 | Estimation of tree topology and divergence times using KIMTREE For a given tree topology, we estimated divergence times using KIM-TREE 1.3 (Gautier & Vitalis, 2013), with the standard MCMC parameters recommended in the user manual. KIMTREE is a hierarchical Bayesian model where the allele frequencies are modelled along each branch of a population tree using Kimura's time-dependent diffusion approximation for genetic drift (Kimura, 1964). The support of the different topologies was assessed using a deviance information criterion (DIC) computed as described in Gautier and Vitalis (2013), and up to a constant term, with slight modifications for POOL-SEQ data: where y ij (t) represents the sampled allele count value for SNP i in pool j at the tth MCMC iteration (out of T) and b y ij the posterior mean of the corresponding allele count computed over the T MCMC samples.

| Inference of complex demographic histories using FASTSIMCOAL
To explore more complex demographic scenarios, we analysed the joint site frequency spectrum (SFS) of the three sampled populations using the approach of Nielsen (2000) implemented in FASTSIMCOAL 2.5.2.21 (Excoffier, Dupanloup, Huerta-Sanchez, Sousa, & Foll, 2013). This approach uses coalescent simulations to infer the likelihood of the observed SFS under any demographic model and performs well even in situations where the events are recent (Excoffier et al., 2013). The analyses were run on the folded SFS, that is, using the observed counts of the minor allele, obtained from the pPS data sets of the LSP2, LWP2 and AWP samples (LSP2 and LWP2 were chosen based on the F 3 statistics, see Result section) with a pool size of 30, that is, the haploid size of the smallest pool. We directly estimated the scaled parameters of the models in a coalescent or a diffusion timescale (i.e., 2Nl, 2Nm, T/N and lT) and then inferred the canonical parameters (divergence time T expressed in generations, migration rates and population sizes expressed in number of genes N, i.e., twice the number of diploid individuals) using the mutation rate l = 2.9 9 10 À9 mutations per generation per SNP, as recently estimated for the Lepidoptera Heliconius melpomene (Keightley et al., 2015). Note that this estimate of mutation rate is lower than that of Drosophila melanogaster (Haag-Liautard et al., 2007) and is expected to be more appropriate for our Lepidoptera model spe-  (Gautier, 2015) that are both handling Pool-Seq data. SELESTIM is based on a diffusion approximation for the distribution of allele frequencies in a subdivided population, which explicitly accounts for selection. In particular, SELESTIM assumes that each and every locus is targeted by selection to some extent, and estimates the strength of selection at each locus in each population. For each analysis, twenty-five short pilot runs (1,000 iterations each) were set to adjust the proposal distributions for each model parameter and, after a 100,000 burn-in period, 100,000 updating steps were performed with a thinning interval of 40 steps. Candidate markers under selection were selected on the basis of the distance between the posterior distribution for the locus-specific coefficient of selection and a "centring distribution" derived from the distribution of a genomewide parameter that accounts for the among-locus variation in selection strength.
SELESTIM uses the Kullback-Leibler divergence (KLD) as a distance between the two distributions, which is calibrated using simulations from a posterior predictive distribution based on the observed data (Vitalis et al., 2014). Hereafter, we report candidate markers with KLD values above the 99.9% quantile of the so-obtained empirical distribution of KLD.
In BAYPASS, we identified candidate markers using the XtX differentiation measure (G€ unther & Coop, 2013). This metrics might be viewed as a SNP-specific F ST that explicitly corrects for the scaled covariance of population allele frequencies (matrix Ω), making it robust to the unknown demographic history relating the populations.
The XtX was estimated using default options of BAYPASS. Pairwise correlations of the XtX estimates across ten independent runs were all found to be above 0.995 demonstrating the stability of the estimates. As described in Gautier (2015), the XtX was calibrated based on a posterior predictive distribution obtained by analysing a pseudo-observed data set of 250,000 SNPs generated under the inference model with hyperparameters fixed to their respective posterior means as estimated from the analysis of the original data.
Hereafter, we report candidate markers with XtX values above the 99.9% quantile of the so-obtained empirical distribution of XtX. To identify the population of origin of the signal for overly differentiated SNPs, we examined the posterior means of the standardized population allele frequencies defined for each SNP i and population j as:    suggested a higher variability across the LWP samples than across the LSP ones. They also revealed that some LateSP and introgressed individuals were included in the LWP1 pool that contained individuals that were only phenotypically assigned to their "phenological" population.

| F 3 -based tests of admixture
Three-population tests were carried out for all the 30 possible configurations among the five pool samples (Table S2)   We then analysed the joint SFS of the three populations using the estimated allele count data pPS for different demographic models. Considering a simple model of divergence and drift (Div-Drift), the best-supported tree according to the AIC corresponded to the best-supported tree obtained with KIMTREE (Table S3). In the following steps, we thus only considered the topology (LSP, (LWP, AWP)) ( Table 3) Comparison of AICs for these four demographic models showed that the data strongly supported the DivDriftVarMig model detailed in Figure 3 ( Table S4). Most of the 24 canonical parameters of this latter model, that is, all population sizes and divergence times as well as some migration rates, were inferred with good precision ( Table 3).
The few exceptions concerned some migration rates for which CIs were relatively broad.
Overall, the SFS analyses suggested that the ancestral SP and WP diverged relatively recently, ca. 560 generations ago (with a con- used, which is lower in Heliconius butterflies (Keightley et al., 2015) than in Drosophila (Haag-Liautard et al., 2007). LWP recently experienced a relatively severe contraction while AWP showed an expansion event. Accordingly, negative growth rates (corresponding to expansions in the coalescence analyses) were inferred for all but the LWP. Finally, inferred migration rates were relatively large for the pairs (AWP, LWP) and (LSP, LWP), with especially large values for the migration from LWP to AWP and to a lesser extent from LSP to LWP. On the contrary, inferred migration rates were lowest for the pair (LSP, AWP) as well as for the migration from SP to WP, that is, the ancestral populations.

| Genome-scan for adaptive differentiation
We performed genome scans for adaptive differentiation across the three-population samples (AWP, LWP2 and LSP2) using both the SELESTIM and BAYPASS software packages. Of the 54,040 analysed SNPs (4,170 SNPs from the original rPS data set were discarded as monomorphic in the three analysed population pools), 12 were found outlier with SELESTIM and 73 with BAYPASS; 11 were in common between the two analyses ( Figure 4; Figure S1 and Table S5).

| DISCUSSION
In this study, we analysed an illustrative example of "true allochronic differentiation" (sensu Taylor & Friesen, 2017) Table 3. SP is the ancestral population of LSP; WP is the ancestral population of AWP and LWP; P is the ancestral population of SP and WP; ANC is the ancestral population of P. All populations have undergone past exponential variations in size, except the ANC population that had a constant size through time. Because of the logarithm representation of time and population sizes, these past exponential population size variations appear linear on the graphic. Arrows represent migration from one population to another, their thickness being proportional to the inferred migration rates F I G U R E 4 Results of the genome scans for adaptive differentiation performed with SELESTIM and BAYPASS. For each SNP, the KLD estimated with SELESTIM that quantifies to which extent the locus-specific coefficient of selection is extreme is plotted against the XtX measure of differentiation estimated with BAYPASS. The horizontal (resp. vertical) dotted line indicates the 0.1% significance thresholds for the KLD (resp. XtX) analysis that was determined as the 99.9% quantile of an empirical distribution obtained after analysing a pseudo-observed data set simulated under the SELESTIM (resp. BAYPASS) null model. According to these thresholds, SNPs are displayed in red (outliers based on both the KLD and XtX), in blue (outlier based on the KLD only), in green (outlier based on the XtX only) or in black (not outlier) [Colour figure can be viewed at wileyonlinelibrary.com] sympatric differentiation is to know whether it occurred in the presence or absence of gene flow in the first steps of the divergence, and how migration evolved over time (Powell, Forbes, Hood, & Feder, 2014;Smadja & Butlin, 2011). The question of the levels of gene flow can shed light on the differentiation process and impact the possible fate of the diverging populations. In the particular case of allochronic differentiation, the shift in breeding time can occur progressively, an overlap in reproductive times of the incipient populations then maintaining gene flow (with some similarities between isolation by distance and isolation by time in this case, see Hendry & Day, 2005). Conversely, it can also appear as an abrupt phenological change that would immediately disrupt gene flow and lead to an "automatic" complete assortative mating, acting as an "automatic magic trait" sensu Servedio, Doorn, Kopp, Frame, and Nosil (2011). Our results showed that the first step of the differentiation occurred in a context of highly limited gene flow between the ancestral SP and WP (migration rate 10 À5 to 10 À8 ). This corroborates the hypothesis of a sudden event of divergence, resulting in an immediate barrier to gene flow between the two incipient populations.
Allochrony can in some situations evolve as a by-product of another primary driver of speciation, such as host plant shift followed by alteration of breeding time to match with the new host's phenology (Powell et al., 2014). It is not possible from our results to rule out the hypothesis that the two populations primarily diverged due to other factors, and that allochrony evolved more recently and would now be the main differentiated trait. On the other hand, no host or habitat change is associated with the differentiation of the SP. The land was once covered by mixed forests and shrubs (AFN-Autoridade Florestal Nacional, 2012) and then sowed with P. pinaster during the 13th and early 14th century. The divergence of the SP probably occurred after this large afforestation programme, which took place ca. 700 years ago, when P. pinaster was already predominant in this region. We thus conclude that in the particular case of the pine processionary moth, allochrony can still be hypothesized to be the initial driver of divergence. It is very likely that the periods of adult activity of the two diverging populations did not overlap in the early phases of their differentiation, immediately disrupting gene flow. A scenario of an initial mutation in key genes involved in seasonal rhythms or affecting diapause termination which first occurred by chance and drove the differentiation event in a very limited number of founder individuals can thus be favoured (Schluter, 2009), and would be consistent with the high heritability found in experimental rearing . This information is crucial for our understanding of the allochronic differentiation process.
We obtained a relatively recent estimate of the divergence time, but our results suggest that the SP was already present few hundred years before its discovery in 1997 (Santos, Burban, et al., 2011;Santos et al., 2007). No mention was found in the historical archives of the Mata Nacional de Leiria (MB, pers. obs.), even though these archives contain much information because the national park has been a major wood production area for more than seven centuries. Yet, it is also possible that the ancestral SP evolved in the same region, but outside the limits of the park, and remained undocumented in historical times. In a recent study, Godefroid et al. (2016) showed that the current distribution of the LSP is limited by the high summer temperature occurring elsewhere in Portugal, even though larvae of this population were proved to cope better with higher temperatures than larvae of Portuguese and French WPs (Santos, Paiva, et al., 2011). In the first steps of the differentiation, milder environmental conditions could have favoured the success of the diverging population. Interestingly, a period of colder climate known as the Little Ice Age occurred between years 1300 and 1900, including in Portugal, bringing favourable climatic conditions (Abrantes et al., 2005;Bartels-J onsd ottir, Knudsen, Abrantes, Lebreiro, & Eir ıksson, 2006). Other phenotypic trait divergences between the SP and the WP were documented, with obvious adaptations to the environmental changes experienced by the SP eggs and larvae due to the shift in breeding time (Rocha et al., 2017;Santos, Paiva, Rocha, Kerdelhu e, & Branco, 2013;Santos, Paiva, et al., 2011), consistent with the concept of "adaptation by time" proposed by Hendry and Day (2005). Whether such phenotypic changes occurred over ca. 500 years or whether they occurred over some tens of generations as previously suggested (Santos, Burban, et al., 2011;Santos et al., 2007), these adaptations can still be considered as rapid.

| Recent demographic changes in the diverging population and increased recent gene flow
The best demographic model we obtained further suggested that the SP experienced a recent bottleneck ca. 70 years ago, which reduced the population to a few hundred reproducers at most. The SP then expanded again until its high current population size (between 25,000 and 100,000 individuals). The cause of this recent and drastic reduction in size is difficult to characterize and could be due to a local climatic or epidemiological event or to human activities (e.g., local habitat destruction, forest fire, management options). This bottleneck actually coincides with the recent establishment of intensive planning and forest management in the MNL. The first forest plan dates back from 1892 and was intensified during the 1960s, including management by clear-cuts and development of 120 km of forest roads (AFN-Autoridade Florestal Nacional, 2012). This major demographic event is consistent with the fact that the SP remained undetected in the recent history and was discovered only recently during a very severe and thus conspicuous outbreak in 1997 (Pimentel et al., 2006;Santos et al., 2007). Parallel to the SP history, our model also suggested a complex scenario for the studied WPs. AWP and LWP diverged ca. 200 years ago, with a very strong founder event in Apostic ßa as the estimated population size reached 43 individuals only. This event could be linked to human activities and to the deforestation process that occurred to provision wood and agricultural goods, which dramatically decreased forest land in the region of Lisbon (Devy-Vareta, 1985). This probably tended to fragment the PPM habitat and strongly reduced its populations in the vicinity of Lisbon. It is worth noting that in the recent years, the population size has strongly increased in Apostic ßa, which is consistent with the recent afforestation activity, whereas the Leiria WP tended to decrease. Whether the decline of the WP observed in the MNL could be linked to possible competition between the sympatric summer and winter populations should now be tested. Monitoring tools could moreover allow us to determine if this is a long-term trend or if the local LWP would increase again. On the other hand, our results consistently show that AWP was closely related to LWP, and could not be used as an out-group as we initially planned. A thorough phylogeographic study of Portuguese and/or Iberian populations would now be helpful to understand the genetic structure of populations in this PPM clade (Rousselet et al., 2010) and to develop further demographic analyses.
To complete the picture, our results suggested that some gene flow currently occurs between existing populations. Not surprisingly, in the best demographic model, migration rates were maximal between the two WPs but they were also relatively high in both directions between the two sympatric LSP and LWP (ca. 10 À3 ). This is consistent with the recent identification of few hybrid individuals by Burban et al. (2016). Interestingly, our results suggest that the level of gene flow between the sympatric populations is higher today than in earlier stages of differentiation. This could be explained by the recent geographic and demographic expansion observed in the SP, which could have increased the probability of contact and thus introgression between the two populations. We could also hypothesize that plasticity in reproductive time plays a role by allowing some degree of overlap in reproductive time between the two populations, which can possibly vary over time as occurs in some plants (Devaux & Lande, 2008). Some individuals belonging to the SP but emerging during the LWP reproductive season were recently identified with molecular markers (Burban et al., 2016). Such "LateSP" individuals can only be identified through genotyping and could also allow some introgression between the two populations. The results further showed that assigning individuals from their phenology alone can lead to erroneous mixing of some LateSP individuals in the LWP1 pool, and that robust results could only be obtained when pooling genetically well-characterized individuals. Preliminary observations suggest that some of these LateSP correspond to the last-emerging SP individuals, that is, to events at the tail-end of the distribution of SP emergence time in July, during the early WP season. Other LateSP actually emerge very late, after the WP season, and could correspond to a dysfunction in diapause termination (Burban et al., 2016). The origins and fate of these categories of LateSP remain to be studied.

| Identifying and interpreting signatures of selection
All of the SNPs identified by BAYPASS and SELESTIM as presumably under selection displayed population-specific signatures associated with both the LSP and the LWP, which did not allow us to clearly identify a pattern linked to the phenotypic evolution of the SP. It is likely that the strong drift experienced by the SP and the high level of differentiation between the SP and both LWP and AWP (F ST > 0.3) impedes optimal use of genomic scans of adaptation. A similar challenge in revealing functionally important loci due to a stronger than expected background differentiation was encountered by Lozier, Jackson, Dillon, and Strange (2016) in their study of Bombus colour patterns. Moreover, even if RAD-seq was proved to be a powerful approach to easily develop population genomic studies for nonmodel organisms, the technique only allows us to analyse a reduced proportion of the genome, which increases the likelihood of missing the genomic region truly targeted by selection (Lowry et al., 2017;but see McKinney, Larson, Seeb, & Seeb, 2017). Our study also pointed a major challenge in arthropod genomics, which is the low proportion of functionally annotated genes. We could annotate only two of the genes in the vicinity of the detected candidate SNP, which strongly limits the functional interpretation of the results.
Moreover, the draft genome currently available for T. pityocampa has low scaffold sizes (Gschloessl et al., Unpublished data), which explains why most of the identified SNP were found in different genomic fragments. Improving the genome assembly will greatly increase our analysing capacities.

| Perspectives and future directions
Several studies have recently identified candidate genes involved in circadian and seasonal rhythms and in diapause termination, and their roles and interactions are increasingly understood (Denlinger, 2002;Derks et al., 2015;. In particular, there is increasing evidence that genes involved in circadian rhythms are also involved in reproductive cycles (Fuchikawa et al., 2010;Levy, Kozak, Wadsworth, Coates, & Dopman, 2015;Ragland, Egan, Feder, Berlocher, & Hahn, 2011;Ragland & Keep, 2017). One possible approach will be to target those genes both to resequence them in the SP and WP and possibly identify sequence polymorphisms and to determine whether they are differentially expressed in the allochronic populations at key stages of the development. An alternative approach could be QTL-mapping, which has proved to be a successful strategy in a number of studies (e.g., Alem et al., 2013;Franchini et al., 2014). It is however expected to be tedious in the particular example of the pine processionary moth for which rearing in experimental conditions is a difficult task due to a high mortality, the urticating nature of its larvae and the obligate 1-year generation time (Berardi, Branco, Paiva, Santos, & Battisti, 2015;Branco et al., 2017;Rocha et al., 2017).

ACKNOWLEDG EMENTS
We thank H. Santos for sampling the PPM individuals analysed in this article. We are grateful to B. Gschloessl who facilitated access to the genome assembly and provided the SNP annotations, and to A. Estoup for fruitful discussions in the early steps of the project.

DATA ACCESSIBI LITY
Read count data for the five pool samples and individual genotyping data sets are provided in Table S7.