
Acceleration of important sampling methods for the calculation of likelihood in population genetics

Abstract:

• Model: The population evolves under a Wright-Fisher model. Hence, the sample evolves according to the Kingman coalescent.
• Problem: The likelihood is a sum over all possible (unobserved) histories, which cannot be computed exactly in practice.
• Solution: A class of Monte Carlo methods, based on Sequential Importance Sampling (SIS), allows the likelihood to be computed despite the hidden process. The efficiency of these methods has been demonstrated in [1], [2], [3] and [5]. In the IS scheme, the importance sampling distributions propose the histories that contribute most to the sum. But these distributions are not efficient for non-equilibrium population models: the computation time needed for a given accuracy of the likelihood estimate increases strongly, so that a correct estimate cannot be obtained.
• Improvement: For models with a changing population size, we use Sequential Importance Sampling with Resampling (SISR). The idea is to resample during the backward building of the histories, so that we learn which of the histories proposed by the IS distribution really contribute most to the sum, and so save computation time.

Genetic polymorphism modelling

Evolution Model

• A sample of $n$ gene copies at a single locus from a population of effective size $N(t)$.
• For any given locus, each individual has exactly one ancestor in the previous generation.
• The ancestral relationships between the individuals of the sample, going back in time to the MRCA, are described by a gene tree distributed according to the $n$-coalescent.

Figure: Gene tree of microsatellite markers.

Demographic model

We consider a demographic model, never treated before, in which the effective population size varies over time and is denoted $N(t)$. In particular, we work with an Exponentially Contracting Population (ECP). Looking backward in time, we have:

$$
N(t) =
\begin{cases}
N_0 \left( \dfrac{N_{\mathrm{anc}}}{N_0} \right)^{t/D} & \text{if } 0 \le t \le D, \\
N_{\mathrm{anc}} & \text{if } t \ge D.
\end{cases}
$$

Figure: Exponentially contracting population model.

Likelihood of the data

The histories are not observed.
The likelihood of the data is obtained by summing over all the possibilities:

$$
\mathrm{Prob}(n_{\mathrm{obs}} \mid \theta)
= \int_{\mathcal{H}} p_0(n_0) \prod_{\ell=1}^{m+1} p_{s_\ell}(n_\ell \mid n_{\ell-1}) \, f(s_\ell \mid n_{\ell-1}, s_{\ell-1}) \, dH
= \int g(n_{\mathrm{obs}}, H \mid \theta) \, dH,
$$

where:

• $n_{\mathrm{obs}}$: observed data,
• $(n_0, \dots, n_{m+1})$: sequence of count vectors of length $m + 2$, such that $n_m = n_{\mathrm{obs}}$ and $|n_{m+1}| = |n_{\mathrm{obs}}| + 1$,
• $s_0, s_1, s_2, \dots$: jump dates (in forward time),
• $g(n_{\mathrm{obs}}, H \mid \theta) = \mathbf{1}\{H \in \mathcal{H}\} \, p(H)$,
• $\mathcal{H}$: set of histories compatible with the observed data.

Correction of Importance Sampling distribution by resampling (SISR)

A changing effective population size introduces a strong inhomogeneity in the WF model, and the IS distributions become inefficient. We resample within our collection of simulated histories:

• to prune the bad histories,
• to produce multiple copies of the good histories, in order to generate better future histories.

How? We stop the SIS algorithm, which builds the genealogies in parallel, at a given time and modify the composition of the collection of histories according to the partial importance weights at this date. This new algorithm is called SISR, see [4].

Numerical results when comparing SIS and SISR

Our parameter of interest is the vector $(\theta, D, \theta_{\mathrm{anc}})$. We estimate this parameter by maximum likelihood inference. The likelihood of the data is estimated by the SIS or SISR algorithm.

Figure: Comparison of relative Effective Sample Size when the true parameter is $\theta = 0.4$, $D = 0.25$ and $\theta_{\mathrm{anc}} = 40$ (panels: Histogram, Plot).

Comparison of relative bias and Root Mean Square Error (RMSE), analysis with 100 (left) or 2000 (center and right) genealogies, by SIS and SISR, of data sets simulated under the ECP model.

                                        SIS        SISR
Rel. bias   $\theta$                    0.56       0.364
            $D$                         −0.0201    −0.0308
            $\theta_{\mathrm{anc}}$     0.0479     −0.138
RMSE        $\theta$                    0.711      0.557
            $D$                         0.142      0.142
            $\theta_{\mathrm{anc}}$     0.369      0.305

With $\theta = 0.4$, $D = 1.25$ and $\theta_{\mathrm{anc}} = 400$.

                                        SIS        SISR
Rel. bias   $\theta$                    4.92       1.62
            $D$                         −0.0606    0.177
            $\theta_{\mathrm{anc}}$     0.0438     −0.00967
RMSE        $\theta$                    5.17       1.82
            $D$                         0.141      0.417
            $\theta_{\mathrm{anc}}$     0.245      0.21

With $\theta = 0.4$, $D = 0.25$ and $\theta_{\mathrm{anc}} = 400$.

                                        SIS        SISR
Rel. bias   $\theta$                    1.35       0.188
            $D$                         0.191      0.687
            $\theta_{\mathrm{anc}}$     0.0522     0.0196
RMSE        $\theta$                    2.98       1.82
            $D$                         0.442      0.909
            $\theta_{\mathrm{anc}}$     0.322      0.267

With $\theta = 0.4$, $D = 0.25$ and $\theta_{\mathrm{anc}} = 40$.
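The resampling step described under SISR, and the relative Effective Sample Size used to compare the two algorithms, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes plain multinomial resampling, with every surviving history receiving the mean of the pre-resampling weights so that the average weight (the likelihood estimate) is unchanged. The function names `relative_ess` and `resample` are hypothetical, and the partially built histories are stood in for by toy string labels.

```python
import random


def relative_ess(weights):
    """Relative effective sample size: (sum w)^2 / (n * sum w^2).

    Equals 1 when all weights are equal and 1/n when a single
    history carries all the weight.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / (len(weights) * total_sq)


def resample(histories, weights, rng):
    """Multinomial resampling of partially built histories (assumed scheme).

    Histories are drawn with replacement, with probability proportional
    to their partial importance weight: low-weight histories tend to be
    pruned and high-weight ones duplicated. Each resampled history gets
    the mean of the old weights, so the mean weight is preserved.
    """
    n = len(histories)
    mean_weight = sum(weights) / n
    picked = rng.choices(histories, weights=weights, k=n)
    return picked, [mean_weight] * n


# Toy collection: labels stand in for partially built genealogies.
rng = random.Random(0)
histories = ["h0", "h1", "h2", "h3"]
weights = [1e-6, 0.5, 0.0, 0.3]  # partial importance weights at the stopping time

new_histories, new_weights = resample(histories, weights, rng)
print(relative_ess(weights), relative_ess(new_weights))
```

In this toy run the history with zero weight can never be selected, and the relative Effective Sample Size returns to 1 immediately after resampling because all weights are equal again. How often to stop and resample is a separate design choice that this sketch does not address.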

Submitted by: Raphael Leblois
Submitted on: Monday, September 7, 2020 - 16:24:47
Last modified on: Tuesday, February 2, 2021 - 03:34:40
Long-term archiving on: Friday, December 4, 2020 - 17:55:04


Files produced by the author(s)


  • HAL Id : hal-02932305, version 1


Coralie Merle, Raphaël Leblois, J.-M Marin, P Pudlo, F. Rousset. Acceleration of important sampling methods for the calculation of likelihood in population genetics. Day of The Institute for Computational Biology (IBC), May 2014, Montpellier, France. 2014. ⟨hal-02932305⟩




