Acceleration of important sampling methods for the calculation of likelihood in population genetics

Coralie Merle; Raphaël Leblois; J.-M Marin; P Pudlo; F. Rousset

Conference poster, Year: 2014


Coralie Merle

Role: Author

Institut de Mathématiques et de Modélisation de Montpellier

Centre de Biologie pour la Gestion des Populations

Raphaël Leblois

Role: Author
PersonId : 736469
IdHAL : raphael-leblois
ORCID : 0000-0002-3051-4497
IdRef : 114914508

Centre de Biologie pour la Gestion des Populations

J.-M Marin

Role: Author

Institut de Mathématiques et de Modélisation de Montpellier

P Pudlo

Role: Author

Institut de Mathématiques et de Modélisation de Montpellier

Centre de Biologie pour la Gestion des Populations

F. Rousset

Role: Author
PersonId : 19671
IdHAL : francois-rousset
ORCID : 0000-0003-4670-0371
IdRef : 073298182

Institut des Sciences de l'Evolution de Montpellier

Abstract

• Model: The population evolves under a Wright-Fisher model. Hence, the sample evolves according to the Kingman coalescent.
• Problem: The likelihood is a sum over all possible (unobserved) histories, which is not feasible to compute exactly in practice.
• Solution: A class of Monte Carlo methods based on Sequential Importance Sampling (SIS) allows the likelihood to be calculated despite the hidden process. The efficiency of these methods has been demonstrated by [1], [2], [3] and [5]. In the IS scheme, the importance sampling distributions propose the histories that contribute most to the sum. However, these distributions are not efficient for non-equilibrium population models: the computation time required for a given accuracy of the likelihood estimate increases strongly, so that a correct estimate cannot be obtained in practice.
• Improvement: For models with changing population size, we use Sequential Importance Sampling with Resampling (SISR). The idea is to resample during the backward construction of the histories, so that we learn which of the histories proposed by the IS distribution really contribute most to the sum, and thereby save computation time.

Genetic polymorphism modelling

Evolution model

• A sample of n gene copies at a single locus from a population of effective size N(t).
• For any given locus, each individual has exactly one ancestor in the previous generation.
• The ancestral relationships between the individuals of the sample, going back in time to the most recent common ancestor (MRCA), are described by a gene tree distributed according to the n-coalescent.

Figure: Gene tree of microsatellite markers.

Demographic model

We consider a demographic model, never treated before, in which the effective population size varies over time and is denoted N(t). In particular, we work with an Exponentially Contracting Population (ECP). Looking backward in time, we have:

$$
N(t) =
\begin{cases}
N_0 \left( \dfrac{N_{anc}}{N_0} \right)^{t/D} & \text{if } 0 \le t \le D, \\
N_{anc} & \text{if } t \ge D.
\end{cases}
$$

Figure: Exponentially contracting population model.

Likelihood of the data

The histories are not observed.
The likelihood of the data is obtained by summing over all possibilities:

$$
\mathrm{Prob}(n_{obs} \mid \theta)
= \int_{\mathcal{H}} p_0(n_0) \prod_{\ell=1}^{m+1} p_{s_\ell}(n_\ell \mid n_{\ell-1}) \, f(s_\ell \mid n_{\ell-1}, s_{\ell-1}) \, dH
= \int g(n_{obs}, H \mid \theta) \, dH,
$$

where:

• $n_{obs}$: observed data,
• $(n_0, \dots, n_{m+1})$: sequence of count vectors, of length (m + 2), such that $n_m = n_{obs}$ and $|n_{m+1}| = |n_{obs}| + 1$,
• $s_0, s_1, s_2, \dots$: jump dates (in forward time),
• $g(n_{obs}, H \mid \theta) = \mathbf{1}\{H \in \mathcal{H}\} \, p(H)$,
• $\mathcal{H}$: set of histories compatible with the observed data.

Correction of the Importance Sampling distribution by resampling (SISR)

A changing effective population size introduces strong inhomogeneity in the Wright-Fisher model, and the IS distributions become inefficient. We resample within our collection of simulated histories:

• to prune the bad histories,
• to produce multiple copies of the good histories, so as to generate better future histories.

How? We stop the SIS algorithm, which builds the genealogies in parallel, at a given time, and we modify the composition of the collection of histories according to the partial importance weights at that date. This new algorithm is called SISR; see [4].

Numerical results when comparing SIS and SISR

Our parameter of interest is the vector (θ, D, $\theta_{anc}$). We estimate this parameter by maximum likelihood inference, with the likelihood of the data estimated by the SIS or SISR algorithm.

Figure (histogram): Comparison of relative Effective Sample Size when the true parameter is θ = 0.4, D = 0.25 and $\theta_{anc}$ = 40.

Comparison of relative bias and Root Mean Square Error (RMSE), analysis with 100 (left) or 2000 (center and right) genealogies, by SIS and SISR, of data sets simulated under the ECP model.

Statistic    Parameter        SIS        SISR
Rel. bias    θ                0.56       0.364
Rel. bias    D                −0.0201    −0.0308
Rel. bias    $\theta_{anc}$   0.0479     −0.138
RMSE         θ                0.711      0.557
RMSE         D                0.142      0.142
RMSE         $\theta_{anc}$   0.369      0.305

With θ = 0.4, D = 1.25 and $\theta_{anc}$ = 400.

Statistic    Parameter        SIS        SISR
Rel. bias    θ                4.92       1.62
Rel. bias    D                −0.0606    0.177
Rel. bias    $\theta_{anc}$   0.0438     −0.00967
RMSE         θ                5.17       1.82
RMSE         D                0.141      0.417
RMSE         $\theta_{anc}$   0.245      0.21

With θ = 0.4, D = 0.25 and $\theta_{anc}$ = 400.

Statistic    Parameter        SIS        SISR
Rel. bias    θ                1.35       0.188
Rel. bias    D                0.191      0.687
Rel. bias    $\theta_{anc}$   0.0522     0.0196
RMSE         θ                2.98       1.82
RMSE         D                0.442      0.909
RMSE         $\theta_{anc}$   0.322      0.267

With θ = 0.4, D = 0.25 and $\theta_{anc}$ = 40.
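
As an illustration of the ingredients described in the abstract, the following Python sketch shows the effective size N(t) of the exponentially contracting population model, a relative effective sample size, and a single resampling checkpoint. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`population_size`, `relative_ess`, `resample`) are hypothetical, the relative ESS uses the common definition (Σw)² / (K Σw²), and the checkpoint uses plain multinomial resampling, which may differ from the scheme referenced in [4]. Histories are represented by opaque placeholder objects.

```python
import math
import random


def population_size(t, n0, n_anc, d):
    """Effective size N(t), looking backward in time, under the
    exponentially contracting population (ECP) model of the abstract."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t >= d:
        return n_anc
    return n0 * (n_anc / n0) ** (t / d)


def relative_ess(weights):
    """Relative effective sample size, (sum w)^2 / (K * sum w^2).

    Assumed definition: the value lies in (0, 1] and equals 1 when all
    K importance weights are equal.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / (len(weights) * total_sq)


def resample(histories, weights, rng):
    """One multinomial resampling checkpoint (assumed scheme).

    Draws len(histories) partial histories with probability proportional
    to their partial importance weights, so that low-weight histories tend
    to be pruned and high-weight ones duplicated. Every survivor then
    receives the mean weight, which leaves the average weight unchanged.
    """
    k = len(histories)
    mean_weight = sum(weights) / k
    survivors = rng.choices(histories, weights=weights, k=k)
    return survivors, [mean_weight] * k


if __name__ == "__main__":
    rng = random.Random(0)
    histories = ["h1", "h2", "h3", "h4"]      # placeholder partial histories
    weights = [0.97, 0.01, 0.01, 0.01]        # one history dominates
    print("N(D/2):", population_size(0.125, 100.0, 10000.0, 0.25))
    print("relative ESS before:", relative_ess(weights))
    histories, weights = resample(histories, weights, rng)
    print("relative ESS after:", relative_ess(weights))
```

Resetting all weights to their mean after the checkpoint keeps the running average of the weights unchanged while restoring a relative ESS of 1. Whether to resample at fixed times or only when the relative ESS falls below a threshold is a design choice that this sketch leaves open.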

Domains

Evolution [q-bio.PE]; Population genetics [q-bio.PE]

Main file

PosterJourneeIBC.pdf (1.54 MB)

Origin: Files produced by the author(s)


https://hal.inrae.fr/hal-02932305

Submitted on: Monday, 7 September 2020, 16:24:47

Last modified on: Saturday, 27 April 2024, 03:10:07

Long-term archiving on: Friday, 4 December 2020, 17:55:04

Dates and versions

hal-02932305 , version 1 (07-09-2020)

Identifiers

HAL Id : hal-02932305 , version 1

Cite

Coralie Merle, Raphaël Leblois, J.-M Marin, P Pudlo, F. Rousset. Acceleration of important sampling methods for the calculation of likelihood in population genetics. Day of The Institute for Computational Biology (IBC), May 2014, Montpellier, France. 2014. ⟨hal-02932305⟩


Collections

IRD CIRAD EPHE CNRS INRA I3M_UMR5149 ISEM INSMI IMAG-MONTPELLIER AGROPOLIS PSL B3ESTE UNIV-MONTPELLIER INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER CBGP

