A Bos taurus sequencing methods benchmark for assembly, haplotyping, and variant calling - Projet SEQuençage OCCitanie INnovation
Journal article (Data Paper), Scientific Data, Year: 2023

Abstract

Inspired by the production of reference data sets in the Genome in a Bottle project, we sequenced one Charolais heifer with different technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked-reads, and Hi-C. To generate haplotype-resolved assemblies, we also sequenced both parents with short reads. From these data, we built two high-quality haplotyped trio reference genomes and a consensus assembly, using up-to-date software packages. The assemblies obtained using PacBio HiFi reach a size of 3.2 Gb, which is substantially larger than the 2.7 Gb ARS-UCD1.2 reference. The consensus assembly reaches a BUSCO completeness of 95.8% among highly conserved mammalian genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the Charolais breed. These datasets will prove to be useful resources, enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel, or structural variant calling, and de novo assembly.
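
The abstract distinguishes SNPs, indels, and structural variants, with structural variants defined as larger than 50 base pairs. As a rough illustration of that size-based distinction, the following Python sketch classifies a variant from its reference and alternate alleles. The function name `classify_variant`, the `sv_threshold` parameter, and the "MNP" label for same-length substitutions are assumptions made for this example and do not come from the paper; the sketch also assumes explicit allele sequences rather than symbolic alleles, and it is not the pipeline the authors used.

```python
def classify_variant(ref: str, alt: str, sv_threshold: int = 50) -> str:
    """Classify a variant by allele lengths (hypothetical helper, not from the paper).

    Assumes ref and alt are explicit nucleotide sequences, not symbolic
    alleles such as "<DEL>".
    """
    # Single base replaced by a single base.
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"

    # Net number of bases gained or lost.
    size = abs(len(alt) - len(ref))

    # "Larger than 50 base pairs", read here as strictly greater than 50.
    if size > sv_threshold:
        return "SV"
    if size > 0:
        return "indel"

    # Same length, more than one base substituted.
    return "MNP"


if __name__ == "__main__":
    print(classify_variant("A", "G"))
    print(classify_variant("A", "ATG"))
    print(classify_variant("A", "A" + "C" * 51))
```

The strict comparison follows the abstract's wording; some tools treat variants of exactly 50 bp as structural variants, in which case the comparison would be `>=`.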

Domains

Animal biology
Main file
s41597-023-02249-1.pdf (4.78 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-04122201, version 1 (08-06-2023)

Cite

Camille Eché, Carole Iampietro, Clément Birbes, Andreea Dréau, Claire Kuchly, et al. A Bos taurus sequencing methods benchmark for assembly, haplotyping, and variant calling. Scientific Data, 2023, 10 (1), pp.369. ⟨10.1038/s41597-023-02249-1⟩. ⟨hal-04122201⟩