10.5061/DRYAD.7PVMCVDVG
Edwards, Scott
0000-0003-2535-6217
Harvard University
Burley, John
Brown University
Orzechowski, Sophie
Harvard University
Sin, Yun Wa
University of Hong Kong
Data from: Whole-genome phylogeography of the Blue-faced honeyeater
(Entomyzon cyanotis) and discovery and characterization of a neo-Z
chromosome
Dryad
dataset
2022
FOS: Earth and related environmental sciences
Harvard University
https://ror.org/03vek6s52
Erasmus Mundus Master Programme in Evolutionary Biology*
Harvard University
2022-06-26T00:00:00Z
2022-06-26T00:00:00Z
en
5542712174 bytes
5
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Whole-genome surveys of genetic diversity and geographic variation often
yield unexpected discoveries of novel structural variation, which
long-read DNA sequencing can help clarify. Here we report on whole-genome
phylogeography of a bird exhibiting classic vicariant geographies across
Australia and New Guinea, the Blue-faced honeyeater (Entomyzon cyanotis),
and the discovery and characterization of a novel neo-Z chromosome by
long-read sequencing. Using short-read genome-wide SNPs, we inferred
population divergence events within E. cyanotis across the Carpentarian
and other biogeographic barriers during the Pleistocene (~0.3 – 1.7 MYA).
Evidence for introgression between non-sister populations supports a
hypothesis of reticulate evolution around a triad of dynamic barriers
around Pleistocene Lake Carpentaria between Australia and New Guinea.
During this phylogeographic survey, we discovered a large (134 Mbp) neo-Z
chromosome and explored its diversity, divergence and introgression
landscape. We show that, as in some Sylvioid passerine birds, a fusion
occurred between chromosome 5 and the Z chromosome to form a neo-Z
chromosome, with the ancestral pseudoautosomal region (PAR) appearing to
become non-recombinant between Z and W, along with most of the fused
chromosome 5 (~37.2 Mbp). The added non-recombinant portion of the neo-Z
displays reduced heterozygosity and faster population genetic
differentiation compared with the ancestral Z. Yet, the new PAR shows
elevated diversity and reduced differentiation compared to autosomes,
potentially resulting from introgression. In our case, long-read
sequencing helped clarify the genomic landscape of population divergence
on autosomes and sex chromosomes in a species where prior knowledge of
genome structure was still incomplete.
We generated VCF files for downstream analyses using the GATK pipeline
(McKenna et al. 2010) and samtools (Li et al. 2009). We generated
estimates of heterozygosity and coverage across scaffolds with samtools.
Sliding-window population genetic statistics were generated using ANGSD
and ngsTools (Fumagalli et al. 2013, 2014; Korneliussen et al. 2014). pixy
was used to calculate population statistics across windows (Korunes and
Samuk 2021). We used SNAPP to generate a coalescent estimate of the
population tree using SNPs (Bryant et al. 2012). We estimated migration
surfaces with EEMS (Petkova et al. 2016). Satsuma was used to align
contigs and scaffolds between species, sexes and different assemblies
(Grabherr et al. 2010). We generated statistics for a phylogenetic network
using TreeMix (Pickrell and Pritchard 2012).

Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N. A., &
RoyChoudhury, A. (2012). Inferring Species Trees Directly from Biallelic
Genetic Markers: Bypassing Gene Trees in a Full Coalescent Analysis.
Molecular Biology and Evolution, 29(8), 1917–1932.
https://doi.org/10.1093/molbev/mss086

Fumagalli, M., Vieira, F. G., Korneliussen, T. S., Linderoth, T.,
Huerta-Sánchez, E., Albrechtsen, A., & Nielsen, R. (2013). Quantifying
population genetic differentiation from next-generation sequencing data.
Genetics, 195(3), 979–992. https://doi.org/10.1534/genetics.113.154740

Fumagalli, M., Vieira, F. G., Linderoth, T., & Nielsen, R. (2014).
ngsTools: Methods for population genetics analyses from next-generation
sequencing data. Bioinformatics, 30(10), 1486–1487.
https://doi.org/10.1093/bioinformatics/btu041

Grabherr, M. G., Russell, P., Meyer, M., Mauceli, E., Alföldi, J.,
Di Palma, F., & Lindblad-Toh, K. (2010). Genome-wide synteny through
highly sensitive sequence alignment: Satsuma. Bioinformatics, 26(9),
1145–1151. https://doi.org/10.1093/bioinformatics/btq102

Korneliussen, T. S., Albrechtsen, A., & Nielsen, R. (2014). ANGSD:
Analysis of Next Generation Sequencing Data. BMC Bioinformatics, 15(1),
356. https://doi.org/10.1186/s12859-014-0356-4

Korunes, K. L., & Samuk, K. (2021). pixy: Unbiased estimation of
nucleotide diversity and divergence in the presence of missing data.
Molecular Ecology Resources, 21(4), 1359–1368.
https://doi.org/10.1111/1755-0998.13326

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N.,
Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data
Processing Subgroup. (2009). The Sequence Alignment/Map format and
SAMtools. Bioinformatics, 25(16), 2078–2079.
https://doi.org/10.1093/bioinformatics/btp352

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K.,
Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., &
DePristo, M. A. (2010). The Genome Analysis Toolkit: A MapReduce
framework for analyzing next-generation DNA sequencing data. Genome
Research, 20(9), 1297–1303. https://doi.org/10.1101/gr.107524.110

Petkova, D., Novembre, J., & Stephens, M. (2016). Visualizing spatial
population structure with estimated effective migration surfaces. Nature
Genetics, 48(1), 94–100. https://doi.org/10.1038/ng.3464

Pickrell, J. K., & Pritchard, J. K. (2012). Inference of Population
Splits and Mixtures from Genome-Wide Allele Frequency Data. PLoS
Genetics, 8(11), e1002967.
See README.txt.
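
The windowed diversity statistics mentioned in the methods can be
illustrated with a minimal sketch. This is not the authors' pipeline and
not pixy's actual code; it only shows the general idea of averaging
pairwise differences over all pairwise comparisons in a window, so that
missing genotypes shrink the denominator instead of biasing the estimate.
The function names, the input layout (1-based position plus a list of 0/1
allele calls, with None for missing) and the example values are all
hypothetical.

```python
# Illustrative only: window-based nucleotide diversity (pi) that accounts
# for missing calls. Input layout and names are assumptions for this sketch.

def site_diffs_and_comps(alleles):
    """Return (pairwise differences, pairwise comparisons) for one site.

    alleles: list of 0/1 allele calls, with None marking a missing call.
    """
    called = [a for a in alleles if a is not None]
    n = len(called)
    comps = n * (n - 1) // 2          # number of allele pairs compared
    n_alt = sum(called)               # count of the alternate allele
    diffs = n_alt * (n - n_alt)       # pairs that carry different alleles
    return diffs, comps


def windowed_pi(sites, window_size):
    """Return {window_start: pi} for non-overlapping windows.

    sites: iterable of (position, alleles) with 1-based positions.
    Invariant sites should be included so they add comparisons.
    Windows with no comparisons get None.
    """
    totals = {}
    for pos, alleles in sites:
        start = ((pos - 1) // window_size) * window_size + 1
        d, c = site_diffs_and_comps(alleles)
        td, tc = totals.get(start, (0, 0))
        totals[start] = (td + d, tc + c)
    return {s: (d / c if c else None) for s, (d, c) in sorted(totals.items())}


# Hypothetical toy data: four sites, four sampled alleles each.
example_sites = [
    (1, [0, 0, 1, 1]),        # variable site
    (2, [0, 0, 0, 0]),        # invariant site, still counted
    (3, [0, 1, None, None]),  # two missing calls
    (11, [0, 0, 0, 1]),       # falls in the second 10-bp window
]
result = windowed_pi(example_sites, window_size=10)
```

Here `result` maps each window start coordinate to its estimate. Real
analyses would read genotypes from the VCF files in this dataset with a
dedicated tool rather than from hand-written lists.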