10.5061/DRYAD.RB1BT3J
Chang, Ching-Ho
University of Rochester
Chavan, Ankita
University of Connecticut
Palladino, Jason
University of Connecticut
Wei, Xiaolu
University of Rochester
Martins, Nuno M. C.
Harvard Medical School
Santinello, Bryce
University of Connecticut
Chen, Chin-Chi
University of Connecticut
Erceg, Jelena
Harvard Medical School
Beliveau, Brian J.
Harvard Medical School
Wu, Chao-Ting
Harvard Medical School
Larracuente, Amanda M.
University of Rochester
Mellone, Barbara G.
University of Connecticut
Data from: Islands of retroelements are major components of Drosophila
centromeres
Dryad
dataset
2019
centromere
Long read sequencing
ChIP-seq
Retroelement
National Science Foundation
https://ror.org/021nxhr62
1330667
2019-05-16T12:14:07Z
2019-05-16T12:14:07Z
en
https://doi.org/10.1371/journal.pbio.3000241
851668393 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Centromeres are essential chromosomal regions that mediate kinetochore
assembly and spindle attachments during cell division. Despite their
functional conservation, centromeres are amongst the most rapidly evolving
genomic regions and can shape karyotype evolution and speciation across
taxa. Although significant progress has been made in identifying
centromere-associated proteins, the highly repetitive centromeres of
metazoans have been refractory to DNA sequencing and assembly, leaving
large gaps in our understanding of their functional organization and
evolution. Here, we identify the sequence composition and organization of
the centromeres of Drosophila melanogaster by combining long-read
sequencing, chromatin immunoprecipitation for the centromeric histone
CENP-A, and high-resolution chromatin fiber imaging. Contrary to previous
models that heralded satellite repeats as the major functional components,
we demonstrate that functional centromeres form on islands of complex DNA
sequences enriched in retroelements that are flanked by large arrays of
satellite repeats. Each centromere displays distinct size and arrangement
of its DNA elements but is similar in composition overall. We discover
that a specific retroelement, G2/Jockey-3, is the most highly enriched
sequence in CENP-A chromatin and is the only element shared among all
centromeres. G2/Jockey-3 is also associated with CENP-A in the sister
species Drosophila simulans, revealing an unexpected conservation despite
the reported turnover of centromeric satellite DNA. Our work reveals the
DNA sequence identity of the active centromeres of a premier model
organism and implicates retroelements as conserved features of centromeric
DNA.
File S1. Custom repeat library
We created a custom Drosophila-specific consensus repeat library modified
from RepBase v20150807 to include all complex satellite DNAs from
Drosophila melanogaster.
File.S1.Chang_et_al.fasta

File S2. ChIPtigs from the R1 library
We created de novo contigs from ChIP-seq reads (ChIPtigs) with SPAdes
v3.11.0 (-t 24 --careful --sc) for the R1 library.
File.S2.Chang_et_al.fasta

File S3. ChIPtigs from the R2 library
We subsampled reads from the R2 ChIP-seq library to 100x coverage using
BBnorm (v37.54) with the parameters "threads=24 prefilter=t target=100",
and created de novo contigs from the subsampled ChIP-seq reads (ChIPtigs)
with SPAdes v3.11.0 (-t 24 --careful --sc).
File.S3.Chang_et_al.fasta

File S4. ChIPtigs from the R3 library
We created de novo contigs from ChIP-seq reads (ChIPtigs) with SPAdes
v3.11.0 (-t 24 --careful --sc) for the R3 library.
File.S4.Chang_et_al.fasta

File S5. ChIPtigs from the R4 library
We created de novo contigs from ChIP-seq reads (ChIPtigs) with SPAdes
v3.11.0 (-t 24 --careful --sc) for the R4 library.
File.S5.Chang_et_al.fasta

File S6. ChIPtigs from the S2 library
We created de novo contigs from ChIP-seq reads (ChIPtigs) with SPAdes
v3.11.0 (-t 24 --careful --sc) for the S2 library.
File.S6.Chang_et_al.fasta

File S7. Hybrid PacBio-Nanopore assembly
We assembled nanopore reads (Solares et al. 2018) and PacBio reads (Kim et
al. 2014) into a hybrid assembly using Canu v1.7 with default settings.
The assembly size is 162,798,260 bp with N50 = 5,104,646 bp.
File.S7.Chang_et_al.fasta

File S8. PacBio-only assembly with extra contigs
We used the PacBio-only assembly from Chang and Larracuente 2018 and added
19 sequences with CENP-A-enriched repeats. Of these 19 sequences, 6 were
contigs from the hybrid PacBio-Nanopore assembly (File S7) and the rest
were error-corrected PacBio reads.
File.S8.Chang_et_al.fasta

File S9. Repeat annotation file for the finished PacBio-only assembly with
extra contigs
We annotated the finished assembly using RepeatMasker 4.06 with our custom
repeat library (-lib library.fasta -s).
File.S9.Chang_et_al.gff.txt

File S10. Gene annotation file for the PacBio-only assembly with extra
contigs
We transferred gene annotations from FlyBase r6.20 to our genome using
BLAT and CrossMap v0.2.5.
File.S10.Chang_et_al.gff

File S11. The sequences of Stellaris probes for Rsp
These Stellaris probes are tagged with Quasar 570 and were used to detect
Rsp sequences.
File.S11.Chang_et_al.txt

File S12. The fasta alignment of genomic IGS sequences from D.
melanogaster and outgroup species
We extracted all IGS elements from the genome using BLAST v2.7.1 with the
parameters "-task blastn -num_threads 24 -qcov_hsp_perc 90" and custom
scripts. We then aligned and manually inspected the IGS sequences using
Geneious v8.1.6.
File.S12.Chang_et_al.fasta

File S13. A fasta alignment of G2/Jockey-3 sequences from different
contigs and outgroup species
We extracted the G2/Jockey-3 sequences based on RepeatMasker annotations
and custom scripts. We then aligned and manually inspected the G2/Jockey-3
sequences using Geneious v8.1.6.
File.S13.Chang_et_al.fasta

File S14. The newick consensus tree of IGS sequences inferred using RAxML
We constructed maximum likelihood phylogenetic trees for IGS using RAxML
v8.2.11 with the parameters "-m GTRGAMMA -T24 -d -p 12345 -# autoMRE -k -x
12345 -f a".
File.S14.Chang_et_al.nwk

File S15. The newick consensus tree of G2/Jockey-3 sequences inferred
using RAxML
We constructed maximum likelihood phylogenetic trees for G2/Jockey-3 using
RAxML v8.2.11 with the parameters "-m GTRGAMMA -T24 -d -p 12345 -# autoMRE
-k -x 12345 -f a".
File.S15.Chang_et_al.nwk

File S16. Oligopaint coordinates and sequences
Oligopaint sequences and information for centromeres X, 3, 4, and Y. The
columns give the centromere contig ID, the start and end coordinates of
the sequence, the oligo sequence, and the melting temperature
(all.oligos.cen.islands). Also included are the same Oligopaint sequences
with 5' and 3' extensions containing the universal primer followed by
library-specific barcodes (oligos.with.adaptors).
File.S16.Chang_et_al.xlsx
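
The description of File S7 quotes a total assembly size and an N50. As a minimal sketch of how such summary values could be recomputed from any of the FASTA files in this dataset, the Python below reads sequence lengths and derives both numbers. The function names (read_fasta_lengths, n50) and the toy sequences are illustrative assumptions, not part of the dataset or of the authors' pipeline, and the code uses the common definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total).

```python
from typing import Dict, Iterable


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Contig length at which the cumulative length of contigs, sorted
    from longest to shortest, first reaches half of the total size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy input for illustration only; these are not sequences from the dataset.
    toy = [">contig_a", "ACGTACGTAC", ">contig_b", "ACGTAC", ">contig_c", "ACGT"]
    lens = read_fasta_lengths(toy)
    print("total size:", sum(lens.values()), "N50:", n50(lens.values()))
```

To apply the sketch to a real file, pass an open file handle instead of the toy list, for example `read_fasta_lengths(open("File.S7.Chang_et_al.fasta"))`, assuming the file has been downloaded to the working directory.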