10.5061/DRYAD.GK90T
Chaney, Julie L.
Notre Dame University
University of Notre Dame
Steele, Aaron
Notre Dame University
University of Notre Dame
Carmichael, Rory
Notre Dame University
University of Notre Dame
Rodriguez, Anabel
Notre Dame University
University of Notre Dame
Specht, Alicia T.
Notre Dame University
University of Notre Dame
Ngo, Kim
Notre Dame University
University of Notre Dame
Li, Jun
Notre Dame University
University of Notre Dame
Emrich, Scott J.
Clark, Patricia L.
Notre Dame University
University of Notre Dame
Emrich, Scott
Notre Dame University
University of Notre Dame
Data from: Widespread position-specific conservation of synonymous rare
codons within coding sequences
Dryad
dataset
2018
Sequence alignment
Protein structure comparison
Protein domains
Gene ontologies
2018-07-24T00:00:00Z
2018-07-24T00:00:00Z
en
https://doi.org/10.1371/journal.pcbi.1005531
538864379 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Synonymous rare codons are considered to be sub-optimal for gene
expression because they are translated more slowly than common codons. Yet
surprisingly, many protein coding sequences include large clusters of
synonymous rare codons. Rare codons at the 5’ terminus of coding sequences
have been shown to increase translational efficiency. Although a general
functional role for synonymous rare codons farther within coding sequences
has not yet been established, several recent reports have identified
rare-to-common synonymous codon substitutions that impair folding of the
encoded protein. Here we test the hypothesis that although the usage
frequencies of synonymous codons change from organism to organism, codon
rarity will be conserved at specific positions in a set of homologous
coding sequences, for example to tune translation rate without altering a
protein sequence. Such conservation of rarity, rather than specific codon
identity, could coordinate co-translational folding of the encoded protein.
We demonstrate that many rare codon cluster positions are indeed conserved
within homologous coding sequences across diverse eukaryotic, bacterial,
and archaeal species, suggesting they result from positive selection and
have a functional role. Most conserved rare codon clusters occur within
rather than between conserved protein domains, challenging the view that
their primary function is to facilitate co-translational folding after
synthesis of an autonomous structural unit. Instead, many conserved rare
codon clusters separate smaller protein structural motifs within
structural domains. These smaller motifs typically fold faster than an
entire domain, on a time scale more consistent with translation rate
modulation by synonymous codon usage. While proteins with conserved rare
codon clusters are structurally and functionally diverse, they are
enriched in functions associated with organism growth and development,
suggesting an important role for synonymous codon usage in organism
physiology. The identification of conserved rare codon clusters advances
our understanding of distinct, functional roles for otherwise synonymous
codons and enables experimental testing of the impact of synonymous codon
usage on the production of functional proteins.
aln_ortho.tar: Alignment data of orthologs (protein) in FASTA format, with
induced gaps represented by a '-'
lookup_tables.tar: Mapping of project short gene IDs to NCBI/Genbank
accessions, including relevant annotation data
ult_mm.tar: Min-max data for each aligned protein, one row per position of
the alignment. This data went into the co-occurrence analysis
minmax_pval_aln.tar: Min max data with p-values per position, across the
orthologs in alignment position space
groupCATH.boundaries: aligned position relative to CATH domains; format is:
ortholog group, IN/OUT, position in domain, position in alignment. If a
given position is not in a domain NA is placed in the domain position
groupSCOP.boundaries: aligned position relative to SCOP domains; format is:
ortholog group, IN/OUT, position in domain, position in alignment. If a
given position is not in a domain NA is placed in the domain position
sig_pruned_masked_p.05: final set of significantly co-occurring codons with
a p-value <= 0.05 after pruning as described in the text
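
The four-field record layout described for the CATH and SCOP boundary files
(ortholog group, IN/OUT, position in domain, position in alignment, with NA
for positions outside a domain) could be read along the lines of the sketch
below. This is only an illustration under stated assumptions: the record does
not say which delimiter the files use, so the sketch accepts commas and/or
whitespace, and the names BoundaryRecord and parse_boundary_line, as well as
the sample lines in the usage note, are hypothetical and not part of the
dataset.

```python
import re
from typing import NamedTuple, Optional


class BoundaryRecord(NamedTuple):
    """One row of a boundaries file (hypothetical container type)."""
    group: str                 # ortholog group identifier
    in_domain: bool            # True for IN, False for OUT
    domain_pos: Optional[int]  # None when the file holds NA
    aln_pos: int               # position in the alignment


def parse_boundary_line(line: str) -> BoundaryRecord:
    """Parse one record of the form: group, IN/OUT, domain pos, alignment pos."""
    # Assumption: fields are separated by commas and/or whitespace;
    # the dataset description does not state the actual delimiter.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")

    group, flag, domain_pos, aln_pos = fields
    if flag not in ("IN", "OUT"):
        raise ValueError(f"expected IN or OUT, got {flag!r}")

    return BoundaryRecord(
        group=group,
        in_domain=(flag == "IN"),
        # NA marks a position that is not inside any domain.
        domain_pos=None if domain_pos == "NA" else int(domain_pos),
        aln_pos=int(aln_pos),
    )
```

For example, a made-up line such as "grp1, IN, 12, 40" would yield a record
with in_domain set to True and domain_pos 12, while "grp1 OUT NA 41" would
yield domain_pos None. If the real files turn out to use a single fixed
delimiter, splitting on that delimiter alone would be the stricter choice.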