10.25349/D92C9C
Janzen, Evan
0000-0002-1646-3363
University of California, Santa Barbara
Shen, Yuning
University of California, Santa Barbara
Vazquez-Salazar, Alberto
University of California Los Angeles
Liu, Ziwei
0000-0002-1812-2538
MRC Laboratory of Molecular Biology
Blanco, Celia
University of California Los Angeles
Kenchel, Josh
University of California Los Angeles
Chen, Irene
0000-0001-6040-7927
University of California Los Angeles
Emergent properties as by-products of prebiotic evolution of
aminoacylation ribozymes
Dryad
dataset
2021
FOS: Chemical sciences
k-Seq
prebiotic evolution
aminoacylating ribozymes
Simons Foundation
https://ror.org/01cmst727
290356FY18
National Aeronautics and Space Administration
https://ror.org/027ka1x80
NNX16AJ32G
National Institute of General Medical Sciences
https://ror.org/04q48ey07
DP2GM123457
National Science Foundation
https://ror.org/021nxhr62
1935372
2022-06-08T00:00:00Z
2022-06-08T00:00:00Z
en
https://doi.org/10.1021/jacs.8b13298
https://doi.org/10.1093/nar/gkab199/6194417
https://doi.org/10.1007/s00239-020-09954-0
https://github.com/ichen-lab-ucsb/ClusterBOSS
https://github.com/ichen-lab-ucsb/WFLIVM_k-Seq
85399035664 bytes
6
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The emergence of the genetic code was a major transition in the evolution
from a prebiotic RNA world to the earliest modern cells. A prominent
feature of the standard genetic code is error minimization, or the
tendency of mutations to be unusually conservative in preserving
biophysical features of the amino acid. While error minimization is often
assumed to result from natural selection, it has also been speculated that
error minimization may be a by-product of the emergence of the genetic code.
During establishment of the genetic code in an RNA world,
self-aminoacylating ribozymes would enforce the mapping of amino acids to
anticodons. Here we show that expansion of the genetic code, through
co-option of ribozymes for new substrates, could result in error
minimization as an emergent property. Using self-aminoacylating ribozymes
previously identified during an exhaustive search of sequence space, we
measured the activity of thousands of candidate ribozymes on alternative
substrates (activated analogs for tryptophan, phenylalanine, leucine,
isoleucine, valine, and methionine). Related ribozymes exhibited
preferences for biophysically similar substrates, indicating that
co-option of existing ribozymes to adopt additional amino acids into the
genetic code would itself lead to error minimization. Furthermore,
ribozyme activity was positively correlated with specificity, indicating
that selection for increased activity would also lead to increased
specificity. These results demonstrate that by-products of the evolution
and functional expansion of the ribozyme system would lead to apparently
adaptive properties of the genetic code.
Data were collected from k-Seq experiments using methods similar to those
in https://doi.org/10.1021/jacs.8b13298
and https://doi.org/10.1093/nar/gkab199/6194417 using BWO, BFO, BLO, BIO,
BVO, or BMO as substrates for aminoacylating ribozymes.
This dataset contains all collected data for this project and the scripts
necessary for its analysis:

- BXO_k-seq: Folders for each of six k-Seq experiments performed using BWO,
  BFO, BLO, BIO, BVO, and BMO, each containing:
    - raw.reads: compressed paired-end FASTQ files from triplicate reactions
      (A, B, and C, or D, E, and F) at five substrate concentrations (2, 10,
      50, 250, and 1250 µM)
    - counts: joined and enumerated reads for each sample, generated using
      EasyDIVER
      (https://link.springer.com/article/10.1007/s00239-020-09954-0)
    - bxo-results: analysis of counts files, generated using the k-Seq
      package
      (https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab199/6194417)
- k-seq_inputs: A folder containing raw reads and counts files for the six
  input samples (A-F) used in the k-Seq experiments
- fastq-trim: A folder containing a Bash script (Trimming.sh) and associated
  readme file for preprocessing FASTQ files from k-Seq experiments before
  further processing by EasyDIVER
- k-seq-fitting: A folder containing a preprocessing script
  (count-data-preprocessing.py) and associated readme file for preparing
  counts files for k-Seq fitting, as well as csv files that contain the
  qPCR-measured RNA concentrations from each k-Seq sample (rna-ng.csv) and
  the median RNA concentration for sequences (wildtype, single-, and
  double-mutants of each of five families) in the input samples
  (input-rna-median-ng.csv)
- WFLIVM-k-seq-analysis: A folder containing scripts for processing k-Seq
  fitting results and an associated readme file. These scripts can be used
  to produce the included output file, WFLIVM-k-seq_merged_+r+I.csv, a
  merged csv file that contains the k-Seq fitting results from each
  experiment as well as additional information, including:
    - Associated family of each sequence
    - Calculated catalytic enhancement values and associated 95% confidence
      intervals
    - Additional promiscuity metrics such as aromatic preference and
      promiscuity index (I)
- BXO_selection: Two folders of results from aminoacylation selections
  performed with BFO and BLO, each containing:
    - raw.reads: compressed paired-end FASTQ files (four lanes) from input
      samples (R0) and five rounds of selection (R1-R5) in two replicate
      experiments (A and B)
    - counts: joined and enumerated reads for each sample, generated using
      EasyDIVER
    - clusters: clustered counts files showing sequences grouped by
      similarity, generated using ClusterBOSS
      (https://github.com/ichen-lab-ucsb/ClusterBOSS)
- ClusterBOSS: A folder containing the script (ClusterBOSS.py) and readme
  files for ClusterBOSS

Additional scripts for processing these data can be found at
https://github.com/ichen-lab-ucsb/WFLIVM_k-Seq
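
As a quick orientation to the k-Seq sample design described in these notes
(six substrates, five substrate concentrations, triplicate reactions), the
following minimal Python sketch enumerates the reacted samples one would
expect to find. It is illustrative only and is not part of the deposited
scripts. The function name expected_samples and the default replicate
labels are assumptions made for this example; the actual file names inside
each raw.reads folder are not specified here and are not reproduced.

```python
from itertools import product

# Substrate codes and concentrations are taken from the dataset description.
SUBSTRATES = ["BWO", "BFO", "BLO", "BIO", "BVO", "BMO"]
CONCENTRATIONS_UM = [2, 10, 50, 250, 1250]  # micromolar, a 5-fold series


def expected_samples(replicate_labels=("A", "B", "C")):
    """Return (substrate, concentration_uM, replicate) for the full design.

    Hypothetical helper: the description says replicates are labeled
    A, B, C or D, E, F depending on the experiment, so the labels are a
    parameter rather than a fixed property of any one substrate.
    """
    return list(product(SUBSTRATES, CONCENTRATIONS_UM, replicate_labels))


samples = expected_samples()
print(len(samples))  # 6 substrates x 5 concentrations x 3 replicates = 90
```

Passing replicate_labels=("D", "E", "F") gives the same design for
experiments that used the second set of input samples.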