10.5061/DRYAD.149M8
Ralph, Duncan K.
Fred Hutchinson Cancer Research Center
Matsen IV, Frederick A.
Matsen, Frederick A.
Fred Hutchinson Cancer Research Center
Data from: Consistency of VDJ rearrangement and substitution parameters
enables accurate B cell receptor sequence annotation
Dryad
dataset
2016
B cell maturation
antibody
immunoglobulin
hidden Markov model
2016-01-27T16:00:23Z
en
https://doi.org/10.1371/journal.pcbi.1004409
275946340 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
VDJ rearrangement and somatic hypermutation work together to produce
antibody-coding B cell receptor (BCR) sequences for a remarkable diversity
of antigens. It is now possible to sequence these BCRs in high throughput;
analysis of these sequences is bringing new insight into how antibodies
develop, in particular for broadly-neutralizing antibodies against HIV and
influenza. A fundamental step in such sequence analysis is to annotate
each base as coming from a specific one of the V, D, or J genes, or from
an N-addition (a.k.a. non-templated insertion). Previous work has used
simple parametric distributions to model transitions from state to state
in a hidden Markov model (HMM) of VDJ recombination, and assumed that
mutations occur via the same process across sites. However, codon frame
and other effects have been observed to violate these parametric
assumptions for such coding sequences, suggesting that a non-parametric
approach to modeling the recombination process could be useful. In our
paper, we find that indeed large modern data sets suggest a model using
parameter-rich per-allele categorical distributions for HMM transition
probabilities and per-allele-per-position mutation probabilities, and that
using such a model for inference leads to significantly improved results.
We present an accurate and efficient BCR sequence annotation software
package using a novel HMM “factorization” strategy. This package, called
partis (https://github.com/psathyrella/partis/), is built on a new
general-purpose HMM compiler that can perform efficient inference given a
simple text description of an HMM.
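The annotation step described above, assigning each base to a V, D, or J gene or to an N-addition, can be illustrated with a minimal Viterbi decoder. This is only a toy sketch and not the partis model or its factorization strategy: the three states, the transition and emission probabilities, and the example sequence are all hypothetical values chosen for illustration, and the D region is omitted for brevity. The probabilities are stored as explicit per-state tables (categorical distributions) rather than as a parametric form.

```python
import math


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for the observed sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: predecessor of s on that best path
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy model: V region, then N-addition, then J region.
states = ["V", "N", "J"]
bases = "ACGT"
start = {"V": 1.0, "N": 0.0, "J": 0.0}
trans = {
    "V": {"V": 0.8, "N": 0.2, "J": 0.0},
    "N": {"V": 0.0, "N": 0.5, "J": 0.5},
    "J": {"V": 0.0, "N": 0.0, "J": 1.0},
}
emit = {
    "V": {b: (0.9 if b == "A" else 0.1 / 3) for b in bases},  # toy germline: A
    "N": {b: 0.25 for b in bases},                            # uniform insertion
    "J": {b: (0.9 if b == "T" else 0.1 / 3) for b in bases},  # toy germline: T
}

log_start = {s: log(p) for s, p in start.items()}
log_trans = {s: {t: log(p) for t, p in row.items()} for s, row in trans.items()}
log_emit = {s: {b: log(p) for b, p in row.items()} for s, row in emit.items()}

sequence = "AAACGTTT"
path = viterbi(sequence, states, log_start, log_trans, log_emit)
print(sequence)
print("".join(path))
```

Each printed label under the sequence names the region assigned to that base. Working in log space avoids numerical underflow on longer sequences, and forbidden transitions (probability zero) are represented as negative infinity so they can never appear on the best path.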
Parameters and HMM files
Parameter values and HMM model files for all subsets of all humans in the
Adaptive and Vollmers data sets.
parameters.tgz
Full set of plots
Plots of all alleles for all humans in the Adaptive and Vollmers data sets.
plots.tgz