10.5061/DRYAD.5D3RQ
Pennell, Matthew W.
University of Idaho
University of British Columbia
FitzJohn, Richard G.
Macquarie University
Cornwell, William K.
UNSW Sydney
Data from: A simple approach for maximizing the overlap of phylogenetic
and comparative data
Dryad
dataset
2016
data imputation
Embryophyta
phylogenetic comparative method
missing data
2016-11-23T00:00:00Z
2016-11-23T00:00:00Z
en
https://doi.org/10.1111/2041-210X.12517
1043687 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Biologists are increasingly using curated, public data sets to conduct
phylogenetic comparative analyses. Unfortunately, there is often a
mismatch between species for which there is phylogenetic data and those
for which other data are available. As a result, researchers are commonly
forced to either drop species from analyses entirely or else impute the
missing data. A simple strategy to improve the overlap of phylogenetic and
comparative data is to swap species in the tree that lack data with
‘phylogenetically equivalent’ species that have data. While this procedure
is logically straightforward, it quickly becomes very challenging to do by
hand. Here, we present algorithms that use topological and taxonomic
information to maximize the number of swaps without altering the structure
of the phylogeny. We have implemented our method in a new R package,
phyndr, which allows researchers to apply our algorithms to empirical
data sets. The implementation is efficient enough that taxon swaps can be
computed quickly, even for large trees. To facilitate the use of taxonomic
knowledge, we created a separate data package taxonlookup; it contains a
curated, versioned taxonomic lookup for land plants and is interoperable
with phyndr. Emerging online databases and statistical advances are
making it possible for researchers to investigate evolutionary questions
at unprecedented scales. However, in this effort species mismatch among
data sources will increasingly be a problem; evolutionary informatics
tools, such as phyndr and taxonlookup, can help alleviate this issue.
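The swap strategy described above can be illustrated with a minimal sketch. This is not the phyndr implementation (phyndr is an R package and also uses tree topology); it is a deliberately conservative, genus-only simplification written in Python. The function names `genus_of` and `swap_by_genus`, the "Genus_species" label format, and the assumption that every genus is monophyletic are all hypothetical choices made for illustration. A tip is swapped only when it is the sole representative of its genus in the tree, so that the replacement cannot change the tree's structure.

```python
from collections import defaultdict


def genus_of(name):
    # Assumption: labels follow the "Genus_species" convention.
    return name.split("_", 1)[0]


def swap_by_genus(tree_tips, data_species):
    """Map tips that lack data to a congeneric species that has data.

    Conservative rule (an assumption of this sketch): a swap is proposed
    only when the genus has exactly one tip in the tree, that tip has no
    data, and some species of the same genus outside the tree has data.
    """
    tips = set(tree_tips)
    with_data = set(data_species)

    # Group the tree's tips by genus.
    tips_by_genus = defaultdict(list)
    for tip in tree_tips:
        tips_by_genus[genus_of(tip)].append(tip)

    # Species with data that are not already in the tree, grouped by genus.
    candidates = defaultdict(list)
    for species in sorted(with_data - tips):
        candidates[genus_of(species)].append(species)

    swaps = {}
    for genus, genus_tips in tips_by_genus.items():
        only_tip = genus_tips[0]
        if (
            len(genus_tips) == 1
            and only_tip not in with_data
            and candidates.get(genus)
        ):
            # Pick the alphabetically first candidate for determinism.
            swaps[only_tip] = candidates[genus][0]
    return swaps


tree_tips = ["Quercus_alba", "Acer_rubrum", "Pinus_taeda", "Pinus_strobus"]
data_species = ["Quercus_robur", "Acer_rubrum", "Pinus_nigra"]
print(swap_by_genus(tree_tips, data_species))
```

In this toy input, Quercus_alba is the only oak in the tree and lacks data, so it is swapped for Quercus_robur. Acer_rubrum already has data, and Pinus is left alone because the tree contains two pines, so a swap could change the relationships within the genus.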
Land plant taxonomic lookup table
This dataset is a stable version (version 1.0.1) of the dataset contained
in the taxonlookup R package (see
https://github.com/traitecoevo/taxonlookup for the most recent version).
It contains a taxonomic reference table for 16,913 genera of land plants
along with the number of recognized species in each genus.
plant_lookup.csv