{
  "id": "https://doi.org/10.5061/dryad.nm290",
  "doi": "10.5061/DRYAD.NM290",
  "url": "https://datadryad.org/dataset/doi:10.5061/dryad.nm290",
  "types": {
    "ris": "DATA",
    "bibtex": "misc",
    "citeproc": "dataset",
    "schemaOrg": "Dataset",
    "resourceType": "dataset",
    "resourceTypeGeneral": "Dataset"
  },
  "creators": [
    {
      "name": "Hickey, John M.",
      "nameType": "Personal",
      "givenName": "John M.",
      "familyName": "Hickey",
      "affiliation": [
        {
          "name": "University of New England",
          "schemeUri": "https://ror.org",
          "affiliationIdentifier": "https://ror.org/02n2ava60",
          "affiliationIdentifierScheme": "ROR"
        }
      ],
      "nameIdentifiers": []
    },
    {
      "name": "Gorjanc, Gregor",
      "nameType": "Personal",
      "givenName": "Gregor",
      "familyName": "Gorjanc",
      "affiliation": [
        {
          "name": "University of Ljubljana",
          "schemeUri": "https://ror.org",
          "affiliationIdentifier": "https://ror.org/05njb9z20",
          "affiliationIdentifierScheme": "ROR"
        }
      ],
      "nameIdentifiers": []
    }
  ],
  "titles": [
    {
      "title": "Data from: Simulated data for genomic selection and genome-wide association studies using a combination of coalescent and gene drop methods"
    }
  ],
  "publisher": {
    "name": "Dryad",
    "schemeUri": "https://ror.org/",
    "publisherIdentifier": "https://ror.org/00x6h5n95",
    "publisherIdentifierScheme": "ROR"
  },
  "container": {},
  "subjects": [
    {
      "subject": "quantitative trait loci (QTL)"
    },
    {
      "subject": "simulation method"
    },
    {
      "subject": "genome-wide association studies (GWAS)"
    },
    {
      "subject": "GenPred"
    },
    {
      "subject": "pedigrees"
    },
    {
      "subject": "shared data resources"
    }
  ],
  "contributors": [],
  "dates": [
    {
      "date": "2014-07-25T18:08:57Z",
      "dateType": "Issued"
    },
    {
      "date": "2014-07-25T18:08:57Z",
      "dateType": "Available"
    }
  ],
  "publicationYear": 2014,
  "language": "en",
  "identifiers": [],
  "sizes": [
    "3408310695 bytes"
  ],
  "formats": [],
  "version": "1",
  "rightsList": [
    {
      "rights": "Creative Commons Zero v1.0 Universal",
      "rightsUri": "https://creativecommons.org/publicdomain/zero/1.0/legalcode",
      "schemeUri": "https://spdx.org/licenses/",
      "rightsIdentifier": "cc0-1.0",
      "rightsIdentifierScheme": "SPDX"
    }
  ],
  "descriptions": [
    {
      "description": "An approach is described for simulating sequence, genotype, and phenotype data for research on genomic selection and genome-wide association studies (GWAS). The simulation method, implemented in a software package called AlphaDrop, can simulate genomic data and phenotypes with flexibility in the historical population structure, the recent pedigree structure, and the distribution of quantitative trait loci effects. It also provides sequence- and single nucleotide polymorphism-phased alleles and genotypes. Ten replicates of a representative scenario used to study genomic selection in livestock were generated and have been made publicly available. The simulated data sets were structured to encompass a spectrum of additive quantitative trait loci effect distributions, relationship structures, and single nucleotide polymorphism chip densities.",
      "descriptionType": "Abstract"
    },
    {
      "description": "File S1 (FileS1.zip):\n1) AlphaDrop: executable for Linux\n2) macs: MaCS executable for Linux\n3) msformatter: MaCS executable for Linux\n4) Seed.txt: a file containing a random seed for initialising AlphaDrop\n5) RunMacs.sh: a shell script called by AlphaDrop when it runs MaCS\n6) AlphaDropSpec.txt: the specification file for AlphaDrop\n7) Pedigree.txt: an example externally supplied pedigree file\n8) MaCsSimulationParameters.xlsx: an Excel sheet with which MaCS parameters can be calculated\n9) Ne100.sh: example of what to put into RunMacs.sh (Ne100 population of Hickey et al., 2011, Genetics Selection Evolution)\n10) Ne1000.sh: example of what to put into RunMacs.sh (Ne1000 population of Hickey et al., 2011, Genetics Selection Evolution)\n\nSimulated Data - Part 1 (SimulatedData_Part1.zip):\nTen replicates of a livestock data structure were simulated. The structure was designed to cover a spectrum of QTL distributions, relationship structures, and SNP densities, and to mimic some of the scenarios where genomic selection is applied. In each replicate, sequence data for 4000 base haplotypes for each of thirty chromosomes were simulated using the Markovian Coalescent Simulator (MaCS) (Chen et al., 2009). The thirty chromosomes were each 100 cM long, comprising approximately 10^8 base pairs. They were simulated using a per-site mutation rate of 2.5 × 10^-8 and an effective population size (Ne) of 100 in the final generation of the sequence simulation. The reduction of Ne in the preceding generations was modeled by setting Ne to 1,256 at 1,000 years ago, 4,350 at 10,000 years ago, and 43,500 at 100,000 years ago, with linear changes in between. This reflects estimates by Villa-Angulo et al. (2009) for the Holstein population. A pedigree of 10 generations was simulated, with 50 sires per generation, 10 dams per sire, and 2 offspring per dam (1,000 individuals per generation). Base individuals in the pedigree had their gametes randomly sampled from the 4000 haplotypes of the sequence simulation. Recombination was allowed according to genetic distance, with a 1% probability of a recombination event per cM. In subsequent generations, gametes were generated through Mendelian inheritance with recombination. The total number of segregating sites across the resulting genome was approximately 1,670,000. A random sample of 60,000 segregating sites was selected from the sequence to be used as SNPs on a 60,000-SNP array. In addition, 9,000 segregating sites were randomly selected from the sequence as candidate QTL in two different ways. One set was sampled without restriction. The other was sampled with the restriction that minor allele frequency could not exceed 0.30. Four different traits were simulated assuming an additive genetic model. The first pair of traits was generated using the 9,000 unrestricted candidate QTL. For the first trait (PolyUnres), the allele substitution effect at each QTL was sampled from a normal distribution with a mean of zero and a standard deviation of one unit. For the second trait (GammaUnres), a random subset of 900 of the candidate QTL was selected. Their allele substitution effects were sampled from a gamma distribution with a shape of 0.4 and a scale of 1.66 (Meuwissen et al., 2001), with a 50% chance of being positive or negative. The second pair of traits (PolyRes and GammaRes) was generated in the same way as the first pair, except that the candidate QTL were the 9,000 sites sampled with the restriction that minor allele frequency could not exceed 0.30. Phenotypes with a heritability of 0.25 were generated for each trait. To keep the heritability of the four traits constant, the residual variance was scaled relative to the variance of the breeding values of individuals in the base generation. That variance was computed as a'a/(n - 1), where a is the vector of breeding values of individuals in the base generation and n is the number of individuals in that generation. For a heritability of 0.25, this implies a residual variance three times the base-generation breeding value variance. Ten replicates of each scenario were simulated.\n\nTraining and validation data sets: Subsets of the data were extracted for training and validation. The training set comprised the 2000 individuals in generations 4 and 5. Three validation sets were extracted. The first (Gen6) comprised 500 individuals sampled at random from generation 6. The second (Gen8) comprised 500 individuals sampled at random from generation 8. The third (Gen10) comprised 500 individuals sampled at random from generation 10.\n\nSimulated Data - Part 2 (SimulatedData_Part2.zip), Part 3 (SimulatedData_Part3.zip), Part 4 (SimulatedData_Part4.zip), and Part 5 (SimulatedData_Part5.zip): each part carries the same description as Part 1.",
      "descriptionType": "Other"
    }
  ],
  "geoLocations": [],
  "fundingReferences": [],
  "relatedIdentifiers": [
    {
      "relationType": "IsCitedBy",
      "relatedIdentifier": "10.1534/g3.111.001297",
      "relatedIdentifierType": "DOI"
    }
  ],
  "relatedItems": [],
  "schemaVersion": "http://datacite.org/schema/kernel-4",
  "providerId": "dryad",
  "clientId": "dryad.dryad",
  "agency": "datacite",
  "state": "findable"
}