Imputation of high-density genotypes in the Fleckvieh cattle population

Genet Sel Evol. 2013 Feb 13;45(1):3. doi: 10.1186/1297-9686-45-3.

Abstract

Background: Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54,000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density.

Methods: Genotypes using 639 214 SNP were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNP except those included in the BovineSNP50 Bead chip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four different scenarios including 50, 100, 200 and 400 animals as reference population. The reference animals were selected to account for 78.03%, 89.21%, 97.47% and > 99% of the gene pool of the genotyped population, respectively.

Results: Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even for scenarios with 50 and 100 reference animals only. Using MaCH and Minimac, the correlation between true and imputed genotypes was > 0.975 with 100 reference animals only. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNP.

Conclusions: Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy. Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Animals, Inbred Strains
  • Cattle / genetics*
  • Data Interpretation, Statistical
  • Genotype*
  • Genotyping Techniques*
  • Male
  • Pedigree
  • Polymorphism, Single Nucleotide
  • Population / genetics