
Thursday, May 21, 2015

Fifty years of twin studies

The most interesting aspect of these results is that for many traits there is no detectable non-additivity. That is, gene-gene interactions seem to be insignificant, and a simple linear genetic architecture is consistent with the results.
Meta-analysis of the heritability of human traits based on fifty years of twin studies
Nature Genetics (2015) doi:10.1038/ng.3285

Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts.
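The consistency check behind the "simple and parsimonious model" is elementary: under a purely additive model with no shared environment, the expected twin correlations are rMZ = h^2 and rDZ = h^2/2, so rMZ should be about twice rDZ. A minimal sketch (the input correlations are made-up illustrations, not numbers from the paper):

    # Falconer-style decomposition from twin correlations.
    # Under a purely additive model (no shared environment, no dominance):
    # rMZ = h2 and rDZ = h2/2, so rMZ = 2*rDZ.
    def falconer(r_mz, r_dz):
        h2 = 2 * (r_mz - r_dz)   # additive heritability
        c2 = 2 * r_dz - r_mz     # shared environment
        e2 = 1 - r_mz            # unshared environment + measurement noise
        return h2, c2, e2

    # A trait consistent with pure additivity (rMZ = 2*rDZ):
    print(falconer(r_mz=0.49, r_dz=0.245))   # -> (0.49, 0.0, 0.51)

    # rMZ > 2*rDZ drives the shared-environment estimate negative,
    # the classic signature of non-additive genetic variance:
    print(falconer(r_mz=0.60, r_dz=0.20))    # -> (0.80, -0.20, 0.40)
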
See also Additivity and complex traits in mice:
You may have noticed that I am gradually collecting copious evidence for (approximate) additivity. Far too many scientists and quasi-scientists are infected by the epistasis or epigenetics meme, which is appealing to those who "revel in complexity" and would like to believe that biology is too complex to succumb to equations. ("How can it be? But what about the marvelous incomprehensible beautiful sacred complexity of Nature? But But But ...")

I sometimes explain things this way:

There is a deep evolutionary reason behind additivity: nonlinear mechanisms are fragile and often "break" due to DNA recombination in sexual reproduction. Effects which are only controlled by a single locus are more robustly passed on to offspring. ...

Many people confuse the following statements:

"The brain is complex and nonlinear and many genes interact in its construction and operation."

"Differences in brain performance between two individuals of the same species must be due to nonlinear (non-additive) effects of genes."

The first statement is true, but the second does not appear to be true across a range of species and quantitative traits.
On the genetic architecture of intelligence and other quantitative traits (p.16):
... The preceding discussion is not intended to convey an overly simplistic view of genetics or systems biology. Complex nonlinear genetic systems certainly exist and are realized in every organism. However, quantitative differences between individuals within a species may be largely due to independent linear effects of specific genetic variants. As noted, linear effects are the most readily evolvable in response to selection, whereas nonlinear gadgets are more likely to be fragile to small changes. (Evolutionary adaptations requiring significant changes to nonlinear gadgets are improbable and therefore require exponentially more time than simple adjustment of frequencies of alleles of linear effect.) One might say that, to first approximation, Biology = linear combinations of nonlinear gadgets, and most of the variation between individuals is in the (linear) way gadgets are combined, rather than in the realization of different gadgets in different individuals.

Linear models work well in practice, allowing, for example, SNP-based prediction of quantitative traits (milk yield, fat and protein content, productive life, etc.) in dairy cattle. ...

Sunday, October 11, 2015

Additivity in yeast quantitative traits



A new paper from the Kruglyak lab at UCLA shows yet again (this time in yeast) that population variation in quantitative traits tends to be dominated by additive effects. There are deep evolutionary reasons for this to be the case -- see excerpt below (at bottom of this post). For other examples, including humans, mice, chickens, cows, plants, see links here.
Genetic interactions contribute less than additive effects to quantitative trait variation in yeast (http://dx.doi.org/10.1101/019513)

Genetic mapping studies of quantitative traits typically focus on detecting loci that contribute additively to trait variation. Genetic interactions are often proposed as a contributing factor to trait variation, but the relative contribution of interactions to trait variation is a subject of debate. Here, we use a very large cross between two yeast strains to accurately estimate the fraction of phenotypic variance due to pairwise QTL-QTL interactions for 20 quantitative traits. We find that this fraction is 9% on average, substantially less than the contribution of additive QTL (43%). Statistically significant QTL-QTL pairs typically have small individual effect sizes, but collectively explain 40% of the pairwise interaction variance. We show that pairwise interaction variance is largely explained by pairs of loci at least one of which has a significant additive effect. These results refine our understanding of the genetic architecture of quantitative traits and help guide future mapping studies.


Genetic interactions arise when the joint effect of alleles at two or more loci on a phenotype departs from simply adding up the effects of the alleles at each locus. Many examples of such interactions are known, but the relative contribution of interactions to trait variation is a subject of debate1–5. We previously generated a panel of 1,008 recombinant offspring (“segregants”) from a cross between two strains of yeast: a widely used laboratory strain (BY) and an isolate from a vineyard (RM)6. Using this panel, we estimated the contribution of additive genetic factors to phenotypic variation (narrow-sense or additive heritability) for 46 traits and resolved nearly all of this contribution (on average 87%) to specific genome-wide-significant quantitative trait loci (QTL). ...

We detected nearly 800 significant additive QTL. We were able to refine the location of the QTL explaining at least 1% of trait variance to approximately 10 kb, and we resolved 31 QTL to single genes. We also detected over 200 significant QTL-QTL interactions; in most cases, one or both of the loci also had significant additive effects. For most traits studied, we detected one or a few additive QTL of large effect, plus many QTL and QTL-QTL interactions of small effect. We find that the contribution of QTL-QTL interactions to phenotypic variance is typically less than a quarter of the contribution of additive effects. These results provide a picture of the genetic contributions to quantitative traits at an unprecedented resolution.

... One can test for interactions either between all pairs of markers (full scan), or only between pairs where one marker corresponds to a significant additive QTL (marginal scan). In principle, the former can detect a wider range of interactions, but the latter can have higher power due to a reduced search space. Here, the two approaches yielded similar results, detecting 205 and 266 QTL-QTL interactions, respectively, at an FDR of 10%, with 172 interactions detected by both approaches. In the full scan, 153 of the QTL-QTL interactions correspond to cases where both interacting loci are also significant additive QTL, 36 correspond to cases where one of the loci is a significant additive QTL, and only 16 correspond to cases where neither locus is a significant additive QTL.
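The variance decomposition reported above is easy to mimic in simulation. A toy sketch (the effect sizes, architecture, and regression-based estimator are all invented for illustration; the paper uses variance-component methods on real segregant data):

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4096, 50                            # segregants, marker loci
    X = rng.choice([-1.0, 1.0], size=(n, m))   # BY or RM allele at each locus

    beta = np.zeros(m)
    beta[:10] = rng.normal(0, 0.3, 10)         # ten additive QTL
    pairs = [(0, 1), (2, 11), (30, 31)]        # three QTL-QTL interactions
    gamma = rng.normal(0, 0.25, len(pairs))

    g_add = X @ beta
    g_epi = sum(c * X[:, i] * X[:, j] for c, (i, j) in zip(gamma, pairs))
    y = g_add + g_epi + rng.normal(0, 1.0, n)  # phenotype

    vp = y.var()
    print("additive fraction of variance:   ", round(g_add.var() / vp, 2))
    print("interaction fraction of variance:", round(g_epi.var() / vp, 2))

    # A joint linear model with product terms recovers both kinds of effects:
    Z = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    coef, *_ = np.linalg.lstsq(np.column_stack([X, Z]), y, rcond=None)
    print("estimated interaction effects:", coef[m:].round(2),
          " true:", gamma.round(2))
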
For related discussion of nonlinear genetic models, see here:
It is a common belief in genomics that nonlinear interactions (epistasis) in complex traits make the task of reconstructing genetic models extremely difficult, if not impossible. In fact, it is often suggested that overcoming nonlinearity will require much larger data sets and significantly more computing power. Our results show that in broad classes of plausibly realistic models, this is not the case.
Determination of Nonlinear Genetic Architecture using Compressed Sensing (arXiv:1408.6583)
Chiu Man Ho, Stephen D.H. Hsu
Subjects: Genomics (q-bio.GN); Applications (stat.AP)

We introduce a statistical method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. The computational and data resource requirements are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. Our method uses a generalization of compressed sensing (L1-penalized regression) applied to nonlinear functions of the sensing matrix. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using both real and simulated human genomes.
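A minimal sketch of the core idea, not the authors' implementation: extend the sensing matrix with pairwise products and apply L1-penalized regression (scikit-learn's Lasso here; the architecture, effect sizes, and penalty are all assumptions chosen for illustration):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    n, m = 2000, 100
    X = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # minor-allele counts
    X = (X - X.mean(0)) / X.std(0)                       # standardize

    # Sparse nonlinear architecture: three additive loci, two interactions.
    y = (1.0 * X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2]
         + 0.9 * X[:, 3] * X[:, 4] - 0.7 * X[:, 5] * X[:, 6]
         + rng.normal(0, 1, n))

    # Nonlinear "sensing matrix": all main effects plus pairwise products.
    ii, jj = np.triu_indices(m, k=1)
    F = np.column_stack([X, X[:, ii] * X[:, jj]])        # 100 + 4950 columns

    fit = Lasso(alpha=0.05).fit(F, y)
    hits = np.flatnonzero(np.abs(fit.coef_) > 0.1)
    print("recovered columns:", hits)  # main effects 0,1,2 + the two products
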
I've discussed additivity many times previously, so I'll just quote below from Additivity and complex traits in mice:
You may have noticed that I am gradually collecting copious evidence for (approximate) additivity. Far too many scientists and quasi-scientists are infected by the epistasis or epigenetics meme, which is appealing to those who "revel in complexity" and would like to believe that biology is too complex to succumb to equations. ...

I sometimes explain things this way:

There is a deep evolutionary reason behind additivity: nonlinear mechanisms are fragile and often "break" due to DNA recombination in sexual reproduction. Effects which are only controlled by a single locus are more robustly passed on to offspring. ...

Many people confuse the following statements:

"The brain is complex and nonlinear and many genes interact in its construction and operation."

"Differences in brain performance between two individuals of the same species must be due to nonlinear (non-additive) effects of genes."

The first statement is true, but the second does not appear to be true across a range of species and quantitative traits. On the genetic architecture of intelligence and other quantitative traits (p.16):
... The preceding discussion is not intended to convey an overly simplistic view of genetics or systems biology. Complex nonlinear genetic systems certainly exist and are realized in every organism. However, quantitative differences between individuals within a species may be largely due to independent linear effects of specific genetic variants. As noted, linear effects are the most readily evolvable in response to selection, whereas nonlinear gadgets are more likely to be fragile to small changes. (Evolutionary adaptations requiring significant changes to nonlinear gadgets are improbable and therefore require exponentially more time than simple adjustment of frequencies of alleles of linear effect.) One might say that, to first approximation, Biology = linear combinations of nonlinear gadgets, and most of the variation between individuals is in the (linear) way gadgets are combined, rather than in the realization of different gadgets in different individuals.

Linear models work well in practice, allowing, for example, SNP-based prediction of quantitative traits (milk yield, fat and protein content, productive life, etc.) in dairy cattle. ...
See also Explain it to me like I'm five years old.

Thursday, November 20, 2014

Additivity and complex traits in mice



How well can we predict complex phenotypes in mice from genomic data? The figure below, from a recent Nature Reviews Genetics paper (Speed and Balding, doi:10.1038/nrg3821), shows prediction accuracies for a set of 139 traits -- including behavioral and disease phenotypes. Significant chunks of heritability are easily captured by linear models with additive effects. The population of mice used in the study is derived from crosses of 8 original inbred strains (see photo). For similar predictive results in dairy cows, see here.

This suggests that human population variation in complex traits is also likely to be approximately linear: mostly due to additive genetic effects. You may have noticed that I am gradually collecting copious evidence for additivity. Far too many scientists and quasi-scientists are infected by the epistasis or epigenetics meme, which is appealing to those who "revel in complexity" and would like to believe that biology is too complex to succumb to equations. ("How can it be? But what about the marvelous incomprehensible beautiful sacred complexity of Nature? But But But ...")

I sometimes explain things this way:
There is a deep evolutionary reason behind additivity: nonlinear mechanisms are fragile and often "break" due to DNA recombination in sexual reproduction. Effects which are only controlled by a single locus are more robustly passed on to offspring. ...

Many people confuse the following statements:

"The brain is complex and nonlinear and many genes interact in its construction and operation."

"Differences in brain performance between two individuals of the same species must be due to nonlinear (non-additive) effects of genes."

The first statement is true, but the second does not appear to be true across a range of species and quantitative traits.
See also discussion in section 3 of my paper On the genetic architecture of intelligence and other quantitative traits.



Compare to the additive heritability estimates below. Note the different K's correspond to different choices of genetic similarity matrices (GSMs; see paper). Just ignore all the dots except the ones with largest r2 or h2 for each phenotype. All of the underlying predictive models are linear. It is possible that some phenotypes have even greater broad sense (including non-additive) heritability and that nonlinear models will be required to capture this variation.
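For concreteness, the standard GCTA-style genetic similarity matrix is built as below; this is a sketch of only the default construction, since Speed and Balding compare several weighting schemes:

    import numpy as np

    def grm(G):
        """Genetic similarity from an n x p matrix of allele counts (0/1/2):
        standardize each SNP, then K = Z Z^T / p."""
        freq = G.mean(axis=0) / 2.0
        Z = (G - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
        return Z @ Z.T / G.shape[1]

    rng = np.random.default_rng(2)
    G = rng.binomial(2, rng.uniform(0.1, 0.5, 500), size=(100, 500))
    K = grm(G)
    print(K.shape, round(K.diagonal().mean(), 2))   # (100, 100), diagonal ~1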



Some examples of behavioral traits measured in this mouse population.
EPM: (maze) distance travelled, time spent, and entries into closed and open arms
FN: time taken to sample a novel foodstuff (overnight food deprivation)
Burrowing: Number of pellets removed from burrow in 1.5 hours
Activity: Activity measured in a home cage in 30 minutes
Startle: Startle to a loud noise
Context freezing: Freezing to the context in which a tone is associated with a foot shock
Cue freezing: Freezing to a tone after association with a foot shock

Monday, January 09, 2012

"Phantom" heritability


The mystery of missing heritability: Genetic interactions create phantom heritability

Or Zuk, Eliana Hechter, Shamil R. Sunyaev, and Eric S. Lander


Human genetics has been haunted by the mystery of “missing heritability” of common traits. Although studies have discovered >1,200 variants associated with common diseases and traits, these variants typically appear to explain only a minority of the heritability. The proportion of heritability explained by a set of variants is the ratio of (i) the heritability due to these variants (numerator), estimated directly from their observed effects, to (ii) the total heritability (denominator), inferred indirectly from population data. The prevailing view has been that the explanation for missing heritability lies in the numerator—that is, in as-yet undiscovered variants. While many variants surely remain to be found, we show here that a substantial portion of missing heritability could arise from overestimation of the denominator, creating “phantom heritability.” Specifically, (i) estimates of total heritability implicitly assume the trait involves no genetic interactions (epistasis) among loci; (ii) this assumption is not justified, because models with interactions are also consistent with observable data; and (iii) under such models, the total heritability may be much smaller and thus the proportion of heritability explained much larger. For example, 80% of the currently missing heritability for Crohn's disease could be due to genetic interactions, if the disease involves interaction among three pathways. In short, missing heritability need not directly correspond to missing variants, because current estimates of total heritability may be significantly inflated by genetic interactions. Finally, we describe a method for estimating heritability from isolated populations that is not inflated by genetic interactions.
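The mechanism is easy to see in a simulation of the limiting-pathway idea: make the trait the minimum of k additive pathways, and the Falconer estimate from twin correlations overshoots the true additive heritability. A Monte Carlo sketch (all parameters invented; not the paper's exact model):

    import numpy as np
    rng = np.random.default_rng(3)

    k, h2p, n = 3, 0.5, 200_000   # pathways, per-pathway heritability, pairs

    def twin_r(r_g):
        """Correlation of trait = min over k pathways, genetic sharing r_g."""
        g1 = rng.normal(0, np.sqrt(h2p), (n, k))
        g2 = r_g * g1 + np.sqrt(1 - r_g**2) * rng.normal(0, np.sqrt(h2p), (n, k))
        e1, e2 = (rng.normal(0, np.sqrt(1 - h2p), (n, k)) for _ in range(2))
        return np.corrcoef((g1 + e1).min(1), (g2 + e2).min(1))[0, 1]

    h2_pop = 2 * (twin_r(1.0) - twin_r(0.5))   # Falconer, from MZ/DZ pairs

    # True additive h2: variance of the best linear predictor of the trait
    # from the pathway genetic values (same coefficient each, by symmetry).
    g = rng.normal(0, np.sqrt(h2p), (n, k))
    t = (g + rng.normal(0, np.sqrt(1 - h2p), (n, k))).min(1)
    b = np.cov(t, g[:, 0])[0, 1] / h2p
    h2_true = b**2 * k * h2p / t.var()

    print("apparent (twin) h2:", round(h2_pop, 2),
          " true additive h2:", round(h2_true, 2))
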


This new paper by Eric Lander and collaborators is attracting a fair amount of interest: gnxp, genetic inference, genomes unzipped. The paper is discussed at some length at the links above. I will just make a few comments.

1. The non-additive models analyzed in the paper require significant shared environment correlations to mask non-additivity and be consistent with data that (at face value) support additivity. See Table 7 in the Supplement. This level of environmental effect is, in the cases of height and g, probably excluded by adoption studies, although it may still be allowed for many disease traits. To put this another way, even after reading this paper I do not know of any models consistent with what is known about height and g that do not have a large additive component (e.g., of order 50 percent of total variance).

2. The criticisms of Hill, Goddard, and Visscher (2008; also discussed previously here), found in section 11 of the Supplement, are, to my mind, rather weak. To quote a string theorist friend: "It is nothing more than the calculus of words" ;-) In particular, I flat out disagree with the following (p.46 of the Supplement):

The problem with this reasoning is: As the population grows (and the typical locus tends toward monomorphism), typical traits involving typical loci become very boring! They not only have low interaction variance VAA, they also have very low total genetic variance VG. That is, the typical trait doesn't vary much in the population! In effect, Hill et al.'s theory thus actually describes what happens for rare traits caused by a few rare variants. Not surprisingly, interactions account for a small proportion of the variance for such traits.

[[ Nope, one could also take the allele frequencies to zero and the number of causal variants to infinity, keeping the population variance fixed! This seems to be what happens in the real world with quantitative traits like height and g, which have thousands of causal variants, each of small effect. ]]

"Doesn't vary very much" is not well-defined: relative to what? What if the genetic variance in this limit is still much larger than the environmental component? Do height and IQ "vary very much" in human populations? Having only moderately rare variants (e.g., MAF = .1-.2), but many of them, is consistent with normally distributed population variation and small non-additive effects (.2 squared is 4 percent). Below is figure 9 from the Supplement -- click for larger version. As the frequency p approaches zero (or unity) the additive variance (green curve) dominates and the non-additive part becomes small (blue curve). Whether the total genetic variance (red curve) is big or small might be defined relative to the size of environmental effects, which are not shown. Note the green and blue curves are dimensionless ratios of variances, whereas the red curve ultimately (after multiplication by effect size) has real units like cm of height or IQ points.




The essence of Hill et al. is discussed in the earlier post (see comments).

Yes, one of the main points of the paper I cited is that one can have strong epistasis at the level of individual genes, but if variants are rare, the effect in a population will be linear.

"These two examples, the single locus and A x A model, illustrate what turns out to be the fundamental point in considering the impact of the gene frequency distribution. When an allele (say C) is rare, so most individuals have genotype Cc or cc, the allelic substitution or average effect of C vs. c accounts for essentially all the differences found in genotypic values; or in other words the linear regression of genotypic value on number of C genes accounts for the genotypic differences (see [3], p 117)." [p.5]


Note Added: For more on additivity vs epistasis, I suggest this talk by James Crow. Among other things he makes an evolutionary argument for why we should expect to find lots of additive variation at the population or individual level, despite the presence of lots of epistasis at the gene level. It is much more difficult for evolution to act on non-additive variance than on additive variance in a sexually reproducing species.

Thursday, October 16, 2014

Genius (Nautilus Magazine)



The article excerpted below, in the science magazine Nautilus, is an introduction to certain ideas from my paper On the genetic architecture of intelligence and other quantitative traits.
Super-Intelligent Humans Are Coming (Nautilus, special issue: Genius)

Genetic engineering will one day create the smartest humans who have ever lived.


Lev Landau, a Nobelist and one of the fathers of a great school of Soviet physics, had a logarithmic scale for ranking theorists, from 1 to 5. A physicist in the first class had ten times the impact of someone in the second class, and so on. He modestly ranked himself as 2.5 until late in life, when he became a 2. In the first class were Heisenberg, Bohr, and Dirac among a few others. Einstein was a 0.5!

My friends in the humanities, or other areas of science like biology, are astonished and disturbed that physicists and mathematicians (substitute the polymathic von Neumann for Einstein) might think in this essentially hierarchical way. Apparently, differences in ability are not manifested so clearly in those fields. But I find Landau’s scheme appropriate: There are many physicists whose contributions I cannot imagine having made.

I have even come to believe that Landau’s scale could, in principle, be extended well below Einstein’s 0.5. The genetic study of cognitive ability suggests that there exist today variations in human DNA which, if combined in an ideal fashion, could lead to individuals with intelligence that is qualitatively higher than has ever existed on Earth: Crudely speaking, IQs of order 1,000, if the scale were to continue to have meaning.

... Does g predict genius? Consider the Study of Mathematically Precocious Youth, a longitudinal study of gifted children identified by testing (using the SAT, which is highly correlated with g) before age 13. All participants were in the top percentile of ability, but the top quintile of that group was at the one in 10,000 level or higher. When surveyed in middle age, it was found that even within this group of gifted individuals, the probability of achievement increased drastically with early test scores. For example, the top quintile group was six times as likely to have been awarded a patent as the lowest quintile. Probability of a STEM doctorate was 18 times larger, and probability of STEM tenure at a top-50 research university was almost eight times larger. It is reasonable to conclude that g represents a meaningful single-number measure of intelligence, allowing for crude but useful apples-to-apples comparisons.

... Once predictive models are available, they can be used in reproductive applications, ranging from embryo selection (choosing which IVF zygote to implant) to active genetic editing (for example, using CRISPR techniques). In the former case, parents choosing between 10 or so zygotes could improve the IQ of their child by 15 or more IQ points. This might mean the difference between a child who struggles in school, and one who is able to complete a good college degree. Zygote genotyping from single cell extraction is already technically well developed, so the last remaining capability required for embryo selection is complex phenotype prediction. The cost of these procedures would be less than tuition at many private kindergartens, and of course the consequences will extend over a lifetime and beyond.
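The arithmetic behind a figure like "15 or more IQ points from 10 zygotes" is the expected maximum of n draws: sibling (embryo) genetic values scatter about the parental midpoint with SD of roughly sqrt(h^2/2) x 15 points. A sketch assuming a perfect predictor and my own choice of h^2 (an imperfect predictor scales the gain down by its correlation with the true genetic value):

    import numpy as np
    rng = np.random.default_rng(5)

    n_embryos, h2, sd_iq = 10, 0.7, 15.0
    sib_sd = np.sqrt(h2 / 2) * sd_iq   # genetic SD among siblings (~8.9 pts)

    # E[max of n standard normals], by Monte Carlo (~1.54 for n = 10):
    e_max = rng.normal(0, 1, (1_000_000, n_embryos)).max(axis=1).mean()
    print(f"expected gain from best-of-{n_embryos}: "
          f"{e_max * sib_sd:.1f} IQ points")
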

The corresponding ethical issues are complex and deserve serious attention in what may be a relatively short interval before these capabilities become a reality. Each society will decide for itself where to draw the line on human genetic engineering, but we can expect a diversity of perspectives. Almost certainly, some countries will allow genetic engineering, thereby opening the door for global elites who can afford to travel for access to reproductive technology. As with most technologies, the rich and powerful will be the first beneficiaries. Eventually, though, I believe many countries will not only legalize human genetic engineering, but even make it a (voluntary) part of their national healthcare systems.

The alternative would be inequality of a kind never before experienced in human history.



Note Added: I posted the following in the comments at the Nautilus site and also on Hacker News (ycombinator), which has a big thread.
The question of additivity of genetic effects is discussed in more detail in reference [1] above (sections 3.1 and also 4): http://arxiv.org/pdf/1408.3421...

In plant and animal genetics it is well established that the majority of phenotype variance (in complex traits) which is under genetic control is additive. (Linear models work well in species ranging from corn to cows; cattle breeding is now done using SNP genotypes and linear models to estimate phenotypes.) There are also direct estimates of the additive / non-additive components of variance for human height and IQ, from twin and sibling studies. Again, the conclusion is the majority of variance is due to additive effects.

There is a deep evolutionary reason behind additivity: nonlinear mechanisms are fragile and often "break" due to DNA recombination in sexual reproduction. Effects which are only controlled by a single locus are more robustly passed on to offspring. Fisher's fundamental theorem of natural selection says that the rate of change of fitness is controlled by additive variance in sexually reproducing species under relatively weak selection.

Many people confuse the following statements:

"The brain is complex and nonlinear and many genes interact in its construction and operation."

"Differences in brain performance between two individuals of the same species must be due to nonlinear effects of genes."

The first statement is true, but the second does not appear to be true across a range of species and quantitative traits.

Final technical comment: even the nonlinear part of the genetic architecture can be deduced using advanced methods in high dimensional statistics (see section 4.2 in [1] and also http://arxiv.org/abs/1408.6583...).

##################

I just realized I've said all of this already in http://arxiv.org/pdf/1408.3421... (p.16):

... The preceding discussion is not intended to convey an overly simplistic view of genetics or systems biology. Complex nonlinear genetic systems certainly exist and are realized in every organism. However, quantitative differences between individuals within a species may be largely due to independent linear effects of specific genetic variants. As noted, linear effects are the most readily evolvable in response to selection, whereas nonlinear gadgets are more likely to be fragile to small changes. (Evolutionary adaptations requiring significant changes to nonlinear gadgets are improbable and therefore require exponentially more time than simple adjustment of frequencies of alleles of linear effect.) One might say that, to first approximation, Biology = linear combinations of nonlinear gadgets, and most of the variation between individuals is in the (linear) way gadgets are combined, rather than in the realization of different gadgets in different individuals.

Linear models work well in practice, allowing, for example, SNP-based prediction of quantitative traits (milk yield, fat and protein content, productive life, etc.) in dairy cattle. ...

Sunday, October 19, 2014

ASHG 2014


Let me know if you're in SD and want to meet up! Yaniv Erlich is talking about additivity vs epistasis right now :-)

Tuesday, December 30, 2014

Measuring missing heritability: Inferring the contribution of common variants


This recent paper from Eric Lander proposes an alternative to GCTA. There is an interesting change in tone vis-à-vis an earlier paper with Zuk. Instead of speculating about explanations of missing heritability (beyond the existence of yet undiscovered common variants of small effect), the paper focuses on the claim that REML/GCTA underestimates the heritability due to common variants in case-control designs. The proposed alternative methodology, called phenotype correlation–genotype correlation (PCGC) regression, estimates heritability by directly regressing phenotype correlation vs genotype correlation across all pairs in the sample. (This is how I usually explain the concept behind GCTA when I don't want to get into details of REML, LMMs, etc.)

Personally, I am not especially concerned about the precise value of heritability estimates from REML/GCTA or PCGC, as there are significant uncertainties that go beyond the simple additive model assumed in both of these methods (e.g., due to nonlinear genetic architecture). For me it is sufficient that the results of both are consistent with classical estimates from twin and adoption studies, and yield h2 ~ 0.5 or higher for many interesting traits.
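The regression idea in its simplest quantitative-trait form -- essentially Haseman-Elston regression, which is how I describe it above -- is a few lines; the paper's actual contribution, the correction for case-control ascertainment, is not shown in this sketch:

    import numpy as np
    rng = np.random.default_rng(6)

    n, p, h2 = 1000, 5000, 0.5
    G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    Z = (G - G.mean(0)) / G.std(0)
    y = Z @ rng.normal(0, np.sqrt(h2 / p), p) + rng.normal(0, np.sqrt(1 - h2), n)
    y = (y - y.mean()) / y.std()

    K = Z @ Z.T / p                        # genotype correlation, all pairs
    iu = np.triu_indices(n, k=1)           # each distinct pair once
    prod = np.outer(y, y)[iu]              # phenotype "correlation" per pair
    slope = (K[iu] * prod).sum() / (K[iu] ** 2).sum()  # regression through 0
    print("h2 estimate:", round(slope, 2)) # ~0.5, up to sampling noise
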
Measuring missing heritability: Inferring the contribution of common variants (PNAS)

D. Golan, E. S. Lander, and S. Rosset

Studies have identified thousands of common genetic variants associated with hundreds of diseases. Yet, these common variants typically account for a minority of the heritability, a problem known as “missing heritability.” Geneticists recently proposed indirect methods for estimating the total heritability attributable to common variants, including those whose effects are too small to allow identification in current studies. Here, we show that these methods seriously underestimate the true heritability when applied to case–control studies of disease. We describe a method that provides unbiased estimates. Applying it to six diseases, we estimate that common variants explain an average of 60% of the heritability for these diseases. The framework also may be applied to case–control studies, extreme-phenotype studies, and other settings.
From the conclusion:
... Our results suggest that larger CVASs [GWAS] will identify many additional common variants related to common diseases, although many additional common variants likely still will have effect sizes that fall below the limits of detection given practically achievable sample sizes. Still, common variants clearly will not explain all heritability. As discussed in the first two papers in this series (2,3), rare genetic variants and genetic interactions likely will make important contributions as well. Fortunately, advances in DNA sequencing technology should make it possible in the coming years to carry out comprehensive studies of both common and rare genetic variants in tens (and possibly hundreds) of thousands of cases and controls, resulting in a fuller picture of the genetic architecture of common diseases.
Hopefully, more papers like this one will help the field of genomics to update its priors: the most reasonable hypothesis concerning "missing heritability" is simply that larger sample size is required to find the many remaining alleles of small effect. Fisher's infinitesimal model will turn out to be a good first approximation for most human traits. See also Additivity and complex traits in mice.

Thursday, July 22, 2010

Assortative mating, regression and all that: offspring IQ vs parental midpoint

In an earlier post I did a lousy job of trying to estimate the effect of assortative mating on the far tail of intelligence.

Thankfully, James Lee, a real expert in the field, sent me a current best estimate for the probability distribution of offspring IQ as a function of parental midpoint (average of the parents' IQs). James is finishing his Ph.D. at Harvard under Steve Pinker -- you might have seen his review of R. Nisbett's book Intelligence and how to get it: Why schools and cultures count.

The results are stated further below. Once you plug in the numbers, you get (roughly) the following:

Assuming parental midpoint of n SD above the population average, the kids' IQ will be normally distributed about a mean which is around +.6n with residual SD of about 12 points. (The .6 could actually be anywhere in the range (.5, .7), but the SD doesn't vary much from choice of empirical inputs.)

So, e.g., for n = 4 (parental midpoint of 160 -- very smart parents!), the mean for the kids would be 136, with only a few percent chance that any given kid surpasses 160 (which requires a +2 SD fluctuation). For n = 3 (parental midpoint of 145), the mean for the kids would be 127 and the probability of exceeding 145 is less than 10 percent.

No wonder so many physicists' kids end up as doctors and lawyers. Regression indeed! ;-)

Below are some more details; see here for calculations. In my earlier post I arrived at the same formulae as below, but I had rho = 0.

Assuming bivariate normality (and it appears that IQ has been successfully scaled to produce this), the offspring density function is normal with mean n*h^2 and variance 1-(1/2)(1+rho)h^2, where rho is the correlation between mates attributable to assortative mating and h^2 is the narrow-sense heritability.

I put h^2 between .5 and .7. Bouchard and McGue found a median correlation between husband and wife of .33 in their review many years back, but not all of that may be attributable to assortative mating. So anything in (.20, .25) may be a reasonable guesstimate for rho.
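Plugging in the numbers (a sketch; h^2 and rho are set to the midpoints of the ranges above):

    from math import sqrt
    from statistics import NormalDist

    def offspring(n_sd, h2=0.6, rho=0.225, sd_iq=15.0):
        """Offspring IQ distribution given parental midpoint n_sd SDs above
        the mean, per the bivariate-normal result quoted above."""
        mean = 100 + n_sd * h2 * sd_iq
        sd = sqrt(1 - 0.5 * (1 + rho) * h2) * sd_iq
        return mean, sd

    for n in (3, 4):
        mean, sd = offspring(n)
        midpoint = 100 + n * 15
        p = 1 - NormalDist(mean, sd).cdf(midpoint)
        print(f"midpoint {midpoint}: kids ~ N({mean:.0f}, {sd:.1f}), "
              f"P(child > {midpoint}) = {p:.0%}")
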

In discussing this topic with smart and accomplished parents (e.g., at foo camp, in academic science, or on Wall Street), I've noticed very strong interest in the results ...

See related posts: mystery of non-shared environment, regression to the mean.

Note: Some people are confused that the value of h^2 = narrow sense (additive) heritability is not higher than (.5 - .7). You may have seen *broad sense* heritability H^2 estimated at values as large as .8 or .9 (e.g., from twin studies). But H^2 includes genetic sources of variation such as dominance and epistasis (interactions between genes, which violate additivity). Because children are not clones of their parents (they only get half of their genes from each parent, and in a random fashion), the correlation between midparent IQ and offspring IQ is not as large as the correlation between the IQs of identical twins. See here and here for more.

Thursday, September 01, 2011

Epistasis vs additivity

Continuing the discussion from my previous post: strong interactions at the level of individual genes do not preclude a linear (additive) analysis of population variation and natural selection.

On epistasis: why it is unimportant in polygenic directional selection

[Phil. Trans. R. Soc. B (2010) 365, 1241–1244 doi:10.1098/rstb.2009.0275]

James F. Crow*
Genetics Laboratory, University of Wisconsin, Madison, WI 53706, USA

There is a difference in viewpoint of developmental and evo-devo geneticists versus breeders and students of quantitative evolution. The former are interested in understanding the developmental process; the emphasis is on identifying genes and studying their action and interaction. Typically, the genes have individually large effects and usually show substantial dominance and epistasis. The latter group are interested in quantitative phenotypes rather than individual genes. Quantitative traits are typically determined by many genes, usually with little dominance or epistasis. Furthermore, epistatic variance has minimum effect, since the selected population soon arrives at a state in which the rate of change is given by the additive variance or covariance. Thus, the breeder’s custom of ignoring epistasis usually gives a more accurate prediction than if epistatic variance were included in the formulae.

Why did Crow have to write this 2010 paper? Don't evo-devo folks understand population genetics? Why do they find the dominance of additive heritability to be so counter-intuitive? Which of the two groups of scientists has a better understanding of how evolution works? Evo-devo folks seem to be from the traditional "revel in complexity" branch of biology: perfectly happy to find that living creatures are too complicated to be modeled by equations. (But are they?)

Some excerpts from the paper:

... Recent years have seen an increased emphasis on epistasis (e.g. Wolf et al. 2000; Carlborg & Haley 2004). Students of development and evo-devo, as well as some human geneticists, have paid particular interest to interactions. For those in these fields, epistasis is an interesting phenomenon on its own and studying it gives deeper insights into developmental and evolutionary processes. Ultimately one wants to know which individual genes are involved, and if one is studying the effects of such genes, it is natural to consider the ways in which they interact. Historically, among many other uses, epistasis has provided a means for identifying steps in biochemical and developmental sequences. More generally, including epistasis is part of the description of gene effects. So epistasis, despite methodological challenges, is usually welcomed as providing further insights. Students of development or evo-devo typically study genes of major effect. Of course, genes with major effects are more easily discovered, so they may be providing a biased sample. But we can say that at least some of the genes involved have large effects. And such genes typically show considerable dominance and epistasis.

In contrast, animal and plant breeders have traditionally regarded epistasis as a nuisance, akin to noise in impeding or obscuring the progress of selection. It may seem surprising that the traditional practice of ignoring epistasis has not led to errors in prediction equations. Why? It is this seeming paradox that I wish to discuss.

Continuously distributed quantitative traits typically depend on a large number of factors, each making a small contribution to the quantitative measurement. In general, the smaller the effects, the more nearly additive they are. Experimental evidence for this is abundant. This is expected for reasons analogous to those for which taking only the first term of a Taylor series provides a good estimate. ...

The most extensive selection experiment, at least the one that has continued for the longest time, is the selection for oil and protein content in maize (Dudley 2007). These experiments began near the end of the nineteenth century and still continue; there are now more than 100 generations of selection. Remarkably, selection for high oil content and similarly, but less strikingly, selection for high protein, continue to make progress. There seems to be no diminishing of selectable variance in the population. The effect of selection is enormous: the difference in oil content between the high and low selected strains is some 32 times the original standard deviation.

... Students of development, evo-devo and human genetics often place great emphasis on epistasis. Usually they are identifying individual genes, and naturally the interactions among these are of the very essence of understanding. The individual gene effects are usually large enough for considerable epistasis to be expected.

Quantitative genetics has a contrasting view. The foregoing analysis shows that, under typical conditions, the rate of change under selection is given by the additive genetic variance or covariance. Any attempt to include epistatic terms in prediction formulae is likely to do more harm than good. Animal and plant breeders who ignored epistasis, for whatever reasons, good or bad, were nevertheless on the right track. And prediction formulae based on simple heritability measurements are appropriate.

The power of using microscopic knowledge (genes) to develop macroscopic theory (phenotypes), whereby phenotypic measurements are used to develop prediction formulae, is beautifully illustrated by quantitative genetics theory.
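Crow's Taylor-series argument can be made explicit. Write the phenotype as P = F(x_1, ..., x_n), where x_i is the (small) effect associated with locus i. Expanding, F = F(0) + sum_i (dF/dx_i) x_i + (1/2) sum_{ij} (d^2F/dx_i dx_j) x_i x_j + ..., so if each |x_i| is of order a small parameter eps, the pairwise (epistatic) terms are suppressed by an extra power of eps relative to the linear terms. Many loci of individually small effect are therefore nearly additive even when F itself is strongly nonlinear.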

Can we understand evolution without mathematics? Two more useful references:

Statistical Mechanics and the Evolution of Polygenic Quantitative Traits

The Evolution of Multilocus Systems Under Weak Selection



Note I am at BGI right now so there may be some latency in communication.

Friday, February 12, 2016

Epistasis and Complex Traits

Short summary: To first approximation we can ignore gene-gene interactions in the prediction of complex traits. This paper examines specifically how non-additive variance is driven to zero as the number of loci involved becomes large, assuming some dispersion in allele frequencies.

The earlier paper of Hill, Goddard, and Visscher, and the simpler two-locus case, are discussed here.
Influence of Gene Interaction on Complex Trait Variation with Multilocus Models

Asko Mäki-Tanila, William G. Hill
GENETICS September 18, 2014 vol. 198 no. 1 355-367; DOI: 10.1534/genetics.114.165282

Although research effort is being expended into determining the importance of epistasis and epistatic variance for complex traits, there is considerable controversy about their importance. Here we undertake an analysis for quantitative traits utilizing a range of multilocus quantitative genetic models and gene frequency distributions, focusing on the potential magnitude of the epistatic variance. All the epistatic terms involving a particular locus appear in its average effect, with the number of two-locus interaction terms increasing in proportion to the square of the number of loci and that of third order as the cube and so on. Hence multilocus epistasis makes substantial contributions to the additive variance and does not, per se, lead to large increases in the nonadditive part of the genotypic variance. Even though this proportion can be high where epistasis is antagonistic to direct effects, it reduces with multiple loci. As the magnitude of the epistatic variance depends critically on the heterozygosity, for models where frequencies are widely dispersed, such as for selectively neutral mutations, contributions of epistatic variance are always small. Epistasis may be important in understanding the genetic architecture, for example, of function or human disease, but that does not imply that loci exhibiting it will contribute much genetic variance. Overall we conclude that theoretical predictions and experimental observations of low amounts of epistatic variance in outbred populations are concordant. It is not a likely source of missing heritability, for example, or major influence on predictions of rates of evolution.
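The main claim -- that interaction terms feed the loci's average effects, so even heavily epistatic architectures yield mostly additive variance -- can be checked directly. A toy model (mine, not the paper's calculation): a genotypic value built from nothing but pairwise products of allele counts, with dispersed frequencies, decomposed by linear regression:

    import numpy as np
    rng = np.random.default_rng(7)

    def epistatic_fraction(m, n=100_000):
        p = rng.uniform(0.05, 0.95, m)              # dispersed frequencies
        W = rng.binomial(2, p, size=(n, m)).astype(float)
        iu = np.triu_indices(m, k=1)
        gamma = rng.normal(0, 1, len(iu[0]))
        G = (W[:, iu[0]] * W[:, iu[1]]) @ gamma     # purely pairwise model
        A = np.column_stack([np.ones(n), W])        # additive = linear part
        coef, *_ = np.linalg.lstsq(A, G, rcond=None)
        return 1 - (A @ coef).var() / G.var()

    for m in (4, 8, 16):
        print(m, "loci: epistatic variance fraction =",
              round(epistatic_fraction(m), 2))

Even though every underlying effect is an interaction, the epistatic fraction of variance comes out small (roughly 10-15 percent with these settings), because the uncentered products contribute mostly to the loci's average (additive) effects.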
See also Determination of Nonlinear Genetic Architecture using Compressed Sensing, and related posts on epistasis and additivity.

Wednesday, October 02, 2013

Beanbag genetics: blood pressure

As I wrote in an earlier post Beanbags and Causal Variants:
Not only do these results implicate common causal variants as the source of heritability in disease susceptibility, but they also suggest that gene-gene (epistasis) and gene-environment interactions are of limited impact. Both the genetic and environmental backgrounds for a particular allele vary across Eurasia, so replicability puts an upper limit on their influence. See also Epistasis vs Additivity.
How can it be? But what about the marvelous incomprehensible beautiful sacred complexity of Nature? But But But ...

In the blood pressure (BP) study cited below, the data include East and South Asians, African Americans and Europeans. The effect sizes of variants in one population are well correlated with effect sizes in other populations, despite changes in the genetic background (i.e. other genes) and environments with which they interact. This suggests the interaction effects are small.
Genome-wide Association Analysis of Blood-Pressure Traits in African-Ancestry Individuals Reveals Common Associated Genes in African and Non-African Populations

Abstract: ... We also demonstrate that validated EA BP GWAS loci, considered jointly, show significant effects in AA samples. Consequently, these findings suggest that BP loci might have universal effects across studied populations, demonstrating that multiethnic samples are an essential component in identifying, fine mapping, and understanding their trait variability.


(COGENT = African Americans, ICBP = European Americans)

Long live "beanbag genetics"! :-)
A Defense of Beanbag Genetics

JBS Haldane

My friend Professor Ernst Mayr, of Harvard University, in his recent book Animal Species and Evolution1, which I find admirable, though I disagree with quite a lot of it, has the following sentences on page 263.
The Mendelian was apt to compare the genetic contents of a population to a bag full of colored beans. Mutation was the exchange of one kind of bean for another. This conceptualization has been referred to as “beanbag genetics”. Work in population and developmental genetics has shown, however, that the thinking of beanbag genetics is in many ways quite misleading. To consider genes as independent units is meaningless from the physiological as well as the evolutionary viewpoint.  [Italics mine]
... In another place2 Mayr made a more specific challenge. He stated that Fisher, Wright, and I “have worked out an impressive mathematical theory of genetical variation and evolutionary change. But what, precisely, has been the contribution of this mathematical school to evolutionary theory, if I may be permitted to ask such a provocative question?” “However,” he continued in the next paragraph, “I should perhaps leave it to Fisher, Wright, and Haldane to point out what they consider their major contributions.” ...

Now, in the first place I deny that the mathematical theory of population genetics is at all impressive, at least to a mathematician. On the contrary, Wright, Fisher, and I all made simplifying assumptions which allowed us to pose problems soluble by the elementary mathematics at our disposal, and even then did not always fully solve the simple problems we set ourselves. Our mathematics may impress zoologists but do not greatly impress mathematicians. Let me give a simple example. ...
See also Eric, why so gloomy?
Fisher's Fundamental Theorem of Natural Selection identifies additive variance as the main driver of evolutionary change in the limit where selection timescales are much longer than recombination (e.g., due to sexual reproduction) timescales. Thus it is reasonable to expect that most of the change in genus Homo [traits which have been under selection] over the last millions of years is encoded in a linear genetic architecture.
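In symbols (simplest discrete-generation form): Delta(mean fitness) = V_A(fitness) / (mean fitness). Only the additive variance V_A appears at leading order; dominance and epistatic variance drop out, which is why selection "sees" the additive part of the architecture.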

Thursday, May 07, 2015

Peter Visscher: Genomics, Big Data, Medicine, and Complex Traits



Another good talk from the Genomics, Big Data, and Medicine Seminar Series at the Icahn School of Medicine (Mt. Sinai). Peter starts his talk by discussing height as a classical model trait, giving credit to Galton for first investigating heritability and related ideas, and noting the approximate additivity of genetic effects. @16min, state of the art genomic prediction of height from GIANT collaboration.

Interestingly, Visscher is Dutch for Fisher -- as in Ronald Fisher (the father of population genetics and early pioneer in statistics).

See Maxwell's demon and genetic engineering.
Ronald Fisher on positive alleles for intelligence, in Mendelism and Biometry (1911):

Suppose we knew, for example, 20 pairs of mental characters [loci in the genome]. These would combine in over a million pure mental types; [some of] these would naturally occur rather less frequently than once in a billion; or in a country like England about once in 20,000 generations [assuming the positive variants are somewhat rare]; it will give some idea as to the excellence of the best of these types when we consider that the Englishmen from Shakespeare to Darwin have occurred within 10 generations; the thought of a race of men combining the illustrious qualities of these giants, and breeding true to them, is almost too overwhelming, but such a race will inevitably arise in whatever country first sees the inheritance of mental characters elucidated.

Sunday, July 19, 2015

Technically Sweet

Regular readers will know that I've been interested in the so-called Teller-Ulam mechanism used in thermonuclear bombs. Recently I read Kenneth Ford's memoir Building the H Bomb: A Personal History. Ford was a student of John Wheeler, who brought him to Los Alamos to work on the H-bomb project. This led me to look again at Richard Rhodes's Dark Sun: The Making of the Hydrogen Bomb. There is quite a lot of interesting material in these two books on the specific contributions of Ulam and Teller, and whether the Soviets came up with the idea themselves, or had help from spycraft. See also Sakharov's Third Idea and F > L > P > S.

The power of a megaton device is described below by a witness to the Soviet test.
The Soviet Union tested a two-stage, lithium-deuteride-fueled thermonuclear device on November 22, 1955, dropping it from a Tu-16 bomber to minimize fallout. It yielded 1.6 megatons, a yield deliberately reduced for the Semipalatinsk test from its design yield of 3 MT. According to Yuri Romanov, Andrei Sakharov and Yakov Zeldovich worked out the Teller-Ulam configuration in conversations together in early spring 1954, independently of the US development. “I recall how Andrei Dmitrievich gathered the young associates in his tiny office,” Romanov writes, “… and began talking about the amazing ability of materials with a high atomic number to be an excellent reflector of high-intensity, short-pulse radiation.” ...

Victor Adamsky remembers the shock wave from the new thermonuclear [device] racing across the steppe toward the observers. “It was a front of moving air that you could see that differed in quality from the air before and after. It came, it was really terrible; the grass was covered with frost and the moving front thawed it, you felt it melting as it approached you.” Igor Kurchatov walked in to ground zero with Yuli Khariton after the test and was horrified to see the earth cratered even though the bomb had detonated above ten thousand feet. “That was such a terrible, monstrous sight,” he told Anatoli Alexandrov when he returned to Moscow. “That weapon must not be allowed ever to be used.”
The Teller-Ulam design uses radiation pressure (reflected photons) from a spherical fission bomb to compress the thermonuclear fuel. The design is (to quote Oppenheimer) "technically sweet" -- a glance at the diagram below should convince anyone who understands geometrical optics!




In discussions of human genetic engineering (clearly a potentially dangerous future technology), the analogy with nuclear weapons sometimes arises: what role do moral issues play in the development of new technologies with the potential to affect the future of humanity? In my opinion, genetic engineering of humans carries nothing like the existential risk of arsenals of Teller-Ulam devices. Genomic consequences will play out over long (generational) timescales, leaving room for us to assess outcomes and adapt accordingly. (In comparison, genetic modification of viruses, which could lead to pandemics, seems much more dangerous.)
It is my judgment in these things that when you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. -- Oppenheimer on the Teller-Ulam design for the H-bomb.
What is technically sweet about genomics?
(1) The approximate additivity (linearity) of the genetic architecture of key traits such as human intelligence.
(2) The huge amounts of extant variance in the human population, enabling large improvements.
(3) Matrices of human genomes are good compressed sensors, and one can estimate how much data is required to "solve" the genetic architecture of complex traits.
See, e.g., Genius (Nautilus Magazine) and Genetic architecture and predictive modeling of quantitative traits.

More excerpts from Dark Sun below.

Enthusiasts of trans-generational epigenetics would do well to remember the danger of cognitive bias and the lesson of Lysenko. Marxian notions of heredity are dangerous because, although scientifically incorrect, they appeal to our egalitarian desires.
A commission arrived in Sarov one day to make sure everyone agreed with Soviet agronomist Trofim Lysenko's Marxian notions of heredity, which Stalin had endorsed. Sakharov expressed his belief in Mendelian genetics instead. The commission let the heresy pass, he writes, because of his “position and reputation at the Installation,” but the outspoken experimentalist Lev Altshuler, who similarly repudiated Lysenko, did not fare so well ...
The transmission of crucial memes from Szilard to Sakharov, across the Iron Curtain.
Andrei Sakharov stopped by Victor Adamsky's office at Sarov one day in 1961 to show him a story. It was Leo Szilard's short fiction “My Trial as a War Criminal,” one chapter of his book The Voice of the Dolphins, published that year in the US. “I'm not strong in English,” Adamsky says, “but I tried to read it through. A number of us discussed it. It was about a war between the USSR and the USA, a very devastating one, which brought victory to the USSR. Szilard and a number of other physicists are put under arrest and then face the court as war criminals for having created weapons of mass destruction. Neither they nor their lawyers could make up a cogent proof of their innocence. We were amazed by this paradox. You can't get away from the fact that we were developing weapons of mass destruction. We thought it was necessary. Such was our inner conviction. But still the moral aspect of it would not let Andrei Dmitrievich and some of us live in peace.” So the visionary Hungarian physicist Leo Szilard, who first conceived of a nuclear chain reaction crossing a London street on a gray Depression morning in 1933, delivered a note in a bottle to a secret Soviet laboratory that contributed to Andrei Sakharov's courageous work of protest that helped bring the US-Soviet nuclear arms race to an end.

Thursday, May 05, 2022

Raghuveer Parthasarathy: Four Physical Principles and Biophysics -- Manifold podcast #11

 

Raghu Parthasarathy is the Alec and Kay Keith Professor of Physics at the University of Oregon. His research focuses on biophysics, exploring systems in which the complex interactions between individual components, such as biomolecules or cells, can give rise to simple and robust physical patterns. 

Raghu is the author of a recent popular science book, So Simple a Beginning: How Four Physical Principles Shape Our Living World. 


Steve and Raghu discuss: 

0:00 Introduction 

1:34 Early life, transition from Physics to Biophysics 

20:15 So Simple a Beginning: discussion of the Four Physical Principles in the title, which govern biological systems 

26:06 DNA prediction 

37:46 Machine learning / causality in science 

46:23 Scaling (the fourth physical principle) 

54:12 Who the book is for and what high schoolers are learning in their bio and physics classes 

1:05:41 Science funding, grants, running a research lab 

1:09:12 Scientific careers and radical sub-optimality of the existing system 



Resources: 


Raghuveer Parthasarathy's lab at the University of Oregon - https://pages.uoregon.edu/raghu/ 
 
Raghuveer Parthasarathy's blog the Eighteenth Elephant - https://eighteenthelephant.com/


Added from comments:
key holez:
It was a fascinating episode, and I immediately went out and ordered the book! One question that came to mind: given how much of the human genome is dedicated to complex regulatory mechanisms and not proteins as such, it seems unintuitive to me that so much of heritability seems to be additive. I would have thought that in a system with lots of complicated, messy on/off switches, small genetic differences would often lead to large phenotype differences -- but if what I've heard about polygenic prediction is right, then, empirically, assuming everything is linear seems to work just fine (outside of rare variants, maybe). Is there a clear explanation for how complex feedback patterns give rise to linearity in the end? Is it just another manifestation of the central limit theorem...?
steve hsu:
This is an active area of research. It is somewhat surprising even to me how well linearity / additivity holds in human genetics. Searches for non-linear effects on complex traits have been largely unsuccessful, in the sense that most of the variance seems to be controlled by additive effects. By now this has been investigated for large numbers of traits, including major diseases, quantitative traits such as blood biomarkers, height, cognitive ability, etc.
One possible explanation is that because humans are so similar to each other, and have passed through tight evolutionary bottlenecks, *individual differences* between humans are mainly due to small additive effects, located both in regulatory and coding regions. 
To genetically edit a human into a frog presumably requires many changes in loci with big nonlinear effects. However, it may be the case that almost all such genetic variants are *fixed* in the human population: what makes two individuals different from each other is mainly small additive effects. 
Zooming out slightly, the implications for human genetic engineering are very positive. Vast pools of additive variance mean that multiplex gene editing will not be impossibly hard...
This topic is discussed further in the review article: https://arxiv.org/abs/2101.05870
