A diffusion model for the fate of tandem gene duplicates in diploids

doi:10.1016/j.tpb.2007.01.003

Theoretical Population Biology

Volume 71, Issue 4, June 2007, Pages 491-501

https://doi.org/10.1016/j.tpb.2007.01.003

Abstract

Suppose one chromosome in one member of a population somehow acquires a duplicate copy of the gene, fully linked to the original gene's locus. Preservation is the event that eventually every chromosome in the population is a descendant of the one which initially carried the duplicate. For a haploid population in which the absence of all copies of the gene is lethal, the probability of preservation has recently been estimated via a diffusion approximation. That approximation is shown to carry over to the case of diploids and arbitrary strong selection against the absence of the gene. The techniques used lead to some new results. In the large population limit, it is shown that the relative probability that descendants of a small number of individuals carrying multiple copies of the gene fix in the population is proportional to the number of copies carried. The probability of preservation is approximated when chromosomes carrying two copies of the gene are subject to additional, fully non-functionalizing mutations, thereby modelling either an additional cost of replicating a longer genome, or a partial duplication of the gene. In the latter case the preservation probability depends only on the mutation rate to null for the duplicated portion of the gene.

Introduction

In O’Hely (2006) (denoted hereafter as (I)) I used a diffusion approximation to model the haploid version of a gene duplication scenario previously analysed via simulations in Lynch et al. (2001). This note indicates a way to treat the fully diploid model.

Both (I) and this paper consider tandem duplications, that is, a duplication where the locus of the duplicate gene is fully linked to the original locus. Thus the haplotype of a chromosome across these two loci may be represented by a meta-allele. In (I) these alleles were written as A, AA and Aa representing chromosomes not descended from a duplicate, those with intact copies of the gene at both loci, and those carrying one intact and one mutated copy of the gene. It was assumed that the absence of an intact copy of the gene was lethal, so types that might have been represented by a and aa were never present in the population. The technique used was to write down an overlapping-generations Markov chain for a finite population and analyse its diffusion limit. This resulted in an ODE for the probability of the fixation of the Aa type given that the initial frequency of Aa's was zero.

In this paper I maintain the requirement that an individual carry at least one intact copy of the gene, or else be subject to strong (but not necessarily lethal) selection against it. For such a diploid system, even with lethal selection, I can no longer assume that the frequencies of a and aa haplotypes remain deterministically zero, since these can now be shielded from selection by pairing with other, functional, types. However, the frequencies of these haplotypes are maintained, in a diffusion limit, at levels corresponding to mutation-selection balance. Aspects of the present analysis are different from (I): most notably I begin here with a finite state space Markov model with non-overlapping generations (i.e. a Wright–Fisher model). I also use a different notation for the alleles, since descriptions of diploid genotypes in the old notation become unwieldy. Finally, while previously I considered the probabilities of reaching “absorbing” states of the process corresponding to fixations of particular (meta-) alleles, in the diploid case it will be seen that there is no reasonable concept of fixation and instead I concentrate on absorbing sets which correspond to loss of particular alleles. In particular I use the term preservation to refer to the event that, starting from a small initial quantity of duplicates among an otherwise wild type population with nulls present at frequencies given by mutation-selection balance, the wild alleles are eventually lost (or equivalently, the entire population eventually consists of descendants of the initial duplicate chromosomes). I will also consider the relative probability of preservation, which is the probability of preservation divided by the probability that a neutral variant of the wild type allele, starting from the same small initial frequency, eventually fixes in the population.
Note that by “preservation” I mean not that the duplicate eventually replaces the original gene in providing its function, but that evidence of the duplication eventually remains permanently in the genome of every individual in the population.

The paper is organized as follows. Section 2 presents the transformations of allele frequencies which make up the deterministic part of a generation. In doing this I find a function of the allele frequencies is unchanged over the passing of a generation. This permits the calculation of the probability of preservation in the limiting large-population case. The argument is closely related to similar large population limiting cases in (I) and Lynch et al. (2001). Section 3 then provides three generalizations of the deterministic model. First is the situation of higher gene copy numbers, for which a similar argument provides the relative probability of preservation for the large population limit. Then I consider mutation structures arising from situations other than simple gene duplications. In a model closely related to the duplication scenario I identify the invariant function, and in a case generalizing a K-plication, I determine conditions under which such an invariant function exists.

The stochastic model is introduced in Section 4 using multinomial sampling of the allele frequencies. I find a change of variables which allows the means, variances and covariances among the allele frequencies to be written in a convenient manner. Ordinarily one would derive a diffusion approximation for the allele frequencies; however, because I treat strong selection, this is not possible. Instead I consider ratios of the allele frequencies to a linear combination of them, which has been specially chosen based on the work in Section 3. A careful treatment of the moments of these ratios does allow a diffusion approximation to be made, once the “faster” evolution driven by strong selection has been accounted for. This accounting comprises Section 5. Section 6 is then devoted to analysing the approximating diffusion. This turns out to be relatively simple since I arrive quickly at an equation which is effectively identical to one found in (I). Section 7 then sets out the main steps in analysing preservation probabilities in a more general mutation structure. I conclude with a discussion which mentions some of the subtleties of the technique and considers two applications of the technique and results.

Section snippets

The deterministic model

Consider a mutational system with four alleles $H_{2}$, $H_{1}$, $H_{w}$ and $H_{0}$ in a diploid population. A “normal” population only contains $H_{w}$ and $H_{0}$ alleles but a duplicated allele $H_{2}$ arises via some (unspecified) process. Both its gene copies are subject to mutations, as is the single copy of the gene present in each $H_{w}$ allele; an allele which is derived from the $H_{2}$ but which has one of its loci inactivated due to mutations is an $H_{1}$, and is subject to mutations at the remaining copy just like an $H_{w}$ allele.
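The Introduction states that some function of the allele frequencies is unchanged over a generation, and the section on higher gene copy numbers names the ratio $(2 p_{2} + p_{1}) / p_{w}$. The following is a minimal sketch of one deterministic generation that checks this numerically. Several ingredients are assumptions not given in the snippet: each intact gene copy is inactivated independently with a hypothetical per-generation probability `mu`, mating is random, only the $H_{0}$ homozygote has reduced fitness `1 - s`, and mutation is applied before selection. The function names and starting frequencies are illustrative only.

```python
from fractions import Fraction


def one_generation(p2, p1, pw, p0, mu, s):
    """One deterministic generation: mutation, then selection after random mating.

    Assumptions (not from the source): every intact copy is inactivated
    independently with probability mu, and only H0/H0 has fitness 1 - s.
    """
    # Mutation: an H2 keeps both copies, loses one (becoming H1), or loses both.
    q2 = (1 - mu) ** 2 * p2
    q1 = (1 - mu) * p1 + 2 * mu * (1 - mu) * p2
    qw = (1 - mu) * pw
    q0 = p0 + mu * (p1 + pw) + mu ** 2 * p2
    # Selection: functional alleles have marginal fitness 1, H0 has 1 - s*q0,
    # so the mean fitness of the population is 1 - s*q0**2.
    mean_w = 1 - s * q0 ** 2
    return q2 / mean_w, q1 / mean_w, qw / mean_w, q0 * (1 - s * q0) / mean_w


def invariant(p2, p1, pw):
    """The ratio (2*p2 + p1) / pw."""
    return (2 * p2 + p1) / pw


# Hypothetical starting point: 1% duplicates, 10% nulls, the rest wild type.
state = (Fraction(1, 100), Fraction(0), Fraction(89, 100), Fraction(1, 10))
mu, s = Fraction(1, 1000), Fraction(1)  # s = 1: lethal null homozygote

start = invariant(*state[:3])
for _ in range(5):
    state = one_generation(*state, mu, s)
end = invariant(*state[:3])

print(start == end, sum(state) == 1)
```

Exact rational arithmetic is used so that the comparison tests equality rather than closeness. Under these assumptions the ratio survives because mutation scales both $2 p_{2} + p_{1}$ and $p_{w}$ by the same factor `1 - mu`, and selection rescales all functional alleles equally.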

Generalization to higher gene copy numbers

This result concerning $(2 p_{2} + p_{1}) / p_{w}$ is robust to the degree of selection against the $H_{0}$ homozygote and is very easily extended to higher gene copy numbers. Consider a system with $K + 2$ alleles, denoted $H_{0}, H_{1}, \dots, H_{K}$ and $H_{w}$. $H_{w}$ is a “wild type” chromosome, with one copy of the gene. $H_{K}$ is a chromosome that, through some unspecified means, carries $K$ fully linked copies of the gene. For $i = 1, \dots, K - 1$, $H_{i}$ is a chromosome descended from an $H_{K}$ but in which precisely $K - i$ copies of the gene have been rendered

The stochastic model

Stochastic analogues of the preceding models arise by taking, at some stage in the deterministic transformations comprising a generation, a multinomial sample of size either $N$ or $2N$, where $N$ is the number of individuals in the population (the choice depends on which phase of the deterministic system the sampling occurs in). I follow Ethier and Nagylaki (1989) and sample right at the end of the generation (as described above), taking $(p_{i}''')_{i = 0}^{2}$ as a draw from a multinomial-$((p_{i}'')_{i = 0}^{2}; 2N)$ divided
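As an illustration of the sampling step just described, here is a minimal sketch using only the Python standard library. It assumes that the truncated sentence continues with division by $2N$, so that sampled counts become frequencies; that reading, the function name `wright_fisher_sample`, and the example frequencies are all assumptions rather than details taken from the paper.

```python
import random
from collections import Counter


def wright_fisher_sample(freqs, n_individuals, rng):
    """Draw next-generation allele frequencies from 2N multinomial trials.

    freqs: allele frequencies after the deterministic steps (sums to 1).
    Returns sampled counts divided by 2N (assumed normalisation).
    """
    n_chromosomes = 2 * n_individuals
    alleles = list(range(len(freqs)))
    # 2N independent draws with the given weights form one multinomial sample.
    draws = rng.choices(alleles, weights=freqs, k=n_chromosomes)
    counts = Counter(draws)
    return [counts[a] / n_chromosomes for a in alleles]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
post_selection = [0.10, 0.05, 0.85]  # hypothetical frequencies
sampled = wright_fisher_sample(post_selection, n_individuals=500, rng=rng)
print(sampled)
```

Sampling once, at the end of the generation, keeps all of the randomness in a single step, so the deterministic transformations can be analysed separately from drift.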

Mutation-selection balance in the model

I first consider a direct analysis of mutation-selection balance in terms of the allele frequencies. Based on this I consider the implications for the $A_{i}$ : how the balance is described by these, and the dynamics of the approach to it.

The diffusion approximation and the preservation probability

Equations (4.3)–(4.5), (5.1) and (5.2) show that the hypotheses of Ethier and Nagylaki (1980) are satisfied in any compact neighbourhood of $A_{0} + A_{2} = 1$ in which $A_{0}$ is bounded away from zero uniformly in $N$, with fast and slow timescales measured in units of generations and $N$ generations, respectively. Thus in the limit as $N \to \infty$, the process $(A_{1}, A_{2})_{tN}$ approaches in distribution that of a diffusion process $(X_{1}, X_{2})_{t}$ with moments given by $E(\delta X_{1}) = 2 X_{2} (\theta + X_{1} (1 - X_{2})) \, \delta t,$ $E(\delta X_{2}) = - X_{2} (\theta + (1 - X_{2}) (1 - 2 X_{2})) \, \delta t,$ $Var (X_{1}) = (X_{1} (1$

The stochastic model for general mutation structures

This section states the changes that must be made to the analysis if, instead of the specific mutation rates dealt with up to now, the mutation rates are the most general possible while still retaining tractability in the deterministic case, as determined in Section 3.2. There I found that $(\alpha p_{2} + p_{1}) / p_{w}$ is invariant as generations pass in the deterministic model, where $\alpha = b / (b + c - d)$ is 2 in the case treated above. Thus I start by redefining $D := p_{w} + p_{1} + \alpha p_{2}$. The definitions of $A_{i}$ in terms of $D$ (4.2)
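The snippet does not define $b$, $c$ and $d$, so the following sketch rests on a guessed reading of them, chosen only because it reproduces both the stated formula for $α$ and the value 2 for the simple duplication case: $b$ is taken as the rate $H_{2} \to H_{1}$, $c$ as the rate $H_{2} \to H_{0}$, and $d$ as the common rate $H_{1} \to H_{0}$ and $H_{w} \to H_{0}$. Under that hypothetical assignment, the check below confirms that the mutation step leaves the ratio of $α p_{2} + p_{1}$ to $p_{w}$ unchanged. Selection is omitted on the further assumption that it rescales all functional alleles equally.

```python
from fractions import Fraction


def mutate(p2, p1, pw, b, c, d):
    """Mutation step under the guessed rates (not confirmed by the source):
    H2->H1 at rate b, H2->H0 at rate c, H1->H0 and Hw->H0 at rate d."""
    return (1 - b - c) * p2, (1 - d) * p1 + b * p2, (1 - d) * pw


def alpha(b, c, d):
    """The weight b / (b + c - d) quoted in the text."""
    return b / (b + c - d)


def ratio(a, p2, p1, pw):
    """The candidate invariant (a*p2 + p1) / pw."""
    return (a * p2 + p1) / pw


# Hypothetical frequencies of the functional alleles (H0 takes the rest).
p2, p1, pw = Fraction(1, 50), Fraction(1, 25), Fraction(4, 5)

# Case 1: independent inactivation of each copy at rate mu.
mu = Fraction(1, 1000)
rates_dup = (2 * mu * (1 - mu), mu ** 2, mu)
a_dup = alpha(*rates_dup)
same_dup = ratio(a_dup, p2, p1, pw) == ratio(a_dup, *mutate(p2, p1, pw, *rates_dup))

# Case 2: arbitrary illustrative rates with an extra H2->H0 pathway.
rates_gen = (Fraction(1, 100), Fraction(1, 200), Fraction(1, 400))
a_gen = alpha(*rates_gen)
same_gen = ratio(a_gen, p2, p1, pw) == ratio(a_gen, *mutate(p2, p1, pw, *rates_gen))

print(a_dup, same_dup, a_gen, same_gen)
```

In the first case the weight evaluates to exactly 2, matching the value quoted in the text. The second case shows that a direct $H_{2} \to H_{0}$ pathway lowers the weight below 2 under this reading.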

Discussion

The goal of this paper was to extend the results obtained in (I) to the case of a diploid population. The chief reason why this is not a straightforward generalization is that even when one considers the duplication of a gene which is essential to an organism, diploidy allows chromosomes without a functional copy of the gene to persist in the population, sheltered by pairing with functional chromosomes. Consequently an extra dimension $(p_{0})$ is added to the problem. The strategy I have used

Acknowledgments

I thank Josh Ross for comments on an earlier version of the manuscript and Thierry Wirth for useful discussions. Two anonymous referees are thanked for comments which have improved the manuscript. The support of the Australian Research Council Centre of Excellence for Mathematics and Statistics of Complex Systems is gratefully acknowledged.

References (8)

F.A. Kondrashov et al., Role of selection in fixation of gene duplications, J. Theor. Biol. (2006)
N.H. Barton et al., Evolution of recombination due to random drift, Genetics (2005)
Buchholz, H., 1969. The Confluent Hypergeometric Function. Springer, Berlin, Translated from German by H. Lichtblau and...
S.N. Ethier et al., Diffusion approximations of Markov chains with two time scales and applications to population genetics, Adv. Appl. Probab. (1980)

There are more references available in the full text version of this article.