A diffusion model for the fate of tandem gene duplicates in diploids
Introduction
In O’Hely (2006), hereafter (I), I used a diffusion approximation to model the haploid version of a gene duplication scenario previously analysed by simulation in Lynch et al. (2001). This note shows how to treat the fully diploid model.
Both (I) and this paper consider tandem duplications, that is, duplications in which the locus of the duplicate gene is fully linked to the original locus. The haplotype of a chromosome across these two loci may therefore be represented by a meta-allele. In (I) these alleles were written A, AA and Aa, representing respectively chromosomes not descended from a duplicate, those with intact copies of the gene at both loci, and those carrying one intact and one mutated copy of the gene. It was assumed that the absence of an intact copy of the gene was lethal, so types that might have been represented by a and aa were never present in the population. The technique was to write down an overlapping-generations Markov chain for a finite population and analyse its diffusion limit. This resulted in an ODE for the probability of fixation of the Aa type, given that the initial frequency of Aa chromosomes was zero.
In this paper I retain the requirement that an individual carry at least one intact copy of the gene, or else be subject to strong (but not necessarily lethal) selection. For such a diploid system, even with lethal selection, I can no longer assume that the frequencies of a and aa haplotypes remain deterministically zero, since these can now be shielded from selection by pairing with other, functional, types. In a diffusion limit, however, the frequencies of these haplotypes are held at levels corresponding to mutation-selection balance. Aspects of the present analysis differ from (I). Most notably, I begin here with a finite-state Markov model with non-overlapping generations (i.e. a Wright–Fisher model). I also use a different notation for the alleles, since descriptions of diploid genotypes in the old notation become unwieldy. Finally, whereas previously I considered the probabilities of reaching “absorbing” states of the process corresponding to fixation of particular (meta-)alleles, in the diploid case it will be seen that there is no reasonable concept of fixation; instead I concentrate on absorbing sets which correspond to loss of particular alleles. In particular, I use the term preservation for the event that, starting from a small initial frequency of duplicates in an otherwise wild-type population with nulls present at frequencies given by mutation-selection balance, the wild-type alleles are eventually lost (equivalently, the entire population eventually consists of descendants of the initial duplicate chromosomes). I also consider the relative probability of preservation, which is the probability of preservation divided by the probability that a neutral variant of the wild-type allele, starting from the same small initial frequency, eventually fixes in the population.
Note that by “preservation” I mean not that the duplicate eventually replaces the original gene in providing its function, but that evidence of the duplication eventually remains permanently in the genome of every individual in the population.
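Because the frequency of a neutral allele is a martingale under Wright–Fisher sampling, a neutral variant starting at frequency x0 fixes with probability x0, so the relative probability of preservation is simply the preservation probability divided by x0. The sketch below checks that neutral benchmark by simulation and forms the ratio; it assumes binomial resampling of 2N chromosomes per generation, and the function names are hypothetical.

```python
import random


def neutral_fixation_estimate(two_n: int, initial_copies: int, runs: int,
                              seed: int = 1) -> float:
    """Fraction of simulated runs in which a neutral allele, starting with
    `initial_copies` of `two_n` chromosomes, fixes under Wright-Fisher
    (binomial) resampling."""
    rng = random.Random(seed)
    fixed = 0
    for _run in range(runs):
        count = initial_copies
        while 0 < count < two_n:
            p = count / two_n
            # Binomial(2N, p): each offspring chromosome independently copies
            # the neutral allele with probability p.
            count = sum(1 for _ in range(two_n) if rng.random() < p)
        if count == two_n:
            fixed += 1
    return fixed / runs


def relative_preservation(p_preservation: float, x0: float) -> float:
    """Preservation probability divided by the neutral fixation probability
    from the same initial frequency x0, which equals x0 itself."""
    return p_preservation / x0
```

For example, `neutral_fixation_estimate(20, 2, 4000)` should return a value close to 0.1, the initial frequency 2/20.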
The paper is organized as follows. Section 2 presents the transformations of allele frequencies which make up the deterministic part of a generation. In doing so I find a function of the allele frequencies that is unchanged over the passing of a generation. This permits calculation of the probability of preservation in the large-population limit. The argument is closely related to similar large-population limiting cases in (I) and Lynch et al. (2001). Section 3 then provides three generalizations of the deterministic model. The first is to higher gene copy numbers, for which a similar argument gives the relative probability of preservation in the large-population limit. The other two concern mutation structures arising from situations other than simple gene duplications: in a model closely related to the duplication scenario I identify the invariant function, and in a case generalizing a K-plication I determine conditions under which such an invariant function exists.
The stochastic model is introduced in Section 4 using multinomial sampling of the allele frequencies. I find a change of variables which allows the means, variances and covariances of the allele frequencies to be written conveniently. Ordinarily one would derive a diffusion approximation for the allele frequencies themselves; however, because I treat strong selection, this is not possible. Instead I consider ratios of the allele frequencies to a linear combination of them, chosen on the basis of the work in Section 3. A careful treatment of the moments of these ratios does allow a diffusion approximation to be made, once the “faster” evolution driven by strong selection has been accounted for. This accounting comprises Section 5. Section 6 is devoted to analysing the approximating diffusion, which turns out to be relatively simple, since I arrive quickly at an equation effectively identical to one found in (I). Section 7 sets out the main steps in analysing preservation probabilities for a more general mutation structure. I conclude with a discussion which mentions some of the subtleties of the technique and considers two applications of the technique and results.
The deterministic model
Consider a mutational system in a diploid population with four alleles: $H_2$, $H_1$, $H_0$ and the wild type. A “normal” population contains only wild-type and $H_0$ alleles, but a duplicated allele $H_2$ arises via some (unspecified) process. Both of its gene copies are subject to mutation, as is the single copy of the gene carried by each wild-type allele; an allele derived from $H_2$ but with one of its loci inactivated by mutation is an $H_1$, and is subject to mutation at its remaining copy just like a wild-type allele.
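The life cycle just described can be sketched as a deterministic map on the four allele frequencies. This is a minimal sketch under stated assumptions, not the paper's own recursion: chromosomes pair at random (Hardy–Weinberg), only the $H_0$/$H_0$ homozygote is selected against (fitness $1-s$), selection precedes mutation, and each intact gene copy is inactivated independently with probability $u$ per generation. The index names `WILD`, `H2`, `H1`, `H0` and the function names are hypothetical.

```python
from typing import List

# Hypothetical allele order for this sketch:
# 0 = wild type (one copy), 1 = H2, 2 = H1, 3 = H0.
WILD, H2, H1, H0 = range(4)


def select(x: List[float], s: float) -> List[float]:
    """Viability selection against the H0/H0 homozygote (fitness 1 - s),
    assuming Hardy-Weinberg pairing; every other genotype carries at least
    one intact gene copy and has fitness 1."""
    mean_fitness = 1.0 - s * x[H0] ** 2
    out = [xi / mean_fitness for xi in x]
    # An H0 chromosome is paired with another H0 with probability x[H0].
    out[H0] = x[H0] * (1.0 - s * x[H0]) / mean_fitness
    return out


def mutate(x: List[float], u: float) -> List[float]:
    """Each intact gene copy is inactivated independently with probability u."""
    w, h2, h1, h0 = x
    return [
        w * (1 - u),                          # wild type keeps its copy
        h2 * (1 - u) ** 2,                    # both copies of H2 survive
        h2 * 2 * u * (1 - u) + h1 * (1 - u),  # exactly one intact copy left
        h2 * u ** 2 + h1 * u + w * u + h0,    # no intact copy left
    ]


def generation(x: List[float], u: float, s: float) -> List[float]:
    """One deterministic generation: selection on diploids, then mutation."""
    return mutate(select(x, s), u)
```

Because every genotype other than $H_0$/$H_0$ has fitness 1, selection leaves the ratio of $H_2$ to wild-type frequencies unchanged; in this sketch only mutation alters it.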
Generalization to higher gene copy numbers
This result is robust to the degree of selection against the $H_0$ homozygote and is very easily extended to higher gene copy numbers. Consider a system with $K+2$ alleles, denoted $H_K, H_{K-1}, \dots, H_0$ and the wild type. The “wild type” is a chromosome with one copy of the gene. $H_K$ is a chromosome that, through some unspecified means, carries $K$ fully linked copies of the gene. For $0 \le k < K$, $H_k$ is a chromosome descended from an $H_K$ but in which precisely $K-k$ copies of the gene have been rendered
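If each intact copy is inactivated independently with probability $u$ per generation (an assumption of this sketch, not a rate stated in the text), the number of intact copies remaining on an $H_k$ chromosome after mutation is Binomial($k$, $1-u$). A hypothetical `mutation_matrix` helper that builds the corresponding row-stochastic matrix for general $K$, indexing chromosomes by their number of intact copies and placing the wild type last:

```python
from math import comb
from typing import List


def mutation_matrix(K: int, u: float) -> List[List[float]]:
    """Row-stochastic mutation matrix for H_K, ..., H_0 plus the wild type.

    Row/column k (0 <= k <= K) is the chromosome with k intact copies;
    row/column K + 1 is the single-copy wild type (a hypothetical layout).
    Each intact copy is assumed to be inactivated independently with
    probability u, so surviving copies are Binomial(k, 1 - u)."""
    size = K + 2
    wild = K + 1
    m = [[0.0] * size for _ in range(size)]
    for k in range(K + 1):
        for j in range(k + 1):
            m[k][j] = comb(k, j) * (1 - u) ** j * u ** (k - j)
    # The wild type has one intact copy; losing it yields the null H_0.
    m[wild][wild] = 1 - u
    m[wild][0] = u
    return m
```

For $K=2$ the row for $H_2$ gives $H_2 \to H_1$ with probability $2u(1-u)$ and $H_2 \to H_0$ with probability $u^2$, matching the four-allele case.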
The stochastic model
Stochastic analogues of the preceding models arise by taking, at some stage in the deterministic transformations comprising a generation, a multinomial sample of size either $N$ or $2N$, where $N$ is the number of individuals in the population (the choice depends on the phase of the deterministic cycle in which sampling occurs). I follow Ethier and Nagylaki (1989) and sample right at the end of the generation (as described above), taking the next generation's allele frequencies as a draw from a multinomial distribution divided
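Sampling at the end of the generation can be sketched with the Python standard library: drawing $2N$ chromosome labels with replacement, weighted by the current frequencies, is exactly a Multinomial($2N$, x) draw, and dividing the counts by $2N$ returns frequencies. The choice of $2N$ as the sample size and the name `multinomial_resample` are assumptions of this sketch.

```python
import random
from collections import Counter
from typing import List


def multinomial_resample(x: List[float], two_n: int,
                         rng: random.Random) -> List[float]:
    """Wright-Fisher step: draw 2N chromosomes with replacement according to
    the end-of-generation frequencies x and return counts divided by 2N."""
    draws = rng.choices(range(len(x)), weights=x, k=two_n)
    counts = Counter(draws)
    return [counts[i] / two_n for i in range(len(x))]
```

Averaged over many draws, the returned frequencies recover `x`, as expected for multinomial sampling.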
Mutation-selection balance in the model
I first analyse mutation-selection balance directly in terms of the allele frequencies. Based on this, I consider the implications for the transformed variables introduced in Section 4: how the balance is described in terms of them, and the dynamics of the approach to it.
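Before duplicates arise the population contains only wild-type and $H_0$ alleles, so mutation-selection balance can be illustrated with a two-allele recursion. In this sketch (hypothetical function names; selection against null homozygotes with fitness $1-s$ under Hardy–Weinberg, followed by wild-type-to-null mutation at rate $u$), the post-selection null frequency is $q_{\mathrm{sel}} = q(1-sq)/(1-sq^2)$, and setting $q_{\mathrm{sel}} + u(1-q_{\mathrm{sel}}) = q$ reduces to $sq^2(1-q) = u(1-q)$, so the equilibrium is exactly the familiar recessive balance $\hat q = \sqrt{u/s}$.

```python
from math import sqrt


def null_frequency_after(q: float, u: float, s: float) -> float:
    """One generation of a wild-type/null system: selection against null
    homozygotes (fitness 1 - s) under Hardy-Weinberg, then mutation of
    wild-type chromosomes to nulls with probability u."""
    q_sel = q * (1 - s * q) / (1 - s * q * q)
    return q_sel + u * (1 - q_sel)


def balance(u: float, s: float, generations: int = 20000) -> float:
    """Iterate from q = 0 towards mutation-selection balance."""
    q = 0.0
    for _ in range(generations):
        q = null_frequency_after(q, u, s)
    return q
```

For instance, `balance(1e-6, 1.0)` settles at $10^{-3} = \sqrt{u/s}$; the approach takes on the order of $1/\sqrt{us}$ generations.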
The diffusion approximation and the preservation probability
Equations (4.3)–(4.5), (5.1) and (5.2) show that the hypotheses of Ethier and Nagylaki (1980) are satisfied in any compact neighbourhood of in which is bounded away from zero uniformly in N, with fast and slow timescales measured in units of generations and N generations, respectively. Thus, in the limit as $N \to \infty$, the process approaches in distribution a diffusion process with moments given by
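The diffusion result can be checked against brute-force simulation of the finite-$N$ Markov chain it approximates. The sketch below is a Monte Carlo estimator, not the paper's analytical method, and rests on assumptions of its own: random pairing, fitness $1-s$ for $H_0$/$H_0$ only, selection then per-copy mutation at rate $u$, then a multinomial draw of $2N$ chromosomes. Each run starts with duplicates at frequency `x0` and nulls at a user-supplied frequency `q0` (for example the balance value $\sqrt{u/s}$), and ends in preservation when the wild type is lost, or in loss of the duplicate when no $H_2$ or $H_1$ chromosome remains. All names are hypothetical.

```python
import random
from collections import Counter
from typing import List

WILD, H2, H1, H0 = range(4)  # hypothetical allele labels for this sketch


def generation(x: List[float], u: float, s: float) -> List[float]:
    """Deterministic part of a generation (assumed order: selection against
    H0/H0 with fitness 1 - s, then per-copy inactivation with probability u)."""
    w, h2, h1, h0 = x
    mean_fitness = 1.0 - s * h0 * h0
    w, h2, h1 = w / mean_fitness, h2 / mean_fitness, h1 / mean_fitness
    h0 = h0 * (1.0 - s * h0) / mean_fitness
    return [
        w * (1 - u),
        h2 * (1 - u) ** 2,
        h2 * 2 * u * (1 - u) + h1 * (1 - u),
        h2 * u * u + h1 * u + w * u + h0,
    ]


def resample(x: List[float], two_n: int, rng: random.Random) -> List[float]:
    """Multinomial(2N, x) draw, returned as frequencies."""
    counts = Counter(rng.choices(range(4), weights=x, k=two_n))
    return [counts[i] / two_n for i in range(4)]


def preservation_estimate(two_n: int, x0: float, q0: float, u: float,
                          s: float, runs: int, seed: int = 1,
                          max_generations: int = 100_000) -> float:
    """Fraction of runs in which the wild type is lost (preservation) before
    every H2/H1 chromosome is lost. Runs reaching max_generations without
    either outcome are counted as not preserved."""
    rng = random.Random(seed)
    preserved = 0
    for _run in range(runs):
        x = [1.0 - x0 - q0, x0, 0.0, q0]
        for _gen in range(max_generations):
            x = resample(generation(x, u, s), two_n, rng)
            if x[WILD] == 0.0:
                preserved += 1
                break
            if x[H2] + x[H1] == 0.0:
                break
    return preserved / runs
```

Dividing the returned fraction by `x0` gives a simulation estimate of the relative probability of preservation. With $u = s = 0$ the duplicate is neutral and the estimate should be close to `x0`.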
The stochastic model for general mutation structures
This section states the changes that must be made to the analysis if, instead of the specific mutation rates dealt with up to now, the mutation rates are the most general possible while still retaining tractability in the deterministic case, as determined in Section 3.2. There I found that is invariant as generations pass in the deterministic model, where is 2 in the case treated above. Thus I start by redefining . The definitions of in terms of D (4.2)
Discussion
The goal of this paper was to extend the results obtained in (I) to the case of a diploid population. The chief reason why this is not a straightforward generalization is that even when one considers the duplication of a gene which is essential to an organism, diploidy allows chromosomes without a functional copy of the gene to persist in the population, sheltered by pairing with functional chromosomes. Consequently an extra dimension is added to the problem. The strategy I have used
Acknowledgments
I thank Josh Ross for comments on an earlier version of the manuscript and Thierry Wirth for useful discussions. Two anonymous referees are thanked for comments which have improved the manuscript. The support of the Australian Research Council Centre of Excellence for Mathematics and Statistics of Complex Systems is gratefully acknowledged.
References
- et al., 2006. Role of selection in fixation of gene duplications. J. Theor. Biol.
- et al., 2005. Evolution of recombination due to random drift. Genetics.
- Buchholz, H., 1969. The Confluent Hypergeometric Function. Springer, Berlin. Translated from German by H. Lichtblau and...
- Ethier, S.N., Nagylaki, T., 1980. Diffusion approximations of Markov chains with two time scales and applications to population genetics. Adv. Appl. Probab.