Fine Mapping of Complex Trait Genes Combining Pedigree and Linkage Disequilibrium Information: A Bayesian Unified Framework
 Miguel PérezEnciso1
 Institut National de la Recherche Agronomique, Station d’Amélioration Génétique des Animaux, 31326 CastanetTolosan, France
 1 Address for correspondence: INRA, Station d’Amélioration Génétique des Animaux, BP 27, Cedex 31326 CastanetTolosan, France. Email: mperez{at}toulouse.inra.fr
Abstract
We present a Bayesian method that combines linkage and linkage disequilibrium (LDL) information for quantitative trait locus (QTL) mapping. This method uses jointly all marker information (haplotypes) and all available pedigree information; i.e., it is not restricted to any specific experimental design and it is not required that phases are known. Infinitesimal genetic effects or environmental noise (“fixed”) effects can equally be fitted. A diallelic QTL is assumed and both additive and dominant effects can be estimated. We have implemented a combined Gibbs/MetropolisHastings sampling to obtain the marginal posterior distributions of the parameters of interest. We have also implemented a Bayesian variant of usual disequilibrium measures like D′ and r^{2} between QTL and markers. We illustrate the method with simulated data in “simple” (twogeneration fullsib families) and “complex” (fourgeneration) pedigrees. We compared the estimates with and without using linkage disequilibrium information. In general, using LDL resulted in estimates of QTL position that were much better than linkageonly estimates when there was complete disequilibrium between the mutant QTL allele and the marker. This advantage, however, decreased when the association was only partial. In all cases, additive and dominant effects were estimated accurately either with or without disequilibrium information.
AN ultimate goal of quantitative trait loci (QTL) studies is to clone the gene(s) responsible for the genetic differences between individuals and, eventually, identify the causal mutation(s). Certainly, this is a daunting task that will be accomplished only gradually. One of the most severe limitations, at the moment, is that the QTL position is estimated with too large an error to allow positional cloning when a classical linkage analysis is employed. The 95% confidence interval for the QTL position usually spans over 520 cM, at a minimum. The wide confidence interval occurs because the number of meioses in the genotyped pedigree is usually very small; only between two and three generations are generally employed. Linkage disequilibrium (LD)based methods, in contrast, capitalize on the number of generations that occurred since the appearance of mutation and can produce extremely accurate estimates of the gene position, within kilobases in some instances (Hastbackaet al. 1994). Nevertheless, the chance of success of the LD strategy depends on a number of population parameters, such as the degree of admixture in the sampled population, the actual level of association between the causal mutation and the polymorphisms, or the correct ascertainment of phases and of genotypes at the QTL. Of course these parameters are usually unknown but do dramatically affect the results (Terwilliger and Weiss 1998). In fact, a pure LD analysis is likely to result in a large number of false positives as illustrated recently, e.g., in Alzheimer’s disease (Emahazionet al. 2001).
A promising approach is thus to combine both linkage and linkage disequilibrium (LDL) methods to add their advantages in a single unified theoretical framework. More specifically, there is an urgent need for robust methods that provide accurate estimation of the QTL position. Consider for the sake of illustration a simple design where a number of nuclear families are typed, i.e., parents and offspring. The theoretical advantages of combining linkage disequilibrium and pedigree (linkage) information in QTL analysis are manifold: (i) A marker for which a parent is homozygous does not contribute information in a linkage analysis, yet it does in LD analysis; (ii) conversely, two parents may share the same haplotype but not necessarily the same QTL genotypes, and a pure LD analysis would be misleading but the phenotype of offspring together with the ascertainment of alleles transmitted can be used to determine which are the most likely QTL genotypes of the parents; (iii) an individual without relatives but with phenotype records can be included in the LD analysis, in contrast to a pure linkage study; and (iv) a comparison of the analyses including or not the LD information can assess the validity of the LD model assumptions (i.e., one mutation t generations ago).
Several authors have addressed the problem of combining LD and linkage mapping for quantitative trait loci (Zhaoet al. 1998; Allisonet al. 1999; Almasyet al. 1999; Fulkeret al. 1999; Wu and Zeng 2001; Farniret al. 2002; Meuwissenet al. 2002), whereas Xiong and Jin (2000) proposed a method suited to disease susceptibility genes. Zhao et al. (1998) developed a semiparametric procedure based on the scoreestimating equation approach and that addressed the particular case of singlenucleotide polymorphisms. This is one of the first articles to provide a theoretical framework for LDL mapping but the estimating equation approaches are difficult to implement in practice; they require complex computations adapted to each family structure. For instance, the method sums over all possible phases and computes their probabilities, which is extremely complex to do in practice beyond a few markers. The statistical properties of these estimators are also unknown.
Fulker et al. (1999) developed a sibpair analysis in a likelihood framework. The approach followed by Allison et al. (1999) is a generalization of the transmission disequilibrium test (TDT) for quantitative traits (Allison 1997), where a between and withinfamily association parameter is modeled via a mixed model. Neither the Fulker et al. (1999) nor Allison et al. (1999) methods are very suited to analyzing complex pedigrees as they consider sib pairs (Fulkeret al. 1999) or parentoffspring trios (Allisonet al. 1999) and their theoretical framework is difficult to generalize to more complex settings. TDT in particular is not an optimum choice to deal with very polymorphic markers like microsatellites and makes use of only a limited amount of the total information contained in a typical pedigree. Meuwissen et al. (2002), in turn, proposed to model the QTL alleles as a random variable, where the covariance between base population haplotypes allows the inclusion of the LD information (Meuwissen and Goddard 2000), and the covariance between nonbase population haplotypes was computed as in Fernando and Grossman (1989) and Goddard (1992). They estimated the position via maximum likelihood. The model followed by these authors is different from the usual LD, where a diallelic QTL is assumed. The key issue in their method is to compute the identitybydescent probabilities between the base population haplotypes, and this was done by considering the number of identitybystate alleles shared by any two haplotypes, along the lines also suggested by McPeek and Strahs (1999). They assumed that phases are known, which is a reasonable assumption only if families are very large, e.g., as in dairy cattle. Otherwise, QTL positioning can be dramatically affected if a phase is incorrectly specified. Farnir et al. (2002) developed an analytical approach for combining linkage and LD in halfsib families, where the disequilibrium information is incorporated via Terwilliger’s (1995) approach. Their method would be very cumbersome to generalize to more complex populations; in addition, phases are assumed to be known and it is not a true multipoint method. The method of Wu and Zeng (2001), intended for natural populations, is also difficult to apply to complex pedigrees.
Here we present a Bayesian method that combines linkage and LD information for QTL mapping within a unified theoretical framework. Our LDL method uses jointly all marker information, as well as all available pedigree information; i.e., it is not restricted to any specific experimental design and it is not required that phases be known. If desired, infinitesimal genetic effects or environmental noise (fixed) effects can also be fitted. A diallelic QTL is assumed and both additive and dominant effects can be estimated. We have implemented a combined Gibbs/MetropolisHastings sampling to obtain the marginal posterior distributions of the parameters of interest. We illustrate the method with simulated data.
THEORY
We assume that the goal of the analysis is to fine map a QTL that has been previously located within a given genome region. The genetic model presupposes that a single mutation occurred t generations ago on a gene affecting the trait studied. Thus, initially, a single ancestral (founder) haplotype harbored the mutation. The number of haplotypes carrying the mutation increases in successive generations provided that the mutation is not lost and, due to recombination, the initial allele combination is eroded. The amount of disequilibrium between markers and QTL decreases proportionally to genetic distance and to the number of generations elapsed since mutation. Here we use the population model for linkage disequilibrium decay described in Morris et al. (2000), with modifications described below. Briefly, a binary variable S_{ki} is defined such that, at any kth marker locus and ith individual, the locus will be either identical by descent (IBD) with the original haplotype carrying the mutation (S_{ki} =) or not (S_{ki} = +), with minus and plus signs standing for the mutant and wild haplotype alleles, respectively. By convention we denote the QTL by locus 0. A Markov chain Monte Carlo (MCMC) method was provided by Morris et al. (2000) to obtain the transition probabilities of a locus being IBD or not at locus k + 1 conditional on being IBD or not at locus k.
Now suppose that the QTL additive and dominance effects are a and d, respectively; i.e., the mean phenotype of the individuals homozygous for the wild allele (+/+) minus that of individuals homozygous for the mutant allele (/) is 2a, whereas the mean phenotype of heterozygous individuals, (+/) or (/+), is d. Suppose further that a number m of individuals have been typed for DNA markers, contained in matrix M, and that phenotypic measurements (y) are available on a subset of n individuals. The linear explicative model is
The goal of the analysis is to obtain estimates of the set of parameters,
The Bayesian inference is based upon the posterior distribution of the parameters,
The rest of this section is devoted to presenting the main conditional distributions to sample from to obtain the posterior distribution of the parameters of interest. For the reader less interested in the mathematical details, this part can be summarized as follows. For the base population individuals (those without ancestors genotyped) we use their marker haplotypes and the phenotypic information of their descendants, in addition to the prior allele frequencies, to ascertain the more likely QTL genotypes. The LD signal is incorporated into the model via the distribution p(Mθ), which quantifies the probability of an individual carrying a certain marker haplotype conditional on its QTL genotype and other population parameters, like the age of the mutation. We assume a starshaped genealogy. We suppose that base population individuals are genotyped for most of the markers but not that phases are known; they are inferred from the offspring genotypes. LD or allele frequency priors do not contribute any information to obtain the genotypes of the descendant individuals (conditionally on the genotypes of the base population) and are sampled following the most likely recombinants as inferred from marker information. Once the QTL alleles are sampled, most of the remaining parameters are obtained via a classical Gibbs sampling within the mixedmodel context (Sorensen and Gianola 2002). In contrast, MetropolisHastings is required for the QTL position; here we identify where recombinants have occurred at two alternative positions and the resulting likelihoods using available phenotypic information are compared (Uimari and Sillanpää 2001).
Base population QTL genotypes (S_{0}): In the absence of LD information, only the phenotypes of the individuals that have received a given base population allele provide information about the likely value of that allele. This is illustrated in the simple pedigree of Figure 1; the solid lines represent the transmitted alleles, stored in T. Suppose that we are sampling the IBD status of first individual and first allele (S_{011}), conditional on all other parameters including the S_{0} of the remaining individuals (denoted by θ_). The phenotypes of individuals 1, 5, 6, and 7 influence the probability p(S_{011}θ_, y, M). In contrast, p(S_{012}θ_, y, M), corresponding to the second QTL allele, involves only the phenotype of individual 1, as this allele was not transmitted. If that individual does not have phenotype recorded, p(S_{012}θ_, y, M) is strictly proportional to the prior frequencies for each QTL allele, when LD information is not being used. We denote by ψ_{i} the set of individuals that have received at least one allele for individual i and have phenotypes, i.e., ψ_{1} = {1, 5, 6, 7}, ψ_{2} = {5, 6}, and ψ_{3} =ψ_{4} = {3, 4, 8}. Note that the set ψ may vary from iteration to iteration as a new T is sampled. If LD information is being used, p(S_{0}θ_, y, M) also depends on the marker alleles of the base population individuals. Using all sources of information, the QTL IBD status of the ith base population individual can be sampled from the fully conditional distribution,
The distribution p(M_{i}θ) in (4) is the probability of having marker alleles linked in haplotype 1 or 2 (say M_{i}_{1} or M_{i}_{2}) conditional on a given QTL genotype, its position relative to DNA markers, and the parameter governing the LD decay (t). Both haplotypes are conditionally independent; thus p(M_{i}θ) = p(M_{i}_{1}, M_{i}_{2}S_{0}_{i}_{1}, S_{0}_{i}_{2}, t, H_{i}, δ) = p(M_{i}_{1}S_{0}_{i}_{1}, t, H_{i}, δ)p(M_{i}_{2}S_{0}_{i}_{2}, t, H_{i}, δ), where M_{i}_{1} contains the marker alleles received from the father and M_{i}_{2}, those of mother’s origin. Consider the marker alleles of a given individual i at haplotype h (M_{ih}); in our notation L markers are to the left and R markers to the right of the current QTL position. Then,
At any marker locus, k, the locus will be either IBD with the original haplotype carrying the mutation (S_{k} =) or not (S_{k} =+). The term p(M_{k}S_{k}) contains the marker allele probabilities conditional on S_{k}; p(M_{k}S_{k} =+) is simply given by the population allele frequencies. In contrast, p(M_{k}S_{k} =) will be 1 for the allele that carried the mutant haplotype and 0 for the remaining alleles. The vector p(MS_{L} =... S_{1} = S_{1} = S_{R} =) is the original haplotype that carried the mutation. Of course this haplotype is unknown but can be inferred as shown by Morris et al. (2000). Here we have preferred to consider both S_{k} and p(M_{k}S_{k}) as nuisance parameters; i.e., we are not usually interested directly in them, and thus we integrate them out in (5). As a result, p(M_{k}S_{k} =) is no longer 0’s and 1’s but can take any value between the two extremes. The appendix shows how p(M_{k}S_{k}) is updated.
The transition probabilities p(S_{k}S_{k}_{1}) can be obtained as detailed in Morris et al. (2000) and depend on the effective size and time since mutation. Four transition probabilities need to be specified, which are
Expression (5) is extremely difficult to compute. However, we can rearrange as
Finally, p(S_{0}_{i}_{1}, S_{0}_{i}_{2}) in Equation 4 is the a priori probability of the IBD state of the two QTL alleles with the original mutant haplotype. When the individual is not inbred, p(S_{0}_{i}_{1}, S_{0}_{i}_{2}) = p(S_{0}_{i}_{1})p(S_{0}_{i}_{2}), where the prior probabilities are the same for any base population allele. If the ith individual is known to be inbred from the available pedigree with inbreeding coefficient f_{i}, p(S_{0}_{i}_{1}, S_{0}_{i}_{2}) = (1  f_{i})p(S_{0}_{i}_{1})p(S_{0}_{i}_{2}) + f_{i}p(S_{0}_{i}_{2})η(S_{0}_{i}_{1}S_{0}_{i}_{2}), with η being an indicator 1/0 function that makes S_{0}_{i}_{1} take the same value as S_{0}_{i}_{2}. The prior probability of an allele being identical by descent with the original mutant haplotype is α if the base population individuals have been sampled at random from the population, i.e., p(S_{0}_{i} =) = α and p(S_{0}_{i} =+) = 1 α for every individual. Otherwise, e.g., case/control study or selective genotyping, the probabilities have to be modified accordingly (Morriset al. 2000).
In summary, to sample the IBD states at the QTL position we evaluate Equation 4 at all four possible QTL genotypes, i.e., (+/+), (+/), (/+), (/), for each base population individual in turn, and we take a random number according to the genotype probabilities. Both alleles are thus sampled simultaneously. Nevertheless, this strategy can be ameliorated by sampling larger blocks of base population IBD states. Suppose IBD states of base population individuals 1 through c are sampled; then
The rest of the sampling distributions required are detailed in the appendix. Once all variables are initialized, the Markov chain Monte Carlo (MCMC) chain consists of iterating successively via Equations 4 or 6 and A2A7a plus updating the phases (H), p(MS), and the transmission indicators (T). Obviously, in a linkageonly approach,(6b), the sampling is simplified by not sampling p(MS) and time since mutation (t). The procedure is otherwise identical.
Twomarker disequilibrium measures: LD measurements like D′ (Hedrick 1987; Lewontin 1988) rely on the possibility of ascertaining the linkage phases and the alleles themselves, which is not possible with quantitative traits because the QTL genotypes are not known. Nevertheless, phases and QTL alleles are generated each iteration so we can define a Bayesian estimate of D′ between any marker and the QTL, computing D′ at the current configuration using the formula
SIMULATION
Two population types that can typically be found in livestock, with “simple” and “complex” pedigrees, were simulated. The simple population consisted of 40 unrelated fullsib families, 10 offspring per family. The complex population was a fourgeneration pedigree, with a base population of 80 unrelated parents that produced 40 fullsib families of size 5 (generation 2), whereas generations 3 and 4 consisted of 20 fullsib families (5 offspring per family). Parents were chosen at random except in generation 1, where all parents had an equal number of offspring. Both simple and complex pedigrees had a total of 480 individuals. The explored region spanned 25 cM and contained six microsatellites at positions 0, 5, 10, 15, 20, and 25 cM, together with 10 singlenucleotide polymorphisms (SNPs) located at positions 11, 12, 13, 14, 16, 17, 18, 19, 21, and 22 cM. SNP allele frequencies were 0.3 and 0.7, whereas there were six alleles at equal frequencies for each microsatellite. The QTL was located in position 18 cM, its additive effect was a = 1, there was no dominance (d = 0), and the residual variance was
Three replicates of each case were run, resulting in 12 analyses in total. The only fixed effect included in the analyses was the general mean. The maximum change in QTL position was set to 0.5 cM in each direction. We ran 50,000 iterations of the MCMC chain, discarding the first 4000 iterations. Eight origins were sampled jointly; thus p(S_{0}_{i}_{1}, S_{0}_{i}_{2},..., S_{0}_{c}_{2}y, M, θ_{}) can take 2^{8} = 256 values because the QTL is assumed to be diallelic. Phases were updated in blocks of six. Each complete iteration took ∼3.5 sec on an alpha workstation with processor 21164A. The computing time per iteration is highly dependent on the number of paths and phases updated simultaneously.
RESULTS
Table 2 presents the mean and SD of the marginal posterior distributions for the main parameters in the case of complete association. The posterior distributions for the additive and dominant effects in the first replicate are plotted in Figure 2a and provide a whole picture about the uncertainty regarding these parameters. Results were very similar for all replicates so only one is presented. The estimates of the genetic effects and the residual variance were quite accurate, and the SDs of their posterior distributions were small, indicating that there is enough information in the data to estimate these parameters. The 95% highest density region contained the true values of a, d, and
Results concerning the incomplete association scenario are presented in Table 3 and Figure 4. As expected, the estimates of the QTL effects were similar to those in Table 2, albeit the SDs were somewhat larger in particular for the dominance effect. Replicate 2 of the complex pedigree had unusually large SD of the posterior distributions of a and d. But more importantly, the accuracy of the QTL position was generally much smaller with incomplete than with complete association (note that the scales of the yaxes are different in Figures 3 and 4). It is also apparent that the mode of the posterior distribution coincided with the true position only once (replicate 1, complex pedigree) although it was close, positions 0.160.17 M, in the remaining replicates with the LDL approach. In some instances (replicate 1, simple pedigree) the posterior density was very flat and covered almost the whole region under study. In principle, linkageonly estimates should not be greatly affected by either complete or incomplete association, because the accuracy depends mainly on the informativity of markers to identify recombinant haplotypes. This seems to be the case if we exclude the rather outlying replicate 1 (simple pedigree, Figure 4). The average SD of the QTL position posterior density was 4 cM in the linkageonly approach for both complete and incomplete association scenarios. In contrast, it was 2.2 and 3 cM using LDL in the complete and incomplete scenarios, respectively.
Contrary to the estimates of QTL genetic effects or position, the LD decay parameter t was loosely estimated (Tables 2 and 3). This means that there is little information in the data to estimate them. In fact, we observed that p(Mθ) was quite flat for different values of t. A positive reading is that the exact figures for t did not affect the final results to a large extent, as we found similar output when we fitted these parameters to a variety of values, in agreement with previous results (Meuwissen and Goddard 2000).
Finally, Figure 5 draws a plot of the simple disequilibrium measures between each marker and the QTL, D′ and r, for the three simple pedigrees. D′ and r measures obtained under both statistical methods LDL and linkageonly are plotted. The two top and bottom plots correspond to the complete and incomplete LD scenarios, respectively. The most striking feature is, perhaps, the extreme differences in behavior between D′ and r. Under complete LD, the pattern of r was much more stable behavior than that of D′, as there was very little variation between replicates and r peaked clearly at the QTL position (18 cM). In contrast, D′ had a much larger variability between replicates and was clearly multimodal in several instances. Nevertheless, these two measures showed clear maxima at or close to the true QTL position under complete disequilibrium. The picture changes dramatically in the incomplete LD scenario. Here r had maxima only at the nearest microsatellites (15 and 20 cM) but a very flat curve was apparent in clear contrast with the complete LD case. The pattern for D′ was not as affected by incomplete LD (Figure 5, bottom left) although the profile was somewhat flatter than that with complete LD. Again, we observed a large variability between replicates. It is apparent that the LD statistics D′ and r were higher when using LDL than when using linkageonly methods, although the general pattern was comparable (compare thick solid lines vs. thin shaded lines in Figure 5).
DISCUSSION
We have provided a coherent and unified theoretical framework to combine linkage and LD information, as exemplified in Equations 4, 6a, and 6b. The method worked well with simulated data. Here we have used the exponential growth model as described by Morris et al. (2000) but the Bayesian framework is flexible and other population models can be incorporated by modifying p(Mθ) appropriately in Equation 4 or 6. An important feature of the method presented here is that it provides the joint haplotype probability conditional on the QTL genotype, i.e., p(M_{}_{L},... M_{R},  S_{0}, θ_{}), whereas Morris et al. (2000) wrote the likelihood as p(MS_{0}, θ_) = Π_{k}p(M_{k}θ), which differs from that used here, Equation 5. Take, without loss of generality, two markers. Morris et al. (2000, p. 162, bottom) used
Our results show that it is indeed possible to go beyond the 20cM confidence interval to locate QTL in populations of reasonable size with moderate family sizes and without an extremely dense genotyping. But they also point out that the advantages of combining LD information into the usual linkage framework should not be overemphasized and that its impact may vary dramatically depending on a number of factors. First, the usefulness of LDL over linkageonly methods is heavily dependent on the nature of the association, e.g., on whether there is complete LD between the marker and the QTL allele. Second, in the population structure, for accurate LD mapping it is extremely important to determine correctly the phases and the QTL genotypes. Having a small number of base population individuals with large families seems a better option than having a complex pedigree spanning several generations, although the optimum structure will depend on the strength of LD; e.g., if LD is extreme, a large number of base populations animals will be better because we will have more “independent” haplotypes. Finally, chance will affect the results: Mendelian transmission, recombination, and environmental noise are stochastic processes that may result in very different data sets starting from identical initial conditions. A sample of this variability is in Figures 3 and 4, and very interesting experimental results are presented, e.g., in Emahazion et al. (2001).
Our relatively pessimistic conclusions contrast with much more optimistic views of the advantages of LDL mapping in livestock, more specifically in dairy cattle (Farniret al. 2002; Meuwissenet al. 2002). Of course parts of the discrepancies are due to the different methodological approaches. It should also be mentioned that the accuracy of QTL estimates may also be affected by the method of computing the posterior distribution from the MCMC samples (Hotiet al. 2002). However, the dairy cattle population structure is ideally suited for LD mapping; very large families and small effective population sizes make it possible to accurately estimate phases and QTL genotypes and reduce genetic heterogeneity. This is not the case for most livestock species and certainly not the case in humans. Results from the group of M. Georges are very illustrative (Riquetet al. 1999; Farniret al. 2002). Initially, Riquet et al. (1999) located a QTL using only LD information, but that position was shifted to a significantly different position in a later analysis that combined LD and linkage. The primary reason was that sires had different genotypes assigned in each analysis. The population sizes that we used here prevented us from an accurate estimation of both the QTL genotypes of base populations and of some of the phases; these two facts together make it that no onetoone correspondence between haplotype and QTL genotype can be established unequivocally. As a result, linkageonly methods do not compare too badly with the LDL strategy. MCMC methods take care of the uncertainty but at the price of increasing the variance of the posterior density and thus the accuracy.
In this work, we have also proposed Bayesian equivalents for the classical LD measures D′ and
Certainly, further extensions and testing of this approach are warranted, particularly to overcome some of the potential risks of using LD. First of all, stratification may cause spurious disequilibrium. In principle, a LDL methodology should be more robust than a pure LD strategy but this remains to be tested and it is uncertain whether stratification has such a large impact on quantitative traits mapping as it does with binary traits. Genetic heterogeneity is also a major problem in quantitative trait loci mapping. In this case there will be a number n_{f} of original haplotypes carrying a distinct or the same mutation affecting the trait. In our model, this amounts to considering more than either + or  IBD states; an IBD indicator variable should be included and probabilities p(MS_{0} = k, k = 1, n_{f}) should be estimated. In the likely case that n_{f} is not known, a reversiblejump MCMC strategy could be used. Liu et al. (2001) and Morris et al. (2002) have recently presented an alternative approach to allow for multiple mutations in a pure LDmapping strategy. Missing markers are dealt with by using only available information for computing phases and segregation indicators. This is a reasonable approximation if the percentage of missing genotypes and the pedigree’s complexity are not large; otherwise the transmission coefficients T are not properly calculated. This should not be too much of a concern in the special case of fine mapping, where one is usually analyzing a few generations and very dense genotyping. However, this is a much more important limitation in markerassisted selection or in linkage analysis of complex populations. Here we have implicitly assumed a starshaped genealogy, which is not realistic in many instances. The dependence among sampled base population haplotypes, i.e., the fact that recombination histories are correlated, can be included in the model via, e.g., coalescent techniques assuming a given effective size (Meuwissen and Goddard 2001). A simple strategy is to consider that prior allele states in any two haplotypes are not independent, i.e., p(S_{0}_{i}, S_{0}_{i}_{′}) ≠ p(S_{0}_{i})p(S_{0}_{i}_{′}), but rather use the additive relationship coefficient (ρ_{i,i}_{′}), computed using all available pedigrees as a measure of association; then p(S_{0}_{i}, S_{0}_{i}_{′}) = (1 ρ_{ii}_{′})p(S_{0}_{i})p(S_{0}_{i}_{′}) +ρ_{ii}_{′} p(S_{0}_{i})η(S_{0}_{i}S_{0}_{i}_{′}), as explained in the theory section. Much more complicated is the issue of conditioning on the actual known pedigree previous to the first genotyped individuals and their observed marker alleles.
To conclude, fine mapping complex trait genes is a topic of very active research and a major challenge in both human and animal genetics. Given the diversity of genetic architectures and population histories, it is unlikely that a single statistical approach will be valid for all cases. One of the advantages of the Bayesian approach presented here is that the different sources of knowledge are conditionally independent (Equations 4 and 6) so that we can consider, e.g., different population genetic models to model LD simply by changing equation p(Mθ) appropriately. Additionally, the degree of uncertainty about the parameters can be fully described via the marginal posterior distribution.
APPENDIX: SAMPLING DISTRIBUTIONS
Mixedmodel effects (a, d, u, and β): The mixedmodel equations (Henderson 1984) are, conditional on w_{a}, w_{d},
Variance components (
Linkage disequilibrium parameters [t, p(M_{k}_{,}_{j}S_{k})]: The fully conditional distribution of t is not a known distribution and, thus, we resort to MetropolisHastings sampling. A new proposed age of mutation t^{new} is accepted with probability
Phase sampling (H): Phases that could not be determined unambiguously were sampled using a block Gibbs sampling algorithm. A parameterizable number of marker phases were sampled jointly for each individual in turn. The algorithm works as follows. First, unknown phases for a given individual are identified, say n_{h} unknown phases. Second, an indicator variable is constructed taking all possible values (2^{nh}). For instance, suppose that there are four markers and that the phases of first and last markers are known or not sampled (i.e., missing marker), then the indicator variable may take values 00, 01, 10, and 11, where “” stands for not sampled, “0” for paternal, and “1” for maternal origin. Finally, the probability associated with each value is calculated using all available marker information and current phases in parents and offspring and a new phase block is sampled. Here a maximum of six phases were sampled jointly.
Segregation indicators (T): T was usually updated together with the QTL position, as explained below. A new proposal for T was sampled conditioning on marker and phase information using Mendelian rules.
QTL position (δ): This is one of the most critical steps of the Bayesian procedure. A variety of strategies have been proposed in the literature (Satagopanet al. 1996; Heath 1997; Uimari and Hoeschele 1997; Sillanpaa and Arjas 1998). In a typical sampling scheme, individual S_{0} would be updated conditional on the other genotypes, but this is a risky option as the chain will get stuck easily (Jansset al. 1995). Uimari and Sillanpää (2001) proposed a dual sampling scheme. In some iterations, δ is updated using the acceptance ratio
Acknowledgments
We are thankful for helpful discussions with M. Sillanpää, D. Milan, J. M. Elsen, B. Goffinet, and L. L. G. Janss. A referee is thanked for the suggestions. This work was funded by the Bureau des Ressources Génétiques, project no. 20, and Action en Bioinformatique (France).
Footnotes

Communicating editor: C. Haley
 Received September 4, 2002.
 Accepted December 19, 2002.
 Copyright © 2003 by the Genetics Society of America