
Overview of the algorithm

This section gives a quick overview of the application flow; a more detailed description of how to run TreeLD is presented in section 6. Figure 2 shows a flow diagram of an analysis with TreeLD.

Figure 2: Flowchart of analysis with TreeLD.

TreeLD organizes data analysis into projects; each project is associated with a single dataset. The analysis performed in each project usually consists of two steps that are controlled by the user. First, the ancestry of the dataset is estimated from the information in the input file; then the ancestry is analyzed for evidence about the locus of the disease mutation.

The first step, estimating the ancestry, focuses on individual loci (referred to as focal points) along the sequence and infers the ancestry of each focal point separately with a Markov chain Monte Carlo (MCMC) algorithm. To account for the uncertainty in this estimate, multiple trees that provide feasible ancestries are generated for each focal point. The performance of the MCMC algorithm depends on the user's choice of the burn-in length and the number of trees generated. Suitable choices for these parameters are discussed in section 8. The output of one MCMC run over all focal points is called a tree set.
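
The roles of the burn-in and the number of sampled trees can be illustrated with a minimal Python sketch. It is not TreeLD's implementation: the function names, the Metropolis update, and the toy target (a single number per focal point standing in for a genealogy) are all assumptions made for illustration. What it shows is the general pattern of discarding the first `burn_in` states of the chain and then keeping `n_trees` samples for each focal point, with the collected samples forming the analogue of a tree set.

```python
import math
import random


def sample_after_burn_in(log_target, propose, initial_state, burn_in, n_samples, rng):
    """Generic Metropolis sampler (symmetric proposals assumed).

    The first `burn_in` states are discarded; the next `n_samples` are kept.
    """
    state = initial_state
    log_p = log_target(state)
    kept = []
    for step in range(burn_in + n_samples):
        candidate = propose(state, rng)
        log_p_candidate = log_target(candidate)
        # Accept with probability min(1, target(candidate) / target(state)).
        if rng.random() < math.exp(min(0.0, log_p_candidate - log_p)):
            state, log_p = candidate, log_p_candidate
        if step >= burn_in:
            kept.append(state)
    return kept


def build_tree_set(focal_points, burn_in, n_trees, seed=0):
    """Run one chain per focal point and collect the samples (hypothetical helper)."""
    rng = random.Random(seed)
    tree_set = {}
    for focal in focal_points:
        # Toy stand-in for the posterior over trees at this focal point:
        # a Gaussian centred on the focal position. A real sampler would
        # move through genealogies, not real numbers.
        def log_target(x, centre=focal):
            return -0.5 * (x - centre) ** 2

        def propose(x, r):
            return x + r.gauss(0.0, 1.0)

        tree_set[focal] = sample_after_burn_in(
            log_target, propose, 0.0, burn_in, n_trees, rng
        )
    return tree_set


tree_set = build_tree_set([10.0, 20.0, 30.0], burn_in=500, n_trees=100)
print({focal: len(samples) for focal, samples in tree_set.items()})
```

In this sketch a longer burn-in gives each chain more time to move away from its arbitrary starting state, and a larger `n_trees` gives more samples per focal point at a proportional cost in run time.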

In the second step, the trees sampled at each focal point are analyzed for a signal of the presence of one or more disease mutations. Combining the resulting posterior likelihoods over multiple focal points yields the posterior distribution for the locus of the disease mutation, from which an estimate of the position of the disease mutation(s) and a credible region are obtained. The same tree set can also be used to test for the presence of a disease mutation.
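
How per-focal-point results might be turned into a posterior distribution and a credible region can be sketched as follows. This is an illustration under stated assumptions, not TreeLD's actual computation: it assumes that each focal point has already been summarized by a single log-likelihood, that the prior over focal points is uniform, and that the credible region is the smallest set of focal points reaching the requested posterior mass. The function names and the input numbers are hypothetical.

```python
import math


def posterior_over_focal_points(log_likelihoods):
    """Normalize per-focal-point log-likelihoods into a posterior.

    Assumes a uniform prior over focal points (an assumption of this sketch).
    """
    peak = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {pos: math.exp(ll - peak) for pos, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {pos: w / total for pos, w in weights.items()}


def credible_region(posterior, mass=0.95):
    """Smallest set of focal points whose posterior mass is at least `mass`."""
    region, covered = [], 0.0
    ranked = sorted(posterior.items(), key=lambda item: item[1], reverse=True)
    for pos, prob in ranked:
        region.append(pos)
        covered += prob
        if covered >= mass:
            break
    return sorted(region)


# Hypothetical log-likelihoods at five focal points (positions in arbitrary units).
log_likelihoods = {100: -12.0, 200: -9.5, 300: -8.0, 400: -9.0, 500: -13.0}

posterior = posterior_over_focal_points(log_likelihoods)
best_position = max(posterior, key=posterior.get)
region = credible_region(posterior, mass=0.95)

print(best_position)  # focal point with the highest posterior probability
print(region)         # focal points inside the 95% credible region
```

Because the posterior in this sketch is defined only at the focal points, both the point estimate and the credible region can be no finer than the spacing between them.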

The resolution of the posterior distribution and the power of the test for association both depend on the number of focal points that have been selected.


Sebastian Zoellner 2005-01-27