

Effect of outliers among the sampled trees

Due to computational constraints, only a limited number of trees can be drawn from the posterior distribution. This makes the posterior likelihood calculated at each location susceptible to outliers with unusually high likelihoods. Such outliers occur when an individual tree is sampled that has a low posterior likelihood but high support for the presence of the disease mutation at that location. As a result, the posterior likelihood may be overestimated at that location.

If the map of focal points is sufficiently dense (see 8.3), the influence of an outlier may show up as a single spike in the posterior distribution: focal point $i$ has a high posterior likelihood, but neither focal point $i-1$ nor focal point $i+1$ has an elevated signal. If a focal point really is close to the locus of the disease mutation, the neighboring focal points will also be in the proximity of the disease mutation and will therefore also show an increased posterior likelihood. If, on the other hand, the increased likelihood at a focal point is due to an outlier, adjacent focal points will not show an increased likelihood for the presence of a disease mutation.

The impact of outliers can be reduced by sampling more trees from the posterior distribution. However, some outliers may have a likelihood that is orders of magnitude higher than the likelihood of all other trees sampled at the same focal point, so it may not be computationally viable to sample enough trees to control for their effect.
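The single-spike criterion described above can be sketched in a few lines of Python. This is only an illustration and not part of the program: the function name isolated_spikes, the ratio threshold, and the use of the median as a baseline are all assumptions made for the example. It takes the posterior likelihoods at consecutive focal points and reports every interior focal point $i$ that is elevated while both $i-1$ and $i+1$ are not.

```python
import statistics


def isolated_spikes(likelihoods, ratio=10.0):
    """Return indices i that are elevated while i-1 and i+1 are not.

    Assumption for this sketch: a focal point counts as "elevated" when its
    posterior likelihood exceeds `ratio` times the median over all focal
    points. The first and last focal points are skipped because they have
    only one neighbor.
    """
    baseline = statistics.median(likelihoods)
    elevated = [value > ratio * baseline for value in likelihoods]
    spikes = []
    for i in range(1, len(likelihoods) - 1):
        if elevated[i] and not elevated[i - 1] and not elevated[i + 1]:
            spikes.append(i)
    return spikes


# Hypothetical profile: a lone spike at index 3 and a broad peak at 6-8.
profile = [1.0, 1.2, 0.9, 50.0, 1.1, 1.0, 20.0, 35.0, 18.0, 1.3]
print(isolated_spikes(profile))  # prints [3]
```

In this made-up profile the broad peak at indices 6 to 8 is not flagged, because each of its focal points has an elevated neighbor, whereas the lone value at index 3 is reported as a candidate outlier. A flagged focal point is only a candidate; whether it reflects an outlier still has to be checked against the trees themselves.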

However, a signal that is generated by an outlier in the tree distribution will usually not recur if a second set of trees is generated and analyzed. It is therefore advisable to verify any peak in the posterior distribution by generating additional trees at the locations where a signal appears. To ensure that the new set of trees is independent of the first set, it may be necessary to restart the analysis from a random tree and to repeat the MCMC, including burn-in.
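The replication check can be illustrated with a small Python sketch. It assumes, purely for the example, that both independent runs yield a posterior likelihood at the same focal points and that "elevated" means exceeding a multiple of the run's own median; the name compare_runs and the ratio parameter are hypothetical and do not correspond to any option of the program.

```python
import statistics


def compare_runs(first_run, second_run, ratio=10.0):
    """Split the peaks of first_run into replicated and unreplicated ones.

    Assumption for this sketch: both runs cover the same focal points, and a
    focal point is "elevated" when its likelihood exceeds `ratio` times the
    median of its own run.
    """
    if len(first_run) != len(second_run):
        raise ValueError("both runs must cover the same focal points")
    base_first = statistics.median(first_run)
    base_second = statistics.median(second_run)
    replicated, unreplicated = [], []
    for i, (a, b) in enumerate(zip(first_run, second_run)):
        if a > ratio * base_first:
            if b > ratio * base_second:
                replicated.append(i)
            else:
                unreplicated.append(i)
    return replicated, unreplicated


# Hypothetical likelihood profiles from two independent MCMC runs.
first_run = [1.0, 1.2, 0.9, 50.0, 1.1, 1.0, 20.0, 35.0, 18.0, 1.3]
second_run = [1.1, 0.9, 1.0, 1.2, 1.0, 1.1, 22.0, 30.0, 15.0, 1.2]
print(compare_runs(first_run, second_run))  # prints ([6, 7, 8], [3])
```

Here the peak at focal points 6 to 8 reappears in the second run, while the signal at focal point 3 does not and is therefore more likely to have been caused by an outlier tree in the first run.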


Sebastian Zoellner 2005-01-27