

Missing information and unknown phase

This release of TreeLD does not support the analysis of sequences with missing data or unknown phase. To impute missing data and phase information, we recommend PHASE 2.1 (Stephens et al., 2001), which can be obtained from http://www.stat.washington.edu/stephens/. For best results, run PHASE on a single combined dataset of cases and controls. The Perl script PH2TL.pl, included with the TreeLD package, converts the output of PHASE into an input file for TreeLD. Usage is

    PH2TL.pl <PHASE.out> <Phenotypes.txt> <TreeLD.in>

where <PHASE.out> is the name of the output file generated by PHASE, <Phenotypes.txt> is a text file containing the phenotype information, with one phenotype per line in the same order as the input file for PHASE, and <TreeLD.in> is the name of the input file for TreeLD.
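
Because the phenotype file is matched to the PHASE input purely by line order, it may be worth checking the number of entries before running the conversion. The following Python sketch is not part of the TreeLD package; the function name read_phenotypes, the expected count n_expected, and the "0"/"1" phenotype coding in the example are assumptions made for illustration only.

```python
from typing import Iterable, List


def read_phenotypes(lines: Iterable[str], n_expected: int) -> List[str]:
    """Collect one phenotype per non-empty line and check the count.

    `n_expected` is the number of entries in the PHASE input file, which
    the caller has to supply; this sketch does not parse PHASE files.
    """
    # Strip whitespace and skip blank lines (e.g. a trailing newline).
    phenotypes = [line.strip() for line in lines if line.strip()]
    if len(phenotypes) != n_expected:
        raise ValueError(
            f"expected {n_expected} phenotypes, found {len(phenotypes)}"
        )
    return phenotypes


# Hypothetical example: three entries, coded here as "1" and "0".
example_lines = ["1\n", "0\n", "1\n"]
phenotypes = read_phenotypes(example_lines, n_expected=3)
```

The check only catches a wrong number of lines; it cannot detect phenotypes that are listed in a different order from the PHASE input.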

Note that although PHASE estimates the uncertainty of each inferred state, the script PH2TL.pl selects for each marker the allele with the highest posterior probability according to PHASE and assigns it as the state of that marker. Using PHASE in this manner therefore discards uncertainty that is present in the data, so the resulting confidence intervals are somewhat anti-conservative, that is, narrower than they should be.
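
The effect of this best-guess selection can be illustrated with a short Python sketch. It is not the code of the Perl script and does not read PHASE output; the data layout (one dictionary of allele posterior probabilities per marker) and the name best_guess_alleles are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def best_guess_alleles(
    posteriors: List[Dict[str, float]],
) -> List[Tuple[str, float]]:
    """For each marker, return the most probable allele and its probability.

    `posteriors` holds one dict per marker, mapping an allele label to its
    posterior probability (a hypothetical layout, not the PHASE format).
    """
    calls = []
    for marker in posteriors:
        # Pick the allele label with the largest posterior probability.
        allele = max(marker, key=marker.get)
        calls.append((allele, marker[allele]))
    return calls


# Hypothetical posteriors for two markers of one haplotype.
example = [
    {"0": 0.98, "1": 0.02},  # nearly certain
    {"0": 0.45, "1": 0.55},  # close to a coin flip
]
calls = best_guess_alleles(example)
# calls == [("0", 0.98), ("1", 0.55)]
```

In this example the second marker is called as allele "1" even though its posterior probability is only 0.55. Once the call is written to the TreeLD input file, that uncertainty is no longer visible to the analysis.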


Sebastian Zoellner 2005-01-27