Next: Genetic map distances Up: Documentation for TreeLD, version Previous: The interface   Contents   Index


Input format

The input file supplied by the user should give the positions of the markers used, the phenotypes of the individuals studied, and the phased genotypes of the sample. A schematic of the input file is given in figure 4, where the quantities are as follows:

Figure 4: Schematic of an input file. Each oval box indicates the input information for one individual. Figure 5 shows a specific example.
    # Comments
    P Position(1...
    ...pe(NumberOfIndividuals.1) Haplotype(NumberOfIndividuals.2)

Comments
All lines at the beginning of the file that start with a hash (#) are ignored by the program and can therefore be used for comments.

P
The line that gives the map positions of the markers starts with a P (upper case). The positions should be separated by a single whitespace character.

Position(i)
Each of these numbers gives the position of a marker in base pairs, relative to an arbitrary point of reference. The loci must be in consecutive order along the chromosome (i.e. the positions have to be increasing). Position(i) should be separated from Position(i+1) by a single whitespace character.
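
The rules for the P line can be checked before running the program. The following is a minimal sketch, not part of TreeLD: the function name `parse_position_line` is hypothetical, positions are read as floating point numbers, equal neighbouring positions are treated as an error (the text above only says "increasing"), and any run of whitespace is accepted as a separator, which is more lenient than the single whitespace character the format asks for.

```python
def parse_position_line(line):
    """Parse a 'P pos1 pos2 ...' line into a list of marker positions.

    Hypothetical helper: raises ValueError if the line does not start
    with an upper-case P or if the positions are not increasing.
    """
    fields = line.split()
    if not fields or fields[0] != "P":
        raise ValueError("position line must start with an upper-case P")
    positions = [float(tok) for tok in fields[1:]]
    if not positions:
        raise ValueError("no marker positions given")
    # Assumption: 'increasing' is read as strictly increasing.
    for left, right in zip(positions, positions[1:]):
        if right <= left:
            raise ValueError(f"positions not increasing: {left} then {right}")
    return positions
```

For example, `parse_position_line("P 820 22312 82290")` returns the three positions as floats, while `parse_position_line("P 10 5")` raises a `ValueError`.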

For each individual i in the sample, the phenotype and genotype information must be provided. The first line gives the phenotype; the second and, where applicable, third lines contain the genotype information.

Phenotype(i)
This floating point number gives the phenotype of individual i. For a QTL study, this can be the measured quantitative trait. In a case-control study, the phenotypes should be given as 1.0 for cases and 0.0 for controls.

1/2
This value gives the number of chromosomes that share the phenotype given on the same line. For case-control studies on autosomal loci this will be a 2, while for male X-chromosomal loci or for non-transmitted haplotypes in a TDT this will be a 1.

Haplotype(i.x)
These lines contain the one or two haplotypes, as indicated by the line with the phenotype information. The state of each SNP is given by a 1 or a 2, without a space between the individual characters. It is not important which allele of the SNP is assigned to which number, as no information about ancestral state is used. The number of markers in each of these lines must match the number of marker positions provided in the line starting with a P.
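
The record for one individual (a phenotype line followed by one or two haplotype lines) can be checked along the same lines. This is only a sketch under stated assumptions: `parse_individual` is a hypothetical name, not a TreeLD function, and the phenotype and the chromosome count are assumed to be separated by whitespace, as in figure 5.

```python
def parse_individual(lines, n_markers):
    """Parse one individual's record from a list of text lines.

    `lines[0]` is the phenotype line ('Phenotype 1/2'); the next one or
    two lines are haplotypes made of the characters '1' and '2'.
    Returns (phenotype, haplotypes, number_of_lines_consumed).
    """
    pheno_tok, count_tok = lines[0].split()
    phenotype = float(pheno_tok)
    n_chrom = int(count_tok)
    if n_chrom not in (1, 2):
        raise ValueError("chromosome count must be 1 or 2")
    haplotypes = [ln.strip() for ln in lines[1:1 + n_chrom]]
    if len(haplotypes) != n_chrom:
        raise ValueError("fewer haplotype lines than announced")
    for hap in haplotypes:
        # Each haplotype needs one character per marker on the P line.
        if len(hap) != n_markers:
            raise ValueError(f"expected {n_markers} markers, got {len(hap)}")
        if set(hap) - {"1", "2"}:
            raise ValueError("SNP states must be 1 or 2")
    return phenotype, haplotypes, 1 + n_chrom
```

The third return value tells a caller how far to advance in the file before reading the next individual.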

An example of a simple input file is given in the file SampleInput.txt, which is shown in figure 5.

Figure 5: Example of an input file. The first line is a comment that is ignored by the program. The second line gives the locations of the 5 markers at bp 820, 22312, ..., 82290 relative to an arbitrary starting point. The first three individuals in the sample are the cases, designated by the phenotype 1.0. The 2 after the phenotype indicates that these are diploid individuals. The last three individuals are controls, as indicated by the phenotype 0.0.
    # SampleInput.txt, ...
    ...
    12111
    0.0 2
    22211
    11112
    0.0 2
    11122
    11111

Please note that the chromosomes in the input file are assumed to be unrelated. Thus, data generated in a case-control study can be used without modification. See section 5.3 for instructions on how to generate an input file for trios.
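
For case-control data that are already phased, writing a file in this format is mostly a matter of serialisation. The sketch below assumes the data are held as a list of (phenotype, haplotypes) pairs; the function name `format_input` and this in-memory layout are illustrative assumptions, not part of TreeLD, and no handling of trios is attempted.

```python
def format_input(positions, individuals, comment=None):
    """Build the text of an input file.

    positions   -- marker positions in increasing order
    individuals -- list of (phenotype, [haplotype strings]) pairs, where
                   each individual has one or two haplotypes
    comment     -- optional text for a leading '#' comment line
    """
    lines = []
    if comment is not None:
        lines.append(f"# {comment}")
    # Position line: upper-case P, then positions separated by single spaces.
    lines.append("P " + " ".join(str(p) for p in positions))
    for phenotype, haplotypes in individuals:
        # Phenotype line: phenotype value, then the number of chromosomes.
        lines.append(f"{float(phenotype)} {len(haplotypes)}")
        lines.extend(haplotypes)
    return "\n".join(lines) + "\n"
```

A call such as `format_input([820, 22312], [(1, ["12", "21"]), (0, ["11"])], comment="toy")` produces a comment line, a P line, and then one phenotype line plus its haplotype lines per individual; the values in this call are made up for illustration.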



Sebastian Zoellner 2005-01-27