Reference order data sets

In a comparative mapping approach, you want to exploit the knowledge of a completely sequenced genome to map a genome of interest. To build a reference order dataset, you must first find the orthologous relationships between your gene-associated markers and the genes of the sequenced genome. These relationships can be obtained by taking reciprocal best hits from sequence alignments.
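The reciprocal best hit criterion can be illustrated with a short sketch. It assumes alignment scores have already been collected in both directions (for instance from BLAST-like output) as nested dictionaries where a higher score means a better hit. The function names and data layout are hypothetical and are not part of any tool described here.

```python
"""Reciprocal best hits (RBH) from two alignment score tables (sketch).

Assumption: scores are nested dicts {query: {subject: score}}, higher is
better. All names here are hypothetical, for illustration only.
"""


def best_hits(scores):
    """Map each query to its highest-scoring subject.

    Ties are broken by dict iteration order (max keeps the first maximum).
    """
    best = {}
    for query, hits in scores.items():
        if hits:
            best[query] = max(hits, key=hits.get)
    return best


def reciprocal_best_hits(marker_vs_genome, genome_vs_marker):
    """Return sorted (marker, gene) pairs that are each other's best hit."""
    forward = best_hits(marker_vs_genome)
    backward = best_hits(genome_vs_marker)
    return sorted(
        (marker, gene)
        for marker, gene in forward.items()
        if backward.get(gene) == marker
    )


if __name__ == "__main__":
    # Toy scores: marker -> {gene: score} and gene -> {marker: score}.
    m2g = {"M1": {"gA": 90.0, "gB": 40.0}, "M2": {"gA": 85.0}}
    g2m = {"gA": {"M1": 88.0, "M2": 80.0}, "gB": {"M1": 42.0}}
    print(reciprocal_best_hits(m2g, g2m))  # [('M1', 'gA')]
```

Here M2 also hits gA best, but gA's best hit is M1, so only the pair (M1, gA) is kept as orthologous.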

The first header line of the dataset should be set to:

data type order

The second header line of the dataset gives the number of orthologous markers, followed by a zero. The remaining lines list, in order, the markers of each chromosome segment of the reference genome, one marker per line. Each line begins with a star character (*), followed by the name of the marker and its position in base pairs on the chromosome, except for the first and last markers of each segment, where the star character is replaced by the chromosome number. This number must be a positive integer between 1 and 9999. Note that singleton markers (i.e. segments containing only one marker) cannot be expressed.

The base-pair positions are used only by the graphical comparative mapping view (see paretoplotg); they are ignored by the comparative mapping process itself.
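The rules above are simple enough to check mechanically. The following sketch reads a dataset and returns its segments, assuming whitespace-separated fields; the function name and the returned structure are hypothetical. Because every segment needs both an opening and a closing line carrying the chromosome number, a one-marker segment simply cannot be written, which is why singletons are rejected.

```python
"""Parse and check a reference order dataset (sketch of the stated rules).

Assumptions: fields are whitespace-separated; the function name and the
returned structure are hypothetical.
"""


def parse_order_dataset(text):
    """Return a list of segments, each (chromosome, [(marker, bp), ...])."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != ["data", "type", "order"]:
        raise ValueError("first header line must be 'data type order'")
    if len(lines) < 2 or len(lines[1]) != 2 or lines[1][1] != "0":
        raise ValueError("second header line must be '<count> 0'")
    expected = int(lines[1][0])

    segments, current = [], None  # current: (chromosome, markers) being read
    for fields in lines[2:]:
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {fields}")
        tag, marker, bp = fields[0], fields[1], int(fields[2])
        if tag == "*":  # inner marker of the open segment
            if current is None:
                raise ValueError(f"'*' outside a segment at {marker}")
            current[1].append((marker, bp))
            continue
        chrom = int(tag)  # raises ValueError if not an integer
        if not 1 <= chrom <= 9999:
            raise ValueError(f"chromosome number out of range: {chrom}")
        if current is None:  # first marker opens a segment
            current = (chrom, [(marker, bp)])
        else:  # last marker closes the open segment
            if chrom != current[0]:
                raise ValueError(f"segment opened on {current[0]}, closed on {chrom}")
            current[1].append((marker, bp))
            segments.append(current)
            current = None
    if current is not None:
        raise ValueError("last segment is not closed (singleton or truncated)")
    count = sum(len(markers) for _, markers in segments)
    if count != expected:
        raise ValueError(f"header announces {expected} markers, found {count}")
    return segments
```

Applied to the example below, it would return one three-marker segment on chromosome 2 and one four-marker segment on chromosome 7.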

The following is an example with 7 orthologous markers on two chromosomes (chr02 and chr07) in the reference genome.

data type order
7 0
2 UniSTS208788 7059292
* UniSTS180373 9019692
2 UniSTS160192 10214792
7 UniSTS160197 106361623
* UniSTS165815 106779006
* UniSTS180382 106822847
7 UniSTS180378 107120936
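A dataset such as this one can also be generated from the orthologous markers. The sketch below assumes the caller has already grouped the markers into segments and ordered each segment along the reference chromosome; the function name and input layout are hypothetical.

```python
"""Write a reference order dataset from ordered segments (sketch).

Assumption: markers are already grouped into segments and listed in
reference order; names and input layout are hypothetical.
"""


def format_order_dataset(segments):
    """segments: list of (chromosome, [(marker, bp), ...]) in reference order."""
    out = ["data type order"]
    total = sum(len(markers) for _, markers in segments)
    out.append(f"{total} 0")  # marker count followed by a zero
    for chrom, markers in segments:
        if not 1 <= chrom <= 9999:
            raise ValueError(f"chromosome number out of range: {chrom}")
        if len(markers) < 2:
            # The format cannot express a one-marker segment.
            raise ValueError(f"singleton segment on chromosome {chrom}")
        last = len(markers) - 1
        for i, (marker, bp) in enumerate(markers):
            # First and last markers carry the chromosome number, others a star.
            tag = str(chrom) if i in (0, last) else "*"
            out.append(f"{tag} {marker} {bp}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    segments = [
        (2, [("UniSTS208788", 7059292), ("UniSTS180373", 9019692),
             ("UniSTS160192", 10214792)]),
        (7, [("UniSTS160197", 106361623), ("UniSTS165815", 106779006),
             ("UniSTS180382", 106822847), ("UniSTS180378", 107120936)]),
    ]
    print(format_order_dataset(segments), end="")
```

Run as a script, it prints exactly the example dataset above.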

Thomas Schiex 2009-10-27