Constraints data sets

In some cases, you may want to incorporate ordering information that comes from another source (for example, already published maps) for which you do not have the data. CARTHAGENE can try to take advantage of such information using so-called ``triplet constraints''. A triplet constraint is defined by a triple of markers $(m_1,m_2,m_3)$ and a loglikelihood penalty $p$ (expressed as a base 10 logarithm). The ``semantics'' of such a triplet constraint is that any marker ordering inconsistent with the triplet ordering (i.e., an order that places $m_2$ outside of the interval between $m_1$ and $m_3$) will see its loglikelihood penalized by the amount $p$. Use this facility with extreme caution, since it breaks the good properties of the maximum likelihood criterion used by CARTHAGENE. The most obvious and reasonable use of this facility is to answer a question such as: is there a map that places the three markers as indicated in the triplet and whose loglikelihood is not worse than the loglikelihood of an optimal map by more than $p$?
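
The penalty semantics described above can be illustrated with a short Python sketch. This is not CARTHAGENE code: the function names are hypothetical, the marker order is assumed to be a plain list containing every marker of each triplet, and the penalties of several violated constraints are assumed to add up, which the text above does not state.

```python
def is_consistent(order, triplet):
    """Return True if m2 lies strictly between m1 and m3 in the marker order."""
    m1, m2, m3 = triplet
    pos = {marker: i for i, marker in enumerate(order)}
    lo, hi = sorted((pos[m1], pos[m3]))
    return lo < pos[m2] < hi


def penalized_loglike(loglike, order, constraints):
    """Subtract the penalty of every violated triplet from a log10-likelihood.

    Assumption: penalties of distinct violated constraints are additive.
    """
    total = loglike
    for m1, m2, m3, penalty in constraints:
        if not is_consistent(order, (m1, m2, m3)):
            total -= penalty
    return total
```

For instance, with the constraint `("MS4", "MS5", "MS13", 3.0)`, an order that places MS5 before both MS4 and MS13 would have its loglikelihood lowered by 3.0, while an order with MS5 between them would be left unchanged.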

A constraint dataset is simply composed of a set of such triplet constraints following a single header line:

data type constraint

A triplet constraint is written by putting the names of the 3 markers of the triplet in sequence, followed by the penalty. An example of such a dataset is given below:

data type constraint
MS4 MS5 MS13 3.0
MS5 MS13 MS6 3.0
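
To make the file layout concrete, here is a minimal Python sketch that reads a constraint dataset such as the one above. It is only an illustration, not the CARTHAGENE loader: the function name is hypothetical, and it assumes whitespace-separated fields, one triplet per line, and that blank lines can be skipped.

```python
def parse_constraints(text):
    """Parse a constraint dataset into a list of (m1, m2, m3, penalty) tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The first non-empty line must be the header.
    if not lines or lines[0].split() != ["data", "type", "constraint"]:
        raise ValueError("missing 'data type constraint' header line")
    constraints = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != 4:
            raise ValueError(f"expected 3 marker names and a penalty: {ln!r}")
        m1, m2, m3, penalty = fields
        constraints.append((m1, m2, m3, float(penalty)))
    return constraints
```

Applied to the example dataset, this sketch yields two tuples, one per triplet, each carrying the three marker names and the penalty as a float.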

The markers mentioned in the triplets must already be known to CARTHAGENE, i.e., one or several datasets that contain data about these markers must already be loaded.
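
Before loading a constraint dataset, this requirement can be checked with a small Python sketch like the one below. The function name and the way the known marker names are obtained are assumptions made for illustration; how CARTHAGENE itself reports an unknown marker is not described here.

```python
def unknown_markers(constraints, known_markers):
    """List marker names used in triplets but absent from the known markers.

    `constraints` is a list of (m1, m2, m3, penalty) tuples and
    `known_markers` is any iterable of names from already loaded datasets.
    Names are returned in order of first appearance, without duplicates.
    """
    known = set(known_markers)
    missing = []
    for m1, m2, m3, _penalty in constraints:
        for marker in (m1, m2, m3):
            if marker not in known and marker not in missing:
                missing.append(marker)
    return missing
```

An empty result means that every marker named in the triplets appears in the supplied set of known names.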

Thomas Schiex 2009-10-27