A Scroll-through Tutorial:
Using CRIMAP to Perform Linkage Analysis
This tutorial is intended as complementary material to the CRIMAP Manual
("Documentation for CRI-MAP, version 2.4
(3/26/90)" by Phil Green, Kathy Falls, and Steve Crooks), giving beginners
a step-by-step walk-through of running the CRIMAP program. Users
are assumed to already know the basics of linkage analysis and
to understand the theory behind it. For concepts of linkage and LODs, please refer
to this link.
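As a quick refresher on what a LOD score measures, here is a minimal Python sketch for the simplest textbook case: phase-known, fully informative meioses in which k recombinants are observed among n meioses. It is not how CRIMAP computes its likelihoods (CRIMAP works on general pedigrees and multiple loci); the function name and the example counts are illustrative assumptions.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta for phase-known meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # likelihood under free recombination
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Hypothetical counts: 2 recombinants among 20 informative meioses.
    print(round(lod_score(2, 20, 0.1), 2))
```

A LOD above 3.0 is the conventional threshold for declaring linkage, which is the cut-off used in the two point analysis later in this tutorial.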
This tutorial uses PiGMaP family genotype data as an example. By the end of the tutorial, users should be able to run a linkage analysis of their own data set against the existing PiGMaP data for mapping purposes.

1. The data structure and data preprocessing:

The data structure is defined to have the format:

FamID ChildreNum PigID CrimapCode DameID SireID Sex Allele 1 Allele 2

An actual example of the data structure:
In the analysis, CRIMAP uses the "CrimapCode", not the individual ID ("PigID"). The "PigID" column has to be taken out once the genotypes are entered, and the actual crimap working data sheet should look like:
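As a rough illustration of removing the "PigID" column, the following Python sketch drops that field from one record. It assumes one individual per line with the nine whitespace-separated fields listed in the format above; that layout, the function name, and the sample values are assumptions made for illustration and are not taken from the PiGMaP data.

```python
# Field order as given in the tutorial's data structure definition.
FIELDS = [
    "FamID", "ChildreNum", "PigID", "CrimapCode",
    "DameID", "SireID", "Sex", "Allele1", "Allele2",
]


def drop_pig_id(line: str) -> str:
    """Return the record with the PigID column removed.

    Assumes the line holds exactly the nine whitespace-separated
    fields named in FIELDS (an assumption about the layout).
    """
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    kept = [record[name] for name in FIELDS if name != "PigID"]
    return " ".join(kept)


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = "1 3 PIG0042 5 2 1 0 120 124"
    print(drop_pig_id(sample))
```

In practice this step is done for you by the web form described next, so the sketch is only meant to show what the transformation amounts to.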
The reformatting of the data sheet is handled by a web form and its related CGI program written in Perl (by Zhiliang Hu). To use the CGI program to reformat your data, fill in the information in the web form, and be sure to enter your email address correctly in the corresponding field so that you receive the reformatted data by email. The mail containing the reformatted data may be in the following format:

FamNumbers MarkerNumbers MarkerName(s)
Fam1 ChildreNum PigID CrimapCode DameID SireID Sex Allele 1 Allele 2
Fam2 ChildreNum PigID CrimapCode DameID SireID Sex Allele 1 Allele 2

As in:
You have to cut off the mail header and trailer at (and including) the line that says <----X cut here ... ----> and save the data into a file, say "fatty.data", for further analysis.

2. Getting your unix account environment ready:

Use Secure Shell (ssh) to log in to your "genome" account. Your unix account environment is customized to use "tcsh". If you find that this is not the case, run "chsh" to change it to "tcsh". Create a sub-directory for yourself and "cd" to your working directory:

> mkdir yourname
> cd yourname

To get the existing PiGMaP family genotype data:
> getdata

This will put a set of 12 "*.gen" files in your current directory, where "*" represents either a chromosome number or something that describes the nature of the data (e.g. "all.gen" means all markers from 19 chromosomes). In the future, you can always use this command to get the most up-to-date PiGMaP family genotype data (which will overwrite the existing files). [NOTE: "getdata" is a UNIX utility developed by Zhiliang Hu and used on the Pig Genome Server only]

3. Merge your data with the existing PiGMaP family genotype data:

For this particular example, we knew that the new marker should map to pig chromosome 16; therefore, we are going to analyze the new data only against the chromosome 16 data. The crimap executable should already be in your path, so just type:

> crimap new.par merge

where "crimap" is the command, "new.par" is the parameter file you are going to use, and "merge" is the crimap option for merging the data. You will be asked for the first input file, the second input file, and the output file. Here is an actual run (the characters in red are those you are supposed to type in):
Check that the file "new.gen" is in your working directory.

4. Prepare your data set for crimap analysis:

You need to set up your parameter (.par), data (.dat) and order (.ord) files for your crimap analysis. Here is an actual sample:
As a result, now you should have 4 new files in your working directory:
-rw-r--r-- 1 hu adm 18351 Jan 8 10:45 new.dat
-rw-r--r-- 1 hu adm 13296 Jan 8 10:35 new.gen
-rw-r--r-- 1 hu adm 870 Jan 8 10:45 new.loc
-rw-r--r-- 1 hu adm 315 Jan 8 10:49 new.par

5. Two point linkage analysis:

Two point linkage analysis calculates LOD scores by comparing two markers at a time, the new marker and one of the existing markers, and outputs only the significant LOD scores (LOD > 3.0). For "twopoint" linkage analysis, you have to modify the ".par" file to include the marker you want to analyze as the inserted_loci (the line in red is the new line you type in):
dat_file new.dat *
gen_file new.gen *
ord_file new.ord *
nb_our_alloc 3000000 *
SEX_EQ 1 *
TOL 0.010000 *
PUK_NUM_ORDERS_TOL 6 *
PK_NUM_ORDERS_TOL 8 *
PUK_LIKE_TOL 3.000 *
PK_LIKE_TOL 3.000 *
use_ord_file 0 *
write_ord_file 1 *
use_haps 1 *
ordered_loci 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 *
inserted_loci 16 *
END

where the number "16" corresponds to the locus "fatty"; this information can be found in the "new.loc" file. Now you can do the twopoint linkage analysis:
You can also save the output into a file, say "fatty.2pt", with the following syntax:
> crimap new.par twopoint > fatty.2pt

Use the following command to extract the wanted information:

> get2pt fatty.2pt
S0390 lods = 5.59
S0371 lods = 10.27
S0363 lods = 7.39
S0077 lods = 3.91
CART lods = 15.35
S0298 lods = 3.01
fatty lods = 15.35

6. Multipoint linkage analysis:

Once you find significant linkage with markers on a chromosome, the next step is to determine the linear order of the linked markers on the chromosome. Multipoint linkage analysis calculates the likelihood of an order by weighing the closeness of linkage of the marker in question against all existing markers. There are a few options involved in determining the correct marker order. We will introduce a simple approach; in practice you have to choose the approach that best fits the situation.

Assuming the existing markers are in a "correct" order: use option all. Edit the new.par file so that the marker to examine is NOT in the "ordered_loci":
dat_file new.dat *
gen_file new.gen *
ord_file new.ord *
nb_our_alloc 3000000 *
SEX_EQ 1 *
TOL 0.010000 *
PUK_NUM_ORDERS_TOL 6 *
PK_NUM_ORDERS_TOL 8 *
PUK_LIKE_TOL 3.000 *
PK_LIKE_TOL 3.000 *
use_ord_file 0 *
write_ord_file 1 *
use_haps 1 *
ordered_loci 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 *
inserted_loci 16 *
END

Then run a TESTING multipoint linkage analysis:
The last section of the multipoint linkage analysis output shows the best-fitting positions of the "new" marker within the existing ordered markers, where the last field gives the likelihood score of each possible order, with the highest likelihood at the top.

If the existing markers are not in the right order, you have to run the "flip" option to get the best order before you run the "all" option. Of course, you may also prefer to run the "flip" option with the new marker inserted into the "ordered_loci" first. Which approach to take is a matter of preference; the aim is to reduce the number of unnecessary runs before you reach the optimum marker order.

To find possibly misplaced markers in an existing marker order, use option flip.
> crimap new.par flips2

where flips2 means flipping two markers at a time (the more markers you choose, the longer one crimap session takes to run, although it may help reduce the number of flip runs needed before you find the best order).
The last field of the results shows the difference in likelihood between the original and the flipped orders of two adjacent markers; therefore we want it to be positive. Any negative value indicates that the flipped order is better.
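The rule above can be sketched in a few lines of Python. The input here is a hypothetical list of (flipped order, likelihood difference) pairs that you would transcribe from the flips output by hand; the sketch does not parse CRIMAP output, and the function name and sample values are assumptions made for illustration.

```python
from typing import List, Tuple


def better_flips(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return flipped orders whose likelihood difference is negative.

    Each entry is (flipped_order, difference), where the difference is
    original minus flipped, so a negative value means the flipped
    order is better. The most negative (strongest) case comes first.
    """
    negative = [(order, diff) for order, diff in results if diff < 0]
    return sorted(negative, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    results = [
        ("0 1 3 2", 4.21),
        ("0 2 1 3", -1.37),
        ("1 0 2 3", 0.85),
        ("0 1 2 3", -0.12),
    ]
    for order, diff in better_flips(results):
        print(order, diff)
```

An empty result means no flipped order beat the original, so the current order can be kept for the next step.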
By repeating the flip and all options, you will get the best order.

7. Determine the marker distances on a linear map:

The crimap option for determining the marker distances is fixed. (This assumes you have gone through all the flip and all runs and, with the new marker in the "ordered_loci", have determined the best map order.)
The definitions of the columns of data in the above output are (from
left to right):
NOTE that there are more options in crimap analysis that we
have not covered here. For example, option chrompic
is extremely useful for checking genotype data for potential errors and
conflicting results among markers. Option build
gradually builds up a map by adding markers one at a time, which
is useful when building a map de novo. Option instant
works in conjunction with "build" to find a uniquely ordered set of
loci quickly.
8. References and/or further readings:

(1) The Official CRIMAP Homepage
(2) The Authors manual
(3) EMBnet Crimapp Tutorial by David Featherston

9. Acknowledgement:

The author would like to thank Dr. Gary Rohrer, who introduced CRIMAP to me; Dr. Lizhen Wang, for in-depth discussions on exploring the use of some CRIMAP options; and Dr. Max Rothschild, for his support in the preparation of this material.

10. Appendix: About the CRIMAP software

Authors: Phil Green, Kathy Falls, and Steve Crooks
Descriptions: The software is written in C for constructing multilocus linkage maps.
Operating systems: UNIX, VMS
Availability: Please contact Dr. Phil Green (Univ. of Washington) to get both permission and the source code.
Download: http://compgen.rutgers.edu/old/multimap/crimap/crimap.source.tar.Z