This is the short description of the program that is running.
PROGRAMNAME what sequence(s) ? ge:someseq
Begin (* 1 *) ?
Select one of:
A) First option
Please choose one (* A *): B (don't accept defaults without knowing what you are accepting)
What should I call the output file (* someseq.pgmnm *) ?
Note that the arguments can occur before or after any
switches; an argument is actually the answer to the programme's
default switch "-INfile=". If the arguments are
not present on the command line, then the programme will prompt for them. If
switches are not present on the command line, the programme will use default
values and will NOT prompt for them.
To see what switches are available and optionally to set them, run the programme
with the switch "-CHEck". You may abbreviate a switch by
entering only the uppercase part of the switchname; the rest is optional.
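The abbreviation rule can be pictured with a short Python sketch. It illustrates the rule as stated above and is not E/GCG's own parser: the function names are hypothetical, and treating what the user types as case-insensitive is an assumption suggested by the lowercase "-che" and "-pro" used in this section.

```python
def required_prefix(full_name: str) -> str:
    """Return the leading uppercase part of a switch name, e.g. 'CHE' for 'CHEck'."""
    n = 0
    while n < len(full_name) and full_name[n].isupper():
        n += 1
    return full_name[:n]


def matches_switch(typed: str, full_name: str) -> bool:
    """True if `typed` (e.g. '-che' or '-out=x.map') abbreviates `full_name`."""
    # Strip the leading dash and any '=value' part, then ignore case (assumption).
    name = typed.lstrip("-").split("=", 1)[0].lower()
    required = required_prefix(full_name).lower()
    # The uppercase part must be present; anything after it must agree
    # with the rest of the full switch name.
    return name.startswith(required) and full_name.lower().startswith(name)


print(matches_switch("-che", "CHEck"))              # True
print(matches_switch("-ch", "CHEck"))               # False: too short
print(matches_switch("-out=hsfau2.map", "OUTfile")) # True
```

Under this reading, "-che", "-chec" and "-check" all select the same switch, while "-ch" is too short to be accepted.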
This is the short description of the program that is running.
Press <rtn> for more:
Syntax: % programname [-INfile=]GenEMBL:Humhb*
Required Parameters: None
Local Data Files: None
Optional Parameters:
-OUTfile=FileName copy file(s)-sequence(s) into one file
Add what to the command line ? -pro
PROGRAMNAME what sequence(s) ?
etc.
One point to note about arguments for E/GCG programmes: arguments
that are database entries [actually from E/GCG data libraries]
may be given in upper- or lower-case because E/GCG itself
is "case-insensitive". E/GCG programmes are run
under the UNIX environment, though, and UNIX is a
"case-sensitive" operating system. Therefore, if an
argument is a UNIX file with one or more upper-case letters, it must be typed
with its upper-case letter(s).
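The difference between the two kinds of argument can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and recognising a database entry by the colon in "library:entry" (as in "ge:someseq") is an assumption made for the example, not a description of how E/GCG decides.

```python
def same_argument(a: str, b: str) -> bool:
    """Compare two arguments the way the paragraph above describes."""
    # Assumption: a colon marks a data library entry such as 'ge:hsfau'.
    if ":" in a and ":" in b:
        # Data library entries: case does not matter.
        return a.lower() == b.lower()
    # UNIX file names: case matters.
    return a == b


print(same_argument("ge:HSFAU", "ge:hsfau"))        # True
print(same_argument("Hsfau.ge_pr", "hsfau.ge_pr"))  # False
```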
Exercise 1: map a sequence
Exercise 2: edit a resource file; re-map a sequence
Exercise 3: configure the graphics display; plot a sequence map with mapplot

The generic E/GCG programme
As you saw with the GCG programmes fetch, translate,
reformat and fromstaden, most E/GCG programmes are called
like UNIX commands. You type the programme name, a flag or two (optional), and
an argument or two (sometimes optional) at the UNIX prompt, press
<RETURN>, and follow directions. Most E/GCG
"commands" expect one or more arguments specifying the names of
files or database entries to act on. And many E/GCG commands accept
flags (called "switches") to modify their behaviour.
prompt> programname argument1 argument2 -switch1 -switch2
It is usually two lines long and fairly terse.
End (* 516 *) ?
Reverse (* No *) ?
B) Second option
knowing what you are accepting)
prompt> programname -che
It is usually two lines long and fairly terse.
-DOCLines=6 copies only the first 6 lines of documentation.
-NOMONitor suppresses the screen monitor
-PROtein input sequence is protein
Input sequence specification: answering the -INfile switch
With the exception of the sequence exchange programmes and a few others,
E/GCG programmes only recognise E/GCG format sequence files or entries in
databases that have been converted to E/GCG data
libraries.
Output sequence file specification: answering the -OUTfile switch
E/GCG programmes usually suggest a default name for their output file. It is
best to select a name that has an extension reminding you of the programme that
created the file, and this is what E/GCG attempts with the default suggestion.
For example, DNAsequence23.fra could be the filename of the result
from passing a nucleotide sequence through frames. Often, the output
file of one programme is the input file for another; accepting E/GCG's default
file extension for output files can save typing in subsequent steps.
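The pattern behind the suggested names in this section (ge:someseq gives someseq.pgmnm, hsfau.ge_pr gives hsfau.map, hsef2.ge_pr gives hsef2.pep) can be sketched in Python. The function is hypothetical and only mimics the examples shown here; E/GCG's actual naming logic may differ in cases not covered by them.

```python
import os


def default_output_name(sequence_spec: str, extension: str) -> str:
    """Build a default output name from an input spec and a programme extension."""
    # Drop a data library prefix such as 'ge:' if one is present.
    name = sequence_spec.split(":", 1)[-1]
    # Drop any directory part and the existing extension.
    base, _old_extension = os.path.splitext(os.path.basename(name))
    return f"{base}.{extension}"


print(default_output_name("ge:someseq", "pgmnm"))  # someseq.pgmnm
print(default_output_name("hsfau.ge_pr", "map"))   # hsfau.map
print(default_output_name("hsef2.ge_pr", "pep"))   # hsef2.pep
```

Because only the extension changes, a chain of programmes leaves a family of files that share one base name and differ in the extension of the programme that wrote them.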
Mapping sequence with map
map is a versatile program that finds restriction enzyme sites
in a sequence. As with most E/GCG programmes,
it accepts sequence data as its default input, and can be run with zero
to many switches. These switches can modify the behaviour of map
in useful ways, and we'll explore some of these modifications with the
sequences fetched in the
Sequences Databases Exercise 2.
In addition to the files or data library entries you specify, map accesses a file describing a vast number of commercially available restriction enzymes to determine what sites it can seek. This extra input file is normally read in from a central, hidden part of the system. We will fetch this file, too, and modify it to reflect our enzyme freezer stock, budget, and available vector sites.
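To make the idea of "seeking sites" concrete, here is a minimal Python sketch of a restriction site search. It is not how map is implemented: the function name is hypothetical, only four enzymes with simple palindromic recognition sequences are listed, and ambiguity codes and cut positions within the site are ignored. The test sequence is the first 60 bases of hsfau as printed in the map listing in this section.

```python
# Recognition sequences for a few well-known enzymes. All four are
# palindromic, so a match on one strand is also a match on the other.
ENZYMES = {
    "AluI": "AGCT",
    "TaqI": "TCGA",
    "ThaI": "CGCG",
    "EcoRI": "GAATTC",
}

FIRST_60 = "TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC"


def find_sites(sequence: str, enzymes: dict = ENZYMES) -> dict:
    """Return, per enzyme, the 1-based start positions of its recognition site."""
    sequence = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start + 1)             # report 1-based positions
            start = sequence.find(site, start + 1)  # allow overlapping matches
        hits[name] = positions
    return hits


for enzyme, positions in find_sites(FIRST_60).items():
    print(enzyme, positions)
# AluI [30]
# TaqI [11]
# ThaI [24]
# EcoRI []
```

Removing an entry from the dictionary is the sketch's equivalent of editing the enzyme file to match what is actually in the freezer.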
prompt> map
Map displays both strands of a DNA sequence with restriction sites shown
above the sequence and possible protein translations shown below.
(Linear) MAP of what sequence ? hsfau.ge_pr
Begin (* 1 *) ?
End (* 518 *) ?
Select the enzymes: Type nothing or "*" to get all enzymes. Type "?"
for help on which enzymes are available and how to select them.
Enzyme(* * *):
What protein translations do you want:
a) frame 1 b) frame 2 c) frame 3
d) frame 4 e) frame 5 f) frame 6
t)hree forward frames s)ix frames o)pen frames only
n)o protein translation q)uit
Please select (capitalize for 3-letter) (* t *):
What should I call the output file (* hsfau.map *) ?
prompt> more hsfau.map
(Linear) MAP of: hsfau check: 2981 from: 1 to: 518
LOCUS HSFAU 518 bp RNA PRI 23-SEP-1993
DEFINITION H.sapiens fau mRNA.
ACCESSION X65923
KEYWORDS fau gene.
SOURCE human.
ORGANISM Homo sapiens . . .
With 209 enzymes: *
October 26, 1995 15:21 ..
S
MH B C AN a B CB
P TbiM B AcT Av vlMAu s vs
l aonn c ceh li aawc9 m io
e qIfl c ifa uJ IIoi6 F RF
I IIII I III II IVIII I II
/ / / /
TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC
1 ---------+---------+---------+---------+---------+---------+ 60
AAGGAGAAAGAGCTGAGGTAGAAGCGCCATCGACCCTGGCGGCAAGTCAGCGGTTATACG
a F L F L D S I F A V A G T A V Q S P I C -
b S S F S T P S S R * L G P P F S R Q Y A -
c P L S R L H L R G S W D R R S V A N M Q -
[several pages deleted]
Enzymes that do cut:
AceIII AciI AflII AluI ApaI AscI AvaII BanII
BbsI BbvI BccI BcefI BmgI BpmI Bpu1102I BsaJI
BsaXI BscGI BsiEI BsiHKAI BslI BsmFI BsoFI Bsp1286I
BsrI BsrDI BsrFI BssHII BstEII Bsu36I Cac8I CviJI
CviRI DdeI DpnI DrdII EaeI EciI EcoO109I EcoRII
FauI FokI GdiII HaeI HaeII HaeIII HhaI Hin4I
HincII HinfI HphI MaeII MaeIII MboII MnlI MscI
MseI MspI MwoI NciI NlaIII NlaIV NspI PleI
Psp1406I RsaI Sau96I Sau3AI ScrFI SfaNI SphI TaqI
TauI ThaI TseI Tsp45I Tsp509I TspRI Tth111II UbaCI
Enzymes that do not cut:
AatII AccI AflIII AhdI AlwI AlwNI ApaBI ApaLI
ApoI AvaI AvrII BaeI BamHI BanI Bce83I BcgI
BcgI BclI BfaI BfiI BglI BglII BplI Bpu10I
BsaI BsaAI BsaBI BsaHI BsaWI BsbI BseRI BsgI
BsmI BsmAI BsmBI Bsp24I Bsp24I BspEI BspGI BspLU11I
BspMI BsrBI BsrGI BssSI Bst1107I BstXI BstYI CjeI
CjeI CjePI CjePI ClaI DraI DraIII DrdI DsaI
EagI EarI Eco47III Eco57I EcoNI EcoRI EcoRV FseI
FspI HgaI HgiEII HindIII HpaI KpnI MluI MmeI
MslI MspA1I MunI NarI NcoI NdeI NgoAIV NheI
NotI NruI NsiI NspV PacI Pfl1108I PflMI PinAI
PmeI PmlI PshAI Psp5II PstI PvuI PvuII RcaI
RleAI RsrII SacI SacII SalI SanDI SapI ScaI
SexAI SfcI SfiI SgfI SgrAI SmaI SnaBI SpeI
SrfI Sse8387I Sse8647I SspI StuI StyI SunI SwaI
TaqII TaqII TfiI Tth111I VspI XbaI XcmI XhoI
XmnI
prompt>
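The three translation lines (a, b, c) printed under the first sequence line of the listing above can be reproduced with a short Python sketch. It assumes the standard genetic code and a hypothetical function name; it is an illustration, not the code map uses. Only the 60 bases of the first line are translated, so frames 2 and 3 yield 19 complete codons each, one fewer than the listing shows, because the listing continues into the next line.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

FIRST_60 = "TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTCGCCAATATGC"


def translate_frame(sequence: str, frame: int) -> str:
    """Translate forward frame 1, 2 or 3 into one-letter amino acid codes."""
    sequence = sequence.upper()
    start = frame - 1
    # Step through complete codons only; unknown codons become 'X'.
    codons = (sequence[i:i + 3] for i in range(start, len(sequence) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


for frame in (1, 2, 3):
    print(frame, translate_frame(FIRST_60, frame))
# 1 FLFLDSIFAVAGTAVQSPIC
# 2 SSFSTPSSR*LGPPFSRQY
# 3 PLSRLHLRGSWDRRSVANM
```

The asterisk in frame 2 is a stop codon, which is how the listing marks it as well.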
prompt> fetch data:enzyme.dat
prompt> map hsfau.ge_pr -dat=enzyme.dat -out=hsfau2.map
prompt> more hsfau2.map
prompt> map -che
prompt> map hsfau.ge_pr -dat=enzyme.dat -out=hsfau3.map -minc=2 -maxc=3
prompt> more hsfau3.map
prompt> setplot
+---------------------> displaying all of 10 option(s) <---------------------+
|psf postscript - sent to file: homedir:graf.ps |
|epsf eps postscript - sent to file: homedir:graf.eps |
|hpg hp laser with hpgl - sent to file: homedir:graf.hp |
|xcol x windows colour graphics - for x-windows terminal |
|xmon x windows monochr. graphics - for x-windows terminal |
|vt340 vt340 graphics - for a vt340 terminal |
|vt241 vt241 graphics - for a vt241 terminal |
|tek versaterm tektronix 4105 graphics on your terminal |
|dec declaser 5100 postscript/pcl/hpgl printer at biobase |
|qms qms colorscript210 ps printer at biobase (14 kr./pg) |
| |
| |
+------------------------------------------------------------------------------+
enter a command. choices are:
<up-arrow> and <down-arrow> scroll the list
<return> makes GCG use the selected device
Q quits without doing anything
C creates and edits a new device
(you can't delete from the site file)
V views the selection (use C to edit a copy)
prompt> mapplot hsfau.ge_pr -dat=enzyme.dat -minc=2 -maxc=3
This final output might show possibilities for sub-cloning most of hsfau with only one enzyme. Can you sub-clone a fragment that is only coding sequence? Which open reading frame(s) is (are) used? Where is this information shown in the original sequence file? (Hint!) Are "hsef2.ge_pr" or "hsht.ge_pr" better or worse prospects for sub-cloning with your reduced enzyme list?

prompt> eextractpeptide hsfau3.map -out=hsfau3.pep
prompt> more hsfau3.pep
Given that we know the coding regions for these three example sequences, let's translate them properly into proteins. For quick reference, the coding regions of these three sequences follow:
| data library entry | filename | coding sequence |
|---|---|---|
| ge:hsef2 | hsef2.ge_pr | 1 .. 2577 |
| ge:hsfau | hsfau.ge_pr | 57 .. 458 |
| ge:hsht | hsht.ge_pr | 128 .. 1420 |
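A quick arithmetic check on these coordinates is worth doing before answering the Begin and End prompts. The Python sketch below (hypothetical function name) computes, for each region, its length, the number of complete codons, and which forward frame of the full sequence it lies in. Coordinates are taken as 1-based and inclusive, as in the table.

```python
CODING_REGIONS = {
    "hsef2": (1, 2577),
    "hsfau": (57, 458),
    "hsht": (128, 1420),
}


def describe_region(begin: int, end: int) -> dict:
    """Summarise a coding region given 1-based, inclusive coordinates."""
    length = end - begin + 1
    codons, leftover = divmod(length, 3)
    frame = (begin - 1) % 3 + 1  # forward frame 1, 2 or 3 of the whole sequence
    return {"length": length, "codons": codons, "leftover": leftover, "frame": frame}


for name, (begin, end) in CODING_REGIONS.items():
    print(name, describe_region(begin, end))
# hsef2 {'length': 2577, 'codons': 859, 'leftover': 0, 'frame': 1}
# hsfau {'length': 402, 'codons': 134, 'leftover': 0, 'frame': 3}
# hsht {'length': 1293, 'codons': 431, 'leftover': 0, 'frame': 2}
```

All three lengths are exact multiples of three, which is what a complete coding region should give. If a region ends on a stop codon, the peptide is one residue shorter than the codon count.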
prompt> translate hsef2.ge_pr
TRANSLATE translates nucleotide sequences into peptide sequences.
Begin (* 1 *) ?
End (* 3075 *) ? 2577
Reverse (* No *) ?
Range begins ATGGT and ends TGTAG. Is this correct (* Yes *) ?
That is done, now would you like to:
A) Add another exon from this sequence
B) Add another exon from a new sequence
C) Translate and then add more genes from this sequence
D) Translate and then add more genes from a new sequence
W) Translate assembly and write everything into a file
Please choose one (* W *):
What should I call the output file (* hsef2.pep *) ?

Please continue with Part 7 - Sequence Comparison