palindrome
palindrome finds inverted repeats (stem loops) in nucleotide sequences. It can find inverted repeats that include a proportion of mismatches and gaps, which correspond to bulges in the stem loop. It reports all possible inverted matches that satisfy the specified conditions: the minimum and maximum palindrome length, the maximum gap between the repeated regions, and the number of mismatches allowed.
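The search described above can be illustrated with a naive exhaustive scan. This is a minimal sketch, not the EMBOSS implementation. The function name `find_inverted_repeats` is hypothetical. The sketch assumes that N (or any non-ACGT character) never pairs, that the outermost bases of every hit must pair, and that each candidate pair of outer positions is extended inward as far as the mismatch budget allows. Coordinates are 1-based and follow the order of palindrome's report: left arm start and end, then the right arm from its higher to its lower position.

```python
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def pairs(x, y):
    """True if bases x and y are Watson-Crick complements (N never pairs here)."""
    return COMPLEMENT.get(x.lower()) == y.lower()


def find_inverted_repeats(seq, minpallen=10, maxpallen=100,
                          gaplimit=100, nummismatches=0):
    """Naive O(n * window * length) scan for inverted repeats (hypothetical helper).

    Returns 1-based (start1, end1, start2, end2): left arm start1..end1 pairs
    with the right arm read from start2 down to end2.
    """
    n = len(seq)
    hits = []
    for i in range(n):
        # Outer right position j: shortest allowed hit up to longest arm plus largest gap.
        lo = i + 2 * minpallen - 1
        hi = min(n - 1, i + 2 * maxpallen + gaplimit - 1)
        for j in range(lo, hi + 1):
            if not pairs(seq[i], seq[j]):
                continue  # require the outermost bases to pair
            mismatches = 0
            length = 0  # pairs examined so far
            best = 0    # longest arm length that ends on a matched pair
            # Extend inward while the two arms stay separate.
            while length < maxpallen and i + length < j - length:
                if pairs(seq[i + length], seq[j - length]):
                    length += 1
                    best = length
                elif mismatches < nummismatches:
                    mismatches += 1
                    length += 1
                else:
                    break
            gap = j - i - 2 * best + 1  # bases between the two arms (the loop)
            if best >= minpallen and gap <= gaplimit:
                hits.append((i + 1, i + best, j + 1, j - best + 2))
    return hits
```

For example, `find_inverted_repeats("gatccttttggatc", 5, 10, 10)` returns `[(1, 5, 14, 10)]`: the left arm gatcc (1-5) pairs with ggatc (10-14), read from 14 back to 10, across a four-base loop. Like palindrome without -nooverlap, this sketch makes no attempt to remove hits nested inside other hits.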
% palindrome
Find inverted repeats in nucleotide sequence(s)
Input nucleotide sequence(s): tembl:d00596
Enter minimum length of palindrome [10]: 15
Enter maximum length of palindrome [100]:
Enter maximum gap between repeated regions [100]:
Number of mismatches allowed [0]:
Output file [d00596.pal]:
Report overlapping matches [Y]:
Find inverted repeats in nucleotide sequence(s)
Version: EMBOSS:6.6.0.0
Standard (Mandatory) qualifiers:
[-sequence] seqall Nucleotide sequence(s) filename and optional
format, or reference (input USA)
-minpallen integer [10] Enter minimum length of palindrome
(Integer 1 or more)
-maxpallen integer [100] Enter maximum length of palindrome
(Any integer value)
-gaplimit integer [100] Enter maximum gap between repeated
regions (Integer 0 or more)
-nummismatches integer [0] Number of mismatches allowed (Positive
integer)
[-outfile] outfile [*.palindrome] Output file name
-[no]overlap boolean [Y] Report overlapping matches
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers: (none)
Associated qualifiers:
"-sequence" associated qualifiers
-sbegin1 integer Start of each sequence to be used
-send1 integer End of each sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-scircular1 boolean Sequence is circular
-squick1 boolean Read id and sequence only
-sformat1 string Input sequence format
-iquery1 string Input query fields or ID list
-ioffset1 integer Input start position offset
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-outfile" associated qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write first file to standard output
-filter boolean Read first file from standard input, write
first file to standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options and exit. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
-version boolean Report version number and exit
| Qualifier | Type | Description | Allowed values | Default |
|---|---|---|---|---|
| Standard (Mandatory) qualifiers | ||||
| [-sequence] (Parameter 1) | seqall | Nucleotide sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
| -minpallen | integer | Enter minimum length of palindrome | Integer 1 or more | 10 |
| -maxpallen | integer | Enter maximum length of palindrome | Any integer value | 100 |
| -gaplimit | integer | Enter maximum gap between repeated regions | Integer 0 or more | 100 |
| -nummismatches | integer | Number of mismatches allowed | Positive integer | 0 |
| [-outfile] (Parameter 2) | outfile | Output file name | Output file | <*>.palindrome |
| -[no]overlap | boolean | Report overlapping matches | Boolean value Yes/No | Yes |
| Additional (Optional) qualifiers | ||||
| (none) | ||||
| Advanced (Unprompted) qualifiers | ||||
| (none) | ||||
| Associated qualifiers | ||||
| "-sequence" associated seqall qualifiers | ||||
| -sbegin1 -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0 |
| -send1 -send_sequence | integer | End of each sequence to be used | Any integer value | 0 |
| -sreverse1 -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
| -sask1 -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
| -snucleotide1 -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
| -sprotein1 -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N |
| -slower1 -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N |
| -supper1 -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N |
| -scircular1 -scircular_sequence | boolean | Sequence is circular | Boolean value Yes/No | N |
| -squick1 -squick_sequence | boolean | Read id and sequence only | Boolean value Yes/No | N |
| -sformat1 -sformat_sequence | string | Input sequence format | Any string | |
| -iquery1 -iquery_sequence | string | Input query fields or ID list | Any string | |
| -ioffset1 -ioffset_sequence | integer | Input start position offset | Any integer value | 0 |
| -sdbname1 -sdbname_sequence | string | Database name | Any string | |
| -sid1 -sid_sequence | string | Entryname | Any string | |
| -ufo1 -ufo_sequence | string | UFO features | Any string | |
| -fformat1 -fformat_sequence | string | Features format | Any string | |
| -fopenfile1 -fopenfile_sequence | string | Features file name | Any string | |
| "-outfile" associated outfile qualifiers | ||||
| -odirectory2 -odirectory_outfile | string | Output directory | Any string | |
| General qualifiers | ||||
| -auto | boolean | Turn off prompts | Boolean value Yes/No | N |
| -stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
| -filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
| -options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
| -debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
| -verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
| -help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
| -warning | boolean | Report warnings | Boolean value Yes/No | Y |
| -error | boolean | Report errors | Boolean value Yes/No | Y |
| -fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
| -die | boolean | Report dying program messages | Boolean value Yes/No | Y |
| -version | boolean | Report version number and exit | Boolean value Yes/No | N |
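The interactive session at the top of this page can also be run without prompts by giving the qualifiers from the table above on the command line, with -auto turning off any remaining prompts. The helper below is a hypothetical sketch that only assembles the argument list; actually running it assumes a local EMBOSS installation with palindrome on the PATH. Its range checks come from the "Allowed values" column. Note that -nummismatches is listed as "Positive integer" but defaults to 0, so the sketch accepts 0.

```python
import shlex


def palindrome_command(sequence, outfile, minpallen=10, maxpallen=100,
                       gaplimit=100, nummismatches=0, overlap=True):
    """Build an argv list for a non-interactive palindrome run (hypothetical helper).

    Qualifier names and defaults follow the qualifier table above.
    """
    # Range checks taken from the "Allowed values" column.
    if minpallen < 1:
        raise ValueError("-minpallen must be 1 or more")
    if gaplimit < 0:
        raise ValueError("-gaplimit must be 0 or more")
    if nummismatches < 0:
        raise ValueError("-nummismatches must not be negative")
    return [
        "palindrome",
        "-sequence", sequence,
        "-outfile", outfile,
        "-minpallen", str(minpallen),
        "-maxpallen", str(maxpallen),
        "-gaplimit", str(gaplimit),
        "-nummismatches", str(nummismatches),
        "-overlap" if overlap else "-nooverlap",  # the -[no]overlap boolean
        "-auto",  # general qualifier: turn off prompts
    ]


if __name__ == "__main__":
    # Same settings as the example session: only the minimum length is changed.
    argv = palindrome_command("tembl:d00596", "d00596.pal", minpallen=15)
    print(shlex.join(argv))
    # subprocess.run(argv, check=True) would execute it if EMBOSS is installed.
```

Run as a script, this prints `palindrome -sequence tembl:d00596 -outfile d00596.pal -minpallen 15 -maxpallen 100 -gaplimit 100 -nummismatches 0 -overlap -auto`. Passing the values explicitly, rather than relying on defaults, keeps the run reproducible if site defaults differ.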
The input is a standard EMBOSS sequence query (also known as a 'USA').
Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl.
Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.
The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.
See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.
ID D00596; SV 1; linear; genomic DNA; STD; HUM; 18596 BP.
XX
AC D00596;
XX
DT 17-JUL-1991 (Rel. 28, Created)
DT 07-DEC-2007 (Rel. 94, Last updated, Version 6)
XX
DE Homo sapiens gene for thymidylate synthase, complete cds.
XX
KW .
XX
OS Homo sapiens (human)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC Homo.
XX
RN [1]
RP 1-18596
RX PUBMED; 2243092.
RA Kaneda S., Nalbantoglu J., Takeishi K., Shimizu K., Gotoh O., Seno T.,
RA Ayusawa D.;
RT "Structural and functional analysis of the human thymidylate synthase
RT gene";
RL J Biol Chem 265(33):20277-20284(1990).
XX
DR Ensembl-Gn; ENSG00000176890; Homo_sapiens.
DR Ensembl-Tr; ENST00000323274; Homo_sapiens.
DR GDB; 163670.
DR GDB; 182340.
XX
CC These data kindly submitted in computer readable form by:
CC Sumiko Kaneda
CC National Institute of Genetics
CC 1111 Yata
CC Mishima 411
CC Japan
XX
FH Key Location/Qualifiers
FH
FT source 1..18596
FT /organism="Homo sapiens"
FT /chromosome="18"
FT /map="18p11.32"
FT /mol_type="genomic DNA"
FT /clone="lambdaHTS-1 and lambdaHTS-3"
FT /db_xref="taxon:9606"
FT repeat_region 1..148
FT /rpt_family="Alu"
FT repeat_region 202..477
FT /rpt_family="Alu"
[Part of this file has been deleted for brevity]
ttttgttttt agcttcagcg agaacccaga cctttcccaa agctcaggat tcttcgaaaa 15660
gttgagaaaa ttgatgactt caaagctgaa gactttcaga ttgaagggta caatccgcat 15720
ccaactatta aaatggaaat ggctgtttag ggtgctttca aaggagctcg aaggatattg 15780
tcagtcttta ggggttgggc tggatgccga ggtaaaagtt ctttttgctc taaaagaaaa 15840
aggaactagg tcaaaaatct gtccgtgacc tatcagttat taatttttaa ggatgttgcc 15900
actggcaaat gtaactgtgc cagttctttc cataataaaa ggctttgagt taactcactg 15960
agggtatctg acaatgctga ggttatgaac aaagtgagga gaatgaaatg tatgtgctct 16020
tagcaaaaac atgtatgtgc atttcaatcc cacgtactta taaagaaggt tggtgaattt 16080
cacaagctat ttttggaata tttttagaat attttaagaa tttcacaagc tattccctca 16140
aatctgaggg agctgagtaa caccatcgat catgatgtag agtgtggtta tgaactttaa 16200
agttatagtt gttttatatg ttgctataat aaagaagtgt tctgcattcg tccacgcttt 16260
gttcattctg tactgccact tatctgctca gttccttcct aaaatagatt aaagaactct 16320
ccttaagtaa acatgtgctg tattctggtt tggatgctac ttaaaagagt atattttaga 16380
aataatagtg aatatatttt gccctatttt tctcatttta actgcatctt atcctcaaaa 16440
tataatgacc atttaggata gagttttttt tttttttttt taaactttta taaccttaaa 16500
gggttatttt aaaataatct atggactacc attttgccct cattagcttc agcatggtgt 16560
gacttctcta ataatatgct tagattaagc aaggaaaaga tgcaaaacca cttcggggtt 16620
aatcagtgaa atatttttcc cttcgttgca taccagatac ccccggtgtt gcacgactat 16680
ttttattctg ctaatttatg acaagtgtta aacagaacaa ggaattattc caacaagtta 16740
tgcaacatgt tgcttatttt caaattacag tttaatgtct aggtgccagc ccttgatata 16800
gctatttttg taagaacatc ctcctggact ttgggttagt taaatctaaa cttatttaag 16860
gattaagtag gataacgtgc attgatttgc taaaagaatc aagtaataat tacttagctg 16920
attcctgagg gtggtatgac ttctagctga actcatcttg atcggtagga ttttttaaat 16980
ccatttttgt aaaactattt ccaagaaatt ttaagccctt tcacttcaga aagaaaaaag 17040
ttgttggggc tgagcactta attttcttga gcaggaagga gtttcttcca aacttcacca 17100
tctggagact ggtgtttctt tacagattcc tccttcattt ctgttgagta gccgggatcc 17160
tatcaaagac caaaaaaatg agtcctgtta acaaccacct ggaacaaaaa cagattttat 17220
gcatttatgc tgctccaaga aatgctttta cgtctaagcc agaggcaatt aattaatttt 17280
tttttttttg acatggagtc actgtccgtt gcccaggctg cagtgcagtg gcgcaatctt 17340
ggctcactgc aacctccacc tcccaggttc aagtgattct cctgcctcag cctcccatgt 17400
agctgggatc acaggcacct gccaccatgc ccggctaatt ttttgtattt tttgtagaga 17460
cagggtttca ccatgttggc caggctggtc tcaaacacct gacctcaaat gatccacctg 17520
cctcagcctc ccaaagtgtt gggattacag gcgtaagcca ccatgcccag ccctgaatta 17580
atatttttaa aataagtttg gagactgttg gaaataatag ggcagaggaa catattttac 17640
tggctacttg ccagagttag ttaactcatc aaactctttg ataatagttt gacctctgtt 17700
ggtgaaaatg agccatgatc tcttgaacat gatcagaata aatgccccag ccacacaatt 17760
gtagtccaaa ctttttaggt cactaacttg ctagatggtg ccaggttttt ttgcacaagg 17820
agtgcaaatg ttaagatctc cactagtgag gaaaggctag tattacagaa gccttgtcag 17880
aggcaattga acctccaagc cctggccctc aggcctgagg attttgatac agacaaactg 17940
aagaaccgtt tgttagtgga tattgcaaac aaacaggagt caaagcttgg tgctccacag 18000
tctagttcac gagacaggcg tggcagtggc tggcagcatc tcttctcaca ggggccctca 18060
ggcacagctt accttgggag gcatgtagga agcccgctgg atcatcacgg gatacttgaa 18120
atgctcatgc aggtggtcaa catactcaca caccctagga ggagggaatc agatcggggc 18180
aatgatgcct gaagtcagat tattcacgtg gtgctaactt aaagcagaag gagcgagtac 18240
cactcaattg acagtgttgg ccaaggctta gctgtgttac catgcgtttc taggcaagtc 18300
cctaaacctc tgtgcctcag gtccttttct tctaaaatat agcaatgtga ggtggggact 18360
ttgatgacat gaacacacga agtccctctg agaggttttg tggtgccctt taaaagggat 18420
caattcagac tctgtaaata tccagaatta tttgggttcc tctggtcaaa agtcagatga 18480
atagattaaa atcaccacat tttgtgatct atttttcaag aagcgtttgt attttttcat 18540
atggctgcag cagctgccag gggcttgggg tttttttggc aggtagggtt gggagg 18596
//
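In the EMBL entry above, each sequence data line holds bases in blocks of ten followed by a running position count, and the entry ends with `//`. The following is a minimal sketch of reading those lines back into a plain sequence string. It assumes the standard EMBL layout, in which the data lines follow an `SQ` header line (that line falls within the part deleted from the excerpt above). The function name is hypothetical, and this is not how EMBOSS itself reads the file.

```python
import re


def parse_embl_sequence(text):
    """Return the lower-case sequence from one EMBL flat-file entry (hypothetical helper)."""
    bases = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # entry terminator
            break
        if line.startswith("SQ"):  # header line that precedes the sequence data
            in_sequence = True
            continue
        if in_sequence:
            # Drop the block spacing and the trailing position count; keep letters only.
            bases.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(bases).lower()
```

The trailing number on each data line is the position of the last base on that line, so the length of the returned string should equal the number on the final data line (18596 for D00596).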
Palindromes of: D00596
Sequence length is: 18596
Start at position: 1
End at position: 18596
Minimum length of Palindromes is: 15
Maximum length of Palindromes is: 100
Maximum gap between elements is: 100
Number of mismatches allowed in Palindrome: 0
Palindromes:
126 caaaaaaaaaaaaaaaa 142
|||||||||||||||||
217 gtttttttttttttttt 201
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
215 tttttttttttttttt 200
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
214 tttttttttttttttt 199
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
213 tttttttttttttttt 198
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
212 tttttttttttttttt 197
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
211 tttttttttttttttt 196
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
210 tttttttttttttttt 195
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
209 tttttttttttttttt 194
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
208 tttttttttttttttt 193
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
207 tttttttttttttttt 192
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
206 tttttttttttttttt 191
127 aaaaaaaaaaaaaaaa 142
||||||||||||||||
205 tttttttttttttttt 190
127 aaaaaaaaaaaaaaaagaccgccagggct 155
|||||||||||||||||||||||||||||
204 ttttttttttttttttctggcggtcccga 176
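Each hit in the report above takes three lines. The first gives the left arm with its start and end positions. The second is a line of `|` characters marking paired bases. The third gives the right arm printed from its higher to its lower position, so each column is one base pair. The following is a minimal sketch of parsing such a report and counting non-complementary columns, assuming the layout shown above. The names `Hit` and `parse_palindrome_report` are hypothetical. The example report does not show how mismatched columns are marked, so the bar line is only checked loosely (pipes and spaces).

```python
import re
from dataclasses import dataclass

# "start bases end", as on the first and third line of each hit
ARM = re.compile(r"^\s*(\d+)\s+([A-Za-z]+)\s+(\d+)\s*$")
COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


@dataclass
class Hit:
    start1: int  # left arm, low to high
    end1: int
    start2: int  # right arm, high to low
    end2: int
    top: str
    bottom: str

    @property
    def mismatches(self):
        """Number of columns whose bases are not Watson-Crick complements."""
        return sum(COMPLEMENT.get(a.lower()) != b.lower()
                   for a, b in zip(self.top, self.bottom))


def parse_palindrome_report(text):
    """Collect three-line hit blocks from a palindrome report, skipping header lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    hits = []
    k = 0
    while k + 2 < len(lines):
        top, bottom = ARM.match(lines[k]), ARM.match(lines[k + 2])
        bar_ok = set(lines[k + 1].strip()) <= {"|", " "}
        if top and bottom and bar_ok:
            hits.append(Hit(int(top[1]), int(top[3]),
                            int(bottom[1]), int(bottom[3]),
                            top[2], bottom[2]))
            k += 3
        else:
            k += 1  # header or other non-hit line
    return hits
```

Applied to the report above with -nummismatches 0, every hit should have `mismatches == 0`, and each arm length should equal `end1 - start1 + 1`.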
Inverted repeats in genomic sequences, which can form secondary structures, may be implicated in the initiation of DNA replication.
Some genomic sequence entries in the databases consist of unfinished draft sequence, with gaps of unknown size between contigs. These gaps are often marked by runs of 200 'N' characters. To prevent palindrome from producing large, uninformative output, palindromes composed only of N are not reported.
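The reason an N run needs special treatment becomes clear from a reverse complement. Suppose N is taken as its own complement; this is an assumption here, since the paragraph only implies that such runs would otherwise be reported. A run of N then reads the same on both strands, so every window inside a 200-N gap looks like a perfect inverted repeat. The following is a minimal sketch of the check, with hypothetical helper names. Hits are simplified to (left arm, right arm) string pairs.

```python
# Complement table that maps N to N (assumed); other characters pass through unchanged.
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def is_n_only(arm):
    """True for a non-empty arm made up entirely of N (a gap placeholder)."""
    return bool(arm) and set(arm.lower()) == {"n"}


def drop_n_only(hits):
    """Keep (left_arm, right_arm) hits unless both arms are pure N."""
    return [(left, right) for left, right in hits
            if not (is_n_only(left) and is_n_only(right))]
```

For example, `reverse_complement("n" * 200)` returns the same 200-N string, while `drop_n_only([("nnnnn", "nnnnn"), ("gatcc", "ggatc")])` keeps only the second hit.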
Unless the qualifier -nooverlap is specified, palindrome makes no attempt to exclude subsets of previously found palindromes.
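This page does not define exactly what counts as a subset of a previously found palindrome. The sketch below uses one plausible reading, purely as an assumption (it is not taken from the EMBOSS source): a hit is a subset when its left arm and its right arm both lie inside the corresponding arms of a hit reported earlier. The filter keeps hits in report order and drops any later hit nested inside one already kept. `Hit`, `contains` and `drop_nested` are hypothetical names.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start1: int  # left arm, low to high (1-based, inclusive)
    end1: int
    start2: int  # right arm, high to low, as printed in the report
    end2: int


def contains(outer, inner):
    """True if both arms of `inner` lie within the matching arms of `outer` (assumed rule)."""
    left_inside = outer.start1 <= inner.start1 and inner.end1 <= outer.end1
    right_inside = outer.end2 <= inner.end2 and inner.start2 <= outer.start2
    return left_inside and right_inside


def drop_nested(hits):
    """Keep hits in report order, skipping any nested inside an earlier kept hit."""
    kept = []
    for hit in hits:
        if not any(contains(previous, hit) for previous in kept):
            kept.append(hit)
    return kept
```

Because the filter only looks backwards, it is order-dependent: a larger hit reported later does not remove a smaller one reported before it.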
| Program name | Description |
|---|---|
| einverted | Find inverted repeats in nucleotide sequences |
| equicktandem | Find tandem repeats in nucleotide sequences |
| etandem | Find tandem repeats in a nucleotide sequence |
einverted also looks for inverted repeats but is much faster and more sensitive, as it finds low-quality (heavily mismatched) repeats and repeats with gaps.
Please report all bugs to the EMBOSS bug team (emboss-bug@emboss.open-bio.org), not to the original author.