Help

This page is a help reference for the options available in the web input form. Every option has a short description, which can be accessed by clicking the question mark next to it in the form.

Input data

Input - the input file must contain sequences in FASTA format. If the MEME program is used, an additional limitation applies: each sequence name must be at most 24 characters long (the sequence name is everything in the title line following the ">" up to the first blank).
An example of a FASTA file:

>SEQ1; M: AACTaGAGTT at 12
ttagaatggttAACTaGAGTTccgtcaggccattgataccgcacagttggtaacactcac
ctatatggaaggtaatgtagtggagcgcgtggttgcgtag
>SEQ2; M: AACTTGAGTg at 18
aatgcaaatctgtccttAACTTGAGTgcacacactatgtctcgtaaccatgacggtgaag
gtacggaacatgcgggccacagtttcgcgggtcttgggtt
>SEQ3; M: AACTTGAtTT at 53
actccatgtactgggcttaacaagcccatgctgacgcagaagtttggcatggAACTTGAt
TTtcatgcttttataccgccgttaacttcatctatcctca
>SEQ4; M: AACTTGcGTT at 28
ggcgtatacacacacgactcagtaaagAACTTGcGTTgctggtcgctccctgagcaggag
ggtacgtagtgcgtaaacgtagtcatagttggctaaccct
>SEQ5; M: AACTTGAaTT at 53
agagtgggcacgcgggaaacggtgaaaaaagtccagactcaagcggttggttAACTTGAa
TTccacccgagtcgtacacgtatggaagtcgagttctagt
>SEQ6; M: AAtTTGAGTT at 63
taacgcagcttgcataatacatgacacgatttggccttcgtcacgggctacgtctattgt
ccAAtTTGAGTTaagtgccgagttcaatagaaccgtccga
>SEQ7; M: AACTTGAcTT at 1
AACTTGAcTTaacggtttacagcgctgtcgcaccgtcaagcgaccctgctgtcctggata
aatggtgccgacatccagattcggtggagtactccttccg
>SEQ8; M: AACTTGAGTc at 63
ctaggggggctgctactacgattaatgagggccacgcggcagaccggcatgcagtggacc
gaAACTTGAGTcgcccacgtgcccctactttgtccgtgga
>SEQ9; M: AACTTGAGgT at 11
ttgataatggAACTTGAGgTtctaaaatgagtcctgagtcgactcgaaacagattacggt
cggagaaccccattaggttgtacaggcgatagaatggaaa
>SEQ10; M: AACTcGAGTT at 71
tttaatgtgtattctattgtaattaggtgtcacttaggctacgcccacatttgatgaagc
cagtaattcgAACTcGAGTTgcgtggctgcttacgactcg
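
The name-length limit for MEME can be checked before uploading a file. The following is a minimal Python sketch, not part of the service; the function name long_fasta_names and the max_len parameter are hypothetical and chosen only for illustration.

```python
import re


def long_fasta_names(fasta_text, max_len=24):
    """Return the sequence names that are longer than max_len characters.

    The name is everything after ">" up to the first blank, as described above.
    """
    too_long = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Cut the title line at the first whitespace character.
            name = re.split(r"\s", line[1:], maxsplit=1)[0]
            if len(name) > max_len:
                too_long.append(name)
    return too_long


example = (
    ">SEQ1; M: AACTaGAGTT at 12\n"
    "ttagaatggttAACTaGAGTTccgtcaggcc\n"
    ">a_very_long_sequence_name_over_limit\n"
    "acgtacgt\n"
)
print(long_fasta_names(example))
```

In the example file above, the names are short ("SEQ1;", "SEQ2;", ...), so they would pass this check.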

Organism - the organism that the sequences come from. A different DNA background model is used for each of the listed species.


Motif Prediction

Programs - Select which programs you want to use for motif finding. Details...

Search on both strands - whether to search for motif occurrences only on the given DNA strand or on its reverse complement as well
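
To illustrate what searching both strands means, the sketch below looks for exact occurrences of a motif on the given strand and, optionally, for its reverse complement. It is only an illustration under simplifying assumptions (exact, case-sensitive matching); the motif discovery programs use probabilistic models, and the function names here are hypothetical.

```python
# Translation table for complementing DNA letters, keeping the letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_strands(seq, motif, both_strands=True):
    """Return (strand, 0-based start) pairs for exact occurrences of motif.

    An occurrence on the "-" strand is reported at the position where the
    reverse complement of the motif appears in the given sequence.
    """
    targets = [("+", motif)]
    if both_strands:
        targets.append(("-", reverse_complement(motif)))
    hits = []
    for strand, pattern in targets:
        start = seq.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pattern, start + 1)
    return hits


print(reverse_complement("AACTTGAGTT"))
print(find_on_strands("ccAACTCAAGTTcc", "AACTTGAGTT"))
```

With both_strands=False the second call finds nothing, because the motif is present only as its reverse complement.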

Motif length - expected motif length; this value is passed to the motif discovery programs. Note: the returned motifs may not be of exactly the given length

Number of results for each program - the maximum number of motifs returned by each motif discovery program. This value should be kept relatively low (1-15), as otherwise the clustering process (especially the pairwise comparison of all motifs) may be very time-consuming
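
The cost of comparing every motif with every other grows quadratically with the number of motifs. The short sketch below counts the pairs; the number of programs used (3) is an assumption made only for this example, and motifs from an external file or a reference database would add to the total.

```python
from math import comb


def pairwise_comparisons(num_programs, results_per_program):
    """Number of motif pairs when every motif is compared with every other."""
    total_motifs = num_programs * results_per_program
    return comb(total_motifs, 2)


for results in (5, 15, 50):
    print(results, pairwise_comparisons(3, results))
```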

External motif predictions - (optional) file with user-supplied motifs. These motifs are clustered together with the predicted ones and (optionally) with motifs from the reference database. Acceptable file formats are described here


Reference Motif Database

Reference database - select a variant of the JASPAR database to be used. details...

User supplied database - (optional) instead of using JASPAR as the reference DB, you may supply your own database. It must be in the format described here.
Note: if you specify any database in this field, JASPAR will not be used


Motif comparison

Distribution comparison function - Select one of the supplied metrics for comparing probability distributions defined by motifs.

Comparison type - there are two ways of obtaining the distributions to be compared from motifs. The first, called columns comparison, takes the columns of each motif's PSPM (choose Motif from the select element). The second (choose Sequence) checks how well each motif fits at each position of the input sequences and is described here
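
As a rough illustration of the columns comparison, the sketch below treats each PSPM column as a probability distribution over A, C, G, T and averages a column-wise distance over two motifs. The Euclidean distance is an assumption made for this example and is not necessarily one of the functions offered in the form; the sketch also ignores shifts and handles only motifs of equal length.

```python
from math import sqrt


def column_distance(p, q):
    """Euclidean distance between two columns (probability distributions)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def motif_distance(pspm_a, pspm_b):
    """Average column-wise distance between two motifs of equal length."""
    if len(pspm_a) != len(pspm_b):
        raise ValueError("this sketch only handles motifs of equal length")
    total = sum(column_distance(p, q) for p, q in zip(pspm_a, pspm_b))
    return total / len(pspm_a)


# Each row is one motif position with probabilities for A, C, G, T.
motif_1 = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
motif_2 = [[0.0, 1.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
print(motif_distance(motif_1, motif_1))
print(motif_distance(motif_1, motif_2))
```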

Motif filtering threshold - the filtering phase takes place right after discovering the new motifs. The value specified in this field is used to determine which motifs returned by the same MDP (Motif Discovery Program) will be treated as the same motif. Namely, if the distance between them is less than this value, one of them is removed. Value 0 will result in skipping the filtering phase.
Note: this value should be adjusted to the chosen comparison type and function. Generally, recommended values are 0.01 - 0.2
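
The filtering phase can be pictured with the following sketch. It assumes a greedy strategy in which the motif encountered later is the one removed; the description above does not say which of two close motifs is dropped, so this is only one possible reading. The distance argument stands for whichever comparison function is selected, and the toy example uses plain numbers instead of motifs.

```python
def filter_motifs(motifs, distance, threshold):
    """Keep a motif only if it is at least `threshold` away from every
    motif kept so far. A threshold of 0 skips the filtering phase."""
    if threshold == 0:
        return list(motifs)
    kept = []
    for motif in motifs:
        if all(distance(motif, other) >= threshold for other in kept):
            kept.append(motif)
    return kept


def toy_distance(a, b):
    """Toy stand-in for a motif distance: absolute difference of numbers."""
    return abs(a - b)


print(filter_motifs([0.0, 0.05, 0.5], toy_distance, 0.1))
print(filter_motifs([0.0, 0.05, 0.5], toy_distance, 0))
```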


Motif clustering

Clustering threshold - this value tells the program when to stop the hierarchical clustering process. The distance between two clusters of motifs is defined as the average distance between the objects in these clusters, using the specified metric. Once the smallest distance between any two clusters is greater than a value x, no more clusters are merged. The value x is computed as follows:

Hence, the latter should be used if we know that each clustered motif from the reference DB should appear in different cluster
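
The stopping rule of the hierarchical clustering can be sketched as below. The sketch takes the threshold x as given and does not reproduce how the service computes it; the function names are hypothetical, and numbers stand in for motifs in the example.

```python
from itertools import combinations


def average_linkage(cluster_a, cluster_b, distance):
    """Average distance between the objects of two clusters."""
    total = sum(distance(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def cluster_until(items, distance, threshold):
    """Merge the two closest clusters repeatedly; stop once the smallest
    distance between any two clusters is greater than `threshold`."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        (i, j), best = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j], distance))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def toy_distance(a, b):
    """Toy stand-in for a motif distance: absolute difference of numbers."""
    return abs(a - b)


print(cluster_until([0, 1, 10, 11], toy_distance, 3))
```

A larger threshold lets the merging continue for longer, so fewer and bigger clusters are returned.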


Consensus motifs

Column trimming threshold - positions at the edges of the consensus motif are trimmed until a position with information content of at least the specified value is reached
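
The trimming can be sketched as follows. The sketch assumes that information content is measured in bits against a uniform background (2 bits minus the column entropy for DNA); the service may use a different definition, for example one based on the organism's background model.

```python
from math import log2


def information_content(column):
    """Information content in bits of a DNA column, assuming a uniform
    background: 2 minus the Shannon entropy of the column."""
    entropy = -sum(p * log2(p) for p in column if p > 0)
    return 2.0 - entropy


def trim_motif(pspm, threshold):
    """Drop columns from both edges until one reaches the threshold.
    Columns inside the motif are kept even if they fall below it."""
    start, end = 0, len(pspm)
    while start < end and information_content(pspm[start]) < threshold:
        start += 1
    while end > start and information_content(pspm[end - 1]) < threshold:
        end -= 1
    return pspm[start:end]


uniform = [0.25, 0.25, 0.25, 0.25]
motif = [uniform, [1.0, 0.0, 0.0, 0.0], uniform, [0.0, 1.0, 0.0, 0.0], uniform]
print(len(trim_motif(motif, 1.0)))
```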

Column similarity function for consensus - the chosen function is used to compare the columns of each motif to the consensus motif built so far


Output

Provide data for weblogo - the user may wish to create a graphical representation of the consensus motif using the WebLogo tool. In that case, the size of the alignment representing this consensus can be specified. The more sequences in the alignment, the better the logo (i.e. the closer it is to the actual PSPM of the consensus motif).
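
Why a larger alignment gives a better logo can be seen from the sketch below, which builds an alignment whose column frequencies approximate a PSPM. This is an assumed, deterministic construction (rounding by largest remainder) written for illustration; it is not necessarily how the service generates its alignment.

```python
def alignment_from_pspm(pspm, num_sequences, alphabet="ACGT"):
    """Build num_sequences sequences whose per-column letter frequencies
    approximate the PSPM as closely as whole counts allow."""
    columns = []
    for probs in pspm:
        exact = [p * num_sequences for p in probs]
        counts = [int(x) for x in exact]
        # Hand out the remaining slots to the largest fractional parts.
        leftover = num_sequences - sum(counts)
        order = sorted(
            range(len(probs)), key=lambda k: exact[k] - counts[k], reverse=True
        )
        for k in order[:leftover]:
            counts[k] += 1
        columns.append("".join(letter * n for letter, n in zip(alphabet, counts)))
    # Read the columns row by row to obtain the aligned sequences.
    return ["".join(col[row] for col in columns) for row in range(num_sequences)]


# Each row is one motif position with probabilities for A, C, G, T.
pspm = [[0.5, 0.25, 0.25, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(alignment_from_pspm(pspm, 4))
```

With only 2 sequences the first column cannot represent the 0.25 probabilities exactly, while 4 or more sequences can, which is the effect described above.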

