MAST

Database	Sequence Count	Residue Count	Last Modified
Osi_dmel.fa	24	7728	Sun Oct 2 14:15:20 2011
Total	24	7728

			Similarity
Motif	Width	Best possible match	1	2	3	4	5	6	7	8	9	10
1	29	GWIALIAWKALIISKIALVLAGIIGLKKL	-	0.09	0.18	0.14	0.18	0.07	0.02	0.50	0.08	0.30
2	10	EEGRGKKKKM	0.09	-	0.18	0.17	0.19	0.19	0.11	0.00	0.19	0.19
3	19	LVDRIWNFLRTHTLQVNFP	0.18	0.18	-	0.23	0.33	0.16	0.31	0.13	0.17	0.33
4	11	DAQDLAYNGYK	0.14	0.17	0.23	-	0.28	0.22	0.20	0.04	0.28	0.27
5	21	MWWCLKEKALHYFDRVMNQDE	0.18	0.19	0.33	0.28	-	0.22	0.20	0.31	0.14	0.39
6	15	TTYEVVAHPHYTHSH	0.07	0.19	0.16	0.22	0.22	-	0.40	0.09	0.17	0.31
7	11	YHHHEHHIDHH	0.02	0.11	0.31	0.20	0.20	0.40	-	-0.03	0.12	0.23
8	15	PILMIMKLKTFWLIP	0.50	0.00	0.13	0.04	0.31	0.09	-0.03	-	-0.01	0.09
9	10	HSSGHSGWSR	0.08	0.19	0.17	0.28	0.14	0.17	0.12	-0.01	-	0.13
10	8	VRRIYDQC	0.30	0.19	0.33	0.27	0.39	0.31	0.23	0.09	0.13	-

[Block diagram of motif matches per sequence; position scale 0 to 600.]
Sequence	E-value
Osi3	0	↧
Osi6	0	↧
Osi7	0	↧
Osi8	0	↧
Osi9	0	↧
Osi12	0	↧
Osi14	0	↧
Osi18	0	↧
Osi11	2.6e-81	↧
Osi20	4.4e-57	↧
Osi16	1.5e-51	↧
Osi19	1.7e-50	↧
Osi21	5.2e-50	↧
Osi15	3e-46	↧
Osi10-PA	1e-43	↧
Osi2	1e-42	↧
Osi5	5.8e-42	↧
Osi17	1.3e-35	↧
Osi23	1.6e-31	↧
Osi13	5.7e-24	↧
Osi1	3.1e-23	↧
Osi22	3.3e-20	↧
Osi4	6.6e-18	↧
Osi24	3.9e-06	↧

The MAST results consist of:

The inputs to MAST, including:
1. The sequence databases, showing the sequence and residue counts.
2. The motifs, showing the name, width, best possible match and similarity to other motifs.
3. The nominal order and spacing diagram.
The search results, showing the top scoring sequences with a tiling of all the motif matches for each sequence.
The program details, including:
1. The version of MAST and the date it was released.
2. The reference to cite if you use MAST in your research.
3. The command line summary detailing the parameters with which you ran MAST.
This explanation of how to interpret MAST results.

Inputs

MAST received the following inputs.

Sequence Databases

This table summarises the sequence databases specified to MAST.

Database: The name of the database file.
Sequence Count: The number of sequences in the database.
Residue Count: The number of residues in the database.

Motifs

Summary of the motifs specified to MAST.

Name: The name of the motif. If the motif has been removed or removal is recommended to avoid highly similar motifs then it will be displayed in red text.
Width: The width of the motif. No gaps are allowed in motifs supplied to MAST as it only works for motifs of a fixed width.
Best possible match: The sequence that would achieve the best possible match score and its reverse complement for nucleotide motifs.
Similarity: MAST computes the pairwise correlation between each pair of motifs. The correlation between two motifs is the maximum sum of Pearson's correlation coefficients for aligned columns divided by the width of the shorter motif. The maximum is found by trying all alignments of the two motifs. Motifs with correlations below 0.60 have little effect on the accuracy of the combined scores. When a pair of motifs has a higher correlation, one of the two should be removed from the query. Correlations above the supplied threshold are shown in red text.
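The similarity measure described above can be sketched in a few lines of Python. This is a minimal illustration, not MAST's own code: the function names, the representation of a motif as a list of score columns, and the choice to include partially overlapping alignments are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score columns (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def motif_similarity(m1, m2):
    """Maximum summed column correlation over all ungapped alignments,
    divided by the width of the shorter motif.

    m1, m2: lists of columns; each column lists one score per alphabet letter.
    Assumption: alignments where the motifs only partly overlap are also tried.
    """
    w1, w2 = len(m1), len(m2)
    best = -math.inf
    for offset in range(-(w2 - 1), w1):  # position of m2's first column relative to m1
        total = 0.0
        for j in range(w2):
            i = offset + j
            if 0 <= i < w1:
                total += pearson(m1[i], m2[j])
        best = max(best, total)
    return best / min(w1, w2)


# Hypothetical toy motif over a 4-letter alphabet, compared with itself.
toy = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 1.0]]
print(motif_similarity(toy, toy))
```

A motif compared with itself scores 1, which is why the diagonal of the similarity table is shown as "-" rather than as a number.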

Nominal Order and Spacing

This diagram shows the nominal spacing of the motifs specified to MAST.

Search Results

MAST provides the following motif search results.

Top Scoring Sequences

This table summarises the top scoring sequences with a Sequence E-value better than the threshold (default 10). The sequences are sorted by the Sequence E-value from most to least significant.

Sequence

The name of the sequence. This may be linked to search a sequence database for the sequence name.

E-value

The E-value of the sequence. For DNA only: if the strands were scored separately, there will be two E-values for the sequence, separated by a "/". The value for the provided sequence is first and the value for its reverse complement is second.

↧

Click on this to show additional information about the sequence such as a description, combined p-value and the annotated sequence.

Block Diagram

The block diagram shows the best non-overlapping tiling of motif matches on the sequence.

The length of the line shows the length of a sequence relative to all the other sequences.
A block is shown where the positional p-value of a motif is less (more significant) than the significance threshold which is 0.0001 by default.
If a significant motif match (as specified above) overlaps other significant motif matches, then it is only displayed as a block if its positional p-value is less (more significant) than the product of the positional p-values of the significant matches that it overlaps.
The position of a block shows where a motif has matched the sequence.
The width of a block shows the width of the motif relative to the length of the sequence.
The colour and border of a block identifies the matching motif as in the legend.
The height of a block gives an indication of the significance of the match as taller blocks are more significant. The height is calculated to be proportional to the negative logarithm of the positional p-value, truncated at the height for a p-value of 1e-10.
Hovering the mouse cursor over the block causes the display of the motif name and other details in the hovering text.
DNA only; blocks displayed above the line are a match on the given DNA, whereas blocks displayed below the line are matches to the reverse-complement of the given DNA.
DNA only; when strands are scored separately then blocks may overlap on opposing strands.
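The display rules above (significance threshold, overlap rule and block height) can be expressed as a short Python sketch. The function names and the normalisation of the height to a maximum of 1.0 are assumptions for illustration; only the threshold of 0.0001, the product rule for overlaps and the truncation at a p-value of 1e-10 come from the description.

```python
import math


def is_displayed(p_value, overlapping_p_values=(), threshold=1e-4):
    """Decide whether a match is drawn as a block.

    A match must beat the significance threshold. If it overlaps other
    significant matches, it must also beat the product of their p-values.
    """
    if p_value >= threshold:
        return False
    significant = [q for q in overlapping_p_values if q < threshold]
    if not significant:
        return True
    return p_value < math.prod(significant)


def block_height(p_value, max_height=1.0, floor=1e-10):
    """Height proportional to -log(p), truncated at the height for p = floor.

    max_height is a hypothetical scale factor: the tallest block gets this height.
    """
    p = max(p_value, floor)
    return max_height * math.log(p) / math.log(floor)


print(is_displayed(1e-5))                 # True: significant, no overlaps
print(is_displayed(1e-5, [1e-6]))         # False: loses to the overlapping match
print(is_displayed(1e-12, [1e-5, 1e-6]))  # True: beats the product 1e-11
```

Because the height is logarithmic, a match with a p-value of 1e-5 is drawn at about half the height of one with a p-value of 1e-10 or smaller.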

Additional Sequence Information

Clicking on the ↧ link expands a box below the sequence with additional information and adds two draggable buttons below the block diagram.

Description: The description appearing after the identifier in the fasta file used to specify the sequence.
Combined p-value: The combined p-value of the sequence. For DNA only: if the strands were scored separately, there will be two p-values for the sequence, separated by a "/". The value for the provided sequence is first and the value for its reverse complement is second.
Annotated Sequence: The annotated sequence shows a portion of the sequence with the matching motif sequences displayed above. The displayed portion of the sequence can be modified by sliding the two buttons below the sequence block diagram so that the portion you want to see is between the two needles attached to the buttons. By default the two buttons move together, but you can drag one individually by holding shift before you start the drag. If the strands were scored separately, they cannot both be displayed at once because of overlaps, so a radio button offers the choice of strand to display.

Scoring

MAST scores sequences using the following measures.

Position score calculation

The score for the match of a position in a sequence to a motif is computed by summing the appropriate entry from each column of the position-dependent scoring matrix that represents the motif. Sequences shorter than one or more of the motifs are skipped.
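As a minimal sketch of this calculation, the Python function below scores every position of a sequence against one motif. The representation of the scoring matrix as a list of dictionaries and the toy two-letter alphabet are assumptions for illustration. Returning an empty list for a too-short sequence is also a simplification of the skipping rule.

```python
def position_scores(sequence, pssm):
    """Score each start position of `sequence` against one motif.

    pssm: list with one dict per motif column, mapping residue -> score.
    Returns an empty list when the sequence is shorter than the motif.
    """
    width = len(pssm)
    if len(sequence) < width:
        return []
    return [
        sum(pssm[j][sequence[i + j]] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


# Hypothetical motif of width 2 over a toy alphabet {A, C}.
toy_pssm = [{"A": 2, "C": -1}, {"A": -1, "C": 3}]
print(position_scores("ACCA", toy_pssm))  # [5, 2, -2]
```

The first position scores 5 because "A" contributes 2 from the first column and "C" contributes 3 from the second.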

Position p-value

The position p-value of a match is the probability of a single random subsequence of the length of the motif scoring at least as well as the observed match.
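One way to obtain such a probability is to build the exact score distribution of a random subsequence column by column. The sketch below does this under stated assumptions: integer scores, residues drawn independently from fixed background frequencies, and hypothetical function names. It illustrates the definition and is not a description of MAST's internal implementation.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Exact distribution of the motif score for one random subsequence.

    pssm: list of dicts (one per column) mapping residue -> integer score.
    background: dict mapping residue -> probability (assumed independent draws).
    """
    dist = {0: 1.0}
    for column in pssm:
        updated = defaultdict(float)
        for score, prob in dist.items():
            for residue, freq in background.items():
                updated[score + column[residue]] += prob * freq
        dist = dict(updated)
    return dist


def position_p_value(observed_score, pssm, background):
    """Probability that a random subsequence scores at least `observed_score`."""
    dist = score_distribution(pssm, background)
    return sum(prob for score, prob in dist.items() if score >= observed_score)


# Hypothetical motif of width 2 over a toy alphabet {A, C} with uniform background.
toy_pssm = [{"A": 2, "C": -1}, {"A": -1, "C": 3}]
uniform = {"A": 0.5, "C": 0.5}
print(position_p_value(5, toy_pssm, uniform))  # 0.25: only "AC" scores 5 or more
```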

Sequence p-value

The sequence p-value of a score is defined as the probability of a random sequence of the same length containing some match with as good or better a score.
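If the possible match positions are treated as independent, which is an assumption made here for illustration, the sequence p-value follows from the position p-value p and the number of positions n as 1 - (1 - p)^n. The sketch below uses a hypothetical function name and takes n as the sequence length minus the motif width plus one.

```python
import math


def sequence_p_value(position_p, seq_length, motif_width):
    """P(at least one of n independent positions scores as well): 1 - (1 - p)^n."""
    n = seq_length - motif_width + 1
    if n <= 0:
        raise ValueError("sequence is shorter than the motif")
    if position_p >= 1.0:
        return 1.0
    # expm1/log1p keep precision when position_p is very small.
    return -math.expm1(n * math.log1p(-position_p))


print(sequence_p_value(0.5, 3, 2))  # two positions: 1 - 0.5**2
```

For very small position p-values the result is close to n times p, so longer sequences need a stronger match to reach the same significance.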

Combined p-value

The combined p-value of a sequence measures the strength of the match of the sequence to all the motifs and is calculated by

finding the score of the single best match of each motif to the sequence (best matches may overlap),
calculating the sequence p-value of each score,
forming the product of the p-values,
taking the p-value of the product.
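The last two steps can be illustrated with the standard result for a product of independent, uniformly distributed p-values: if x is the product of n such values, the probability of a product this small or smaller is x multiplied by the sum of (-ln x)^i / i! for i from 0 to n - 1. The sketch below assumes the per-motif sequence p-values are independent, and the function name is hypothetical.

```python
import math


def p_value_of_product(p_values):
    """P(product of n independent Uniform(0,1) values <= observed product)."""
    if any(p <= 0.0 for p in p_values):
        return 0.0
    log_product = sum(math.log(p) for p in p_values)
    t = -log_product
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= t / i  # term is now t**i / i!
        total += term
    return min(1.0, math.exp(log_product) * total)


# Hypothetical sequence p-values for the best match of each of three motifs.
print(p_value_of_product([1e-4, 1e-3, 0.2]))
```

With a single motif the function returns the p-value unchanged. With several motifs the correction is needed because a product of many p-values is small even for random sequences.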

Sequence E-value

The E-value of a sequence is the expected number of sequences in a random database of the same size that would match the motifs as well as the sequence does and is equal to the combined p-value of the sequence times the number of sequences in the database.
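As a small worked example of this relationship, the sketch below converts between combined p-values and E-values. The function names are hypothetical. The database size of 24 sequences and the E-value of 3.9e-06 reported for Osi24 are taken from the tables in this report.

```python
def sequence_e_value(combined_p, num_sequences):
    """E-value = combined p-value times the number of sequences in the database."""
    return combined_p * num_sequences


def combined_p_from_e_value(e_value, num_sequences):
    """Invert the relationship to recover the combined p-value."""
    return e_value / num_sequences


# Osi24 has E-value 3.9e-06 in a database of 24 sequences.
print(combined_p_from_e_value(3.9e-06, 24))
```

For Osi24 this gives a combined p-value of roughly 1.6e-07.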
