Clustering motifs

After computing a distance matrix for the set of motifs, we apply hierarchical clustering with the average linkage method. Each motif starts as a singleton cluster, and at every step we merge the two clusters with the smallest average distance between their members. Merging stops once the smallest distance between any two clusters exceeds a given threshold.
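The merging procedure described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than the actual implementation: the function name `average_linkage_clusters` and the list-of-lists distance matrix are assumptions made for the example.

```python
from itertools import combinations


def average_linkage_clusters(dist, threshold):
    """Cluster items given a symmetric distance matrix (hypothetical helper).

    dist      -- square list of lists, dist[i][j] is the distance between
                 motifs i and j
    threshold -- merging stops when the smallest average distance between
                 two clusters exceeds this value
    Returns a sorted list of clusters, each a sorted list of motif indices.
    """
    # Start with every motif in its own singleton cluster.
    clusters = [[i] for i in range(len(dist))]

    def average_distance(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best_pair, best_dist = min(
            (
                ((p, q), average_distance(clusters[p], clusters[q]))
                for p, q in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best_dist > threshold:
            break  # no remaining pair is close enough to merge
        p, q = best_pair
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy example: motifs 0 and 1 are close, motif 2 is far from both.
toy = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
print(average_linkage_clusters(toy, threshold=0.5))
```

In the toy example, motifs 0 and 1 are merged first; the average distance from that cluster to motif 2 then exceeds the threshold, so motif 2 stays a singleton.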

The clustering threshold can be specified either as an absolute distance or relative to the minimal distance between motifs in the reference database. The default is a relative threshold of 0.5, which guarantees that all reference motifs end up in different clusters. Users may also supply their own reference database in cases where the JASPAR database over- or under-samples the motif space.
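As an illustration of how a relative threshold could be turned into an absolute one, the sketch below assumes that "relative" means a multiplicative factor applied to the smallest pairwise distance in the reference set. Both this interpretation and the name `absolute_threshold` are assumptions made for the example.

```python
def absolute_threshold(reference_dist, relative=0.5):
    """Convert a relative threshold to an absolute distance (hypothetical helper).

    reference_dist -- symmetric square distance matrix (list of lists)
                      for the reference motifs
    relative       -- fraction of the minimal reference distance (assumed
                      to act as a simple multiplicative factor)
    """
    n = len(reference_dist)
    if n < 2:
        raise ValueError("need at least two reference motifs")
    # Smallest distance between any two distinct reference motifs.
    min_dist = min(
        reference_dist[i][j] for i in range(n) for j in range(i + 1, n)
    )
    return relative * min_dist


# Toy reference set whose closest pair of motifs is 0.4 apart.
reference = [
    [0.0, 0.4, 0.6],
    [0.4, 0.0, 0.8],
    [0.6, 0.8, 0.0],
]
threshold = absolute_threshold(reference, relative=0.5)
```

Under this reading, any relative value below 1 yields an absolute threshold smaller than every pairwise reference distance, so no two reference motifs can be merged, which is consistent with the guarantee stated above.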

Once clustering is complete, the motifs in each cluster are aligned and a consensus motif is computed for each cluster. This is done by an incremental ungapped alignment of the motifs, starting from the most informative one. Finally, the consensus motifs are truncated so that they do not contain non-informative columns (those with information content, IC, below a user-supplied threshold) at their ends.
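The final truncation step might look as follows. This sketch makes several assumptions that are not stated in the text: motifs are represented as lists of (A, C, G, T) probability tuples, IC is measured in bits against a uniform background, and the names `column_ic` and `trim_motif` are invented for the example.

```python
import math


def column_ic(column):
    """Information content of one column in bits (uniform background assumed)."""
    # 2 bits is the maximum for a four-letter alphabet; zero-probability
    # entries contribute nothing to the sum.
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0)


def trim_motif(columns, min_ic):
    """Drop leading and trailing columns whose IC is below min_ic.

    Low-information columns in the interior of the motif are kept;
    only the ends are truncated.
    """
    start, end = 0, len(columns)
    while start < end and column_ic(columns[start]) < min_ic:
        start += 1
    while end > start and column_ic(columns[end - 1]) < min_ic:
        end -= 1
    return columns[start:end]


uniform = (0.25, 0.25, 0.25, 0.25)  # IC = 0 bits
consensus = [
    uniform,
    (1.0, 0.0, 0.0, 0.0),  # IC = 2 bits
    uniform,               # interior column, kept despite low IC
    (0.0, 0.0, 1.0, 0.0),  # IC = 2 bits
    uniform,
]
trimmed = trim_motif(consensus, min_ic=0.5)
print(len(trimmed))
```

Here the two uniform columns at the ends are removed, while the uniform column between the two fully conserved positions is retained.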