Identification of sequence motifs in biological data: challenges and possible solutions
Morten Nielsen
IIB, UNSAM
Most bioinformatics algorithms designed for identification of sequence motifs in biological data aim at defining one consensus motif that characterises the data to the highest degree possible. Whereas the importance of these algorithms is unquestionable, it is clear that for some biological questions the essential assumption that a single consensus binding motif will characterise the data is an oversimplification of the underlying biology. One such example is antibody-antigen binding data, where polyclonal antibody sera are often used to experimentally define antigenic binding sites. In my presentation, I will outline the challenges associated with developing methods for resolving such poly-motif biological data, and describe one simple Gibbs sampler solution we have recently proposed. This solution is an extension of the Gibbs sampler algorithm developed earlier for characterisation of single-motif receptor-ligand interactions, and it employs simultaneous clustering and motif characterisation to define the optimal number of sub-motifs contained in any given data set.