BRIEF DESCRIPTION OF THE WORKSHOP'S OBJECTIVES
The methodology of each field of science includes two aspects, viz. intellectual and technical. Intellectual aspects are essential for formulating a research problem, for understanding it, and for devising a plan for its solution. Technical aspects are primarily concerned with carrying out the solution of an already formulated problem. By analogy to the art of cracking ciphers and cryptographic codes, we can think of Bio-cryptology as a science addressing the problem of finding "genomic codes", i.e. evaluating the FUNCTIONAL SIGNIFICANCE of sequence data. In a broad sense, a code is a transformation (or a set of unambiguous rules) whereby messages are converted from one representation to another. To what extent this broad definition applies to nucleic acid and protein sequences is not immediately clear. Thus far the only known set of rules that fits the definition is the translation code from mRNAs to proteins. In this particular case we know that a gene constitutes a "message", whereas mRNA and protein are its two representations subject to coding.
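The notion of a code as a set of unambiguous rules between two representations can be illustrated with the one case the text names, translation from mRNA to protein. The following is only a minimal sketch: the table is a small subset of the standard genetic code chosen for illustration, and the names CODON_TABLE and translate are hypothetical, not part of any library.

```python
# Partial codon table: a subset of the standard genetic code, for illustration only.
# "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F",
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GCU": "A", "AAA": "K", "UGG": "W",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna, table=CODON_TABLE):
    """Map an mRNA string to a protein string, one triplet at a time.

    Assumes the reading frame starts at position 0. Raises KeyError for
    any codon missing from the (deliberately partial) table.
    """
    protein = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing incomplete triplet
    for i in range(0, usable, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "*":  # stop codon ends the message
            break
        protein.append(residue)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))
```

Here both representations are fully known: every "input" element (a codon) has a known "output" element (an amino acid or a stop signal), which is exactly what is missing in the other cases discussed below.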
Although the existence of messages other than genes in the genome can be surmised, proof of their existence remains to be provided. For example, we might think of splicing introns out of hnRNAs as a process that requires the existence of a "message". One representation (the "input" representation of an alleged code) of such a message would be some sequence (or structure) features in hnRNA. The other representation (the "output" representation of an alleged code) could be some structural features in the spliceosome (or, more broadly, in the "splicing apparatus"). In order to be able to talk of a "splicing code" we would ideally like to know:
-- All elements of both representations
-- The correspondence between each element of the "input" representation and each element of the "output" representation
Unfortunately, what we know thus far are some sequence features present in hnRNAs (such as GT at the 5' end of an intron and AG at the 3' end, branch-point sequences, two-base periodic repeats of dinucleotides, and so on). There is no evidence (thus far) that these known features indeed represent a "message", or even that they correspond to anything beyond themselves. Needless to say, our knowledge of the structural features that constitute the "splicing apparatus" is still insufficient to consider it as an "output" representation in our alleged coding.
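A small sketch can make the difficulty concrete. It lists every span of a sequence that begins with GT and ends with AG (written in the DNA alphabet, as in the text). The function name candidate_introns, the toy sequence, and the min_len threshold are assumptions made for illustration; real introns are far longer and this is not a splice-site predictor.

```python
def candidate_introns(seq, min_len=4):
    """Return (start, end) spans with GT at the 5' end and AG at the 3' end.

    `end` is exclusive, so seq[start:end] is the candidate span.
    `min_len` is an arbitrary illustrative threshold.
    """
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptor_ends = [j + 2 for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    return [(s, e) for s in donors for e in acceptor_ends if e - s >= min_len]


toy = "AAGTCCAGTTAGCC"  # hypothetical sequence, not real data
for start, end in candidate_introns(toy):
    print(start, end, toy[start:end])
```

Even this 14-base toy sequence yields several overlapping candidates, which suggests why the presence of the known features alone does not establish that they act as code words.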
The above example illustrates a non-trivial and extremely important problem: Can we decide which of the observed structural features are the code words (i.e. elements of the "input" representation) without knowing the details of the "output" representation pertinent to a given biological function?
The goal of the planned workshop is to provide (at least partially) an answer to this question. In particular we would like to:
1) Determine criteria for characterizing what constitutes biological coding (not only the "genetic code").
2) Explore and assess the applicability of existing coding theory to problems of biological coding.
3) Examine (and suggest) criteria to recognize the existence of a code.
4) Examine methods to determine an entire set of code words given instances of it (or at least to deduce other code words once some code words are known).
5) Formulate (and suggest methods of solving) a parsing problem for biological coding.
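One possible reading of the parsing problem in goal 5 can be sketched as follows: given a set of candidate code words, enumerate every way a sequence can be segmented into them. The function name parsings and both code-word sets are hypothetical and chosen only to show the contrast between an ambiguous and an unambiguous segmentation; this is not a proposed formulation from the workshop.

```python
def parsings(seq, codewords):
    """Return all segmentations of `seq` into words from `codewords`.

    Each segmentation is a list of words whose concatenation equals `seq`.
    Empty code words are skipped, since they would never consume input.
    """
    if not seq:
        return [[]]  # exactly one way to parse the empty string
    results = []
    for word in sorted(codewords):
        if word and seq.startswith(word):
            for rest in parsings(seq[len(word):], codewords):
                results.append([word] + rest)
    return results


# Variable-length words: the same sequence parses in two ways.
print(parsings("ATG", {"A", "AT", "TG", "G"}))
# Fixed-length words: the segmentation is unique.
print(parsings("ATGGCA", {"ATG", "GCA"}))
```

The first call returns two segmentations and the second returns one, so whether a sequence can be read unambiguously depends on the code-word set itself, which is the kind of property coding theory is equipped to analyse.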
In order to achieve these goals, we intend to bring together experts in modern coding theory and experts in studying problems of biological coding. To our knowledge, the planned workshop will be the first meeting explicitly devoted to the problem of biological coding.
The Human Genome Project has given enormous momentum to the development of computational techniques in Molecular Biology. We believe that our workshop will provide some theoretical and conceptual tools for those techniques. In this respect it will be an important contribution to the Genome Project.