Abstract
This paper presents a model for creating a corpus of transcribed Bulgarian
colloquial speech. The main goal is to show how the TEI specification is used for
resolving some problems in XML encoding of spontaneous speech. The first step is to
determine the scope of the description and, accordingly, the elements included in the
DTD and their attributes. As usual, the header of the XML documents holds the
meta-information (which includes text information, description of the participants,
duration of the recording, etc.). In the body, for now, only a partial syntactic annotation is
presented. The description of some pragmatic features, such as the illocutionary force
of the utterance (interrogative, exclamatory, etc.), and of extra-linguistic phenomena,
such as facial expressions, gestures, and pauses, is also discussed. We have attempted
to address some difficulties, such as overlapping utterances and incomplete
sentences.