Encoding Bulgarian Colloquial Speech Using the TEI Specification

Atanasov, Atanas

Abstract

This paper presents a model for creating a corpus of transcribed Bulgarian colloquial speech. The main goal is to show how the TEI specification can be used to resolve some problems in the XML encoding of spontaneous speech. The first step is to determine the scope of the description and, accordingly, the elements included in the DTD and their attributes. As usual, the header of each XML document holds the meta-information (which includes information about the text, a description of the participants, the duration of the recording, etc.). In the body, only a partial syntactic annotation is presented for now. The description of some pragmatic features, such as the illocutionary force of the utterance (interrogative, exclamatory, etc.), and of extra-linguistic phenomena, such as facial expressions, gestures, and pauses, is also discussed. We have attempted to deal with some difficulties, for instance overlapping utterances and incomplete sentences.
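To make the kind of encoding described above more concrete, the following is a minimal sketch in Python of how a single utterance might be built with TEI-style speech elements. It is an illustration only and not the DTD or markup used in this paper: the helper `build_utterance`, the speaker identifiers, the sample Bulgarian text, and the choice of a `seg` element with a `type` attribute for illocutionary force are all assumptions. The element names `u`, `pause`, `kinesic`, and `desc` and the `who`, `trans`, and `dur` attributes follow the TEI module for transcriptions of speech.

```python
import xml.etree.ElementTree as ET


def build_utterance(speaker, parts, trans=None, force=None):
    """Build a TEI-style <u> element (hypothetical helper, not from the paper).

    parts is a list of (kind, value) pairs, where kind is one of
    "text", "pause" (value = duration), or "kinesic" (value = description).
    trans marks the transition from the previous utterance, e.g. "overlap".
    force is an assumed label for illocutionary force, stored on a <seg>.
    """
    u = ET.Element("u", {"who": speaker})
    if trans is not None:
        u.set("trans", trans)

    # Assumption: illocutionary force is recorded as <seg type="...">.
    parent = u
    if force is not None:
        parent = ET.SubElement(u, "seg", {"type": force})

    last_child = None  # most recent child element, used to attach tail text
    for kind, value in parts:
        if kind == "text":
            if last_child is None:
                parent.text = (parent.text or "") + value
            else:
                last_child.tail = (last_child.tail or "") + value
        elif kind == "pause":
            last_child = ET.SubElement(parent, "pause", {"dur": value})
        elif kind == "kinesic":
            last_child = ET.SubElement(parent, "kinesic")
            ET.SubElement(last_child, "desc").text = value
        else:
            raise ValueError(f"unknown part kind: {kind!r}")
    return u


# Invented sample data: speaker A asks a question with a pause,
# speaker B answers in overlap and nods.
question = build_utterance(
    "#A",
    [("text", "къде "), ("pause", "PT1S"), ("text", " отиваш")],
    force="interrogative",
)
answer = build_utterance(
    "#B",
    [("text", "вкъщи "), ("kinesic", "nods")],
    trans="overlap",
)

for utterance in (question, answer):
    print(ET.tostring(utterance, encoding="unicode"))
```

In this sketch an overlapping utterance is only flagged through the `trans` attribute; recording the exact point at which the overlap begins would require additional synchronization markup, which is left out here.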