The project of a deeply tagged parallel corpus of Middle Russian translations from Latin

Keywords: Middle Russian, Church Slavonic, Latin, translation, electronic corpora, syntactic alignment


Tagged parallel corpora are powerful tools for the analysis of natural language. For historical linguistics, whose characteristic limitation is the lack of living native speakers, corpora (paper or electronic collections of written texts) are moreover the main source of linguistic information. Old and Middle Russian are well-documented languages, and a host of manuscripts in both idioms, including many translations, is available for investigation. Nevertheless, no parallel translational corpus of Middle Russian exists to date. As a result, a number of important written sources containing information valuable for linguists, literary scholars and historians cannot be studied properly. This article provides a preliminary account of a project to build a deeply tagged parallel corpus of Middle Russian translations from Latin. Such a corpus may prove useful for the formal description of the translation techniques of the period, which in turn may help to divide the anonymous texts of that time into groups based on their linguistic features. Such grouping may assist authorship attribution and, consequently, help place each translation in its proper cultural context.

From a linguistic point of view, such a corpus could provide researchers with crucial information on the vocabulary, morphology and syntax of Middle Russian, with an emphasis on the argument structure of verbs, the use of borrowed lexical items and set expressions, and the professional skills of the translators of the period. The article outlines the key features of the prospective Middle Russian translational corpus, its possible initial contents, and its text standardization and annotation principles, and gives the reasons for not using the theory-neutral syntactic apparatus characteristic of existing historical corpora of ancient Indo-European languages, such as TOROT or PROIEL. It also explains how potential users of the corpus could benefit from our non-standard tagging principles.


