A corpus-based approach in archaeolinguistics


Ilia A. Afanasev, Saint Petersburg State University




Keywords: archaeolinguistics, corpus-based approach, review, Old Church Slavonic, Ancient Greek, corpus linguistics, ancient languages, extinct languages


The article treats archaeolinguistics as a separate field of knowledge and outlines the features that distinguish it from other disciplines in comparative studies. It analyses existing text collections and shows how they may be applied in corpus-based research on ancient languages. It also discusses approaches to creating new text corpora. The study concentrates on Old Church Slavonic and Ancient Greek; in particular, it analyses the existing corpora for these languages, e.g., Corpus Cyrillo-Methodianum Helsingiense. Most of the corpora under study are not tagged. Some of them change the original writing system (from Glagolitic to Latin, for instance using ASCII), while others have restricted access. Some corpora are no longer available at all or are available only as part of local databases. Thus, corpus-based resources for these languages are clearly insufficient. To support more effective research, the most straightforward solution is to develop new corpora using platforms specialising in linguistic analysis (e.g., CDLI or Lingvodoc) or systems that support do-it-yourself (DIY) corpora. However, such platforms are often paywalled, may have limited functionality, or lack comprehensive user guides. Consequently, there appears to be no ready-made solution for archaeolinguists who want to apply a corpus-based approach in their research: they must either invest considerable effort in adapting an existing system to their purposes or build one of their own. In conclusion, the article proposes one possible way to address these issues.



