Large quantities of electronic text are now available, including extensive automatically collected web corpora. When searching these collections for information, or searching language corpora for linguistic material, it is necessary to know the genre of a text: whether a given sentence comes from fiction, sports news, an internet comment, a forum post, and so on. Texts therefore need to be classified by text type or genre. To solve this task, we first need to find out which characteristics matter for classifying Estonian texts and how those characteristics are expressed linguistically.
Another important research direction is the automatic processing of older texts. One helpful step in analyzing older texts is to create an intermediate layer closer to the modern language, that is, to "translate" the old texts into modern language form. This kind of "translation" is also called normalization. The normalized intermediate layer makes the texts easier to search and allows them to be analyzed automatically with tools designed for the modern language.
Another topic related to older texts is automatic information extraction. The current focus is on the automatic identification of named entities (personal names, place names, organization names, etc.).
In the field of computational linguistics, we cooperate with the Institute of Computer Science of the University of Tartu.