Computational linguistics and digital humanities

A large and growing volume of text is now available electronically, including extensive automatically collected web corpora. When searching these collections for information, or searching language corpora for linguistic material, it is necessary to know the genre of each text: whether a given sentence comes from fiction, sports news, an internet comment, a forum post, and so on. Texts therefore need to be classified by text type or genre. To solve this task, we first need to find out which characteristics are important for classifying Estonian texts and how these characteristics are expressed linguistically.
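As a rough illustration of what such characteristics could look like in practice, the following Python sketch computes a few simple surface features of a text. The feature set and the function name `surface_features` are assumptions chosen for illustration; they are not the features identified in the research described here.

```python
import re

def surface_features(text):
    """Compute a few simple surface features that might help separate genres.

    Illustrative assumption only: real genre classification would use a
    linguistically informed feature set and a proper tokenizer.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are maximal runs of letters (Unicode-aware, so õ, ä, ö, ü are kept).
    words = re.findall(r"[^\W\d_]+", text)
    n_sents = len(sentences) or 1  # avoid division by zero on empty input
    n_words = len(words) or 1
    return {
        "avg_sentence_length": len(words) / n_sents,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "exclamation_rate": text.count("!") / n_sents,
        "question_rate": text.count("?") / n_sents,
    }

if __name__ == "__main__":
    print(surface_features("Tere! Kuidas läheb? Hästi."))
```

Feature dictionaries of this kind could then be passed to any standard classifier; which features actually discriminate between Estonian genres is the empirical question raised above.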

Another important research direction is the automatic processing of older texts. One helpful step in analyzing older texts is to create an intermediate layer closer to the modern language, that is, to "translate" the old texts into modern language form. This kind of "translation" is also called normalization. The normalized intermediate layer makes the texts easier to search and enables automatic analysis with tools designed for modern language.
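The idea of a normalized layer kept alongside the original can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the lexicon entries, the single character rule (old "w" written as modern "v"), and the names `LEXICON`, `RULES`, `normalize_token` and `normalize` are all hypothetical and serve only as illustration, not as the method used in the research described here.

```python
import re

# Hypothetical lexicon of old-spelling -> modern-spelling pairs (illustrative only).
LEXICON = {"wana": "vana", "weel": "veel"}

# Hypothetical character-level fallback rules, applied in order.
RULES = [(re.compile("w"), "v"), (re.compile("W"), "V")]

def normalize_token(token):
    """Return a modern-spelling guess for one token."""
    lower = token.lower()
    if lower in LEXICON:
        modern = LEXICON[lower]
        # Preserve an initial capital letter from the original token.
        return modern.capitalize() if token[:1].isupper() else modern
    # Fall back to character rules when the token is not in the lexicon.
    for pattern, replacement in RULES:
        token = pattern.sub(replacement, token)
    return token

def normalize(text):
    """Return (original, normalized) pairs so the original layer is preserved."""
    return [(tok, normalize_token(tok)) for tok in text.split()]

if __name__ == "__main__":
    for original, modern in normalize("Wana weel wesi"):
        print(original, "->", modern)
```

Keeping the pairs rather than overwriting the text means searches can run on the normalized form while results still show the original spelling.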

Another topic related to older texts is automatic information extraction. The current focus is on the automatic identification of named entities (personal names, place names, organization names, etc.).

In the field of computational linguistics, we cooperate with the Institute of Computer Science of the University of Tartu.

Researchers related to the field

Kadri Muischnek
Institute of Estonian and General Linguistics
Department of General Linguistics
Associate Professor of Computational Linguistics 0.5 p
Jakobi 2-426

Institute of Computer Science
Chair of Natural Language Processing
Associate Professor in Natural Language Processing 0.5 p
r 3058
Kadri Muischnek is an associate professor of computational linguistics. Her current research interests include computational syntax, in particular treebanks and parsing. She also works on historical text normalization and text genre classification. Her past research interest, to which she hopes to return someday, was multi-word expressions in Estonian.
Joshua Wilbur
Institute of Estonian and General Linguistics
Centre for Digital Humanities and Information Society, University of Tartu
Lecturer in Digital Linguistics
Jakobi 2-417
Joshua Wilbur is a Lecturer in Digital Humanities at the Centre for Digital Humanities and Information Society and is associated with the Institute of Estonian and General Linguistics. He holds a PhD in General Linguistics, and his research focuses on documentary linguistics, morphophonology, syntax, corpus linguistics, lexicography and language technology, especially concerning Pite Saami, a critically endangered Uralic language of Sweden.
Siim Orasmaa
Lecturer of Computational Linguistics 0.25 p
Siim Orasmaa is a lecturer in computational linguistics. He actively develops EstNLTK, the Estonian natural language processing toolkit, and teaches courses on programming and text analysis tools. His current research focuses on applying natural language processing to historical texts. He has also worked on event and temporal analysis of Estonian texts.
Liina Lindström
Professor of Modern Estonian
Liina Lindström is a professor of Modern Estonian. Her main research interests are language variation and the syntax of Estonian from a usage-based, functionalist perspective. Her research focuses on syntactic variation in Estonian and the forces behind it. She has been in charge of compiling corpora of Estonian, especially the Corpus of Estonian Dialects, and mostly uses corpus data in her research, applying both quantitative and qualitative methods. Liina is also one of the main promoters of digital methods in the humanities at UT. She currently leads the Interdisciplinary Corpus of Seto project and is involved in other projects, such as the teenager language corpus.
Pärtel Lippus
Department of Estonian
Phonetics Lab
Associate Professor of Estonian Phonetics
Jakobi 2-408
Pärtel Lippus is Associate Professor of Estonian Phonetics. His main research interest is Estonian prosody, focusing on word-level features (the three-way quantity system and lexical stress), but also on intonational aspects (prosodic marking of non-canonical questions) and socio-phonetic variability (creaky voice). He has also been involved in investigating the prosodic features of other Finno-Ugric languages. He teaches courses on phonetics, Praat, statistics and R. He is the editor of the Journal of Estonian and Finno-Ugric Linguistics and one of the developers of the Phonetic Corpus of Estonian Spontaneous Speech and the Archives of Estonian Dialects and Kindred Languages.
Heili Orav
Institute of Estonian and General Linguistics
Department of General Linguistics
Research Fellow in General Linguistics (employment contract suspended) 0.1 p

Institute of Computer Science
Chair of Natural Language Processing
Lecturer in Natural Language Processing
r 3059
+372 737 6143
Heili Orav is a research fellow in general linguistics whose research is mostly related to computational linguistics and language technology. Her main research field is lexical semantics. She currently leads the Estonian Wordnet project (https://cl.ut.ee/ressursid/teksaurus/index.php?lang=et), the main goal of which is to compile a large database of synonymous Estonian words and multi-word units that express the same concept.
Kristiina Vaik
Institute of Estonian and General Linguistics
Department of Estonian
Junior Research Fellow in Estonian and Finno-Ugric Linguistics
Jakobi 2-404
Kristiina Vaik is a doctoral student with an interest in automatic text classification. She has worked as a data analyst, taught aspiring computational linguists and participated in many language technology projects. She is a natural language processing enthusiast.
Maarja-Liisa Pilvik
Institute of Estonian and General Linguistics
Department of Estonian
Research Fellow in Estonian Language
Jakobi 2-430
Maarja-Liisa Pilvik works as a specialist in corpora and quantitative linguistics and is a PhD student at the Institute of Estonian and General Linguistics. Her main areas of research so far have been Finnish verb semantics, the morphosyntax of Estonian dialects, and the productivity of derivational morphology in different registers of Estonian. More broadly, she is interested in language variation, the entrenchment and cognitive organization of linguistic constructions, and the interplay, competition, and change of the forces guiding actual language use. In her work, she mainly uses corpus data and applies both qualitative and quantitative data analysis methods. She is currently involved in projects building the Seto language corpus and the corpus of teen speak, and in a project that is developing tools for the automatic language processing of 19th-century parish court records and testing the potential uses of this important linguistic and historical resource.
Peeter Tinits
Institute of Estonian and General Linguistics
Centre for Digital Humanities and Information Society, University of Tartu
Specialist of Digital Humanities 0.25 p
Jakobi 2-417

Faculty of Social Sciences
Institute of Social Studies
Text Mining Expert 0.75 p
Lossi 36
Peeter Tinits is a digital humanities specialist at the Centre for Digital Humanities and Information Society. He teaches introductory courses in digital humanities and text analytics at the University of Tartu. As a researcher, he has described late 19th-century Estonian language communities from the perspective of historical sociolinguistics and applied the framework of cultural evolution in linguistics and in the humanities more broadly, combining data analytics and various databases. At the moment, in collaboration with social scientists at the University of Tartu in the Deep Transitions research group, he is applying text mining tools to understand shifts in thinking about the natural environment and technology in industrialized nations during the 20th century.
Kaarel Veskis
doctoral student
Kaarel Veskis is a doctoral student and a junior research fellow at the Estonian Folklore Archives (EFA) of the Estonian Literary Museum, participating in the EFA's project "A corpus-based approach to folkloric variation: regional styles, thematic networks, and communicative modes in runosong tradition". His current work centers on computational methods for analyzing poetic synonyms in Estonian runic songs.
Doctoral defence: Rodolfo Basile "Invenitive-Locational Constructions in Finnish: A Mixed Methods Approach"

University of Tartu Linguistics is among the top 200 in the world

A multi-day practical workshop on automatic morpho-syntactic annotation is coming up