Computational linguistics and digital humanities

Large volumes of electronic text are now available, including extensive automatically collected web corpora. When searching these collections for information or for linguistic material, it is important to know the genre of each text: whether a given sentence comes from fiction, sports news, an internet comment, a forum post, and so on. Texts therefore need to be classified by text type or genre. To solve this task, we must first determine which characteristics are relevant for classifying Estonian texts and how those characteristics are expressed linguistically.

Another important research direction is the automatic processing of older texts. One helpful step in the analysis of older texts is to create an intermediary layer closer to the modern language, to "translate" the old texts into the modern language form. Such "translation" is also called normalization. The normalized intermediate layer facilitates searching of texts and enables automatic analysis of texts using tools designed for modern language use.
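As a rough illustration of what such a normalized layer can look like, the Python sketch below keeps each original token and pairs it with a modernized form. It is a toy example under stated assumptions, not the method used in our projects: the function names and the lexicon are hypothetical, and the only spelling rule applied is the replacement of "w" by "v", reflecting the older Estonian orthography in which "w" was commonly written where modern Estonian has "v".

```python
import re

# Hypothetical lookup table of old -> modern word forms.
# The single entry is a placeholder for illustration only.
LEXICON = {"wana": "vana"}


def normalize_token(token, lexicon=LEXICON):
    """Return a modernized form of one token (toy rule set)."""
    lower = token.lower()
    if lower in lexicon:
        # Lexicon entries take precedence over the character rule.
        norm = lexicon[lower]
    else:
        # Assumed rule: older spelling "w" corresponds to modern "v".
        norm = lower.replace("w", "v")
    # Restore the initial capital letter if the original had one.
    if token[:1].isupper():
        norm = norm[:1].upper() + norm[1:]
    return norm


def normalize_text(text):
    """Pair every original token with its normalized form."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [(tok, normalize_token(tok)) for tok in tokens]


if __name__ == "__main__":
    for original, normalized in normalize_text("Wana mees."):
        print(original, "->", normalized)
```

Keeping the original token next to its normalized form means that searches and modern-language tools can operate on the normalized layer while the source spelling stays intact.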

Another topic related to older texts is automatic information extraction. The current focus is on the automatic identification of named entities (personal names, place names, organization names, etc.).

In the field of computational linguistics, we cooperate with the Institute of Computer Science of the University of Tartu.

Researchers related to the field

Kadri Muischnek

Institute of Estonian and General Linguistics

Department of General Linguistics

Associate Professor of Computational Linguistics 0.5 p

Jakobi 2-426

Institute of Computer Science

Chair of Natural Language Processing

Associate Professor in Natural Language Processing 0.5 p

r 3058

kadri.muischnek@ut.ee

Kadri Muischnek is an associate professor in Computational Linguistics. Her current research interests include computational syntax, in particular treebanks and parsing. She also works on historical text normalization and text genre classification. Her past research interest, to which she hopes to return someday, was multi-word expressions in Estonian.

Joshua Wilbur

Institute of Estonian and General Linguistics

Centre for Digital Humanities and Information Society, University of Tartu

Lecturer in Digital Linguistics

Jakobi 2-417

joshua.wilbur@ut.ee

Joshua Wilbur is Lecturer in Digital Humanities at the Centre for Digital Humanities and Information Society and is associated with the Institute of Estonian and General Linguistics. He holds a PhD in General Linguistics, and his research focuses on documentary linguistics, morphophonology, syntax, corpus linguistics, lexicography and language technology, especially concerning Pite Saami, a critically endangered Uralic language of Sweden.

Siim Orasmaa

Lecturer of Computational Linguistics 0.25 p

siim.orasmaa@ut.ee

Siim Orasmaa is a lecturer in Computational Linguistics. He is actively developing EstNLTK, the Estonian natural language processing toolkit, and teaches courses on programming and text analysis tools. His current research focuses on applying natural language processing to historical texts. He has also worked on event and temporal analysis of Estonian texts.

Liina Lindström

Professor of Modern Estonian

liina.lindstrom@ut.ee

Liina Lindström is a professor of Modern Estonian. Her main research interests are language variation and the syntax of Estonian from a usage-based, functionalist perspective. Her research focuses on syntactic variation in Estonian and the forces behind it. She has been in charge of compiling corpora of Estonian, especially the Corpus of Estonian Dialects. Her research relies mostly on corpus data, to which she applies quantitative and qualitative methods. Liina is also one of the main promoters of digital methods in the humanities at UT. She currently leads the Interdisciplinary Corpus of Seto project and is involved in other projects, such as the teenager language corpus.

Pärtel Lippus

Department of Estonian

Phonetics Lab

Associate Professor of Estonian Phonetics

Jakobi 2-408

partel.lippus@ut.ee

Pärtel Lippus is Associate Professor of Estonian Phonetics. His main research interest is Estonian prosody, focussing on word-level features (the three-way quantity system and lexical stress), but also on intonational aspects (prosodic marking of non-canonical questions) and socio-phonetic variability (creaky voice). He has also been involved in investigating the prosodic features of other Finno-Ugric languages. He teaches courses on phonetics, Praat, statistics and R. He is the editor of the Journal of Estonian and Finno-Ugric Linguistics and one of the developers of the Phonetic Corpus of Estonian Spontaneous Speech and the Archives of Estonian Dialects and Kindred Languages.

Heili Orav

Institute of Estonian and General Linguistics

Department of General Linguistics

Research Fellow in General Linguistics (employment contract suspended) 0.1 p

Institute of Computer Science

Chair of Natural Language Processing

Lecturer in Natural Language Processing

r 3059

+372 737 6143

heili.orav@ut.ee

Heili Orav is a research fellow in general linguistics whose research is mostly related to computational linguistics and language technology. Her main research field is lexical semantics. She currently leads the Estonian Wordnet project (https://cl.ut.ee/ressursid/teksaurus/index.php?lang=et), the main goal of which is to compile a large database of synonymous Estonian words and multi-word units that express the same concept.

Kristiina Vaik

Institute of Estonian and General Linguistics

Department of Estonian

Junior Research Fellow in Estonian and Finno-Ugric Linguistics

Jakobi 2-404

kristiina.vaik@ut.ee

Kristiina Vaik is a doctoral student with an interest in automatic text classification. She has worked as a data analyst, taught aspiring computational linguists and participated in many language technology projects. She is a natural language processing enthusiast.

Maarja-Liisa Pilvik

Institute of Estonian and General Linguistics

Department of Estonian

Research Fellow in Estonian Language

Jakobi 2-430

maarja-liisa.pilvik@ut.ee

Maarja-Liisa Pilvik works as a specialist in corpora and quantitative linguistics and is a PhD student at the Institute of Estonian and General Linguistics. Her main areas of research so far have been Finnish verb semantics, the morphosyntax of Estonian dialects, and the productivity of derivational morphology in different registers of Estonian. More broadly, she is interested in language variation, the entrenchment and cognitive organization of linguistic constructions, and the interplay, competition, and change of the forces guiding actual language use. In her work, she mainly uses corpus data and applies both qualitative and quantitative methods of data analysis. She is currently involved in projects building the Seto language corpus and the corpus of teen speak, and in a project that develops tools for the automatic language processing of 19th-century parish court records and tests the potential uses of this important linguistic and historical resource.

Peeter Tinits

Institute of Estonian and General Linguistics

Centre for Digital Humanities and Information Society, University of Tartu

Specialist of Digital Humanities 0.25 p

Jakobi 2-417

Faculty of Social Sciences

Institute of Social Studies

Text Mining Expert 0.75 p

Lossi 36

peeter.tinits@ut.ee

Peeter Tinits is a digital humanities specialist at the Centre for Digital Humanities and Information Society. He teaches introductory courses in digital humanities and text analytics at the University of Tartu. As a researcher, he has described late 19th-century Estonian language communities from the perspective of historical sociolinguistics and has applied the framework of cultural evolution in linguistics and in the humanities more broadly, combining data analytics and various databases. At the moment he is working with social scientists in the Deep Transitions research group at the University of Tartu, applying text mining tools to understand shifts in thinking about the natural environment and technology in industrialized nations during the 20th century.

Kaarel Veskis

doctoral student

kaarel.veskis@gmail.com

Kaarel Veskis is a doctoral student and a junior research fellow at the Estonian Folklore Archives (EFA) of the Estonian Literary Museum, participating in the EFA's project "A corpus-based approach to folkloric variation: regional styles, thematic networks, and communicative modes in runosong tradition". His current work centres on computational methods for analysing poetic synonyms in Estonian runic songs.
