Friday, August 28, 2015

Oskar Kohonen: Weakly Supervised Learning of Morphology

Oskar Kohonen defended his dissertation "Advances in Weakly Supervised Learning of Morphology" at Aalto University, School of Science. Professor Lars Borin (Språkbanken, Institutionen för svenska språket, University of Göteborg, Sweden) served as the opponent and Professor Emeritus Erkki Oja, Aalto University, as the custos. Among other things, Professor Borin is the director of SWE-CLARIN, the sister organization of FIN-CLARIN, which is directed by Dr. Krister Lindén. The thesis advisor was Dr. Krista Lagus, who together with Dr. Mathias Creutz originated the Morfessor method. Morfessor was introduced as an unsupervised, data-driven method for discovering the morphological segmentation of words. Dr. Creutz defended his thesis on this topic in 2006. Dr. Sami Virpioja later extended the method and its scope, and defended his thesis "Learning Constructions of Natural Language: Statistical Models and Evaluations" in 2012.

In morphological segmentation, unsupervised methods do not typically model allomorphy, that is, non-concatenative structure. In English, pretty/prettier can be given as an example of allomorphy: the stem-final y becomes i before the suffix. In Finnish this phenomenon is very common. Moreover, the accuracy of unsupervised methods remains far behind that of rule-based methods. In this thesis, Oskar Kohonen studies the use of weakly supervised methods to alleviate these problems. Together with his colleagues, Kohonen has proposed a novel extension to the Morfessor Baseline method that models allomorphy through string transformations. Kohonen and his colleagues have also examined the effect of weak supervision on accuracy by training on a small annotated data set in addition to a large unannotated one. Two novel semi-supervised morphological segmentation methods have been developed: first, a semi-supervised extension of Morfessor Baseline, and second, a method for morphological segmentation with conditional random fields (CRF). The methods have been evaluated on several languages, including English, Estonian, Finnish, German and Turkish.

The opponent, Professor Borin, represented the computational linguistics aspect of the work. He paid attention to a number of linguistic, methodological, terminological and conceptual issues. Borin first placed Kohonen's work in a broader context, mentioning, for instance, that there are about 7,000 languages in the world. The discussion was conducted in a sophisticated manner. Both the defendant and the opponent speak Swedish, Finnish and English, so the examples could be drawn from any of these languages. Among the conceptual topics, Borin asked Kohonen to clarify what is meant by morphological analysis.

At the evening party, the opponent and I realized that we had met at a conference quite some time ago. At first we thought it had been in Gothenburg, but afterwards I realized it must have been a conference in Copenhagen in 1987. At that time, we both had an interest in morphology. As a young researcher I got the idea that one could try to learn morphological analysis rules through inductive inference. Almost thirty years ago, the attempt did not give convincing results. Therefore, from a personal point of view, it has been a pleasure to see how substantial developments have since taken place.
