Saturday, August 16, 2014

Ville Marttila: Creating Digital Editions for Corpus Linguistics

Ville Marttila is defending his thesis "Creating Digital Editions for Corpus Linguistics - The case of Potage Dyvers, a family of six Middle English recipe collections". Professor Graham Caie (University of Glasgow) serves as the opponent, and Professor Minna Palander-Collin as the custos.

In his lectio precursoria, Marttila explained that his work consists of two main parts. From the point of view of historical linguistics, the focus is on Middle English recipe collections. The other contribution, highly relevant for digital humanities, is the development of a coherent methodology for the contextualized study of historical texts using corpus-linguistic methods. A central objective has been to go beyond the traditional focus on language as disembodied sequences of written or spoken utterances, represented in digital form as linear streams of character data. Marttila describes this broadening as follows in the introduction of his thesis:

"[...] new trends in historical linguistics [...] emphasising the importance of studying historical language use in its original context. Since the situational context and much of the cultural context of historical texts is inaccessible to us, the importance of the documentary context is relatively much greater than for present-day texts, both because it is all we have, and because the pragmatic functions of its material features are less well known." (p. 1)

One important contribution of the work is based on the idea of layered annotation. Marttila describes documentary, descriptive and analytical annotation in detail. By using annotation overlays, one can highlight the differing ontological status of descriptive annotation compared with documentary and analytical annotation. The division can also be useful in defining editorial responsibilities. Wherever possible, the framework has been built on existing standard solutions, combining them into a coherent whole.

The opponent, Professor Caie, began his remarks very positively, stating that the thesis is absolutely brilliant and even describing it as the most impressive thesis he has read during his 40-year career. He also mentioned the VARIENG unit at the University of Helsinki as one of the leading centers of historical corpus linguistics. The opponent remarked that the work contains enough material for two or three theses, and that it is impressive not only in quantity but also in quality.

Wednesday, March 19, 2014

Open Science in Finland 2014-2017

An Open Science seminar was organized at the University of Helsinki on 19 March to discuss the scope of the Open Science and Research (Avoin tiede ja tutkimus, ATT) project. The Ministry of Education and Culture coordinates this project, which runs from 2014 to 2017. The seminar was opened by Director Tapio Kosunen, who outlined the motivations for open science, including its importance in promoting science and in increasing the impact of science on society. Openness is a central principle in research, providing new opportunities for participation to researchers, decision makers, and citizens. This requires that publications, data, methods, skills, and support services are widely available. The openness and digitization of research are expected to open up new opportunities for collaboration and communication for everyone. This should lead to increased trust in science as well as support for entrepreneurial activities. Detailed information on the project will be available at

The seminar consisted of three panel discussions on open publishing, open data, and open methods. A set of questions for all the panels had been prepared by Juha Haataja, Counsellor of Education. As background, Haataja summarized the main issues in the three areas: storage, metadata services, and access services for publications, data, and methods. Issues such as identity recognition and the management of access rights also need to be solved. The questions posed by Haataja were related to guidelines, funding, and renewal:
  • How does openness change research practices?
  • How to implement openness taking into account ethical principles?
  • How to make openness part of researchers' everyday practices?
  • How should public funding policies be adapted to take into account the changes?
  • What are the main priority areas for development?
  • Does openness increase trust in science?
  • How does openness support entrepreneurial activities?
  • Does openness help in assessing the quality of research?

The panel discussion on open publications was chaired by Annikki Roos. The participants, Pekka Nygrén, Sirpa Thessler, Joona Lehtomäki, Mikko Mönkkönen, and Jyrki Ilva, agreed with Tapio Kosunen on the benefits of openness, adding that texts themselves are becoming data for research. The group concentrated on several problems, including predatory journals, the cost of publishing, and ethical issues. In open publishing, the author and the publisher typically share an interest in getting a paper published. This can potentially lead to a decrease in the quality of science: peer review may be shallow or even non-existent. However, similar quality problems are also present in traditional publishing.

The panel on open data was chaired by Sami Borg, the director of the Finnish Social Science Data Archive. The participants were Pirjo-Leena Forström, Laura Höijer, Jussi Simpura and Tuuli Toivonen. The panelists acknowledged several positive aspects, in particular democratic access to research, increased innovation, and international collaboration through the exchange of data. The challenges mentioned included the need to build systems for giving academic merit for data, defining how data should be cited, and data life cycle management. Professor Simpura advertised a Soterko event on 5 May related to the last topic.

In the panel on open methods, the discussion between Markus Kainu, Johanna Seppänen, Ilpo Vattulainen, and Timo Väliharju was chaired by Hanna Vehkamäki. (At the beginning of this discussion, the blogger had to rush to another event.)

Monday, February 17, 2014

Heather Froehlich: Introduction to Digital Humanities

Heather Froehlich from the University of Strathclyde, Glasgow, is visiting the English subject and the VARIENG Research Unit at the Department of Modern Languages, University of Helsinki. Froehlich studies representations of gender in Early Modern London plays as part of the Visualizing English Print 1470-1800 project, and she works with DocuScope, a text analysis program that provides interactive visualisation tools for corpus-based rhetorical analysis.

In her presentation, Froehlich discussed different aspects of digital humanities in detail. She posed a series of questions related to the role of digital and computational tools within the humanities. She cited Ted Underwood, who has stated that "there is actually a lot of low-hanging fruit out there still worth picking - big questions that are easy to answer quantitatively and that only require organizing large datasets". Froehlich mentioned methods commonly used within digital humanities, including creating and maintaining digital archives (Omeka, Scalar), text encoding (TEI), network analysis (Gephi), text analysis and topic modeling (MALLET), and data visualization.

A distinction between questions that are easy and those that are merely "easy" to answer was discussed. Referring again to Underwood, Froehlich discussed the need to go beyond binary categories, i.e. to deal with questions of degree. She also reminded the audience that "simplifying topic models for humanists who will not (and should not) study the underlying algorithms creates an enormous potential for groundless - or even misleading - insights" (see "Words Alone: Dismantling Topic Models in the Humanities" by Benjamin M. Schmidt in the Journal of Digital Humanities).

Froehlich emphasized collaboration as an important feature of digital humanities. Only in exceptional cases do individual researchers have the experience and skills in the multiple disciplines needed to conduct digital humanities research successfully. Rather, digital humanities is essentially a networked activity that challenges researchers to reach across disciplinary boundaries. Digital humanities thrives on (1) a network of people, (2) openness, and (3) experimentation. Humanists are learning about computers, but computationally driven researchers are also learning about the humanities.

Towards the end of her presentation, Froehlich discussed forums and resources. For instance, Jeffrey Schnapp has written "A Short Guide to Digital_Humanities" in Digital_Humanities by Anne Burdick, Johanna Drucker, Peter Lunenfeld, and Todd Presner (MIT Press, 2012), pp. 121-136. Organizations include the Alliance of Digital Humanities Organizations (ADHO) and the European Association for Digital Humanities (EADH). Relevant journals in this area include Literary and Linguistic Computing, Digital Humanities Quarterly, and the Journal of Digital Humanities.

Heather Froehlich was hosted by Anni Sairio, Tanja Säily and Terttu Nevalainen. The discussion was active, and in addition to the Finnish participants, insightful remarks were made by William Kretzschmar, who is also visiting the University of Helsinki. Kretzschmar pursues research and teaching on American English, language variation, and computer methods for the description, analysis, and presentation of language data from literary and non-literary sources.