Monday, February 15, 2016

The "Digital" in the Digital Humanities

Is a methodology an essential ingredient in a scientific discipline, so essential that it needs to be mentioned in the name? Digital Humanities is a commonly used name for research activity in which computers are used to support endeavours within the humanities and social sciences. Similar combined terms are, for example, computational linguistics and bioinformatics. Some disciplines, such as mathematics, statistics, computer science and logic in philosophy, are themselves already methodologically oriented. Will the use of computers at some time in the future be so commonplace and obvious in digital humanities that the qualifying part is dropped? In which way is the qualification relevant? Considering good research in the humanities, is it necessary to distinguish between approaches that make use of computers and those that do not?

As a starting point for the discussion, one could claim that the objects of study and the phenomena considered in the humanities and social sciences are much more complex than those of the physical and biological sciences. Human thinking, language and culture are dynamical phenomena, subject to continuous change. A theory may become invalid because of itself, for example when knowledge of the theory changes the behaviour it describes. In physics, too, measurement influences the results, but this effect is not as complex and unpredictable as in the humanities. Many approaches in ”simpler” sciences are based on the concept of predictability. Scientists look for experimental settings that can be repeated, even treating repeatability as a criterion for being scientific. As von Foerster has stated, such an attitude makes most research irrelevant from the point of view of real-world phenomena. A technical term that can be used here is non-stationarity. Unlike in physics, the phenomena discussed in the humanities are in constant flux, not only superficially but sometimes even in their basic framework. Peter Gärdenfors might explain this as the introduction of new quality dimensions. In physics, new quality dimensions may be introduced to craft better theories, but in human behaviour or social activities inherently new dimensions may emerge. These are not new explanatory models but new aspects of the phenomenon itself. What this means in practice is that one cannot compare two situations or contexts in a straightforward way. Due to such complexities, it has been necessary to focus on qualitative methods and a textually oriented mode of presentation in order to avoid the risk of reductionism. For example, it seems that economics has suffered from such a problem by formulating closed-form equations with small numbers of variables, limited feedback mechanisms and limited consideration of the adaptive processes involved. Formalisation is not a guarantee of being scientific if the formalism is not on par with the complexity of the phenomenon under consideration.

Digital humanities is an interesting area because it includes a promise of approaching the complex phenomena in the humanities in new ways, facilitated by the availability of large data collections and the latest developments in computer science. The use of the word ”digital” may be considered misleading. Having resources in digital form helps in sharing them, so that a larger number of researchers than before can reach the texts and data through networks. A more significant impact is, however, reachable through "Computational Humanities." This term can be used to characterize the activity in which humanities data is modeled using modern computer science methods such as statistical machine learning. The complexity of the topics at hand motivates the development of improved and novel methods that enable modeling data that is more complex than anything seen in the physical or biological sciences. Future data sets and analysis processes could, for example, include all books and newspapers published and stored during the history of humankind. This would give a chance to study traditional questions in new ways as well as to address wholly new perspectives in a holistic manner. Methodological themes to address include non-stationarity, multilayered contextuality and multilevel simulation of large communities of cultural adaptive agents. It will be useful to continue the emerging practice of bringing together representatives of the humanities and social sciences on the one hand and data sciences on the other. The people with formal computing skills need to remember not to bring in reductionistic assumptions, and the historians, economists, linguists and others may take a role in which they supervise the assumptions made in the data-driven modeling processes and an analytic role in interpreting the models and results. An essential factor that differs from the past is the chance to rely on emergent processes. They give rise to dynamical understanding that is not dependent on the limited capacity of coding knowledge in small pieces as a manual process. Human interpretation and analysis remain important but can be taken to a high level of abstraction or to an unforeseen level of contextual detail. In this way, Digital Humanities can serve humankind in its most burning challenges and questions related, for example, to successful communication, organizing societies in a good way, solving crises in a peaceful manner, addressing climate change and other environmental issues, improving scientific communication to improve its results and their use, and protecting and further refining human cultural heritage.

Tuesday, January 26, 2016

Modeling Meaning and Knowledge, Spring 2016

A series of mini-symposia entitled Modeling Meaning and Knowledge started on Monday, 25th of January. During spring 2016, the topic is handled in a multidisciplinary fashion. In linguistics, philosophy of language, cognitive science, psychology, sociology, artificial intelligence, information systems design and other scientific disciplines and application areas, it has been of primary interest to study knowledge. What does it mean to know? How is knowledge acquired? How are knowledge and meaning related? How is prototypical meaning different from contextual meaning? What are the characteristics of explicit and implicit knowledge? Is there knowledge beyond language? What kinds of approaches have been taken to model knowledge in computer science and artificial intelligence? Can computational modeling be used to test philosophical ideas related to knowledge and meaning? What is the relationship of these questions to digital humanities? Can large data and text collections, i.e. so-called big data, be used to extract knowledge automatically? What kinds of practical, ethical and societal consequences does the chosen approach have? The series addresses questions like these.

Timo Honkela gave a short introduction and discussed professor Terry Winograd's career from SHRDLU to the book on syntactic processing in NLP and all the way to his role in the advent of Google. Juha Himanka gave a talk "Fernando Flores reads Heidegger". He described how Flores had met Stafford Beer in the times of hopeful developments in Chile and how these developments were dramatically cut short. Flores later collaborated with Winograd and they authored an influential book together. Pirjo Kukkonen discussed dynamic semiotics, providing a wide range of theoretical and practical views on the complexity of symbolic communication. Based on her long experience of the topic, Terttu Nevalainen presented in-depth views on linguistic variation.

Juha Himanka

The agenda for the spring is given below. The sessions are held in Auditorium IV of the main building of the University of Helsinki.

  • Jan 25 (14.15-16.00):
    Complexity of meaning and knowledge
    dynamic semiotics, prof. Pirjo Kukkonen
    linguistic variation, prof. Terttu Nevalainen
    Tutorial: A story of syntax, semantics and pragmatics from Syntax I to phenomenology and Google; guest speaker Dr. Juha Himanka
  • Feb 1 (14.15-16.00):
    Knowledge representation
    networks of knowledge, prof. Eero Hyvönen (Aalto)
    spaces of knowledge, prof. Timo Honkela
    Tutorial: An introduction to artificial intelligence
  • Feb 8 (14.15-16.00):
    Conceptual change
    cognitive view, prof. Ismo Koponen
    historical view, prof. Mikko Tolonen
    Tutorial: Study of words and concepts – Qualitative and quantitative approaches
  • Feb 15 (14.15-16.00):
    Knowledge over language borders
    prof. Jörg Tiedemann
    prof. Liisa Tiittula
    Tutorial: An intellectual obituary of Melissa Bowerman
  • Feb 22 (14.15-16.00):
    Interactive session
    Tutorial: Independent component analysis of signals and texts
  • Feb 29 (14.15-16.00):
    From data to knowledge with machine learning
    prof. Tapani Raiko (Aalto)
    Tutorial: A history of machine learning and neural networks research
  • Mar 14 (14.15-16.00):
    Studying understanding and emotions through brain research
    prof. Mikko Sams (Aalto)
    prof. Arto Mustajoki
    Tutorial: An introduction to ambiguity and vagueness
  • Mar 21 (14.15-16.00):
    Meaning in art
    Automating literary creativity, prof. Hannu Toivonen
    Tutorial: Metaphors, analogies and conceptual blending
  • Apr 4 (14.15-16.00):
    Creating scientific knowledge as a social process
    Dr. Nina Janasik-Honkela
    Dr. Arho Toikka
    Tutorial: Kuhn’s Structure of Scientific Revolutions and Gärdenfors’ Conceptual Spaces
  • Apr 11 (14.15-16.00):
    Uncertain knowledge of the future
    Dr. Mikko Rask
    Tutorial: An introduction to futures studies
  • Apr 18 (14.15-16.00):
    Knowledge of society
    prof. Mika Pantzar
    Sakari Virkki (Competence Map Solutions)
    Tutorial: Modeling evolutionary and dynamical systems
  • Apr 25 (14.15-16.00):
    Legal and wellbeing knowledge
    Anna Ronkainen (TrademarkNow)
    Dr. Krista Lagus (TBC)
    Tutorial: Text mining of document collections and social media discussions
Small changes are possible.

Monday, August 31, 2015

A Citizens' Dialogue event was organized at the University of Helsinki on Monday, 31st of August. The topic was Open Science, Open Data and Open Innovation. Panel members were:

  • Commissioner for Research, Science and Innovation Carlos Moedas, European Commission
  • Chancellor, Professor Thomas Wilhelmsson, University of Helsinki
  • Co-founder and Senior Partner Valto Loikkanen, Grow VC Group
  • Director Kristiina Hormia-Poutanen, the National Library of Finland and President of the Association of European Research Libraries LIBER
President, Professor Anneli Pauli, Lappeenranta University of Technology, served as the moderator.

Before her current position at Lappeenranta University of Technology, Anneli Pauli worked as the Deputy Director-General of the Directorate-General Research and Innovation of the European Commission and as the Deputy Director-General of the Joint Research Centre of the European Commission. Pauli first congratulated the University of Helsinki on its 375th anniversary. She continued by introducing the two main themes, Open Science and Open Data on the one hand, and new forms of funding on the other. The audience could participate by asking questions directly or by presenting questions through a message wall or social media (Twitter and Instagram).

Commissioner Moedas spoke about the importance of European collaboration at an early stage of his career. He was born in Beja, a town in southern Portugal. Moedas mentioned that Erasmus changed his life forever by enabling him to study in Paris. These are the kinds of things that make Europe and that are easily forgotten. The main projects of the EU are peace and prosperity. Europe has been a conversion machine. First, the wellbeing of citizens has developed to a large degree. Second, the EU has 7-8% of the world population, but 30% of knowledge is created here. Third, we are the only social platform in the world where people are taken care of. This is a unique feature of Europe.

Pauli introduced the first discussion theme: is openness a key to scientific excellence and to innovation?

The Commissioner reflected that the borders between different disciplines are important for innovation in the digital age. He presented the example of Ada Lovelace, the daughter of Lord Byron, and of Charles Babbage. Her mother told Lovelace to study mathematics and science. Thanks to her understanding and experience of art, Lovelace, who is often called the first programmer, envisioned programming music, seeing the concept of a digital world and connecting different areas. In summary, openness and the crossing of areas are preconditions for innovation. Digital humanities can be seen as an example of a crossing of areas that promotes the emergence of new knowledge and innovations.

Wilhelmsson underlined that the EU has an important role in the development of open science. Mandatory exceptions are needed for data and text mining. Moreover, the sustainability of research needs to be promoted. In Finland, the Ministry of Education's open science and research initiative has been important. Finland aims to be a leading country in this area in 2017, which is a very ambitious goal. Funding, changes in legislation, education, and research are needed.

Hormia-Poutanen stressed the need for raising awareness, training, clarifying concepts like open science, and stating clearly what kinds of benefits we can gain with data and text mining. Regarding open access, different stakeholders need to collaborate.

As an expert on crowdfunding, Loikkanen mentioned that he was eager to have a discussion with the audience. He emphasized innovation as a trial-and-error process.

A number of questions and comments were presented by the audience, for example the following.

  • Professor Tuukka Petäjä (University of Helsinki) spoke about a large body of research on the atmosphere at the University of Helsinki. Petäjä stressed the global potential of such work and called for ways to support steps that could lead to wide application of the methods and data created in the research efforts.
  • The president of the University of Helsinki, Professor Jukka Kola, asked how the openness of science in the EU compares with that of the USA and other parts of the world.
  • A member of the net audience posed a question related to the cost of openness.
Carlos Moedas described the traditional form of the publishing business. He stressed the negative effect of paywalls on innovation. The way of doing business has to change. Regarding the comparison with the USA, the commissioner stated that unfortunately the situation in the EU is not as good as in the USA. An important example is the opening of genome data. This created a lot of new research and business. A side remark was that he sees Open Science as synonymous with Science 2.0. Moedas said that the education system can be open, business-based, or a mixture of those, as the education system is up to each member state. There will be a new layer in education with increased use of digital tools. The experience in the USA has shown that online courses need to be associated with personal contact. The importance of personal contact remains. Horizon 2020: going from fundamental research to combining …

Valto Loikkanen stressed the importance of open data in the case of publicly funded research. He also raised the issue of making closed methods and data open in various ways to boost innovation processes. In computer science, open source software has a long tradition. In Finland, Linux, originating from the University of Helsinki, can even be viewed as a source of national pride. In a private discussion after the event, Loikkanen stressed the global importance of crowdfunding regarding its volume and status as a means for reaching rational decisions.

Kristiina Hormia-Poutanen stressed that we need copyright legislation that allows data and text mining.

The first part of the event closed with a poll. To the question of whether opening research data fosters scientific excellence and innovation, more than 90 percent of the audience replied yes.

The second topic, Open Innovation, or more specifically, Private investment in Research and Innovation, and the European fund for strategic investments (EFSI), seemed to be less familiar to the mostly academic audience than the first one. The commissioner summarized the situation by saying that governments do well in Europe regarding research funding but Europe is lacking on the private side. Companies cannot be forced to invest but conditions can be made more favourable. For instance, Europe cannot have 28 different markets. Moedas referred to Juncker and Katainen when discussing the investment plan to boost the European economy. The plan includes three key areas:

  • mobilising investments of at least 315 billion euros in three years,
  • supporting investment in the real economy, and
  • creating an investment friendly environment.

Loikkanen mentioned that a wider understanding of risk investment processes is needed. Digitalized investing is becoming increasingly popular through crowd (sourcing) methods. They offer scalable value creation. Loikkanen stressed that investors want to invest in teams that have the necessary skills. A list of such skills should be given to all who wish to build a start-up company and are considering how to build the core team. Loikkanen mentioned that the cost of innovation is actually low, and the investments in early stages are usually low. However, when the business is to grow, substantial investments are needed. Loikkanen used Google as an example. At Stanford University, investors are within walking distance of the inventors and developers.

Carlos Moedas responded to a concern about the role of basic research. He said that the Commission supports very good projects regardless of their nature, even basic research (cf. CERN). For innovators, a European investment council is needed. The commissioner concluded nicely that "you invest to people, people make the difference!"

Friday, August 28, 2015

Oskar Kohonen: Weakly Supervised Learning of Morphology

Oskar Kohonen defended his dissertation ”Advances in Weakly Supervised Learning of Morphology” at Aalto University, School of Science. Professor Lars Borin (Språkbanken, Institutionen för svenska språket, University of Gothenburg, Sweden) served as the opponent and Professor Emeritus Erkki Oja, Aalto University, as the custos. Among other things, Professor Borin is the director of SWE-CLARIN, the sister organization of FIN-CLARIN, directed by Dr. Krister Lindén. The thesis advisor was Dr. Krista Lagus, who originated the Morfessor method with Dr. Mathias Creutz. Morfessor was introduced as an unsupervised method for discovering the morphological segmentation of words in a data-driven manner. Dr. Creutz defended his thesis on this topic in 2006. Dr. Sami Virpioja extended the scope and the method. He defended his thesis "Learning Constructions of Natural Language: Statistical Models and Evaluations" in 2012.

In morphological segmentation, unsupervised methods do not typically model allomorphy, that is, non-concatenative structure. In English, one can give pretty/prettier as an example of allomorphy. In Finnish this phenomenon is very common. Moreover, the accuracy of unsupervised methods remains far behind that of rule-based methods. In this thesis, Oskar Kohonen studies the use of weakly supervised methods in order to alleviate these problems. With his colleagues, Kohonen has proposed a novel extension to the Morfessor Baseline method to model allomorphy via the use of string transformations. Moreover, Kohonen has with his colleagues examined the effect of weak supervision on accuracy by training on a small annotated data set in addition to a large unannotated data set. Two novel semi-supervised morphological segmentation methods have been developed. First, a semi-supervised extension of Morfessor Baseline has been introduced, and, second, a means for morphological segmentation with conditional random fields (CRF) has been developed. The methods have been evaluated on several languages including English, Estonian, Finnish, German and Turkish.
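The idea of handling allomorphy with string transformations can be illustrated with a small toy sketch. This is not Morfessor and not the method of the thesis. The function name `analyze`, the tiny lexicon and the restriction to a single substitution of the last stem character are all assumptions made here for illustration only.

```python
def analyze(word, lexicon, suffixes):
    """Toy analysis of `word` as stem + suffix.

    Returns a list of (stem, transformation, suffix) tuples, where
    transformation is None for plain concatenation or a pair
    (lexical_char, surface_char) for a substitution of the last
    stem character. Hypothetical helper, for illustration only.
    """
    analyses = []
    for suffix in suffixes:
        # The word must end in the suffix and leave a non-empty stem.
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        surface_stem = word[: len(word) - len(suffix)]
        # Case 1: plain concatenation, e.g. small + er.
        if surface_stem in lexicon:
            analyses.append((surface_stem, None, suffix))
        # Case 2: the last character of a known stem has changed,
        # e.g. pretty -> pretti before "er".
        for stem in sorted(lexicon):
            if (
                len(stem) == len(surface_stem)
                and stem[:-1] == surface_stem[:-1]
                and stem[-1] != surface_stem[-1]
            ):
                analyses.append((stem, (stem[-1], surface_stem[-1]), suffix))
    return analyses


lexicon = {"pretty", "small"}
print(analyze("prettier", lexicon, ["er"]))  # [('pretty', ('y', 'i'), 'er')]
print(analyze("smaller", lexicon, ["er"]))   # [('small', None, 'er')]
```

A purely concatenative segmentation would have to treat "pretti" as a separate stem, whereas the transformation lets "prettier" share the stem "pretty". A data-driven method would have to learn the stems, suffixes and transformations from data instead of receiving them as input, as this sketch does.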

The opponent, Professor Borin, represented the computational linguistics aspect of the work. He paid attention to a number of linguistic, methodological, terminological and conceptual issues. Borin first placed Kohonen's work in a broader context, mentioning, for instance, the fact that there are about 7,000 languages in the world. The discussion took place in a sophisticated manner. Both the defendant and the opponent speak Swedish, Finnish and English, and therefore the examples could be related to any of these languages. As a conceptual topic, Borin asked for clarification of what was meant by morphological analysis.

At the evening party, the opponent and I realized that we had met at a conference quite some time ago. First we thought it would have been in Gothenburg, but afterwards I realized it must have been a conference in Copenhagen in 1987. At that time, we shared an interest in morphology. As a young researcher I got the idea that one could try to learn morphological analysis rules through inductive inference. Almost thirty years ago the attempt did not give convincing results. Therefore, from a personal point of view, it has been a pleasure to see what substantial developments have taken place.

Wednesday, April 22, 2015

Enhancing digital humanities at National Library of Finland

The National Library of Finland organized an internal seminar in which both organizational and content matters were presented and discussed. The director of the research library, Liisa Savolainen, gave a presentation on the developments related to the Digital Humanities area. Digital humanities is a natural area of activity for the National Library, as an increasing proportion of material is in digital form and the library itself digitizes large quantities of materials. Savolainen discussed a conceptual model. She also gave examples of international and Finnish digital humanities projects and institutions including
  • style analysis of Shakespeare's texts,
  • Old Bailey corpus of London central criminal court decisions, published from 1674 to 1913,
  • FIN-CLARIN,
  • VARIENG,
  • Bible version comparison, and
  • sea traffic in antiquity.
Savolainen concluded that the library's natural role is to provide materials. It was also discussed that the availability of easy-to-use tools can be important for researchers, many of whom have only limited skills in computer science.

Jean Sibelius, who lived 1865-1957, is the internationally best-known Finnish composer. Tuija Wicklund gave a presentation on a large-scale project called JSW - Jean Sibelius Works, in which a critical edition of Sibelius' works is compiled. The edition includes both musical scores and associated texts such as letters. Wicklund gave Lemminkäinen Tuonelassa as an example and described the different stages of composition and how the information is transferred from the composer's desk to the publisher's hands, where the score is presented separately for each of the 27 players. In the critical edition, different information sources are integrated. For example, potential errors in the original score are corrected, but in an open and transparent manner.

Thursday, April 16, 2015

Consumer Research Center becoming a part of University of Helsinki

The formerly autonomous National Consumer Research Center is becoming a part of the University of Helsinki. This move was celebrated today at the new premises of the institution, whose researchers described ongoing research and future plans.

One collaboration project with the Department of Modern Languages is called Citizen Mindspaces. In the project, social scientists, text analysts and computational linguists will study a large collection of social media discussions in the Suomi24 service and develop research questions, methods and tools in order to provide means for a deeper understanding of citizens' thoughts about the state of affairs.

Friday, April 10, 2015

Lauri Lahti: Computer-Assisted Learning Based on Cumulative Vocabularies, Conceptual Networks and Wikipedia Linkage

Lauri Lahti defended his PhD thesis "Computer-Assisted Learning Based on Cumulative Vocabularies, Conceptual Networks and Wikipedia Linkage" in the field of Computer Science and Engineering. Associate Professor Piet Kommers, University of Twente, the Netherlands, served as the opponent and Professor Jorma Tarhio, Aalto University, Department of Computer Science, as the custos. In the thesis, it was found that conceptual networks of students, common language and Wikipedia inherently emphasize different themes that should be addressed when developing learning methods.

In his lectio precursoria, Lahti discussed the motivation and different aspects of his work. His motivation stems from education. A central task he considers is how to travel through a conceptual network during learning. At later stages of the work, Wikipedia became an important resource. In the work, educational methods were developed that are inspired by the collaboratively maintained knowledge structure of Wikipedia. Moreover, many of its features and contents were used in representing, exploiting and mimicking knowledge. Due to Wikipedia’s many unique characteristics, Lahti considered Wikipedia to offer much more than a mere encyclopedic reference for factual information, namely a holistic framework for knowledge representation. One can, for example, compare mind maps created by students at different stages of learning with Wikipedia as a socially constructed holistic resource.

The opponent discussed the relationship between literature and the keywords that are used to characterize the knowledge contained in the documents. This is an old question that libraries have solved in various ways. The basic question is how to link individual words with objects like books the contents of which are very complex and multifaceted.

Prof. Kommers had made the defence easy to follow by preparing the questions beforehand and by presenting them in one slide. The questions were discussed one by one, however widening the scope or delving into details whenever necessary. Among other things, various aspects related to networked representation of knowledge, students' use of it, measuring learning results, and different theories of learning were discussed. The opponent specifically mentioned Gordon Pask's work. With Heinz von Foerster and others, Pask was an early cybernetician who paid careful attention to systems theoretical aspects of complex phenomena.

A central critical point made by the opponent was related to the limitations of using recall as a means to study learning. He asked about the potential of using the knowledge, for instance, in active problem solving. In general, the opponent found the work substantial and warmly recommended that it be accepted.