I am currently sat on a suitably firm bed in a very-nice-indeed room at Merton College, Oxford, having travelled here yesterday to attend this year’s Digital Humanities @ Oxford Summer School (DHOXSS). The intent of the five-day workshop is to introduce delegates to a range of topics suitable for those who are interested in the creation, management or publication of digital data in the humanities. The facilities in Merton are excellent, and it’s an astonishing feeling to know that you’re in a college which was first established in 1264. We’re certainly not in Kansas anymore, Toto.
Our first session today was given by Dr Chris Lintott, Postdoctoral Researcher at Oxford and co-presenter of the BBC’s The Sky at Night. Lintott’s lecture, entitled “Crowdsourcing in the Humanities”, gave an overview of several crowdsourcing projects, including Galaxy Zoo, which Lintott and his team developed. The original Galaxy Zoo was launched in 2007 with a data set of a million galaxies imaged by the robotic telescope of the Sloan Digital Sky Survey. Within 24 hours of its launch, the site was receiving over 70,000 classifications an hour. In its first year it received 50 million classifications and involved the efforts of 150,000 people.
Lintott suggested that members of the general public are in some ways better than experts at classifying galaxies because their results are “cleaner”: they are not distracted by a great deal of knowledge of the field. If you ask them to find and classify a swirly shape, that’s what they’ll do, and they won’t be distracted by colour or size, because they don’t know that those things are relevant. In other ways, however, the human capacity for distraction is useful, because it lets us notice things beyond what has been requested, something a computer wouldn’t do. Lintott showed us an image of a galaxy with a swirling green gas cloud, which he likened to Kermit the Frog. The Kermit cloud (my name, obviously, not its scientific moniker) turned out to be a rare phenomenon, and would have been overlooked by a computer searching only within the parameters of the original classification request. Casting a human eye over an image, therefore, is sometimes preferable to leaving the job to a computer, and makes for a richer data set.
What the team hadn’t expected was how deeply members of the public would delve into the professional literature. Lintott conjectured that as volunteers became more involved with “their” galaxies they grew more proprietorial, and forum discussions spurred them to draw their own conclusions about objects seen in the images. Of the people involved in classification, 40% took part because they wanted to contribute to research, compared with 2.8% who did it simply because it was fun. This is what Lintott calls “citizen science”.
A further example was a thread on the Galaxy Zoo discussion forum about the sometimes odd shapes and colours of objects discovered in the images. Forum contributors began debating the nature of a small, round, green object that was turning up in some of the images. Someone titled a forum post “Give peas a chance!”, and the pun spawned a whole host of jokes about the odd little items. They became known as “random, pea-like objects”, and no-one knew what they were. The contributors kept investigating, however, and their work eventually led to the discovery of a new class of galaxy, thanks to the efforts of citizen scientists and their desperate enjoyment of a bad pun.
Lintott also name-checked Old Weather, another member of the Zooniverse family of citizen science projects, which helps model past weather systems using observations made worldwide by Royal Navy ships around the time of World War I. Contributors are asked to transcribe the ships’ logs. It has since become apparent that Old Weather is more than just a climate project, mainly because the logs contain a great deal of idiosyncratic material. One entry read: “Held impromptu concert on boat deck”. Another read: “Captain gave a talk on the evolution of man”. The potential for data mining is huge: on one vessel, for example, we can chart the rise in cases of Spanish flu amongst the crew.
Of course, there are downsides to encouraging participation from members of the general public, not least the difficulty of keeping them engaged. Participants would often complete a tutorial and then leave. Lintott suggests that throwing newcomers in at the deep end, to a degree, may do more to keep them coming back. Another unexpected issue was that some contributors panicked on realising they were making a real contribution to the project, and left abruptly. One wonders whether knowing they were doing something real and tangible, not simply playing a game, felt like too much responsibility. This may go some way towards explaining a statistic Lintott gave us: whereas Wikipedia has taken 100 million hours to create, Americans waste 200 billion hours A YEAR watching TV. The secret, he suggests, may lie in getting people to contribute almost without realising it: making Angry Birds useful, for example. Scores, he said, are very seductive. This probably explains the success of Digitalkoot.
Digitalkoot is a joint project run by the National Library of Finland and Microtask. Its goal is to index the Library’s enormous archives so that they can be searched on the internet, giving easy access to Finland’s cultural heritage. “You can help by playing games”, they suggest. Playing these games fixes mistakes in the index of old Finnish newspapers, greatly increasing the accuracy of text-based searches of the newspaper archives.
In Mole Bridge, you build a bridge for moles by typing words as fast as you possibly can. Your answer must exactly match the word shown. If it doesn’t, the bridge blocks explode, and your moley friend plunges to his doom in the frozen wastes below. Correct answers turn to solid steel as soon as they are verified. Once enough moles have been saved, your score for the level is tallied. In Mole Hunt, you compare original words with those already recognised, clicking the tick if they match or the cross if they do not. Again, the words have to match exactly. You complete the level when you have decimated the mole population and the flowers in the meadow can bloom once again (without fear of molestation?). It is simple, engaging and bloody addictive, whilst at the same time contributing enormously to the indexing of the Library’s archive.
Lintott doesn’t like the term “crowdsourcing”: he finds it too cold, failing to convey the rich experience of collaborating on a project as a member of a community. I tend to agree, but I can’t offer an alternative. What I can say with certainty is that the crowdsourcing projects I have become acquainted with over the last nine months of research for my project (including the Transcribe Bentham initiative) seem to me to represent the very heart of digital humanities: collaboration between academic and technologist, expert and amateur, entering into a dialogue which will, with luck, produce a useful, enticing resource that can be used to answer outstanding research questions and to pose new ones.
I think I can speak for all the attendees at today’s lecture in thanking Dr Lintott for an entertaining and extremely interesting introduction to DHOXSS. I hope all the lectures are as engaging, enlightening and humorous!