Wednesday morning’s plenary lecture was given by Professor David DeRoure, a Fellow of the British Computer Society and interim Director and Professor of e-Research at the Oxford e-Research Centre. He is also the National Strategic Director for the Digital Social Research project. Professor DeRoure epitomises what I believe the digital humanities are all about, in that he is unafraid to collaborate with multiple disciplines to ask new questions, and seek new answers.
I have to confess that his illustrious position makes me feel rather better about the fact that I struggled with some of the concepts he was explaining, but actually this ongoing struggle is the basis for my attending this week’s conference. Coming from a literature background I sometimes find it difficult to engage with techniques which owe more to the Social Sciences, or to IT, but the application of these research techniques and the language they use is something I need to engage with, and to become comfortable with. Ho hum, I digress.
Professor DeRoure began by asking us to consider the Web in a variety of different ways: as an infrastructure for research, as a source of data, as a subject of research, and as a web of scholarly discourse. He commented that the data deluge is no longer an issue only for scientists and social scientists, but it is the science community that has responded to this emphasis on data-led research by announcing a paradigm shift away from hypothesis-driven research (the Fourth Paradigm I mentioned in my previous blog post). In fact, Wired magazine went as far as to announce the end of theory (which rather brings to mind the great Mark Twain line, “reports of my death are greatly exaggerated”).
Supporting this Big Data, as I understood it, are computers sophisticated enough to sort through the masses of data – or Big Compute, as it’s known. And as the Wired article explains, we need this kind of technology: we’re children of the Petabyte Age, and we need to adapt accordingly. The Web should be about co-evolution – society and technology working together.
The problem with Big Data is the temptation to work within a sub-set of it that concentrates on proving your own personal theories. But we simply can’t work in that way anymore. We are, in the words of Nick Cave, merely “a microscopic cog”. We need to realise that we can’t work in isolation, and that we can’t ignore other data simply because it doesn’t say what we want it to. One way in which DeRoure suggests this can be avoided is through the use of linked data. Linked data enables us to discover more things; we need to realise that our questions are often similar to those being asked within other disciplines, and that linked data can broaden our areas of understanding.
“Wait a second, back up!”, I hear you wail. What’s linked data? There’s a possibility the librarians amongst you will have started to sit up and take notice, as the idea of linked data is closely related to concepts like controlled headings in library catalogues. Essentially, the idea of linked data is that pieces of information become linked to one another, and therefore more useful. It requires a standard format that is “reachable and manageable by Semantic Web tools”, plus tools to convert data into that format, whether in advance or on the fly (for further clarification, the Semantic Web is simply “a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web”).
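For the programmers amongst you, the “triple” idea at the heart of linked data can be sketched in a few lines of plain Python. This is only a toy illustration – the identifiers and datasets below are invented, and real linked data uses full URIs and RDF tooling rather than bare strings – but it shows the principle: because separate datasets use a shared identifier, merging them lets one query span both.

```python
# A minimal sketch of the "triple" idea behind linked data: every fact is a
# (subject, predicate, object) statement, and shared identifiers (plain
# strings here, standing in for URIs) are what let separate datasets link up.

# Two toy "datasets" from different disciplines (invented for illustration),
# both using the same identifier for the same composer.
music_catalogue = [
    ("ex:Elgar", "ex:composed", "ex:EnigmaVariations"),
    ("ex:EnigmaVariations", "ex:premieredIn", "ex:London"),
]
history_archive = [
    ("ex:Elgar", "ex:bornIn", "ex:Worcester"),
]

# "Linking" the data is, at this toy level, just merging the triples.
graph = music_catalogue + history_archive

def facts_about(subject, triples):
    """Return every (predicate, object) pair recorded for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

# One query now spans datasets that were collected separately.
print(facts_about("ex:Elgar", graph))
```

Asking about “ex:Elgar” returns facts from both the music catalogue and the history archive – which is, in miniature, the “discover more things” that DeRoure was describing.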
An example of a project which will be published as linked data is Structural Analysis of Large Amounts of Music Information (SALAMI). SALAMI will analyse 23,000 hours of digitised music in order to build a resource for musicologists, drawing on a range of music from the Internet Archive. Students will annotate the structure of songs based on what Professor DeRoure termed their “ground truth” – meaning, what those annotators say the structure of the song is at the time they annotate it.
In addition to the sheer scale of the information we’re receiving, the nature of that data is changing. Twitter has generated a lot of energy within the Social Sciences community over how useful the data collected from it can be. Some areas of the field have rejected the data on the basis that it was not collected or collated properly, but other areas have embraced this rich new source of data, and are establishing new methods to deal with it. Professor DeRoure suggested it may very well be a case of whether you consider your data cup half full, or half empty.
Whichever way you look at it, the loop is closing – social theory is being used to describe the data. But the need to create intermediaries for this form of data remains. The Web Science Trust proposes the creation of a global community which looks not at how you represent your data, but how you describe it.
So, let’s take a breath. I need one, frankly, and I’m guessing you do too. I’m omitting great swathes of Professor DeRoure’s lecture and endeavouring to stick with the nuts and bolts of what I think he was saying, so you will have to bear with me whilst I process my thoughts. I think, at this juncture, what he is suggesting is that because of Big Data and the need to process large amounts and different kinds of data (such as the information one could glean from collating tweets, for example), we are more in need than ever of a linked, coherent system that communicates with itself and doesn’t come up against any barriers in the learning process. I’m thinking now of the Web as a maze, in which one suddenly finds oneself at a dead end simply because a program is hosted on a different system, for example. Obviously the concepts of the Semantic Web and linked data are steps towards enabling this open process.
We are links in that chain too – it’s not just the machinery of the computer. We are as much a part of it as the hard drive is. Our interaction with computers is changing. SOCIAM (the Theory & Practice of Social Machines) is a project proposed by the University of Southampton which will attempt to research the best means of supporting “purposeful” human interaction on the Web. This interaction, they claim, is:
“…characterised by a new kind of emergent, collective problem solving, in which we see (i) problems solved by a very large scale human participation via the Web (ii) access to, or the ability to generate, large amounts of relevant data using open data standards (iii) confidence in the quality of data and (iv) intuitive interfaces.”
The boundary between programmers and users has been dissolved by the Web and our participation in it, typified mainly by social websites such as Facebook and Twitter. We are now merely components of the Social Machine.
The picture here is of Ory Okolloh, the founder and executive director of Ushahidi: an example, cited by Professor DeRoure, of the social machine in action. Developed shortly after the Kenyan elections of 27th December 2007, Ushahidi was created to map incidents of violence and peacekeeping in the country after the elections, based on reports submitted by mobile phone and via the Web. The experience was a catalyst for the website’s team, who realised there was a need for a platform that could be used in the same way worldwide. Ushahidi (Swahili for “testimony”), the social machine, was born.
An example of the way in which the website is used was given in The Spectator magazine in 2011:
“At 6:54 pm the first bomb went off at Zaveri Bazaar, a crowded marketplace in South Mumbai. In the next 12 minutes two more followed in different locations in the city…The attacks added to the confusion just as millions of people were returning home from work. With telephone lines jammed, many Mumbaikars turned to a familiar alternative: they posted their whereabouts, and sought those of their close ones, on social networks.
Facebook doubled up as a discussion forum…users on Twitter, meanwhile, exchanged important real-time updates. Moments after the explosions, a link to an editable Google Docs spreadsheet was circulated frantically on the microblogging site. It carried names, addresses and phone numbers of people offering their houses as a refuge to those left stranded. The document was created by Nitin Sagar, an IT engineer in Delhi, 1,200km (720 miles) away.”
Problems (of any description, be they the classification of galaxies or a bomb going off in a city centre) are solved by the scale of human participation on the Web and the timely mobilisation of people, technology and information resources. And those websites which reject the traditional idea of the “layperson [as] irrational, ignorant…even intellectually vacuous” are the most successful: the ones that tell people what they’re about, and treat participants as collaborators, not as subjects. We are even coming to a stage where we consider human interaction with the machine as a sub-routine: human-based computation, outsourcing certain steps to humans. Professor DeRoure cited Wikipedia as a good case in point – an interesting combination of automation and assistance, rather than the replacement of the human.
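The “human as sub-routine” idea can be sketched very simply: a program handles the cases it is confident about and delegates the rest to a person. Everything below is invented for illustration – the threshold, the confidence scores and the data are all made up – but the shape is that of a citizen-science classifier, with the human volunteer represented as an ordinary callback function.

```python
# A toy sketch of human-based computation: the program outsources one step -
# judging ambiguous cases - to a human, represented here as a callback.
# Thresholds and data are invented for illustration.

def classify(items, ask_human, threshold=0.8):
    """Label items automatically when the machine is confident enough;
    otherwise delegate the decision to the human 'sub-routine'."""
    results = {}
    for name, machine_confidence in items:
        if machine_confidence >= threshold:
            results[name] = "machine"        # automation
        else:
            results[name] = ask_human(name)  # assistance: human in the loop
    return results

# Toy data: the second item is too uncertain for the machine to decide alone.
galaxies = [("NGC-1300", 0.95), ("blurry-smudge-42", 0.40)]

# Stand-in for a real volunteer on a citizen-science site.
print(classify(galaxies, ask_human=lambda name: "human"))
```

The interesting design point is the one DeRoure made about Wikipedia: the human isn’t replaced by the automation, but slotted into it where the machine’s judgement runs out.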
And there are many dimensions to our social machines: the number of people, and of machines; the scale and variety of data – and how does one measure the success of a social machine? By the way it empowers groups, individuals, crowds? We are moving away from the idea of the Turing machine to one in which humans and machines are brought together seamlessly.
We are at Big Data/Big Compute right now. In fact, if I understand Professor DeRoure correctly, WE are Big Compute: “The users of a website, the website and the interactions between them, together form our fundamental notion of a ‘machine’”. Thus, we find ourselves on the edge of a new frontier. Technology alone isn’t transforming society – people are – and the behaviour of machines will evolve over time because of their involvement with humans. In order to facilitate those changes we need to understand how to design social computations, provide seamless access to a web of data, and consider how accountable and trusted the components should be. Ultimately, we are citizen-scientists and human-computer integrations.
I hope this has made some sense to you – please feel free to comment via my Twitter profile and let me know whether you think I’ve accurately assessed the tone of Professor DeRoure’s lecture, or whether I’m barking up entirely the wrong digital facsimile of a tree.