LitLong uses natural language processing technology informed by literary scholars’ input in order to text mine literary works set in Edinburgh and to visualise the results in accessible ways.
We have created a very large database of place-name mentions in more than 600 books that use Edinburgh as a setting. We have then extracted the sentences immediately surrounding each mention and included those as an excerpt in our database. The data has then been mapped onto the city via the place-name mentions, and can be explored through a mobile app and online interface. With LitLong, you can walk your own paths through the resonant locations of literary Edinburgh.
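As a rough illustration of the excerpting step, the sketch below pulls out the sentence containing a place-name mention together with its immediate neighbours. It is not LitLong's actual pipeline: the function names, the one-sentence `window` and the naive regex sentence splitter are all assumptions made for the example, and a real system would use a proper sentence tokeniser.

```python
import re

def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def excerpts_for(text, place_name, window=1):
    """Return one excerpt per sentence that mentions place_name.

    Each excerpt is the mentioning sentence plus up to `window`
    sentences on either side of it.
    """
    sentences = split_sentences(text)
    pattern = re.compile(r"\b" + re.escape(place_name) + r"\b", re.IGNORECASE)
    excerpts = []
    for i, sentence in enumerate(sentences):
        if pattern.search(sentence):
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            excerpts.append(" ".join(sentences[start:end]))
    return excerpts

sample = ("We left the inn at dawn. The Canongate was already busy. "
          "Carts rattled over the stones. Nobody noticed us.")
print(excerpts_for(sample, "Canongate"))
```

On the made-up sample text, the single excerpt runs from "We left the inn at dawn." to "Carts rattled over the stones.", with the Canongate sentence in the middle.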
Our aim in creating LitLong was to find out what the topography of a literary city such as Edinburgh would look like if we allowed digital reading to work on a very large body of books. Edinburgh has a justly well-known literary history, cumulatively curated down the years by its many writers and readers. This history is visible in books, maps, walking tours and the city’s many literary sites and sights. But might there be other voices to hear in the chorus? Other, less familiar stories? By letting the algorithms do the reading, we’ve tried to set that familiar narrative of Edinburgh’s literary history in the less familiar context of hundreds of other works. We also want our maps and our app to illustrate old connections, and forge new ones, among the hundreds of literary works we’ve been able to capture.
To create LitLong: Edinburgh we have used text-mining and georeferencing on extremely large and diverse collections of digitised books made available to us by – among others – the British Library, the National Library of Scotland and HathiTrust. In addition, some publishers and authors have shared their lists with us. We searched these collections for texts which, in the range and frequency of their use of place-names, showed all the signs of making Edinburgh their setting. A combination of algorithmic and manual curation then filtered these texts for ones that matched our criteria, giving us a dataset of hundreds of narrative works which explore the city or use it as a backdrop for their action. The Edinburgh places mentioned in these texts were then georeferenced using a bespoke gazetteer created to register the very different ways in which place might be named in fiction or memoir.
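One way to picture the "range and frequency" test is as a pair of thresholds: a text qualifies if it mentions enough distinct Edinburgh place-names (range) and enough mentions overall (frequency). The sketch below is only a guess at the general shape of such a filter. The short name list, the threshold values and the function names are hypothetical, and the real selection also involved manual curation.

```python
import re
from collections import Counter

# Tiny illustrative name list (an assumption); a real gazetteer is far larger.
EDINBURGH_NAMES = ["Canongate", "Grassmarket", "Leith Walk", "Cowgate", "Princes Street"]

def place_name_profile(text, names=EDINBURGH_NAMES):
    """Count whole-phrase mentions of each known place-name in the text."""
    counts = Counter()
    for name in names:
        pattern = r"\b" + re.escape(name) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[name] = hits
    return counts

def looks_set_in_edinburgh(text, min_distinct=3, min_total=5):
    """Hypothetical filter: enough distinct names (range) and enough mentions (frequency)."""
    profile = place_name_profile(text)
    return len(profile) >= min_distinct and sum(profile.values()) >= min_total
```

Requiring both conditions is a design choice: a book that repeats Princes Street many times but names nowhere else might only be passing through, while one that touches several locations is more likely to be set in the city.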
LitLong: Edinburgh isn’t comprehensive. We needed to use freely available corpora of digitised texts that could be text-mined relatively easily. We have also been constrained by copyright restrictions, and by the difficulties that Optical Character Recognition and current text-mining technologies have with poetry. We have also been confined, sadly, to works in English and, to some extent, Scots – we have not been able to adapt our language processing tools to Gaelic. Nonetheless, our methods have given us a dataset of more than 600 published works, by around 300 different authors, and amounting to more than 47,000 excerpts from their books.
Creating a gazetteer to recognise and plot the literary uses of place-names is a tricky business. Our ordinary ways of using proper names to pin down place are pretty various, and it isn’t easy to capture all the ways in which writers of narrative use location names in a single list. What’s more, some places have had lots of different names and variant spellings over the centuries – Edinburgh itself can be Auld Reekie, Edenborough, Edinborrow, and Embra. So there are inevitably some names that slip through the net, but which you might find in an extract associated with another place entirely. If so, let us know!
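A gazetteer that copes with variant names can be thought of as a lookup from every known spelling to one canonical entry with a location. The sketch below uses the Edinburgh variants listed above. The data structure, the function names and the approximate city-centre coordinates are assumptions for illustration, not the layout of the bespoke gazetteer itself.

```python
# Canonical name -> known variants (taken from the examples in the text).
VARIANTS = {
    "Edinburgh": ["Auld Reekie", "Edenborough", "Edinborrow", "Embra"],
}

# Canonical name -> (latitude, longitude); approximate, for illustration only.
COORDS = {"Edinburgh": (55.95, -3.19)}

def build_lookup(variants):
    """Map every spelling, case-folded, to its canonical name."""
    lookup = {}
    for canonical, alternatives in variants.items():
        for name in [canonical, *alternatives]:
            lookup[name.casefold()] = canonical
    return lookup

LOOKUP = build_lookup(VARIANTS)

def resolve(name):
    """Return (canonical name, coordinates), or None if the name slips through the net."""
    canonical = LOOKUP.get(name.strip().casefold())
    if canonical is None:
        return None
    return canonical, COORDS[canonical]
```

Any spelling missing from the table simply returns `None`, which is the code-level version of a name slipping through the net.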
Time for an unsurprising confession: we haven’t read every excerpt in our database. Even worse: we’ve done that on purpose. We wanted to see how the computer would read our books, and to be able to explore the results of its reading. What’s more, we’ve used collections of books that have been digitised automatically, rather than being transcribed and encoded by humans. So sometimes you’re going to encounter strings of characters that veer in and out of sense around a place-name mention, though we’ve tried to filter out the most scrambled texts. Try saying any gobbledygook out loud, to see what it does to the sentence it’s in. Imagine it’s another verse in the Loch Ness Monster’s Song.
Nobody’s georeferencing is perfect. Edinburgh has the annoying habit of sharing place-names – Haymarket, George Square, the High Street, for example – with lots of other places worldwide. So some of our extracts may well not be referring to an Edinburgh location at all, despite what our map claims. There are also some Edinburgh places – like the Parliament – that have moved over time, and which our gazetteer might therefore have put on the wrong spot. You could call these embarrassing mistakes. But we prefer to think of them as wormholes – points where the literary topography of contemporary Edinburgh touches other times or places through the coincidence of a name. They make our maps a bit like a grand and literary game of snakes and ladders, which doesn’t seem entirely like a bad thing. Feel free to report such errors to us using the Report Mistake link. We might be reluctant to shut the wormholes down, though!
Proper names are tricky. Not only are there common nouns that can also function as names – butcher, baker, though not candlestick-maker – but place and personal names are often linguistically identical. For example, the football manager Justin Edinburgh would give us a problem if he turned up in one of our books (our code would probably place him somewhere near the city limits). There are also plenty of titled folk in the books we’ve mined, and since titles are so often geolocated (all those earls and lairds of here and there), we’ve sometimes misread a person as a place and thus generated a phantom entry in our database. Also bars, pubs and restaurants: our gazetteer includes these kinds of places, and sometimes they take their names from common phrases, like The Waiting Room, the Hill Station, and the Golden Rule. So a few of these have crept in too, giving an unanticipated bit of random literary allure to otherwise unliterary places. Feel free to let us know if you find one of these, though an Edinburgh purged of its ghosts and its cultural allusions (even accidental ones) wouldn’t really be worthy of its own name.
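To show how a titled person can be mistaken for a place, and one simple way to catch some of those cases, here is a minimal sketch that skips a place-name mention when it directly follows a title such as "Earl of" or "Laird of". The list of titles and the function name are hypothetical, and this is not a description of how LitLong's own code handles the problem; a heuristic like this would still miss many cases.

```python
import re

# Titles that tend to be followed by "of <place>" (an illustrative, incomplete list).
TITLE_PREFIX = re.compile(
    r"\b(?:Earl|Duke|Laird|Lord|Lady|Countess|Duchess)\s+of\s+$",
    re.IGNORECASE,
)

def place_mentions(text, place_name):
    """Return start offsets of mentions that are not part of a title."""
    kept = []
    for match in re.finditer(r"\b" + re.escape(place_name) + r"\b", text):
        preceding = text[:match.start()]
        # Skip "the Laird of Restalrig": that mention names a person, not a place.
        if TITLE_PREFIX.search(preceding):
            continue
        kept.append(match.start())
    return kept

text = "The Laird of Restalrig rode out. We walked as far as Restalrig."
print(place_mentions(text, "Restalrig"))
```

In the invented example sentence, only the second mention survives, since the first belongs to the laird rather than to the map.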