For much of last year I worked on a project in the BBC called Programme Information Pages. Gavin Bell, Matt Biddulph and I recently gave a talk on the subject at ETech. The project was/is about creating coherent data structures and data stores for information about television and radio programming (and creating web pages from them). It sounds a bit dull when I say it like that, but it was an enormously complex and intricate piece of work. More importantly I think it’s one of the most important projects I’ve ever worked on in terms of its implications. I’m not going to go into much detail about it here, but I’ll put up the presentation at some point for those of you who are interested.
Anyway, one of the core parts of the work was to decide what we were modelling and what we wanted to represent with a web page online. Your normal EPG schedule or whatever represents what we came to term “Broadcast Instances” – an individual broadcast of an episode of a programme – the thing you need to navigate to when dealing with non-time-shifted broadcast media. But on the web such an approach makes no sense – we’re aspiring to create a permanent and navigable archive of programme information online. It wouldn’t make much sense to have to navigate through five hundred pages about the same episode of Only Fools and Horses, just because it’s popular enough to be repeated a lot.
The same applies to how you’d want to handle and enhance the data. There’s only a certain amount of information that you can usefully attach to a particular broadcast – because by its very nature this information cannot be reused for any subsequent broadcast. So fundamentally the core of the enterprise has to be larger than the broadcast, and we decided that the core concept for both data and its representation online was the unique “programme episode” that could be broadcast any number of times.
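To make that distinction concrete, here’s a minimal sketch in Python. It is not the PIPs schema: the class names, the identifier format, the channels and the dates are all invented for illustration. The only point is that the descriptive information lives on the episode, and each broadcast is just a thin record hanging off it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Broadcast:
    """One transmission of an episode (a 'Broadcast Instance')."""
    channel: str
    starts_at: datetime


@dataclass
class Episode:
    """The unique programme episode: the thing a web page would be about."""
    episode_id: str  # hypothetical identifier format, assumed for this sketch
    title: str
    synopsis: str = ""  # written once, shared by every repeat
    broadcasts: List[Broadcast] = field(default_factory=list)

    def next_broadcast(self, now: datetime) -> Optional[Broadcast]:
        """Return the earliest broadcast after `now`, or None if there isn't one."""
        upcoming = [b for b in self.broadcasts if b.starts_at > now]
        return min(upcoming, key=lambda b: b.starts_at, default=None)


# Invented example data: one episode, three showings.
episode = Episode("ep-0001", "Example episode", synopsis="Shared by all repeats.")
episode.broadcasts.append(Broadcast("Channel A", datetime(2005, 3, 1, 20, 0)))
episode.broadcasts.append(Broadcast("Channel A", datetime(2005, 6, 1, 20, 0)))
episode.broadcasts.append(Broadcast("Channel B", datetime(2005, 4, 1, 21, 0)))

nxt = episode.next_broadcast(datetime(2005, 3, 15))
print(nxt.channel, nxt.starts_at.isoformat())
```

However many times the episode is repeated, there is still only one `Episode` to describe and link to, and a question like “when is this on next?” becomes a one-line lookup.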
Of course (as we say in the ETech paper), it’s not always particularly easy to say what constitutes an episode. Many programmes have different versions. At one level there are all the small variations like whether they have subtitles or closed captioning or whether they have minor edits for time. Above that are more complex variations like major edits to make programming child-friendly or that recontextualise it. And then you’ve got things like ‘Director’s Cuts’ – distinct versions of the programme that have been marketed as such to the general public. And then you’ve got other problems – what about programmes within programmes – small dramas inside magazine programmes, or cartoons cut into chunks within Saturday morning kids TV? And then you have questions of scale – is a news bulletin a programme? What about a weather forecast? There’s a lot of complexity here.
But once you have decided what constitutes a programme episode then something really significant happens – you can give it a name, make it addressable, you can – for the first time – point at it. Better still, you can move from pointing at something to glueing handles onto it. And once you have such a handle, then you can pick up the programme and throw it around and stick labels on it and join it together with other programmes with bits of semantic string. You’ve moved your engagement with the programme from only being able to look at it to being able to manipulate it and do things with it. And there is almost no end to the things you can do once you’ve uniquely identified a television or radio programme. It’s foundational. It’s like there are two views of the world – the solid one around us and the Matrix-style flowing green lines one. In this second world, until you give a thing a name – until you can point at it in greenspace – it simply doesn’t exist.
Since working on the PIPs project I see the same problem everywhere. When I use iTunes, the interface encourages me to believe that I’m buying unique songs, but actually their database has no idea about whether “You Can Call Me Al” on Graceland is the same song as the one on Paul Simon’s Greatest Hits. (Graceland was the first album I ever bought for myself, if you’re wondering about the example.) So I can very easily buy the same song twice. The whole thing operates as a shallow attempt to copy and extend the principle of the album that we’ve got used to through vinyl, cassette and CD formats rather than to clarify the principle from scratch. You can’t buy the same track on the same album twice via iTunes, but you can buy the same track from two separate albums, even though that’s astonishingly dumb. That wouldn’t happen if the songs were uniquely identified.
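Here’s a rough sketch of what I mean, in Python. Nothing here reflects how iTunes actually stores anything: the `recording_id` field and its values are hypothetical, and I’m simply assuming that both albums carry the same recording. With an identifier for the song itself, rather than for the song-on-an-album, the shop could at least notice the duplicate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Track:
    """A song as it appears on one particular album."""
    recording_id: str  # hypothetical unique identifier for the song itself
    title: str
    album: str


def already_owned(track: Track, library: List[Track]) -> Optional[Track]:
    """Return the owned track that is the same recording, or None."""
    for owned in library:
        if owned.recording_id == track.recording_id:
            return owned
    return None


# Invented example: the same recording reached via two different albums.
library = [Track("rec-001", "You Can Call Me Al", "Graceland")]
candidate = Track("rec-001", "You Can Call Me Al", "Greatest Hits")

match = already_owned(candidate, library)
if match is not None:
    print(f"You already own this song on {match.album}")
```

The comparison is on the identifier, not the title or the album, which is exactly the piece of information that’s missing when the only thing with a name is the album track.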
Worse still – the database that iTunes uses is completely distinct from the one that lies behind MusicBrainz. It’s completely distinct from the databases used by rights managers or by record companies as well. None of these databases have the slightest idea when they’re talking about the same song or not. None of them are capable of connecting usefully to each other – except by guesswork based on audio signatures or human-entered metadata. Inevitably this will be rife with clumsiness. Things will go wrong. Probably a lot of things.
When I buy books off Amazon, I’m always frustrated that they’re never completely able to show me the various editions of the same work. And why? Because they don’t actually know what a work is, they only know what an edition of a book is. Sure there are human-created links, but fundamentally there is only a limited collection of things that you can do with ISBNs. The ISBN is like creating an identifier for a television broadcast rather than an episode – it’s kind of useful in certain circumstances, but it makes it impossible to do really really simple stuff like ask, “when will this next be broadcast?” or “show me other versions – I want one with a Spanish language track”.
Geographic space, of course, has addresses – and with longitude and latitude (and the fact that the same place can’t be in two different, er, places) – it makes an enormous range of things possible. And as UpMyStreet made clear, once you’ve got a relatively precise identifier / address for a semantically useful concept with some form of data structure around it, then you can build an enormous range of things on top of it. In fact, geographic space is almost one of the most solid proofs of the power of the identifier – think how many new and old pieces of technology rely on something so obvious as to be almost not worth mentioning – that a place can be identified and pointed at and referenced in some way. Maps, postcodes, property – just a few of the concepts that rest on those principles.
Now I know that the creation of universal and world-unique identifiers for things must seem one of the most tedious concepts or projects known to man. But I believe that it’s fundamental to our technological development – and particularly our ability to take our ever-increasing computing power and increasingly interconnected appliances and merge them seamlessly with the environment around us. The greenspace of the Matrix needs to merge with the physical – they need to become indistinguishable. Until we can point at, until we can pick up, until we can handle, we will never be able to use these concepts around us effectively.
(All of which is not to deny that there are some things to which we should be nervous about attaching data handles. The concept of universally applicable identifiers feels creepy and wrong to many many people when it’s applied to humans. The idea that our identities should be reduced to a unique string of numbers or a hash creeps people out – and with reason. It’s absolutely clear that all the things that make universal addressability and unique identifiers (and I know these are very different things) so powerful for material and conceptual objects, make them equally scary and open to abuse when applied to people. But we simply may have no choice – because governments are right when they say that the possibilities are enormous. What you can simplify and develop and build and create above, around and between people (and on top of these identifiers) could change the world for the better or the worse. And we need to get used to the idea that movement in this direction may be irresistible and may be motivated by the desires of consumers and citizens demanding the ability to do things which we’re only not building because we’re nervous that they’ll be abused, not because they’re not useful. And that might mean we may have to say goodbye to the idea of being in any way invisible or untraceable to enter the world ahead.)
In decades to come, I think the time we live in now will be understood as the moment that the real and the map started the final stages of a merging that has been going on for hundreds of years. In this future world, all of our discrete objects (physical or conceptual) will be annotatable, linkable to, referenceable. Each ‘thing’ will be built upon in non-physical dimensions of data. And that final process of merging must start with addressability. It must start with identifiers. And it’ll need individuals and collective projects and business and government to undertake that work, because it’s the foundation of everything that follows. The true power of our technology will not reveal itself until we know what a ‘thing’ is. We will not be able to work with concepts until we have pointed at each new thing and (like Adam in the Bible, I guess) given it a unique identity. From birthing concept into language, we’re now moving on to moving concept into data. This is an age of naming, it is an age of pointing at things.