For much of last year I worked on a project in the BBC called Programme Information Pages. Gavin Bell, Matt Biddulph and I recently gave a talk on the subject at ETech. The project was/is about creating coherent data structures and data stores for information about television and radio programming (and creating web pages from them). It sounds a bit dull when I say it like that, but it was an enormously complex and intricate piece of work. More importantly I think it’s one of the most important projects I’ve ever worked on in terms of its implications. I’m not going to go into much detail about it here, but I’ll put up the presentation at some point for those of you who are interested.
Anyway, one of the core parts of the work was to decide what we were modelling and what we wanted to represent with a web page online. Your normal EPG schedule or whatever represents what we came to term “Broadcast Instances” – an individual broadcast of an episode of a programme – the thing you need to navigate to when dealing with non-time-shifted broadcast media. But on the web such an approach makes no sense – we’re aspiring to create a permanent and navigable archive of programme information online. It wouldn’t make much sense to have to navigate through five hundred pages about the same episode of Only Fools and Horses, just because it’s popular enough to be repeated a lot.
The same applies to how you’d want to handle and enhance the data. There’s only a certain amount of information that you can usefully attach to a particular broadcast – because by its very nature this information cannot be reused for any subsequent broadcast. So fundamentally the core of the enterprise has to be larger than the broadcast – so we decided that the core concept for both data and its representation online was the unique “programme episode” that could be broadcast any number of times.
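To make the distinction concrete, here is a minimal sketch of the two concepts as data. It is an illustration only, not the actual PIPs schema: the class names, the fields and the `ep-0001` identifier are all assumptions made up for the example. The point is simply that the episode carries the identity and the reusable information, while each broadcast is a small record hanging off it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class BroadcastInstance:
    """One transmission of an episode on a channel at a given time."""
    channel: str
    start: datetime


@dataclass
class ProgrammeEpisode:
    """The unique, addressable thing; broadcasts hang off it."""
    episode_id: str  # hypothetical opaque identifier
    title: str
    broadcasts: List[BroadcastInstance] = field(default_factory=list)

    def next_broadcast(self, now: datetime) -> Optional[BroadcastInstance]:
        """Return the earliest broadcast at or after `now`, if there is one."""
        upcoming = [b for b in self.broadcasts if b.start >= now]
        return min(upcoming, key=lambda b: b.start) if upcoming else None


# Example data: one episode, repeated twice (dates are invented).
episode = ProgrammeEpisode("ep-0001", "Only Fools and Horses (example episode)")
episode.broadcasts.append(BroadcastInstance("BBC One", datetime(2005, 4, 1, 20, 0)))
episode.broadcasts.append(BroadcastInstance("BBC One", datetime(2005, 6, 1, 20, 0)))
```

However many times the episode is repeated, there is still one `ProgrammeEpisode` to point at, and a question like "when is this on next?" becomes a simple lookup over its broadcasts.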
Of course (as we say in the ETech paper), it’s not always particularly easy to say what constitutes an episode. Many programmes have different versions. At one level there are all the small variations like whether they have subtitles or closed captioning or whether they have minor edits for time. Above that are more complex variations like major edits to make programming child-friendly or that recontextualise it. And then you’ve got things like ‘Director’s Cuts’ – distinct versions of the programme that have been marketed as such to the general public. And then you’ve got other problems – what about programmes within programmes – small dramas inside magazine programmes, or cartoons cut into chunks within Saturday morning kids TV? And then you have questions of scale – is a news bulletin a programme? What about a weather forecast? There’s a lot of complexity here.
But once you have decided what constitutes a programme episode then something really significant happens – you can give it a name, make it addressable, you can – for the first time – point at it. Better still, you can move from pointing at something to glueing handles onto it. And once you have such a handle, then you can pick up the programme and throw it around and stick labels on it and join it together with other programmes with bits of semantic string. You’ve moved your engagement with the programme from only being able to look at it to being able to manipulate it and do things with it. And there is almost no end to the things you can do once you’ve uniquely identified a television or radio programme. It’s foundational. It’s like there are two views of the world – the solid one around us and the Matrix-style flowing green lines one. In this second world, until you give a thing a name – until you can point at it in greenspace – it simply doesn’t exist.
Since working on the PIPs project I see the same problem everywhere. When I use iTunes, the interface encourages me to believe that I’m buying unique songs, but actually their database has no idea about whether “You Can Call Me Al” on Graceland is the same song as the one on Paul Simon’s Greatest Hits. (Graceland was the first album I ever bought for myself, if you’re wondering about the example.) So I can very easily buy the same song twice. The whole thing operates as a shallow attempt to copy and extend the principle of the album that we’ve got used to through vinyl, cassette and CD formats rather than to clarify the principle from scratch. You can’t buy the same track on the same album twice via iTunes, but you can buy the same track from two separate albums, even though that’s astonishingly dumb. That wouldn’t happen if the songs were uniquely identified.
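As a rough sketch of what unique identification would buy you here: assume every track record carried a `song_id` for the underlying recording. That field, and the `song-42` style values, are hypothetical; nothing here reflects how the iTunes database is actually built. With it, spotting a duplicate purchase is a one-line check.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Track:
    album: str
    title: str
    song_id: str  # hypothetical identifier for the underlying song


def already_owned(library: Iterable[Track], candidate: Track) -> bool:
    """True if any owned track is the same song, whatever album it came from."""
    return any(t.song_id == candidate.song_id for t in library)


# Example data with invented identifiers.
library = [Track("Graceland", "You Can Call Me Al", "song-42")]
compilation_copy = Track("Greatest Hits", "You Can Call Me Al", "song-42")
other_song = Track("Graceland", "Graceland", "song-43")
```

Here `already_owned(library, compilation_copy)` is true even though the album differs, which is exactly the warning the shop cannot give when it only knows about album-and-track pairs.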
Worse still – the database that iTunes uses is completely distinct from the one that lies behind MusicBrainz. It’s completely distinct from the databases used by rights managers or by record companies as well. None of these databases have the slightest idea when they’re talking about the same song or not. None of them are capable of connecting usefully to each other – except by guesswork based on audio signatures or human-entered metadata. Inevitably this will be rife with clumsiness. Things will go wrong. Probably a lot of things.
When I buy books off Amazon, I’m always frustrated that they’re never completely able to show me the various editions of the same work. And why? Because they don’t actually know what a work is, they only know what an edition of a book is. Sure there are human-created links, but fundamentally there is only a limited collection of things that you can do with ISBNs. The ISBN is like creating an identifier for a television broadcast rather than an episode – it’s kind of useful in certain circumstances, but it makes it impossible to do really really simple stuff like ask, “when will this next be broadcast?” or “show me other versions – I want one with a spanish language track”.
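A small sketch of the missing layer, under loud assumptions: the catalogue below is invented, the edition identifiers are placeholders rather than real ISBNs, and the `work-1` style identifiers are hypothetical. Once each edition points at a work, "show me other versions" is just a filter over its siblings.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical catalogue: edition identifier -> (work identifier, format, language).
EDITIONS: Dict[str, Tuple[str, str, str]] = {
    "isbn-A": ("work-1", "paperback", "en"),
    "isbn-B": ("work-1", "hardback", "en"),
    "isbn-C": ("work-1", "paperback", "es"),
    "isbn-D": ("work-2", "paperback", "en"),
}


def build_work_index(editions: Dict[str, Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group edition identifiers under the work they are editions of."""
    index: Dict[str, List[str]] = defaultdict(list)
    for edition_id, (work_id, _format, _language) in editions.items():
        index[work_id].append(edition_id)
    return index


def other_editions(edition_id: str,
                   editions: Dict[str, Tuple[str, str, str]],
                   language: Optional[str] = None) -> List[str]:
    """List sibling editions of the same work, optionally in one language."""
    work_id = editions[edition_id][0]
    siblings = build_work_index(editions)[work_id]
    return sorted(
        e for e in siblings
        if e != edition_id and (language is None or editions[e][2] == language)
    )
```

With this in place, `other_editions("isbn-A", EDITIONS, language="es")` finds the Spanish edition directly, without relying on someone having linked the two by hand.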
Geographic space, of course, has addresses – and with longitude and latitude (and the fact that the same place can’t be in two different, er, places) – it makes an enormous range of things possible. And as UpMyStreet made clear, once you’ve got a relatively precise identifier / address for a semantically useful concept with some form of data structure around it, then you can build an enormous range of things on top of it. In fact, geographic space is almost one of the most solid proofs of the power of the identifier – think how many new and old pieces of technology rely on something so obvious as to be almost not worth mentioning – that a place can be identified and pointed at and referenced in some way. Maps, postcodes, property – just a few of the concepts that rest on those principles.
Now I know that the creation of universal and world-unique identifiers for things must seem one of the most tedious concepts or projects known to man. But I believe that it’s fundamental to our technological development – and particularly our ability to take our ever-increasing computing power and increasingly interconnected appliances and merge them seamlessly with the environment around us. The greenspace of the Matrix needs to merge with the physical – they need to become indistinguishable. Until we can point at, until we can pick up, until we can handle, we will never be able to use these concepts around us effectively.
(All of which is not to deny that there are some things to which we should be nervous about attaching data handles. The concept of universally applicable identifiers feels creepy and wrong to many many people when it’s applied to humans. The idea that our identities should be reduced to a unique string of numbers or a hash creeps people out – and with reason. It’s absolutely clear that all the things that make universal addressability and unique identifiers (and I know these are very different things) so powerful for material and conceptual objects, make them equally scary and open to abuse when applied to people. But we simply may have no choice – because governments are right when they say that the possibilities are enormous. What you can simplify and develop and build and create above, around and between people (and on top of these identifiers) could change the world for the better or the worse. And we need to get used to the idea that movement in this direction may be irresistible and may be motivated by the desires of consumers and citizens demanding the ability to do things which we’re only not building because we’re nervous that they’ll be abused, not because they’re not useful. And that might mean we may have to say goodbye to the idea of being in any way invisible or untraceable to enter the world ahead.)
In decades to come, I think the time we live in now will be understood as the moment that the real and the map started the final stages of a merging that has been going on for hundreds of years. In this future world, all of our discrete objects (physical or conceptual) will be annotatable, linkable to, referenceable. Each ‘thing’ will be built upon in non-physical dimensions of data. And that final process of merging must start with addressability. It must start with identifiers. And it’ll need individuals and collective projects and business and government to undertake that work, because it’s the foundation of everything that follows. The true power of our technology will not reveal itself until we know what a ‘thing’ is. We will not be able to work with concepts until we have pointed at each new thing and (like Adam in the Bible, I guess) given it a unique identity. From birthing concept into language, we’re now moving concept into data. This is an age of naming, it is an age of pointing at things.
11 replies on “The Age of Point-at-Things…”
And now the PIP concept really clicks. Thanks, Tom.
Now, my immediate thought — w/r/t what you’ve done with the R3 PIP identifiers — is that it doesn’t matter what you name something as long as it can be differentiated from other things. This goes back to Saussure’s work on grammar, which is profoundly un-Biblical: the idea that language is a system of signs with no positive terms. Make the identifier an arbitrary differentiator, and you can hang facets of metadata on it, create family relationships, etc. Trying to determine the identifier from the thing, though, is doomed to failure: it’s a bit like Swift’s nation of people who don’t speak, and just carry around a huge cargo of things to point at. Even when we don’t have to carry the things — and that’s the big difference — language is more subtle and tricky than pointing, and the way to make it point is to get over the idea that things disclose their own names.
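A tiny sketch of that idea, with everything in it assumed for illustration (the `registry` dict, the `mint` helper and the facet names are made up, and random UUIDs are just one convenient way to get an identifier that says nothing about the thing it names):

```python
import uuid
from typing import Any, Dict

# identifier -> facets of metadata hung on it
registry: Dict[str, Dict[str, Any]] = {}


def mint(**facets: Any) -> str:
    """Create an arbitrary identifier and attach the given facets to it."""
    identifier = uuid.uuid4().hex  # carries no meaning of its own
    registry[identifier] = dict(facets)
    return identifier


# Family relationships are just facets that hold another identifier.
series = mint(kind="series", title="Example Series")
episode = mint(kind="episode", title="Example Episode", part_of=series)
```

Nothing about the programme can be read off the identifier itself; all the meaning lives in the facets and in the links between identifiers.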
Great post, very true.
One of the side benefits from being able to point at things is that you can hang tags on them – wouldn’t it have been easier to use radio 3 for the phonetagging project, over 6? You can’t tag a time, really.
Although 9.30 at home with the radio on would be “relaxing content preferable” right now.
Regarding Nick’s points on the Swiftian language found on Laputa; there’s also the whole “making a map of the world the size of the world” problem (wherein the most accurate map is ultimately a 1:1 replica).
Fortunately, with things like PIP, even if you have one page per broadcast that’s still more succinct than (say) exact transcripts, but given it’s the act of broadcasting (and not the program itself) to which PIPs refer, this is basically a 1:1 scale model.
Now, in the case of TV listings, any model is going to be quite close to 1:1, but this brings about the problems Tom mentioned – how far do you go to define what is a “new programme” or merely an “updated version”?
The best answer I came to was some kind of fuzzy checksumming system for organic data (obviously impossible). Basically, a checksum assures that something is what it says it is – an accurate credit card number, a perfect download – usually by performing arithmetic on the data within.
So maybe a fuzzy checksum system would say “well, this programme is pretty much 99% the same as that one, so it’s actually a repeat and I’ll point to the same PIP”, whereas a 70-90% checksum could be “this is basically a director’s cut and should be treated as a separate programme”.
Of course, then when you get a checksum of 1-10% (as opposed to 0) it could say “this programme contains an excerpt of another programme”, and then you can link “names” that are quoted by other “names” to one another.
Obviously this checksumming idea is pretty much impossible to implement for video/audio, etc, but it does begin to acknowledge that as well as minor modifications (update, director’s cuts, pre-watershed edits), major modifications (excerpts, sampling, quotation, reference) play some kind of important role in connecting named objects.
(Incidentally, I use a PIP above as an example of any “named object”).
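For what it's worth, the thresholds above can be sketched in a few lines if you assume the hard part away. The sketch below pretends each programme has already been cut into segments with comparable fingerprints (plain integers here), which is precisely the bit described above as impossible for real audio and video. It uses Jaccard similarity of the fingerprint sets as a stand-in for the "fuzzy checksum", and the behaviour in the gaps between the quoted bands (90-99% and 10-70%) is an arbitrary assumption.

```python
from typing import Iterable


def similarity(a: Iterable[int], b: Iterable[int]) -> float:
    """Jaccard similarity of two sets of (hypothetical) segment fingerprints."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def classify(score: float) -> str:
    """Map a similarity score onto the bands suggested in the comment."""
    if score >= 0.99:
        return "repeat"            # point at the same identifier
    if score >= 0.70:
        return "separate version"  # e.g. a director's cut
    if score > 0.0:
        return "shares an excerpt" # link the two identifiers
    return "unrelated"


# Invented example programmes, as lists of segment fingerprints.
original = list(range(100))
directors_cut = list(range(120))                      # original plus 20 new segments
magazine_show = list(range(5)) + list(range(200, 295))  # quotes 5 segments
unrelated = list(range(1000, 1100))
```

The director's cut shares 100 of 120 distinct segments (about 83%), so it lands in the "separate version" band, while the magazine show shares 5 of 195 (under 3%) and is flagged as containing an excerpt.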
Thanks – very interested in this, and please do put up your presentation whenever you can.
I wonder about what you could call ‘para-content’ for episodes (stuff other than the regular tracklistings and programme info that is related to the episode, but usually not broadcast as part of the episode), and how that content is tied to the PIPs for those episodes.
Example: the Mixing It web site has photos of Matmos recording a session, which are linked from the PIP for the episode that included the session. Is that content as permanent as the PIP, and is it navigable/findable other than going through the PIPs? I ask because some of this material has included valuable audio and video in the past.
This is a brilliant post. Thanks so much for clarifying what’s been running around my head for months now.
The lack of a PIP (and the inability to even artificially generate one) is what makes systems like AllConsuming, which is dealing with real-world items, so fruitless. Yes, five other people may be reading Dante’s Inferno, but only the paperback Penguin edition; who’s reading all the others?
The International Federation of Library Associations and Institutions created Functional Requirements for Bibliographic Records, which would probably be of interest to you. In it, they describe a hierarchy of entities very much like what you’re talking about here. You might also be interested in OCLC’s work with FRBR. The xisbn work is a good demonstration.
Interesting post, it reminded me of the book ‘Point It’
The amount of use I’ve had out of this over the years is an excellent testament to the power of pointing at things.
Are you using a standard identifier type for the PIP project? Say, Extensible Resource Identifiers (XRIs)? (There are others; XRI is just the one I’m most familiar with since the company I work for is involved with it.) The idea of making everything labelled and thus addressable and linkable is something the digital identity folks have been working on for the last few years. It’d be a shame to reinvent the wheel.
excellent, tom, this is your best explanation yet of the core intention and importance of PIPs. this piece boils it all down to the raw crux of the issue: addressability, i.e. ensure that all BBC programmes have a web-native handle.
i’m left with a few follow-up thoughts and/or questions that i imagine you’ve been thinking about too. i think it would be useful for you to write more about:
1) i sense that, for you, this “identification” occurs, *by definition*, in the medium of the Web. in other words, if the beeb had some sort of internal ID numbers, that wouldn’t do the trick, would it? this is about the ability for the rest of the world (via the medium of the web) to refer to BBC programmes? it would be very interesting to hear more of your thinking in this area.
2) with my present understanding of the notion, i’d propose that PIPs doesn’t, by definition, solve any “data interoperability” issues of the kind you allude to with the example of iTunes/MusicBrainz. what we see there is that ANY organization with an interest can go around “identifying things” without much regard for the need to merge their data sets later. it’s not like iTunes or MusicBrainz went to a central repository of “names of things” and used those for their local databases; there isn’t such a resource currently, and arguably there should never be. so, i think it would be useful for you to help us understand what, if anything, would be the difference between, say:
in other words, is there something more “canonical” about the BBC-owned URI in terms of identifying the “Only Fools and Horses” thing? you imply that PIPs is a move toward “universal and world-unique identifiers for things”, but does an organisation inherently have a higher authority to identify its own products universally? do you have a vision of how this will unfold?
3) the use of a URI as the identifier is, of course, a no-brainer, BUT: the mental-model of nearly every person is that there is a “webpage” at the “end” of that URI, not a BBC programme episode. if we accept this default mental-model for a moment, then, of course, there is a webpage *about* the BBC programme at the other “end” of a URI like:
so is that URI the “name” of the “webpage” or the “programme”? is that distinction even relevant anymore? and if it isn’t relevant, then perhaps you’ve got some mental-model massaging to do…
blogs seem different to me, especially leaving the filter of this mental-model on for the moment. for a specific blog entry, say:
it’s more clear that the URI identifies the *content* which one actually finds on that “page” when you load it into your browser. to me, this seems to be because the actual text which makes up that specific blog entry actually appears and can be read from that “page”.
i suppose i’m wondering if there is, on the web, some sort of inherent difference between, “about the thing”-ness and “the thing”-ness? i think this is a complicated question, and that it provides the subtext around the editorial debate around what actually appears on a PIPs page. i’m concerned that the “webpage about the programme” and the “programme itself” are not synonymous in this mental-model, and i wonder if you have any concerns that this might prove problematic?
i should note that, when i take away the filter of that default mental-model, then i’d perhaps propose that there *is no difference* between “about the thing” and the “thing” on the web. but then we are left with a profligacy of URIs scattered around the Web which can identify the same thing (see #2 above)
The map is not the territory.
From Sylvie and Bruno Concluded by Lewis Carroll 1893:
“What a useful thing a pocket-map is!” I remarked.
“That’s another thing we’ve learned from your nation,” said Mein Herr, “map-making. But we carried it much further than you. What do you consider the largest map that would be really useful?”
“About six inches to the mile.”
“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”
“Have you used it much?” I inquired.
“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we use the country itself, as its own map, and I assure you it does nearly as well.”