Developing a URL structure for broadcast radio sites…

06/27/2004

One of the most common questions I’ve had about the Radio 3 redesign work that we’ve been doing has been about the URL structures that we have used to identify individual episodes of individual programmes. I’m really keen to address these questions with a full and maniacally over-detailed post because I think the issue of how we map broadcast programming to web URLs is a really interesting one, and because I think we’ve done some good work here that other people might find useful or interesting. Drew McLellan writes:

I see URLs like /radio3/showname/pip/randomcode which, as I understand it, would require a user to locate a particular show through the site’s navigational system. It looks like there’s no way of guessing a URL. Is that right? What’s ‘pip’? That makes no sense to me. My preference for date-based material is a path with the date in it – like /radio3/showname/2004/06/27/. Is there a reason why a URL format similar to this wasn’t chosen?

So the first thing to explain is that Radio 3’s new site is particularly interesting and ground-breaking because it doesn’t just have a page for every broadcast, it has a page for every episode. This is way cooler than having a page for every broadcast, but the full implications of it aren’t immediately easy to digest. Basically it means that there will only ever be one page for any documentary, no matter how many times that documentary is repeated. That one specific page then becomes the definitive home for that episode of that documentary on the BBC, and any subsequent information or supplementary material that is relevant to that episode can be stuck onto that page at any point in time. Imagine it as being a bit like having an entry in IMDb for that particular radio episode. It’s like creating the basis for an ever-growing encyclopaedia of Radio 3 programming, and it should make it really easy to search for information about a programme without getting overwhelmed by dozens of versions of the same page, each containing little odds and sods of information, none of which are aware that they’re all talking about the same thing.
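To make the broadcast/episode distinction concrete, here’s a minimal Python sketch of the data model this implies. The class and field names are hypothetical, not the ones in our actual system; the point is simply that a repeat adds another broadcast to an existing episode rather than creating a new page.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Broadcast:
    """One transmission of an episode at a particular time."""
    start: datetime


@dataclass
class Episode:
    """One episode, and therefore one page, however often it is broadcast."""
    brand: str
    title: str
    broadcasts: list[Broadcast] = field(default_factory=list)

    def add_broadcast(self, start: datetime) -> None:
        # A repeat is recorded against the existing episode; no new page is created.
        self.broadcasts.append(Broadcast(start))


# Hypothetical documentary first broadcast in 1985 and repeated twice
doc = Episode(brand="documentary", title="Example documentary")
for when in (datetime(1985, 6, 27, 21, 0),
             datetime(1985, 6, 28, 21, 0),
             datetime(2004, 6, 27, 21, 0)):
    doc.add_broadcast(when)

print(len(doc.broadcasts), "broadcasts, one episode page")
```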

Having said all that, lots of programmes don’t ever get repeated on Radio 3. Let us take as an example “Morning on 3”. This is basically the equivalent of the DJ-led shows that we’re all familiar with and which are common to radio networks the world over. These things are just broadcast live. That’s the whole point! It wouldn’t make any sense for it to be repeated. Some of the music on it will clearly be repeated – just like any popular music radio show – but the programme itself will not. For programmes like “Morning on 3” Drew’s URL structure (which is familiar to all of us who run weblogs) would work perfectly. You can imagine very easily getting to today’s episode of Morning on 3 via the URL bbc.co.uk/radio3/morningon3/2004/06/27/. That would be the perfect weblog-like kind of programme, where every individual entry/episode could only be connected to one moment in time.

But it wouldn’t work if the programme ever got repeated. By definition a programme that gets repeated has been broadcast on multiple occasions. Imagine a programme that was originally broadcast on June 27th 1985 and which is then repeated the following evening and then again nineteen years later (tonight). What would be the date-based URL for a programme like that? Well, one approach would be to use the date on which it was first broadcast. But what’s the experience of that for a user? They’ve gone to a schedule page for today (say), clicked on the link to a programme that’s on this evening and found themselves at a URL from 1985. A plausible reaction would be to think that they’d got lost somewhere along the line and were on the wrong page. How did I end up here? This situation gets worse when you consider that, because we started capturing programmes on the 4th of June, any programme that was originally broadcast before that date would be assigned a URL based on a fairly meaningless broadcast date…
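Here’s a small Python sketch of the problem, assuming a hypothetical date_url helper and a made-up brand slug, “documentary”. One episode with three broadcasts yields three candidate date-based URLs, and if the system only knows about broadcasts since capture began on the 4th of June, the “first broadcast” it picks is a meaningless 2004 date rather than 1985.

```python
from datetime import date


def date_url(brand: str, day: date) -> str:
    """Build a weblog-style, date-based URL (hypothetical scheme)."""
    return f"/radio3/{brand}/{day:%Y/%m/%d}/"


# One episode, three broadcasts: 1985, the following evening, and tonight
broadcast_dates = [date(1985, 6, 27), date(1985, 6, 28), date(2004, 6, 27)]
candidates = [date_url("documentary", d) for d in broadcast_dates]

# Capture only started on 4 June 2004, so earlier broadcasts are invisible to the system
capture_start = date(2004, 6, 4)
known_dates = [d for d in broadcast_dates if d >= capture_start]
first_known_url = date_url("documentary", min(known_dates))

for url in candidates:
    print(url)
print("first known broadcast:", first_known_url)
```

Each of the three URLs is a defensible answer, which is exactly why none of them makes a stable identifier for the episode.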

So, a date-based URL structure would work fine for programmes that never get repeated, but wouldn’t work very well for any programme that did get repeated. Immediately, we’ve got a problem then, because even though 99.9% of the time we know that “Morning on 3” won’t get repeated, we can’t exactly guarantee it. Just recently on the BBC we’ve had an unedited re-broadcasting of the live coverage of the 1979 General Election and the daily re-broadcasting in real-time of the Home Service’s commentary on the D-Day landings. So even those topical programmes we’ve talked about could quite easily be repeated.

But let’s pretend for a moment that isn’t too much of a problem. Let’s also pretend that we can easily distinguish between programmes that almost certainly won’t get repeated on the one hand (which might work with a date-based URL structure) and those that very easily could or will get repeated on the other (say anything that’s pre-recorded before it goes out on air). What kind of URL structure should we use for the latter?

One obvious and simple answer is that we should use episode numbers. The Radio 3 show Composer of the Week is broadcast each weekday around lunchtime and is then repeated the following week at midnight. This means that there are two episodes broadcast on each day (another place where date-based URLs might get confusing or seem broken). If we used episode numbers, however, that wouldn’t be so much of a problem. So you can imagine the URL being something more like bbc.co.uk/radio3/cotw/episode/2345. This would allow you to predict sequence and order, and would make the URL structure nice and hackable by users. Except then you have to think about what you should base that episode number on. Should you base it on the definitive numbers for that episode, i.e. the ones that the makers of Composer of the Week use? How should you source that number? Do you trust that numbering scheme to be consistent and reliable? Or should you start from an arbitrary number? And what happens if your system for determining repeats isn’t fool-proof and you accidentally assign the wrong number to an episode at some point? The worst eventuality would be that you end up with episode numbering schemes that start to wander out of sync with one another because someone pulls an episode or a schedule changes. And then you get gaps in your URL structure, or programmes out of order. Imagine a circumstance where after six months of perfect running you accidentally pick something up as being a repeat when it isn’t… Suddenly that episode has to be reinserted into the scheme somewhere by hand, or you have to change the URLs for any episodes that were made into pages before you realised. Either the URLs break or what they point to changes, and that whole part of the site stops being human-hackable or readable and starts becoming institutionally and forever broken.
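A minimal Python sketch of how that drift happens, assuming numbers are handed out sequentially to each broadcast that isn’t flagged as a repeat. The assign_numbers function and the ep-a to ep-d labels are hypothetical. Wrongly flagging one broadcast as a repeat doesn’t just cost that episode its number; once it’s corrected, every number after it shifts.

```python
from typing import Callable


def assign_numbers(broadcasts: list[str],
                   is_repeat: Callable[[str], bool]) -> dict[str, int]:
    """Give each broadcast not flagged as a repeat the next sequential episode number.

    A flagged broadcast is treated as a repeat of an existing page, so it gets no number.
    """
    numbers: dict[str, int] = {}
    next_number = 1
    for b in broadcasts:
        if is_repeat(b):
            continue
        numbers[b] = next_number
        next_number += 1
    return numbers


schedule = ["ep-a", "ep-b", "ep-c", "ep-d"]  # hypothetical new episodes, in broadcast order

# Buggy run: ep-b is mistakenly detected as a repeat, so it never gets a number
wrong = assign_numbers(schedule, lambda b: b == "ep-b")
# Corrected run: ep-b is reinserted and everything after it shifts
right = assign_numbers(schedule, lambda b: False)

# Episodes whose /episode/N URL would now point somewhere else
changed = sorted(b for b in wrong if wrong[b] != right[b])
print(changed)
```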

Or you could do it by subject for some of the URLs. Again – Composer of the Week is broken into five-part weekly chunks. You could have a URL structure for programmes like this which highlighted those divisions: bbc.co.uk/radio3/mozart/part/4 or bbc.co.uk/radio3/mozart/4. Here the problems are potential URL length and namespace clashes. And while these URLs might remain human-readable, they’re not machine-predictable in any way. So even this kind of URL structure has its problems.
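To illustrate the namespace problem, here’s a hypothetical Python sketch that builds subject-based URLs of the form /radio3/mozart/4 for two five-part series on the same composer, assumed here to run in different years. Every URL of the second series collides with one already claimed by the first.

```python
def subject_url(subject: str, part: int) -> str:
    """Subject-based URL such as /radio3/mozart/4 (hypothetical scheme)."""
    return f"/radio3/{subject.lower()}/{part}"


# Two hypothetical five-part Composer of the Week series on the same composer
series = [("Mozart, first series", "Mozart"), ("Mozart, years later", "Mozart")]

owner: dict[str, str] = {}                  # URL -> series that claimed it first
collisions: list[tuple[str, str, str]] = []  # (URL, existing owner, newcomer)
for name, subject in series:
    for part in range(1, 6):  # five-part weekly chunks
        url = subject_url(subject, part)
        if url in owner:
            collisions.append((url, owner[url], name))
        else:
            owner[url] = name

print(len(collisions), "colliding URLs")
```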

I want to make something clear at this point – each one of these URL schemes could have worked very nicely for that particular kind of programming. But in the end that’s not enough. Because fundamentally, as soon as you’ve decided to use different URL structures for different kinds of programming, you’re immediately in trouble – radio programming isn’t a static thing; it changes and evolves. An individual programme brand (say Choral Evensong) might change format, change frequency or be cancelled. Another programme might be created with the same name ten years later. And each week there will be a number of specials, one-offs and schedule fillers (this week on Radio 3 there were around seven one-offs, including tonight’s zeroPoints) as well as regular short series or new brands. Suddenly there’s a time-consuming and fairly skilled job that has to be undertaken every day: which URL structure should this new programme use? And you’re never going to be one hundred percent correct. So pages are going to be moved, URLs are going to break and all hell will break loose…

Which brings us to the URL structure that we went with in the end and the rationale for it. Our first principle was that in order to stop URLs breaking and to stop the possibilities of human error in assigning URL structures to brands incorrectly (and to deal with the possibility of random repeats et al) the URLs should all follow exactly the same structure. Fundamentally, this meant that date-based URLs had to go out of the window straight away because they weren’t suitable for every episode of every brand. The only URL structure that we could identify that didn’t actually break in any circumstances is one that’s based on an episode number or identifier of some kind. After careful consideration we decided that we didn’t want to give the impression of human readability or order or structure where that structure was inevitably likely to be broken or flawed or mismatched with other identifiers. And we decided that whatever additions to the URL that we made had to be short – it had to be able to be appended onto the end of a brand name without sprawling out of control. More importantly still, we decided that it shouldn’t break any naming conventions already used around the site or make the site harder to maintain.

Which is where ‘pip’ comes in. We’d already decided that we didn’t want to have the episodes sitting in the top directory of the brand. We’re in this for the long term, and we wanted to guarantee that, whatever future changes were made to the content management of the site and however many new things or features were added to it, we’d never have collisions between those features and the episode pages. We decided to place all episode pages into a subdirectory, and after much discussion of what that should be called (episodes – too long, and not always an obvious term for a news programme; eps – too likely to be used already, and too close to the name of a file format for us to be sure that it wouldn’t overwrite anything at any time in the future, etc.) we eventually decided to stake our claim on the directory name /pip/, meaning (if you really want to know) nothing more than ‘programme information page’. [PS. In a few weeks’ time, this directory should contain a list of all the episodes for each brand, meaning that you can hack back the directories and keep going up a level in the site hierarchy from individual episode to all episodes to brand to network to broadcaster.]
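As a sketch of the hackability described in the PS, this Python function lists the ancestor directories of an episode URL, from the /pip/ episode list up to the site root. The function name and the identifier abcde are hypothetical, and it assumes every level is served as a directory URL with a trailing slash.

```python
def ancestor_paths(url_path: str) -> list[str]:
    """Return the ancestor directories of a directory-style URL path, nearest first.

    Assumes every level ends in a trailing slash and is served as its own page.
    """
    parts = [p for p in url_path.split("/") if p]
    # Drop one path segment at a time: episode list, brand, network...
    ancestors = ["/" + "/".join(parts[:i]) + "/" for i in range(len(parts) - 1, 0, -1)]
    ancestors.append("/")  # site root, i.e. the broadcaster level
    return ancestors


# Hypothetical episode identifier "abcde" under the Composer of the Week brand
for path in ancestor_paths("/radio3/cotw/pip/abcde/"):
    print(path)
```

Each printed path corresponds to one rung of the hierarchy: all episodes, brand, network, broadcaster.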

For the final part of the URL – the episode number itself – we took into account all the problems that we might have with sourcing and guaranteeing the integrity of the ‘definitive’ numbers for any given series of programmes, and considered the problems associated with any and all possible bugs that might emerge (what if two unrelated programmes started to be considered as repeats of each other and had to be broken apart – what URLs would we give them? What if the programmes were broadcast out of sequence, or we started running the site halfway through the broadcast of a run and had to move the episode numbers around later?). We came to the conclusion that the episode identifier should be a short code that isn’t human-readable. After much deliberation we settled on a five-character alphanumeric hash, which is short enough not to break URLs in e-mail and long enough to give us up to 60 million different identifiers. And of course we’ve kept it as a directory-level URL to future-proof the URLs against changes in the technology that we’ve used to build the site. (You’ll notice some index.shtml files around the place, but we’re going to clear that up.)
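For illustration, a Python sketch of minting identifiers like these. We’re assuming the alphabet is lowercase letters plus digits: 36 symbols give 36 to the power of 5, or 60,466,176, five-character codes, which matches the “up to 60 million” figure. The draw-until-unused strategy and the helper names are assumptions, not necessarily how our codes are actually produced.

```python
import secrets
import string

# Assumed alphabet: lowercase letters plus digits, 36 symbols in total
ALPHABET = string.ascii_lowercase + string.digits
ID_LENGTH = 5


def id_space() -> int:
    """Number of distinct identifiers available (36 ** 5)."""
    return len(ALPHABET) ** ID_LENGTH


def new_episode_id(existing: set[str]) -> str:
    """Draw random five-character codes until one is unused, then reserve it."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in existing:
            existing.add(candidate)
            return candidate


def episode_url(brand: str, episode_id: str) -> str:
    """Directory-level episode URL, independent of the technology behind it."""
    return f"/radio3/{brand}/pip/{episode_id}/"


issued: set[str] = set()
print(episode_url("cotw", new_episode_id(issued)))
```

Because the code carries no meaning, splitting two wrongly merged episodes apart or discovering a missed repeat just means minting another identifier; no existing URL has to move.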

The alphanumeric short code that we’ve got now also opens up a whole range of new possibilities. Because these identifiers are unique across all of Radio 3, we suddenly have a way to point to (and potentially manipulate) every episode that’s broadcast on the network. We’re still looking into the various affordances that this identifier might provide us with and we’ll let you know what we come up with.

So – in summary – we have a URL structure that is eminently suitable for dealing with the breadth and wealth of programming that could come out of a radio network: a URL that will shortly be totally hackable, to the extent that each and every level of the directory structure will contain content appropriate to its place in the site’s structural hierarchy (broadcaster / network / programme brand / episode list / individual episode), and which is human-readable as far down its length as is practical. Drew’s quite right – in order to guess the URL for an entry you do need to use the site’s inbuilt navigational systems. However, it’s almost impossible to build URLs for radio programming that are both completely human-guessable and as reliable and stable as we’re determined to make them.

We’re thinking five to twenty-five years in advance here, making sure that the URLs of pages about radio programmes on Radio 3 could conceivably last as long as the web does. We’re in this for the long-haul…