Developing a URL structure for broadcast radio sites…

One of the most common questions I’ve had about the Radio 3 redesign work that we’ve been doing has been about the URL structures that we have used to identify individual episodes of individual programmes. I’m really keen to address these questions with a full and maniacally over-detailed post because I think the issue of how we map broadcast programming to web URLs is a really interesting one, and because I think we’ve done some good work here that other people might find useful or interesting. Drew McLellan writes:

I see URLs like /radio3/showname/pip/randomcode which, as I understand it, would require a user to locate a particular show through the site’s navigational system. It looks like there’s no way of guessing a URL. Is that right? What’s ‘pip’? That makes no sense to me. My preference for date-based material is a path with the date in it – like /radio3/showname/2004/06/27/ Is there a reason why a URL format similar to this wasn’t chosen?

So the first thing to explain is that Radio 3’s new site is particularly interesting and ground-breaking because it doesn’t just have a page for every broadcast, it has a page for every episode. This is way cooler than having a page for every broadcast, but the full implications of it aren’t immediately easy to digest. Basically it means that there would only be one page for any documentary no matter how many times that documentary is repeated. That one specific page then becomes the definitive home for that episode of that documentary on the BBC and all subsequent information or supplementary material that is relevant to that episode can be stuck onto that page at any point in time. Imagine it as being a bit like having an entry in IMDB for that particular radio episode. It’s like creating the basis for an ever growing encyclopaedia of Radio 3 programming, and it should make it really easy to search for information about a programme without getting overwhelmed by dozens of versions of the same page, each containing little odds and sods of information, none of which are aware that they’re all talking about the same thing.

Having said all that, lots of programmes don’t ever get repeated on Radio 3. Let us take as an example, “Morning on 3”. This is basically the equivalent of the DJ-led shows that we’re all familiar with and which are common to radio networks the world over. These things are just broadcast live. That’s the whole point! It wouldn’t make any sense for it to be repeated. Some of the music on it will clearly be repeated – just like any popular music radio show, but the programme itself will not. For programmes like “Morning on 3” Drew’s URL structure (which is familiar to all of us who run weblogs) would work perfectly. You can imagine very easily getting to today’s episode of Morning on 3 via the URL bbc.co.uk/radio3/morningon3/2004/06/27/. That would be the perfect weblog-like kind of programme, where every individual entry/episode could only be connected to one moment in time.

But it wouldn’t work if the programme ever got repeated. By definition a programme that gets repeated has been broadcast on multiple occasions in time. Imagine a programme that was originally broadcast on June 27th 1985 and which is then repeated the following evening and then again nineteen years later (tonight). What would be the date-based URL for a programme like that? Well one approach would be to go for the date on which it was first broadcast. But what’s the experience of that for a user? They’ve gone to a schedule page for today (say) and they’ve clicked on the link to a programme that’s on this evening and found themselves with a URL from 1985. A plausible reaction would be to think that you’d got lost somewhere along the line and were on the wrong page. How did I end up here? This situation gets worse when you consider that since we started capturing programmes on the 4th of June, any programme that was originally broadcast before that date would be assigned a URL based on a fairly meaningless broadcast date…
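To make the repeat problem concrete, here is a minimal sketch in Python. The brand slug "somedocumentary" and the helper name date_url are made up for illustration; the dates are the ones from the example above. It simply shows that one episode with three broadcasts ends up with three equally plausible date-based URLs.

```python
from datetime import date


def date_url(brand: str, broadcast: date) -> str:
    """Build a weblog-style URL path from a brand slug and a broadcast date."""
    return f"/radio3/{brand}/{broadcast:%Y/%m/%d}/"


# Hypothetical episode: first aired in 1985, repeated the following evening,
# and repeated again nineteen years later.
broadcasts = [date(1985, 6, 27), date(1985, 6, 28), date(2004, 6, 27)]
candidate_urls = [date_url("somedocumentary", d) for d in broadcasts]

for url in candidate_urls:
    print(url)
```

None of the three is obviously the "right" one, which is the heart of the problem.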

So, a date-based URL structure would work fine for programmes that never get repeated, but wouldn’t work very well for any programme that did get repeated. Immediately, we’ve got a problem then, because even though 99.9% of the time we know that “Morning on 3” won’t get repeated, we can’t exactly guarantee it. Just recently on the BBC we’ve had an unedited re-broadcasting of the live coverage of the 1979 General Election and the daily re-broadcasting in real-time of the Home Service’s commentary on the D-Day landings. So even those topical programmes we’ve talked about could quite easily be repeated.

But let’s pretend for a moment that isn’t too much of a problem. Let’s also pretend that we can easily distinguish between those programmes that almost certainly won’t get repeated on the one hand (and say they might work with a date-based URL structure) and those that very easily could or will get repeated on the other (say anything that’s pre-recorded before it goes out on air). What kind of URL structure should we use for the latter?

One obvious and simple answer is that we should use episode numbers. The Radio 3 show Composer of the Week is broadcast each weekday around lunchtime and then is repeated the following week at midnight. This means that there are two episodes broadcast on each day (another place where date-based URLs might get confusing or seem broken). If we used episode numbers, however, that wouldn’t be so much of a problem. So you can imagine the URL being something more like bbc.co.uk/radio3/cotw/episode/2345. This would allow you to predict sequence and order and would make the URL structure nice and hackable by users. Except then you have to think about what you should base that episode number on. Should you base it on the definitive numbers for that episode – ie. the ones that the makers of Composer of the Week use? How should you source that number? Do you trust that numbering scheme to be consistent and reliable? On the other hand should you start with an arbitrary number? And what happens if your system for determining repeats isn’t fool-proof and you accidentally assign the wrong number to an episode at some point? The worst eventuality would be that you end up with episode numbering schemes that start to wander out of sync with one another because someone pulls an episode or a schedule changes. And then you get gaps in your URL structure, or programmes out of order. Imagine a circumstance where after six months of perfect running you accidentally pick something up as being a repeat when it isn’t… Suddenly that episode has to be reinserted into the scheme somewhere by hand, or you have to change the URLs for any episodes that have been made into pages before you realised. The URLs break or what they point to changes, and that whole part of the site stops being human hackable or readable and starts becoming institutionally and forever broken.
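A small Python sketch of that failure mode, under invented assumptions: the episode titles "A" to "E" and the helpers number_episodes and episode_url are hypothetical, and the numbering is a plain running count rather than anything Radio 3 actually used. Episode "C" is first mistaken for a repeat and skipped, then put back once the mistake is noticed.

```python
def number_episodes(titles):
    """Assign 1-based sequential episode numbers in broadcast order."""
    return {title: n for n, title in enumerate(titles, start=1)}


def episode_url(n):
    """Hypothetical sequential URL in the style discussed above."""
    return f"/radio3/cotw/episode/{n}"


# "C" was wrongly treated as a repeat, so it never received a number.
before = number_episodes(["A", "B", "D", "E"])
# Once the mistake is spotted, "C" is reinserted in its proper place.
after = number_episodes(["A", "B", "C", "D", "E"])

# Episodes whose already-published URL would now have to change.
moved = {
    title: (episode_url(before[title]), episode_url(after[title]))
    for title in before
    if before[title] != after[title]
}
print(moved)
```

Every episode after the reinsertion point either changes URL or keeps a number that no longer matches its place in the sequence.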

Or you could do it by subject for some of the URLs. Again – Composer of the Week is broken into five part weekly chunks. You could have a URL structure for programmes like this which highlighted those divisions: bbc.co.uk/radio3/mozart/part/4 or bbc.co.uk/radio3/mozart/4. Here the problems are potential URL length and namespace issues. And while they might remain human-readable, they’re not machine predictable in any way. So even this kind of URL structure has its problems.

I want to make something clear at this point – each one of these URL schemes could have worked very nicely for that particular kind of programming. But in the end that’s not enough. Because fundamentally as soon as you’ve decided to use different URL structures for different kinds of programming you’re immediately in trouble – because radio programming isn’t a static thing, it changes and evolves – an individual programme brand (say Choral Evensong) might change format, change frequency or be cancelled. Another programme might be created with the same name ten years later. And each week there will be a number of specials and one-offs and schedule fillers (this week on Radio 3 there were around seven one-offs, including tonight’s zeroPoints) as well as regular short-series or new brands. Suddenly there’s a time-consuming and fairly-skilled job that has to be undertaken every day – which URL structure should this new programme use? And you’re never going to be one hundred percent correct. And so pages are going to be moved and URLs break and all hell will break loose…

Which brings us to the URL structure that we went with in the end and the rationale for it. Our first principle was that in order to stop URLs breaking and to stop the possibilities of human error in assigning URL structures to brands incorrectly (and to deal with the possibility of random repeats et al) the URLs should all follow exactly the same structure. Fundamentally, this meant that date-based URLs had to go out of the window straight away because they weren’t suitable for every episode of every brand. The only URL structure that we could identify that didn’t actually break in any circumstances is one that’s based on an episode number or identifier of some kind. After careful consideration we decided that we didn’t want to give the impression of human readability or order or structure where that structure was inevitably likely to be broken or flawed or mismatched with other identifiers. And we decided that whatever additions to the URL that we made had to be short – it had to be able to be appended onto the end of a brand name without sprawling out of control. More importantly still, we decided that it shouldn’t break any naming conventions already used around the site or make the site harder to maintain.

Which is where ‘pip’ comes in. We’d already decided that we didn’t want to have the episodes sitting in the top directory of the brand. We’re in this for the long-term, and we wanted to make sure that we could guarantee that whatever future changes were made to the content management of the site, however many new things or features were added to it, we’d never have collisions between these features and the episode pages. We decided to place all episode pages into a subdirectory, and after much discussion of what that should be called (episodes – too long, not always an obvious term for a news programme / eps – too likely to already be used and too close to the name of a file format for us to be sure that it wouldn’t overwrite anything at any time in the future etc) we eventually decided to stake our claim on the directory name /pip/ meaning (if you really want to know) nothing more than ‘programme information page’. [PS. In a few weeks’ time, this directory should contain a list of all the episodes for each brand, meaning that you can hack back the directories and keep going up a level in the site hierarchy from individual episode to all episodes to brand to network to broadcaster.]

With the final part of the URL – the episode number itself – having taken into account all the problems that we might have with sourcing and guaranteeing the integrity of the ‘definitive’ numbers for any given series of programmes, and having considered the problems associated with any and all possible bugs that might emerge (what if two random programmes started to be considered as repeats of each other and had to be broken apart – what URLs to give them? What if the programmes were broadcast out of sequence or we started running the site halfway through the broadcasting of a run and had to move around the episode numbers later etc) we came to the conclusion that the actual episode number should be a non-human readable short code. After much deliberation we came to the conclusion that a five-character alphanumeric hash would be short enough to not break URLs in e-mail and long enough to give us up to 60 million different identifiers. And of course we’ve kept it as a directory level URL to future proof the URLs against changes in the technology that we’ve used to build the site. (You’ll notice some index.shtml’s around the place, but we’re going to clear that up).
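For anyone wondering where the 60 million figure comes from, here is a rough Python sketch. It assumes a 36-symbol alphabet (the digits plus one case of the Latin letters), which is an assumption that happens to fit the figure rather than a description of how the real codes are generated. The encode helper is likewise purely illustrative: it just renders a counter as a fixed-width code.

```python
import string

# Assumed alphabet: 10 digits + 26 lowercase letters = 36 symbols.
ALPHABET = string.digits + string.ascii_lowercase
CODE_LENGTH = 5


def capacity(alphabet: str = ALPHABET, length: int = CODE_LENGTH) -> int:
    """Number of distinct codes of the given length over the alphabet."""
    return len(alphabet) ** length


def encode(n: int, alphabet: str = ALPHABET, length: int = CODE_LENGTH) -> str:
    """Render a non-negative integer as a fixed-width code (illustrative only)."""
    if not 0 <= n < len(alphabet) ** length:
        raise ValueError("n does not fit in the code space")
    chars = []
    for _ in range(length):
        n, remainder = divmod(n, len(alphabet))
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


print(capacity())              # 36 ** 5 = 60466176
print(encode(0))               # 00000
print(encode(capacity() - 1))  # zzzzz
```

A mixed-case alphabet would give far more room (62 ** 5), so the single-case reading is the one that fits "up to 60 million".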

The alphanumeric short code that we’ve got now also opens up a whole range of new possibilities. Because these identifiers are unique across all of Radio 3, we suddenly have a way to point to (and potentially manipulate) every episode that’s broadcast on the network. We’re still looking into the various affordances that this identifier might provide us with and we’ll let you know what we come up with.

So – in summary – we have a URL structure that is eminently suitable for dealing with the breadth and wealth of programming that could come out of a radio network – a URL that will shortly be totally hackable to the extent that each and every level of the directory structure will contain content appropriate to its place in the site’s structural hierarchy (broadcaster / network / programme brand / episode list / individual episode), and which is human readable as far down its length as is practical. Drew’s quite right – in order to guess the URL for an entry you do need to use the site’s inbuilt navigational systems. However, it’s almost impossible to be able to build URLs for radio programming that are completely human guessable and as reliable and stable as we’re determined to make them.
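The "hacking back" idea can be sketched in a few lines of Python. The function name hack_back and the code "abc12" are placeholders invented for the example, not a real episode identifier; the point is only that trimming one path segment at a time walks up from episode to episode list to brand to network to broadcaster.

```python
from urllib.parse import urlsplit


def hack_back(url: str) -> list[str]:
    """Return the URL followed by each of its ancestors, deepest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    base = f"{parts.scheme}://{parts.netloc}"
    levels = []
    for depth in range(len(segments), -1, -1):
        path = "/".join(segments[:depth])
        levels.append(f"{base}/{path}/" if path else f"{base}/")
    return levels


# "abc12" is a made-up placeholder, not a real episode code.
for level in hack_back("http://www.bbc.co.uk/radio3/morningon3/pip/abc12/"):
    print(level)
```

Each printed level corresponds to one step in the hierarchy listed above, which is why every directory needs to serve something sensible.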

We’re thinking five to twenty-five years in advance here, making sure that the URLs of pages about radio programmes on Radio 3 could conceivably last as long as the web does. We’re in this for the long-haul…

18 replies on “Developing a URL structure for broadcast radio sites…”

Thanks for such a detailed response, Tom. Not being an avid fan of such radio programming styles, I wasn’t even aware that programmes got repeated! (how cheap!) 😉
I can see how going for a friendly URL just wasn’t possible and how a practical URL had to do. For me, however, it raises a different question. Should a page have only one URL?
I guess that’s a discussion for another day.
Thanks for going into so much depth.

Quick note on another issue – comparing and contrasting our URL scheme with some other ones from large scale sites:

Out of these, only BBC News makes any attempt to make the URLs human readable and although it can be hacked back to the front page and to the UK section, it’s clearly having to convey a lot of other material as well, presumably whether the site is the UK or International edition and whether it’s high or low-bandwidth. But each of them recognises the importance of exposing an identifier for the film / programme / book itself. I think we’ve managed to strike a really nice balance between pure human readability and this kind of global identification system.

Just curious why you didn’t consider having multiple addresses for programs .. one set using the unique identifiers; another set date based with redirects. I can understand not wanting multiple identifiers; yet a date based url scheme could be considered as a sort of API.

It’s nice to see some detailed thought gone into designing URL schemes. I have one question – what happens if a ‘programme brand’ changes? Say, for example, that the Lunchtime Concerts brand becomes ‘Afternoon Concerts’. Of course, you could consider that this new name would be a new brand and ignore any connection, but this would be losing the history of the brand. You could keep the old URLs for the old episodes and try to explain away the confusion, but people ‘hacking back’ to the level of the old name wouldn’t get links to new episodes of the renamed brand.
Perhaps the benefits of having the programme brand in the URL (readability, hacking-back, etc) outweigh the (perhaps small) likelihood of this happening though.

Tom: sound Key definition and good explanation of your reasoning.
But Key definition needn’t drive the user-interface (corollary: CSS): why not have an additional alternate URL tree which maps/implements the Many:One relationship of showname+broadcastdatetime : uniqueepisode. The machine has access to the unique episode, the humans have guessable access to the episode they heard. Best of both worlds.

That was very interesting, thank you.
I see only one problem: what if Radio 2 repeats one of Radio 3’s programs? I’m thinking of the situation where a documentary on 6 is replayed on another, more accessible channel.
It seems to me that the BBC as a whole should have identifiers… or at least have some way of linking to shows originating elsewhere.
Still, I think you’ve come up with an excellent solution. I was thinking right up until you explained it that ‘pip’ was something to do with “bip, bip, bip, beeep” 😀

Answering a few comments – to Andrew first – the point of the URL structure is that it’s human readable up until the last moment where it can be. Admittedly /pip/ doesn’t mean much, but once you know what’s kept in that directory you can hack the URL from the top level down to the penultimate one. There’s nothing human unreadable about http://www.bbc.co.uk/radio3/morningon3/ for example and the /pip/ will mean the same thing in every circumstance. The only bit that’s resolutely not human readable is the final five character code – and as I said we debated for a long time what the potential benefits of exposing that might be…
To Mikel and the other people suggesting multiple URLs, I’m afraid I just don’t agree. The first problem with having multiple URLs in a circumstance like this one is that it breaks a lot of the things that we were trying to create. We wanted to create a page that would become the definitive URL and source for that programme episode and would show up cleanly and without spamming search engines. Sure, we could set up redirects on date-based URLs, but then the question is how do you expose these redirects to people without them also becoming URLs spidered by Google. I don’t want to go to Google and search for zeroPoints and get returned twelve pages for every time the same programme was repeated. And worse still, I don’t want to go to Google and search for it, but not find it because twelve different people have all referenced different URLs rather than the one central stable one. No – if we want a site that works well in the ecosystem of the web, then we want it not to clog up search engines with dozens of pseudo-duplicates of itself, obfuscating the whole process for users. Better just to have the one default URL that will actually always work.
At which point, the other issues start to emerge – is it (for example) a reliable supposition that all radio programmes will actually have a broadcast date – if not now then in the future? Do we think that in twenty years time we’ll still be distributing these programmes exclusively through radio waves and at distinct times? I would think that’s extremely unlikely. And if we still were, then should we be creating a separate URL for programmes that are repeated on the same day? How would that look? Like http://www.bbc.co.uk/radio3/cotw/2004/12/03/a and /b? Is that still human readable? And what about those programme brands that have different episodes that are on the same day, the first one of which is a repeat of a programme from the previous week? Composer of the Week is broadcast first at midday and then the following week shortly after midnight. That means on any given day two episodes are broadcast and the ‘repeated’ one is first. So which one gets the ‘a’ URL for that day? And to what extent is that human readable?
And what about the schedules themselves? Radio and TV schedules are notoriously unreliable. They sometimes change right up until the actual day that they are broadcast. At what point would you actually assign the URL to them? And what would happen if the programme suddenly was postponed to the following week? Would you break the old URL or would you leave it there in public, still working but for a programme that wasn’t even broadcast on that date? And what if one episode was replaced with another one from the same day? Which one gets the URL then?
No – it’s actually totally impractical to use date-based URLs for programming – that is unless you’re prepared to wait until several days after the programme has been broadcast and then pay someone to check every broadcast and confirm that the schedule was exactly as expected. And that’s even if you ignore the desire to weed out repeats and to not spam the search engines with dozens of multiple URLs. And I can’t see any broadcast site thinking that it was okay not to talk at all about upcoming shows… It just wouldn’t be useful for them.

This is fascinating – and the site itself is a vast improvement on its predecessor, so congratulations on both the design and the structured thinking which lies behind it.
In the circumstances, it seems horribly churlish to suggest that the scheme is already a little bent, if not yet broken.
On today’s [28/6] playlist on the R3 homepage, there is a programme of music by the Freiburg Baroque Orchestra at 2115. Clicking on the link takes me to the PIP, which tells me that the programme is 35 minutes long. Terrific: it’s 2135 so I can catch the last 15 minutes, a simple matter of clicking on the ‘listen live’ link – but that says that Night Waves is on, starting at 2130.
The only explanation which makes any sense is that only one of the two works in the concert described on the PIP was played. But since the PIP isn’t broadcast specific, there seems to be no way of knowing that, still less of knowing which of the two works was actually played.
In short, the PIP seems to be based on the assumption that the integrity of the original programme survives the process of repeats, but in this case (and not uncommonly) that is not the case. Is there a need for a further coding of the instance of the programme, which is separate from the underlying event, in this case the recording of a concert?
It’s still a great site though.

I have faced similar dilemmas. I wonder if this solution makes sense: make a canonical url for the programme, as you’ve done; in addition, create a separate date url that points to a separate page whose only significant content is the title of the programme and a link to the canonical programme page.

>> Sure, we could set up redirects on date-based URLS, but then the question is how do you expose these redirects to people without them also becoming URLs spidered by Google.
Use robots.txt to tell Google which directories are forbidden from indexing.

Anally Retentive URLs
Developing a URL structure for broadcast radio sites… (via Simon Willison Blogmarks). I love it when sites pay extra close attention to make sure their URLs lack cruft and are logistically aligned. It just makes sense. Gone are the days…

Tom, I agree that you’ve set up a strong foundation for radio3 by using unique identifiers for each episode and making them guessable up to a point. However, I wouldn’t want you to lock yourself into one structure.
I like to imagine URLs on dynamically built sites as being one of the simplest programming languages, instead of being simple reference id’s.
Your current url structure has two variables, unique id and showname. Using the showname pulls up a page linking to multiple episodes for that show. Using the showname and unique id produces a single episode.
However, limiting yourself to two variables seems overly restrictive when you have access to much more metadata. The date being one of the more popular variables, but I imagine you track more than just that. Using your example, I would love to be able to dynamically pull all of the segments devoted to Mozart by referencing /radio3/mozart.
As far as Google goes, it should be simple enough to give your “program” urls a “noindex, follow” meta tag asking Google to ignore the current url, but continue poking around for a good one.

Well that’s trickier than I can really go into – but in a nutshell you have to understand that the BBC doesn’t really do a lot of dynamic publishing and without dynamic publishing, the idea of a command line URL interface doesn’t really make a lot of sense. You’d have to prepopulate and publish all the possible combinations! I know it sounds like madness but I can assure you that there are clear and strong reasons for this way of operating that aren’t likely to change over the next few years.
Also – of course – the likelihood of there being something ELSE sitting at http://www.bbc.co.uk/radio3/mozart is actually quite high as well – a programming related site includes not only different ways of slicing through programming, but also a large number of supportive / marketing / publicity-related materials, features and the like. So we couldn’t use it as a command line in that kind of way anyway. Potentially we COULD do something like that in areas that are served specifically by the part of the system we are using for programme information – ie. we could bracket off an area in the same way we do elsewhere by creating a pip/ directory – but it’s not obvious to me that this URL is human readable or makes a lot of sense as a command line interface.
Finally, of course, should the architectures change, we could very easily ADD such mechanisms into the site. Should a directly dynamically published site become practical which allows us to investigate keywords, descriptions, metadata and the like on-the-fly then it’s entirely possible that we would be able to find other ways of creating on-the-fly indices (like search results) or guides to BBC programming around these areas by some kind of URL command line – all we’ve committed to is a solid home for the concept of ‘programme episode’ which will never change. If this is the kind of thing you’re talking about – I should point out that it doesn’t sound that different from a search engine and we already have one of them…
PS. The robots.txt no-index / no-follow options sound like a good idea in theory, but they don’t stop people linking to the pages by different URLs, so they minimise the impact that any individual page can have in a search engine by making the canonical URL less ‘findable’ and ‘linkable’.

Work at the BBC (again)
There’s a programmer’s job being advertised at BBC Radio and Music Interactive in London, in the team where I work. It involves data munging XML, digital radio and other interesting technologies, and involves programming in Python, Perl or Java. You’d …
