How to build on bubble-up folksonomies…


[This post takes up some of the themes that Matt Webb, Paul Hammond, Matt Biddulph and I talked about in our paper at ETech 2005 on Reinventing Radio: Enhancing One-to-Many with Many-to-Many. A podcast of that talk is available.]

A few days ago I wrote about Phonetags, an experimental internal service that we’ve been running inside the BBC, which allows you to bookmark, tag and rate songs you’ve heard on the radio with your mobile phone. Now I want to talk briefly about one interesting way of using folksonomic tags that we developed conceptually while building the system.

The idea is really simple: some concepts in the world can loosely be described as aggregations of smaller component concepts. In such systems, if you encourage people to tag the smallest component parts, you can aggregate those tags up through the whole system. You get, essentially, free metadata on a whole range of other concepts. Let me give you an example.

In Phonetags, we allow users to bookmark, rate and tag songs. They do so partly for personal gain and partly to add their voice to the collective. But music radio shows can be loosely understood as collections of songs, and music radio networks can equally be understood as collections of shows. So if ten songs that are well-rated and tagged with ‘alternative’ and ‘pop’ are played on one specific radio show, it’s quite plausible to argue that the show itself could automatically be understood as being tagged with ‘alternative’ and ‘pop’, and that it should be considered well-rated. Similarly, if all the shows are equally tagged with ‘alternative’, then it’s likely that you could describe the network that broadcasts them as an ‘alternative’ station.
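To make the ‘well-rated’ part concrete, here is a minimal Python sketch of ratings rolling up the song → show → network chain. The data, the names (`song_ratings`, `show_songs`, `show_score` and so on) and the choice of a plain unweighted average at each level are all assumptions for illustration, not a description of how Phonetags actually worked.

```python
from statistics import mean

# Hypothetical data: the ratings (say 1-5) listeners sent in for each song.
# Assumes every song has at least one rating.
song_ratings = {
    "song_a": [5, 4, 5],
    "song_b": [4, 4],
    "song_c": [2, 3],
}

# Hypothetical containment hierarchy: shows are collections of songs,
# networks are collections of shows.
show_songs = {
    "evening_show": ["song_a", "song_b"],
    "late_show": ["song_b", "song_c"],
}
network_shows = {"network_x": ["evening_show", "late_show"]}


def song_score(song):
    """Average listener rating for a single song."""
    return mean(song_ratings[song])


def show_score(show):
    """A show inherits the unweighted average of its songs' scores."""
    return mean(song_score(s) for s in show_songs[show])


def network_score(network):
    """A network inherits the unweighted average of its shows' scores."""
    return mean(show_score(s) for s in network_shows[network])


print(round(show_score("evening_show"), 2), round(network_score("network_x"), 2))
```

Averaging per child means every song counts equally towards its show, however many people rated it; weighting by the number of ratings would be the obvious alternative, with the same trade-off as the tag question below.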

How you handle the aggregation up the chain is an interesting question. My first instinct is that you would aggregate all the tags for a song, slice off the top ten or twenty, and then throw away the rest along with all the quantitative information. Then you do the same for all the other songs played in a show, and then reaggregate to see which tags appear across the most songs. The alternative would be simply to add together all the tags that people sent in during that timeslot, but I think that would skew things towards the popular songs that people tagged a lot, and wouldn’t necessarily reflect the character of the show itself. But that’s up for debate.
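Here is one way the two strategies might look, as a rough Python sketch. The tag data is invented, and `top_tags`, `show_tags_by_song_spread` and `show_tags_by_raw_sum` are hypothetical names. The first approach keeps each song’s top n tags (ten by default, matching the ‘top ten or twenty’ instinct), discards the counts, and then counts how many songs carry each tag; the second simply pools every tag submitted for the show.

```python
from collections import Counter

# Hypothetical raw tags: one entry per time a listener applied a tag to a song.
song_tags = {
    "hit_single": ["pop"] * 50 + ["chart"] * 30 + ["alternative"] * 2,
    "deep_cut_1": ["alternative"] * 3 + ["folk"] * 2,
    "deep_cut_2": ["alternative"] * 4 + ["lo-fi"] * 1,
}
show_playlist = ["hit_single", "deep_cut_1", "deep_cut_2"]


def top_tags(tags, n):
    """The n most-used tags for one song, with the counts thrown away."""
    return {tag for tag, _ in Counter(tags).most_common(n)}


def show_tags_by_song_spread(playlist, n=10):
    """Keep each song's top-n tags, then count how many songs carry each tag."""
    spread = Counter()
    for song in playlist:
        spread.update(top_tags(song_tags[song], n))  # each tag counted once per song
    return spread


def show_tags_by_raw_sum(playlist):
    """The alternative: pool every tag submitted for every song in the show."""
    total = Counter()
    for song in playlist:
        total.update(song_tags[song])
    return total


print(show_tags_by_song_spread(show_playlist).most_common(1))
print(show_tags_by_raw_sum(show_playlist).most_common(1))
```

With this made-up data, the one heavily-tagged hit dominates the pooled count (‘pop’ comes out on top), while counting songs per tag surfaces ‘alternative’, which runs through every song in the show. That is exactly the skew the pooled approach risks.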

Another, and perhaps more intriguing, way of aggregating tags up through a conceptual chain would be to view albums as collections of songs and artists as collections of albums or songs. This would mean that from the simple act of encouraging people to tag individual songs, you would get useful descriptive metadata on radio shows, radio networks, artists and albums.
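Because the roll-up only needs to know what contains what, a single function can walk any number of hierarchies built over the same songs. A minimal sketch, assuming each song already has a small set of top tags (as in the top-ten-or-twenty approach above) and that each hierarchy is a strict tree; `contains`, `roll_up` and all of the data are hypothetical.

```python
from collections import Counter

# Hypothetical per-song top tags (counts already discarded).
song_top_tags = {
    "track_1": {"alternative", "melancholy"},
    "track_2": {"alternative", "guitar"},
    "track_3": {"electronic", "melancholy"},
}

# Two independent hierarchies built over the same songs.
contains = {
    # broadcast chain: network -> show -> song
    "network_x": ["evening_show"],
    "evening_show": ["track_1", "track_3"],
    # catalogue chain: artist -> album -> song
    "artist_y": ["album_1", "album_2"],
    "album_1": ["track_1", "track_2"],
    "album_2": ["track_3"],
}


def roll_up(node):
    """For any node, count how many of the songs beneath it carry each tag."""
    if node in song_top_tags:  # a leaf: a single song
        return Counter(song_top_tags[node])
    total = Counter()
    for child in contains[node]:
        total += roll_up(child)  # recurse down to the songs
    return total


print(roll_up("artist_y"))
print(roll_up("network_x"))
```

The artist and the network draw on overlapping songs but end up with different descriptions. If the same song appeared on two albums by one artist, this version would count it twice for that artist, so a real system would probably want to de-duplicate songs first.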

The upshot of all of this is that you start getting a way of navigating between a whole range of different concepts based on these combinations of tags and ratings. The tags give you subject-related metadata, the ratings give you qualitative metadata, and from this you can start finding new ways to say, “If you liked this song, you may also like this album, network or artist.” You can start to generate journeys that move you from a network to that network’s most popular songs, through to the best albums on related themes (or which conjure similar moods or associations, even if they’re by radically different artists), and so on.
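One way to produce that “you may also like” suggestion across different kinds of thing is to compare their rolled-up tag profiles directly. The sketch below uses cosine similarity between tag-count vectors, which is an assumption made for illustration rather than anything specified here; it ignores ratings entirely, and the profiles and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical rolled-up tag profiles for entities of different types.
profiles = {
    ("song", "track_1"): Counter({"alternative": 1, "melancholy": 1}),
    ("album", "album_1"): Counter({"alternative": 2, "guitar": 1, "melancholy": 1}),
    ("artist", "artist_z"): Counter({"electronic": 3, "dance": 2}),
    ("network", "network_x"): Counter({"alternative": 5, "melancholy": 4, "pop": 1}),
}


def cosine(a, b):
    """Cosine similarity between two tag-count vectors (0.0 = no shared tags)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def you_may_also_like(entity, k=2):
    """Rank every other entity, of any type, by tag similarity to `entity`."""
    scores = [
        (cosine(profiles[entity], profile), other)
        for other, profile in profiles.items()
        if other != entity
    ]
    scores.sort(reverse=True)  # most similar first
    return [other for score, other in scores[:k] if score > 0]


print(you_may_also_like(("song", "track_1")))
```

Because songs, albums, artists and networks all reduce to the same kind of tag vector, nothing stops a song from recommending a network. Folding in ratings, for example by only suggesting well-rated items, would be a natural next step.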

And because you have a semantic understanding of the relationship between concepts like a ‘song’, an ‘album’ and an ‘artist’, you can allow people to drill down or move up through various hierarchies of data and track the changes in an artist’s style over time. For me, this is a pretty compelling argument that understanding semantic relationships between concepts makes folksonomic tagging even more exciting, rather than less so, and may indicate a changing role for librarians towards owning formal conceptual relationships rather than descriptive, evocative metadata. But that’s a post for another time.
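Tracking an artist’s style over time could be as simple as comparing the rolled-up tags of consecutive releases. A minimal sketch, assuming each album already has a set of top tags and a release year; the albums and the `style_changes` function are invented for illustration.

```python
# Hypothetical: an artist's albums, each with a release year and the
# set of top tags produced by aggregating its songs.
albums = [
    (1997, "second_album", {"guitar", "experimental", "melancholy"}),
    (1994, "first_album", {"grunge", "guitar", "loud"}),
    (2000, "third_album", {"electronic", "experimental", "melancholy"}),
]


def style_changes(albums):
    """For each pair of consecutive albums, list the tags gained and dropped."""
    ordered = sorted(albums, key=lambda album: album[0])  # by release year
    changes = []
    for (_, _, prev_tags), (year, name, tags) in zip(ordered, ordered[1:]):
        gained = sorted(tags - prev_tags)
        dropped = sorted(prev_tags - tags)
        changes.append((year, name, gained, dropped))
    return changes


for change in style_changes(albums):
    print(change)
```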

Are there other places where this kind of thing could be applied? Well, off the top of my head I can’t think of anything useful you could do with photographs, but I think folder structures on websites could prove an interesting challenge. I’d be fascinated to see if it would be possible to find well-structured websites with usefully nested folders and to aggregate tags from the individual pages up to section homepages and eventually to the site homepage. A little over a year ago I wrote about the URL structure we developed for broadcast radio sites at the BBC, built on the Programme Information Pages platform, which you can see in action on the Radio 3 site. The URL structure mirrored a formal hierarchy much like the song / album / artist one, except with episode / programme brand / network. I’d be fascinated to know whether you could get a useful understanding of what Performance on 3 was about by aggregating all the tags from each of the episodes contained within its folder, and whether aggregating still further up to the front page of Radio 3 would give you a good description of the network’s philosophy and approach. One for Josh at, perhaps?
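The website version of the experiment might look something like this: every page’s tags are credited to each folder above it, so section pages and the homepage accumulate a description of everything beneath them. This is a hypothetical sketch. The example.com URLs only imitate a network / programme brand / episode structure and are not the real Programme Information Pages paths, and counting pages per tag is an assumption.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical page-level tags; the paths loosely imitate
# network / programme brand / episode folders.
page_tags = {
    "http://example.com/radio3/performance/ep1/": {"orchestral", "live"},
    "http://example.com/radio3/performance/ep2/": {"live", "opera"},
    "http://example.com/radio3/jazz/ep1/": {"jazz", "live"},
}


def folder_rollup(page_tags):
    """Credit each page's tags to every ancestor folder, counting pages per tag."""
    folders = defaultdict(Counter)
    for url, tags in page_tags.items():
        parts = [p for p in urlparse(url).path.split("/") if p]
        # "/", "/radio3/", "/radio3/performance/", ... down to the page itself
        for depth in range(len(parts) + 1):
            folder = "/" + "".join(p + "/" for p in parts[:depth])
            folders[folder].update(tags)
    return folders


rollup = folder_rollup(page_tags)
print(rollup["/radio3/performance/"].most_common())
print(rollup["/radio3/"].most_common())
```

Here the brand folder comes out as mostly ‘live’, with ‘orchestral’ and ‘opera’ behind it, while the network-level folder also picks up ‘jazz’ from its other programme.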

Now it’s over to you guys: can you think of any other hierarchies or places where we could encourage the tagging of the smallest practical component part and then derive value from aggregating up the semantic chain? Could the same thing work for non-hierarchical relationships? Anyone?