Social whitelisting with OpenID…


My ex-colleague Simon Willison has recently been doing some profoundly good work out in the wilds of the Internet promoting and explaining OpenID. In fact, the best articulation I’ve seen anywhere on the Internet of the OpenID concept is his screencast, which I think neatly sums up the value of the concept as well as how easy OpenIDs are to use.

You’re going to need to understand OpenID before I go much further, so if it’s an area that’s new to you, this is the point where you need to either go and watch that screencast or follow carefully the simple description of the service that follows…

OpenID – fundamentally – is a solution to the problem of having a million user accounts all over the place. Instead of getting hundreds of user names all over the place, you go to a site that provides OpenIDs and choose one username and password. These sites then give you a pretty simple web address which is probably easiest to think about as a profile page for you. Then when you want to sign into any other site on the Internet with an OpenID, all you do is type in the address of this profile page. The site you’re on wanders over to that address, the other site asks you for your password, you tell it your password and then you’re bounced back to the original site where you are logged in and can get on with your business unfussed. Sometimes the local site will ask you if you want a different user name. That’s all there is to it.

Having the same ID across a number of sites can also make a number of other things possible. You could hook up all the stuff you do over the internet really easily, and aggregate it and get a handle on it. You wouldn’t have to share your passwords with lots of different people either. All good. From my perspective, given my long-term interest in technologies of moderation and social software, OpenID also provides one super-significant thing – relatively stable, repurposable identities across the web as a whole that can develop levels of trust and build personal reputations. But more on that in a moment.

Of course with new solutions come new problems, and the most obvious problem associated with OpenID is ‘phishing’, which is to say that a site could ask you for your OpenID and then pretend to be your central provider. You type in your password thinking that you’re safe, but in fact you’re giving out your details to a rogue third party, who can now use them across all of your registered sites and services. This – let us be clear – is a very real problem and one that Simon talks about again in his piece on OpenID phishing. I’ve heard some really interesting ideas around how you might do this stuff effectively, but I’m still not completely sure that I’ve heard one that I think is totally convincing. This isn’t such a problem for the phishing-resistant old dogs like the people who read my site, but could be an enormous problem for real people. In the end – if such a project is to take off – I suspect this problem is going to be solved by a combination of design and education. People are going to have to get their heads around web addresses a bit more. Or we’re going to have to build something into browsers that handles distributed identities a bit more effectively.

The area that I really wanted to talk about today though was social whitelisting, which Simon and I were discussing the other day and which Simon has already written about on his site. This emerged out of some conversations about a very weblogger-focused problem, i.e. comment spam. I’ve written about problems that I’ve been having with trackback and comment spam before, but every single day it seems to get worse. I get dozens of comment spams every single day – sometimes hundreds. And this is even though I use extremely powerful and useful MT plugins like Akismet. And the spam is profoundly upsetting and vile stuff, with people shilling for bestiality or incest pornography, or apparently just trying to break comment spam systems by weight of empty posts.

Over the last year or so, it’s stopped being a problem that I’ve been able to deal with by selectively publishing things. Now every single comment that’s posted to my site is kept back until I’ve had a chance to look at it, with the exception of the few people that I’ve marked as trustworthy. It has now very much become a problem of whitelisting for me – of determining which scant number of users I can particularly say it’s okay to post. And if this is where I am now, with my long weblog history and middling okay technical abilities, I can only dread where everyone else is going to be in a couple of years’ time. This is unsustainable and we have to change models.

Which is where the social whitelisting concept comes in. Most whitelisting has been around approving specific individual people, but this doesn’t scale. A large proportion of the people who post to my sites are doing so for the first time and may never post again.

The solution that Simon and I came up with was really simple and sort of the opposite of Akismet. Jason Kottke deals with a hell of a lot of comments every day. So do Techcrunch and GigaOM. Every day they approve things that people have written and say that it’s okay for them to be more regular posters. So each of these people is developing their own personal whitelist of people that they trust. More importantly I trust Jason and Techcrunch and GigaOM along with Matt Biddulph and Paul Hammond and Caterina Fake and about a thousand other people online. So why shouldn’t I trust their decisions? If they think someone is worth trusting then I can trust them too. Someone that Caterina thinks is a real person that she’s prepared to let post to her site, I should also trust to post on mine. This is one of the profound benefits of OpenID – it’s more reliable than an e-mail address that people can just spoof, but it’s just as repurposable. You can be identified by it (and evaluated and rewarded for it) all across the whole web.

So the idea is simplicity itself. We switch to a model in which individual sites publish lists of OpenIDs that they have explicitly trusted in the past. Then individually, site owners can choose to trust anyone trusted by other site owners or friends. People who are trusted by you or your friends or peers can post immediately while the rest are held back in moderation queues for you to plough through later. But with any luck the percentage of real comments held back over time would rapidly shrink as real people became trusted and fake people did not.
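
To make the moving parts a bit more concrete, here is a minimal sketch of that decision in Python. Everything in it is an assumption for illustration: the `Site` class, the `decide` method and the example addresses are hypothetical names, not part of OpenID or of any existing weblog software, and it takes for granted that the commenter has already signed in with their OpenID.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Site:
    """A weblog that publishes the OpenIDs its owner has approved (hypothetical model)."""
    name: str
    # OpenIDs this owner has explicitly trusted in the past
    whitelist: set[str] = field(default_factory=set)
    # Other site owners whose approvals this owner has chosen to accept
    trusted_sites: list[Site] = field(default_factory=list)

    def decide(self, openid: str) -> str:
        """Return 'publish' for a trusted commenter, 'hold' for the moderation queue."""
        if openid in self.whitelist:
            return "publish"
        # Only look one step out: people my chosen peers have approved,
        # not the peers of those peers.
        for peer in self.trusted_sites:
            if openid in peer.whitelist:
                return "publish"
        return "hold"


# Illustrative data only; the addresses are made up.
caterina = Site("caterina", whitelist={"https://alice.example/"})
me = Site("me", whitelist={"https://bob.example/"}, trusted_sites=[caterina])

decisions = {
    "alice": me.decide("https://alice.example/"),      # approved by a peer I trust
    "bob": me.decide("https://bob.example/"),          # approved by me
    "stranger": me.decide("https://spammer.example/")  # approved by nobody I trust
}
```

The lookup deliberately stops at the whitelists of sites I have chosen myself, so someone approved by a stranger’s weblog still lands in the queue.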

Another approach to this idea would be to create a central whitelisting service with which you could share your specific trusted OpenIDs and associate them with your weblogs. Through a central interface you could decide to either accept a generic trusted set of whitelists from the top 100 weblogs on the planet to get you going, or add in the specific weblogs of friends, family and colleagues that you know share the same interests or readers. And of course individual weblogs can be rated subsequently for whether they let through people who subsequently turn out to be troublemakers, or rewarded for the number of real people that they mark as trustworthy. I want to make this particularly clear – I’m not talking about one great big web of trust which can be polluted by someone hacking one whitelist somewhere on the internet. I’m not talking about there being one canonical whitelist anywhere either. I’m definitely and specifically talking about you deciding which site owners (or groups of site owners) that you trust and that being the backbone to your personal service. People that your peers trust may be different to the people that my peers trust. And so it should be.
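
A rough sketch of what such a central service might keep track of, again in Python and again entirely hypothetical (the `WhitelistService` class, its method names and the example sites and addresses are invented for illustration): each weblog publishes its approved OpenIDs, each subscriber picks which weblogs to believe, and the service counts how often a weblog’s approvals later turn out to be troublemakers.

```python
from collections import defaultdict


class WhitelistService:
    """Hypothetical central store of per-site whitelists and per-user choices."""

    def __init__(self):
        self.whitelists = defaultdict(set)      # site name -> OpenIDs that site approved
        self.subscriptions = defaultdict(set)   # subscriber -> site names they chose to trust
        self.bad_approvals = defaultdict(int)   # site name -> approvals later reported as bad

    def publish(self, site, openids):
        """A site shares OpenIDs it has marked as trustworthy."""
        self.whitelists[site].update(openids)

    def subscribe(self, subscriber, sites):
        """A subscriber picks the sites whose judgement they accept."""
        self.subscriptions[subscriber].update(sites)

    def is_trusted(self, subscriber, openid):
        """True only if one of the subscriber's own chosen sites approved this OpenID."""
        return any(openid in self.whitelists[site]
                   for site in self.subscriptions[subscriber])

    def report_troublemaker(self, openid):
        """Count a bad approval against every site that had let this OpenID through."""
        for site, openids in self.whitelists.items():
            if openid in openids:
                self.bad_approvals[site] += 1


# Illustrative data only.
service = WhitelistService()
service.publish("kottke", {"https://alice.example/"})
service.publish("careless-blog", {"https://troll.example/"})
service.subscribe("tom", {"kottke"})
service.subscribe("someone-else", {"careless-blog"})

tom_trusts_alice = service.is_trusted("tom", "https://alice.example/")
tom_trusts_troll = service.is_trusted("tom", "https://troll.example/")
service.report_troublemaker("https://troll.example/")
```

Because every answer is computed from the subscriber’s own list of sites, there is no single canonical whitelist to pollute: a bad entry on one weblog only reaches the people who chose to trust that weblog, and the count of bad approvals gives them a reason to reconsider.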

There’s even a business model here. I’d pay (a small amount) for any service that allowed me to have vibrant and enthusiastic conversations on my weblog without having to manually approve every single message. I’m sure other people would too. And of course, much like OpenID itself, there’s no reason that there should only be one such whitelist provider online. There could be a whole ecology here.

So what do people think? Does this have legs? Is it a sufficiently interesting idea to play with further? Where is the OpenID community on this stuff at the moment? Could social whitelisting of OpenIDs be the thing that rescues distributed conversation from death by spam? There’s a discussion over on Simon’s original post on the subject, or feel free to post below (but be warned that it may take me a while to approve your messages…)