Categories: Social Software, Technology

Social whitelisting with OpenID…

My ex-colleague Simon Willison has recently been doing some profoundly good work out in the wilds of the Internet promoting and explaining OpenID. In fact, the best articulation of the OpenID concept I’ve seen anywhere is his screencast, which neatly sums up the value of the concept as well as how easy OpenIDs are to use.

You’re going to need to understand OpenID before I go much further, so if it’s an area that’s new to you, this is the point where you need to either go and watch that screencast or follow carefully the simple description of the service that follows…

OpenID—fundamentally—is a solution to the problem of having a million user accounts scattered all over the place. Instead of registering hundreds of user names on hundreds of sites, you go to a single site that provides OpenIDs and choose one username and password. That site then gives you a pretty simple web address, which is probably easiest to think about as a profile page for you. Then when you want to sign into any other site on the Internet with an OpenID, all you do is type in the address of this profile page. The site you’re on wanders over to that address, you’re passed across to your provider, which asks you for your password, you type it in, and then you’re bounced back to the original site, where you’re logged in and can get on with your business unfussed. Sometimes the local site will ask you if you want a different user name. That’s all there is to it.
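For the more technically minded, here’s a very rough sketch in Python of the first half of that dance, from the point of view of the site you’re signing into. The discovery helper is stubbed and every URL is a hypothetical placeholder – a real site would lean on a proper OpenID library rather than rolling its own:

```python
from urllib.parse import urlencode

def discover_provider(profile_url):
    """Discovery step (stubbed out here): a real consumer fetches the
    profile page and reads its <link rel="openid.server"> tag to find
    the user's provider. This endpoint is a placeholder."""
    return "https://provider.example.com/auth"

def begin_signin(profile_url, return_to):
    """Build the redirect that sends the user off to their provider.
    The provider asks for the password on its own pages, then bounces
    the browser back to return_to with a signed response for the
    original site to verify before logging the user in."""
    endpoint = discover_provider(profile_url)
    params = urlencode({
        "openid.mode": "checkid_setup",  # "please authenticate this user"
        "openid.identity": profile_url,  # the profile page they typed in
        "openid.return_to": return_to,   # where to bounce them back to
    })
    return f"{endpoint}?{params}"

print(begin_signin("http://simon.example.org/", "http://myblog.example.com/return"))
```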

Having the same ID across a number of sites makes other things possible too. You could hook up all the stuff you do over the internet really easily, aggregate it and get a handle on it. You wouldn’t have to share your passwords with lots of different people either. All good. From my perspective, given my long-term interest in technologies of moderation and social software, OpenID also provides one super-significant thing – relatively stable, repurposable identities across the web as a whole that can develop levels of trust and build personal reputations. But more on that in a moment.

Of course with new solutions come new problems, and the most obvious problem associated with OpenID is ‘phishing’: a site could ask you for your OpenID and then pretend to be your central provider. You type in your password thinking that you’re safe, but in fact you’re giving your details to a rogue third party, who can now use them across all of your registered sites and services. This—let us be clear—is a very real problem and one that Simon talks about again in his piece on OpenID phishing. I’ve heard some really interesting ideas about how you might guard against it, but I’m still not sure I’ve heard one that I find totally convincing. This isn’t such a problem for the phishing-resistant old dogs like the people who read my site, but it could be an enormous problem for real people. In the end, if such a project is to take off, I suspect this problem is going to be solved by a combination of design and education. People are going to have to get their heads around web addresses a bit more. Or we’re going to have to build something into browsers that handles distributed identities a bit more effectively.

The area that I really wanted to talk about today, though, was social whitelisting, which Simon and I were discussing the other day and which Simon has already written about on his site. This emerged out of some conversations about a very weblogger-focused problem, i.e. comment spam. I’ve written about the problems I’ve been having with trackback and comment spam before, but every single day it seems to get worse. I get dozens of comment spams every single day, sometimes hundreds. And this is even though I use extremely powerful and useful MT plugins like Akismet. And the spam is profoundly upsetting and vile stuff, with people shilling for bestiality or incest pornography, or apparently just trying to break comment spam systems by weight of empty posts.

Over the last year or so, it’s stopped being a problem that I’ve been able to deal with by selectively publishing things. Now every single comment that’s posted to my site is held back until I’ve had a chance to look at it, with the exception of the few people I’ve marked as trustworthy. It has very much become a problem of whitelisting for me, of determining which scant few users I can explicitly say are okay to post. And if this is where I am now, with my long weblog history and middling-okay technical abilities, I can only dread where everyone else is going to be in a couple of years’ time. This is unsustainable and we have to change models.

Which is where the social whitelisting concept comes in. Most whitelisting so far has been based around approving specific individual people, but this doesn’t scale. A large proportion of the people who post to my sites are doing so for the first time and may never post again.

The solution that Simon and I came up with is really simple and sort of the opposite of Akismet. Jason Kottke deals with a hell of a lot of comments every day. So do Techcrunch and GigaOM. Every day they approve things that people have written and say it’s okay for those people to become more regular posters. So each of them is developing their own personal whitelist of people that they trust. More importantly, I trust Jason and Techcrunch and GigaOM, along with Matt Biddulph and Paul Hammond and Caterina Fake and about a thousand other people online. So why shouldn’t I trust their decisions? If they think someone is worth trusting then I can trust them too. Someone that Caterina thinks is a real person that she’s prepared to let post to her site, I should also trust to post on mine. This is one of the profound benefits of OpenID – it’s more reliable than an e-mail address, which people can just spoof, but it’s just as repurposable. You can be identified by it (and evaluated and rewarded for it) all across the whole web.

So the idea is simplicity itself. We switch to a model in which individual sites publish lists of OpenIDs that they have explicitly trusted in the past. Then individually, site owners can choose to trust anyone trusted by other site owners or friends. People who are trusted by you or your friends or peers can post immediately while the rest are held back in moderation queues for you to plough through later. But with any luck the percentage of real comments held back over time would rapidly shrink as real people became trusted and fake people did not.
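To give a flavour of how simple the site-owner side could be – assuming, purely hypothetically, that each site publishes its whitelist as a plain text file with one OpenID URL per line – the moderation decision boils down to something like this:

```python
import urllib.request

# Whitelists published by site owners I've chosen to trust (URLs hypothetical).
SUBSCRIBED_WHITELISTS = [
    "http://kottke.org/openid-whitelist.txt",
    "http://www.plasticbag.org/openid-whitelist.txt",
]

# OpenIDs I've explicitly approved myself.
LOCAL_WHITELIST = {"http://simon.example.org/"}

def fetch_whitelist(url):
    """Fetch a published whitelist, assumed to be one OpenID URL per line."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}

def moderate(commenter_openid):
    """Publish immediately if the commenter is trusted by me or by anyone
    whose whitelist I subscribe to; otherwise hold for manual review."""
    if commenter_openid in LOCAL_WHITELIST:
        return "publish"
    if any(commenter_openid in fetch_whitelist(u) for u in SUBSCRIBED_WHITELISTS):
        return "publish"
    return "hold"
```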

Another approach would be to create a central whitelisting service with which you could share your specific trusted OpenIDs and associate them with your weblogs. Through a central interface you could decide either to accept a generic trusted set of whitelists from the top 100 weblogs on the planet to get you going, or to add in the specific weblogs of friends, family and colleagues that you know share the same interests or readers. And of course individual weblogs could subsequently be rated on whether they let through people who turn out to be troublemakers, or rewarded for the number of real people they mark as trustworthy. I want to make this particularly clear – I’m not talking about one great big web of trust which can be polluted by someone hacking one whitelist somewhere on the internet. I’m not talking about there being one canonical whitelist anywhere either. I’m definitely and specifically talking about you deciding which site owners (or groups of site owners) you trust, and that being the backbone of your personal service. The people your peers trust may be different from the people my peers trust. And so it should be.
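A central service of that kind wouldn’t need to do much more than merge the lists it gathers, keeping count of how many site owners vouch for each identity. A minimal sketch, with the data formats again being my own assumptions:

```python
from collections import Counter

def aggregate_whitelists(whitelists):
    """Merge several site owners' whitelists (each a set of OpenID URLs),
    counting how many owners vouch for each identity. The counts double
    as a crude trust score and as raw material for rating the lists."""
    vouches = Counter()
    for openids in whitelists:
        vouches.update(openids)
    return vouches

# A subscriber might only auto-publish commenters vouched for by at
# least two of the whitelists they've chosen to trust:
vouches = aggregate_whitelists([
    {"http://alice.example.org/", "http://bob.example.org/"},
    {"http://alice.example.org/"},
])
trusted = {openid for openid, n in vouches.items() if n >= 2}
print(trusted)  # {'http://alice.example.org/'}
```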

There’s even a business model here. I’d pay (a small amount) for any service that allowed me to have vibrant and enthusiastic conversations on my weblog without having to manually approve every single message. I’m sure other people would too. And of course, much like OpenID itself, there’s no reason that there should only be one such whitelist provider online. There could be a whole ecology here.

So what do people think? Does this have legs? Is it a sufficiently interesting idea to play with further? Where is the OpenID community on this stuff at the moment? Could social whitelisting of OpenIDs be the thing that rescues distributed conversation from death by spam? There’s a discussion over on Simon’s original post on the subject, or feel free to post below (but be warned that it may take me a while to approve your messages…)

30 replies on “Social whitelisting with OpenID…”

“So why shouldn’t I trust their decisions? If they think someone is worth trusting then I can trust them too. Someone that Caterina thinks is a real person that she’s prepared to let post to her site, I should also trust to post on mine.”
Probably, but even one-degree trust relationships like this can break. What if Caterina also runs a blog where she applies–in your view, at least–different standards about who can comment, but uses her single whitelist for both? Does GigaOm maintain a consistent set of criteria for allowing comments, and how would I know if it changes?
What if a friend’s post gets slashdotted and suddenly attracts a huge number of legitimate, though possibly annoying, comments from unknown users? I always think of this post at Dan Hill’s site, which is still gathering entertaining comments after four years. Not intelligent comments, but not spam.
I worry that a system like this would have to suddenly start developing a layer of rules on top of it just as onerous to manage as the original problem: I’ll trust anyone to comment on my site who’s been allowed to comment on Tom’s site, except for the people I manually delete from that whitelist, and so on.

From the commenter’s perspective, the nice thing about this is that it’d get your comment approved faster, and no extra work is required. As long as you’re on my list, or on the list of someone I trust, you’re set. If you’re not, I can still approve you if I want.
I picture an interface in Movable Type (or other software) that lets me add the RSS feeds of the white lists of people I want to trust, and my software would check these lists periodically to look for new or deleted names.
I’d imagine over time you’d get a set of conscientious bloggers who’ve worked hard to moderate their comments and cultivate good lists rising up the “trustworthy white lists” lists, which could result in thousands of loads for a given whitelist every hour. A service like FeedBurner could aggregate the lists to give me a private white list feed containing all my white lists, which would cut down on having each person’s software check each person’s list.
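The polling half might be as little as this – a rough sketch, where the feed URLs and the assumption that a white list feed is ordinary RSS with one trusted OpenID per entry link are mine:

```python
import feedparser  # third-party library: pip install feedparser

WHITELIST_FEEDS = [
    "http://kottke.org/whitelist.rss",         # hypothetical feed URLs
    "http://www.plasticbag.org/whitelist.rss",
]

def refresh_whitelist():
    """Re-fetch each subscribed white list feed and rebuild the set of
    trusted OpenIDs from the entries' links, picking up any names added
    or deleted since the last poll."""
    trusted = set()
    for url in WHITELIST_FEEDS:
        feed = feedparser.parse(url)
        trusted.update(entry.link for entry in feed.entries)
    return trusted
```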

So in theory I think it’s a great idea. I could basically put everyone on my bloglist as an approved commentator and anyone who is on their list would come through too.
However, execution is where it gets hard. I would imagine fake whitelists would get onto the system sooner than everyone thinks. Or people would unfairly not get onto the whitelists. I think a lot more work would need to go into the edge cases to see if the system could be robust enough.
This system is not new, is it? Isn’t this the whole idea behind trusted certificates? I trust this certificate because someone else more important says it’s ok. So there must be some work done on this already to figure out what the failure modes are.
I also think there is room here for a ‘trustability’ index. So I trust anyone who is in my blog roll more than I trust people who are in the blog rolls of people I trust. I trust you and JasonK, but then you also get a lot more trolls than I might. How do I weight trustability, and how do I pass it on?
Also, someone who comments all the time on a friend’s blog should be more trustable than someone who has commented for the first time. Otherwise all you have to do is get one comment approved on one blog (which you can do by writing a genuine comment) and then you can spam everyone on the whitelist.
Still I think it’s a great idea. Just hard (not impossible) to implement. Lots of details.

All we really need is a simple standard to publish one’s trust list, perhaps using relationships in FOAF? (Although you can’t define relationship types.) You’d also want to add an authentication type: obviously we all love OpenID, but some old fogies still use TypeKey 😉 and some people may want to use email address verification. You could also have a trust level that climbs the more a user posts comments, to stop spammers posting one good comment to gain your trust.
Just a couple of extensions to the FOAF namespace and you’re done. I’d love to get involved in this if you need some help (I’m an ASP.Net developer).

If this approach ever became widespread, you’d see the same problem that current spam prevention schemes have: hackers will 0wnz0r people’s accounts and use them to spam all the top weblogs in the world, bypassing their moderation system.

In my opinion, OpenID is techy mumbo jumbo and not a serious ID platform, except for nerds.
One day, perhaps, they’ll invite the real world to use it, but right now my email address, HTTP server, name, Windows Live ID, Google Account, Yahoo ID, driver’s license, webcam, fingerprint, voice and social security number are just a few ways to better identify myself than that bullshit, distributed, GNUesque ID service.
I bet it will hit the world like Linux Desktop OSes, that is, be ignored by everyone except a few per thousand.

I think Tom’s point is that you trust a few key people with whom you share similar commenters, and that will cover the vast majority of your comments. You can then use filtering and moderation for the rest. There’s no need for a massive web of trust that can be compromised.

Absolutely, and because it’s a whitelist it doesn’t matter if some people mark some stuff as spam. You hold one list locally for a first check: if a commenter’s whitelisted, their comment is published straightaway. If not, your software checks online at one or another service for your whitelisted commenters list, which is assembled by aggregating the whitelists of people you trust. If they’re on that list, the comment is published (you could probably score ’em by how many lists they’re on if you’re really protective). If not, it sits in your moderation queue.
If you want to be extra careful, you could explicitly mark something as spam, which would take precedence over anything from outside. The hope would be that humans get through the process really quickly, while spammers don’t. Moreover, any central site could offer up bundles of trustworthy people, or user names it believes should be whitelisted by default, and run that as a service.
I’m definitely not saying that you operate on a peer to peer web of trust model that can be polluted down the line. This is very much ‘you’re currently using Kottke’s whitelist and Caterina’s whitelist of explicit decisions they’ve made about people they’re comfortable posting on their sites – change this preference by…’
There are other problems of course. You can still swamp stuff by throwing enormous amounts of spam at a system so it’s impossible to filter through to find the real people, so other mechanisms would have to be in place (much like today) to try and get rid of obvious spam attacks, but I really feel this could work as part of an ecosystem of prevention methods.
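In rough code, the order of checks I’m describing comes out something like this (every name and structure here is hypothetical):

```python
def moderate(openid, my_blacklist, my_whitelist, aggregated_vouches, min_lists=1):
    """Order of precedence as described above: an explicit local spam
    mark wins, then my own whitelist, then the aggregated whitelists of
    people I trust (optionally requiring a commenter to appear on
    several of them). Everything else waits in the moderation queue."""
    if openid in my_blacklist:
        return "spam"
    if openid in my_whitelist:
        return "publish"
    if aggregated_vouches.get(openid, 0) >= min_lists:
        return "publish"
    return "hold"

# A stranger vouched for by two trusted site owners gets through:
print(moderate("http://carol.example.org/", set(), set(),
               {"http://carol.example.org/": 2}, min_lists=2))  # publish
```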

i have the same issue with trackback spam. last week i had to go and close all posts over a few days old, which is saving me time cleaning out spam each day but is irritating that i have to do it. i would pay a small fee for a whitelist system. the idea that someone could post a legitimate comment, get added to the whitelist and then turn into a spammer worries me much less than a spammer being able to steal a trusted person’s info.

*Love* this idea, Tom.
And hate the centralization idea. Social networks are too diverse, expectations are too contextual. I’d trust everyone that Caterina adds, yes. Not necessarily everyone that Mike Arrington adds.

I think the best way is to have a “trusted rank” that lets people show how much they trust a user (it could be set by number of posts, or by personal relationships); then another blog owner could set a ceiling (only allow un-moderated posts from people with a 50% or higher trust rank, or something).
Just an idea.

Hello Tom, this is the same idea as our HEARTBEAT-ID Personal Internet Portal. You have your own site; there is a directory service to store your trusted application providers, OpenID providers, other trusted web-sites, and your white-list of people who are allowed to post to your site/blog – call it your cross-site reputation list. The only difference from your idea is that people can use their applications from this Personal Internet Portal, avoiding most of the security-related risks that depend on end-point devices.
Have a look at my blog at http://www.thinsia.com/blog
Roland Sassen

But the question here is the level of trust. I mean, this is an anti-spam mechanism. All you’re saying is that Mike wouldn’t want to put spam on his site, not that you like his readers. And again, my intention is that you choose precisely whose whitelists you want to use, with some being packaged up in clumps by whitelist providers as particularly trustworthy.
I think it’s a mistake to call it a social network though. You’re literally just sharing a resource with other site owners, much like you might share lists of blacklisted IPs with a friend. Only here, other site owners just say which of their comments come from real people.

Great idea, I reckon. It would work especially well for weblogs which share regular commenters, though I’m really not sure I’d want a whitelist from a popular, comment-heavy site.
“All we really need is a simple standard to publish one’s trust list, perhaps using relationships in FOAF?”
There’s a proposal for grouping OpenID URLs, which looks like it might be pretty useful for this sort of thing.

Actually, this makes good sense IMO!
The danger with the “web-o-trust” model, with more than 1 degree of separation, is from propagation of errors in a hard-to-spot and hard-to-fix way… with only 1 degree of separation, I think it’s likely to work a lot better. In effect, you’re specifying a short list of trusted third parties (your social group) who you trust to approve posters for you.
There is still another issue, though. I don’t think this could work as the sole anti-spam mechanism… wouldn’t one still have to dig through the non-whitelisted “possibly spam” list now and again, to pick out the good comments, anyway?

It’s a great idea – basically eBay trust mapped onto commenting. I’d accept a comment from someone on my blog according to the same value judgements I apply to displayed trust when I’m buying on eBay. But as discussed above, management of this is unbelievably tricky.
Two things:
1 – It would be relatively easy to spoof/hack. If trust is a measure based on the number of blogs that trust me to comment, it would be very easy to create a hundred fake blogs to create trust in my ID so I could then get into high-traffic blogs and spam.
2 – If we allow users to select the blog recommendations they trust, to prevent the scenario above, does this not create an ever-tightening and closing circle of ‘elite’ bloggers and commenters, thus destroying the original open nature and purpose of what we’re doing?

I think pro-account microcredits are the best way to deal with spam in email and comments. We’ll be going for that with our weblog platform, which will prelaunch as a text-only, free* service on February 14 at wordly.org.

* wordly.org will be funded directly out of my own pocket, as good will, starting with 1000 free accounts. So if you think this is commercial spam – I can assure you we won’t have any ads, “powered by” messages, nor sponsor messages. Only ultra-pure text logs, wholeheartedly respecting the latest W3C recommendations.

Hmm. So to my knowledge, the only already existing closed silo of trusted human entities is at Ebay.
Trust by its nature involves a trustee and a truster, so when you realise that ‘someone’ you trust is a bot, your negative reaction should degrade the reputation of whoever originally trusted the bot.
Also letting in trusted parties should be controlled with a slider most easily measured in ‘degrees of separation’.
Centralization: unfortunately it’s quite useful, I believe that Google spam filtering and Akismet both work well because of their centralized structure.
OpenID is quite fine, consider also mIDm.

Thereto, I just must say: I think it’s better if the “whitelist” is at the reader’s end and not the publisher’s.
Some readers are easily offended by virtually anything (like me just being transsexual), while people like myself don’t mind direct death threats too much (since I get real, physical death threats anyway) and totally defend freedom of thought and speech.

The reason that this won’t work is in your inbox every morning: those chain emails forwarded by well-meaning friends.
Trusting that someone is a real person is emphatically not the same as trusting their decisions about who else is a real person.
Like you, there are probably a thousand or more people online that I’d happily whitelist. A spammer will have a hard time getting on to my personal whitelist, because I’m quite savvy that way. However, that’s not the case for my well-meaning but hapless friend Credulous Bob, who sends me all those emails about Craig Shergold and mysterious Nigerian benefactors.
The more people I’m trusting to make the whitelist decision, the greater the chances that one of them is like Credulous Bob. Or even has a momentary Credulous Bob moment (it can happen to us all) when whitelisting someone. A whitelist-sharing scheme will only be as robust as the very weakest link, and any mistake will be propagated throughout the community. Evil Edna only needs to hoodwink one Credulous Bob and she can spam freely across gazillions of blogs.

This will not work. All it will do is create a market in manufacturing identities which have gotten past one of the ‘whitelist club’ participants. Think about it from this point of view and it’s obvious how the system will be gamed within days of its being set up.
The real barrier to spam will have to be some sort of micropayment you have to make to post a comment. A good idea would be to have a checkbox associated with your identity – something like (a) give the money to the site owner, (b) donate it to offset carbon emissions, (c) give it to the Gates Foundation, or some such.

Okay, people are again not actually reading the post. To ZF first – you’re thinking about this all wrong. I’m not proposing one centralised whitelist that everyone in the world uses; I’m proposing individuals posting their personal whitelists and you choosing which of those individuals you trust. An organisation could come along and say, “I am going to combine the white lists of Jason Kottke, Ev Williams, Techcrunch and ten other sites and make that available for everyone” if they wanted, but no one would be compelled to use it. The absolute worst-case scenario is that you as a weblog owner simply use the whitelists of two of your closest friends and those alone. Some comments that are then posted get published immediately, and you have almost total trust that they’re from real people. Best case, certain sites get a reputation for handling a lot of different comments and the vast majority of humans are allowed through immediately.
To Michael Mouse: I think the problem you’ve got there is the idea of propagation through a system. I’m not saying that things on Kottke’s whitelist get added to my own whitelist. My whitelist would be restricted to only the things I’ve explicitly and personally said I believe to be from real people. There is no propagation at all, and should you discover that there is a weak link in your network (i.e. that someone else keeps trusting people who turn out to be spammers), you’d simply stop using their whitelist in future. I’d recommend having a piece of functionality in the weblog publishing software that wrote against the name of each commenter which whitelist they were approved against. That way whitelists could get a reputation for trustworthiness or untrustworthiness. Sure, in these circumstances, if Kottke approved a bunch of spammers while I had his whitelist turned on, I’d get spam. But, as I’ve said, once I noticed this, all I’d have to do is stop trusting his whitelist. Easy.
The biggest problem with this as an approach would be individuals writing real comments with an identity around the place to get on people’s whitelists and then turning it over to automation for spam. But to be honest, even this isn’t such a big problem. First off, it really raises the effort bar on posting spam in the first place. To get on a bunch of people’s sites you’d have to get on a bunch of whitelists, which means you’d have to post a number of real comments on a number of people’s sites and get approved by them all. Only then would you be able to spam those sites at all.
I’m also interested in the possibilities of global despamming based on OpenID. Given that the same OpenID would have to be used across multiple sites to get onto whitelists, and only then could it spam people, you’d really only need someone you trusted to describe it as a spammer for your local service to automatically unpublish its comments and ask for your approval again. Could be very easy indeed.
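A sketch of the bookkeeping that would support both ideas – recording which whitelist vouched for each commenter, and unpublishing on a trusted spam report. All the structures here are hypothetical:

```python
approved_via = {}       # openid -> the subscribed whitelist that vouched for them
published = []          # (openid, comment) pairs currently live on the site
moderation_queue = []   # comments waiting for manual approval

def record_approval(openid, comment, source_whitelist):
    """Note the provenance of every auto-approved comment."""
    approved_via[openid] = source_whitelist
    published.append((openid, comment))

def report_spammer(openid):
    """A trusted peer flags this OpenID as a spammer: unpublish their
    comments, re-queue them for manual approval, and return the
    whitelist that vouched for them so its reputation can be marked down."""
    for item in [p for p in published if p[0] == openid]:
        published.remove(item)
        moderation_queue.append(item)
    return approved_via.get(openid)
```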

Certainly an interesting and worthwhile idea. I’ll be writing up my thoughts shortly on this.
I do worry about creating a two-tiered web, where the well-connected and well-known get their comments published first and the less frequent or lurking folks, who have just as relevant things to say, will have their ideas relegated to a high-latency waitlist. If you’re new to blogging or to commenting, I could see this as being quite a hostile environment for breaking into the ranks.
At the same time, I think it has its use as part of a full litany of anti-spam measures, and not simply taken alone. As a publisher, I would like more help wading through Akismet’s pages upon pages of spam — separating out the comments that *might* be spam from the ones that most certainly are.
Gmail already uses a similar whitelisting approach with your email, using your address book as one criterion for detecting spam. I know that when I created a new Gmail account with an empty address book, the incidence of spam that ended up in my inbox was much higher than after I imported my address book. As such, the social whitelist, powered by XFN + OpenID, could be a great asset for facilitating the processing of one’s comment inbox.

Great idea, and having scanned the comments above, something that could be implemented without too much pain.
There’s still the issue of working out whose comment moderation skills are to be trusted, but that’s solvable.
My main experience with this has been in the email marketing sphere, which is already facing and tackling the issues of authentication and reputation. The major sea-change in that industry is the move away from pure authentication systems (SenderID, DomainKeys, etc.) towards reputation-based systems.
So tracking authenticated systems over a period of time builds a reputation. The central resources needed to build this and the human intervention involved are relatively substantial, and whilst there’s an obvious commercial angle for email marketing, that could present more of a challenge in the blog world.

Simon Willison was great at FOWA. The excitement of riding a viral project with massive adoption really came across. He certainly won the crowd over; Tariq (netvibes) and Kevin Rose (digg) committed to OpenID.
A big piece of the online community reputation puzzle is coming together.

What if people had to pay to leave a comment? You know, like offering their two cents’ worth. They could buy comment credits (through PayPal, for example) and then pay with those credits. The credits could even be charged on a per-character basis, so that longer comments would be more expensive. You could charge extra for people to leave images or links. Would spammers pay to leave spam? I wouldn’t mind paying a tenth of a cent to leave this comment.

Comments are closed.