Tuesday, 15 September 2009

RSS never blocks you or goes down: why social networks need to be decentralized


Recurring outages on major networking sites such as Twitter and

LinkedIn, along with incidents where Twitter members were



mysteriously dropped for days at a time,

have led many people to challenge the centralized control exerted by

companies running social networks. Whether you're a street

demonstrator or a business analyst, you may well have come to depend

on Twitter. We may have been willing to build our virtual houses on

shaky foundations when they were temporary beach huts; but now

we need to examine the ground on which many are proposing to build our

virtual shopping malls and even our virtual federal offices.





Instead of the constant churning among the commercial sites du

jour (Friendster, MySpace, Facebook, Twitter), the next

generation of social networking increasingly appears to require a

decentralized, peer-to-peer infrastructure. This article looks at

available efforts in that space and suggests some principles to guide

its development.





Update: a few days ago, OpenID expert Chris Messina and

microblog developer Jyri Engeström published



an article with conclusions similar to mine;

clearly this is a felt need that's spreading across the Net.

Interestingly, they approach the questions from a list of what

information needs to be shared and how it needs to be transmitted; I

come from the angle of what people want from each other and how their

needs can be met. The two approaches converge, though. See the

comments for other interesting related blogs.


The peer-to-peer concept





The Internet was originally a parliament convened among peers. Every

host was a server, almost always providing file downloads and usually

email as well. To this day, ISPs 'peer' when they accept data from one

ISP's customer and deliver it to the other ISP's customer.





To peer doesn't mean simply to be of equal status--in fact, that

notion could be misleading, because two systems with vastly different

roles and resources can peer. More importantly, to peer means to have

no intermediary.





When the architecture requires an intermediary, it should play as

unobtrusive and minimal a role as possible. For instance, Napster and

Skype have central servers, but they are used just to sign up

participants and set up connections among them.





Napster's and Skype's partial decentralization won them a key benefit

of peer-to-peer networking that Twitter could well take note of: they

offload most traffic from their central servers to the users and the

ISPs that connect them.





But being partially centralized means the service can still be

disrupted as a whole. Napster was shut down by a court ruling; Skype

shut itself down once through a programming error that it never

clearly explained to the public.





The Internet itself quickly developed into this hybrid model as well.

Modems and terminals created a new layer of second-class citizens,

vastly expanded by the PC revolution. These Internet users were tucked

away behind firewalls and blocked from using any services not approved

by system administrators.





By the year 2000, new companies springing up in the dot-com boom found

themselves frustrated by these restrictions, and designed their

innovative protocols to deliver data over port 80 because everybody

kept that open for Web traffic. When the practice started, traditional

Internet developers derided it as 'port 80 pollution.' Now it's called

Web Services.





As happens so often, the way forward proved to be the way

backward--that is, to restore the democracy of the early Internet--and

also predictably, was pioneered by outlier movements with dubious

legality, ethics, and financial viability. Napster made the first

impact on public consciousness, followed by services that rigorously

avoided any hint of central servers (see my 2000 article,



Gnutella and Freenet Represent True Technological Innovation).





By the end of 2000, the term peer-to-peer had become a

household word. But the movement quickly went into retreat, facing

difficult design problems that were already under discussion in the

O'Reilly book

Peer to Peer,

published in February 2001. I summarized the problems, which remain

ongoing, in the articles



From P2P to Web Services: Addressing and Coordination
and



From P2P to Web Services: Trust.





The issue of addressing would arise right away for a social network

developed in a pure peer-to-peer fashion. How would you check whether

your old college buddy was on the network, if you couldn't query a

central server? And how could you choose a unique name, without a

single place to register? Names would have to be qualified by domain

names or some other identifiers--which is actually a step forward

right there. It seems to me ridiculous that a company would plan to

provide a service to the whole world using a flat namespace. And while

we're at it, you ought to be able to change your name and bring along

all your prior activity.
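
A minimal sketch of what domain-qualified names might look like, in Python. The `user@domain` syntax, the `Profile` class, the `stable_id` field, and the `rename` method are all assumptions made up for illustration, not part of any existing protocol. The only point is that activity tied to a stable identifier, rather than to the visible name, can follow you through a name change.

```python
from dataclasses import dataclass, field
import uuid


def parse_handle(handle: str) -> tuple[str, str]:
    """Split a hypothetical 'user@domain' handle into (user, domain)."""
    user, sep, domain = handle.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a domain-qualified name: {handle!r}")
    return user, domain.lower()


@dataclass
class Profile:
    handle: str
    # Assumption: activity hangs off this stable identifier, not the name.
    stable_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    activity: list[str] = field(default_factory=list)

    def rename(self, new_handle: str) -> None:
        """Change the visible name; stable_id and activity are untouched."""
        parse_handle(new_handle)  # reject names that are not domain-qualified
        self.handle = new_handle


profile = Profile("andy@example.org")
profile.activity.append("first post")
original_id = profile.stable_id
profile.rename("andrew@example.net")
```

After the rename, `profile.stable_id` is still `original_id` and the earlier post is still attached, which is the portability asked for above. How peers would discover the new name is left open here.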





Trust would also become an issue in decentralized social networks. You

could ban a correspondent from your personal list, but you couldn't

inform a central authority about abuse. And the problem Twitter has

recently started to tackle--preventing random users from impersonating

well-known people--would be a challenge.





But decentralization brings many benefits. A failure at one person's

site, or even on a whole segment of the network, would have no effect

on the rest of the world. A misconfigured router in Pakistan could not

keep everyone from accessing the most popular video content on the

Internet. And because each peer would have to obey common, understood

protocols, a decentralized social network would be transparent and

support the use of free software; nobody would have to puzzle over

what algorithms were in use.





Visiting many different sites instead of a central server to pull

together information on friends would increase network traffic, but

modern networks have enough bandwidth to stand up to the load. Even in

places with limited bandwidth, service would degrade gracefully

because messages would be small.
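
To make the idea concrete, here is a rough Python sketch of a client that builds a timeline by reading each friend's own feed rather than asking one central server. The site names and feed contents are invented, and the feeds are inline strings standing in for what would really be fetched over HTTP. A feed that fails to parse is skipped, so one broken site does not take down the rest of the timeline.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Stand-ins for feeds fetched from two friends' own sites (hypothetical data).
FEEDS = {
    "alice.example.org": """<rss version="2.0"><channel><title>Alice</title>
      <item><title>Lunch?</title>
        <pubDate>Tue, 15 Sep 2009 12:00:00 GMT</pubDate></item>
    </channel></rss>""",
    "bob.example.net": """<rss version="2.0"><channel><title>Bob</title>
      <item><title>Server is back up</title>
        <pubDate>Tue, 15 Sep 2009 13:30:00 GMT</pubDate></item>
      <item><title>Server is down</title>
        <pubDate>Tue, 15 Sep 2009 09:00:00 GMT</pubDate></item>
    </channel></rss>""",
}


def merge_timeline(feeds):
    """Return (when, site, title) tuples from every feed, newest first."""
    entries = []
    for site, xml_text in feeds.items():
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError:
            continue  # a failure at one site has no effect on the others
        for item in root.iter("item"):
            when = parsedate_to_datetime(item.findtext("pubDate"))
            entries.append((when, site, item.findtext("title")))
    return sorted(entries, reverse=True)


timeline = merge_timeline(FEEDS)
for when, site, title in timeline:
    print(when.isoformat(), site, title)
```

The cost is one request per friend instead of one request in total, which is the extra traffic the paragraph above refers to.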





The

StatusNet

project, which underlies

identi.ca,

represents a half-way step toward the kind of full decentralization

illustrated by RSS. StatusNet can power a variety of microblogging

services, each signing up any number of members. The services can

interchange data to tie the members together.





The rest of this article looks at two possible models for a

distributed social network (RSS and XMPP), followed by an examination

of the recurring problems of peer-to-peer in the social networking

context.



Possible models





Many examples can be found of filesystems, version control systems,

and other projects that lack central servers. But I'm just going to look

at two protocols that other people are considering for decentralized

social networking.





When thinking of decentralized systems for sending short messages, RSS

and Atom have to come to mind first. They're universal and work well

on a large scale. And Dave Winer, the inventor of RSS, has created an

enhanced version called



rssCloud,

recently incorporated into WordPress.





Given the first question I asked about decentralization--how do you

find the people you're looking for?--the RSS answer is 'by

serendipity.' Like everything else on the Internet, you could come

across new treasures in many ways: surfing, searching, friends, or

media outlets. Lots of bloggers provide links from their sites to

their own faves. And RSS has developed its own ecosystem, sprouting

plenty of aggregators that offer you views into new fields of

information.





rssCloud is meant to carry more frequent traffic and more content than

the original RSS and Atom. It maintains an XML format (making it

relatively verbose for SMS, although Winer tries to separate out the

rich, enhanced data). Perhaps because of the increased traffic it

would cause, it's less decentralized than RSS, storing updates in

Amazon S3.





XMPP was invented about the same time as RSS by a programmer named

Jeremie Miller, who wanted a standard instant messaging protocol with

tags that could support semantics, and therefore powerful new

applications. Most important, his creation, Jabber, made it possible

for individual users to run their own servers instead of depending on

America Online or Yahoo!. Jabber had the potential to complement Tim

Berners-Lee's idea of a Semantic Web.





Because Jabber used XML, it was seen as a bit heavyweight, and the

servers were reportedly hard to configure. But the possibilities were

too promising to pass up. So the IETF formalized it, gave it a clumsy

name suitable for a standard, and released a set of RFCs about it.

Unfortunately, XMPP languished until Google adopted it for their Talk

and Wave services. These high-profile applications suggest that it has

the scalability, flexibility, and robustness for social networking.



The P2P problems, in today's context





Even if decentralized protocols and clients were invented, there would

be a long road to democratizing social networks. The messages are

expected to be lightweight, so photos and other large batches of

content would have to be stored somewhere outside the messages. Most

users wouldn't trust their laptops (much less their mobile devices) to

store content and serve it up 24 hours a day, so they would need a

cloud service, which might or might not be distributed.





A backup service is also necessary in order to recover from a local

disk failure or other error that wipes out several years of your

accumulated identity.





Problems such as impersonation and unsolicited communications (spam)

are hard to solve in decentralized systems because trust is always a

hierarchical quality. This is true everywhere in life, beyond the

level of a family or neighborhood. We expect our professors to be good

because they were hired by the college, and expect the college to be

good because it was accredited by a centralized facility, whose

managers were in turn appointed by elected officials. This system can

and does break down regularly, so mechanisms for repair are always

built in.





Nobody can be banned from a decentralized social network because

there's nothing to ban them from. But there are ways to re-introduce

enough centralization to validate credentials. For instance, the

American Bar Association could register lawyers in good standing, and

you could check whether someone claiming to be a lawyer in the US was

registered. But we wouldn't want to take this process too far and

create a web of accreditations, because that would devalue people

whose skills and viewpoints lie outside the mainstream.





You could still check whether someone shares friends with you, because

one person's claims of friendship could be traced back to the sites he

claims to be friends with. Someone could game the system by setting up

fake sites claiming to be people you know and linking back to them,

but this is a huge amount of work and leaves the perpetrator open to

arrest for fraud. Free software developer Thomas Lord suggests that

identity could also be verified through 'a fairly shallow and

decentralized hierarchy of authentication like the system of notary

publics in physical life.'
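
The shared-friends check described above can be sketched in a few lines of Python. Everything here is hypothetical: the site names, the `CLAIMS` table (standing in for friend lists that each site would publish itself), and the rule that a friendship counts only when both sites claim each other.

```python
# Each site publishes the friends its owner claims (hypothetical data).
CLAIMS = {
    "me.example.org": {"carol.example.org", "dave.example.net"},
    "carol.example.org": {"me.example.org", "stranger.example.com"},
    "dave.example.net": {"me.example.org"},
    "stranger.example.com": {"carol.example.org", "dave.example.net"},
}


def confirmed_friends(site, claims):
    """Friends claimed by `site` whose own sites claim `site` back."""
    return {f for f in claims.get(site, set()) if site in claims.get(f, set())}


def shared_friends(me, other, claims):
    """Friends both parties have, counting only reciprocated claims."""
    return confirmed_friends(me, claims) & confirmed_friends(other, claims)


shared = shared_friends("me.example.org", "stranger.example.com", CLAIMS)
print(shared)
```

Here the stranger claims Dave as a friend, but Dave's site does not return the claim, so only Carol counts as shared. This does nothing against the fake-site attack mentioned above; it only shows that a one-sided claim is cheap to detect.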





All in all, the problems of finding people and trusting people

suggest that there's a role for aggregators, just as in the case of

RSS. And these aggregators could also offer the kind of tracking

services (who talked about me today?) and statistical services (is

Michael Jackson's death still a major topic of conversation?) that get

business people so excited about Twitter. A decentralized social

network could still be business-friendly, because traffic could be

analyzed in order to target ads more accurately--but hopefully,

because peering clients are under the users' control, people who

didn't want the ads could configure their systems to screen them out.





When you set up an account, you could register with aggregators of

your choice. And whenever you connected to someone, you could

automatically register his account with a list of your favorite

aggregators, in case he hadn't registered himself. If people wanted

control over where they're aggregated, I suppose something equivalent

to a robots.txt file could be invented. But it's not sporting

to refuse to be counted. And there's no point in invoking privacy

concerns--face it, if the NSA wants to read your tweets, they'll find

a way.





So those are some of the areas where the problems of P2P and social

networking intersect. Let's remember that current social networks are

far from solving problems of findability, trust, and persistence

either. I don't check how many followers I have on Twitter; I figure

most of them are spam bots. (Apologies to any of my followers who

actually are sufficiently embodied to be reading this.)





Could

OpenSocial

be used to implement a P2P social network? It's based on a single

object that is expected to query and update a single server. But the

interface could probably be implemented to run on a single user's

system, registering the users or aggregators with whom she

communicates and querying all those users and aggregators as

necessary.





Industry analysts have been questioning for years whether Twitter is

financially viable. Well, maybe it isn't--maybe this particular kind

of Internet platform is not destined to be a business. Responsibility

for the platform can be distributed among millions of sites and

developers, while business opportunities can be built on top of the

platform as services in analytics, publicity, and so forth.





Like Google, Twitter and the other leading commercial Internet sites

have made tremendous contributions to the functionality of the

Internet and have earned both their popularity and (where it exists)

their revenue. But the end-to-end principle and the reliability of

distributed processing must have their day again, whenever some use of

the Internet becomes too important to leave up to any single entity.

