Re: [Social-discuss] Which of four projects are we doing?

2010/3/27 Carlo von Loesch <address@hidden>

I'm catching up with social-discuss and still see the four
projects described by elijah, although (2) the protocol and
(4) the local daemon are closely tied to each other. I have
some questions and comments on many topics that have been
discussed... PHP, P2P, DHT, XMPP etc. I'm presuming that
our grand objective would be to do something like Facebook
in a decentralized high scalability free software way -
and maybe even better, so it provides incentive for people
to enter the next dimension of social networking.

Henry Litwhiler wrote:
> It's all data, and the data is on the web.

I profoundly disagree with this. We are talking about
private data of private people. The web needs special
requirements in order to support private data, and the
web has no concept of the web of trust it takes to
figure out how much of a person's data should be made
available to this or that person requesting it. So
as it stands the data we need is NOT on the web.
We can MAKE it available via HTTP, but we first need
to discuss whether HTTP is an acceptable protocol to
this purpose. I'd prefer to think we can extend the
concept of the "web" to better suited protocols, and
still support formats like RDF where it makes sense
even if HTTP is not employed.

I think HTTP should be supported. I'm not saying it's the only protocol that should be supported, but it has had some success in scaling in a decentralized way.

For those not familiar with RDF, it's quite a simple concept, something not always explained very well.

When the web was invented back in 1989, the basic idea was to give every document a global identifier (URI), so that you can link from one document to another via hyperlinks. This simple architectural principle (global namespaced links) has enabled the distributed web of documents. Note that URIs are not HTTP specific, but HTTP is what much of the web is built on.

RDF applies the same simple principle to data. You give each data point a global identifier (you can make identifiers local, but the real power comes from making them global), and each data point is allowed to have one or more key/value pairs (whose values can in turn be links to other data points). That's basically it. Simple but powerful.

It's really quite simple, but since there are so many tools taking advantage of RDF, it's easy to be distracted from the underlying principle. Making data points global incurs some overhead, but it gives you much more scope for creating distributed architectures.
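
To make the idea concrete, here is a minimal sketch in plain Python (no RDF library). It only illustrates the principle of globally named data points with key/value pairs. The example.org URIs are made up for illustration; the FOAF namespace and the "name" and "knows" keys come from the FOAF vocabulary.

```python
# Each data point is named by a global identifier (a URI) and carries
# key/value pairs; a value may itself be a URI linking to another data point.
# All URIs under example.org are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"

graph = {
    "http://example.org/alice#me": {
        FOAF + "name": ["Alice"],
        FOAF + "knows": ["http://example.org/bob#me"],
    },
    "http://example.org/bob#me": {
        FOAF + "name": ["Bob"],
    },
}

def triples(graph):
    """Flatten the graph into (subject, key, value) statements."""
    for subject, properties in graph.items():
        for key, values in properties.items():
            for value in values:
                yield (subject, key, value)

def follow(graph, subject, key):
    """Return the data points that `subject` links to via `key`."""
    return [graph[v] for v in graph.get(subject, {}).get(key, []) if v in graph]

# Follow Alice's "knows" links and read the names of the linked data points.
friends = follow(graph, "http://example.org/alice#me", FOAF + "knows")
names = [f[FOAF + "name"][0] for f in friends]
```

Because the identifiers are global, the two entries could just as well live on two different servers; the link from Alice to Bob would still resolve.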

What makes you think the web is unable to do a web of trust? There are over 100 million FOAF profiles out on the web (including Google's), each showing links to people who are known or followed. That is already a form of web of trust, and there's much more that can be built out quite easily, for example with the proposed web of trust vocabulary (http://xmlns.com/wot/0.1/). OK, more needs to be built out, but there's nothing architecturally that prevents us from building a global web of trust too. In fact, I think that will quickly emerge as both a requirement and a key advantage of GNU Social, but this is probably a discussion for a later date ...
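
As a rough illustration of how "knows" links could feed an access decision, here is a small sketch. The policy (share with anyone within a fixed number of hops) is purely an assumption for the example; it is not what the WOT vocabulary specifies, and the names and links are hypothetical.

```python
from collections import deque

# Hypothetical "knows" links, as they might be gathered from FOAF profiles.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": ["erin"],
    "erin": [],
}

def hops(knows, source, target):
    """Breadth-first search: number of knows-links from source to target, or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for friend in knows.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None

def may_view(knows, owner, requester, max_hops=2):
    """Assumed policy: the owner shares with anyone within max_hops of them."""
    distance = hops(knows, owner, requester)
    return distance is not None and distance <= max_hops
```

With these links, dave is two hops from alice and would be allowed under the default policy, while erin is three hops away and would not.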

Matt Lee wrote:
> GNUnet is not a good choice and it immediately breaks the design goal of
> having this run on commodity webhosting.
> That's why having it work in PHP with a minimal database is key.

No Matt, I have to disagree with this impossible design goal.
If we stay in PHP playground land without a proper backend protocol
we will only be able to make yet another web-based social engine
without a scalable real-time link to the rest of the world.

I'm not 100% sure I follow this. I thought Facebook was built mainly on PHP?

A social network in a Facebook style generates events a go-go.
Each time a user adds a comment somewhere, each time a user likes
something, writes an update, joins a group or adds a friend.
Every time a notice needs to be distributed to all peers.
This is a one-to-many operation that hasn't got a ghost of a chance
of scaling if implemented as a round-robin series of HTTP calls.

It's an architectural and mathematical challenge for sure, perhaps one reason why no one has yet solved this problem. I'm not saying HTTP is the best protocol for this, but sending an update to 40 friends is probably no more complex than loading a web page with 40 images on it, something most people probably do every day.
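
To put rough numbers on the one-to-many question, here is a small sketch that counts how many messages the sender has to emit for one update. It compares one call per recipient with one message per remote server, where each server then delivers to its own local users. The user@server address format and the 40 friends on 4 servers are assumptions for illustration, not measurements.

```python
from collections import defaultdict

def unicast_sends(recipients):
    """One request per recipient, as in a round-robin series of HTTP calls."""
    return len(recipients)

def per_server_sends(recipients):
    """One message per remote server, which then delivers to its local users."""
    by_server = defaultdict(list)
    for address in recipients:
        user, server = address.split("@", 1)
        by_server[server].append(user)
    return len(by_server), dict(by_server)

# 40 hypothetical friends spread over 4 hypothetical servers.
friends = [f"user{i}@server{i % 4}.example" for i in range(40)]
direct = unicast_sends(friends)                # 40 outgoing messages
grouped, routing = per_server_sends(friends)   # 4 outgoing messages
```

The saving depends entirely on how many friends share a server. If every friend runs their own server, both approaches send the same number of messages.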

Is there some magic trick I am not familiar with that allows us to do
real protocols on persistent TCP connections on so-called "commodity
webhosting" or should we rather create such a profoundly important
technology that will influence "commodity webhosting" in such a way
that it will become common to support gnu social?

Matt Lee reported from LibrePlanet 2010:
> Two members of the conference, Ian Denhardt (from GNU social) and Ryan
> Prior, a student from Wisconsin, gave an impromptu brainstorm on GNU
> social, with PubSub, OAuth, OpenID and FOAF mentioned.

Several of these interfaces are HTTP-based and not fast enough
for inner federation real-time interactions. This will not scale.
But we can use them to gateway to external applications.

Ted Smith wrote:
> I fully agree with this - I see GNU Social as most optimally having a
> server daemon that does "real work" whatever that is, with UI's in
> various languages/models (web UI, GTK/Qt/Whatever UI, etc.). GNUnet does
> this, and I think it's the right choice.

Yes, this is the architecture we should build. We need a protocol that
makes it efficient and easy to:
* send many events to many recipients in a shotgun fashion like a
  multiplayer distributed computer game
* support basic concepts of subscription, friendship, web of trust
* easy addressing scheme that can be exchanged in a bar or a bus
* encryption friendly on various levels
* ability to share binary data with people?
* data agnostic?

What are the candidates?
* GNUnet? I'm not familiar with this yet.
* PSYC (native multicast support, binary transparent, lean)
* XMPP (popular, verbose, no native multicast, no binary transfers)
* various computer game protocols maybe?

Both PSYC and XMPP already have several social features, an easy addressing
scheme (PSYC is a bit nerdy, XMPP is spam-friendly by using email syntax -
we had a discussion on the usefulness of http urls for people earlier)
and possibly web tools available already. PSYC has a builtin notion of the
web of trust. Don't know about other protocols. There is also the question
of architecture.. decentralized intelligent servers or P2P with a DHT?

Sylvan Heuser wrote on the cons of a P2P architecture:
> Apart from caching (which would generate new problems), when the
> machine or connection (This also applies for SheevaPlugs) of a user is
> down, the profile of this user will also be inaccessible.

Both can be addressed. If profile data isn't cached but rather multicast
actively to all intended recipients, then they can view and work with it
at any time. Henry Litwhiler's description of GNUnet suggests that it would
do such a job. It does however require a server-based spooling facility
somehow when recipients of messages are currently offline to ensure they
will later operate on the current set of profile data. If this is the case,
people can even leave comments on a status update of someone who went
offline - the drawback here is, such a comment would wait in the message
queue of the update's author and only be multicast to his subscribers
when the author comes back online - so it wouldn't feel like Facebook.
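
The spooling idea described above can be sketched roughly as follows. This is a toy in-memory model under my own assumptions (class and method names are hypothetical), not how GNUnet or PSYC actually implement it: messages for online recipients are delivered at once, and messages for offline recipients wait in a queue until they reconnect.

```python
from collections import defaultdict, deque

class Spool:
    """Hypothetical server-side spool: holds messages for offline recipients."""

    def __init__(self):
        self.online = set()
        self.pending = defaultdict(deque)   # recipient -> queued messages
        self.delivered = defaultdict(list)  # recipient -> messages received

    def send(self, recipients, message):
        """Deliver to online recipients now; queue for everyone else."""
        for recipient in recipients:
            if recipient in self.online:
                self.delivered[recipient].append(message)
            else:
                self.pending[recipient].append(message)

    def connect(self, recipient):
        """Mark the recipient online and flush their queue in arrival order."""
        self.online.add(recipient)
        queue = self.pending.pop(recipient, deque())
        while queue:
            self.delivered[recipient].append(queue.popleft())

    def disconnect(self, recipient):
        self.online.discard(recipient)

spool = Spool()
spool.connect("bob")
spool.send(["bob", "carol"], "alice: status update 1")
spool.send(["bob", "carol"], "alice: status update 2")
waiting = len(spool.pending["carol"])  # carol is offline, so both updates wait
spool.connect("carol")                 # carol catches up, in order
```

The same queue is where a comment addressed to an offline author would sit, which is the delay described above.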

Sylvan Heuser wrote on the cons of a traditional servers architecture:
> It is nothing really new, just a mild form of centralization.

I disagree on calling the traditional Internet architecture where
everyone runs her own server if she likes to, "mildly centralized."

Concerning the privacy of your data I presume we agree that we want
all transactions to be encrypted so that only the intended recipients
of our data receive the data they are intended to receive - it is
however also clear that you shouldn't have to keep your data secret
from your own server, as it is the same data you intend to share with
your friends anyway. I mean, you can't share things with your friends
without trusting them. A truly P2P trust approach is feasible even if
distributing servers are involved - in that case however you can't
easily have a web-based Facebook-like interface. It would have to
run on your own computer to be truly private. We should consider this
scenario, where a gnu social web-app is deployed as a local private client
for the purpose of keeping data off servers as best as possible.

I think you've raised a key point: you should be able to choose who gets to see what, based largely on who you trust (e.g. your friends).

elijah wrote:
> The simple fact is that p2p applications are horrible at storing
> persistent data. You need to do what wuala does: provide a reason to
> keep the client open while also providing enough cloud servers to keep
> the distributed hash table from degrading with the intermittency of
> clients.

How much load can a DHT really take? How slow or fast does it operate?
And can we get a DHT to push and multicast information? It is not enough
if the peers of a person have to poll a DHT to find out what's new.
We need information to reach the peers as fast as possible and as
efficiently as possible. A DHT would have to provide multicast distribution
trees and inform subscribers of changes - but that doesn't make sense,
since a subscriber wouldn't know which DHT entry to subscribe to, or
would she?

Once we have figured out how we can solve the very fundamental problem
of getting all information everywhere on time we can start thinking on
how to make it look good - be it with php-based web interfaces or
native client applications. Then again, if you PHP friends are already
plugging together a GUI (don't we already have several GPL social web
apps?) that doesn't harm if you are ready to pass all network transactions
to a lower layer, an independent gnu social daemon.

From what I gather about NEPOMUK/semanticdesktop.org it serves a different
purpose and could be integrated as a sort of client to the gnu social
network daemon.

The KDE stuff is getting more and more impressive. Ideally, it should be immediately compatible with GNU Social. For example, I should be able to add a rating to a file in my file explorer (Dolphin), then choose to securely share it with one of my contacts on GNU Social, all as part of an interlinked subsystem. Since Nepomuk talks RDF, it's immediately compatible with all other systems that use linked data.

I guess we need to sort out the list of requirements in the wiki
to ensure we have the complete picture of what needs to be done
and what the options are, before we go into voting and pick our choices.
I'll start playing with http://groups.fsf.org/wiki/Group:GNU_Social/Ideas
soon. Maybe we should even employ some liquid democracy for more elaborate
voting than just +1 or -1 on the mailing list.

Pablo Martin wrote:
> and the best we can hope is they will be
> interconnected by having solid protocols for federation, and that
> everyone will be able to choose the server, software and language they
> like more to host their data, so... if we can connect the ones we have
> we will be there. as such maybe gnusocial should focus more on the
> federation technologies or protocols that will be considered, and
> sanctioning softwares that comply, then each "team" can implement it on
> their own software. Statusnet (php), crabgrass (ruby), elgg (php), pinax
> (python) ,psyc (weird language :P)...

Full ack.

I'm inclined to agree, but I think you have to start somewhere and do something really well, so the proposed PHP solution seems a good place to start, with ports to other languages to follow.

Story Henry wrote:
> As you know, I believe the web is a p2p decentralised system. So I
> believe that foaf+ssl build on a p2p architecture. It's just that people
> think of it too much in terms of client server.

The things you show that can be done today with a client certificate are
very interesting. You won't be able to do a decentralized Facebook-like
interface, but I can tell foaf+ssl allows for a lot of applications.
It could be yet another way to access the data pool. But I wouldn't say
it's P2P if the data resides on a web server. I presume you don't intend
to shut down the web server each time you put your laptop to sleep, right?

Andrew Gray wrote:
> What if a given university wants to provide a GNU Social server for its
> students? Suddenly, the university has the power to censor any and all
> data that ends up on it, because it is distributing the data.

That is a good question.. in an "intelligent server" approach we need to
tell the server as much data of us as it needs to make our friends happy
which includes knowing who our friends are. We can have extra encrypted
data distributed transparently (binary data multicast useful here) on
top of that, though. That would be a nice beyond-Facebook bonus.

Dan Brickley wrote:
> I'd suggest having a principle that identifiers should be portable and
> associated with domain names that users can control/own. I believe
> XMPP supports this level of indirection via DNS; if you put certain
> structures in your DNS, you can express a delegation to commercial
> services eg. GTalk without that transient commitment to google (or
> your university) being encoded in your friends' buddylist entries.

Good idea, but with XMPP and DNS you need to transfer entire servers
or "domains" which aren't sufficiently lightweight to have one for
each person. PSYC has account forward and redirect functionality.
XMPP would need to be retrofitted with that AFAIR.

Ooph. End of my catch-up.
Thanks for your attention if you made it all the way down here. :-)

Thanks for sharing, enjoyed reading :)

--
___ psyc://psyced.org/~lynX ___ irc://psyced.org/welcome ___
___ xmpp:address@hidden ____ https://psyced.org/PSYC/ _____

From:	Melvin Carvalho
Subject:	Re: [Social-discuss] Which of four projects are we doing?
Date:	Sat, 27 Mar 2010 16:14:44 +0100