A new way to look at networking.
Link: http://video.google.com/videoplay?docid=-6972678839686672840
I find this video incredibly interesting. It's talking about the phase transition that occurred between telecoms networks and packet switched networks, and questioning whether the same can happen between packets and data.
In a routed telecom network, your number 1234 meant move first switch 1, second switch 2 and so on - it described a path between two endpoints. In TCP your packet has an address, and each node in the mesh routes in to the node it thinks is closer to the endpoint with the target address.
Currently I have two or three home machines (netbook, desktop and file server) where data lives locally, plus data 'in the cloud' on gmail.com and assembla.com, or on my site here at tincancamera.com . Though as far as I'm concerned, that data isn't in the cloud - it's on whatever server at those addresses resolve to. If instead I google for 'pete kirkham', about half the hits are me. If I put a UUID on a public site, then once indexed I could find that document by that ID by putting it into a web server.
In the semantic web, there's been a bit of evolution from triples without provenance to quads - triples and either an id or some other means of associating that id with the URI of the origin which asserts it. It's considered good form to have the URI for an object resolvable to some representation of that object, so there's a mix of identifier and address. The URI for kin is http://purl.oclc.org/net/kin, which uses HTTP redirection to point it to the current kin site. This depends on OCLC's generous provision of the persistent uniform resource locator service, and maps an address to another address rather than quite mapping an identifier to an address.
There isn't an obvious analogous mechanism for data to the TCP nodes routing the request to a node which might be closer to having the data, though is some ways a caching scheme with strong names might meet many of the requirements. It sort of brings to mind the pub-sub systems I've built in the past, but with a stronger form of authentication. Pub-sub replication isn't caching, as in a distributed system there isn't a master copy which the cached version is a copy of.
There's also a bit of discussion between broadcast and end-to-end messages; I've got to work out how to do zero-config messaging sometime, which again sort of makes sense in such systems - first reach your neighbours, then ask them for the either the data, or the address of a node which might have the data. But there still isn't an obvious mapping of the data space without the sort of hierarchy that TCP addresses have. (although pub-sub topics are often hierarchic, that hierarchy doesn't reflect the network topology even to the limited extent that TCP addresses do.) It also has some relation to p2p networks such as bit-torrent, and by the time you're talking about accessing my mail account in the cloud, to webs of trust. A paper on unique object references in support of large messages in distributed systems just popped up on LtU, which I'll add to my reading list this weekend.
I've also started reading Enterprise Integration Patterns. There's a bit of dissonance between the cocept of channel in these patterns, and I guess in the systems they are implemented in, and the concept of channel in pi-calculus. In EIP, a channel is a buffered pipe for messages, and is a static entity created as part of the system configuration, in pi-calculus a channel is not buffered, and the normal mechanism for a request/response pair is to create a new channel, sent that channel to the server of the request, and receive the response on the channel. The pi-calculus model is the pattern for communication I'm intending for inter-process communication in kin (though I'm not quite sure whether giving explicit support to other parts of pi-calculus is something I want). I'm not sure that having to mark request messages with an ID (as in XMPP IRQ) rather than having cheap channels is a good programming model; though there are reasons that pretending that an unreliable connection isn't a pi-calculus channel is a good idea. I'll see what I think after I've tried coding the channels. Certainly returning a channel from a query given a strong name which lets the requester communicate with an endpoint which has the named data could be a model worth pursuing. REST is a bit different in that the response channel is assumed by the protocol, but the URLs returned by the application are channels for further messages to be sent to, and of course can be addresses anywhere allowing the server for the next request to be chosen to be the one best suited to handle that request (eg, a list of the devices in a home SCADA system could list the URIs of the embedded servers on the sensor net rather than requiring the client to know the topology prior to making the initial request).
Labels: distributed, kin, network, programming