All You Need is TCP - EtherSAN and Storage Networks, Part II
John Wakerly of Cisco was most kind in his comments - nice to get someone of his stature to read my paper ("All You Need is TCP: EtherSAN and Storage Networks"). His issue is latency. Time is money, right?
Both John Wakerly and Greg Pfister (who's also come alive on this topic earlier this week) have the same issue, but are approaching it from different angles - with Greg, it's geography, and with John I think it comes down to time. But I think it's really the same answer, just like position is momentum, and energy is time.
John is right in saying that iSCSI works. I don't dispute that. It's a good enough solution within the enterprise. And it was a very fast way to produce a product through reductionism. I approve of fast product cycles, since most of the time you're just building the same thing with a small variation on the theme, so reductionism is the way I'd do it for a normal product cycle.
But the reason I did the paper was precisely because it's about something that hasn't happened yet - global network storage, so I'm not tied to reductionism. I've noticed when the slipshod send-it-again overallocate-the-bandwidth habits of the Internet meet the obsessive-compulsive control-freak tell-me-what-went-pop-and-fix-it-now enterprise, things just don't seem to mesh right. So elements get thrown out and thrown in, depending on corner cases and product need. But that's not really the way to get a spanning set across products - it only takes care of the current product crisis.
So, like I told Greg a few days ago, when he drilled down on the "What's EtherSAN and why should I care?" question, I went back to minimalism versus reductionism. I know, I know, that old stuff, but that's the way I was trained - if I could introduce Maxwell's equations at the beginning I'd be happiest, just like every classic physics paper, but you know, it doesn't quite work...
Also, and I know this is kind of weird to folks, I've always been an admirer of the way Intel is able to reuse and resell their processor core, not because they used reductionism in the initial design, but because they used a spanning set design that they could then vary as needed - it only seems to be running out of steam now, because when I talk to some of the Intel guys it almost seems they've forgotten their own history and how they got successful in the first place.
So, the uniqueness of EtherSAN is in its being a single, comprehensive technology object, used in processors, networks and peripherals, whose expression in each is different, yet each uses the same mechanism from a different perspective. Like Intel's Pentium, the same core can be expressed with strikingly different products, yet is the same core.
The effect of this is to remove contradictory communications mechanisms that fragment the use of the Internet, replacing them with a fundamentally simpler mechanism that is more effective than any of the prior ones, but only when it is comprehensively deployed.
As an example of this, jitter and congestion are mitigated at the point they enter the path, not when they have compounded to impact the transfer. Also, instead of risking data integrity to signal transfer rate back-off, flow control is adjusted to directly constrain dynamic over-capacity. I've had several discussions about this over the years with my co-author on the primary patent, Todd Lawson, and I found Dr. Tim Wu's University of Virginia draft paper discussing jitter's impact on the Internet most useful.
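To make the contrast concrete, here is a toy sketch of a single hop with a bounded queue. It is not the EtherSAN mechanism itself, just an illustration under simple assumptions: time moves in ticks, the sender offers a fixed number of packets per tick, and the hop drains a fixed number per tick. The function name simulate and all of its parameters are made up for this example. In one mode the overflow is discarded (loss as the back-off signal); in the other the hop only accepts what it has room for, so the excess waits at the sender.

```python
def simulate(steps, offered, drain, capacity, use_flow_control):
    """Toy model of one hop with a bounded queue.

    Returns (delivered, dropped) after `steps` ticks.
    All names and numbers here are illustrative assumptions.
    """
    queue = 0
    delivered = 0
    dropped = 0
    for _ in range(steps):
        room = capacity - queue
        accepted = min(offered, room)
        if not use_flow_control:
            # Sender transmits blindly; whatever does not fit is lost
            # and would have to be resent later.
            dropped += offered - accepted
        # With flow control, the hop advertises `room` and the sender
        # holds back the excess, so nothing is lost at this hop.
        queue += accepted
        sent = min(queue, drain)
        queue -= sent
        delivered += sent
    return delivered, dropped


if __name__ == "__main__":
    print("flow control:", simulate(10, 5, 3, 8, True))
    print("drop on overflow:", simulate(10, 5, 3, 8, False))
```

In this toy run both modes deliver the same number of packets, since the drain rate is the bottleneck either way. The difference is that the drop-based mode throws packets away to get there, while the flow-controlled mode does not.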
The effect is more efficient use of the network by use of a comprehensive "end-to-end" solution that doesn't pay lip service to the principle, but embraces and enforces it stepwise along the communications path. Note that preserving the end-to-end principle is not the same as endpoint processing (I know John and Greg already know this, and so does Vint Cerf, but maybe not everyone else reading this), so one can use it in routers, as part of the spanning set. Nothing precludes the use of it "everywhere", as Vint himself said when he joined the Board of Directors at InterProphet - the progenitor of this technology. That's where Vint really got excited.
Nor is the end-to-end principle just WAN reductionism. The reliability of getting a web page across the network isn't like the reliability of storing files across disk drives, unless you think it's OK to resend and resend and maybe get through, and maybe a directory becomes unavailable during that time and you wanted to watch your CEO's latest speech and you can't. iSCSI relies on resources like storage software that takes advantage of rapid fault detection / recovery time within the geographic scale of an enterprise (usually less than a mile, maybe only 100 yards).
Note that with Micah Beck's storage concept, he's allowing multiple concurrent writes to merge as they pass on the network, so there's no thought given to an endpoint being the place where an enterprise integrity resilience mechanism could be deployed - the network now must become responsible for the integrity of storage. There's nothing in iSCSI that anticipates this, and since you've left out integrity to reduce it to a solvable form, when it's pushed into the WAN are all the switches going to run some kind of storage integrity software like a copy of Veritas, running in real time, across a few thousand to a million virtual drives? How is it to be coordinated? How do we deal with congestion? Is it worthwhile using iSCSI for this problem - aren't we being too reductionist here? Isn't this what TCP is good at?
I don't know - I like problems which have canonical solutions, so they can be solved. And I don't like to rely on the "someone else will solve it" problem - that's intellectually dishonest. You can do this in business, but it doesn't really become a scientist.
For example, Microsoft likes to make the packets fatter and fatter and fatter, so in increasing the transfer rate, they make an even larger series of packets and let the network figure out how to get it through. I guess this is the simplest way to deal with this in a reductionist sense, but is this the right approach from a telecom sense? If we built a phone system this way, customers might be annoyed that they can't talk for a minute while the buffer runs down. But it makes no sense to just get bits across the wire (reductionist) if I have a hideous situation at the endpoints and I lose too much along the way - a minimalist says I've got to get it through efficiently.
Reductionists say there's no way to deal with the speed of light issue so let's ignore it. A minimalist says that there's no way to deal with the speed of light issue, so let's make sure we don't have to retread old ground all the time.
Prior attempts to justify the end-to-end principle involved reductionism, not minimalism. What we mean by reductionism is that Layer 2 switches and Layer 3 networks need only be limited to those layers in scope in theory, by the selective addition of elements (or functional units). In practice, this is seldom true (for example, RED in routers, flow management in switches), and the limits of the performance of network communications may be traced to the "not good enough" compromises vendors have made here to mitigate the problems - for example, deep buffers and fat packets cause time skewing or high level jitter that affects media synchronization. (They also want to "own" these areas - it's not easy, as you know, to do a spanning set patent).
What we mean by minimalism is that the simple TCP/IP communications mechanism rules are the definition of what is preserved every step across the network (in migrating to this, steps between are compensated for as suboptimal, limiting the effectiveness of that path). Thus we are not minimizing the mechanism used at each step, but the rules used in communication and how we respond locally to transient/longer faults, in real time.
This is why I said in the title "All you need is TCP". I think Vint would agree. :-)
In this model, recovering from congestion by methods like adaptive inter-hop retransmission is allowed as long as they do not distort the smoothed bandwidth / propagation time of the hop as expressed as an increment of the total path. Interoperability is thus expressed more specifically as a bound on TCP/IP protocol transformation operations allowed in each step across the path, rather than in just packet formats and protocol exchanges. Like the extensive timing diagrams of circuit switched telephone exchanges, a more rigorous specification of operation is implied, yet the effect is simpler than said exchanges because there is just one derivation of this, and that is the TCP/IP already in use - nothing different.
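One way to picture the "do not distort the hop" rule is as a per-hop time budget. The sketch below is only my illustration of the idea, not a specification: the class HopBudget, its tolerance parameter, and the millisecond figures are all assumptions made up for the example. The smoothing is the familiar exponentially weighted moving average that TCP uses for its smoothed round-trip time (a gain of 1/8), applied here to a single hop instead of the whole path.

```python
class HopBudget:
    """Tracks a smoothed per-hop transit time and decides whether a
    local (inter-hop) retransmission still fits inside it.

    Hypothetical helper for illustration only.
    """

    def __init__(self, initial_ms, alpha=0.125, tolerance=0.25):
        self.smoothed_ms = initial_ms  # smoothed transit time of this hop
        self.alpha = alpha             # EWMA gain, same value TCP uses for SRTT
        self.tolerance = tolerance     # assumed allowed overshoot (25%)

    def observe(self, sample_ms):
        """Fold a new transit-time sample into the smoothed estimate."""
        self.smoothed_ms = (
            (1 - self.alpha) * self.smoothed_ms + self.alpha * sample_ms
        )

    def may_retransmit(self, elapsed_ms, retransmit_cost_ms):
        """Allow a local retransmit only if the hop would still finish
        within its smoothed time plus the tolerance."""
        limit = self.smoothed_ms * (1 + self.tolerance)
        return elapsed_ms + retransmit_cost_ms <= limit


if __name__ == "__main__":
    hop = HopBudget(initial_ms=10.0)
    hop.observe(18.0)
    print(hop.smoothed_ms)
    print(hop.may_retransmit(elapsed_ms=4.0, retransmit_cost_ms=6.0))
    print(hop.may_retransmit(elapsed_ms=9.0, retransmit_cost_ms=6.0))
```

When the local retransmit would push the hop past its budget, the hop leaves recovery to the endpoints rather than skewing the timing the rest of the path depends on.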