What, me worry?

This little article just in from a dedicated Cisco engineer. Looks like Cisco is taking a “broadside” from Broadcom in the “TCP offload” universe.

Of course, notice the weasel words: “selected network streams”. In contrast, at InterProphet we showed a 10x advantage on all network streams, running on NT at Microsoft’s offices in Redmond in 1998, with a patented design.

So it’s taken Broadcom and Microsoft working together about 6 years to kind of make something work but not really. Not very impressive.

And SiliconTCP works for Unix and other OS’s as well. You don’t have to use Windows to get the benefit. In a world of “choice”, shouldn’t a chip be OS-agnostic? Or do they think choice is a bad idea?

I’ve lost count of how often I read about a TCP offload mechanism that doesn’t really do the job. If a network stack worked like these chips, the Internet would be a far more frustrating place.

But I doubt Cisco cares, even though their engineers do. With China’s massive Huawei on the bottom end and ruthless Broadcom moving in, don’t bet on Cisco to defend their turf. After all, they’re too big to beat, right? And if a company lets things go long enough, figuring that “someone else I like better will invent it”, they run the risk that someone they don’t like will do it first.

Flamethrowers and Memories

Alex Cannara dropped an interesting paper on my desktop discussing congestion control in grid networks. Its results confirm what I and others have seen over the years: Vint Cerf saw as early as 1998 that hop-by-hop reliability preserving end-to-end semantics in the routers was the real key to handling this issue. Vint is also a renowned wine expert, and treated me and William to a wonderful tour of fine wines at the Rubicon Restaurant in San Francisco, where we had a memorable discussion on exactly this issue.

Of course, their terms of art are different from mine, since we all seem to invent new terms. Their “network of queues” is my bucket brigade mechanism. And their test demo is similar to one we ran at InterProphet called FlameThrower, devised by Senior Hardware Engineer Todd Lawson and Software Engineer Madan Musuvathi, which literally flooded the other side with packets to see if it falls over.

Todd and Madan built a wirewrap version of SiliconTCP on a DEC PAM card with a NIC wired on (and that’s exciting with 100MHz logic). We demo’d this to Microsoft, venture firms, and lots of other companies back in Summer 1998. I have the wirewrap on my wall alongside a production board.

But the solution presented in this paper, “back pressuring” the plug by selectively disabling TCP congestion control, is where we part company. Herbert and Blanc/Primet quite correctly point out some of the barriers to FAST, HighSpeed TCP / Scalable TCP, and XCP, but then fall back on the old link-layer solution approach (which we diddle in the stack software). If only it were that simple.
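For readers who want to see what “diddling in the stack software” amounts to at an endpoint, here is a rough illustration. This is not the mechanism from the paper; it just shows the flavor of per-flow knob a host stack exposes. It assumes a Linux kernel and a Python build that defines socket.TCP_CONGESTION, and the host, port, and algorithm name are placeholders of mine:

```python
# Illustration only: the flavor of per-flow "stack software diddle" an
# endpoint can make. This is NOT the paper's mechanism. It assumes a Linux
# kernel and a Python build that defines socket.TCP_CONGESTION; the host,
# port, and algorithm name are placeholders.
import socket

def connect_with_congestion_control(host: str, port: int, algorithm: bytes = b"reno"):
    """Open a TCP connection and ask the kernel to use a specific congestion
    control algorithm for this one socket (it must already be loaded; see
    /proc/sys/net/ipv4/tcp_available_congestion_control)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algorithm)
    s.connect((host, port))
    in_use = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print("congestion control in use:", in_use.split(b"\0", 1)[0].decode())
    return s
```

Even this knob only swaps one congestion control algorithm for another on a single socket; there is no clean switch to turn congestion control off, which hints at why “if only it were that simple” applies.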

A reliable link layer isn’t enough, and looking back, Vint clearly knew this. That’s why he saw SiliconTCP as fitting best here. This was the key reason he joined the board of InterProphet so many years ago.

Many people have made reliable link layers. We’ve done it with the boards we have here right now. But no one else made a reliable network and transport layer that spans many hops, maximizing the capacity of the aggregate network. Our boards also do this. So we did demo Vint’s vision in practice.
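To make “maximizing the capacity of the aggregate network” concrete, here is a back-of-the-envelope sketch. The model and numbers are mine and purely illustrative, not InterProphet measurements: with n hops each losing a fraction p of packets, endpoint-only recovery must pay for the whole path again on every loss, while per-hop recovery pays only at the hop that dropped the packet.

```python
# Back-of-the-envelope: why per-hop recovery wastes less capacity than
# endpoint-only recovery. Toy model: n independent hops, each dropping a
# packet with probability p. (Illustrative numbers, not measurements.)

def end_to_end_attempts(n, p):
    """Expected end-to-end send attempts per delivered packet when only the
    endpoints retransmit: the packet must survive all n hops in one try,
    so the success probability per attempt is (1 - p)**n."""
    return 1.0 / (1.0 - p) ** n

def per_hop_transmissions(p):
    """Expected transmissions on any single hop per delivered packet when
    each hop recovers its own losses locally: 1/(1 - p), independent of
    how many hops the path has."""
    return 1.0 / (1.0 - p)

if __name__ == "__main__":
    for n in (2, 5, 10):
        for p in (0.01, 0.05):
            print(f"{n:2d} hops at {p:.0%} loss/hop: "
                  f"end-to-end needs {end_to_end_attempts(n, p):.2f} attempts, "
                  f"per-hop needs {per_hop_transmissions(p):.2f} sends per hop")
```

The end-to-end cost compounds with path length; the per-hop cost does not.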

It’s only now that people are starting to thrash the problem that Vint saw many years ago. But they lack his insight as to the real nature of the problem. It isn’t turning off congestion control — it’s using it effectively.

So long as engineers think the answer is a simple “stack hack” instead of rethinking how to meet the protocol’s demands more effectively (not new protocols, not turning off congestion control, not cheating by biasing fairness, but simply doing our job better), we’ll continue to run into this problem.

Sometimes a Legend

And once again, an interesting item in the postel.org end-to-end group – “An interesting version of TCP was created a few years ago at a large data-storage-system company here — essentially, the TCP receive window was reduced to Go/No-Go, and startup was modified, so the sending Unix box would blast to its mirror at full wire rate from the get go. ACKs would have meaningless Window values, excepting 0, because sender and receiver had similar processing/buffering capability. Loss produced replacement, via repeated ACKs. Being LAN-based system overall made all these mods workable. But clearly, the engineers involved found normal TCP wanting in the ability to deliver data on high-speed links.”
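Stripped of the storage-company specifics, what the poster describes boils down to collapsing TCP’s sliding window into a binary signal. Here is a minimal sketch of that logic; it is my reconstruction from the quote above, not anyone’s actual code, and the names and numbers are mine:

```python
# A minimal sketch of the "Go/No-Go" receive window described in the quote.
# My reconstruction, not the vendor's code: the sliding window is collapsed
# to a binary signal, and the sender blasts at wire rate unless it sees a
# zero window.

FULL_WINDOW = 65535   # "go": the exact value is meaningless to the sender
CLOSED = 0            # "no-go": the only window value the sender honors

class GoNoGoReceiver:
    def __init__(self, buffer_limit: int):
        self.buffer_limit = buffer_limit
        self.queued = 0

    def advertise(self) -> int:
        """Advertise either a full window or zero, nothing in between."""
        return CLOSED if self.queued >= self.buffer_limit else FULL_WINDOW

    def deliver(self, nbytes: int):   # data arrives off the wire
        self.queued += nbytes

    def drain(self, nbytes: int):     # the application reads it out
        self.queued = max(0, self.queued - nbytes)

def sender_may_blast(advertised_window: int) -> bool:
    """Sender starts at full wire rate and keeps going unless told 'no-go'."""
    return advertised_window != CLOSED

rx = GoNoGoReceiver(buffer_limit=1_000_000)
assert sender_may_blast(rx.advertise())       # go: blast away
rx.deliver(1_000_000)
assert not sender_may_blast(rx.advertise())   # no-go: the one value that matters
```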

Interesting how legends develop. This project was our FlameThrower demo, done with a wirewrap version of SiliconTCP on a DEC PAM card with a NIC wired on (and that’s exciting with 100MHz logic).

We demo’d this to Microsoft, venture firms, and lots of other companies back in Summer 1998. One Microsoft exec (Peter Ford) noted that we were so overloading the standard NICs that an “etherlock” condition was likely to occur. Etherlock, for those who don’t know, occurs when all of the bandwidth is consumed and nothing else can communicate because there is effectively no idle time. And yes, we saw this occur.

One of the more interesting things we found was that many “standard” NICs were not standards-compliant. I still have the wirewrap on my wall alongside a production board.

The Power of TCP is in its Completeness

An interesting line of discussion passed through my email regarding the future of TCP. In particular, Alex Cannara decided to take on a few of the more “conservative” elements over the idea of managing end-to-end flows through interior management of links.

As Alex puts it: “Apparently, a great bias has existed against this sort of design, which is actually very successful in other contexts”. Even a very “big old name in Internet Land” liked this type of approach, for the “…reason it [TCP] requires the opposite of backoff is because it doesn’t have the visibility to determine which algorithm to choose differently as it navigates the network at any point in time. But if you can do it hop by hop you can make these rules work in all places and vary the algorithm knowing you’re working on a deterministic small segment instead of the big wide Internet.”
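A toy way to see that visibility point (my illustration, not the correspondent’s): an endpoint guessing about a path it cannot see has to react pessimistically, while a hop managing one known segment can apply a deterministic rule.

```python
# Toy illustration (mine, not the correspondent's) of the visibility point:
# an endpoint guessing about an unseen path must react pessimistically,
# while a hop that manages one known segment can use a deterministic rule.

def endpoint_reaction(loss_detected: bool) -> str:
    """Whole-path view: a loss could have happened anywhere, for any reason,
    so the only safe response is to back off."""
    return "back off (halve the window)" if loss_detected else "probe cautiously"

def per_hop_reaction(queue_depth: int, queue_limit: int) -> str:
    """Single-segment view: this hop knows its own queue and its own link,
    so it can hold the previous hop or forward at full rate deterministically."""
    if queue_depth >= queue_limit:
        return "hold previous hop (back-pressure)"
    return "forward at full link rate"

print(endpoint_reaction(True))              # the big wide Internet: guess, back off
print(per_hop_reaction(3, queue_limit=4))   # a known small segment: just forward
```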

Let’s take this further.

In math we deal with continuous functions differently from discontinuous ones, and TCP algorithms know this (they have different strategies for each case), but when you get a mixture across the network you’re limited to statistics. If we limit the inhomogeneity, then the TCP endpoints can optimize the remaining result. In this case, the gross aspects limiting performance no longer dominate the equation.

So you can’t overtransmit or overcommit a link if you’re disciplined: you only fill in the idealized link, your piece of the puzzle, from the perspective of what you actually know.

Has the hobgoblin of statistics ruined any ability to do a deterministic job (with metrics and cost values) of improving loss ratios and understanding what is really happening at any point along the way? If so, that would in turn validate the statistical model. But think of all the projects that wouldn’t fly.

At InterProphet we proposed getting the best possible effect at every hop (basically applying the same end-to-end principle within each segment, instead of viewing all hops as one end-to-end segment) by deploying low-latency TCP processing as a bucket brigade throughout the infrastructure. The pushback from the manufacturers was cost, but we met all cost constraints with our dataflow design (which works, by the way, and is proven).
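Here is a minimal bucket brigade sketch. It is a toy model of the idea, not the SiliconTCP dataflow design: each hop holds a small bounded bucket and hands a packet forward only when the next hop has room, so pressure gets handled segment by segment rather than end to end.

```python
# A minimal bucket-brigade sketch (a toy model, not the SiliconTCP dataflow
# design): each hop keeps a bounded bucket and passes a packet forward only
# when the next hop has room, so pressure is absorbed segment by segment.
from collections import deque

class Hop:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.bucket = deque()

    def has_room(self):
        return len(self.bucket) < self.capacity

    def accept(self, packet):
        self.bucket.append(packet)

def brigade_step(hops, delivered):
    """One time step: every hop hands its oldest packet to the next hop,
    but only if that neighbor has room (per-segment back-pressure)."""
    last = hops[-1]
    if last.bucket:
        delivered.append(last.bucket.popleft())        # hand off to the host
    # Walk from the last hop backwards so space frees up ahead of the data.
    for upstream, downstream in zip(reversed(hops[:-1]), reversed(hops[1:])):
        if upstream.bucket and downstream.has_room():
            downstream.accept(upstream.bucket.popleft())

if __name__ == "__main__":
    hops = [Hop(f"hop{i}", capacity=4) for i in range(4)]
    delivered = []
    for t in range(20):
        if hops[0].has_room():
            hops[0].accept(f"pkt{t}")                  # source offers a packet
        brigade_step(hops, delivered)
    print(f"delivered {len(delivered)} packets with no end-to-end retransmits")
```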

The power of this approach is amazing. Instead of simplistically thinking of end-to-end as just two cans and a string, we can apply end-to-end completeness on every segment.

Very few people have understood this — looks like Alex does. And I know Vint Cerf, the Father of the Internet, does. He joined InterProphet’s Board of Directors on the strength of the idea alone. Of course, he’s also a visionary and gentleman in every sense of the word. We should all be so gifted.