A ROSE is a ROSE - Reordering Segment Engine
Ashlee Vance wrote that Intel will be introducing "I/O Acceleration Technology" to "attack greedy TCP/IP stack" consumption - in other words, latency through the stack. "Customers often find that their servers spend an inordinate amount of time dealing with network traffic when they should be hammering away on application data." This sounds very familiar - we told them years ago that "all processors wait at the same speed".
Back in 1997, when I filed a provisional patent on just such an approach, I had an interesting meeting with Intel's processor side. We called the technique ROSE then, for Reordering Segment Engine, and we envisioned it in a product called the Network Accelerator - and yes, this was before Adaptec and Alacritech and all those other TOE guys. It was the first in a series of parallel processing refinements that dealt with the layer 2-7 issues of TCP/IP (the discussion was under NDA).
The ultimate low-cost solution was to build it into the southbridge. For high-end apps (going at a faster rate than the bus), you'd put it in the processor / memory interface itself. We even had a hardware socket interface. But in the meantime, as a first step, we could build Network Accelerator cards, just as the graphics accelerator cards changed the face of the graphics industry.
That same year, Dr. Vint Cerf (co-creator of TCP/IP) joined the Board of Directors of InterProphet, a funded company based in Silicon Valley which completed the patent filing based on a working prototype of the Network Accelerator (see "TCP/IP network accelerator system and method which identifies classes of packet traffic for predictable protocols", patent granted 2001). He joined because we'd solved the Internet TCP bottleneck problem that everyone said "couldn't be solved". Interestingly enough, companies like Alacritech formed *after* our company was formed, and reference us in their early patents as well. However, their TOE designs are high-cost, so they are neither as efficient nor as scalable.
We received the first patents in this area. Unlike other attempts to simply turn the stack code into Verilog (like Iready and others), we did a completely novel state machine implementation. It is an entirely scalable, dedicated, stream-oriented protocol processing mechanism. Everyone admitted we'd done it. Even on just $1M of funding, we not only built the prototype but also built a product and demo'd it to every major player in Silicon Valley (1998-1999).
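For readers who haven't run into the term, here is a rough software illustration of what "reordering segments" means on a TCP-like byte stream. To be clear, this is not the ROSE design or anything taken from the patent, which was a hardware state machine. It is only a minimal sketch, and the `ReorderBuffer` class and its `receive` method are hypothetical names made up for this example. It also assumes sequence numbers never wrap and that segments never partially overlap, which a real stack cannot assume.

```python
class ReorderBuffer:
    """Toy in-order delivery of out-of-order segments (illustrative only)."""

    def __init__(self, initial_seq=0):
        # Sequence number of the next byte the application is waiting for.
        self.next_seq = initial_seq
        # Segments that arrived early, keyed by their starting sequence number.
        self.pending = {}

    def receive(self, seq, payload):
        """Accept one segment and return whatever bytes are now deliverable in order."""
        # Simplification: drop empty segments and anything that starts before
        # the delivery point (treated here as a duplicate).
        if seq < self.next_seq or not payload:
            return b""
        # Keep the first copy if the same early segment arrives twice.
        self.pending.setdefault(seq, payload)

        out = bytearray()
        # Drain every segment that is now contiguous with the delivery point.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

Feeding it `receive(5, b"world")` first returns nothing, since byte 0 hasn't arrived yet; a following `receive(0, b"hello")` releases both segments together in stream order. Doing this kind of bookkeeping in the host CPU's stack is part of the per-packet work that offload approaches try to take off the processor.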
There have been more patents since based on this technology, as well as papers and other work. At that time, stack latency wasn't considered as big a deal - big enterprise datacenter solutions with high-cost staff and technicians and big servers were still the norm. A low-cost, low-power chip wasn't considered important, as margins were so high and "would always be there".
Then came the dot-com bust, outsourcing, and the implosion of the enterprise market.
But even then, Dr. Cerf spoke of a time when we'd see millions of gadgets on the Internet, and older TOE or processor-intensive enterprise solutions would no longer be viable. After all, how would we be able to power / control / manage 100 million transistor processors, each one embedded in a low-power sensor net? Or in your dress shirt for that matter?