Video@Scale: The New Demands are the Old Challenges – Power and Packet Drops

Creating seamless on-demand video experiences has been a long-sought and long-fought dream. Facebook's Video@Scale brings specialists together to wrestle with the complexity of end-to-end technical tricks and user-level satisfaction, two goals that are often at odds.

The morning was a blitz of corner cases and tightly wound insights: the minutiae of video transmission and its complexity, detecting dropped frames during decode in various browsers, scaling video quality up and down on the fly, codec switching, video stream sizing, I-frame synchronization between different video codecs, which codec to use, network versus browser issues (which often look the same), and getting around browser video correction.

But the two items I am going to focus on are the old hard chestnuts: power and packet drops.

One key issue not commonly addressed by app developers and video streamers is the fundamental trade-off between download speed and mobile power: faster downloads come at the expense of battery life. Unlike a plugged-in device, a mobile device is extremely sensitive to power consumption.

Low power consumption is achieved by keeping fewer gates active at any one time. Yet mobile devices contain processors with more than a billion transistors, and the count is climbing. As a consequence, mobile device designers have created many tricks to reduce power consumption by essentially turning off elements of the die (the chip) when they are not needed. However, once an always-on application is involved, those power-saving tricks are short-circuited, resulting in a peak-load situation that burns the battery fast. Streaming downloads rely on massive packet processing that can be neither switched off nor easily avoided.
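The effect of an always-on workload can be sketched with simple duty-cycle arithmetic. This is a minimal illustration, not a model of any real handset: the function names and every number (active power, idle power, battery capacity) are hypothetical placeholders chosen only to show the shape of the trade-off.

```python
# Illustrative only: all figures are hypothetical placeholders,
# not measurements of any real device or chip.

def average_power_mw(active_mw: float, idle_mw: float, duty_cycle: float) -> float:
    """Average draw when a block is powered up for `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


def battery_hours(capacity_mwh: float, avg_power_mw: float) -> float:
    """Hours of runtime for a battery of `capacity_mwh` at a steady average draw."""
    return capacity_mwh / avg_power_mw


if __name__ == "__main__":
    ACTIVE_MW = 1000.0      # hypothetical draw with the packet path fully awake
    IDLE_MW = 10.0          # hypothetical draw with most of the die gated off
    CAPACITY_MWH = 10000.0  # hypothetical battery capacity

    for label, duty in [("bursty, mostly asleep", 0.1), ("always-on streaming", 1.0)]:
        avg = average_power_mw(ACTIVE_MW, IDLE_MW, duty)
        hours = battery_hours(CAPACITY_MWH, avg)
        print(f"{label}: {avg:.0f} mW average, {hours:.1f} h of battery")
```

The point of the sketch is only that the power-gating savings scale with how often the hardware is allowed to sleep; a workload that pins the duty cycle at 1.0 gets none of them.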

Packet drops and retransmission are the end-to-end nightmare of video streaming companies. Twitch has attempted to harness its “long tail” of channels by owning the datacenter servers and SLAs with Internet providers, and has claimed 3-second “glass to glass” latency – a claim that is difficult to substantiate, given the usual “well, it’s your connection, or your browser, or your ISP”, for anyone outside a test bed environment. Yueshi Shen, principal software engineer at Twitch, conceded that eventually custom FPGAs may be the way to go.

Olga Hall in her discussion of “Game Days” at Amazon – aka let’s blow something out of the water and see how good we are at recovering – produced a chart of different recovery times for different attacks. What was the worst case? Packet drops.

At InterProphet, we used to say that “2% packet drop results in 50% of packet congestion”. While our focus was on network switches and routers, the end-to-end nature of packet drop and recovery was a mission-critical consideration. The construction, transmission, and decryption of packet communications are processor-heavy work. The view that the Internet is two cans and a string, where latency is just the speed of fiber, has long since given way to a plethora of routing and repackaging schemes that bulldoze their way through a crowded maze, any turn of which may end in a dead end.
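Why a small drop rate hurts so much can be illustrated with the widely cited Mathis et al. approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) · C / √p, where p is the packet loss probability and C is about √(3/2). This is a swapped-in textbook model, not InterProphet's own analysis and not a derivation of the “2% / 50%” rule of thumb above; the segment size, round-trip time, and loss rates in the sketch are assumed values for illustration.

```python
import math

# Mathis et al. approximation for a single long-lived TCP flow.
# Assumed inputs below (MSS, RTT, loss rates) are illustrative, not measured.

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound on TCP throughput in bits per second."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0.0 < loss_rate <= 1.0:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    MSS_BYTES = 1460  # assumed Ethernet-sized TCP segment
    RTT_S = 0.05      # assumed 50 ms round trip

    for loss in (0.0001, 0.001, 0.01, 0.02):
        mbps = mathis_throughput_bps(MSS_BYTES, RTT_S, loss) / 1e6
        print(f"loss {loss:.2%}: about {mbps:.1f} Mbit/s")
```

Because throughput in this model falls with the square root of the loss rate, going from 0.01% to 2% loss cuts the achievable rate by a factor of √200, roughly fourteen, before any retransmission processing cost on the endpoints is counted.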

This weight of protocol processing has led people to 1) write new protocols which dispense with pesky correctness calculations, exporting the problems to some other part of the software or hardware, 2) create predictive algorithms to anticipate viewing habits and upload content covertly, 3) change the boundaries between application, emulator and device operating system to reduce checks and balances between these elements, and 4) increase the size of the video packet encapsulation so you get a loaf of bread instead of a crumb – if the loaf wasn’t dropped in the garbage can on the way.

As a side effect of all these techniques, the problem moves around, and the processor now faces multiple demands without an easy way to determine which of them can be ignored. Processor load peaks, battery life diminishes, and people get upset when their iPhone goes from 80% to 20% while watching a video, running Waze, playing a game, etc. Communications between streaming vendor and device become confabulated.

InterProphet SiliconTCP early network accelerator card for servers, 1999.

InterProphet dealt with processor stack offload. In fewer than 40,000 gates, we demonstrated the ability to process packets at peak load with only 2% of the CPU. In a mobile device, we found that we could achieve “megabits with microwatts” on traditional FPGA hardware. It is possible to reduce complexity. It just takes a little ingenuity. And courage.