The Minutiae of Getting a Flash Video to Play Right Every Time
OK - you've got it all together. The video is ready to download and play, it's tested, we've watched it, and the Flash player works (or QuickTime, or whatever vintage you prefer). We watch customers watch it over and over. Things are going great. Then someone, somewhere, tries to download it over the web, and it fails. They hit the refresh button over and over, it keeps failing, and that disappointed person just gives up. Why didn't it play?
Looking over the logs today provides a window into just how difficult it is to provide perfect 24/7 video streaming to any type of computer anywhere. These problems vex the biggest and smallest vendors alike, because they stem from architectural flaws so fundamental that the occasional failure is impossible to guard against.
So why did this one customer not get the same video experience everyone else did? Well, it was really just the luck of the Internet router draw.
It first failed because the session created from client to server did not yet have enough persistence to ensure that the subsequent content transfer could occur before the Flash application timed out.
One common reason is that a distribution router in the server's ISP installed an entry for that particular packet flow, and the entry was deleted almost immediately after creation because of too much transient load - too many streams at that point in time. The allocator handed out an entry for the session on the initial SYN, but that entry only lived for a packet or two before it was deallocated.
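To make the purge behavior concrete, here is a toy model of a flow table that evicts its least-active entry when it fills up. It is only a sketch under assumptions: the FlowTable class, its capacity of 3, and the evict-the-quietest policy are all hypothetical, and real routers use their own (usually undocumented) policies.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTable:
    """Toy flow table (hypothetical): fixed capacity, evicts the quietest entry."""
    capacity: int
    packets: dict = field(default_factory=dict)  # flow id -> packets seen so far

    def install(self, flow: str) -> None:
        """Create an entry for a new flow, purging the least-active one if full."""
        if flow not in self.packets and len(self.packets) >= self.capacity:
            victim = min(self.packets, key=self.packets.get)
            del self.packets[victim]
        self.packets.setdefault(flow, 0)

    def forward(self, flow: str) -> bool:
        """Count a packet against an existing entry; False means no entry exists."""
        if flow not in self.packets:
            return False
        self.packets[flow] += 1
        return True


table = FlowTable(capacity=3)
for busy in ("stream-a", "stream-b"):
    table.install(busy)
    for _ in range(50):          # established streams with plenty of activity
        table.forward(busy)

table.install("new-viewer")      # entry created on the initial SYN
table.forward("new-viewer")      # one packet gets through
table.install("stream-c")        # burst of load: table is full, quietest entry is purged
survived = table.forward("new-viewer")
print(survived)
```

In this model the brand-new session is the one that gets purged, because it has the least activity to its name, while the busy streams are left alone.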
Subsequent retransmissions ran into a different problem - no entries could be created, and the client timed out with a partial content transfer that was not enough to play.
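A rough way to see why retransmission could not save the session is to line up TCP's doubling retransmission timeout against the player's own timeout. The sketch below does that arithmetic. The 3-second initial timeout, the 5 retries, and the 10-second player timeout are illustrative assumptions, not values taken from the logs.

```python
from typing import List


def retransmit_times(initial_rto: float, retries: int) -> List[float]:
    """Seconds after the first send at which each retransmission fires,
    with the timeout doubling after every attempt."""
    times, elapsed, rto = [], 0.0, initial_rto
    for _ in range(retries):
        elapsed += rto
        times.append(elapsed)
        rto *= 2
    return times


def attempts_before_timeout(initial_rto: float, retries: int,
                            app_timeout: float) -> int:
    """How many retransmissions fit before the application gives up."""
    schedule = retransmit_times(initial_rto, retries)
    return sum(1 for t in schedule if t < app_timeout)


# Assumed values for illustration only.
schedule = retransmit_times(initial_rto=3.0, retries=5)
fits = attempts_before_timeout(initial_rto=3.0, retries=5, app_timeout=10.0)
print(schedule, fits)
```

With these assumed numbers only the first two retransmissions land before the player gives up, so a router that stays overloaded for even a few seconds leaves the client with a partial file.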
What happened on the client's refresh when it didn't play? The refresh did not revalidate the cache entry and kept using the locally cached, truncated file, which was insufficient for playback. After hitting refresh nine times, the viewer gave up and watched another Flash movie from the same site using the same player - and this time it played perfectly! It worked because by this point enough activity was associated with the routing entry that it could not be easily purged.
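The stale-cache half of the failure is the part a client could defend against. As a minimal sketch, assuming the client knows the expected size of the file (for example from a Content-Length header), it could refuse to reuse a short cached copy. The load_video function, its dict-based cache, and the fetch callable are hypothetical stand-ins, not any browser's or player's actual API.

```python
from typing import Callable, Dict


def load_video(url: str, expected_length: int,
               cache: Dict[str, bytes],
               fetch: Callable[[str], bytes]) -> bytes:
    """Return the video bytes, refetching when the cached copy is truncated."""
    cached = cache.get(url)
    if cached is not None and len(cached) == expected_length:
        return cached                 # complete copy, safe to reuse
    body = fetch(url)                 # missing or short: go back to the server
    if len(body) == expected_length:
        cache[url] = body             # only cache complete transfers
    else:
        cache.pop(url, None)          # never keep a partial file around
    return body


full_file = b"x" * 1000
cache = {"/clip.flv": b"x" * 100}     # truncated copy left by the failed transfer
result = load_video("/clip.flv", 1000, cache, lambda url: full_file)
print(len(result), len(cache["/clip.flv"]))
```

Here a refresh replaces the 100-byte fragment with the full file instead of replaying the fragment nine times in a row.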
The most significant flaw was the browser's stale cache. The second flaw was not giving the initial route entry enough lifetime for the TCP flow control mechanism to do its job.
The moral of the story - given the built-in fragility of transient conditions, coupled with poor design and the need to transfer larger files like video - is that "you can't please all of the people all of the time".