Why are LAN file transfers slower than WAN file transfers?

Customer Scenario: At a local hospital, the X-ray technicians were complaining that sending an image from a CT scanner (a medical device that creates images) to a station locally took over 30 minutes, yet sending that same image file to a station across town took only 2 minutes. Vendors were pointing fingers and doctors were screaming. The hospital’s Network Manager had tried troubleshooting the problem with all of the normal tools – but with no success.

I was called in as a network consulting specialist. I brought two of my laptops: one loaded with a traditional protocol analyzer and one with AppDancer/FA. I decided to use both and compare the results.

First I monitored the WAN client, and it took about 2 minutes for the file transfer. There were some retransmissions, but nothing serious worth looking into.

Then I monitored the same file transfer on the LAN. I found that over 25% of the frames were retransmitted! So I had confirmed that the LAN transfer was slow, and I could see that it was slow because of the large number of retransmissions. But why was it retransmitting? This was the next puzzle to solve.

I first ruled out a physical issue. Checking the SNMP statistics on the switches and routers revealed only a very small number of bad CRCs, so I had to look deeper, into the TCP layer.

My traditional protocol analyzer told me only which frames had been retransmitted and the frame number of each retransmission. AppDancer, on the other hand, visually graphed the flow of the network conversation in the “Flow Statistics” screen. I could easily see that frames that had not been acknowledged were retransmitted (just as they should be), but also that frames that had already been acknowledged were being retransmitted. AppDancer also showed me the time it took for each retransmission to occur.

The standard analyzer had that information in its detail view, but it required me to map it out myself, which was not exactly how I wanted to spend the afternoon. Once AppDancer made it so obvious that half of the retransmissions were unnecessary, I went back to the TCP handshake. The local client was using SACK (Selective Acknowledgement) and the WAN client was not. While the medical scanner responded that it supported SACK, it was in truth retransmitting frames that had already been acknowledged. These retransmissions were causing buffer issues on the client, which caused the client to request the data again, creating a vicious cycle.
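For readers who want to repeat the handshake check by hand, the following is a minimal sketch in Python of how the SACK negotiation can be read from the TCP options carried in the SYN and SYN/ACK segments. The option numbers are the standard ones (kind 4 is "SACK permitted"); the function names and the sample option bytes are illustrative assumptions, not data from the hospital capture.

```python
SACK_PERMITTED = 4  # TCP option kind for "SACK permitted" (sent only in SYN segments)


def parse_tcp_options(raw):
    """Split the options area of a TCP header into (kind, payload) pairs."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:          # End of Option List: stop parsing
            break
        if kind == 1:          # No-Operation: a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw):  # truncated option, no length byte
            break
        length = raw[i + 1]    # length covers kind, length and payload bytes
        if length < 2 or i + length > len(raw):
            break              # malformed option, give up
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


def sack_permitted(syn_options):
    """True if a SYN (or SYN/ACK) advertises the SACK-permitted option."""
    return any(kind == SACK_PERMITTED for kind, _ in parse_tcp_options(syn_options))


def sack_negotiated(client_syn_options, server_synack_options):
    """SACK is used on a connection only if both sides advertise it."""
    return sack_permitted(client_syn_options) and sack_permitted(server_synack_options)


# Hypothetical option bytes, for illustration only:
# MSS=1460, NOP, NOP, SACK permitted
with_sack = bytes.fromhex("020405b401010402")
# MSS=1460 only
without_sack = bytes.fromhex("020405b4")
```

In a scenario like the one described here, the check would come back true for the LAN client's connection and false for the WAN client's, which is the difference that points back at the handshake.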

AppDancer laid out that flow for me clearly so I could quickly understand. My regular protocol analyzer requires that I add up sequence numbers and data lengths in Excel to determine the good acknowledgements and the missing acknowledgements (a slow process). Knowing that acknowledged frames had been retransmitted is what took me back to the handshake, and triggered the user interview.
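The spreadsheet arithmetic described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AppDancer's method: the trace format (tuples of "data" and "ack" events in capture order), the function name, and the sample numbers are all hypothetical, and sequence-number wraparound is ignored. A retransmitted segment is counted as unnecessary when its byte range, sequence number plus data length, was already covered by a cumulative ACK or by a SACK block seen earlier in the trace.

```python
def classify_retransmissions(events):
    """Label each retransmitted data segment as 'needed' or 'unnecessary'.

    events: list of tuples in capture order, either
      ("data", seq, length)            sent by the server
      ("ack", ack_no, sack_blocks)     sent by the client, where
                                       sack_blocks is a list of (left, right)
    """
    seen = set()     # (seq, length) of data segments already sent once
    cum_ack = 0      # highest cumulative ACK number seen so far
    sacked = []      # every SACK block seen so far
    results = []
    for event in events:
        if event[0] == "ack":
            _, ack_no, blocks = event
            cum_ack = max(cum_ack, ack_no)
            sacked.extend(blocks)
            continue
        _, seq, length = event
        end = seq + length  # first byte AFTER this segment
        if (seq, length) in seen:
            # Retransmission: was this byte range already acknowledged?
            acked = end <= cum_ack or any(
                left <= seq and end <= right for left, right in sacked
            )
            results.append((seq, length, "unnecessary" if acked else "needed"))
        seen.add((seq, length))
    return results


# Hypothetical trace, for illustration only
example_events = [
    ("data", 1000, 500),
    ("data", 1500, 500),             # assume this one is lost
    ("data", 2000, 500),
    ("ack", 1500, [(2000, 2500)]),   # bytes 1500-1999 missing, 2000-2499 SACKed
    ("data", 1500, 500),             # fills the hole
    ("data", 2000, 500),             # already SACKed
    ("ack", 2500, []),
    ("data", 1000, 500),             # already cumulatively ACKed
]
example_results = classify_retransmissions(example_events)
```

Run against a trace shaped like this, only the first retransmission is flagged as needed; the other two cover data the client had already acknowledged, which is the pattern that stood out in the flow graph.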

Conclusion: the local client was using the latest version of the product’s networking code. I guess SACK was supposed to be an improvement, but it was not working correctly. The hospital’s Network Manager rolled back to the previous version and performance increased dramatically.

Speaking as a longtime user of traditional analyzers, I am now an AppDancer convert.