Monday, July 21, 2008

Adventures in firmware development: You learn something new every day

Ethernet and TCP/IP, the core technologies for the internet, have been around long enough that typically you plug things in and they "just work." So I have been pretty puzzled the last couple of weeks trying to figure out why this little Ethernet development kit I'm working with had a 30-second delay on initial startup. Internally, I could see the controller thought everything was up and running; it just didn't see any packets, even though the link and data lights would light up.

I contacted technical support for the board and they couldn't reproduce the problem. Hmm. I hate those kinds of issues. After a number of messages back and forth, we found that I was going through the corporate router and they used direct connections to their test PCs. I changed my configuration to talk directly to my PC and the problem went away. Well, this was a step in the right direction.

So, the next step was to talk to our network admin to see if he had any ideas. He immediately remembered a configuration parameter called PortFast on our switches that by default was disabled.

PortFast is an option on Cisco switches that changes how a port handles STP, short for Spanning Tree Protocol. I'd never heard of it before, but this is what I learned: when you have multiple switches and hubs on a network, it is possible to create a physical loop. Without STP, a physical loop can cause each switch to forward broadcast packets to the other, ad infinitum, flooding the network with an increasing number of identical packets until it stops responding. It's basically a DoS attack on yourself. STP is a protocol that switches run to detect physical loops and block the redundant links, eliminating the cascades that can take down a network.
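To see why a loop is so destructive, here is a toy flooding model. It is only a sketch under simplifying assumptions: the switch names A through D are made up, every switch floods a broadcast out of every link except the one it arrived on, and there is no MAC learning or timing. The loop-free topology is picked by hand here, whereas real STP works it out by exchanging messages between switches.

```python
def flood(links, source, max_hops):
    """Count broadcast frames in flight at each hop after `source` sends one.

    `links` is a list of (switch, switch) pairs. Each switch re-sends a
    received frame on every link except the one it came in on.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each in-flight frame is recorded as (sender, receiver).
    in_flight = [(source, n) for n in sorted(neighbors.get(source, ()))]
    counts = []
    for _ in range(max_hops):
        counts.append(len(in_flight))
        next_hop = []
        for sender, receiver in in_flight:
            for n in sorted(neighbors[receiver]):
                if n != sender:  # never send back out the ingress link
                    next_hop.append((receiver, n))
        in_flight = next_hop
    return counts


# Hypothetical topology: four switches, every pair cabled together (loops everywhere).
looped = [("A", "B"), ("A", "C"), ("A", "D"),
          ("B", "C"), ("B", "D"), ("C", "D")]

# The same switches with the redundant links blocked, leaving a loop-free tree.
loop_free = [("A", "B"), ("A", "C"), ("A", "D")]

print(flood(looped, "A", 4))     # frame count doubles every hop
print(flood(loop_free, "A", 4))  # the broadcast dies out after one hop
```

In the looped case the single broadcast keeps multiplying, while in the tree it is delivered once and disappears.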

STP defines five port states (blocking, listening, learning, forwarding, and disabled), and with the default timers the listening and learning states each last approximately 15 seconds. This is where my delay came from. When you have something plugged into a switch port that you know cannot create a physical loop, you can enable PortFast on that port. The port then skips those two states, changing the startup time from around 30 seconds to almost instant.
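The arithmetic behind the delay can be sketched in a few lines. This assumes the classic STP default forward delay of 15 seconds, spent once in the listening state and once in the learning state. The function name and constants are mine, purely for illustration.

```python
# Assumed default forward-delay timer for classic STP, in seconds.
FORWARD_DELAY_S = 15

# The two transitional states a port waits in before it forwards traffic.
TIMED_STATES = ("listening", "learning")


def seconds_until_forwarding(portfast, forward_delay_s=FORWARD_DELAY_S):
    """Approximate time from link-up until the port starts forwarding."""
    if portfast:
        # PortFast skips both timed states and goes straight to forwarding.
        return 0
    return forward_delay_s * len(TIMED_STATES)


print(seconds_until_forwarding(portfast=False))  # 30
print(seconds_until_forwarding(portfast=True))   # 0
```

That 30 seconds matches the delay I was seeing on the board every time the link came up.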

The network admin hasn't had a chance to change this setting yet, but after a number of tests, I'm almost certain this is the problem I was having. I can reliably set things up to keep the router from knowing the board has reset. When I do this, the board restarts in a few seconds, which is the expected behavior. In the end, the problem had nothing to do with the new board and everything to do with the existing router. It happens all the time with my PC too; it's just that the computer takes long enough to boot that the network has finished its protocol negotiation by the time the computer is ready to go.

