Monday, January 5, 2015

Broadcast Storms And The Havoc (And Headache) They Wreak

Sometimes what looks like one problem can actually be another.  I got a call from a school system saying that their network was down for one particular school.  No Internet.  No server access.  Nothing.
So when I got there, things seemed to be OK.  They could get to the Internet, servers, etc.  Then it went down again.  This problem was very inconsistent.  You couldn't even ping anything.  But then, after some time, it would be fine.
Well, while I was there, I really couldn't find anything wrong.  Except once, when the problem actually happened.  It appeared to me, at that moment, like the layer 3 core was acting up.  When the problem happened, I couldn't even ping the next hop MPLS router (on the same subnet).  This problem showed some really odd symptoms.  So, I broke the switch stack (Brocade ICX 6610s) and left the last switch in place to run as the core by itself.  The problem seemed to go away.  Until the next day, when the problem happened again.
I showed up and guess what.  The problem was gone again.  So I sat down and just started looking at configs, spanning tree, interface statistics, CPU utilization, etc.  What in the world is going on here?  Then it happened right in front of me.  CPU shot up to 33% (from 1%).  Internet was down and the network was hosed again.  So I ran a packet capture.  Nothing on the first 4 VLANs that I looked at.  Then I found it.  The 5th VLAN I looked at was flooded with broadcasts.
6610#sho cpu-utilization
32 percent busy, from 9 sec ago
1   sec avg: 32 percent busy
5   sec avg: 32 percent busy
60  sec avg: 32 percent busy
300 sec avg: 32 percent busy
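
If you would rather not stare at that output waiting for the next spike, something like the following could watch it for you. This is only a rough sketch in Python: it parses text shaped like the output above and flags a spike against the 1% baseline mentioned earlier. The function names, the `factor` of 10, and the idea of feeding it saved CLI output are all my own assumptions, not anything built into the switch.

```python
import re

# Sample text copied from the "sho cpu-utilization" output above.
SAMPLE = """\
32 percent busy, from 9 sec ago
1   sec avg: 32 percent busy
5   sec avg: 32 percent busy
60  sec avg: 32 percent busy
300 sec avg: 32 percent busy
"""

def parse_cpu_averages(text):
    """Return {window_seconds: percent_busy} from 'N sec avg: P percent busy' lines."""
    pattern = re.compile(r"^\s*(\d+)\s+sec avg:\s+(\d+)\s+percent busy", re.MULTILINE)
    return {int(sec): int(pct) for sec, pct in pattern.findall(text)}

def is_spike(averages, baseline_pct=1, factor=10):
    """True when every averaging window is at least `factor` times the baseline.

    baseline_pct=1 comes from the post; factor=10 is an arbitrary assumption.
    """
    return bool(averages) and all(
        pct >= baseline_pct * factor for pct in averages.values()
    )

averages = parse_cpu_averages(SAMPLE)
print(averages)
print(is_spike(averages))
```

Requiring every window (including the 300 second one) to be high is a deliberately conservative choice; a shorter window alone would catch the storm sooner but would also fire on brief, harmless bursts.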

So, I disconnected the fiber to the closet that VLAN went to, and CPU dropped back down to 1% on the core.  Ok, leave that disconnected, and go to that closet.
I got to that closet and plugged in my packet analysis tool (seen above).  Again, flooded when I hit the right VLAN, but no other VLAN.  So, what do I do?  It's a stack of 2 ICX6450s.  I start unplugging one port at a time until I find the flooding settles down.  It just so happens the second port I unplugged calmed the network down right away.  It was a PC that was connected at the other end.  It appears the NIC was flaking out.  Intermittent broadcast storms.  Not a good situation, but with the right tools, I found the problem as it was happening.  Packet captures are your friend!
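
For anyone curious what "flooded with broadcasts" looks like in numbers, here is a minimal sketch of the check a capture tool is effectively doing: for each VLAN, what fraction of frames is addressed to ff:ff:ff:ff:ff:ff? It works on raw Ethernet frames as bytes and reads the 802.1Q tag directly. The frames, the VLAN numbers (10 and 50), and the source MAC below are made up for illustration; this is not how the capture in this post was actually analyzed.

```python
from collections import Counter

BROADCAST_MAC = b"\xff" * 6
TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag

def vlan_of(frame, untagged_vlan=1):
    """Return the 802.1Q VLAN ID of a raw frame, or untagged_vlan if untagged.

    untagged_vlan=1 is an assumption; use whatever the capture port's
    native VLAN really is.
    """
    if int.from_bytes(frame[12:14], "big") == TPID_8021Q and len(frame) >= 16:
        tci = int.from_bytes(frame[14:16], "big")
        return tci & 0x0FFF  # low 12 bits of the tag are the VLAN ID
    return untagged_vlan

def broadcast_share_by_vlan(frames):
    """Return {vlan_id: fraction of frames sent to the broadcast MAC}."""
    total, bcast = Counter(), Counter()
    for frame in frames:
        if len(frame) < 14:  # too short to hold an Ethernet header
            continue
        vlan = vlan_of(frame)
        total[vlan] += 1
        if frame[:6] == BROADCAST_MAC:
            bcast[vlan] += 1
    return {vlan: bcast[vlan] / total[vlan] for vlan in total}

def make_frame(dst, vlan):
    """Build a synthetic tagged frame (hypothetical test data only)."""
    src = bytes.fromhex("020000000001")  # made-up locally administered MAC
    tag = TPID_8021Q.to_bytes(2, "big") + vlan.to_bytes(2, "big")
    return dst + src + tag + (0x0800).to_bytes(2, "big") + b"\x00" * 46

unicast = bytes.fromhex("020000000002")
frames = (
    [make_frame(BROADCAST_MAC, 10)] + [make_frame(unicast, 10)] * 3
    + [make_frame(BROADCAST_MAC, 50)] * 9 + [make_frame(unicast, 50)]
)
shares = broadcast_share_by_vlan(frames)
print(shares)
```

A share alone does not prove a storm (a quiet VLAN can be mostly ARP), so in practice you would look at the broadcast frame rate as well.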

12 comments:

  1. Couldn't you just configure appropriate port security to protect from broadcast storms like you do on Cisco access stacks?
    Just wondering. I have never configured a Brocade network (besides vrouter).

    1. Good question. Yes, in Cisco, you can use the "storm-control" command. I have not done so in the past, but you can. In Brocade, you can use the "broadcast limit" command. It serves the same purpose. I logged into my ICX6610 and "question marked" through to see the command:
      Core(config-if-e1000-1/1/4)#broadcast limit
      DECIMAL Multiple of 8192 Kbps for 1G, 65536 Kbps for 10G

      Very good question for sure. I appreciate it. I'll look at implementing and testing this out.

    2. I'd be interested to know how your testing on this comes out...I'd like to implement some broadcast limits. Perhaps around the 70% range on the e1000 and e10000 ports.

      Thanks and love all the Brocade items on your blog!
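
As a back-of-the-envelope aid for the 70% idea, here is a small sketch of the arithmetic. It assumes the help text above means the limit has to be a multiple of 8192 Kbps on 1G ports and 65536 Kbps on 10G ports, and that line rate is 1,000,000 Kbps and 10,000,000 Kbps respectively. Whether the command expects the Kbps figure or the multiplier is not confirmed anywhere in this thread, so the sketch prints both; the function name is made up.

```python
# Step sizes taken from the CLI help text quoted above.
STEP_KBPS = {"1G": 8192, "10G": 65536}
# Assumed nominal line rates (1 Kbps = 1000 bps).
LINE_RATE_KBPS = {"1G": 1_000_000, "10G": 10_000_000}

def broadcast_limit(port_speed, percent):
    """Return (multiplier, kbps): the largest step multiple not above the target.

    Always returns at least one step so the result is never zero.
    """
    step = STEP_KBPS[port_speed]
    target_kbps = LINE_RATE_KBPS[port_speed] * percent // 100
    multiplier = max(target_kbps // step, 1)
    return multiplier, multiplier * step

for speed in ("1G", "10G"):
    multiplier, kbps = broadcast_limit(speed, 70)
    print(f"{speed}: {multiplier} x {STEP_KBPS[speed]} = {kbps} Kbps")
```

For 70% this works out to 85 x 8192 = 696320 Kbps on a 1G port and 106 x 65536 = 6946816 Kbps on a 10G port, both rounded down so the limit stays at or under the target.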

    3. I'll test and report back to let you know. It will be in the next few weeks.

    4. Excellent, I look forward to your findings!

      What I'd like to do is implement this to eliminate possible storms, but leave the threshold high enough that it's not unnecessarily shutting down the port.

      There don't seem to be any good reference points for implementing this feature as far as reasonable settings.

      Thanks Shane!

    5. Any progress on testing? :)

    6. I have not had a chance to do this just yet. Will try when I can.

  2. What tool are you using to analyze packets?

    1. I used Capsa by Colasoft. Very handy tool.

  3. I'd be interested to know how your testing on this comes out...I'd like to implement some broadcast limits. Perhaps around the 70% range on the e1000 and e10000 ports?

    Help?

    1. I still have not had a chance to do this, but I know it is important. Will try to do this soon.

