Sometimes what looks like one problem is actually another. I got a call from a school system saying that the network was down at one particular school. No Internet. No server access. Nothing.
When I got there, things seemed to be OK. They could get to the Internet, servers, etc. Then it went down again. The problem was very inconsistent. You couldn't even ping anything. But then, after some time, it would be fine.
While I was there, I really couldn't find anything wrong, except once, when the problem actually happened. It appeared to me, at that moment, like the layer 3 core was acting up. When the problem happened, I couldn't even ping the next-hop MPLS router (on the same subnet). The symptoms were really odd. So I broke the switch stack (Brocade ICX 6610s) and left the last switch in place to run as the core by itself. The problem seemed to go away. Until the next day, when it happened again.
I showed up and guess what? The problem was gone again. So I sat down and just started looking at configs, spanning tree, interface statistics, CPU utilization, etc. What in the world was going on here? Then it happened right in front of me. CPU shot up to 33% (from 1%). Internet was down and the network was hosed again. So I ran a packet capture. Nothing on the first 4 VLANs that I looked at. Then I found it. The 5th VLAN I looked at was flooded with broadcasts.
32 percent busy, from 9 sec ago
1 sec avg: 32 percent busy
5 sec avg: 32 percent busy
60 sec avg: 32 percent busy
300 sec avg: 32 percent busy
So I disconnected the fiber to the closet that VLAN went to, and CPU dropped back down to 1% on the core. OK, leave that disconnected, and go to that closet.
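If you want to watch for this kind of CPU jump without staring at the console, here is a minimal Python sketch that parses output in the same shape as the CPU lines shown above. The 1% baseline comes from this story; the `factor` of 10 and the function names are my own assumptions for illustration, not anything Brocade provides.

```python
import re

# Sample text in the same shape as the CPU output shown in the post.
SAMPLE = """\
32 percent busy, from 9 sec ago
1 sec avg: 32 percent busy
5 sec avg: 32 percent busy
60 sec avg: 32 percent busy
300 sec avg: 32 percent busy
"""

# Matches lines such as "60 sec avg: 32 percent busy".
AVG_LINE = re.compile(r"^\s*(\d+)\s+sec avg:\s+(\d+)\s+percent busy", re.MULTILINE)


def parse_cpu_averages(text):
    """Return {window_seconds: percent_busy} for each 'N sec avg' line."""
    return {int(window): int(pct) for window, pct in AVG_LINE.findall(text)}


def cpu_is_elevated(averages, baseline_percent=1, factor=10):
    """Hypothetical check: True when any average is at least baseline * factor."""
    return any(pct >= baseline_percent * factor for pct in averages.values())


averages = parse_cpu_averages(SAMPLE)
print(averages)
print(cpu_is_elevated(averages))
```

How you collect the text (SSH session, saved log, etc.) is up to you; the sketch only covers the parsing and the comparison against a baseline.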
I got to that closet and plugged in my packet analysis tool. Again, flooded when I hit the right VLAN, but no other VLAN. So, what do I do? It's a stack of 2 ICX6450s. I started unplugging one port at a time until the flooding settled down. It just so happens the second port I unplugged calmed the network down right away. There was a PC connected at the other end. It appears the NIC was flaking out and causing intermittent broadcast storms. Not a good situation, but with the right tools, I found the problem as it was happening. Packet captures are your friend!
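The VLAN-by-VLAN hunt boils down to counting broadcast frames per VLAN over a capture window. Here is a minimal Python sketch of that idea, assuming you have already exported the capture into (VLAN ID, destination MAC) pairs. The sample frames, the 10-second window, and the 1000 frames-per-second threshold are all made up for illustration; pick numbers that fit your own network.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcast_rate_by_vlan(frames, window_seconds):
    """frames: iterable of (vlan_id, dst_mac) pairs seen during the capture.

    Returns {vlan_id: broadcast frames per second} for VLANs that saw
    at least one broadcast frame.
    """
    counts = Counter(
        vlan for vlan, dst in frames if dst.lower() == BROADCAST_MAC
    )
    return {vlan: n / window_seconds for vlan, n in counts.items()}


def flooded_vlans(rates, threshold_fps):
    """Return the VLAN IDs whose broadcast rate exceeds threshold_fps."""
    return sorted(vlan for vlan, rate in rates.items() if rate > threshold_fps)


# Made-up sample: VLAN 10 is quiet, VLAN 20 is unicast only, VLAN 50 is flooded.
frames = (
    [(10, BROADCAST_MAC)] * 20
    + [(20, "00:11:22:33:44:55")] * 500
    + [(50, BROADCAST_MAC)] * 50000
)

rates = broadcast_rate_by_vlan(frames, window_seconds=10)
print(rates)
print(flooded_vlans(rates, threshold_fps=1000))
```

The same counting works per switch port instead of per VLAN, which is the scripted version of unplugging ports one at a time.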
This is the retired Shane Killen personal blog, an IT technical blog about configs and topics related to the Network and Security Engineer working with Cisco, Brocade, Check Point, and Palo Alto and Sonicwall. I hope this blog serves you well. -- May The Lord bless you and keep you. May He shine His face upon you, and bring you peace.
Monday, January 5, 2015
Broadcast Storms And The Havoc (And Headache) They Wreak
Couldn't you just configure appropriate port security to protect from broadcast storms like you do on Cisco access stacks?
Just wondering. I have never configured a Brocade network (besides vRouter).
Good question. Yes, in Cisco, you can use the "storm-control" command. I have not done so in the past, but you can. In Brocade, yes, you can use the "broadcast limit" command. It serves the same purpose. I logged into my ICX6610 and "question marked" through to see the command:
DECIMAL Multiple of 8192 Kbps for 1G, 65536 Kbps for 10G
Very good question for sure. I appreciate it. I'll look at implementing and testing this out.
I'd be interested to know how your testing on this comes out... I'd like to implement some broadcast limits. Perhaps around the 70% range on the e1000 and e10000 ports.
Thanks and love all the Brocade items on your blog!
I'll test and report back to let you know. It will be in the next few weeks.
Excellent, I look forward to your findings!
What I'd like to do is implement this to eliminate possible storms, but leave the threshold high enough that it's not unnecessarily shutting down the port.
There don't seem to be any good reference points for implementing this feature as far as reasonable settings.
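For anyone wanting a starting number, here is a rough Python sketch of the arithmetic for a limit near 70% of line rate. It only uses the step sizes from the CLI help text quoted above (8192 Kbps for 1G, 65536 Kbps for 10G). Treating 1G as 1,000,000 Kbps is an assumption, and the help text does not say whether the command takes the multiple or the Kbps value, so the sketch reports both. This is not a tested recommendation.

```python
# Step sizes from the CLI help text quoted in the comments.
STEP_KBPS = {"1G": 8192, "10G": 65536}

# Assumed nominal line rates in Kbps (decimal units).
LINE_RATE_KBPS = {"1G": 1_000_000, "10G": 10_000_000}


def limit_for_percent(port_speed, percent):
    """Return (multiple, kbps) for the largest whole multiple of the step
    size that does not exceed `percent` of the assumed line rate."""
    step = STEP_KBPS[port_speed]
    target_kbps = LINE_RATE_KBPS[port_speed] * percent // 100
    multiple = target_kbps // step
    return multiple, multiple * step


for speed in ("1G", "10G"):
    multiple, kbps = limit_for_percent(speed, 70)
    print(f"{speed}: {multiple} x {STEP_KBPS[speed]} Kbps = {kbps} Kbps")
```

Rounding down keeps the limit at or below the target percentage; round up instead if you would rather err on the side of not clamping legitimate traffic.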
Any progress on testing? :)
I have not had a chance to do this just yet. Will try when I can.
What tool are you using to analyze packets?
I used Capsa by Colasoft. Very handy tool.
Thanks for sharing. :)
I'd be interested to know how your testing on this comes out... I'd like to implement some broadcast limits. Perhaps around the 70% range on the e1000 and e10000 ports?
I still have not had a chance to do this, but I know it is important. Will try to do this soon.