Monday, April 8, 2013

Cisco ASA: How To Do Zero Downtime Upgrade On Active/Standby HA ASAs

I've done a few zero-downtime upgrades before, and this process has worked for me every time. If you have an Active/Standby configuration, here is what you should do, in this order, to do an upgrade. I'll assume you have already TFTP'ed the images onto both ASAs.
1.  Tell the ASA to boot to the new image with the "boot system" command.
2.  Reload the standby unit to boot to the new image with the "failover reload-standby" command.
3.  When the standby unit comes up and is in 'Standby Ready' state, you want to force the active unit to fail over to be the standby unit, and make the standby unit the active unit.  On the current active unit, type in the "no failover active" command.
4.  Reload the now standby unit (which was the first active unit that has not yet been reloaded).  You will do this by SSH'ing into the secondary IP address instead of the main IP address.
5.  When the primary ASA comes back up, type in 'failover active' to get the primary unit to be the active unit again.
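
The order of the five steps above is what keeps one unit passing traffic at all times. Here is a minimal sketch in Python that models that order. It is only an illustration: the `Unit` class and `upgrade_pair` function are hypothetical (they are not any Cisco API), the image file name is a placeholder, and the version strings are example values. The command strings are the ones named in the steps, plus a plain "reload" for step 4.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str      # hypothetical label: "primary" or "secondary"
    version: str   # image version the unit is currently running
    active: bool   # True if this unit is the one passing traffic


def upgrade_pair(primary, secondary, new_version):
    """Walk the five steps; return a list of (unit typed on, command)."""
    log = []

    def run(unit, command):
        # The idea behind the procedure: exactly one unit is active
        # at the moment each command is entered.
        assert primary.active != secondary.active
        log.append((unit.name, command))

    # Step 1: point the boot variable at the new image (placeholder name).
    run(primary, "boot system disk0:/<new-image>.bin")
    # Step 2: reload only the standby; it comes back on the new image.
    run(primary, "failover reload-standby")
    secondary.version = new_version
    # Step 3: hand the active role to the freshly upgraded standby.
    run(primary, "no failover active")
    primary.active, secondary.active = False, True
    # Step 4: reload the old active unit, which is now the standby.
    run(primary, "reload")
    primary.version = new_version
    # Step 5: take the active role back on the primary.
    run(primary, "failover active")
    primary.active, secondary.active = True, False
    return log


# Example values only; substitute your own versions.
primary = Unit("primary", "8.4(6)", active=True)
secondary = Unit("secondary", "8.4(6)", active=False)
steps = upgrade_pair(primary, secondary, "9.1(3)")
for unit, command in steps:
    print(f"{unit}: {command}")
```

The model ignores the wait for 'Standby Ready' between steps 2 and 3, so on real hardware you would still check the failover state before forcing the switch.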


  1. Are you talking about a minor upgrade or a major upgrade?
    If major, what about when both units have different image versions (after the standby reload)?
    At that point both will work as standalone units, and the same IP address will have two different MAC addresses (the active and standby units' MACs), so packets will drop.

    1. They won't work in standalone mode. Even though the firmware is different for a short time, they will both still be in active/standby.

  2. I think you're misunderstanding zero downtime. There will be session loss; it will be a quick blip.

    1. I'm not sure where, in the above post, you get that I don't understand 'zero downtime'. I've done many of these. Trust me, I know there is a 'blip'. But do you call that 'blip' downtime? No, you don't. You call it a 'blip'. But if we go by your definition of 'zero downtime', then Cisco is the one you have the beef with, simply because they are the ones that promote zero downtime, not me.
      But let's explore that for a minute. If you have a site-to-site VPN that fails over to the other ASA, then the VPN simply comes back up. The data drop is forgiven and the user probably won't even notice.
      If you have a remote-access user that gets dropped (if his session doesn't come back), then the user will reconnect. It's no big deal.
      If Internet traffic gets dropped, again, no big deal. Data is very forgiving and they see a small 'slow spurt'. The user will never think anything else about it once the page comes up.
      The only time this might be an issue is for voice, in which case a call would get dropped. However, most companies don't do VoIP out to the Internet (unless they have a SIP provider for voice).

  3. Actually, that blip is downtime. You can upgrade to any version whatsoever and there will only be a minor blip. When they reference zero downtime, they are stating that the sessions will all stay up and the endpoints will not see a blip. For example, let's say you have an RDP session running through the firewall: that session would stay up, but if you upgrade to any version you want, then you will have to reconnect. Some entities state they can't be down for even less than a second, which we all know is impractical to some degree, because at some point, whether because of a bug or ... whatever, there will be a blip. I just wanted to point that out because I am researching the maze of code versions right now to try to avoid that blip going from 8.2.5(41) to 9.0.2 or whatever version I need to go to. For 9.1.3 there is a path suggested of:
    Below is from a 9.1.3 release note:
    ASA Version    First Upgrade to:    Then Upgrade to:
    8.2(1)         8.4(6)               9.1(2.8) or 9.1(3) or later

    1. Look, I'm glad you brought this up. I think I'll learn something here, as I think you will too. I think you need to read these Cisco documents. I'm talking about stateless failover (regular). You are talking about stateful failover. We are both right, depending on which one you and I are talking about. I mean stateless, you mean stateful. Maybe Cisco needs to clearly define 'zero downtime'. To me, a blip is not downtime. To you, it is. You are a better man than I am, or maybe I just don't care about the blip. They will get over it in my book. I'm cutting and pasting from Cisco's documentation here, below the links. But Cisco doesn't say "stateless" when they talk about zero downtime (not in the documents I have seen). If you find where Cisco says that, please let me know.

      FOR THE ASAs:
      Regular Failover
      When a failover occurs, all active connections are dropped. Clients need to reestablish connections when the new active unit takes over.
      "For VPN failover, VPN end-users should not have to reauthenticate or reconnect the VPN session in the
      event of a failover. However, applications operating over the VPN connection could lose packets during
      the failover process and not recover from the packet loss."

      Stateful Failover
      When stateful failover is enabled, the active unit continually passes per-connection state information to the standby unit. After a failover occurs, the same connection information is available at the new active unit. Supported end-user applications are not required to reconnect to keep the same communication session.
      "When Stateful Failover is enabled, the active unit continually passes per-connection state information to
      the standby unit. After a failover occurs, the same connection information is available at the new active
      unit. Supported end-user applications are not required to reconnect to keep the same communication
      session."

      Now with this said, good for you for wanting no blip. I don't fault you for that. I probably should admire that, but frankly, as I stated above, I just don't care about the blip. The site-to-site VPN will reconnect on its own. The remote-access user will reconnect. The Internet user will just not really notice (maybe see a "page cannot be found", but they will refresh). Either way you put it, my experience is that people just don't usually complain about a 3 second drop/rebound. They usually carry on without saying a word. Sure, stateful failover is the way to go for no blip, but you have to ask yourself if it's worth the CPU cycles it will take to prevent that 'blip'. To me, no, it's not. I mean, all traffic is going through that Internet-facing firewall. I just prefer less headache for the ASA in this case. But I would like to hear your response to this. :) And look, this is interesting conversation, not an argument.

  4. I think I get what the poster is saying: that it is not technically zero downtime. True zero downtime is a Nexus upgrade, where no data is lost during the upgrade. NO as in NONE. In this case there is. Stateless connections will drop, and if you bring this to management as a zero downtime solution and your user base happens to be a bit more... ummm, how shall I say, PICKY, you will get dinged. In reading this, I don't think you were suggesting doing this unadvertised to users and without an appropriate change window. I think that is all the other user was getting at. If you did this with my user base, they would tar and feather you if they were not warned.

  5. Then you never had to upgrade a PIX in a critical environment like a trading environment, where 1 second means a lot of money. You would care about the blip ;-) But I agree, most of the time the blip in a maintenance window is not so bad.

