What’s the Bug?

After patches are applied to an ESXi 7.0.1 Build 16850804 host, the host fails to boot back into ESXi when it restarts.

My VMware Setup

Server – Cisco UCS C220 M5SX Server
Firmware – 4.1(2b)
Boot Device – Cisco FlexFlash w/ 2 SD Cards
Networking – UCS VIC 1457
Image Profile – Cisco-UCS-Custom-ESXi-70U1-16850804_4.1.2-b
Storage – vSAN

What Broke It?

I should have patched my hosts months ago, but it’s been a crazy six months and I’m a bit behind on patches.  After double-checking the VMware Interoperability Matrix, I updated vCenter to the latest version.  That had issues of its own: it forced a failover of all my hosts, which left my VMs fighting over resources.  After working through that, I started my ESXi updates.  I have a baseline for all patches that dynamically updates as new patches come out.  DRS is enabled on my cluster, so I followed the normal process of remediating the cluster against my All Patches baseline.  An hour after the first host rebooted, I noticed it still hadn’t come back up.  I checked the KVM on the Cisco CIMC and found the host stuck on an error message at boot.  I couldn’t find any specific info on why this had occurred (see the update at the end of the article), so I took a chance and tried a few things to get my host back up and running.  My fix is below.

Repairing the Downed Host

The easiest way to get this host back up and running was to boot it from the ESXi installer ISO for the previous working version.  I grabbed the ISO from the VMware download site and booted the host from it.  The installer found my existing ESXi installation and asked if I wanted to upgrade.

This seemed like the least impactful option, so I picked it.  It scanned again and asked me to confirm, so I hit F11 and crossed my fingers.  It ran through the upgrade, didn’t ask for any input, and then wanted to reboot. 

After the reboot, the host came back up, synced with vSAN, and showed up in vCenter with the original version of ESXi on it.  I was back in business, and it was time to figure out how to fix this so I could update my hosts.  I won’t detail my troubleshooting, but the fix below will get you updated to the latest version of ESXi if you have run into this issue.

The Fix

The fix involves applying patches from new baselines that don’t include the patches for 7.0.2.  I created the new baselines described below to avoid those patches, and I applied them in phases to make sure I didn’t hit any more issues.
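The selection logic behind these temporary baselines can be sketched in a few lines of Python.  This is only an illustration: the `Patch` record, its fields, and the sample entries are hypothetical, and it assumes (going by the baseline names) that Temp-Crit holds critical patches and Temp-Security holds security patches.  The real baselines are built in the vCenter UI, not with code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch record; not a Lifecycle Manager object."""
    name: str      # made-up identifier for illustration
    release: str   # ESXi release the patch belongs to, e.g. "7.0.1"
    category: str  # assumed categories: "critical" or "security"


def select_for_baseline(patches, category, excluded_release="7.0.2"):
    """Keep patches of one category, skipping the release that breaks boot."""
    return [
        p for p in patches
        if p.category == category and p.release != excluded_release
    ]


# Sample data is invented purely to exercise the filter.
sample = [
    Patch("example-patch-a", "7.0.1", "critical"),
    Patch("example-patch-b", "7.0.2", "critical"),
    Patch("example-patch-c", "7.0.1", "security"),
    Patch("example-patch-d", "7.0.2", "security"),
]

temp_crit = select_for_baseline(sample, "critical")
temp_security = select_for_baseline(sample, "security")

print([p.name for p in temp_crit])
print([p.name for p in temp_security])
```

The point of the sketch is simply that each temporary baseline is "everything in one category, minus anything for 7.0.2".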

Create a new baseline called “Temp-Crit” like the picture below and attach it to your cluster of hosts.  Check your cluster for compliance.

Apply this baseline to one host in your environment.  DRS should move your VMs off this host, and Lifecycle Manager will then patch it and reboot it.  Watch it from your KVM console and make sure it actually boots up this time.

If this was successful, your host will reboot and show a new build number.  Mine came back up on Build 17551050; the previous build was 16850804.
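If you want to compare build numbers without eyeballing them, a small helper like the one below can do it.  It is a minimal sketch that assumes the version text contains the word "build" followed by the number, and that a higher build number means a newer build; the function names are made up for this example.

```python
import re


def parse_build(text):
    """Pull the numeric build out of text such as 'Build 17551050'."""
    match = re.search(r"build[\s-]*(\d+)", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"no build number found in {text!r}")
    return int(match.group(1))


def was_patched(before, after):
    """True if the build after the reboot is higher than the one before.

    Assumes build numbers only ever increase with newer releases.
    """
    return parse_build(after) > parse_build(before)


# The two build numbers reported in this post.
print(was_patched("Build 16850804", "Build 17551050"))
```

If the host had been rolled back or the patch had not applied, the two values would match and the check would return False.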

Next, repeat the same process with a second new baseline.

Create a new baseline called “Temp-Security” like the picture below and attach it to your cluster of hosts.  Check your cluster for compliance.

Apply this baseline to the same host you have been working on.  DRS should move your VMs off this host, and Lifecycle Manager will then patch it and reboot it.  Watch it from your KVM console and make sure it actually boots up this time.

Once this is done, repeat this process for your other hosts.  You should be fully patched with all the VMware updates.
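The overall order of operations (one host at a time, Temp-Crit first, then Temp-Security, stopping as soon as a host fails to come back) can be sketched as a loop.  This is a simulation only: `apply_baseline` is a hypothetical callback standing in for the manual remediate-and-watch-the-KVM step, and the host names are invented.

```python
def remediate_in_phases(hosts, baselines, apply_baseline):
    """Apply each baseline in order to one host at a time.

    Stops at the first failure so a bad patch never reaches a second host.
    Returns (hosts fully patched, failed (host, baseline) pair or None).
    """
    completed = []
    for host in hosts:
        for baseline in baselines:
            if not apply_baseline(host, baseline):
                return completed, (host, baseline)
        completed.append(host)
    return completed, None


def fake_apply(host, baseline):
    """Stand-in for real remediation; pretends one host fails to boot."""
    return not (host == "esx02" and baseline == "Temp-Security")


done, failed = remediate_in_phases(
    ["esx01", "esx02", "esx03"],
    ["Temp-Crit", "Temp-Security"],
    fake_apply,
)
print("patched:", done)
print("stopped at:", failed)
```

Stopping on the first failure is the design choice that matters here: it keeps the rest of the cluster on a known-good build while you sort out the broken host.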

Update: 3/12/21
We are still not able to apply the All Patches baseline because Lifecycle Manager still holds metadata regarding the old patch that has the bug.  VMware removed it from the repo, but your vCenter already knows about it at this point.  VMware is working on a fix for it.  See the following KB articles for more info.

Failed to load crypto64.efi Fatal error: 15 (Not found) after patching ESXi host to 7.0 Update 2 (83063)

“Cannot download VIB” error upgrading to ESXi 7.0 Update 2 using vLCM (83107)