Quick Fix – Found zero ssd devices for SSD cache tier

When I added a host to an existing vSAN cluster via SDDC Manager, the task failed with the following error: “Found zero ssd devices for SSD cache tier”

To quickly fix this, we need to mark the cache disk on the ESXi host as SSD. You can check the current value with the vdq -q command. As you can see in the picture below, the disk I want to use for the cache is marked with a value of “0”, so it is not recognized as an SSD drive.
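For reference, the relevant part of the vdq -q output on my host looked roughly like this (fields trimmed, the device ID is the one used further below):

vdq -q
[
   {
      "Name"  : "naa.6000c299027de72c68de829e23455e88",
      "State" : "Eligible for use by VSAN",
      "IsSSD" : "0",
      ...
   },
]

Once the flag is set correctly, the same disk should report "IsSSD" : "1".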

In the past you had to mark the disk as SSD with SATP claim rules, but in versions 7.x and 8.x there is a new and simpler command for this. Run the following ESXCLI command with the storage device ID and the -M option set to true (or false to revert the change) to mark the device as an SSD.

esxcli storage hpp device set -d naa.6000c299027de72c68de829e23455e88 -M true
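To verify the change, you can run vdq -q again, or query the device directly; the exact output differs per build, but the device should now be reported as SSD:

esxcli storage core device list -d naa.6000c299027de72c68de829e23455e88 | grep -i "Is SSD"
   Is SSD: true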

Aria Ops for Networks deployment fails from Aria Suite Lifecycle Manager

In my lab I tried to deploy Aria Operations for Networks 6.12.1 (AON/vRNI) from Aria Suite Lifecycle 8.16 (ASLC/vRLCM). Before the AON deployment I had successfully deployed the following products:

vIDM 3.3.7
Aria Operations for Logs 8.16 (vRLI)
Aria Operations 8.17.1 (vRops)

The attempt to deploy Aria Ops for Networks 6.12.1, however, failed with error LCMVSPHERECONFIG1000016.

-----------------------------------------------------------------------------------------------------------
java.io.IOException: com.vmware.vim.binding.vmodl.fault.SystemError
	at com.vmware.vrealize.lcm.drivers.vsphere65.vlsi.utils.ExceptionMappingUtils.mapAndThrowImportVAppExceptions(ExceptionMappingUtils.java:78)
	at com.vmware.vrealize.lcm.drivers.vsphere65.deploy.impl.BaseOvfDeploy.importOvf(BaseOvfDeploy.java:713)
	at com.vmware.vrealize.lcm.plugin.core.vsphere.tasks.DeployOvfTask.execute(DeployOvfTask.java:251)
	at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:62)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
-----------------------------------------------------------------------------------------------------------

Error Code: LCMVSPHERECONFIG1000016
IO Exception occurred while performing the operation. Check the logs for more information.
Unexpected ioexception occurred.

-----------------------------------------------------------------------------------------------------------

After some digging around, I ran into the known issues section of the Release Notes for version 8.14.

So let’s check which user account is used for the connection between ASLC and vCenter:

Now log in to vCenter and check whether the privileges assigned to this account include the two privileges from the release notes:

As you can see above, the privileges are not configured on this role, so let’s add the two privileges:

Now we need to resync the account between vCenter and ASLC.

After this I hit the Retry button on the request in ASLC, and the OVF was deployed in vCenter.

VCF 5.0 – Remove vSAN datastore after VCF deployment failed

After a failed deployment of VCF 5.0, I was left with a vSAN datastore on the first host in the cluster, which blocked a retry of the deployment.

In this state the vsanDatastore cannot be deleted; if I try to delete it, the option is greyed out.

To delete the datastore and the disk partitions, we first need to SSH into the host and get the vSAN cluster information.
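Getting the cluster information is done with esxcli vsan cluster get; the value we need is on the Sub-Cluster Master UUID line. Abbreviated output, with placeholder UUIDs:

esxcli vsan cluster get
Cluster Information
   Enabled: true
   Local Node UUID: 5f8c0000-0000-0000-0000-000000000001
   Local Node State: MASTER
   Sub-Cluster Master UUID: 5f8c0000-0000-0000-0000-000000000001
   Sub-Cluster UUID: 52aabbcc-0000-0000-0000-000000000002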

We need the Sub-Cluster Master UUID, so copy it to the clipboard. The command to leave the cluster is:

esxcli vsan cluster leave -u <Sub-Cluster Master UUID>

The ESXi UI now shows that the datastore is gone.

Let’s check on the CLI whether there is still a vSAN cluster or vSAN storage configured:

esxcli vsan cluster list
esxcli vsan storage list
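If esxcli vsan storage list still shows a disk group at this point, my understanding is that it can be cleaned up by pointing esxcli vsan storage remove at the cache-tier device, which removes the whole disk group (device ID is a placeholder):

esxcli vsan storage remove -s <cache tier device ID>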

No vSAN cluster or storage was configured on the host anymore. After this the retry was successful and the VCF management cluster was deployed.

VCF 5.0 – Error connecting to ESXi host: SSL certificate common name doesn’t match ESXi FQDN

During a new lab deployment of VCF 5.0 I ran into a small issue running the validation.

I deployed the hosts up front and made them available and unique before the validation. I ran the following commands to regenerate the certificates and restart the services:

/sbin/generate-certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart

Next I wanted to see what Common Name (CN) was on the certificate:
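I checked it in the browser, but you can also pull the certificate subject from any machine with OpenSSL that can reach the host on port 443 (the hostname is just an example):

echo | openssl s_client -connect esx01.lab.local:443 2>/dev/null | openssl x509 -noout -subject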

As you can see, the Common Name (CN) contains only the hostname.

Next I changed the hostname on the ESXi hosts to the complete FQDN.
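On ESXi this can be done from the shell; a minimal example, assuming esx01.lab.local is the desired FQDN:

esxcli system hostname set --fqdn=esx01.lab.local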

I checked the hostname on the CLI and regenerated the certificates again:
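For example (the hostname values are from my lab, yours will differ):

esxcli system hostname get
   Domain Name: lab.local
   Fully Qualified Domain Name: esx01.lab.local
   Host Name: esx01

/sbin/generate-certificates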

After regenerating the certificate on the host, you have to restart the services:

/etc/init.d/hostd restart && /etc/init.d/vpxa restart

I checked the certificate in the browser again: the Common Name now contained the FQDN, and the validation finished successfully.


VCF 5.0 – Retry Bring-up fails on Generate input for SSH Keys

After a failed VCF bring-up, I wanted to retry the bring-up. Luckily the error I encountered before was resolved, but I ran into another issue during the retry.

This time the issue was with the import of the SSH keys.

Going through some internal resources, I stumbled upon the solution: since this is a nested lab environment on top of VMware Cloud Director (VCD), you have to reset the MAC address of the nested ESXi host.

VCF 5.0 LAB Deployment Issue – Failed to migrate vmnics of ESXi host to Distributed vSwitch

During my latest deployment of VCF in my lab environment, I ran into the following issue.

Failed to migrate vmnics of host 192.168.11.12 to DVS sfo-m01-cl01-vds01 . Reason: Failed to migrate vmknic vmk0 to DvSwitch 50 22 42 8c d5 a1 d4 8f-6d 9e 8a 1e 93 ac 5b 9d

The error is pretty clear: the migration of vmk0 from the standard vSwitch to the Distributed vSwitch failed on esx02. I checked esx01, and on that host the migration was successful.

I tried to manually migrate vmk0 to the distributed vSwitch, but this also ran into an error in vCenter:
Right-click the dvSwitch -> Add and Manage Hosts -> Manage Host Networking -> Select esx02

Click Next and leave the physical adapters as they are, then click Next again.
On the next screen, click “Assign Port Group” next to vmk0.

Click on ASSIGN next to the management portgroup

Next, Next, Finish… the task runs and fails after a few seconds.

Checking the Task Details on the ESX host:

After some investigation and searching internal VMware resources, I also ran into this blog article: https://mhvmw.wordpress.com/2023/03/17/issue-with-nested-vcf-4-5-deployment-lab-only/

It is a MAC address conflict: ESXi gives vmk0 the MAC address of the physical NIC.
By deleting and recreating the vmk0 interface, you generate a new MAC address for vmk0.
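You can confirm this on the host by comparing the MAC of vmk0 with the MAC of the uplink; the grep is only there to trim the output:

esxcli network ip interface list | grep -E "^vmk|MAC Address"
esxcli network nic list

If the MAC Address shown for vmk0 matches the MAC of the physical NIC in the second command, you are in this situation.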

Steps to check, delete and recreate the vmk0 interface

Log in via the DCUI

Enable the ESXi Shell

Next, press Alt+F1 to access the ESXi console and log in as root.

Type the command:
esxcli network ip interface list

Make a note of the portgroup, in this case “Management Network”, and then remove vmk0 with the following command:
esxcli network ip interface remove --interface-name=vmk0

When vmk0 is deleted, we can immediately create a new interface with the same name and portgroup. This is done by the following command:
esxcli network ip interface add --interface-name=vmk0 -p "Management Network"

To check if vmk0 is created again type the command:
esxcli network ip interface list
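If you want to confirm that the new vmk0 really received a different MAC address, trim the output down to the interesting fields and compare it with the value from before the delete:

esxcli network ip interface list | grep -E "^vmk|MAC Address"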

Press Alt+F2 to return to the ESXi DCUI and log in to disable the ESXi Shell.
Now we can configure the IP settings again via the DCUI.

Go to Configure Management Network -> IPv4 Configuration and set the static IPv4 configuration.

Hit Enter, then Esc, and Yes to restart the management network.
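If you prefer to stay in the ESXi Shell instead of using the DCUI, something like the following should achieve the same; the IP is the host address from the error above, while the netmask and gateway are assumptions for this lab network:

esxcli network ip interface ipv4 set -i vmk0 -t static -I 192.168.11.12 -N 255.255.255.0
esxcli network ip route ipv4 add -n default -g 192.168.11.1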

Now we can retry the deployment via Cloud Builder; after this, the deployment finished successfully.