Quick Fix – Found zero ssd devices for SSD cache tier

When I added a host to an existing vSAN cluster via SDDC Manager, the task failed with the following error: “Found zero ssd devices for SSD cache tier”

To quickly fix this, we need to mark the cache disk on the ESXi host as SSD. You can check the current value with the vdq -q command. As you can see in the picture below, the disk I want to use for the cache is marked with a value of “0”, so it is not recognized as an SSD drive.
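For reference, the relevant part of the vdq -q output on my host looked roughly like this (fields trimmed, the device ID is the one used further below):

vdq -q
[
   {
      "Name"  : "naa.6000c299027de72c68de829e23455e88",
      "State" : "Eligible for use by VSAN",
      "IsSSD" : "0",
      ...
   },
]

Once the flag is set correctly, the same disk should report "IsSSD" : "1".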

In the past you had to mark the disk as SSD with SATP claim rules, but in versions 7.x and 8.x there is a new and simpler command for this. Run the following ESXCLI command with the storage device ID and the -M option set to true (or false to revert the change) to mark the device as an SSD.

esxcli storage hpp device set -d naa.6000c299027de72c68de829e23455e88 -M true
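To verify the change, you can run vdq -q again, or query the device directly; the exact output differs per build, but the device should now be reported as SSD:

esxcli storage core device list -d naa.6000c299027de72c68de829e23455e88 | grep -i "Is SSD"
   Is SSD: true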

Aria Ops for Networks deployment fails from Aria Suite Lifecycle Manager

In my lab I tried to deploy Aria Operations for Networks 6.12.1 (AON/vRNI) from Aria Suite Lifecycle 8.16 (ASLC/vRLCM). Before the AON deployment I had successfully deployed the following products:

vIDM 3.3.7
Aria Operations for Logs 8.16 (vRLI)
Aria Operations 8.17.1 (vRops)

The attempt to deploy Aria Ops for Networks 6.12.1, however, failed with error LCMVSPHERECONFIG1000016.

-----------------------------------------------------------------------------------------------------------
java.io.IOException: com.vmware.vim.binding.vmodl.fault.SystemError
	at com.vmware.vrealize.lcm.drivers.vsphere65.vlsi.utils.ExceptionMappingUtils.mapAndThrowImportVAppExceptions(ExceptionMappingUtils.java:78)
	at com.vmware.vrealize.lcm.drivers.vsphere65.deploy.impl.BaseOvfDeploy.importOvf(BaseOvfDeploy.java:713)
	at com.vmware.vrealize.lcm.plugin.core.vsphere.tasks.DeployOvfTask.execute(DeployOvfTask.java:251)
	at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:62)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
-----------------------------------------------------------------------------------------------------------

Error Code: LCMVSPHERECONFIG1000016
IO Exception occurred while performing the operation. Check the logs for more information.
Unexpected ioexception occurred.

-----------------------------------------------------------------------------------------------------------

After some digging around, I ran into the known issues section of the Release Notes for version 8.14.

So let’s check which user account is used for the connection between ASLC and vCenter:

Now log in to vCenter and check whether the privileges assigned to this account include the two privileges from the release notes:

As you can see above, the privileges are not configured on this role, so let’s add the two privileges:

Now we need to resync the account between vCenter and ASLC.

After this I hit the Retry button on the request in ASLC, and the OVF was deployed in vCenter.

VCF 5.0 – Remove vSAN datastore after VCF deployment failed

After a failed deployment of VCF 5.0, I was left with a vSAN datastore on the first host in the cluster, which blocked a retry of the deployment.

In this state the vsanDatastore cannot be deleted; if I try to delete it, the option is greyed out.

To delete the datastore and the disk partitions, we first need to SSH into the host and get the vSAN cluster information.
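Getting the cluster information is done with esxcli vsan cluster get; the value we need is on the Sub-Cluster Master UUID line. Abbreviated output, with placeholder UUIDs:

esxcli vsan cluster get
Cluster Information
   Enabled: true
   Local Node UUID: 5f8c0000-0000-0000-0000-000000000001
   Local Node State: MASTER
   Sub-Cluster Master UUID: 5f8c0000-0000-0000-0000-000000000001
   Sub-Cluster UUID: 52aabbcc-0000-0000-0000-000000000002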

We need the Sub-Cluster Master UUID, so copy it to the clipboard. The command to leave the cluster is:

esxcli vsan cluster leave -u <Sub-Cluster Master UUID>

The ESXi UI now shows that the datastore is gone.

Let’s check on the CLI whether there is still a vSAN cluster or vSAN storage configured:

esxcli vsan cluster list
esxcli vsan storage list
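If esxcli vsan storage list still shows a disk group at this point, my understanding is that it can be cleaned up by pointing esxcli vsan storage remove at the cache-tier device, which removes the whole disk group (device ID is a placeholder):

esxcli vsan storage remove -s <cache tier device ID>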

No vSAN cluster or storage was configured on the host anymore. After this the retry was successful and the VCF management cluster was deployed.

VCF 5.0 – Error connecting to ESXi host: SSL certificate common name doesn’t match ESXi FQDN

During a new lab deployment of VCF 5.0 I ran into a small issue running the validation.

I deployed the hosts up front and made them available and unique before the validation. I ran the following commands to regenerate the certificates and restart the services:

/sbin/generate-certificates
/etc/init.d/hostd restart && /etc/init.d/vpxa restart

Next I wanted to see what Common Name (CN) was on the certificate:
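I checked it in the browser, but you can also pull the certificate subject from any machine with OpenSSL that can reach the host on port 443 (the hostname is just an example):

echo | openssl s_client -connect esx01.lab.local:443 2>/dev/null | openssl x509 -noout -subject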

As you can see, the Common Name (CN) contains only the hostname.

Next I changed the hostname on the ESXi hosts to the complete FQDN.
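On ESXi this can be done from the shell; a minimal example, assuming esx01.lab.local is the desired FQDN:

esxcli system hostname set --fqdn=esx01.lab.local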

I checked the hostname on the CLI and regenerated the certificates again:
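For example (the hostname values are from my lab, yours will differ):

esxcli system hostname get
   Domain Name: lab.local
   Fully Qualified Domain Name: esx01.lab.local
   Host Name: esx01

/sbin/generate-certificates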

After regenerating the certificate on the host, you have to restart the services:

/etc/init.d/hostd restart && /etc/init.d/vpxa restart

I checked the certificate in the browser again: the Common Name now contained the FQDN, and the validation finished successfully.


VCF 5.0 – Retry Bring-up fails on Generate input for SSH Keys

After a failed VCF bring-up, I wanted to retry the bring-up. Luckily the error I encountered before was resolved, but I ran into another issue during the retry.

This time the issue was with the import of the SSH keys.

Going through some internal resources, I stumbled upon the solution: since this is a nested lab environment on top of VMware Cloud Director (VCD), you have to reset the MAC address of the nested ESXi host.

VCF 5.0 LAB Deployment Issue – Failed to migrate vmnics of ESXi host to Distributed vSwitch

During my latest deployment of VCF in my lab environment, I ran into the following issue.

Failed to migrate vmnics of host 192.168.11.12 to DVS sfo-m01-cl01-vds01 . Reason: Failed to migrate vmknic vmk0 to DvSwitch 50 22 42 8c d5 a1 d4 8f-6d 9e 8a 1e 93 ac 5b 9d

The error is pretty clear: the migration of vmk0 from the standard vSwitch to the Distributed vSwitch failed on esx02. I checked esx01, and on that host the migration was successful.

I tried to manually migrate vmk0 to the distributed vSwitch, but this also ran into an error in vCenter:
Right-click the dvSwitch -> Add and Manage Hosts -> Manage Host Networking -> Select esx02

Click Next and leave the physical adapters as they are, then click Next again.
On the next screen, click “Assign Port Group” next to vmk0.

Click on ASSIGN next to the management portgroup

Next, Next, Finish… the task runs and fails after a few seconds.

Checking the Task Details on the ESX host:

After some investigation and searching internal VMware resources, I also ran into this blog article: https://mhvmw.wordpress.com/2023/03/17/issue-with-nested-vcf-4-5-deployment-lab-only/

It is a MAC address conflict: ESXi gives vmk0 the MAC address of the physical NIC.
By deleting and recreating the vmk0 interface, you generate a new MAC address for vmk0.
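You can confirm this on the host by comparing the MAC of vmk0 with the MAC of the uplink; the grep is only there to trim the output:

esxcli network ip interface list | grep -E "^vmk|MAC Address"
esxcli network nic list

If the MAC Address shown for vmk0 matches the MAC of the physical NIC in the second command, you are in this situation.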

Steps to check, delete and recreate the vmk0 interface

Log in via the DCUI

Enable the ESXi Shell

Next, press Alt+F1 to access the ESXi console and log in as root.

Type the command:
esxcli network ip interface list

Make a note of the portgroup, in this case “Management Network”, and then remove vmk0 with the following command:
esxcli network ip interface remove --interface-name=vmk0

When vmk0 is deleted, we can immediately create a new interface with the same name and portgroup. This is done by the following command:
esxcli network ip interface add --interface-name=vmk0 -p "Management Network"

To check if vmk0 is created again type the command:
esxcli network ip interface list
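If you want to confirm that the new vmk0 really received a different MAC address, trim the output down to the interesting fields and compare it with the value from before the delete:

esxcli network ip interface list | grep -E "^vmk|MAC Address"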

Press Alt+F2 to return to the ESXi DCUI and log in to disable the ESXi Shell.
Now we can configure the IP settings again via the DCUI.

Go to Configure Management Network -> IPv4 Configuration and set the static IPv4 configuration.

Hit Enter, then Esc, and Yes to restart the management network.
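If you prefer to stay in the ESXi Shell instead of using the DCUI, something like the following should achieve the same; the IP is the host address from the error above, while the netmask and gateway are assumptions for this lab network:

esxcli network ip interface ipv4 set -i vmk0 -t static -I 192.168.11.12 -N 255.255.255.0
esxcli network ip route ipv4 add -n default -g 192.168.11.1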

Now we can retry the deployment via Cloud Builder; after this, the deployment finished successfully.