During the deployment of SSP (Security Services Platform) 5.1.1, I ran into an issue while onboarding the NSX instance.
This error means that there is still an old SSP installation onboarded in NSX. In my lab environment this was the case: the old SSP installation got corrupted due to storage issues and could not be correctly offboarded and decommissioned. Luckily there is a KB article with a cleanup script that can help you clean up NSX and SSP.
After running the script the NSX site cleanup is complete, and you can try to onboard NSX to SSP again. During the onboarding you can follow the progress with the following commands:
k get site -A
k describe site 098ad7b9-cdea-46b8-82a7-1f2e3d1fd366 -n nsxi-platform
When you see “OnboardingInProgress”, just hold on a little bit until it changes to “OnboardingComplete”.
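Here k is just my alias for kubectl. If you prefer, you can also watch the site object until the status flips, something like this (assuming the same site UUID and namespace as above):
# Watch the site object until its status changes to OnboardingComplete
kubectl get site 098ad7b9-cdea-46b8-82a7-1f2e3d1fd366 -n nsxi-platform -w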
In SSP you should now see that NSX is correctly onboarded!
During the upgrade pre-check test of the Workload domain I got the following error: vSAN HCL DB Out of Date.
Because my SDDC Manager has no direct internet connection, I needed to get the file onto my jump host. This can be done by browsing to https://partnerweb.vmware.com/service/vsan/all.json, copying the entire content, and creating a new file with the extension “.json”.
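If the jump host has command-line access, downloading the file directly saves the copy-and-paste step (assuming curl is available; wget works just as well):
# Download the latest vSAN HCL database to a local all.json file
curl -o all.json https://partnerweb.vmware.com/service/vsan/all.json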
When I added a host to an existing vSAN cluster via the SDDC Manager, the task failed with the following error: “Found zero ssd devices for SSD cache tier”
To quickly fix this we need to mark the cache disk on the ESXi host as SSD. You can check the current value with the vdq -q command. As you can see in the picture below, the disk I want to use for the cache is marked with a value of “0”, so it is not recognized as an SSD drive.
In the past you had to mark the disk as SSD with SATP claim rules, but in versions 7.x and 8.x there is a new and simpler command to do this. Run the following ESXCLI command with the storage device ID and the -M option set to true (or false to revert the change) to mark the device as an SSD.
esxcli storage hpp device set -d naa.6000c299027de72c68de829e23455e88 -M true
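To verify the change, re-run vdq -q; the cache disk should now be reported as SSD (the output format can differ slightly per ESXi build):
# Re-check the local devices; the cache disk should now show "IsSSD" : "1"
vdq -q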
In my lab I tried to deploy Aria Operations for Networks 6.12.1 (AON/vRNI) from Aria Suite LifeCycle 8.16 (ASLC/vRLCM). Before the deployment of AON I successfully deployed other products:
I attempted to deploy Aria Operations for Networks 6.12.1, but it failed with the LCMVSPHERECONFIG1000016 error:
-----------------------------------------------------------------------------------------------------------
java.io.IOException: com.vmware.vim.binding.vmodl.fault.SystemError
at com.vmware.vrealize.lcm.drivers.vsphere65.vlsi.utils.ExceptionMappingUtils.mapAndThrowImportVAppExceptions(ExceptionMappingUtils.java:78)
at com.vmware.vrealize.lcm.drivers.vsphere65.deploy.impl.BaseOvfDeploy.importOvf(BaseOvfDeploy.java:713)
at com.vmware.vrealize.lcm.plugin.core.vsphere.tasks.DeployOvfTask.execute(DeployOvfTask.java:251)
at com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:62)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
-----------------------------------------------------------------------------------------------------------
Error Code: LCMVSPHERECONFIG1000016 IO Exception occurred while performing the operation. Check the logs for more information. Unexpected ioexception occurred.
After a failed deployment of VCF 5.0, I was left with a vSAN datastore on the first host in the cluster, and this was blocking a retry of the deployment.
In this state the vsanDatastore cannot be deleted; if I try to delete it, the option is greyed out.
To delete the datastore and the disk partitions, we first need to SSH into the host and get the vSAN cluster information.
We need the Sub-Cluster Master UUID; copy it to the clipboard. Then the host can leave the cluster and the disks can be released, as shown below.
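The standard ESXCLI vSAN commands for this look roughly like the following (a sketch; verify against your ESXi version, and note that the disk ID at the end is a placeholder, check esxcli vsan storage list for your own IDs):
# Show the vSAN cluster information, including the Sub-Cluster Master UUID
esxcli vsan cluster get
# Remove this host from the vSAN cluster
esxcli vsan cluster leave
# List the disks still claimed by vSAN, then remove the disk group via its cache disk
esxcli vsan storage list
esxcli vsan storage remove -s <cache-disk-naa-id>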
During a new lab deployment of VCF 5.0 I ran into a small issue running the validation.
I deployed the hosts up front, and to make them available and unique before the validation I ran a command on each host to regenerate the certificates and restart the services.
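On an ESXi host, regenerating the self-signed certificates and restarting the management agents typically looks like this (a minimal sketch using the standard ESXi tooling; verify the paths on your build):
# Regenerate the self-signed host certificates
/sbin/generate-certificates
# Restart the management agents so the new certificate is picked up
/etc/init.d/hostd restart && /etc/init.d/vpxa restart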
After a failed VCF bring-up, I wanted to retry the bring-up. Luckily the error I encountered before was resolved, but I ran into another issue during the retry.
This time the issue was with the import of the SSH keys.
Going through some internal resources I stumbled upon the solution: since this is a nested lab environment on top of VCD, you have to reset the MAC address of the ESXi host.
During my latest deployment of VCF in my lab environment I ran into the following issue.
Failed to migrate vmnics of host 192.168.11.12 to DVS sfo-m01-cl01-vds01. Reason: Failed to migrate vmknic vmk0 to DvSwitch 50 22 42 8c d5 a1 d4 8f-6d 9e 8a 1e 93 ac 5b 9d
The error is pretty clear: the migration of vmk0 from the standard vSwitch to the Distributed vSwitch failed on esx02. I checked esx01, and on that host the migration was successful.
Manually migrating vmk0 to the distributed vSwitch in vCenter also ran into an error. Right-click the dvSwitch -> Add and Manage Hosts -> Manage Host Networking -> Select esx02
Click Next and leave the physical adapters as is, then click Next again. On the next screen click “Assign Port Group” next to vmk0.
Click ASSIGN next to the management portgroup.
Next, Next, Finish… the task runs and then fails after a few seconds.
This is a MAC address conflict: ESXi takes the MAC address of the physical NIC for vmk0. By deleting and recreating the vmk0 interface you generate a new MAC address for vmk0.
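To confirm the conflict, you can compare the MAC address of vmk0 with the MAC addresses of the physical uplinks; both are standard ESXCLI commands:
# Show the VMkernel interfaces, including the MAC address of vmk0
esxcli network ip interface list
# Show the physical NICs (vmnic0, vmnic1, ...) and their MAC addresses
esxcli network nic list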
Steps to check, delete and recreate vmk0 interface
Login via DCUI
Enable ESXi Shell
Next, press ALT+F1 to access the ESXi console and log in as root.
Type the command: esxcli network ip interface list
Make a note of the portgroup, in this case “Management Network”, and then remove vmk0 with the following command: esxcli network ip interface remove --interface-name=vmk0
When vmk0 is deleted, we can immediately create a new interface with the same name and portgroup. This is done with the following command: esxcli network ip interface add --interface-name=vmk0 -p "Management Network"
To check if vmk0 is created again type the command: esxcli network ip interface list
Press ALT+F2 to go back to the ESXi DCUI and log in to disable the ESXi Shell. Now we can configure the IP settings again via the DCUI.
Go to Configure Management Network -> IPv4 Configuration and set the static IPv4 configuration.
Hit Enter, then Esc, and Yes to restart the management network.
Now we can retry the deployment via Cloud Builder; after this, the deployment went on successfully.
After a successful upgrade of NSX, after the last step (the upgrade of the management plane) the compute manager disappeared. Let’s see how we can fix that!
When I try to add the vCenter, it says it is already registered, so let’s check with the API.
First do an API GET in Postman to get the compute manager ID:
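I used Postman, but the same call can be made with curl (assuming an NSX Manager at nsx-manager.lab.local and admin credentials; adjust to your environment):
# List all registered compute managers and note the "id" field in the response
curl -k -u admin 'https://nsx-manager.lab.local/api/v1/fabric/compute-managers'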
Output:
Now that we have the compute manager ID, we can check whether it is registered and up:
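Again this can be done with curl as well (the ID below is a placeholder; use the one returned by the previous call):
# Check the registration and connection status of the compute manager
curl -k -u admin 'https://nsx-manager.lab.local/api/v1/fabric/compute-managers/<compute-manager-id>/status'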
Output:
As you can see, the compute manager is registered and up, so why is it not showing up in the UI?
Solution:
Log in with the admin user via SSH and run the following command.
start search resync inventory
Wait a few seconds and refresh the UI; the Compute Manager is back!
Recently I got the question whether, within one Cloud Director tenant (Organization), granular role-based access control and separation of rights can be configured between multiple teams within that Organization.
Details:
In our test case, Team-A is responsible for Org VDC A and can only manage and view the Edge GW resources (networks, Edge Gateways) within that VDC Group. Team-B is responsible for Org VDC B and can manage and view all resources in all VDC Groups, except the Edge GW in Org VDC A. That Edge GW can only be managed by Team-A.
Also, because of some tenant requirements, the T0 (VRF) is split between Internet and customer specific. You can read more about this setup in this post.
Requirements:
Separation of Rights between Org VDCs
Shared networks between Org VDCs
Team-A can only manage the Edge GW in ORG VDC A
Team-B can manage all resources in both ORG VDCs except the Edge GW in ORG VDC A
Setup:
One Provider VDC (vCenter)
One Organization in Cloud Director (Tenant1)
Two Org VDC connected to the same Provider VDC (ORG VDC A & ORG VDC B)
Two Data Center Groups (VDC Group A & VDC Group B)
Two Edge GWs (Edge A connected to VDC Group A & Edge B connected to VDC Group B)
Tenant Access Role Team-A
Tenant Access Role Team-B
Datacenter Groups
From version 10.2, VMware Cloud Director supports Data Center Group networking backed by NSX-T Data Center.
A Data Center Group acts as a Cross-VDC router that provides centralized networking administration, egress point configuration, and east-west traffic between all networks within the group.
Using Data Center Groups, we can share organization networks across various ORG VDCs. To do so we first group the virtual data centers, then create a VDC network that is scoped to the Data Center Group. A data center group can contain between one and 16 virtual data centers that are configured to share multiple egress points.
We need to create two Data Center Groups and connect them to the participating VDCs & Edges:
VDC Group A -> ORG VDC A (24-2 in picture below) & Edge A
VDC Group B -> ORG VDC B (24 in picture below) & Edge B
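The grouping itself was done in the tenant UI, but the Data Center Groups and their participating VDCs can also be inspected through the CloudAPI (a sketch, assuming a VCD endpoint at vcd.lab.local and a valid bearer token; the API version in the Accept header may differ per VCD release):
# List the Data Center (VDC) Groups and their participating org VDCs
curl -k -H 'Accept: application/json;version=36.0' \
     -H 'Authorization: Bearer <access-token>' \
     'https://vcd.lab.local/cloudapi/1.0.0/vdcGroups'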
Roles
By default, organization VDCs are shared with all users and groups that have a role which includes the Allow Access to All Organization VDCs right.
As an Organization Administrator, you can limit the access to each of the organization VDCs in your organization to specific users and groups.
Our organization has multiple organization VDCs and we want them managed separately, so we create a custom role that functions as an organization VDC administrator and assign it to specific users or groups within the organization, providing them with access only to a specific VDC’s compute and networking resources.
For Team-B we can use the predefined role Organization Administrator. This role includes the following right: Allow Access to All Organization VDCs.
This permission does exactly what it says: it gives you access to ALL organization VDCs in the Organization. So with this role we are able to manage VMs, networks, etc. in all the organization VDCs.
Exactly what we need for Team-B.
For Team-A we need to make a new role with more granular permissions. Create a new role and exclude the Allow Access to All Organization VDCs right. Set the rest of the permissions to view and manage the Edges and Networks.
Publish both Roles to the Tenant and create 2 users in the Tenant.
Limit Access to ORG-VDC
Now we need to limit the access to the Org VDC. On the Virtual Data Center dashboard screen, click the card of the virtual data center that you want to limit access to.
Under Settings, click Sharing. The list of users and groups within the organization that have access to the VDC appears. To change the access settings of the organization VDC, click Edit.
Select Specific Users and Groups. From the Users list, select the users that you want to provide with access to the VDC; the same procedure applies if you are using groups.
So for ORG VDC A select Team-A; Team-B already has access to all ORG VDCs because of the Allow Access to All Organization VDCs right.
To share the VDC with the selected users and groups, click Share. At this moment Team-A can only view and manage Edges and Networks in ORG VDC A, and Team-B can view and manage all resources in both ORG VDCs, including the Edge in ORG VDC A. If we needed to sort that out as well, we would have to create roles and groups for every conceivable resource set (Edge Admin, VM Admin, DFW Admin, etc.) in every ORG VDC.
But another requirement in this test case is the shared network between ORG VDCs. For this requirement it is necessary to add the other VDC to the participating VDCs in the Data Center Group.
As soon as you configure this, Team-A (which could only see Edge-A in ORG VDC A) can immediately see Edge-B under ORG VDC B. A no-go in our case. The rights are distributed very horizontally: as soon as you have multiple participating VDCs in your Data Center Group, the team that was restricted to viewing and managing the resources in ORG VDC A can now also view and manage the resources it is permitted to in ORG VDC B.
Conclusion
For this test case the outcome was negative, as we needed shared networks between ORG VDCs. If the sharing of networks is not needed, you can set up a very granular RBAC model, but keep in mind that when you include the Allow Access to All Organization VDCs right in a role, the users/groups that have this role are allowed to see all resources they are eligible to in all organization VDCs.