Before I start: it’s been a while since my last post, mainly because I have been really busy with work and family. Hopefully I will now make it a habit to post something useful once every few weeks.
Disclaimer: This is not the **official** recommendation from Nutanix on Cisco ACI. This is just something that I worked on for a client of ours and thought would be useful for anyone who might end up deploying Nutanix + Cisco ACI + new vCenter on Nutanix NDFS :)
Problem Statement: Cisco ACI requires OOB access to vCenter in order to deploy the Cisco ACI networks as port groups in vCenter. We need to build vCenter on NDFS, and NDFS (ideally) needs a 10 Gb fabric from each node. All the uplinks on the leaf switches are controlled by Cisco ACI, but ACI needs vCenter before it can push out the Management and VM Network VLANs.
Some of you might see this and go uh oh, but let me assure you, this is also a problem in non-Nutanix environments, especially for anyone using IP-based storage with only 2 x 10 Gb adapters.
There are 2 ways in which we can take care of this.
Option 1: Deploy a vCenter in a management-only cluster, which doesn’t depend on Cisco ACI for networking. (Needs separate physical infrastructure for networking and for the management cluster.)
Option 2: Add another dual 10 Gb NIC to each of the nodes. (Becomes a lot more expensive when you think of tens of nodes x 4 10 Gb adapters.)
Both the above options are quite costly, be it from a networking physical infrastructure point of view or a management only cluster point of view.
So how do we go about solving this?
As you can see from the picture below each Nutanix node has 2 x 1 Gb network ports and 2 x 10 Gb network ports.
To provide an OOB network for vCenter, we had to update the vCenter appliance to support multiple NICs: one for the public interface to talk to ESXi and others, and one for the private OOB network. Since we had to build this vCenter new, there was a chicken-and-egg problem to start with. How do we build an NDFS fabric without having access to 10 Gb? How do we build the vCenter server on NDFS without migrating the Management and CVM networks onto the 10 Gb?
The physical networking infrastructure was provided by a Cisco 2950 for OOB management (IPMI and the aforementioned Cisco ACI provisioning network) and Cisco Nexus 9000 switches (for ACI). The figure below shows how the networking was configured from both the VMware and the physical layer.
So as you can see, without changing the switch config on the 2950 or ACI switches, it isn’t possible to
- Build vCenter
- Migrate the CVM network to 10 Gb
- Build NDFS on 10 Gb.
To combat this, we built the NDFS layer while it was connected to a local 1 Gb switch. We installed the vCenter PSC and vCenter Server on the NDFS layer while it was on 1 Gb. The customer was very impressed with the speed that we were getting on the 1 Gb NDFS layer, which also shows the advantage of having data locality. Once we had built the vCenter Server and dual-homed it, we provisioned the required VLANs using Cisco ACI.
Now we ran into another problem: how to migrate the vCenter and the management networks onto 10 Gb. The network that we used to create NDFS was untagged, and so were all the management network connections between the hosts and vCenter. Whenever we tried to move any ESXi management interface onto the 10 Gb, it would fail because it couldn’t talk to the vCenter server.
One could think, well, shut down the vCenter Server and move it to the 10 Gb. It wasn’t that easy, as the NDFS was still running on 1 Gb and the moment we moved the vCenter Server to VLAN 100, it would lose network connectivity to the hosts and we would be back where we started. The customer network engineer and I spent the next few hours at the whiteboard figuring out how to make it work. We created a secondary management network interface, vmk3 (vmk0, vmk1 and vmk2 were already being used), on all the ESXi hosts, just in case we lost all connectivity.
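As a rough sketch of what creating that fallback vmk3 interface could look like from the ESXi shell, the Python helper below only builds the `esxcli` command strings for one host; it does not run anything. The port group name `Mgmt-Fallback`, the vSwitch name `vSwitch0`, the host name and the IP addressing are all hypothetical placeholders, not values from this deployment, and VLAN ID 0 (untagged) is an assumption based on the management network being untagged at that point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackVmk:
    host: str     # ESXi host name (hypothetical)
    ip: str       # address for the fallback interface (hypothetical)
    netmask: str  # netmask for the fallback interface (hypothetical)


def fallback_vmk_commands(cfg, vswitch="vSwitch0", portgroup="Mgmt-Fallback",
                          vmk="vmk3", vlan_id=0):
    """Return the esxcli commands to run on one host, in order.

    vlan_id=0 means untagged; all default names are placeholders.
    """
    return [
        # 1. Create a port group on the standard vSwitch.
        "esxcli network vswitch standard portgroup add "
        f"--portgroup-name={portgroup} --vswitch-name={vswitch}",
        # 2. Set its VLAN ID (0 = untagged).
        "esxcli network vswitch standard portgroup set "
        f"--portgroup-name={portgroup} --vlan-id={vlan_id}",
        # 3. Create the VMkernel interface on that port group.
        "esxcli network ip interface add "
        f"--interface-name={vmk} --portgroup-name={portgroup}",
        # 4. Give it a static IPv4 address.
        "esxcli network ip interface ipv4 set "
        f"--interface-name={vmk} --ipv4={cfg.ip} "
        f"--netmask={cfg.netmask} --type=static",
    ]


example = FallbackVmk(host="esx01", ip="192.168.50.11", netmask="255.255.255.0")
for command in fallback_vmk_commands(example):
    print(f"[{example.host}] {command}")
```

Generating the commands per host rather than typing them by hand keeps the fallback interface consistent across all nodes, which matters when it is the only way back in.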
This is what we did: we shut down the vCenter, the Nutanix cluster and the CVMs, in that particular order. We used transceivers on the 10 Gb ports to connect to RJ45 (one per node), took these ports away from Cisco ACI control, and configured them as trunk ports carrying VLAN 103. When done, we connected VMNIC0 and VMNIC1 to that particular VLAN and brought up the Nutanix CVMs, the cluster and vCenter, in that particular order. I also configured my Mac to run with a VLAN 103 adapter connected to another port on the same production switch. (Now, this was possible only because this was a greenfield deployment. Imagine how hard this would be to do in an actual production environment. Just getting the changes approved would be a nightmare, let alone running the same VLAN on multiple interfaces controlled by multiple devices and connecting a MacBook Pro to the production network.) When everything was working, we switched vmnic0 and vmnic1 back to the Cisco 2950 for OOB management of vCenter + Cisco ACI with untagged traffic.
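The shutdown and bring-up sequence above is easy to get backwards under pressure, so here is a minimal sketch that writes it down as a runbook: stop everything in the stated order, do the network change, then start everything in the reverse order. The function names and the wording of the steps are mine, purely for illustration.

```python
# Order in which the components were shut down, as described in the post.
SHUTDOWN_ORDER = ["vCenter", "Nutanix cluster", "CVMs"]


def startup_order(shutdown_order):
    """Bring-up order is the exact reverse of the shutdown order."""
    return list(reversed(shutdown_order))


def runbook(shutdown_order, maintenance_step):
    """Return the full ordered list of steps for the maintenance window."""
    steps = [f"stop {name}" for name in shutdown_order]
    steps.append(maintenance_step)
    steps += [f"start {name}" for name in startup_order(shutdown_order)]
    return steps


for number, step in enumerate(
        runbook(SHUTDOWN_ORDER, "re-cable 10 Gb ports and trunk VLAN 103"), start=1):
    print(f"{number}. {step}")
```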
Once the NDFS was up and running, we migrated the vmk0 interface on all the ESXi servers, ensuring that the inter-communication was good. We then proceeded with the migration of the CVMs and thereby NDFS. The last step was to move the vCenter over to the 10 Gb. Thankfully the vmkernel interface doesn’t care about the underlying management of the network, so VLAN 103, which was provided by 2 different entities, could talk across both. Here is the pictorial representation of the change.
Option 2: Using a 10 Gb switch. This would be the least confusing option: get a 10 Gb switch, configure it with VLAN 103, connect one of the 10 Gb NICs to this switch and go about your way. Well, I didn’t have access to a 10 Gb switch (I do now, as I went and bought one straight away, which I can now also use in my home lab 🙂 ).
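The key idea in that migration is to move one layer at a time and confirm connectivity before touching the next one. Below is a small sketch of that gating logic, assuming you supply your own `move` and `check` callables (both hypothetical); `tcp_reachable` is one possible check that simply tries to open a TCP connection, for example to vCenter on port 443. It is not what we ran on site, just the procedure expressed as code.

```python
import socket

# Order in which things were moved to the 10 Gb network, per the post.
MIGRATION_STAGES = ["ESXi vmk0", "CVMs (NDFS)", "vCenter"]


def tcp_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def migrate_in_stages(stages, move, check):
    """Move each stage in order, stopping at the first failed check.

    Returns (completed_stages, failed_stage); failed_stage is None
    when every stage passed its connectivity check.
    """
    completed = []
    for stage in stages:
        move(stage)           # perform the change for this stage
        if not check(stage):  # verify before touching the next layer
            return completed, stage
        completed.append(stage)
    return completed, None
```

A `check` could be something like `lambda stage: tcp_reachable("vcenter.example.local")`, where the host name is a placeholder. Stopping at the first failure leaves the later layers untouched, so there is still a working path to fall back on.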
Conclusion: Cisco ACI still doesn’t solve the chicken-and-egg problem when a new vCenter is required. I gave this feedback to the customer’s Cisco rep, pointing out that VMware resolved this issue in the 5.1 DVS. Requiring an OOB network to configure networking for a VMware environment is doing things the old-fashioned way. All of these issues could have been avoided if we had been able to release one of the 10 Gb NICs from Cisco ACI control and set up a normal trunked network on that interface.
This might not be the ONLY way or the best way to resolve the issue of ACI being dependent on vCenter to push the network changes, but this is the way that I could make work. If any of you can think of better ways, please post them in the comments and I will update the post based on those.
Until next one.. adios and thanks for reading.