Here’s a good way to get involved in troubleshooting scenarios and think like a #VCDX. This is a real-life scenario that has already been fixed; it is NOT part of the official VCDX troubleshooting scenario.
One of the clients I work with has had a massive outage, and we are trying to find the culprit(s).
First, let’s look at the symptoms of the issue; then we can diagnose the problem (I’ve been watching the TV series House lately).
Symptom 1: Cross-site vMotion doesn’t happen on the Metro Cluster.
Symptom 2: DRS doesn’t balance the load on either site (8 hosts at each site). Host utilisation varies from 50% to 90%.
Symptom 3: Management Network access is really slow.
Symptom 4: VM access over RDP is fine, but console access from the vSphere Client is unavailable or slow.
Symptom 5: DRS migrations have either failed or are waiting for services to become available.
Symptom 6: Manual vMotion is really slow too.
Symptom 7: All management access is unavailable (the last symptom).
Now let’s look at the infrastructure:
- Cisco UCS B230 series blades (20 cores and 512 GB RAM)
- 2204XP FI
- Nexus 1000V distributed switch, Nexus 5K edge switches and Nexus 7K core switches
- vSphere Standard Switch
- EMC storage (VMAX, FAST enabled, FAST Cache enabled)
- VPLEX Metro for vSphere Metro Storage Cluster (vMSC)
Feel free to comment with your diagnosis. I will post the answer in the next post.