Thursday 22 October 2015

Nutanix Connect Chapter Champion!

I just received the exciting news that my nomination as a Nutanix Connect Champion has been confirmed! It also turns out I am the first chapter leader in New Zealand.



I am looking forward to my Champion Welcome kit and UG Events kit arriving. I'm not sure what to expect from it all, but I am looking forward to running the local chapter, and Nutanix will be providing me with the resources to make it a success.

Looks like Nutanix Connect Champion comes with quite a few benefits too! I am entitled to the following gems:


  • Nutanix .Next User Conference Lifetime Pass: Nutanix is offering its first 25 champions lifetime free passes, and I am one of the lucky 25!
  • 40% .Next discount for your entire team: My colleagues can get 40% off conference pricing at any time, and that is on top of the existing 40% early-bird discount.
  • Early submission to the .Next Unconference Call for Speaker Proposals: Nutanix reserves early submission of speaker proposals for the .Next Conference Unconference track for Chapter Champions.
  • 20% discount on NPP/NPX, Acropolis and VDI training for you and the chapter
  • Free Nutanix Fit Check: When you purchase one Fit Check, your company is entitled to a second one free


Tuesday 20 October 2015

High CPU Contention in vROPS

Last week I was investigating a VM that was perceived to have CPU performance issues. As always, I first checked the following metrics to rule out contention:

  • CPU Ready
  • Co-Stop
  • Swap Wait 
  • I/O Wait
The values returned were well within the acceptable range and thus did not indicate any contention. However, the CPU Contention metric showed a rather high value, and I saw this pattern across most of the VMs too.
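As an aside, this first-pass screening is easy to sketch in code. The thresholds below are rule-of-thumb values I use, not official VMware guidance, and the metric key names are made up for the example:

```python
# Screen per-VM metric values (percent of the sampling interval) against
# rule-of-thumb thresholds. Names and thresholds are my own assumptions.
THRESHOLDS = {
    "cpu_ready_pct": 5.0,   # per-vCPU CPU Ready
    "co_stop_pct":   3.0,   # Co-Stop
    "swap_wait_pct": 0.0,   # any swap wait deserves a look
    "io_wait_pct":  10.0,   # I/O Wait
}

def flag_contention(metrics):
    """Return the names of the metrics that exceed their threshold."""
    return [name for name, value in metrics.items()
            if value > THRESHOLDS.get(name, float("inf"))]

vm = {"cpu_ready_pct": 1.2, "co_stop_pct": 0.1,
      "swap_wait_pct": 0.0, "io_wait_pct": 2.5}
print(flag_contention(vm))  # → [] — nothing flagged, just like my VM
```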


So what is the CPU Contention metric all about? From what I understand it is not a granular metric but a derived one, which allows you to quickly spot that a VM is suffering from CPU contention. You can then inspect the individual metrics I mentioned above. Just to make sure, I double-checked the individual CPU metrics in vROPS, but found nothing of concern. Obviously something is not quite right, so let's investigate the workload of our VM.


My VM is configured with 10 GHz of CPU and its demand, indicated by the green bar, is 5 GHz.
Usage, as indicated by the grey bar, is 2 GHz. Demand is what is requested and usage is what is delivered. Since the VM does not get the resources it requests, we have to conclude there is contention somewhere. As I really could not find anything that pointed to contention, I turned to Google and found reports that this could be caused by CPU power management policies. Several people reported the same issue and fixed it by disabling power management.
Worth testing it for myself....
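The gap in those workload numbers can be pinned down with simple arithmetic, since CPU Contention is derived as demand minus usage (the function name here is my own):

```python
# Derived relationship used in this post: contention = demand - usage.
def cpu_contention_ghz(demand_ghz, usage_ghz):
    """Demand is what the VM asks for; usage is what it is delivered."""
    return max(demand_ghz - usage_ghz, 0.0)

# My VM: 5 GHz demanded, only 2 GHz delivered.
print(cpu_contention_ghz(5.0, 2.0))  # → 3.0 (GHz of unmet demand)
```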

The procedure will differ depending on your hardware; in my case it was HP hardware. The following VMware KB may come in handy.
Power management on ESXi can be controlled from the host if the host BIOS supports OS control mode.
ESXi supports four power management policies:

  • High Performance
  • Low Power
  • Balanced
  • Custom
As other people suggested, the High Performance policy fixed their issue, as it effectively disables power management. You can change this setting without disruption, but you will need to reboot your host to ensure the setting is fully applied. Select your Host > Configuration > Hardware > Power Management and set it to High Performance.
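If you would rather script the change than click through the vSphere Client, the vSphere API exposes the power policy through the host's PowerSystem object. Treat the sketch below as an assumption on my part: the pyVmomi property and method names (and the "static" short name for High Performance) should be verified in your own environment, and only the small key-lookup helper is plain, runnable Python:

```python
# Helper for picking a power policy key from the list a host advertises.
# As far as I know, "static" is the short name of the High Performance
# policy; check capability.availablePolicy on your host to confirm.
def pick_policy_key(policies, short_name):
    """policies: iterable of (key, shortName) pairs."""
    for key, name in policies:
        if name == short_name:
            return key
    raise ValueError("policy %r not offered by this host" % (short_name,))

# With a pyVmomi HostSystem object 'host', the change would look roughly
# like this (assumed API, not tested here):
#   ps = host.configManager.powerSystem
#   pairs = [(p.key, p.shortName) for p in ps.capability.availablePolicy]
#   ps.ConfigurePowerPolicy(key=pick_policy_key(pairs, "static"))

print(pick_policy_key([(1, "static"), (2, "dynamic")], "static"))  # → 1
```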


In the case of HP hardware, and depending on the generation, you will need to set the power profile and power regulator too. Both of these options can be set in the BIOS, and the latter can also be changed via the iLO interface. The power profile allows for four settings:

  • Balanced Power and Performance (Default)
  • Minimum Power Usage
  • Maximum Performance
  • Custom
I changed this to Custom, as it allows the most flexibility and makes all options available.

The power regulator allows you to configure the following profiles:

  • Dynamic Power Savings Mode
  • Static Low Power Mode
  • Static High Performance Mode
  • OS Control Mode
This can be changed from either the BIOS or the iLO interface. If changing it in iLO you can do so at any time, but it will not take effect until you reboot the host.


I changed this option to OS Control Mode, as it ensures that the processors run in their maximum power and performance mode unless you change the profile via the OS. Since we set the policy to High Performance in ESXi, we have now effectively disabled all power savings.

So has all this work actually made a difference? Let's check!
In vROPS we can see that the CPU Contention metric has indeed dropped.


We determined earlier that "contention = demand - usage". Looking at demand and usage under workload, we can see that these are now the same, which means the VM is getting the resources it requests.



Looks like our contention is gone. Although there were really no complaints from users regarding performance, one colleague found that one of his VMs was performing better after this change.

Although the focus of this post was on vROPS, there are other ways of determining whether there is contention. In esxtop I found no issue with the likes of %RDY and %CSTP, but I did notice that the %LAT_C value was high, and that the VM wanted (%RUN) more resources than it received (%USED). This blog post by Michael Wilmsen does a very good job of explaining it in more detail.
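For what it's worth, the decision logic I applied when reading the esxtop output can be summarised in a small function. The thresholds are my own rough rules of thumb, not VMware guidance:

```python
# Rough triage of esxtop CPU counters (all values are percentages).
def esxtop_verdict(rdy, cstp, lat_c, run, used):
    if rdy > 5.0 or cstp > 3.0:
        return "scheduling contention (ready/co-stop)"
    if lat_c > 5.0 and run > used:
        return "hidden contention: wants more (%RUN) than it gets (%USED)"
    return "no obvious CPU contention"

# The pattern from this post: %RDY and %CSTP fine, %LAT_C high, %RUN > %USED.
print(esxtop_verdict(rdy=1.0, cstp=0.2, lat_c=12.0, run=50.0, used=20.0))
```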





Wednesday 14 October 2015

One-Click upgrade to Acropolis 4.5.

I am very excited about the release of Acropolis 4.5. It has some great features I am eager to test, one of them being Azure Cloud Connect. When I was about to upgrade, I noticed that the new NOS version did not appear in the list of available upgrades.



I did notice that NCC 2.1.1, which was released at the same time, was available to upgrade.
Assuming it was an issue with PRISM, I restarted the PRISM service:
  • $ allssh genesis stop prism
  • $ cluster start
This did not fix the issue. I also noticed the same problem on another cluster and in my PRISM Central install. Time to check with the knowledgeable SREs at Nutanix :-)

I was told that the One-Click Upgrade for 4.5 is not available. Apparently this is because of the many new features introduced and they would like the customer to be fully aware of these new features before upgrading. Until it is available you can do a manual upgrade. Just upload the binary and metadata file. PRISM Central can be manually upgraded with the same binary/metadata file that you use for upgrading your clusters.