Friday 2 December 2016

Win a Nutanix Xpress Platform

X-mas has arrived early this year! Well, at least for whoever the lucky winner will be.
Nutanix is giving away a Nutanix Xpress Platform that comes fully loaded with a 1-year support and software license, as well as many other prizes.

Some time back I saw a web seminar on the Xpress Platform and I would love to have one for myself. Unfortunately I do not qualify, as the giveaway is open only to new, US-based SMB customers. Hopefully most of you do qualify, so hurry up and enter to win.

So what is included in the Xpress platform?


  • AHV Hypervisor and PRISM Management
  • Quick and easy deployment within 60 minutes
  • Simplified management
  • No pesky hypervisor licensing costs
  • Non-disruptive upgrades with zero downtime
  • Integrated local, remote and cloud backup

The platform hardware specs are as follows:

  • 3 nodes with 16 CPU cores, 64 GB of memory, 480 GB SSD and 4 TB HDD storage
  • Xpress software edition
  • 1 year Nutanix support



Wednesday 30 November 2016

Copying files to Azure Storage Blob


I was assigned an SR today to move files from a local server onto Azure Blob Storage.
The web application is being moved to Azure and I needed a way to move the files into the container created as part of the application setup. Initially I investigated Microsoft Azure Storage Explorer. I liked the look of it but had some issues with it: I had to transfer nearly 8,000 folders and the app seemed to crash if I dragged in more than 30 at a time.





After some searching I came across AzCopy and this did the trick. Its usage is straightforward. You download it as part of the Microsoft Azure Storage Tools package.



Once installed, you start the Microsoft Azure Storage Command Line from your Start menu.


I wanted to ensure the nearly 8,000 folders and 38,000 files got copied over, which I did with the following command:

AzCopy /Source:C:\myfolder /Dest:https://myaccount.blob.core.windows.net/mycontainer /DestKey:key /S


Replace key with the access key of your Azure storage account.




You can now browse your blob container and all folders should be there.
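
If you would rather verify the transfer from a console than click through the portal, the Azure PowerShell cmdlets can count the blobs in the container. This is just a minimal sketch, assuming the Azure PowerShell module is installed and reusing the same placeholder account, container and key as in the AzCopy command above:

$ctx = New-AzureStorageContext -StorageAccountName "myaccount" -StorageAccountKey "key"
# Count the blobs in the container; the total should roughly match the ~38,000 files copied
Get-AzureStorageBlob -Container "mycontainer" -Context $ctx | Measure-Object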




Friday 25 November 2016

Expand your Nutanix Cluster

One of the things I like most about AOS is the one-click functionality that makes my life easier. Upgrading AOS or the hypervisor is something I get to do on a regular basis; other functions not so much. Some time back I got to use the cluster conversion functionality, and you can read more about it here. Another one-click option I have wanted to try out for a while is Expand Cluster. A fresh shipment of NX-1065 nodes arrived, so this presented a great opportunity to try it out.

You have racked the nodes and configured your network ports. The requirements are the same as when using the Foundation applet or Foundation VM. One thing that has caught me out once or twice before with re-imaging is that I connected all my network cabling up front. In my case that is 1 Gbps for IPMI and 2 x 10 Gbps for all my management and data networks. Ensure you disconnect your 10 Gbps cabling and only connect your IPMI connection to the shared 1 Gbps network port.

The expand procedure is quite straightforward:

  • Log in to PRISM and go to the gear icon
  • Select Expand Cluster. On the screen, tick the boxes of the nodes you want to add and enter the required IP addresses for the CVM and hypervisor


  • Select the hypervisor version you want to use


  • Click the Run Checks button and the pre-checks will start.


  • When done, click Expand Cluster.


  • Just sit back and wait for everything to complete (see the quick check below)
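
If, like me, you want some command-line reassurance once the wizard reports success, this is a rough sketch of what can be run from any CVM (output varies a little between AOS versions): 'cluster status' should show all services up on every CVM, including the new ones, and 'ncli host list' should now include the new nodes.

cluster status
ncli host list
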
Once the nodes have been added you still have a bit of clean-up work to do. One of the things I think can be improved is the option to set the hypervisor hostname. You will see that the new nodes appear in PRISM with a default hostname such as the ones below




I want my hostnames to be consistent, so I need to change a few things (see the example after this list):

  • SSH into your ESXi host
  • Change the hostname with the 'esxcli system hostname set --fqdn <name>' command
  • Restart genesis on all nodes with 'allssh genesis restart'
  • You may also want to restart hostd with '/etc/init.d/hostd restart' on your ESXi nodes
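
As a concrete sketch, renaming one host could look like this (the FQDN is just an example, use your own naming convention). The first two commands run in an SSH session on the ESXi host, the last one from any CVM:

esxcli system hostname set --fqdn=esx-node04.mydomain.local
/etc/init.d/hostd restart
allssh genesis restart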

Although the nodes are added to the cluster, the ESXi hosts will not automatically mount the existing datastores. This is easily done via PRISM though (a quick check from the host side follows the list).

  • Go to Home > Storage > Table > Container
  • Select the datastore and click update
  • Make sure mount/unmount ESXi hosts is selected
  • Tick the box next to the new host and click save
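
If you want to double-check from the host side rather than in PRISM, listing the NFS mounts in an SSH session on the new ESXi host should show the Nutanix container as a mounted datastore:

esxcli storage nfs list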

One other thing I'd like to see built into the Expand Cluster wizard is the ability to set the CVM size. Currently it is done with the default, which caused dedupe to be disabled on the cluster. Here is the fix (with a quick verification sketch after the list):

  • Shut down the CVM with 'cvm_shutdown -P now' command
  • Change the RAM size to what is required for dedupe. You want the CVM memory sizes in your cluster to be consistent
  • Power on CVM
  • SSH into CVM
  • Run edit-zeus --editor=vim
  • Remove the line that says "disable_on_disk_dedupe: true"
  • Save the file and quit vim
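
To verify the result, something along these lines from a CVM should do; the memory should now be consistent across all CVMs and the grep should return nothing once the line has been removed:

allssh "free -m"
zeus_config_printer | grep disable_on_disk_dedupe
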
At this point I am ready to make some changes in vCenter. I want to ensure that I am making use of the 10 Gbps networking and that my vmk interfaces and CVMs are running on my distributed switches. I won't go into much detail on this, but the Add and Manage Hosts wizard is your friend.

The last thing I had to clean up was some red herring alerts I got from PRISM, telling me that the 1 Gbps interfaces are down. You can fix this by following the steps in this KB.

And that is pretty much it. I have used this process a few times now with ESXi and AHV and I think it is fantastic!



Saturday 6 August 2016

In-Place Hypervisor Conversion from ESXi to AHV

It has been nearly 3 years since we bought our first Nutanix block. The NX-1465 was used in a ROBO location to provide some backup services for the main datacentre at the head office. Due to infrastructure changes this block started to see less and less use, and a fine piece of equipment such as this is wasted when only hosting a handful of VMs. Having found an alternative solution for these VMs, I decided to convert this cluster to the Acropolis Hypervisor. It will allow me to come to grips with a different hypervisor and also give us the ability to test some of the AOS features, such as File Services and Containers.

Rather than destroying the cluster and rebuilding it, I wanted to see how well the conversion utility worked. Looking at the documentation it became clear that it is not just a matter of pressing a button; there are a few requirements to adhere to. Since the ESXi cluster was attached to a distributed switch I had some cleaning up to do. vDS is not supported, so I migrated all physical adapters and vmk interfaces to a freshly created standard switch. Only one external vSS is supported, by the way. I also deleted my vMotion interface from my setup. Once all my networking met the requirements I removed the nodes and cluster from vCenter altogether. I did not actually see this as a requirement in the documentation, but it made sense to me as it is most likely cleaner.

I was also keen to see how well it would convert a virtual machine. I made sure I left behind a Windows 2012 R2 VM that was created from our standard template. One requirement for this is the installation of the Nutanix Guest Tools. NGT is somewhat like VMware Tools and provides enhanced functionality for the VM. It is key to the App Mobility Fabric (AMF) and installs features such as file-level restore, VSS copy and the Nutanix guest agent.

Install Nutanix Guest Tools


  • Go to table view under VM
  • Select your VM and enable NGT. You will be prompted to continue


  • This will mount the ISO, so log on to your VM and click Setup under the CD-ROM drive


  • Agree to EULA and select install.
  • You will be prompted to install Python


  • Follow the bouncing ball


  • The mobility drivers will be installed as part of the installer


  • The setup is now complete

Nutanix has a video available that shows the above steps but it also shows how to install NGT on a Linux VM.

Before starting the actual conversion process I also deleted my existing protection domains and snapshots. I am not sure if this was needed, but it seemed like a good idea. A quick way to confirm nothing is left behind is sketched below.
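
If you prefer the command line over PRISM for this check, something along these lines from a CVM should confirm that no protection domains remain ('pd' being the ncli shorthand for protection-domain, as far as I know):

ncli pd list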


Convert Cluster

  • Go to the gear icon and select Convert Cluster


  • Select the hypervisor and take note of the warnings


  • All going well you should get a message that the validation was successful


  • Click Convert Cluster to get started. A warning message will appear; click Yes


  • The conversion will start


  • Some progress bars will appear


At this point I was wondering if anything was actually happening, as I did not see any progress. After about 50 minutes I got kicked out of the session and things went black for a wee while. Then things started happening and a progress screen appeared.




After 90 minutes the conversion appeared to be complete as indicated by this screen after logging in




Everything appeared to be in order with the system, but I did get A1082 alerts for a while. The funny thing was that they actually referred to vmnic interfaces, while Acropolis names the interfaces eth. I opened a job with support and this appears to be a red herring. The alerts stopped about a day after conversion.

The system seemed to be just fine, but what about the VM? It appeared in the list of VMs, but I ran into a problem when trying to boot it. It was stuck at the boot disk as shown here.

It appears that the issue was caused by the fact that the original VM made use of UEFI boot. AHV apparently does not support UEFI with certain versions, but it looks like this will be fixed in 4.7.1. I will be upgrading soon and hopefully that will kick the VM into life. It would have been good if the conversion process had picked up on this incompatibility beforehand.
I now have an AHV cluster and can't wait to get cracking. I will spend some quality time with the best practices guide now. Expect to see some more posts on AHV soon.

Monday 11 July 2016

Integrate vSphere Replication with Site Recovery Manager

If you don't have a disaster recovery strategy you are definitely doing something wrong, especially if you live in an earthquake-prone country such as New Zealand. The Christchurch earthquake back in February 2011 was a reminder to many businesses that you just cannot afford not to have a disaster recovery strategy! The virtual datacentre I manage is not located in an area that is considered high risk for earthquakes, but there is obviously so much more that could go wrong. Ask our Aussie cousins in Queensland about flooding...

My employer has traditionally made use of the classic server + shared storage approach to host our virtual infrastructure. Back in the early days we were limited to a virtual infrastructure and shared storage on the primary site only, but as DR became more important the same infrastructure was implemented at a remote site. With VMware as the hypervisor of choice and NetApp as the filer solution, Site Recovery Manager was the obvious choice. Over the last few years we have seen some change in the way we deliver virtual servers. Although the main share of virtual servers is still delivered on the server + shared storage platform, we are increasingly making use of the Nutanix Xtreme Computing Platform. The services we hosted on Nutanix initially were pre-prod and development in nature, but we are so impressed with the product that we are now using it for production environments.

Introducing a hybrid environment comes with some challenges. Protecting workloads on Nutanix to a NetApp-based environment at the recovery site means that you can no longer make use of NetApp SnapMirror technology, and implementing a Nutanix solution at the recovery site is not an option at this stage. One way around this is to replicate the workloads at the VM level. Although there are several products that can do this, we did not look past vSphere Replication. It is included with the existing ESXi licensing and, more importantly, it integrates with the existing Site Recovery Manager implementation.

The vSphere Replication install is pretty straightforward. Just make sure you check the compatibility list for version info on vCenter and Site Recovery Manager.

Install

  • Download the appropriate ISO from VMware
  • Connect ISO to your workstation and import OVF
  • Give your appliance a name and set the location
  • Accept the default deployment configuration
  • Specify storage location and set disk format
  • Connect appliance to the desired port group
  • Set password for the appliance
  • Take note of the vService bindings


  • Click finish and power on VM

Configuration

  • Browse to https://<your appliance>:5480
  • Log in with the root user and the password you specified during install
  • Go to System > Time Zone and set accordingly


The above steps for installation and configuration will need to be done at both the recovery and protected sites.
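
Before registering the appliances and pairing the sites, it does not hurt to confirm that the management interface of each appliance is reachable from your workstation. A minimal sketch in PowerShell (the appliance names below are just examples):

Test-NetConnection -ComputerName vr-protected.mydomain.local -Port 5480
Test-NetConnection -ComputerName vr-recovery.mydomain.local -Port 5480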

VM Protection


Now that your appliances are up and running at both ends, it is time to start protecting some of your VMs.

  • Click on the VM you wish to protect and go to All vSphere Replication Actions > Configure Replication
  • Click replicate to a vCenter
  • Click Add Remote Site and enter details

  • Accept the security warning
  • Select the added remote site and click next


  • The replication server will be auto assigned. Click next.

  • Select a target location for your VM and click next

  • Don't set guest OS quiescing unless required
  • Configure RPO as desired and click next.

  • Click finish
  • Go to vCenter > Manage > vSphere Replication. Here you can verify your configuration
  • Go to vCenter > Monitor > vSphere Replication. Under Outgoing Replications you can see the status


Although the purpose of this post is integrating vSphere Replication with SRM, I will quickly cover how to recover your VM if you are only using vSphere Replication.
  • Log into your recovery site's vCenter
  • Go to vCenter > Monitor > vSphere Replication
  • Right-click the VM and select Recovery


  • A recovery wizard will appear which allows you to select recovery options, a folder and a destination host. You will be informed that the network devices of the recovered VM will be disconnected.
  • Connect to desired network and power on

vSphere Replication does not allow automated failback, but you can reconfigure the replication in the reverse direction, i.e. from the secondary site back to the primary. If the disks are still available on the primary site you can use these as replication seeds and only sync the changed blocks:
  • Ensure that the VM has been removed from inventory at primary site
  • Select the VM on the recovery site and configure replication. Choose replicate to vCenter Server.
  • Select the primary vCenter server
  • Under target location you need to ensure that you point to the volume that holds the existing disks. In addition, under target allocation, you will need to point to the existing VM folder; otherwise you will not be making use of replication seeds



  • Configure the remaining settings to your liking.
  • When recovering the VM at the primary site again you will be informed that the vmx config file already exists. Click Yes
  • You should now be able to recover your VM again

SRM Integration


One of the benefits of vSphere Replication is that it integrates with Site Recovery Manager. Just like with an SRA, you can create protection groups and recovery plans.

  • Configure your VM for replication as you did in the previous steps
  • Create a new protection group under SRM



  • Under protection group type, ensure you specify vSphere Replication



  • Add the VM to the group



  • Create new recovery plan. 
  • Specify the recovery site
  • Specify the Protection Group you created previously


  • Select your test networks. The default setting is isolated, which will create a new vSwitch with no connections
  • You can now run your test plan by clicking the green arrow.
  • As a test I added a txt file to the desktop. I now opt to replicate recent changes to the recovery site



  • The SRM window will show the test is in progress. You can monitor progress under the recovery plan > Monitor > Recovery Steps.
  • Under vCenter > Monitor > vSphere Replication you can also see that the sync is in progress.



  • Once the sync is completed the VM will be powered on



  • I have verified my VM has been recovered and can see that my modified file is on the desktop.

As you can see, it is pretty straightforward to integrate vSphere Replication with Site Recovery Manager. Running a recovery is not all that different from a test, but obviously its impact is significantly different, hence the reason you need to tick the checkbox and you get presented with a big red circle.


  • Go to your recovery plan and click the red circle.



  • Make sure you understand the consequences. Read and click the checkbox. Click next and finish.



Assuming all goes well, your virtual machine is now running from your recovery site. You probably want to re-protect your VM, as this will allow you to send your workload back to the original site.

  • Click the lightning bolt shield



  • Confirm that you want to re-protect your VM. Click the checkbox. Click next and finish


One thing to keep in mind here is that when you do a re-protect on a protection group that makes use of vSphere Replication, it will do a full sync. There appears to be no option to make use of replication seeds as you can with non-SRM-integrated vSphere Replication. If you do have a solution, let me know.


Sunday 28 February 2016

Check your ring size. Network loss after VM migration

Last week my colleague was patching the ESXi fleet from build 3029944 to build 3343343. Nothing new; it has been done many times before and you know it works. Unfortunately it did not go without hassle this time around. After placing a host in maintenance mode the VMs migrated as expected, however a few VMs did lose their connectivity to the network. Early investigation indicated that it was very random: some of the VMs coming from a host on build 3029944 had no issues when they arrived on a host with the new build 3343343. When looking at the common denominators we started to notice that the affected VMs were mostly Windows 2008 R2 builds and part of the same applications, in this case Microsoft Lync and some of the Exchange environment. We did not notice issues with Linux builds. Nothing at the ESXi layer indicated that there were issues, although the particular VM port on the vDS appeared down.

Concentrating on the OS level, we noticed that there were issues with the vmxnet3 driver under Device Manager. The error message returned was: This device cannot start (Code 10). A couple of workarounds are uninstalling the device and rescanning for hardware changes, or removing the vNIC from the VM and adding a new one. Although this fixed the issue, we still had no idea what caused it. In the meantime VMware support was engaged, but nothing was obvious beyond what we had already discovered. Eventually the problem was found and it turned out to be a known bug with build 3343343, as the engineer discovered in an internally published KB.

The issue was due to the fact that some of our VMs had their advanced vmxnet3 NIC settings altered, in particular the Rx Ring #2 size setting. This setting was applied to certain applications, such as Lync and Exchange, by our networking architect after experiencing packet loss. You may want to have a look at KB2039495.


As part of the recommendations the setting was set to 4096, and this is apparently the problem with this bug. The recommended value is 2048, and indeed there is no loss of connectivity after that. This also explains why replacing the vNIC worked, as it resets the advanced settings back to their defaults.
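
For guests that are new enough to have the NetAdapter PowerShell cmdlets (Windows Server 2012 and later; on 2008 R2 you are stuck with Device Manager), the setting can be checked and corrected from within the VM. A rough sketch, where the adapter name and the exact display name of the property are assumptions you should verify against the output of the first command:

# List the advanced properties to confirm the exact display name on your vmxnet3 driver version
Get-NetAdapterAdvancedProperty -Name * | Where-Object { $_.DisplayName -like "*Ring*" }

# Set the ring size back to the recommended 2048 (adapter name and display name are examples)
Set-NetAdapterAdvancedProperty -Name "Ethernet0" -DisplayName "Rx Ring #2 Size" -DisplayValue 2048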