VSAN Deduplication – what they don’t tell you
I was asked about deduplication ratios for VDI today, which brought back some memories from when I deployed a VSAN cluster a few months ago.
It was an all-flash VSAN cluster (mentioned in my previous blog post) on which I had enabled deduplication and compression. I used Zerto to migrate some VMs for testing and found that I was getting a deduplication ratio of 1.00x, i.e. no space savings at all.
After several hours of research and testing, I found the following:
- VMs need to be thin-provisioned (Object Space Reservation of 0%) to take advantage of VSAN’s deduplication capabilities. I was replicating VMs using Zerto with the “thin” checkbox unchecked. I don’t normally like provisioning VMs with thin disks, as they add management overhead and increase the risk to the platform: once the datastore fills up, all VMs will go into a paused state, which may also cause corruption in some Linux VMs. Most SAN arrays have thin provisioning enabled on the back-end anyway, so there’s no reason to do “thin-on-thin”.
- There are no eager-zeroed thick, lazy-zeroed thick or thin VM disks when it comes to VSAN. VSAN is an object store, and the equivalent setting is Object Space Reservation (OSR): 100% behaves like a thick-provisioned disk and 0% like a thin-provisioned disk. The OSR is set in the VM Storage Policy applied to the vSAN datastore.
- Once a VM has been provisioned with 100% Object Space Reservation, you cannot easily change it to 0% without doing a Storage vMotion to a different datastore and moving it back again.
- From a cost perspective, the vSAN Deduplication and Compression feature requires a vSAN Advanced or Enterprise license, so you may want to do a cost-benefit analysis to find the deduplication ratio at which the feature breaks even against your licensing costs. Zerto was a good way to gauge the deduplication ratios I would obtain, because I could simply delete the Zerto Virtual Protection Groups and start again, with or without deduplication, if I wanted to.
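The break-even point mentioned above can be worked out with a little arithmetic. A deduplication ratio r means the data needs only 1/r of its logical size in physical capacity, so the saving is logical size × (1 − 1/r) × cost per GB; setting that equal to the extra licensing cost and solving for r gives the break-even ratio. The sketch below assumes a flat cost per usable GB and ignores compression, failures-to-tolerate overhead and growth, and the function names and figures are purely hypothetical, so treat it as a back-of-the-envelope aid rather than a sizing tool.

```python
import math


def break_even_ratio(extra_license_cost: float,
                     logical_gb: float,
                     cost_per_physical_gb: float) -> float:
    """Deduplication ratio at which capacity savings equal the extra license cost.

    At ratio r only logical_gb / r of physical capacity is needed, so the
    saving is logical_gb * (1 - 1/r) * cost_per_physical_gb. Setting that
    equal to extra_license_cost and solving for r gives the break-even point.
    """
    full_cost = logical_gb * cost_per_physical_gb
    if extra_license_cost >= full_cost:
        # Even infinite deduplication cannot save more than the full capacity cost.
        return math.inf
    return 1.0 / (1.0 - extra_license_cost / full_cost)


def savings(ratio: float, logical_gb: float, cost_per_physical_gb: float) -> float:
    """Capacity cost avoided at a given deduplication ratio (ratio >= 1)."""
    return logical_gb * (1.0 - 1.0 / ratio) * cost_per_physical_gb


# Hypothetical figures: 20,000 GB of VM data, 1.0 currency unit per usable GB,
# 5,000 units of extra licensing for the dedup-capable edition.
print(round(break_even_ratio(5_000, 20_000, 1.0), 2))  # 1.33
print(round(savings(1.5, 20_000, 1.0)))                 # 6667
```

Under those made-up numbers the feature pays for itself at roughly 1.33x, so a 1.5x ratio would clear the bar; plug in your own license and hardware prices to see where your break-even lands.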
After setting the OSR to 0% in the VM Storage Policy for the vSAN datastore and selecting the “thin” checkbox in the Zerto Virtual Protection Groups, I am getting close to a 1.5x deduplication ratio for the migrated VMs.
Final word on deduplication ratios… I’ve heard vendors claim deduplication ratios of 50x to 200x on their hyper-converged solutions, especially for VDI. I would be very sceptical of these claims when it comes to VDI. For example, VMware Horizon View uses a “gold” or “master” image when deploying a desktop pool of linked clones. This means that every VM is a copy of the gold/master image, while the user’s desktop settings and files can be stored on a shared drive elsewhere. This model allows for extreme deduplication claims: you could theoretically deploy 1000 VMs from the same gold/master image, and because all 1000 VMs are copies, you could approach a 1000x deduplication ratio. A headline number like that reflects how the desktops were deployed rather than what you should expect for other workloads.
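To see why cloned desktops inflate the ratio so easily, here is a toy model of hash-based deduplication: each disk is split into fixed 4 KiB blocks, identical blocks (same SHA-256 digest) are stored once, and the ratio is logical blocks divided by unique blocks. This is an assumption-laden sketch, not how vSAN implements deduplication (for instance, vSAN deduplicates within each disk group rather than across the whole cluster), and `dedup_ratio` and `make_clone_pool` are hypothetical helpers made up for illustration.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # assumed fixed block size for this toy model


def dedup_ratio(disks: list, block_size: int = BLOCK_SIZE) -> float:
    """Logical blocks written divided by unique blocks actually stored."""
    logical_blocks = 0
    unique_hashes = set()
    for disk in disks:
        for offset in range(0, len(disk), block_size):
            block = disk[offset:offset + block_size]
            logical_blocks += 1
            # Identical blocks hash to the same digest and are stored only once.
            unique_hashes.add(hashlib.sha256(block).digest())
    return logical_blocks / len(unique_hashes)


def make_clone_pool(num_vms: int, gold_blocks: int, unique_blocks_per_vm: int) -> list:
    """Hypothetical pool: every VM is the same gold image plus some VM-specific blocks."""
    gold_image = os.urandom(gold_blocks * BLOCK_SIZE)
    return [gold_image + os.urandom(unique_blocks_per_vm * BLOCK_SIZE)
            for _ in range(num_vms)]


# Pure copies of the gold image: the ratio equals the number of VMs.
print(dedup_ratio(make_clone_pool(100, 10, 0)))            # 100.0
# Just 2 unique blocks per VM on a 10-block image drags the ratio down sharply.
print(round(dedup_ratio(make_clone_pool(100, 10, 2)), 2))  # 5.71
```

With pure copies the ratio simply equals the VM count, which is how the 1000x figure arises; as soon as each desktop carries some data of its own, the unique blocks dominate and the ratio collapses towards something far more modest.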
Let me know in the comments below if you have experienced similar situations or if you disagree with me.