A few weeks ago (August 20, 2019 to be exact) Pure Storage released Purity 5.3. This was an exciting release for us, with several new features and updates. In case you haven’t heard or read about it yet, one of those features is EncryptReduce, which my colleague Cody Hosterman has already blogged about. So feel free to take a peek at that too while you’re investigating new features!
But that is not what we are going to discuss today. Instead, we will be covering Quality of Service (QoS) and the important changes we have made to it in this release.
We have actually had QoS baked into our Purity Operating Environment for quite some time, since before Purity 5.0, although it was Purity 5.0 that introduced “Always On QoS”. With each subsequent release there has been a targeted effort not only to harden this critical feature but to allow for more flexible, fine-grained tuning. After all, that is what QoS is for: allowing administrators to set reasonable limits for individual workloads in their respective environment(s).
So without further ado, let’s look at what has been changed and added:
- Adds QoS IOPS limits for volumes and volume groups
- Displays QoS limits in the Volumes page
- Adds QoS information to downloaded CSV files
So let’s go through these one by one to give a quick overview of each.
QoS IOPS limits for volumes and volume groups
In previous versions of Purity you could impose limits on individual volumes by bandwidth only. This could be done via the GUI or CLI; below is an example of what this looked like:
pureuser@sn1-m20-e05-28> purevol setattr --bw-limit 100M sn1-m20-e05-28-dev-ds
Name                   Size  Bandwidth Limit (B/s)
sn1-m20-e05-28-dev-ds  1T    100M
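If you have several volumes to restrict, this is easy to script as well. Here is a minimal sketch that applies the same limit to a couple of volumes over SSH; the volume names are just the ones from my lab, and I am assuming you have SSH access to the array as pureuser:

# Apply the same 100M bandwidth limit to a list of volumes.
# Assumes SSH access to the array; volume names are from this lab.
for vol in sn1-m20-e05-28-dev-ds sn1-m20-e05-28-dev-ds2; do
    ssh pureuser@sn1-m20-e05-28 "purevol setattr --bw-limit 100M $vol"
done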
You could then check to see which volumes had QoS enabled by running the ‘purevol list --qos’ command, as shown below:
pureuser@sn1-m20-e05-28> purevol list --qos
Name                       Size  Bandwidth Limit (B/s)
sn1-m20-e05-28-dev-ds      1T    100M
sn1-m20-e05-28-dev-ds2     1T    -
sn1-m20-e05-28-prod-ds     1T    -
sn1-m20-e05-28-prod-ds2    1T    -
sn1-m20-e05-28-prod-iSCSI  1T    -
workload-ds                5T    -
Here we can see that only one volume has a QoS limit set: the volume from the previous example.
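On an array with many more volumes, that listing can get long, so you may want to filter it down to just the volumes that actually have a limit configured. A quick sketch, assuming the three-column layout shown above (where the third column is the bandwidth limit):

# Show only volumes that have a bandwidth limit set (third column is not "-").
# Assumes the column layout shown in the output above.
ssh pureuser@sn1-m20-e05-28 "purevol list --qos" | awk 'NR > 1 && $3 != "-"'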
Now if we migrate a VM that is driving ~800MB/s of I/O onto this newly restricted volume, we can see a dramatic drop occur instantly as the QoS limit is enforced:
With Purity 5.3, though, you can impose bandwidth limits on both volumes and volume groups (vgroups). Additionally, you can now enforce IOPS limits on both object types as well.
Since the volume case is pretty straightforward, let’s show an example of imposing a limit on a vgroup. A limit on a vgroup is enforced against all volumes contained within that vgroup collectively. So if my limit is, say, 2,000 IOPS on the vgroup, then all of the volumes associated with that vgroup are unable to send more than 2,000 IOPS collectively to the FlashArray. Let’s illustrate this to make it a little clearer.
First, if you look below you will see that I have a vgroup that has 7 volumes associated with it:
pureuser@sn1-m20-e05-28> purevgroup list vvol-workload-vm-6-d3abf446-vg
Name                            Volumes
vvol-workload-vm-6-d3abf446-vg  vvol-workload-vm-6-d3abf446-vg/Config-9fb330a0
                                vvol-workload-vm-6-d3abf446-vg/Data-16beb948
                                vvol-workload-vm-6-d3abf446-vg/Data-3562277d
                                vvol-workload-vm-6-d3abf446-vg/Data-4a259d81
                                vvol-workload-vm-6-d3abf446-vg/Data-c7f728d8
                                vvol-workload-vm-6-d3abf446-vg/Data-d87191ce
                                vvol-workload-vm-6-d3abf446-vg/Swap-673a2641
So when I set the vgroup limit to 2,000 (2K) IOPS, it means that all 7 of these volumes collectively will not be able to exceed 2K IOPS, reads and writes combined; the cap is shared across the vgroup, not 2K per volume.
So let’s do that now.
pureuser@sn1-m20-e05-28> purevgroup setattr --iops-limit 2k vvol-workload-vm-6-d3abf446-vg
Name                            IOPS Limit
vvol-workload-vm-6-d3abf446-vg  2K
.........
Name                            Volumes                                         Bandwidth Limit (B/s)  IOPS Limit
vvol--vSphere-HA-7d76e579-vg    vvol--vSphere-HA-7d76e579-vg/Config-c7852a73    -                      -
vvol-workload-vm-6-d3abf446-vg  vvol-workload-vm-6-d3abf446-vg/Config-9fb330a0  -                      2K
                                vvol-workload-vm-6-d3abf446-vg/Data-16beb948
                                vvol-workload-vm-6-d3abf446-vg/Data-3562277d
                                vvol-workload-vm-6-d3abf446-vg/Data-4a259d81
                                vvol-workload-vm-6-d3abf446-vg/Data-c7f728d8
                                vvol-workload-vm-6-d3abf446-vg/Data-d87191ce
                                vvol-workload-vm-6-d3abf446-vg/Swap-673a2641
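If you want to do the same thing from a script, the commands wrap easily. A minimal sketch, with the caveat that I am assuming purevgroup accepts the same --qos listing flag that purevol does (which appears to be what produced the second listing above):

# Set a 2K IOPS limit on the vgroup, then read it back to confirm.
# Assumes purevgroup supports the same --qos flag as purevol.
ssh pureuser@sn1-m20-e05-28 "purevgroup setattr --iops-limit 2k vvol-workload-vm-6-d3abf446-vg"
ssh pureuser@sn1-m20-e05-28 "purevgroup list --qos vvol-workload-vm-6-d3abf446-vg"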
Now that it is set, let’s see what happened to our IOPS on the FlashArray:
As you can see above, we went from pushing ~70K IOPS (35K READ and 35K WRITE) down to 2K IOPS (1K READ and 1K WRITE). The reason my example looks so drastic is that, for easy illustration, this vgroup is the only thing sending I/O to my FlashArray. Obviously, if you have other volumes sending I/O to your FlashArray it wouldn’t look as drastic as this (unless the vgroup / volume you imposed limits on was the primary offender).
One thing to note here, in case it wasn’t clear above: while we can now set limits at both the volume and vgroup level, we have not introduced the ability to limit by I/O type (i.e. READ vs WRITE). When a limit is set, it is important to understand that it applies to any I/O request to the specified volumes. Truthfully, I am not sure where per-type limits sit on the roadmap, or if they are on the roadmap at all, but if anyone wants me to take a look I am happy to do so.
At this point it is hopefully clear what this new QoS feature provides for administrators: the ability to more closely restrict those runaway volumes on the FlashArray.
Displays QoS limits in the Volumes page
Another QoS feature we have introduced in Purity 5.3 is the ability to see what limits are set at the volume or volume group level when reviewing the FlashArray performance tab.
So if we take our example above we can see this information in the GUI now:
Note that when you hover over the vgroup we restricted, the 2K IOPS limit we set is clearly visible.
Additionally, bandwidth and IOPS limits can be managed from the GUI for both volumes and vgroups if you prefer not to use the CLI:
Having these readily available in each of these sections makes troubleshooting and management a little bit easier for everyone. 🙂
Adds QoS information to downloaded CSV files
The last feature I want to touch on is that QoS information is now listed and available in downloaded CSV files. There are customers who download CSV files for historical records, troubleshooting performance issues, etc., so it makes sense that this information gets included in these statistics so that accurate reporting, diagnosis, and troubleshooting can be performed.
Now when I download the CSV output, it looks similar to the following:
If you look at the output above you can clearly see that the volume group is being rate limited, adding ~5ms of latency to both reads and writes. Obviously this is because the volumes are trying to send more I/O than they are allowed, so they are throttled to keep them in line with the 2K IOPS limit.
Imagine, though, that you see that 5ms of latency reported on the ESXi host, its VMs, datastores, etc., and you are unsure where it is coming from. You look at this CSV output and the underlying reason becomes clear. You can see this information in the GUI as well, but not everyone may have access to the FlashArray, so this information can be incredibly helpful; especially if historical information is available to compare against.
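If you want to scan a downloaded CSV for throttling without opening it in a spreadsheet, something like the following works. The file name and column position here are hypothetical placeholders; check the header row of your actual CSV and adjust accordingly:

# Inspect the header row to find the QoS / rate-limit latency column.
head -1 flasharray-perf.csv
# Print rows where that (hypothetical) column 7 exceeds 1 ms of throttle latency
# (units assumed to be ms -- verify against your file's headers).
awk -F',' 'NR > 1 && $7 > 1 {print $1, $2, $7}' flasharray-perf.csv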
I think that about wraps it up for this one and you should have all of the information you need for QoS on Purity 5.3 and later.
If you do want to read more you can check out the following links:
As always, let me know if you have any questions, comments or concerns!