How to adjust the Round Robin IOPS limit from the default 1000 to 1 on ESXi 7

In this blog we will learn how to change the Round Robin Path Selection Policy (VMW_PSP_RR) on FC and iSCSI LUNs in a VMware environment to better balance the I/O load across all active storage paths and make the setting persistent.

Lowering the IOPS limit to 1 can improve performance when an active storage path has several queued I/Os. With a limit of 1, the host switches paths after every I/O instead of after 1000, so the other active storage paths also service I/O requests. This can reduce latency and increase throughput.

From my own testing on TrueNAS CORE, changing the RR policy to iops=1 showed a 104.75% increase in IOPS and up to a 46.6667% reduction in latency for VMFS datastores. It is recommended to test this in your environment to determine whether the change positively affects your workloads. Throughput maxed out at 1647MB/s on a QLogic (QLE2562) 8Gb dual-port FC HBA; tests were done on an all-flash pool.

ESXi Round Robin PSP supports two types of limits:

  • IOPS limit: The Round Robin PSP defaults to an IOPS limit with a value of 1000. In this default case, a new path is used after 1000 I/O operations are issued.
  • Bytes limit: The bytes limit is an alternative to the IOPS limit. The bytes limit allows for a specified amount of bytes to be transferred before the path is switched. (Default = 10485760 bytes)
  • Essentially, Round Robin switches to the next path after every 1000 I/Os by default, or after every 10485760 bytes if the bytes limit is selected instead.
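
To make the effect of the IOPS limit concrete, here is a small sketch of the arithmetic. It is a simplified model that only counts I/Os and ignores queue depth and path state; rr_path_index is a hypothetical helper written for this post, not an ESXi command.

```shell
# rr_path_index IO_NUMBER IOPS_LIMIT NUM_PATHS
# Simplified model (assumption): I/Os are numbered from 0 and the path
# changes after every IOPS_LIMIT I/Os. Prints the zero-based path index.
rr_path_index() {
    echo $(( ($1 / $2) % $3 ))
}

# Default limit of 1000 with two paths: the first 1000 I/Os stay on path 0.
rr_path_index 999 1000 2     # prints 0
rr_path_index 1000 1000 2    # prints 1

# Limit of 1 with two paths: consecutive I/Os alternate between paths.
rr_path_index 0 1 2          # prints 0
rr_path_index 1 1 2          # prints 1
```

With the default limit, a short burst of I/O lands entirely on one path, while a limit of 1 spreads the same burst across every active path.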

Reference:

https://kb.vmware.com/s/article/2069356

Note: You do not need to restart the host for the changes to take effect.

Configure ESXi Host

Enable the SSH service, then SSH into your host and run the following command to list all of the SCSI storage devices. This will include all FC, iSCSI and local block storage mounted on the host.

esxcli storage nmp device list | grep naa
[root@R720-HOST-01:~] esxcli storage nmp device list | grep naa
naa.6589cfc0000006809342a12039572e04
   Device Display Name: TrueNAS Fibre Channel Disk (naa.6589cfc0000006809342a12039572e04)
naa.6589cfc000000816db54001e3f0d2264
   Device Display Name: TrueNAS Fibre Channel Disk (naa.6589cfc000000816db54001e3f0d2264)

Get info about a LUN by running the command below against one of your LUN IDs.

esxcli storage nmp device list -d naa.6589cfc0000006809342a12039572e04
[root@R720-HOST-01:~] esxcli storage nmp device list -d naa.6589cfc0000006809342a12039572e04
naa.6589cfc0000006809342a12039572e04
   Device Display Name: TrueNAS Fibre Channel Disk (naa.6589cfc0000006809342a12039572e04)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=off; {TPG_id=1,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1000,bytes=10485760,useANO=0; lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config: policy=iops;iops=1000;bytes=10485760;samplingCycles=16;latencyEvalTime=180000;useANO=0;
   Working Paths: vmhba3:C0:T1:L1, vmhba2:C0:T0:L1
   Is USB: false

As you can see, the IOPS limit is set to 1000. We want to change it to 1 for all of the FC LUNs, which can be done with one command. The LUN IDs for TrueNAS all start with the same string of numbers, so we can pipe the device list into grep to select only LUNs whose IDs begin with naa.6589 and set the IOPS limit to 1 on each.

for i in `esxcfg-scsidevs -c |awk '{print $1}' | grep naa.6589`; do esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$i; done
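
Before changing anything, it can help to preview which devices the loop would touch. The sketch below is a dry run: it reads a device list on stdin and only prints the commands instead of running them. rr_dry_run is a hypothetical helper name, the sample input is trimmed to two columns, and the mpx device ID is made up to stand in for a local disk.

```shell
# rr_dry_run PREFIX
# Reads a device list on stdin (first column = device ID, as printed by
# esxcfg-scsidevs -c) and prints the command that would be run for every
# device whose ID starts with PREFIX. Nothing is changed on the host.
rr_dry_run() {
    awk -v p="$1" 'index($1, p) == 1 {
        print "esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=" $1
    }'
}

# Illustrative input; the mpx ID is an invented example of a local disk.
sample='naa.6589cfc0000006809342a12039572e04  Direct-Access
naa.6589cfc000000816db54001e3f0d2264  Direct-Access
mpx.vmhba0:C0:T0:L0                   Direct-Access'

printf '%s\n' "$sample" | rr_dry_run naa.6589
```

On a host, the same function could be fed with esxcfg-scsidevs -c | rr_dry_run naa.6589. Unlike a plain grep, the prefix is matched literally and only at the start of the ID.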

I can now check that the IOPS policy has been changed to 1.

esxcli storage nmp device list -d naa.6589cfc0000006809342a12039572e04
[root@R720-HOST-01:~] esxcli storage nmp device list -d naa.6589cfc0000006809342a12039572e04
naa.6589cfc0000006809342a12039572e04
   Device Display Name: TrueNAS Fibre Channel Disk (naa.6589cfc0000006809342a12039572e04)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=off; {TPG_id=1,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0; lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
   Path Selection Policy Device Custom Config: policy=iops;iops=1;bytes=10485760;samplingCycles=16;latencyEvalTime=180000;useANO=0;
   Working Paths: vmhba3:C0:T1:L1, vmhba2:C0:T0:L1
   Is USB: false
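
Checking each LUN by hand gets tedious with more than a couple of devices. As a rough sketch, the awk filter below condenses the device list output into one line per Round Robin device with its current IOPS limit. rr_iops_report is a hypothetical helper, and it assumes the output layout shown above (device ID unindented, details indented). The second device in the sample is illustrative and shows what a LUN that was missed would look like.

```shell
# rr_iops_report
# Reads `esxcli storage nmp device list` output on stdin and prints
# "DEVICE IOPS_LIMIT" for every device whose PSP config contains iops=N.
rr_iops_report() {
    awk '
        /^[^ ]/ { dev = $1 }    # an unindented line starts a new device
        /^ +Path Selection Policy Device Config:/ {
            # the match covers ",iops=" (6 characters) plus the digits
            if (match($0, /[{,]iops=[0-9]+/))
                print dev, substr($0, RSTART + 6, RLENGTH - 6)
        }
    '
}

# Illustrative input: first device as shown above, second one invented.
sample='naa.6589cfc0000006809342a12039572e04
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0; lastPathIndex=1: NumIOsPending=0,numBytesPending=0}

naa.6589cfc000000816db54001e3f0d2264
   Path Selection Policy: VMW_PSP_RR
   Path Selection Policy Device Config: {policy=iops,iops=1000,bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,numBytesPending=0}'

printf '%s\n' "$sample" | rr_iops_report
# naa.6589cfc0000006809342a12039572e04 1
# naa.6589cfc000000816db54001e3f0d2264 1000
```

On a host this would be run as esxcli storage nmp device list | rr_iops_report. Devices using a different path selection policy have no iops value and are simply skipped.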

There you go, it could not be any easier to achieve better storage performance. This is good for testing different settings, but it’s not persistent across reboots and updates. To make the settings persistent you need to create a SATP claim rule.

Create a SATP Claim Rule

You can list all of the available SATP claim rules with the below command to see if there is one listed for your array. ESXi does not ship with a claim rule for TrueNAS so we will have to create one.

esxcli storage nmp satp rule list

First things first, let’s figure out whether the device is managed by VMware’s native multipathing plugin, the NMP, or by a third-party plugin such as EMC’s PowerPath. The esxcli storage nmp device list command used earlier already displayed the Storage Array Type Plugin (SATP) for path failover and the Path Selection Policy (PSP) for load balancing. The esxcli storage core device list command confirms that the device is managed by NMP and also shows the Vendor and Model strings used in the claim rule. Here is an example of this command; I’m using the -d option to run it against one device to keep the output to a minimum.

esxcli storage core device list -d naa.6589cfc0000006809342a12039572e04
[root@R720-HOST-01:~]  esxcli storage core device list -d naa.6589cfc0000006809342a12039572e04
naa.6589cfc0000006809342a12039572e04
   Display Name: TrueNAS Fibre Channel Disk (naa.6589cfc0000006809342a12039572e04)
   Has Settable Display Name: true
   Size: 2915041
   Device Type: Direct-Access 
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.6589cfc0000006809342a12039572e04
   Vendor: TrueNAS 
   Model: iSCSI Disk      
   Revision: 0123
   SCSI Level: 7
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters: 
   VAAI Status: supported
   Other UIDs: vml.010001000039306231316333653366323630303100695343534920
   Is Shared Clusterwide: true
   Is SAS: false
   Is USB: false
   Is Boot Device: false
   Device Max Queue Depth: 61
   No of outstanding IOs with competing worlds: 32
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
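
The Vendor and Model values in this output are padded with trailing spaces, which is easy to miss when copying them by hand. A minimal sketch for pulling out a single field with the padding trimmed is shown below; device_field is a hypothetical helper, and the sample is a shortened copy of the output above.

```shell
# device_field LABEL
# Reads `esxcli storage core device list -d DEVICE` output on stdin and
# prints the value of the given field with surrounding spaces removed.
device_field() {
    awk -F': ' -v key="$1" '
        {
            label = $1
            sub(/^ +/, "", label)    # drop the leading indentation
        }
        label == key {
            val = $2
            sub(/ +$/, "", val)      # drop the trailing padding
            print val
        }
    '
}

# Shortened copy of the output above.
sample='naa.6589cfc0000006809342a12039572e04
   Display Name: TrueNAS Fibre Channel Disk (naa.6589cfc0000006809342a12039572e04)
   Multipath Plugin: NMP
   Vendor: TrueNAS 
   Model: iSCSI Disk      '

printf '%s\n' "$sample" | device_field "Vendor"            # prints TrueNAS
printf '%s\n' "$sample" | device_field "Model"             # prints iSCSI Disk
printf '%s\n' "$sample" | device_field "Multipath Plugin"  # prints NMP
```

Note that the model is reported as iSCSI Disk even though this LUN is presented over Fibre Channel, so the same vendor and model pair covers both transports on this array.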

To configure shared storage according to best practices on ESXi with an iSCSI or Fibre Channel storage array, we have to create a SATP claim rule. A SATP claim rule simply describes a certain configuration (mainly around multipathing) for a specific set of devices. For the TrueNAS array, this rule makes sure devices use Round Robin with an I/O operations limit of 1. You can set the same options via the command line and they will take effect immediately, but they will not survive a host reboot. This is why you need to create a claim rule.

esxcli storage nmp satp rule add --satp "VMW_SATP_ALUA" --vendor "TrueNAS" --model "iSCSI Disk" --psp "VMW_PSP_RR" --psp-option "iops=1" --claim-option="tpgs_on" --description "TrueNAS iSCSI Claim Rule"
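
If you manage more than one kind of array, it may be convenient to generate this command from the vendor and model strings rather than retyping it. The sketch below only prints the command so it can be reviewed before running it on a host. satp_rule_cmd is a hypothetical helper, it reuses exactly the options from the command above, and the generated description text is my own choice.

```shell
# satp_rule_cmd VENDOR MODEL IOPS_LIMIT
# Prints (does not run) a claim rule command built from the same options
# as the TrueNAS rule above.
satp_rule_cmd() {
    printf 'esxcli storage nmp satp rule add --satp "VMW_SATP_ALUA" --vendor "%s" --model "%s" --psp "VMW_PSP_RR" --psp-option "iops=%s" --claim-option="tpgs_on" --description "%s %s Claim Rule"\n' \
        "$1" "$2" "$3" "$1" "$2"
}

satp_rule_cmd "TrueNAS" "iSCSI Disk" 1
```

After adding a rule on the host, piping the rule list command from earlier through grep, for example esxcli storage nmp satp rule list | grep TrueNAS, should show the new entry.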

Now when you reboot a host it will have the correct settings, and any new iSCSI or Fibre Channel datastores you add from a TrueNAS system will automatically have the correct settings applied.

For more esxcli commands, read the vSphere Command-Line Interface Reference.
