Tag Archives: #Storage

Storage Design with Flash Storage. One Big LUN vs multiple smaller LUNS.

Traditionally, when storage design was done for VMware environments, a lot of criteria had to be considered. These included:

  • Number of Drives
  • Speed of Drives
  • IOPS per drive
  • RAID penalty
  • Write Penalty
  • Read Penalty
  • Scalability of the Array

But with the advent of all-flash arrays (XIO, Pure, Nimble, Violin, etc.), a lot of these parameters no longer constrain the storage design for VMware environments. Each of the AFA offerings has its own RAID-like technology, which pretty much guarantees very high resiliency against failure and data loss. Also, with the newer kinds of flash drives (eMLC, from memory), consumer-level SSDs are no longer used in AFAs. So now that the physical limitations of the drives have been eradicated, let's look at the next steps.

Queue Depth: 

Queue depth is a very misleading constraint because there are queue depths at each level: LUN, processor, array. So each physical entity (or not-so-physical, for CNAs and LUNs) has an individual queue depth. How do we address this shortcoming? If a lot of IO is being thrown at the array and it's not able to process it, the queue is going to fill up.

If the host parameters are not set properly, IO will start to fill up the HBA queue across the multiple LUNs that the host has access to. Some of these parameters can be changed so that the ESXi vmkernel handles IO differently when using AFAs.
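To make the layering a bit more concrete, here is a minimal sketch of the arithmetic. It rests on an assumption of mine: that the number of IOs a host can keep outstanding against one LUN is bounded by the smallest limit along the path. The function name and the example numbers are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: assume the effective per-LUN queue depth is the
# smallest of the limits passed in (for example the HBA LUN queue depth
# and a host-side scheduler limit).
effective_depth() {
    min=$1
    shift
    for limit in "$@"; do
        [ "$limit" -lt "$min" ] && min=$limit
    done
    echo "$min"
}

# HBA allows 256, but a host-side limit is still 32 (example numbers only).
effective_depth 256 32
```

The point of the sketch is that raising one limit on its own does nothing if another one further along is still lower.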

I’ve previously mentioned some parameters that need to be changed for XtremIO. I guess the same would apply to all the AFAs out there. Using ESXi with an AFA and not changing the advanced parameters to take advantage of it is like buying a Ferrari to drive in the Melbourne CBD. It only proves that you are an idiot, restricted by the ‘speed limit’.

OK But what about LUNs:

Now to the original question: one big LUN vs multiple smaller LUNs. Each decision has its own advantages and disadvantages. For example, choosing one big LUN can give you the cumulative IOPS available across multiple storage nodes in the AFA. So if one node provides 250,000 IOPS (random workload, 50% read), then adding another node will raise that to 500,000 IOPS. That single node provides more IOPS than a fully scaled and filled VNX 7500. That's a lot of horsepower if you ask me.

The same can be said for multiple smaller LUNs: each LUN created is spanned (at least in XtremIO, AFAIK) across all the available nodes in the cluster. So you would still get the benefit of an insane amount of IOPS with either decision.
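The scaling claim above is simple multiplication. A rough sketch, assuming perfectly linear scaling per node (which is what the example implies, not something I have measured); the function name is hypothetical.

```shell
#!/bin/sh
# Assumes perfectly linear scaling across nodes.
# 250000 is the per-node figure quoted above; real numbers depend on workload.
cluster_iops() {
    nodes=$1
    per_node=${2:-250000}
    echo $((nodes * per_node))
}

cluster_iops 2
```

Since every LUN spans all the nodes, the same total applies whether it is consumed through one big LUN or several smaller ones.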

Other Considerations:

There are other considerations that you will need to take into account when designing storage for VMware. To start with, workload is a good one. Depending on the workload that's consuming all of these resources, you might want to provide a single big LUN, or the application architecture might force you to use multiple smaller LUNs. One of my customers’ SQL team is convinced that even on an AFA, the data and the log LUNs have to be separated on ‘spindles’. I explained the lack of spindles and the redundancy/resiliency/availability aspects of an AFA. After a long discussion, it was agreed that there would still be multiple LUNs created, but all of them on the same 2-node XIO array, not spread across the other 2 x 2-node XIO arrays that are available.

What about DR/SRM:

DR/SRM strategy doesn’t need to change significantly for AFAs. I have always believed in providing the optimal number of LUNs for SRM for a mixed workload. Some applications might require a separate LUN (for a vApp, for example), while some are happy to co-exist. It also comes down to the application owners: some are adamant that their workloads should be kept separate, while others are happy to co-exist on the same LUN as long as their RTO/RPO requirements are met.

So in short, the answer is “IT DEPENDS”. But my vote goes to multiple medium-sized LUNs (10-12TB) :). This will provide the advantages of both big and small LUNs.

What's your say?

I’d appreciate comments about this on the blog rather than on Twitter, but then again both are social media, so it doesn’t matter.

Things that need to be changed on vSphere for XtremIO

XtremIO is the newest and fastest (well, EMC say so) all-flash array on the market. I have been running this in my “lab” as a POC, which is quickly turning into a major VDI refresh for one of the clients. Having run through the basics of creating storage, monitoring, alerting, etc. in my previous posts, I am going to concentrate on which parameters we need to change in the vSphere world to get the best performance from XtremIO.

The parameters also depend on which version of ESXi you’re using, as XtremIO supports ESXi 4.1 and later.

Without further delay, let's start.

Adjusting the HBA Queue Depth

We are going to be sending a lot more IO to the XtremIO array than you would to a traditional hybrid array, so we need to ensure that the HBA queue depth allows a lot more IO requests through.
You can find out which HBA driver module is in use with the command

Step 1: esxcli system module list | grep ql (or lpfc for emulex)

Once you know which module is being used, the commands below can be used to change the HBA queue depth on the server.

QLogic – esxcli system module parameters set -p ql2xmaxqdepth=256 -m qla2xxx (or whatever module the command in Step 1 returned)

Emulex – esxcli system module parameters set -p lpfc0_lun_queue_depth=256 -m lpfc820 (or whatever module the command in Step 1 returned)
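If you have a mix of hosts, a small dry-run helper can print the right command for whichever module Step 1 returned. This is only a sketch: the function name is made up, the `ql*`/`lpfc*` matching is my assumption about how the module names look, and it prints the command rather than running it. Note that the QLogic driver spells its parameter `ql2xmaxqdepth`.

```shell
#!/bin/sh
# Print (do not run) the queue-depth command for a given driver module.
# ql2xmaxqdepth is the QLogic driver parameter; lpfc0_lun_queue_depth is
# the Emulex one used above. The depth defaults to 256.
queue_depth_cmd() {
    module=$1
    depth=${2:-256}
    case "$module" in
        ql*)   param="ql2xmaxqdepth" ;;
        lpfc*) param="lpfc0_lun_queue_depth" ;;
        *)     echo "unknown module: $module" >&2; return 1 ;;
    esac
    echo "esxcli system module parameters set -p ${param}=${depth} -m ${module}"
}

queue_depth_cmd qla2xxx
queue_depth_cmd lpfc820
```

Module parameter changes only take effect after the host is rebooted, and you can check what is currently set with esxcli system module parameters list -m followed by the module name.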

Multi Pathing

If you are not going to use PowerPath, we will be using Round Robin with NMP, since this is an active/active array with X controllers (yeah, I know it's got 2 controllers per disk shelf, and as of today you can scale up to 6 disk shelves per cluster, so 12 controllers).

The engineers who work with XtremIO recommend that the default number of IOs sent down a path before switching be changed from 1000 to 1, yes “ONE”. So essentially each successive IO request goes down the next path, which spreads the requests across the controllers in the cluster. I haven’t really seen any improvement in performance by doing so, but it is only a recommendation at the end of the day. If you see that you are not going to achieve any significant performance gain by doing so, the onus is on you to make that decision.

First, let's get all the volumes that’ve been configured on XtremIO.

esxcli storage nmp path list | grep XtremIO

This will give you the naa ID of all the volumes that are running on XtremIO.

Now let's set the policy to RR for those volumes.

esxcli storage nmp device set --device <naa.id> --psp VMW_PSP_RR (5.x)
esxcli nmp device setpolicy --device <naa.id> --psp VMW_PSP_RR (4.1)

You can also set the default path selection policy for any storage in 5.x by identifying the SATP and modifying it with the command

esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=<your_SATP_name>

To set the number of IOs to 1 in RR,

esxcli storage nmp psp roundrobin deviceconfig set -d <naa.id> --iops 1 --type iops (5.x)

esxcli nmp roundrobin setconfig --device=<naa.id> --iops=1 --type iops (4.1)
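With more than a handful of volumes, typing these per device gets old quickly. Here is a small dry-run sketch for 5.x that prints both commands for every device ID you pass in. The function name and the placeholder ID are made up, and nothing is executed against the host.

```shell
#!/bin/sh
# Print (do not run) the 5.x Round Robin commands for each device ID given.
rr_cmds() {
    for dev in "$@"; do
        echo "esxcli storage nmp device set --device ${dev} --psp VMW_PSP_RR"
        echo "esxcli storage nmp psp roundrobin deviceconfig set -d ${dev} --iops 1 --type iops"
    done
}

# naa.EXAMPLE0001 is a made-up placeholder, not a real device ID.
rr_cmds naa.EXAMPLE0001
```

Review the printed commands first, then paste them into the ESXi shell once you are happy with the list of devices.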

Of course, if you don't want to go and change all of this, you can still use PowerPath.

Host Parameters to Change

For best performance we also need to set a couple of disk parameters. You can do this via GUI or the easier way via CLI (preferred).

Using the GUI, set the following parameters: Disk.SchedNumReqOutstanding to 256 and Disk.SchedQuantum to 64.

Note: If you have non-XtremIO volumes on these hosts, these settings may overstress the controllers of those other arrays and cause performance degradation when communicating with them.

Using the command line in 4.1, set the parameters using

esxcfg-advcfg -s 64 /Disk/SchedQuantum
esxcfg-advcfg -s 256 /Disk/SchedNumReqOutstanding

To check that they've been set correctly, use

esxcfg-advcfg -g /Disk/SchedQuantum
esxcfg-advcfg -g /Disk/SchedNumReqOutstanding

You should also change Disk.DiskMaxIOSize from the default of 32767 to 4096. This is because XtremIO reads and writes in 4K chunks by default, and that's how it gets its awesome deduplication ratio.
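To keep the three disk settings together, here is a dry-run sketch that prints a set command and a matching query command for each one. The values are the ones recommended in this post, the function name is made up, and the commands are printed rather than run.

```shell
#!/bin/sh
# Print (do not run) one esxcfg-advcfg set command and one query command
# per setting. Values are the ones recommended in this post.
advcfg_cmds() {
    while read -r option value; do
        [ -n "$option" ] || continue
        echo "esxcfg-advcfg -s ${value} /Disk/${option}"
        echo "esxcfg-advcfg -g /Disk/${option}"
    done <<EOF
SchedQuantum 64
SchedNumReqOutstanding 256
DiskMaxIOSize 4096
EOF
}

advcfg_cmds
```

Note that the option path has no space in it: it is /Disk/SchedQuantum, not /Disk /SchedQuantum.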

In ESXi 5.0/5.1, SchedNumReqOutstanding is still a host-wide setting, which you can set by using

esxcli system settings advanced set -o /Disk/SchedNumReqOutstanding -i 256

In vSphere 5.5 you set this parameter on each volume individually instead of configuring it per host:

esxcli storage core device set -d <naa.id> -O 256

vCenter Server Parameters

Depending on the number of xBricks that are configured per cluster, the vCenter Server parameter config.vpxd.ResourceManager.maxCostPerHost needs to be changed. This adjusts the maximum number of concurrent full cloning operations.

One xBrick Cluster – 8 (default)
Two xBrick Cluster – 16
Four xBrick Cluster – 32
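If you script your builds, the table above fits in a tiny lookup. This sketch only covers the three cluster sizes listed and deliberately does not guess at any other size. The function name is hypothetical.

```shell
#!/bin/sh
# Look up config.vpxd.ResourceManager.maxCostPerHost for the xBrick counts
# listed above. Other cluster sizes are not extrapolated.
max_cost_per_host() {
    case "$1" in
        1) echo 8 ;;
        2) echo 16 ;;
        4) echo 32 ;;
        *) echo "no value listed for $1 xBricks" >&2; return 1 ;;
    esac
}

max_cost_per_host 2
```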

That's the end of this post. Please feel free to correct me if I’ve got any commands wrong.


Recommendation (as per AusFestivus’ comment): EMC recommend that PowerPath (PP) be used for best performance. But it always comes down to cost constraints and how much the client wants to spend. In my opinion, PP is more of a “nice to have for best performance without tinkering”. If you are happy to keep tinkering and changing things to get the best performance out, you can do without PP.

