Cisco – EMC Webcast: … An Optimized End User Experience


No matter what you do to accelerate, optimize, and transform your desktop environment (physical or virtual), if the presentation is sub-par, no one cares. The common message from any vSpecialist when it comes to EUC (End User Computing; VDI is so 2011) is to focus on the end user experience. Make it easy to access my data and applications from anywhere at any time and I am a happy user.

This is something I really believe in, having delivered VDI (or TS) solutions in the past, starting as a Citrix MetaFrame XP administrator. So when I noticed this webcast I wanted to be sure to share it with everyone. EMC is a huge place and there is ALWAYS something going on, but I took special notice when Cisco, EMC, VCE, and VMware teamed up with a focus on getting the end user experience done right.

Save the date and sign up! August 22, 2012 11:00 AM EDT / 8:00 AM PDT.

So sign up now here:

What to expect?

When it comes to EUC there are so many “best” practices out there that many times you just need someone to tell you what works. I will take a few seconds to detail the high-level bullets I always share with customers when speaking about EUC.

  • From the EMC perspective it often comes down to putting the right data in the right place. When using Flash drives to lower cost and footprint, knowing how VDI I/O works is very important.


  • Also from the EMC realm is the amazing impact FAST Cache can have on these deployments vs. trying to account for all unexpected I/O with spinning media. This further lowers your cost and spindle count. That is right, someone at EMC saying buy fewer drives.
  • Use the money you save to put more RAM in your Cisco UCS B-series blades. Memory is the second bottleneck, after storage, in a VDI rollout.
  • Speaking of memory, make sure you use the best hypervisor for consolidation and memory management. vSphere 5 is still years ahead of even the promised products from the other guys. The hardware TCO picture is ONLY part of the story, so make sure you get every last drop out of those Cisco UCS blades.
  • Lastly, if you want to deliver this in a tested and proven manner AND you realize your time to market is critical, EMC VSPEX and VCE Vblock take the world’s best components and software and make them work for you. No more testing for 9 months before pushing the go button.

Get to the WEBCAST Already

Once again, if you are exploring, testing, POC’ing, or running VDI in production in any way, shape, or form, join the webcast on August 22 and see what EMC and Cisco have in store.

Save the date and sign up! August 22, 2012.

So sign up now here:

More on VSPEX

More on VCE and End User Compute and FASTPATH

EMC Reference Architecture -one of many…

From the Cisco Site <-Cisco UCS / EMC VNX RA


Extents vs Storage DRS

I was meeting with a customer today and had to stop for a second when they said they were using 10 TB datastores in vSphere 4.1.

At first I was running through the possibilities in my head: maybe NFS? No, they are an all-block shop. Oh wait, yeah, extents. They were using 2 TB minus 512 byte LUNs to create a giant datastore. I asked, why? The answer was simple: “so we only manage one datastore.”

I responded with, well, check out Storage DRS in vSphere 5! It gives you that one point of management and automatic placement across multiple datastores. Additionally, you can actually find which VM lives where, and use Storage Maintenance Mode to do storage-related maintenance. Right now they are locked into using extents. If they change their datastores into a cluster they gain flexibility while not losing the ease of management.

I wanted to use the opportunity to list some things I think about when it comes to extents with VMware.

  1. Extents do not equal bad. Just have the right reason to use them, and running out of space is not one.
  2. If you lose one extent you don’t lose everything, unless that one is the first extent.
  3. VMware places blocks on extents in a somewhat even fashion. It is not spill and fill. While not really load balancing, you don’t hammer just one LUN at a time.

A datastore built on extents is like a stack of LUNs. Don’t knock out the bottom block!
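The stack-of-LUNs point can be sketched as a toy model. Two assumptions for illustration only: the head extent holds the metadata that the whole datastore depends on (point 2 above), and blocks spread across extents roughly evenly rather than spill-and-fill (point 3). Real VMFS allocation is more complex than this.

```python
class ExtentDatastore:
    """Toy model of a VMFS datastore spanned across multiple extents."""

    def __init__(self, num_extents):
        self.num_extents = num_extents
        self.placement = {}   # vm -> set of extents holding its blocks

    def place_vm(self, vm, blocks):
        # Spread blocks across extents roughly evenly (not spill-and-fill).
        self.placement[vm] = {b % self.num_extents for b in range(blocks)}

    def lost_vms(self, failed_extent):
        if failed_extent == 0:            # head extent: metadata gone,
            return set(self.placement)    # the whole datastore is lost
        return {vm for vm, exts in self.placement.items()
                if failed_extent in exts}

ds = ExtentDatastore(4)
ds.place_vm("vm1", blocks=2)   # lands on extents {0, 1}
ds.place_vm("vm2", blocks=8)   # spread over all 4 extents
print(ds.lost_vms(3))          # only VMs touching extent 3
print(ds.lost_vms(0))          # head extent failure: everything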


Some points about Storage DRS.

  1. Storage DRS places VMDKs based on I/O and space metrics.
  2. Storage DRS and SRM 5 don’t play nice, last time I checked (2/13/12).
  3. Combine Storage DRS with Storage Policy and you have a really easy way to place and manage VMs on the storage. Just set the policy and check if it is compliant.

A Storage DRS cluster is multiple datastores appearing as one.
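To make point 1 concrete, here is a rough sketch of the kind of initial-placement decision Storage DRS makes: pick the datastore in the cluster with the best combined space and I/O headroom. The scoring function here is entirely my own invention for illustration; VMware's actual algorithm is theirs and considerably smarter.

```python
def pick_datastore(cluster, vmdk_gb, io_weight=0.5):
    """Choose a datastore balancing free space against observed latency."""
    def score(ds):
        free_after = ds["free_gb"] - vmdk_gb
        if free_after < 0:
            return float("-inf")               # VMDK doesn't fit at all
        space_score = free_after / ds["capacity_gb"]
        io_score = 1 / (1 + ds["latency_ms"])  # lower latency is better
        return (1 - io_weight) * space_score + io_weight * io_score

    return max(cluster, key=score)["name"]

cluster = [
    {"name": "ds1", "capacity_gb": 2048, "free_gb": 100, "latency_ms": 5},
    {"name": "ds2", "capacity_gb": 2048, "free_gb": 900, "latency_ms": 12},
    {"name": "ds3", "capacity_gb": 2048, "free_gb": 700, "latency_ms": 6},
]
print(pick_datastore(cluster, vmdk_gb=200))   # ds1 is too full to qualify
```

The takeaway is the one in the post: the cluster gives you one logical pool to point at, while the placement logic quietly spreads the VMDKs underneath, which is exactly what the extent users were trying to fake.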

Some links on the topics:

Some more information from VMware on Extents
More on Storage DRS (SDRS)

In conclusion, SDRS may be removing some of the last reasons to use an extent (getting multiple-LUN performance with a single point of management). Add that to being able to have up to 64 TB datastores with VMFS-5, and using extents will become even rarer than before. Unless you have another reason? Post it in the comments!

vSphere Metro Stretched Clusters – Some Info/Links

A lot of questions lately about vSphere Clusters across distance. I really need to learn for myself so I collected some good links.

Make sure you understand what “only non-uniform host access configuration is supported” means. Someone correct me if I have this wrong, but the device that enables the distributed virtual storage needs to make sure that hosts in site A are writing to their preferred volumes in site A, and vice versa in site B. I am probably way over-simplifying it.


Big thanks to Scott Lowe for clearing up the details on this topic.

Storage Caching vs Tiering Part 2

Recently I had the privilege of being a Tech Field Day delegate. Tech Field Day is organized by Gestalt IT; if you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visit sponsor the event. The delegates are under no obligation to review the sponsoring companies favorably or unfavorably.

After last week’s post on tierless caching, I wanted to follow up with my thoughts on a second Tech Field Day vendor. Avere gave a very interesting and technical presentation. I appreciated being engaged on an engineering level and not with a marketing pitch.

Avere tiers everything. It is essentially a scale-out NAS solution (they call it an FXT appliance) that can front-end any existing NFS storage; someone else described it to me as file acceleration. The Avere NAS stores data internally on a cluster of NAS units. The “paranoia meter” lets you set how often the mass storage device is updated. If you need more availability or speed, you add Avere devices. If you need more disk space, you add to your mass storage. In their benchmarking tests they basically used some drives connected to a CentOS machine running NFS, front-ended by Avere’s NAS units. They were able to get the required IOPS at a fraction of the cost of NetApp or EMC.

The Avere Systems blog provides some good questions on Tiering.

The really good part of the presentation is how they write between the tiers. Everything is optimized for that particular type of media: SSD, SAS, or SATA.
When I asked about NetApp’s statements about tiering (funny, they were on the same day), Ron Bianchini responded that “when you sell hammers, everything is a nail.” I believe him.

So how do we move past all the marketing speak to get down to the truth when it comes to caching and tiering? I am leaning toward thinking of any location where data lives for any period of time as a tier. I think a cache is a tier. A really fast cache for reads and writes is for sure a tier. Different kinds of disks are tiers. So I would say everyone has tiers. The value comes in when the storage vendor innovates and automates the movement and management of that data.
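One way to picture the "automated movement" half of that argument is a tiny scheduled tiering pass: every cycle, promote the hottest blocks to the fast tier and demote the rest. This is purely illustrative; it is not Avere's (or anyone's) actual policy, just the skeleton that all the vendor variants decorate.

```python
def retier(access_counts, fast_capacity):
    """One scheduled tiering pass.

    access_counts: {block_id: hits since the last pass}.
    Returns (fast_tier, slow_tier) as sets of block ids.
    """
    by_heat = sorted(access_counts, key=access_counts.get, reverse=True)
    fast = set(by_heat[:fast_capacity])   # hottest blocks get promoted
    slow = set(by_heat[fast_capacity:])   # everything else is demoted
    return fast, slow

counts = {"a": 90, "b": 5, "c": 40, "d": 1}
fast, slow = retier(counts, fast_capacity=2)
print(fast)   # the two hottest blocks win the fast media
```

The interesting engineering (and the vendor differentiation) is everything this sketch leaves out: how often the pass runs, what "heat" actually measures, and how data moves without disrupting I/O.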

My questions/comments about Avere.

1. Slick technology. I would like to see it work in the enterprise over time. People might be scared because it is not one of the “big names”.
2. Having come from Spinnaker, is the plan to go long term with Avere, or to build something to be purchased by a big guy?
3. I would like to see how the methods used by the Avere FXT appliance can be applied to block storage. Plenty of slow, inexpensive iSCSI products would benefit from a device like this on the front end.

Storage Caching vs Tiering Part 1

Recently I had the privilege of being a Tech Field Day delegate. Tech Field Day is organized by Gestalt IT; if you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visit sponsor the event. The delegates are under no obligation to review the sponsoring companies favorably or unfavorably.

The first place hosting the delegates was NetApp. I have worked with several different storage vendors, but I must admit I had never experienced NetApp in any way before, except for Storage vMotioning virtual machines from an old NetApp (I don’t even know the model) to a new SAN.

Among the 4 hours of slide shows I learned a ton. One great topic is Storage Caching vs Tiering. Some of the delegates have already blogged about the sessions here and here.

So I am going to give my super quick summary of caching as I understood it from the NetApp session, followed by a post about tiering as I learned it from one of our subsequent sessions with Avere.

1. Caching is superior to Tiering because Tiering requires too much management.
2. Caching outperforms tiering.
3. Tiering drives cost up.

The NetApp method is to use really quick Flash memory to speed up the performance of the SAN. Their software attempts to predict what data will be read and keeps that data available in the cache. This “front-ends” a giant pool of SATA drives. The cache cards provide the performance and the SATA drives provide a single large pool to manage. With a simplified management model and just one type of big disk, the cost is driven down.
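The caching half of the argument can be sketched the same way the tiering half can. Here is an LRU read cache sitting in front of a big SATA pool: the flash tier is populated on access, not on a schedule, which is the operational difference NetApp was selling. LRU is my stand-in for illustration; NetApp's actual prediction logic is their own.

```python
from collections import OrderedDict

class ReadCache:
    """Minimal LRU read cache fronting a slow backing pool."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # block -> data, coldest first
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)      # keep hot blocks hot
            return self.cache[block]
        self.misses += 1
        data = f"data-{block}"                 # stand-in for a SATA read
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the coldest block
        return data

c = ReadCache(capacity=2)
for blk in ["a", "b", "a", "c", "a"]:
    c.read(blk)
print(c.hits, c.misses)   # the hot block keeps hitting flash
```

Notice there is nothing to manage here once the capacity is set, which is exactly the "tiering requires too much management" pitch condensed to one class.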

My Take Away in Tierless-Caching

This is a solution that has a place and would work well in many situations, but it is not the only solution. All in all the presentation was very good. The comparisons against tiering, though, were really set up against a straw man: a multi-device tiered solution requiring manual management of all the different storage tiers is of course a really hard solution. It could cost more to obtain and could be more expensive to manage. I asked about fully automated, virtualized tiering solutions, ones that manage your “tiers” as one big pool. These would seem to solve the problem of managing tiers of disks while keeping the cost down. The question was somewhat deflected because these solutions move data on a schedule. “How can I know when to move my data up to the top tier?” was the question posed by NetApp. Of course this is not exactly how a fully automated tiering SAN works, but it is a valid concern.

My Questions for the Smartguys:

1. How can NetApp’s caching software make better or worse choices than software from companies that have been making tiering decisions for several years?
2. If tiering is so bad, why does Compellent’s stock continue to rise in anticipation of an acquisition by someone big?
3. Would I really want to pay NetApp-sized money to send my backups to a NetApp pool of SATA disks? Would I be better off with a more affordable SATA solution for backup-to-disk, even if I have to spend slightly more time managing the device?

Equallogic, VAAI and the Fear of Queues

Previously I posted on how using bigger VMFS volumes helps Equallogic reduce their scalability issues when it comes to total iSCSI connections. There was a comment asking whether this means we can have a new best practice for VMFS size. I quickly said, “Yeah, make ’em big or go home.” I didn’t really say that, but something like it. The commenter then responded with a long statement from Equallogic saying VAAI only fixes SCSI locking, and that all the other issues with bigger datastores still remain. ALL the other issues being “queue depth.”

Here is my order of potential I/O problems with VMware on Equallogic:

  1. Being spindle bound. You have an awesome virtualized array that will send I/O to every disk in the pool or group. Unlike some others, you can take advantage of a lot of spindles. Even then, depending on the types of disks, some I/O workloads are going to use up all your potential I/O.
    Solution(s): More spindles is always a good solution if you have an unlimited budget, but that is not always practical. Put some planning into your deployment. Don’t just buy 17 TB of SATA. Get some faster disk, break your group into pools, and separate the workloads onto something better suited to their I/O needs.
  2. Connection limits. The next problem you will run into, if you are not having I/O problems, is the total iSCSI connection count. In an attempt to get all the I/O you can from your array, you have multiple vmk ports using MPIO. This multiplies the connections very quickly. When you reach the limit, connections drop and bad things happen.
    Solution: The new 5.0.2 firmware increases the maximum total connections. Additionally, bigger datastores mean fewer connections. Do the math.
  3. Queue depth. There are queues everywhere: the SAN ports have queues, each LUN has a queue, the HBA has a queue. I will defer to this article by Frank Denneman (a much smarter guy than me): balanced storage design is the best course of action.
    Solution(s): Refer to problem 1. Properly designed storage is going to give you the best solution for any potential (even though unlikely) queue problems. In your great storage design, make room for monitoring. Equallogic gives you SAN HQ. USE IT!!! See how your front-end queues are doing on all your ports. Use esxtop or resxtop to see how the queues look on the ESX host. Most of us will find that queues are not a problem when problem 1 is properly taken care of. If you still have a queuing problem then go ahead and make a new datastore. I would also request that Equallogic (and others) release a Path Selection Policy plugin that uses a Least Queue Depth algorithm (or something smarter). That would help a lot.

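"Do the math" in problem 2 is worth actually doing. The sketch below assumes that with MPIO each host opens roughly one iSCSI session per vmkernel port per volume, so connections grow multiplicatively; the cluster sizes are made up for illustration and this is not an Equallogic-published formula.

```python
def iscsi_connections(hosts, vmk_ports_per_host, volumes):
    """Rough session count: every host logs into every volume once per
    vmk port when MPIO fans out across all ports."""
    return hosts * vmk_ports_per_host * volumes

# 16 hosts, 4 vmk ports each, 30 small datastores:
small = iscsi_connections(16, 4, 30)
# The same cluster consolidated onto 6 big VAAI-backed datastores:
big = iscsi_connections(16, 4, 6)
print(small, big)   # 1920 vs 384 sessions against the group limit
```

Same hosts, same paths, five times fewer sessions: consolidating datastores attacks the connection limit directly, which is why VAAI removing the SCSI-lock objection to big datastores matters here.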
So I will repeat my earlier statement that VAAI allows you to make bigger datastores and house more VMs per datastore. I will add a caveat: if you have a particular application with a high-I/O workload, give it its own datastore.

The Fun Stuff at VMworld 2010

Much of my planned activity for the blog didn’t work out this year. Not much in the sessions or keynotes was worth a blog post yet; expect some View 4.5 and vCloud Director posts once I can get them in the lab. Probably the most useful parts of VMworld were the discussions at the Thirsty Bear, the Bloggers Lounge, the Chieftain, and over breakfast or dinner, among many other places. There was a great turnout for the In-N-Out trip, even though it took around 30 minutes on public transportation to get there. This post shares some of the few experiences* I had and the couple of pictures I thought to take while in San Francisco. I met a lot more people than last year. I couldn’t even begin to name them all, but it was a great time hanging out with all of you, enjoying a few drinks and talking virtualization, storage, and other topics.


This is the hall in our hotel. I kept seeing these twin girls at the end of the hall. It was scary.


Here is proof of my In-N-Out take down: Double-Double and fries well done. Several people showed up. I hope everyone enjoyed it. I do not think any In-N-Out vs. Five Guys battles were decided though.


I hung off the side of the Cable Car all the way back to Powell and Market. Jase McCarty @jasemccarty and Josh Leibster @vmsupergenius


The view from the top of the hill and the front of the Cable Car. The picture does not do justice to how steep the hill is.


Random shot at the Veeam party.


A couple of VMware R&D Managers I met at the CTO gathering before the VMware party. Steve Herrod hosted a party that included a great mix of vExperts and some of the thought leaders at VMware. Great chance to meet some people, @kendrickcoleman beat me down in wii bowling though. I will be practicing until next year.


Proof that I at least made it to the door of the CTO party, by Wednesday I had a pretty good collection of flare on my badge. TGI Fridays made me an offer but I didn’t want to move my family back to the West Coast.


A less fuzzy picture with Rich Brambley @rbrambley and Rick Vanover @rickvanover. I am honored to just hold the sign for these guys.


The Veeam party got a bit crazy when 17 Princess Leias showed up.


The EMC vSpecialists rolled up on VMworld 2010; there were at least 4000 more people at VMworld than last year, and 3500 of them were from EMC. I actually found out they are real guys (and girls) and are really cool. Really good conversations about virtualization were had with many of them. If you haven’t seen it yet, Nick Weaver @lynxbat and other vSpecialists put together a pretty good rap video. Check it out here

*in the event I did not have actual pictures of the event artistic liberties were taken.

Storage IO Control and An Idea

After being out of town for almost all of July, I am finally getting to make a run at vSphere 4.1. I am throwing different features at our lab environment and seeing what they do. I don’t think I would be writing anything new in saying vMotion and Storage vMotion are faster, and clones and deploying from a template are faster (VAAI). I decided to take a peek at the Resource Allocation for IOPS per VM. Nothing you do not already know: you can now assign shares and limits to disk I/O. Useful if you need certain machines to never take too much I/O and cause storage latency. This only kicks in when the latency threshold is exceeded.

My wacky ideas usually come from the thought that resource pools, shares, and limits are cool, but I don’t want them used all the time. So why don’t I apply the limits or shares dynamically based on a certain time or expected workload? Let’s say my third-party backup software runs at 8pm, and that software is on a VM. At 7:59 I could lower the disk shares of all the VMs and raise the disk shares of my backup server. This prevents a rogue DBA from killing your backup window with a query or stored procedure that is heavy on the disk. Going even deeper, I could return the shares to each VM as the backup software finishes backing up all the VMs on that datastore. I wonder if this would actually shorten backup windows or just make the DBAs mad. Either way you win. 🙂
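The share-juggling idea above sketches out like this. Note that `set_disk_shares()` here is a hypothetical stand-in for whatever would actually set a VM's disk shares (PowerCLI or the vSphere SDK in practice); the scheduling logic around it is the point.

```python
NORMAL, LOW, HIGH = 1000, 500, 2000   # illustrative share values

def set_disk_shares(vm, shares, state):
    """Hypothetical stand-in for a real vSphere API call."""
    state[vm] = shares

def backup_window(vms, backup_server, state, start):
    """At window start, demote everyone and promote the backup server;
    at window end, restore normal shares."""
    for vm in vms:
        set_disk_shares(vm, LOW if start else NORMAL, state)
    set_disk_shares(backup_server, HIGH if start else NORMAL, state)

state = {}
vms = ["db01", "web01", "web02"]
backup_window(vms, "backup01", state, start=True)    # 7:59 pm
print(state)   # the rogue DBA's db01 is demoted, backup01 promoted
backup_window(vms, "backup01", state, start=False)   # window ends
```

Since shares only bite when the latency threshold is exceeded, the demoted VMs feel nothing unless the backup actually saturates the datastore, which is what makes a scheduled flip like this low-risk.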

While clearing up my understanding on the issue of SIOC William Lam sent me to these two scripts (very useful):


Adaptive Queuing in ESX

While troubleshooting another issue a week or two ago I came across this VMware knowledge base article. Having spent most of my time with other brands of arrays in the past, I thought this was a pretty cool solution versus just increasing the queue depth of the HBA. I would recommend setting this for your 3PAR BEFORE you get QFULL problems. Additionally, NetApp has an implementation of this as well.

Be sure to read the note at the bottom especially:

If hosts running operating systems other than ESX are connected to array ports that are being accessed by ESX hosts, while the latter are configured to use the adaptive algorithm, make sure those operating systems use an adaptive queue depth algorithm as well or isolate them on different ports on the storage array.

I do need to dig deeper into how this affects performance as the queue begins to fill; I am not sure if one method is better than another. Is this the new direction that many storage vendors will follow?

Until then, the best advice is to do what your storage vendor recommends, especially if they say it is critical.

Here is a quick run through for you.

In the vSphere Client


Select the ESX host and go to the configuration tab and click on the Advanced Settings under Software.

In the Advanced Settings


Select the option for Disk and scroll down to the QFullSampleSize and QFullThreshold.
Change the values to the 3PAR-recommended values:
QFullSampleSize = 32
QFullThreshold = 4
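To see what those two settings actually control, here is a sketch of an adaptive queue-depth loop: sample QFULL/BUSY responses from the array, cut the LUN queue depth when the threshold is hit within a sample window, and creep back up as I/Os succeed. The halving on congestion and +1-per-success recovery are my simplification of the real ESX behavior, not its exact implementation.

```python
class AdaptiveLunQueue:
    """Toy model of adaptive queue depth throttling on one LUN."""

    def __init__(self, max_depth=32, sample_size=32, threshold=4):
        self.max_depth = max_depth
        self.depth = max_depth
        self.sample_size = sample_size   # QFullSampleSize
        self.threshold = threshold       # QFullThreshold
        self.qfulls = 0
        self.ios = 0

    def io_completed(self, qfull):
        self.ios += 1
        if qfull:
            self.qfulls += 1
        elif self.depth < self.max_depth:
            self.depth += 1              # slow recovery on success
        if self.ios >= self.sample_size:
            if self.qfulls >= self.threshold:
                self.depth = max(1, self.depth // 2)   # back off hard
            self.qfulls = self.ios = 0   # start a new sample window

q = AdaptiveLunQueue()
for _ in range(32):          # array starts returning QFULL
    q.io_completed(qfull=True)
print(q.depth)               # throttled from 32 down to 16
```

The asymmetry is the point: backing off is fast (a halving per congested sample window) while recovery is slow (one slot per clean I/O), so a congested array gets relief quickly without the host oscillating.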

Random Half Thoughts While Driving

So I often have epiphany teasers while driving long distances or stuck in traffic. I call them teasers because they are never fully developed ideas and often disappear into thoughts about passing cars, or yelling at the person on their cell phone going 15 MPH while taking up 2 lanes.

Here are some I was able to save today (VMware related):

1. What if I DID want an HA cluster split between two different locations? Why?
2. Why must we over-subscribe iSCSI vmkernel ports to make the best use of 1 GbE physical NICs? Is it just the software iSCSI in vSphere? Is it just something that happens with IP storage? I should test that sometime…
3. If I had 10 GbE NICs I wouldn’t use them on the Service Console or vMotion; that would be a waste. No wait, vMotion ports could use it to speed up your vMotions.
4. Why do people use VLAN 1 for their production servers? Didn’t their Momma teach ’em?
5. People shouldn’t fear using extents, they are not that bad. No, maybe they are. Nah, I bet they are fine; how often does just one LUN go down? What are the chances of it being the first LUN in your extent? OK, maybe it happens a bunch. I am too scared to try it today.