B.Y.O.P – The Alternative Vblock

In college I was often invited to get-togethers that included the letters BYOB: Bring Your Own Beer. Sometimes a cookout would be BYOM: Bring Your Own Meat (or meat alternative for the vegetarians). So today I want to leverage this to push my new acronym, B.Y.O.P.: Bring Your Own Pod. Lately I have been seeing people talk about Vblocks. If I can venture a succinct definition, a Vblock is a pre-configured set of Cisco, EMC, and VMware products, tested by super smart people, approved by those people to work together, and then supported by these organizations as a single entity. Your reseller/solutions provider really should already be doing this very thing for you. You may choose to buy just the network piece, or just the hypervisor, but your partner should be able to verify that a solution works from end to end and provide unified support.

So You can’t call it BYOPCVCEP

Why not Vblock? This might get me blacklisted by the Elders of the vDiva council, but VCE doesn't exist to make your life in the datacenter easier; they exist to sell you more VMware, Cisco, and EMC. A Vblock certainly simplifies your buying experience, and I believe they are all great products that may very well do just what you need. Without competition, though, the only winner is VCE. Do not be forced into a box by the giant vendors. Find someone who can help determine your end goal and provide vendor-neutral analysis of the building blocks needed to achieve it, then provide the correct vendors and unified support to Build Your Own Pod.

So What Is the Alternative Vblock?

Originally I was going to draw up a sweet solution of 3par, Xsigo, and Dell R610s and say, "Hey everyone! This is some cool stuff. Try to quiet the overwhelmingly loud voice calling from VCE and give this Alternative Vblock a try." But as I thought more and more about it, I realized doing that is contrary to my main point. Instead, I would rather provide the discussion points and some possible products, among others, that can be used to Build Your Own Pod. I am a firm believer in getting what is right for your datacenter needs. So here are a few links to help begin the discussion.

Xsigo and Pod – Jon Toor
3par and iBlocks – Marc Farley

You might be a vDiva if…

I am avoiding a post where I have to think really hard about a topic. That makes me procrastinate and come up with even crazier ideas, so I am writing this one down now. Most of these apply to me, so if you are offended by any of them, you are probably a vDiva.

You might be a vDiva if…

… you roll your eyes when someone talks about installing a PHYSICAL server.

… you are on twitter to see how many people you can get to look at your blog, but you never stoop so low as to interact with the common folk.

… you are surprised when the guy at the table at the VMUG doesn’t know who you are.

… you constantly check your Google Analytics account to see how many views you have. (guilty)

… you refer to yourself as @… (your twitter account)

… you hunt down @jtroyer if your latest post takes too long to get on the v12n board.

… you require a signed rider agreement with your speaking topic for VMworld, saying you need 800 green M&M's, a copy of Lord of the Rings in your hotel room, and direct phone access to Steve Herrod's iPhone.

I probably ticked a bunch of people off. I am just having fun. Have a great day! Go ahead and add your own in the comments.

Operational Readiness

One thing I am thinking about due to the VCDX application is operational readiness. What does it mean to pronounce a project or solution good-to-go? In my world, it means testing that each feature does exactly what it should be doing. Most commonly this will be failover testing, but it could reach into any feature, or be as big as a DR plan that involves much more than the technical parts doing what they should. Some things I think need to be checked:

Resources

Are the CPU, memory, network, and storage doing what they should? Load-generating programs like IOmeter work well to test network and storage performance. CPU-busy programs can verify that Resource Pools and DRS are behaving the way they should.
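
If you need a quick CPU-busy program, here is a trivial sketch that will peg one vCPU in any Linux test VM (run one copy per vCPU you want busy, then watch how DRS and your Resource Pool shares respond):

# infinite busy-loop; Ctrl-C to stop
while : ; do : ; done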

Failover

You have redundant links, right? Start pulling cables. Do the links fail over for Virtual Machines, the Service Console, and iSCSI? How about the redundancy of the physical network? Even more cables to pull! Also test that the storage controllers fail over correctly. Finally, I make sure HA does what it is supposed to: instantly power off a host and make sure some test virtual machines start up somewhere else on the cluster.
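
While you are pulling cables, it helps to watch link state from the service console. A minimal sketch (your vmnic numbering will differ):

# refresh the NIC listing every second and watch the link column flip
watch -n 1 esxcfg-nics -l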

Virtual Center Operations

Deploy new virtual machines, VMotion and Storage VMotion, deploy from a template, and clone a VM: these are all things we need to make sure are working. If this is a big enough deployment, make sure the customer can use the deployment appliance if you are making use of one. Make sure the alarms send traps and emails too.

Storage Operations

Create new luns, test replication, test storage alarms, and make sure the customer understands thin provisioning if it is in use. Make sure you are getting the IO you designed for from the storage side, and make use of the SAN tools to be sure the storage is doing what it should.
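
On the ESX side of storage operations, remember that after presenting a new lun you need a rescan before the host will see it. A quick sketch (the adapter name is just an example):

# rescan vmhba1 for new luns and VMFS volumes
esxcfg-rescan vmhba1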

Applications

You can verify that each application is working as intended within the virtual environment.

There must be something I am missing, but the point is to test everything so you can say this virtualization solution is ready to be used.

Using Network Load Balancing with View

If you have a smaller View deployment but still want redundant connection servers, look no further than Microsoft NLB. It solves the problem without an expensive hardware load balancer. Will it have all of the bells and whistles? No. But if you have fewer than 1,000 users, you probably would not see the benefit of the advanced features in a hardware load balancer. Make sure to read the whitepaper from VMware about NLB in virtual machines.

I am making the assumption that you are like me and want everything to be as virtual as possible, so the View Connection Manager servers will be VMs.

Setup the primary and replica View Servers

I won't go over installing View. Just be sure to set up the initial manager server first, then go ahead and set up the replica VM.

Configure NLB

[Screenshot: the Network Load Balancing Manager]

Go to the Administrative Tools and open the Network Load Balancing Manager. Right-click the top node in the tree menu on the left and select New Cluster.
Set the IP and other information you will use for the load-balanced cluster. This is a new IP, not one already used by your View Manager servers.
In the VMware document referenced above, VMware recommends setting the cluster operation mode to Multicast.
Click Next, then Next again. When asked to configure port rules, I leave the defaults and click Next. You can choose to limit this to certain ports.

[Screenshot: the port rules configuration]

Click Next again and enter localhost in the wizard to configure the local interfaces for NLB. Click Next and make sure to note the priority; when setting up the replica server, this number needs to be different. Finally, click Finish and wait for the configuration to complete. You should now be able to ping your new cluster IP address.

Setup the Replica Server in the Load Balancer

[Screenshot: adding the replica server to the cluster]

Right-click the node in the tree menu for the NLB cluster you just created and select Add new host to cluster. Enter the IP for the replica server and click Connect. Select the interface that will be used for the load balancing and click Next. Make sure the priority is unique from the first server's. If it gives you any grief after this point, close and re-open the Network Load Balancing Manager. The working cluster should look like this:

[Screenshot: the working NLB cluster with both hosts]

Test the Failover

[Screenshot: the continual ping during failover]

Start a continual ping to the cluster IP. Now use the vSphere Client to disconnect the network from one of the servers, and watch the pings continue to come back.
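
On a Windows machine the continual ping is simply (the cluster IP here is made up for illustration):

ping -t 192.168.1.50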

Finally, create a DNS A record (something like desktop.yourdomain.com) and point it to the cluster IP. You now have some decent failover in case of a VM failure and even a host failure (my suggestion would be to use separate hosts for the VMs).

Note – You may need to add static ARP entries on your switches depending on your network topology. Be sure to test this fully and consult your network manufacturer's documentation for help with static ARP.
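
Purely as an illustration (a hypothetical Cisco IOS example with a made-up cluster IP; your switch's syntax will differ): in multicast mode NLB derives a multicast MAC of 03bf followed by the cluster IP in hex, so a static entry for 192.168.1.50 would look something like this:

! map the NLB cluster IP to its multicast MAC
arp 192.168.1.50 03bf.c0a8.0132 ARPA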

Adaptive Queuing in ESX

While troubleshooting another issue a week or two ago, I came across this VMware Knowledge Base article. Having spent most of my time with other brands of arrays in the past, I thought this was a pretty cool solution versus just increasing the queue depth of the HBA. I would recommend setting this for your 3par BEFORE you get QFULL problems. NetApp has an implementation of this as well.

Be sure to read the note at the bottom especially:

If hosts running operating systems other than ESX are connected to array ports that are being accessed by ESX hosts, while the latter are configured to use the adaptive algorithm, make sure those operating systems use an adaptive queue depth algorithm as well or isolate them on different ports on the storage array.

I do need to dig deeper into how this affects performance as the queue begins to fill; I am not sure if one method is better than another. Is this the new direction that many storage vendors will follow?

Until then, the best advice is to do what your storage vendor recommends, especially if they say it is critical.

Here is a quick run through for you.

In the vSphere Client

[Screenshot: the vSphere Client Configuration tab]

Select the ESX host, go to the Configuration tab, and click Advanced Settings under Software.

In the Advanced Settings

[Screenshot: the Advanced Settings dialog]

Select the option for Disk and scroll down to the QFullSampleSize and QFullThreshold.
Change the values to the 3par recommended values:
QFullSampleSize = 32
QFullThreshold = 4
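
If you prefer the service console, the same settings can be applied with esxcfg-advcfg. A quick sketch (run it on each host; the -g flag reads a value back to verify):

# set the 3par recommended adaptive queuing values
esxcfg-advcfg -s 32 /Disk/QFullSampleSize
esxcfg-advcfg -s 4 /Disk/QFullThreshold
# confirm the change
esxcfg-advcfg -g /Disk/QFullSampleSize
esxcfg-advcfg -g /Disk/QFullThreshold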

Ask Good Questions

This happened a long time ago. I arrived at a customer site to install View Desktop Manager (it may have been version 2). This was before any cool VDI sizing tools like Liquidware Labs. While installing ESX and VDM, I casually asked, "What apps will you be running on this install?" The answer was, "Oh, web apps like YouTube, Flash and some Shockwave stuff." I thought "ah dang" in my best Mater voice. This was a case of two different organizations each thinking someone else had gathered the proper information. Important details sometimes fall through the cracks. Since that day, I try to at least uncover most of this stuff before I show up on site.

Even though we have great assessment tools now, remember to ask some questions and get to know your customer's end goal.

Things I learned that day, as related to VDI:

1. Know what your client is doing, “What apps are you going to use?”

2. Know where your client wants to do that thing from, “So, what kind of connection do you have to that remote office with 100+ users?”

This is not the full list of questions I would ask, just some I learned along the way.

VMware View and Xsigo

*Disclaimer – I work for an Xsigo and VMware partner.

I was in the VMware View Design and Best Practices class a couple of weeks ago. Much of the class is built on the VMware View Reference Architecture, and the POD numbers below come from that PDF.

It really struck me how many IO connections (network or storage) it would take to run this POD. The minimum (in my opinion) would be 6 cables per host; with ten 8-host clusters, that is 480 cables! Let's say that 160 of those are 4 Gb Fibre Channel and the other 320 are 1 Gb Ethernet. That is 640 Gb for storage and 320 Gb for network.
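
Spelling out the math from the numbers above:

10 clusters × 8 hosts × 6 cables = 480 cables
160 FC cables × 4 Gb = 640 Gb of storage bandwidth
320 Ethernet cables × 1 Gb = 320 Gb of network bandwidth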

Xsigo currently uses 20 Gb InfiniBand, and the best practice would be to use 2 cards per server. The same 80 servers in the above POD would have 3,200 Gb of bandwidth available (80 servers × 2 cards × 20 Gb). Add in the flexibility and ease of management you get using virtual IO. I would think the cost savings from the director-class fiber switches and datacenter switches you no longer need would pay for the Xsigo Directors, but I don't deal with pricing, so that is pure contemplation and I will stick with the technical benefits. Being in the datacenter, I like any solution that makes provisioning servers easier, takes less cabling, and gives me unbelievable bandwidth.

So, just as VMware changed the way we think about the datacenter, virtual IO will once again change how we deal with our deployments.

ESX Commands – esxcfg-nas

esxcfg-nas

Standard use of this command is to add or list your NFS mounts.

List: esxcfg-nas -l

Add: esxcfg-nas -a -o <host> -s <share> <name>
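
For example, mounting a hypothetical NFS export as a datastore named nfs01 (the host and share names are made up):

# add an NFS datastore: server, export path, datastore name
esxcfg-nas -a -o nfs01.example.com -s /vol/vmware nfs01
# list NFS datastores to confirm the mount
esxcfg-nas -l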

Not much more I can say. A little more detail here:

http://b2v.co.uk/b2vguide2vmware3.htm

A thread in the communities about a problem someone had where the nfsclient wasn’t loaded:
http://communities.vmware.com/message/864559

So go out and add some NFS datastores!

ESX Commands – esxcfg-mpath

It has been almost a year since I started looking at the esxcfg-* commands. It initially came from looking at the very first bullet point of the Enterprise Administration Exam's blueprint. In that post I talked about using esxcfg-mpath to identify which luns are fiber, iSCSI, NFS, or local.

Today let's look a little deeper at the command and how it can be used to help you day to day.

We can always use esxcfg-mpath -l to list all of the luns and their paths. A good thing to check here is that you have the same number of paths to each datastore that comes from the SAN. You may have a zoning issue if a certain lun can only be seen through one path rather than all of them. In general, each hba will see a lun through each controller in an active/active type fiber channel SAN. So hba A should see the lun from Controllers A and B; likewise, hba B should see Controllers A and B, for a total of four paths. If you are using Fixed or MRU as the pathing policy, only one will be active, but esxcfg-mpath -l will still show four paths.

Of course if you have more hbas and controllers you will have more paths.

You can follow the examples given by esxcfg-mpath -h to get help. One useful trick is to create a crude script using esxcfg-mpath --policy with the --lun flag to change the policy from, say, MRU to Fixed. I am not a Perl scripter, and I am sure someone already has a real shell or Perl script to set the policy, but I do like to prepare multiple commands in Notepad and then paste them into the CLI.

esxcfg-mpath --policy=fixed --lun=vmhba0:0:1
esxcfg-mpath --policy=fixed --lun=vmhba0:0:2
and so on…
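
If you would rather loop than paste, a minimal shell sketch (assuming the luns follow the vmhba0:0:N naming and you want luns 1 through 8):

# set the fixed policy across luns 1-8 in one pass
for n in 1 2 3 4 5 6 7 8; do
    esxcfg-mpath --policy=fixed --lun=vmhba0:0:$n
done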

Then try to feel really smart by alternating the paths so successive luns use different paths.

esxcfg-mpath --path=vmhba1:0:1 --lun=vmhba0:0:1 --state=on
esxcfg-mpath --path=vmhba2:0:2 --lun=vmhba0:0:2 --state=on
esxcfg-mpath --preferred --path=vmhba1:0:1 --lun=vmhba0:0:1
esxcfg-mpath --preferred --path=vmhba2:0:2 --lun=vmhba0:0:2

The first two lines enable a path on a different hba for each of the respective luns. The last two lines set the preferred path for each lun to that same hba port, so after a failover the path will fail back to your configured layout once all is well.