AutoDeploy and VXLAN

UPDATE: Another awesome engineer that I work with has found a solution to this.

“as long as you let VSM create the vmknic the first time, and then preserve that MAC address (in the answer file of the Host-Profile), you’re good. (Assuming you’ve added the vxlan vib to your image)” He also mentions “I think one of the keys is to make sure VSM is showing the correct IP in Datacenters->Network Virtualization->Preparation->Connectivity BEFORE attempting host reboots.”

Thank you Jason

Original Post:

The same technician who found the UCS and PXE bug has found another one, this time relating to VXLAN and Auto-Deploy. (Thanks Zach & Eric)

First, a little background.

The customer is doing a full-scale vCloud Enterprise Suite deployment.  They wanted to utilize VXLAN and fully stateless hosts.

Normally, when the cluster gets prepared, vShield Manager (now called vCloud Networking and Security; I’m sure the name will change tomorrow) creates a vmknic on each host that is used for VXLAN transport.

Now we have Auto-Deploy, which muddies everything up. The process that should occur is this:

First, boot a new host and configure it as needed. Then prepare the cluster through vSM/VCNS, which adds a vmknic to the host for VXLAN transport. Then create a host profile from that host. As you add more hosts, update the answer file for each of them. Then reboot and all is happy…

Well, here is the rub: upon reboot, the vSM/VCNS prep happens before the host profile is applied. So when the host profile gets applied, the just-created vmknic is removed and re-added. Sadly, there is no way to just add the IP address to this vmknic through the host profile; it’s an all-or-nothing affair. What sucks is that during host profile remediation the vmknic isn’t just modified, it’s actually deleted and re-created with the appropriate settings. I’m sure this is to simplify the code. Anyway, the new vmknic is created with the correct settings, but vSM/VCNS doesn’t know about it because some identifier has changed… doh!!

There is currently no fix for this order-of-operations issue. This will be fixed when VCNS gets updated to 5.5, though. So for now it’s VXLAN or Auto-Deploy.

 

iSCSI Boot with ESXi 5.0 & UCS Blades

UPDATE: The issue was the NIC/HBA Placement Policy. The customer had set a policy to have the HBAs first, then the iSCSI Overlay NIC, then the remaining NICs. When we moved the iSCSI NIC to the bottom of the list, the ESXi 5.0 installer worked just fine. I’m not 100% sure why this fix works, but either way it works.

At a recent customer’s site I was trying to configure iSCSI booting of ESXi 5.0 on a UCS blade, a B230 M2. To make a long story short, it doesn’t fully work and isn’t officially supported by Cisco. In fact, NO blade models are supported for ESXi 5.0 & iSCSI boot by Cisco. They claim a fix is on the way, and I will post an update when there is a fix.

Here is the exact issue, and my original thoughts, in case it helps anybody:

We got an error installing ESXi 5 to a NetApp LUN: “Expecting 2 bootbanks, found 0” at 90% of the ESXi install. The blade is a B230 M2.

The LUN is seen in BIOS as well as by the ESXi 5 installer.  I even verified the “Details” option, and all the information is correct.

Doing an Alt-F12 during the install and watching the logs more closely today, at ~90% it appears to be unloading a module that, judging by its name, is some sort of VMware Tools type package. As SOON as it does that, the installer claims that there is no IP address on the iSCSI NIC and begins to look for DHCP. The issue is that during the configuration of the Service Profile and the iSCSI NIC, at no time did we choose DHCP; we chose static. (We even tried Pooled.) Since there is no DHCP server in that subnet, it doesn’t pick up an address and thus loses connectivity to the LUN.

So we rebooted the blade after the error, and ESXi 5 actually loads with no errors. The odd thing is that the root password that was specified isn’t set; it’s blank, like ESXi 4.x was.

So an interesting question is what’s happening during that last 10% of the installation of ESXi 5? Since it boots cleanly, it almost seems like it does a sort of “sysprep” of the OS, i.e. all the configuration details. If that’s the only issue then it might technically be OK. However, I don’t get the “warm and fuzzies”. My concern would be that, maybe not today but down the road, some module that wasn’t loaded correctly will come back to bite the client.

Also, what is happening in that last 10% that’s different than in ESXi 4.x? We were able to load 4.1 just fine with no errors.

Again, we called Cisco TAC and we were told that ESXi 5 iSCSI booting wasn’t supported on any blade. They do support 4.1 as well as Windows and a variety of Linux distros.

Configuring iSCSI boot on a FlexPod

Here is a nice document to follow to configure iSCSI booting for a FlexPod, i.e. UCS blades, NetApp array & ESXi.

UPDATE: This document has the fix I found for ESXi 5.0. This was tested on B230 M2s and seems to work every time.

This document will be updated as I get new information.

FlexPod iSCSI Boot-Fixed

Enabling Jumbo Frames in a FlexPod Environment

Update: I have fixed the 5548 section; I was missing the last two lines.

This post will help you enable jumbo frames in your FlexPod environment. It will also work for just about any UCS-based environment; however, you will have to check how to enable jumbo frames on your storage array.

This post assumes a few things:

-Environment is running 5548 Nexus switches.
-User needs to set up jumbo frames on the NetApp for NFS/CIFS shares.
-NetApp has VIF or MMVIF connections for said NFS/CIFS connections.

Cisco UCS Configuration 

-Log in to UCSM and click on the LAN tab.
-Expand LANs & LAN Cloud.
-Click on the QoS System Class and change the “Best-Effort” MTU to 9216.

NOTE: You need to just type in the number; it’s not one of the ones that can be selected in the drop-down.

-Expand the Policies section on the LAN tab. Right-click on QoS Policies and click “Create new QoS Policy”. Call it “Jumbo-Frames” or something similar.
-On the vNIC Template or the actual vNIC on the Service Profile, set the “QoS Policy” to the new policy.

ESX/ESXi Configuration

-Either SSH or console into the ESX host. If you’re using ESXi you’ll need to ensure local or remote tech support mode is enabled.
-We need to set the vSwitch that the jumbo-framed NICs will be on to allow jumbo frames.
          Type esxcfg-vswitch -l   find the vSwitch we need to modify.
          Type esxcfg-vswitch -m 9000 vSwitch# (Replace # with the actual number)
          Type esxcfg-vswitch -l   you should now see the MTU set to 9000

-We now need to set the actual VMkernel NICs.

          Type esxcfg-vmknic -l  find the vmk’s that we need to modify
          Type esxcfg-vmknic -m 9000 <portgroup name> (this is the portgroup that the vmk is part of)
          Type esxcfg-vmknic -l   verify that the MTU is now 9000

Note: If you’re using dvSwitches, you can set the MTU size through the VI-Client.

5548 Configuration 

-Log in to the 5548 switch on the “A” side.
-Type the following:

system jumbomtu 9216
policy-map type network-qos jumbo
class type network-qos class-default
mtu 9216
multicast-optimize
exit
system qos
service-policy type network-qos jumbo
exit
copy run start

-Repeat on the “B” Side 

NetApp Configuration 

-Log in to the filer.
-Type ifconfig -a  verify which ports we need to make run jumbo frames.
-Type ifconfig <VIF_NAME> mtusize 9000

NOTE: You need to make sure you enable jumbo-frames not only on the VLAN’d VIF but also the “root” VIF.
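
Once every hop above is set, one way to check the path end to end is a don’t-fragment ping from the ESX/ESXi host to the NetApp. The sketch below is not part of the original procedure; it just works out the payload size and prints a vmkping line you could paste into the host console. It assumes a 9000-byte MTU on the vmk, a 20-byte IPv4 header and an 8-byte ICMP header, and the target address 192.0.2.10 is a placeholder.

```python
# Sketch only: works out the ICMP payload size for a don't-fragment jumbo ping.
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo")
    return payload


def vmkping_command(target_ip: str, mtu: int = 9000) -> str:
    """Build a vmkping line: -d sets don't-fragment, -s sets the payload size."""
    return f"vmkping -d -s {max_ping_payload(mtu)} {target_ip}"


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; use the NetApp VIF address on the jumbo VLAN.
    print(vmkping_command("192.0.2.10"))
```

If that ping fails while a normal-sized one works, some hop in the chain (UCS, 5548, vSwitch, vmk or VIF) is probably still at the default MTU.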

Boot from USB drive in VMware Workstation

It’s quite annoying that you can’t boot from a USB drive in VMware Workstation. So here’s a simple workaround.

1. Download PloP boot manager http://www.plop.at/en/bootmanager.html#download

2. Extract the .ZIP

3. Attach the .ISO from the extracted .ZIP to the VM that you want to USB boot.

4. Make sure the USB stick is inserted into your PC and attached to the VM.

5. When the PloP boot manager comes up, select “USB”.

Enjoy booting whatever you have on your USB drive.

VMware Workstation USB Issues

So I’m using VMware Workstation 7.1 to run Ubuntu under my standard Windows 7 desktop. I’m trying to build ChromeOS to play around with, and I can’t get my USB stick to be seen by my VM.

As it turns out, the VMware USB Arbitration Service on my host isn’t started, and in fact won’t start. Turns out there is some USB filter driver causing the issue. I’m willing to bet it’s part of the driver for my new USB 3.0 motherboard.

Anyway, here is the simple fix:

Shut down Workstation.

Open the registry (Start > Run > regedit).

Browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\hcmon.

Create a new key called Parameters.

In Parameters, create a new DWORD value entry named DisableDriverCheck, and then set the value to 1.

This works great and I can now pass USB to my VM.
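
If you’d rather script the registry change than click through regedit, here is a minimal sketch. It is my own illustration, not something I ran as part of the fix: it builds a reg.exe “reg add” command for the hcmon key from the steps above (written here with its backslashes, using the HKLM abbreviation), and only executes it when run on Windows. The function name is made up for the example.

```python
import subprocess
import sys

# The hcmon\Parameters key and the value name come from the manual steps.
HCMON_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\hcmon\Parameters"


def build_reg_add_command(key: str = HCMON_PARAMETERS_KEY) -> list[str]:
    """Return the reg.exe arguments that set DisableDriverCheck to 1."""
    return [
        "reg", "add", key,
        "/v", "DisableDriverCheck",
        "/t", "REG_DWORD",
        "/d", "1",
        "/f",  # overwrite without prompting if the value already exists
    ]


if __name__ == "__main__":
    command = build_reg_add_command()
    if sys.platform == "win32":
        # Needs an elevated prompt; shut down Workstation first.
        subprocess.run(command, check=True)
    else:
        # Not on Windows: just show what would be run.
        print(" ".join(command))
```

Building the command as a list keeps the backslashes in the key path intact when it is handed to subprocess.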

Enjoy

Time settings/issues on ESXi 4.1

Here are some short and sweet items that I discovered yesterday:

Interestingly, ESXi does not allow you to change the timezone, it is permanently set to UTC.

Also, if you set up an NTP server on your ESXi hosts and that NTP server goes away for some reason, the ESXi host will not revert to using its own clock or even make a valiant effort at keeping time. Instead it reverts to January 1, 0001. To say this creates some issues is putting it mildly. The ESXi hosts complain about not being able to “synchronize”, which is the first clue you get about the issue. When I tried to manually set the date through the VI-Client, I got a bunch of errors on anything I tried to do, and then the VI-Client froze. The only option I found was to get that NTP server back online.

Note: It may have been possible to fix it via PowerShell or CLI commands; however, I needed to get the NTP servers online anyway, and once that happened the ESXi servers re-synced and were able to respond.
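
Since the first clue here only shows up after the hosts have already lost their time, it may be worth checking from a management box that the NTP server is still answering. The sketch below is an assumption-laden illustration, not anything ESXi-specific: it sends a bare SNTP client request (version 3, mode 3) over UDP port 123 and checks for a plausible reply. The hostname ntp.example.com is a placeholder and the function names are made up.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request() -> bytes:
    """48-byte SNTP client request: LI=0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\0"


def transmit_time_unix(reply: bytes) -> int:
    """Extract the server transmit timestamp (whole seconds) as Unix time."""
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    seconds_since_1900 = struct.unpack("!I", reply[40:44])[0]
    return seconds_since_1900 - NTP_EPOCH_OFFSET


def ntp_server_answers(host: str, timeout: float = 2.0) -> bool:
    """True if the server returns a plausible reply within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_sntp_request(), (host, 123))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts, DNS failures and unreachable hosts.
            return False
    return len(reply) >= 48 and transmit_time_unix(reply) > 0


if __name__ == "__main__":
    # "ntp.example.com" is a placeholder for the server your hosts point at.
    print(ntp_server_answers("ntp.example.com"))
```

Run on a schedule, something like this would flag a dead NTP server before the hosts start complaining.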