Custom UCS/FlexPod Build Script

UPDATE: Working with some of our internal guys, it's come to my attention that parts of the script have broken with newer UCSM versions. I will be updating it to be more "adaptable"; in the meantime, use the script for ideas and feel free to borrow any code from it.


 

So I started developing a PowerShell script that grabs variables from an Excel sheet and creates a UCS build from them.

I'm at the point where the build actually works quite well now. I'm pretty proud of myself, since I'm NOT a deep PowerShell guy. This came about from looking at other UCS PowerShell scripts and a lot of tweaking and testing.

Anyway, this script will continue to grow and its functionality will expand. My end goal is to be able to do a base FlexPod build entirely by script, including UCS, Nexus switches, NetApp, and VMware.

It will take a lot of time, and I may never really use the script, but it's more of a pet project, both to see if I can do it and to grow my PowerShell skill set.

Here is the GitHub repo if you'd like to follow, assist, or just download it and play with it a bit.

https://github.com/cknic/UCS_Build
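
To give a feel for the approach, here is a minimal, hedged sketch of the idea rather than the actual UCS_Build script. It assumes the Cisco UCS PowerTool module is installed (the module name below is from the 1.x PowerTool and may differ on yours), and it reads a CSV export of the sheet instead of the Excel file itself; the file name and its Name/Id columns are made up for illustration.

# Sketch only, not the real UCS_Build script: create VLANs from a CSV export.
Import-Module CiscoUcsPS                                    # module name varies by PowerTool version

$cred   = Get-Credential                                    # UCSM credentials
$handle = Connect-Ucs -Name "10.0.0.10" -Credential $cred   # hypothetical UCSM VIP

# vlans.csv is a made-up example export of the Excel sheet, with columns Name,Id
$vlans = Import-Csv -Path .\vlans.csv

foreach ($v in $vlans) {
    # Create each VLAN under the LAN cloud if it does not already exist
    if (-not (Get-UcsVlan -Name $v.Name)) {
        Get-UcsLanCloud | Add-UcsVlan -Name $v.Name -Id $v.Id | Out-Null
    }
}

Disconnect-Ucs -Ucs $handle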

Updates

So I’ve been neglecting this site really badly.

I've been insanely busy with all kinds of things. So what's new:

Got my Citrix CCIA certification, woohoo!!  Now just waiting on some Citrix projects 🙂

 

In the meantime I'm doing more FlexPods. I am working on an updated document for my iSCSI boot. I found that the new UCS 2.0(2r), I believe, added IQN pools, so some of the screenshots changed; however, the process is basically the same.

I've also been writing a lot of internal documentation, so the thought of writing more when I'm "off the clock" hasn't been appealing. That's coming to a bit of an end, so I'm going to start writing more here again. I've come across some interesting things lately.

iSCSI Boot with ESXi 5.0 & UCS Blades

UPDATE: The issue was the NIC/HBA placement policy. The customer had set a policy that placed the HBAs first, then the iSCSI overlay NIC, then the remaining NICs. When we moved the iSCSI NIC to the bottom of the list, the ESXi 5.0 installer worked just fine. I'm not 100% sure why this fix works, but either way it works.

So at a recent customer's site I was trying to configure iSCSI booting of ESXi 5.0 on a UCS blade, a B230 M2. To make a long story short, it doesn't fully work and isn't officially supported by Cisco. In fact, NO blade models are supported by Cisco for ESXi 5.0 iSCSI boot. They claim a fix is on the way, and I will post an update when there is one.

Here is the exact issue, and my original thoughts, in case it helps anybody:

We got an error installing ESXi 5 to a NetApp LUN: "Expecting 2 bootbanks, found 0" at 90% of the ESXi install. The blade is a B230 M2.

The LUN is seen in the BIOS as well as by the ESXi 5 installer. I even verified the "Details" option, and all the information is correct.

Pressing Alt-F12 during the install and watching the logs more closely today, at ~90% the installer appears to unload a module that, judging by its name, is some sort of VMware Tools-type package. As SOON as it does that, the installer claims there is no IP address on the iSCSI NIC and begins to look for DHCP. The issue is that during the configuration of the service profile and the iSCSI NIC, at no point did we choose DHCP; we chose static. (We even tried Pooled.) Since there is no DHCP server in that subnet, it doesn't pick up an address and thus loses connectivity to the LUN.

So we rebooted the blade after the error, and ESXi 5 actually loads with no errors. The odd thing is that the root password we specified isn't set; it's blank, like ESXi 4.x was.

So an interesting question is: what's happening during that last 10% of the ESXi 5 installation? Since it boots cleanly, it almost seems like that phase does a sort of "sysprep" of the OS, i.e., all the configuration details. If that's the only issue, then it might technically be OK. However, I don't get the "warm and fuzzies." My concern would be that, maybe not today but down the road, some module that wasn't loaded correctly will come back to bite the client.

Also, what is happening in that last 10% that's different from ESXi 4.x? We were able to load 4.1 just fine with no errors.

Again, we called Cisco TAC and were told that iSCSI booting ESXi 5 wasn't supported on any blade. They do support 4.1, as well as Windows and a variety of Linux distros.

Configuring iSCSI boot on a FlexPod

Here is a nice document to follow to configure iSCSI booting for a FlexPod, i.e., UCS blades, a NetApp array, and ESXi.

UPDATE: This document has the fix I found for ESXi 5.0. It was tested on B230 M2s and seems to work every time.

This document will be updated as I get new information.

FlexPod iSCSI Boot-Fixed

Direct-Connected Fibre Channel Storage to UCS

So I've come across this recently. I have a client that is directly connecting the Fibre Channel links from their NetApp array to the UCS 6120s.

The issue that has been raised is that this is not technically supported. It seems that with the 1.4.1 firmware release, Cisco says you can absolutely do this. However, there is a caveat: it's supported by Cisco only as long as the storage vendor supports it as well.

The biggest problem is that NetApp did support it, but they no longer do. So it seems Cisco was left holding the ball when NetApp walked away.

So if you're running a NetApp array that is directly connected to your UCS, without an MDS or even a 5548 with the FC module, it's no longer technically supported and you may very well run into issues if you need vendor support.

For those not familiar with direct-connecting storage, I'll give a little bit of information on it, as well as some of my experiences with it and some tips on making it "work" with UCS.

Inside the 6120 there is effectively a very, very dumb MDS switch. There is no zoning; it is all one big zone. You do get VSANs, but obviously no inter-VSAN routing, no security, and no real way of even getting any initiator/target information for troubleshooting purposes.

In order to even use the functionality, you must change the Fibre Channel side of the fabric interconnect from "End-Host Mode" to "Switch Mode". This is EXTREMELY similar, in both method and functionality, to switching the Ethernet side to "Switch Mode".

You MUST also make sure to select the default VSAN that is created upon initial setup and enable "Default Zoning".

Interesting note: you MUST absolutely make sure the vHBA name in the boot policy is EXACTLY the same as the vHBA name in the vHBA template, or it won't boot (a quick PowerTool check for this is sketched below).
So again, in my opinion, if you can avoid direct-connecting your SAN storage to the 6120, please do, at least until UCS 2.0 comes out 🙂
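
If you have UCS PowerTool loaded, a quick check like the sketch below makes that name comparison easier than clicking through the GUI. The cmdlet names are assumed from the standard PowerTool set, so verify them against your PowerTool version.

# Hedged sketch: list the vHBA names so a mismatch is easy to spot.
Get-UcsVhbaTemplate | Select-Object Name                        # names on the vHBA templates
Get-UcsServiceProfile | Get-UcsVhba | Select-Object Dn, Name    # vHBA names on each service profile
# Compare these against the vHBA name referenced in the boot policy
# (Servers > Policies > Boot Policies in UCSM); they must match exactly.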

Enabling Jumbo Frames in a FlexPod Environment

Update: I have fixed the 5548 section; I was missing the last two lines.

This post will help you enable jumbo frames in a FlexPod environment. It will also work for just about any UCS-based environment; however, you will have to check how to enable jumbo frames on your particular storage array.

This post assumes a few things:

The environment is running Nexus 5548 switches.
You need to set up jumbo frames on the NetApp for NFS/CIFS shares.
The NetApp has VIF or multimode VIF connections for those NFS/CIFS shares.

Cisco UCS Configuration 

-Log in to UCSM and click on the LAN tab.
-Expand LAN, then LAN Cloud.
-Click on QoS System Class and change the "Best Effort" MTU to 9216.

NOTE: You need to type the number in; it's not one of the values that can be selected from the drop-down.

-Expand the Policies section on the LAN tab. Right-click on QoS Policies and select "Create QoS Policy". Call it "Jumbo-Frames" or something similar.
-On the vNIC template, or on the actual vNIC in the service profile, set the "QoS Policy" to the new policy.
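
The same UCS changes can also be scripted with UCS PowerTool. The sketch below is an assumption-heavy example rather than a tested procedure; the cmdlet names come from the standard PowerTool set and the UCSM address is made up, so verify both before leaning on it.

# Hedged PowerTool sketch of the two UCS steps above; verify the cmdlet
# names against your PowerTool version before running.
Connect-Ucs -Name "10.0.0.10" -Credential (Get-Credential)   # hypothetical UCSM VIP

# Set the Best Effort system class MTU to 9216
Get-UcsBestEffortQosClass | Set-UcsBestEffortQosClass -Mtu 9216 -Force

# Create a "Jumbo-Frames" QoS policy under the root org
Get-UcsOrg -Level root | Add-UcsQosPolicy -Name "Jumbo-Frames"

Disconnect-Ucs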

 ESX/ESXi Configuration

-Either SSH or console into the ESX host. If you're using ESXi, you'll need to ensure local or remote Tech Support Mode is enabled.
-We need to set the vSwitch that the jumbo-frame NICs will be on to allow jumbo frames.
          Type esxcfg-vswitch -l   to find the vSwitch we need to modify.
          Type esxcfg-vswitch -m 9000 vSwitch# (replace # with the actual number)
          Type esxcfg-vswitch -l   and you should now see the MTU set to 9000.

-We now need to set the actual VMkernel NICs.

          Type esxcfg-vmknic -l   to find the vmk's we need to modify.
          Type esxcfg-vmknic -m 9000 <portgroup name> (this is the port group that the vmk is part of)
          Type esxcfg-vmknic -l   to verify that the MTU is now 9000.

Note: If you're using dvSwitches, you can set the MTU size through the VI Client.
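
If you would rather stay out of Tech Support Mode, the same two changes can also be made from PowerCLI. This is just a sketch; the vCenter, host, vSwitch, and vmk names below are examples, not anything from a real environment.

# PowerCLI alternative to the esxcfg commands above; all names are examples.
Connect-VIServer -Server vcenter.lab.local

$vmhost = Get-VMHost -Name esx01.lab.local

# Raise the MTU on the standard vSwitch carrying the jumbo-frame port groups
Get-VirtualSwitch -VMHost $vmhost -Name vSwitch1 |
    Set-VirtualSwitch -Mtu 9000 -Confirm:$false

# Raise the MTU on the VMkernel NIC used for that traffic
Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel -Name vmk1 |
    Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false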

5548 Configuration 

-Log in to the 5548 switch on the "A" side.
-Type the following:

system jumbomtu 9216
policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
    multicast-optimize
exit
system qos
  service-policy type network-qos jumbo
exit
copy run start

-Repeat on the "B" side.

NetApp Configuration 

-Log in to the filer.
-Type ifconfig -a   to verify which interfaces need to run jumbo frames.
-Type ifconfig <VIF_NAME> mtusize 9000

NOTE: Make sure you enable jumbo frames not only on the VLAN VIF but also on the "root" VIF.

SSD drives don’t secure erase

So if you're in an industry that requires its drives to be securely erased, or even if you're just a security-minded person, I came across a very interesting study.

In essence it says that because there are some brains on the SSD itself, there is no way to be sure you've erased the disk. On a normal HDD, the erase program just writes a ton of 1s and 0s to the disk. The problem is that when the erasing program writes to what it thinks is block X, the SSD might actually write to block Y. This is because of the way SSDs spread out data (wear leveling) so that one particular area of the flash isn't overutilized.

This is quite an interesting article.

http://www.usenix.org/events/fast11/tech/full_papers/Wei.pdf

Creating a 24TB Storage server for $5,000

Background

So one of the recent tasks given to me was to create a "JBOD" server for various uses. It had to come in around $5k, provide at least 20TB usable, and be flexible yet fast. So I started doing some research into the subject. What I came up with was that there was no way we could purchase a pre-built solution of that size for the price point we wanted. The only remaining option was to build it myself using commodity hardware.

Parts

I had originally spec'd out this server with nicer hardware, or at least slightly nicer; however, due to corporate policy, I had to use CDW…yay. So some items are not ideal, such as the motherboard and power supply; I happened to like a Gigabyte board and a Corsair PSU. I will not list prices, as I'm sure by the time anybody reads this they won't be relevant.

ASUS M4A785-M motherboard,
AMD Phenom II X4 3.0GHz quad-core processor,
Crucial 4GB DDR2 kit (later I added an additional pair of 1GB RAM sticks I had lying around),
Antec 1000W Quattro power supply,
NORCO RPC-4020 4U rackmount server case,
Areca ARC-1280ML PCIe SATA II RAID controller card,
Seagate Barracuda 7.2k RPM 1.5TB 32MB cache SATA HDD (20 of them),
A few Molex "Y" cables and extenders,
Any NICs you would like to add,
1GB USB key (for the OpenFiler install)

Operating System

The first decision was what operating system to run on this server. After weighing my options, it came down to FreeNAS, OpenFiler, OpenSolaris with ZFS, or native Linux. Since there were time constraints on this project, I did not have the time to really learn OpenSolaris. I also thought that a full-blown Linux distribution would be a good choice, but it might not provide the performance I needed, and a "storage-optimized" specialized distribution seemed the better route. I ended up choosing OpenFiler, since the community at large seemed to feel that OpenFiler performed better and had much better support for iSCSI, which is something I wanted to test. I also decided that I wanted to maximize space on my RAID array, so I wanted to run the OpenFiler OS from a separate disk. The original plan was to use a spare 2.5″ HDD I had lying around; however, the drive turned out to be no good. So I opted to use a 2GB USB stick I had. The OS does not have any read/write-intensive tasks once it is loaded, so USB 2.0 speeds should not be an issue. The only time this may be a problem is if a lot of swapping is happening, and then there are other issues that need to be addressed first.

Installation and Configuration

The first step was to assemble the chassis. The NORCO case came pretty much ready to go; I am actually quite impressed with this case. It came with all the drive trays, the backplane for the drives, all the case fans, and more than enough hardware for any installation. The drives all came in OEM packaging and were secured into the hot-swap trays. The motherboard, memory, CPU, and power supply were all installed in the case. The next thing to go in was the Areca controller. This unfortunately took up my 16x PCI-E slot, as the card requires an 8x PCI-E slot. Thankfully many motherboards come with on-board video. (Make sure the one you choose does; there's no sense eating up a PCI slot for a video card.)

The next item was to run the Molex power cables. This case powers the drives off the backplane, so you need to plug in two Molex plugs per backplane "strip". If you have large hands, stop right now and enlist the help of somebody who is small-handed, as this is a really tight space. You may have to use some Y-cables and/or extenders depending on your power supply and how many Molex ends it has. (Note: Make sure that you try to balance out the power rails on your supply; one with multiple rails, as opposed to one large rail, seems to work better.)

Next are the SATA cables. These should be supplied with the Areca controller. While this is nowhere near as bad as the Molex cables, it still isn't fun. The controller I used had multi-lane SATA to 4-port SATA breakout cables, as well as a connector that would normally light up little LEDs for power/activity lights. This extra connector isn't necessary with the NORCO case, as the backplane actually handles the blinky lights.

At this point I would test out the system: make sure it at least POSTs and detects all the hardware correctly. Next I had to create the RAID arrays. Initially the design called for three RAID-5 arrays consisting of 7, 7, and 6 disks. (Note: With this controller you can choose to do the initialization now or in the background. "Now" took well over a weekend due to the size of the arrays.)

After that I inserted the USB stick and the OpenFiler install CD. Boot to the CD and follow all the prompts to install the OS. Make sure to install it to the USB stick and not to any of the arrays. From there, follow the OpenFiler documentation if you paid for it, or use the various forums or other resources to configure it. It is fairly straightforward and pretty easy to use.

Performance Results

After everything was configured and set up, I decided to do some simple tests to determine whether the array was performing well and things worked as they should. Since OpenFiler runs a variant of Linux (rPath), I decided a simple dd read/write test would at least give me some ballpark numbers.

I ran two tests on the local command line: "dd if=/dev/zero of=file bs=1024k count=20000" and "dd if=file of=/dev/null bs=1024k count=20000". This gave me sequential writes and reads totaling around 20GB each, which made sure I was not just working in cache.

The tests gave me read speeds of anywhere from 300-500 MB/s and write speeds from 200-400 MB/s; yes, that's bytes. I was quite impressed, especially since the test was only running against a single RAID group, which is at most 7 disks.

The JBOD has been running as primary storage for my ESX lab. It is hosting 12 ESX servers and running around 80-90 VMs over iSCSI. It is now starting to show poor performance numbers while the disks are being utilized fairly heavily.

Unfortunately, since this is a network-attached disk box, the major limitation has been the gigabit ports out of the back.

Lessons Learned

If I had to do this project over with what I know now, I would have done a few things differently.

1. Not settled on the hardware, and tried harder to get what I originally spec'd out.

This would have made my life much easier, as I found out that this ASUS motherboard didn't have the PCI-E slots necessary for the quad-port GigE NICs I wanted, as well as the iSCSI and FC HBAs.

2. I would have gone with OpenSolaris and ZFS or some other high-performing filesystem.

OpenFiler works quite well, but it has had some odd quirks, most notably with NFS dropping connections to ESX and not allowing them back in.

3. I would not have used iSCSI for the primary ESX Storage.

In all of our testing, we saw that NFS datastores outperformed iSCSI VMFS datastores.

4. 10Gb

1Gb pipes just weren't fast enough for what we were trying to do.