UCS Build Good Practices

So I have done more UCS builds than I can count, and between my colleagues and me, we’ve done hundreds of them by now. So we put together a “UCS Best Practice Guide”. I am sharing it here because I feel this information should be made public. I have added “Why?” to the line items, since many customers want to know why things are done. Again, this isn’t the end-all be-all, just what we have found works for many customers and makes a good starting point.

Overview

This guide is intended for UCS deployments.  It details the best practices defined both by Cisco and by lessons learned in the field by Subject Matter Experts.  Following it will help ensure not only proper deployment but also consistency.

This document follows the basic installation order and process.

This document is written as a guideline for when existing client practices aren’t in place, or when recommendations are requested.  Client requirements, wishes and needs should always supersede anything written here.

 

Physical Equipment

This section lists the best practices for the physical equipment and for racking and stacking.

Rack & Stack

  • The Fabric Interconnects should ideally be located at the top of the racks.
    • Placing the two FIs in two separate racks is preferred, as it provides additional failure domains.
    • Cisco also supports a “Center of Rack” design.
  • Airflow for both the 61xx & 62xx FIs is front to back.
    • Air enters through the fan trays and power supplies and exits on the port side.
  • Airflow for the Chassis is also Front to Back.
  • The FIs weigh between 35 & 50 pounds each, so be sure to account for this.
  • The UCS chassis can weigh up to 300 pounds.
  • Ensure all equipment is using the proper rails and/or brackets.
  • Ensure equipment is secured using ALL of the screw holes!
  • Ensure what will become Fabric Interconnect A is on the top or left, and B is on the bottom or right.
    • Left vs. right is typically determined while looking at the front of the system.
      • The front is the side that contains the blades.
    • Exception: Please be sure to refer to customer requirements on numbering.
  • Chassis numbering should be bottom -> top & left -> right.

Cabling

  • All Server Ports should be cabled using the leftmost ports on the FI.
    • Why?
      • Since server ports are typically what gets added, the next chassis can still be cabled in the correct numerical order.
  • All Uplink Ports should be cabled using the rightmost available ports on the FI.
    • Why?
      • Again, this leaves spare ports free for the server ports that are typically added later.
  • FEX/IOMs should be cabled from top to bottom.
  • Use the shortest possible cables, while ensuring no cables are too tight.
  • Cables should be neatly dressed and kept clear of the intake and exhaust areas; otherwise airflow becomes an issue.

UCS

 

UCSM Configuration

This section will provide the configuration best practices for all UCSM items.

Overall

  • Do not put any pools, policies or templates under the root organization; create a sub-org and place everything under that.
    • Why?
      • This allows additional sub-orgs to be created later that won’t pull from these items in the root org.
  • Delete all default pools that can be deleted.
    • Why?
      • This keeps the error count down and ensures that the correct pools/policies are being created and used.
  • Ensure all names are meaningful and descriptive
  • Use the “Description” field when possible for more information
  • All pools should be set to Sequential.
    • Why?
      • The default setting uses an algorithm that appears random when the pool is bigger than 32 objects.
  • Engineers must have patience while working with UCS; you WILL break it otherwise!!
    • Why?
      • UCSM can have a substantial delay between clicking and reporting back on tasks; if you click too quickly, tasks can get interrupted or queued and cause major issues.

 

Equipment Tab

Policies

  • The Chassis Discovery Policy should be set to the minimum number of IOM-to-FI links.
    • Why?
      • This will prevent issues down the road related to chassis discovery.
    • You MUST acknowledge each chassis after all links are up (if using the above method).
  • Link Grouping Preference should be set to Port-Channel.
  • Power Policy should be set to Grid.

Fabric Interconnects

  • Fabric Interconnects should be placed into End-Host mode (the default) whenever possible.
  • When setting Unified Ports, only the minimum necessary number of ports should be converted to Fibre Channel ports.
    • Why?
      • This prevents wasting ports.
      • This should be done right away, as a reboot is necessary.
      • Keep in mind any future expansion the customer may discuss.
  • Primary Fabric Interconnect should be “A” and subordinate should be “B”.
  • Server Ports should be enabled one chassis at a time.
    • Why?
      • This will order the chassis in the correct physical order.
    • If chassis are mis-numbered, they can be decommissioned and renumbered.
  • Uplink ports can be enabled in bulk.
  • Set any unused ports to “Unconfigured”.
    • Why?
      • This is for both security and licensing purposes.

Chassis

  • Resolve any “Fabric Conn” problems before continuing configuration.
    • Why?
      • The remediation steps are disruptive to the environment.
    • Wait at least 10 minutes after the chassis appears in UCSM before treating it as a problem.
    • If the problem persists, perform a chassis “Acknowledge” and wait for it to clear.
    • Chassis should not be “acknowledged” until all cables have been connected.
      • Why?
        • If this is done earlier, there is potential for issues with the IOMs.
  • Ensure that all blades and IOMs show no errors.

Firmware

  • Update to the latest recommended firmware.
    • Why?
      • This prevents issues during configuration.
    • The firmware release should be at least 60 days old, unless dictated by the customer or TAC.
  • The engineer MUST read the Cisco Release Notes and understand the upgrade process described.
    • Why?
      • These procedures can vary between versions.
      • In addition, many “do not update if” caveats are listed.
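The 60-day guideline is easy to check mechanically. The Python sketch below is a hypothetical helper, not a Cisco tool: you supply the release date from the Cisco release notes yourself, and it reports whether that release meets the minimum age and on which date it first will.

```python
from datetime import date, timedelta

# Hypothetical helper, not a Cisco tool.
# Minimum age, in days, before a firmware release is deployed
# (the 60-day guideline; customer or TAC direction overrides it).
MIN_RELEASE_AGE_DAYS = 60


def earliest_deploy_date(release_date: date,
                         min_age_days: int = MIN_RELEASE_AGE_DAYS) -> date:
    """First date on which the release meets the minimum-age guideline."""
    return release_date + timedelta(days=min_age_days)


def release_is_old_enough(release_date: date, today: date,
                          min_age_days: int = MIN_RELEASE_AGE_DAYS) -> bool:
    """True if the release is at least min_age_days old on 'today'."""
    return today >= earliest_deploy_date(release_date, min_age_days)


if __name__ == "__main__":
    # Hypothetical release date, for illustration only.
    released = date(2014, 3, 1)
    print(earliest_deploy_date(released))                      # 2014-04-30
    print(release_is_old_enough(released, date(2014, 4, 15)))  # False
```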

 

Admin Tab

Communication Management

  • Ensure that your Management Interfaces are specified correctly, including the domain name.
  • If the management subnet’s gateway is not pingable, set the Management Interfaces Monitoring Policy to “MII Status”.
    • Why?
      • If this is not set, the FIs will assume the management interface is down and constantly attempt failover.

Time Zone Management

  • Be sure to set the time zone if it is not UTC.
  • Be sure to specify NTP Servers.
    • Note: This does not set the time on blades themselves.

License Management

  • Be sure to check that no “Grace Period” licenses are in use.
    • Why?
      • This will prevent licensing errors on day 121.
  • Download and apply any licenses that were purchased.
    • Depending on the SKU, they often come pre-loaded.
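To keep an eye on the deadline, the sketch below counts down to day 121. It assumes a 120-day grace period (implied by the “day 121” note) and a start date you record yourself when a port begins using a grace-period license; the function names are hypothetical and not part of UCSM.

```python
from datetime import date, timedelta

# Hypothetical helper. Assumption: grace-period licenses last 120 days, so
# errors begin on day 121 (counting the day the grace period started as day 1).
GRACE_PERIOD_DAYS = 120


def first_error_day(grace_start: date) -> date:
    """Calendar date of day 121 of the grace period."""
    return grace_start + timedelta(days=GRACE_PERIOD_DAYS)


def days_until_errors(grace_start: date, today: date) -> int:
    """Days left before licensing errors appear (0 once they have started)."""
    return max(0, (first_error_day(grace_start) - today).days)


if __name__ == "__main__":
    start = date(2014, 5, 1)  # hypothetical grace-period start date
    print(first_error_day(start))                      # 2014-08-29
    print(days_until_errors(start, date(2014, 8, 1)))  # 28
```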

User Management

  • The engineer must fully understand RBAC integration processes.
    • Why?
      • The process can be a bit complicated and involves integration with production customer systems.

LAN Tab

LAN Cloud

  • When creating Port-Channels, match the ID to the Port-Channel ID of the uplinks on the upstream switch.
    • Why?
      • This is to help simplify troubleshooting and keep items consistent.
  • Set the QoS System Class to match the QoS configuration of the upstream Nexus switches.
    • It is highly recommended to enable jumbo frames on the Best Effort and other system classes.  Many traffic types take advantage of jumbo frames: vMotion, NFS, iSCSI, Oracle RAC, etc.
    • If jumbo MTU is not set here, setting jumbo MTU on the vNICs will not do anything.
      • An MTU mismatch also makes future troubleshooting very difficult: a mismatch at L2 results in silently dropped frames, whereas a mismatch at L3 results in fragmentation.
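One way to confirm jumbo frames end to end is a don’t-fragment ping sized to the full MTU. The ICMP payload is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header, so a 9000 MTU becomes 8972 (on ESXi, `vmkping -d -s 8972 <target>`). The hypothetical sketch below computes that size and models the L2 behavior described above, where an oversized packet is dropped rather than fragmented; the hop MTUs are made-up example values.

```python
from typing import Sequence

# Hypothetical helpers for planning jumbo-frame tests.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def l2_path_delivers(packet_size: int, hop_mtus: Sequence[int]) -> bool:
    """At L2 nothing fragments: a packet larger than any hop's MTU is dropped."""
    return all(packet_size <= mtu for mtu in hop_mtus)


if __name__ == "__main__":
    print(df_ping_payload(9000))  # 8972, i.e. vmkping -d -s 8972 <target>
    # Hypothetical path: vNIC at 9000, FI system class left at 1500, Nexus at 9216.
    print(l2_path_delivers(9000, [9000, 1500, 9216]))  # False: frames silently dropped
    print(l2_path_delivers(9000, [9000, 9216, 9216]))  # True
```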

Policies

  • Create a Network Control Policy that enables CDP and sets MAC Register Mode to “All Host Vlans”.
    • Why?
      • This allows ESXi to see the CDP information.
      • This also allows MACs to be registered on all host VLANs, not just the native VLAN.
        • This is especially useful when the customer is not utilizing the native VLAN on the trunks, or when the native VLAN is locked down.
    • The exception to this rule:
      • Set this to “Only Native Vlan” if the customer has a large number of VLANs defined (200+, typically only seen in the Service Provider space).
  • Create QoS Policies that match up with all the enabled classes in the QoS System Class.
    • Leave Host Control set to “None”.
      • The exception to this rule:
        • Host Control should be set to “Full” if CoS is tagged at the host level (typically done with a Nexus 1000V or a vSphere 5.5 DVS).

IP Pools

  • Create an ext-mgmt IP pool that is at least as large as the maximum number of blades possible in the domain, if possible (or at least account for future growth).
    • Why?
      • This avoids having to split the pool into multiple ranges later on.
    • Ensure the pool is in the same subnet as the Fabric Interconnects’ management ports.
      • Why?
        • The out-of-band management for the blades MUST reside on the same subnet as the Fabric Interconnects’ MGMT ports, since it uses them for external connectivity.
    • Taking into account the number of ports on the FI and the number of links per chassis, determine the maximum number of chassis possible.
      • Why?
        • This ensures the customer creates a pool ahead of time that covers the maximum number of servers supported.  If they don’t plan on growing that large, trim it back as necessary.
      • Fabric Interconnects have a maximum possible chassis count of 20.
    • Multiply that number by 8 (the maximum number of blades per chassis).
      • For example, using a 6248 with 4 links per chassis and 4 uplink ports total, the math is:
        • (48 total ports - 4 uplink ports) / (4 links per chassis) = 11 chassis * 8 blades per chassis = 88 total possible blades per UCS domain.
        • This means you’ll need at least 91 IPs (88 blades + 3 FI IPs: FI-A, FI-B and the cluster IP) in the management subnet.
    • It is sometimes preferred to double that number in case Service Profile IPs will be used; however, this is not always necessary or possible.
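The sizing math above can be captured in a small helper. This is a hypothetical sketch that follows the worked example: 8 blades per chassis, the 20-chassis domain limit, and 3 extra addresses for FI-A, FI-B and the cluster IP. It does not account for ports converted to Fibre Channel; subtract those from the port count yourself.

```python
from typing import NamedTuple

# Hypothetical sizing helper based on the worked example.
BLADES_PER_CHASSIS = 8
MAX_CHASSIS_PER_DOMAIN = 20
FI_MGMT_IPS = 3  # FI-A, FI-B and the cluster IP


class PoolSizing(NamedTuple):
    chassis: int
    blades: int
    ips_needed: int


def ext_mgmt_sizing(fi_ports: int, uplink_ports: int,
                    links_per_chassis: int) -> PoolSizing:
    """Maximum chassis/blades for one domain and the management IPs they need."""
    server_ports = fi_ports - uplink_ports
    # Whole chassis only, capped at the per-domain limit.
    chassis = min(server_ports // links_per_chassis, MAX_CHASSIS_PER_DOMAIN)
    blades = chassis * BLADES_PER_CHASSIS
    return PoolSizing(chassis, blades, blades + FI_MGMT_IPS)


if __name__ == "__main__":
    # The 6248 example: 48 ports, 4 uplinks, 4 links per chassis.
    print(ext_mgmt_sizing(48, 4, 4))
    # PoolSizing(chassis=11, blades=88, ips_needed=91)
```

The cap matters on larger FIs: a 96-port FI with 2 links per chassis could physically cable 46 chassis, but the domain still tops out at 20 chassis (160 blades, 163 IPs).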

MAC Pools

  • Create MAC Pools with the following convention in mind:
    •  This is only a suggestion; follow customer needs. Larger environments will be limited by the 255-host limit, so please plan accordingly.
    •  This convention requires a MAC pool for each vNIC type. This increases the initial setup and ongoing management, but many customers like it because it makes it easier to segment traffic for security, QoS & monitoring.

The MAC pool value convention is used to provide a contiguous range of MAC address values.  The MAC address consists of 12 hex digits and is built using the following table:

Section 1 | Section 2 | Section 3 | Section 4 | Section 5 | Section 6
00:25:B5  | X         | Y         | Z         | A/B       | XX

 

  • Section 1 identifies the Cisco OUI
  • Section 2 is the site code
    • For example, Las Vegas is 1 & Chicago is 2, etc.
  • Section 3 is the domain code
    • This is per UCS domain and increments; e.g., the 2nd UCS domain at a given site would be 2.
  • Section 4 identifies the purpose of the NIC
    • 1 is for Mgmt, 2 is for vMotion, 3 is for Storage Traffic, 4 is for VM Data Traffic
  • Section 5 identifies Fabric A or B (A is shown)
  • Section 6 is the assigned GIDs (allows for 255 unique GIDs)
    • We start at 01 so that the server number matches the ID.
  • The Storage vNIC on Fabric A of the 3rd server in the 2nd UCS domain in Chicago would therefore get a MAC address of:
    • 00:25:B5:22:3A:03

 

  • We recommend starting the pools at 01 instead of 00.
    • Why?
      • This makes addresses more “human readable” and less prone to errors.
  • We recommend creating a MAC pool for each NIC type, for each fabric.
  • The maximum pool size with this convention is 255 if starting at 01.
    • This should be enough, since the maximum number of blades in a domain is 160 (20 chassis x 8 blades).
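The MAC convention is mechanical enough to script, which helps avoid typos when defining pool blocks. The sketch below is a hypothetical helper: it assumes single-hex-digit site, domain and purpose codes and a pool that starts at 01, and it returns the first and last address of a block so you can double-check what you enter in UCSM.

```python
from typing import Tuple

# Hypothetical helper implementing the MAC convention.
CISCO_OUI = "00:25:B5"
# Section 4 purpose codes from the convention.
PURPOSE_CODES = {"mgmt": 1, "vmotion": 2, "storage": 3, "vm-data": 4}


def _hex_digit(value: int, name: str) -> str:
    """Render a 0-15 code as one uppercase hex digit."""
    if not 0 <= value <= 0xF:
        raise ValueError(f"{name} must fit in one hex digit (0-F), got {value}")
    return f"{value:X}"


def build_mac(site: int, domain: int, purpose: str, fabric: str,
              server_id: int) -> str:
    """Build 00:25:B5:<site><domain>:<purpose><fabric>:<id>."""
    if fabric not in ("A", "B"):
        raise ValueError("fabric must be 'A' or 'B'")
    if not 1 <= server_id <= 0xFF:
        raise ValueError("server_id must be 1-255 (01-FF)")
    return (f"{CISCO_OUI}:{_hex_digit(site, 'site')}{_hex_digit(domain, 'domain')}:"
            f"{_hex_digit(PURPOSE_CODES[purpose], 'purpose')}{fabric}:{server_id:02X}")


def mac_pool_block(site: int, domain: int, purpose: str, fabric: str,
                   size: int) -> Tuple[str, str]:
    """First and last address of a block that starts at 01 (size up to 255)."""
    return (build_mac(site, domain, purpose, fabric, 1),
            build_mac(site, domain, purpose, fabric, size))


if __name__ == "__main__":
    # Storage vNIC, Fabric A, 3rd server, 2nd UCS domain in Chicago (site 2).
    print(build_mac(2, 2, "storage", "A", 3))  # 00:25:B5:22:3A:03
    print(mac_pool_block(2, 2, "storage", "A", 160))
    # ('00:25:B5:22:3A:01', '00:25:B5:22:3A:A0')
```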

vNIC Templates

  • The name should describe its function.
  • You should NOT enable failover in the following situations:
    • When any multipathing-aware OS is installed:
      • ESXi
      • Certain newer Linux OSs
      • Windows 2012 or later
      • Windows 2008 if Cisco teaming drivers will be installed
  • You should enable failover in the following situations:
    • Windows 2008 and earlier with no teaming drivers installed
    • Older Linux OSs
    • iSCSI boot NICs
  • Create the template as an Updating Template.
    • Why?
      • This will make any changes done to the template propagate to all attached items.
      • Understand the implications, though!!
  • Set the appropriate MTU, MAC Pool, QoS & Network Control Policy.

 

SAN Tab

SAN Cloud

  • If using VSANs, be sure to create them.
  • Ensure each VSAN’s FCoE VLAN is in a high-enough range.
    • Why?
      • This ensures it is in a range that will not later be used by networking equipment, while making it human-readable to know what the VLAN is used for.
    • It is recommended to add 2000 or 3000 to the VSAN ID number.
    • So VSAN 2 would have an FCoE VLAN of 2002 or 3002.
      • Ensure the VLAN # is unique and reserved in the environment!
  • Do NOT create Common/Global VSANs; they should be specific to each Fabric Interconnect.
    • Why?
      • Each FI should be treated as a separate SAN switch.  This complies with typical SAN switch design.
      • This also prevents issues with FCoE where you cannot have the same VSAN on both Fabric Interconnects.
  • Enable FC Zoning for that VSAN ONLY if there is no upstream SAN switch.
    • Why?
      • This prevents the FI from creating zonesets and trying to do an additional layer of zoning that will cause issues.
      • In addition, the upstream switches typically do a much better job of handling zoning than the FIs can.
  • Ensure that your FC Uplink Interfaces are in the correct VSAN.
    •  By default, they will be in VSAN 1.
  • If using FC port-channels, ensure the ID matches the port-channel ID on the upstream SAN switch.
    • Why?
      • This helps with consistency as well as future troubleshooting.
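The VSAN-to-FCoE-VLAN rule can be sketched as a small check. This is a hypothetical helper: the set of VLANs already in use or reserved in the environment has to come from the customer’s network team (including any VLANs the FIs themselves reserve), and the 1-4094 bound is just the 802.1Q VLAN ID range, not a UCS-specific limit.

```python
from typing import Iterable

# Hypothetical helper; the two offsets are the ones suggested in the guide.
ALLOWED_OFFSETS = (2000, 3000)


def fcoe_vlan_for(vsan_id: int, offset: int = 3000,
                  vlans_in_use: Iterable[int] = ()) -> int:
    """FCoE VLAN for a VSAN: VSAN ID + 2000 or 3000, checked for conflicts."""
    if offset not in ALLOWED_OFFSETS:
        raise ValueError(f"offset should be one of {ALLOWED_OFFSETS}")
    vlan = vsan_id + offset
    if not 1 <= vlan <= 4094:  # 802.1Q VLAN ID range
        raise ValueError(f"VSAN {vsan_id} + {offset} = {vlan} is not a valid VLAN ID")
    if vlan in set(vlans_in_use):
        raise ValueError(f"VLAN {vlan} is already in use or reserved")
    return vlan


if __name__ == "__main__":
    # Hypothetical VLANs already in use in the customer's network.
    in_use = {10, 20, 3002}
    print(fcoe_vlan_for(2, 2000, in_use))  # 2002
    try:
        fcoe_vlan_for(2, 3000, in_use)
    except ValueError as err:
        print(err)  # VLAN 3002 is already in use or reserved
```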

Policies

  • Create a Storage Connectivity Policy if doing Fabric Interconnect FC Zoning.
    • Use ONLY “Single Initiator Single Target”.
      • Why?
        • This is a best practice that is recommended by almost every storage vendor on the market today.

Pools

  • Create a WWNN Pool with the following convention:
    • This is only a suggestion, follow customer needs. Larger environments will be limited by the 255 host limit.  Please plan accordingly.
    • You MUST have a zero in the place that the WWPN’s A or B identifier will go.

The WWNN pool value convention is used to provide a contiguous range of WWNN address values.  The WWNN address consists of 16 hex digits and is built using the following table:

Section 1 | Section 2 | Section 3 | Section 4 | Section 5 | Section 6
20:00     | 00:25:B5  | X         | Y         | 00        | XX

 

  • Section 1 identifies the ID as an initiator
  • Section 2 is the Cisco OUI
  • Section 3 is the site code
    • For example, Las Vegas is 1 & Chicago is 2, etc.
  • Section 4 is the domain code
    • This is per UCS domain and increments; e.g., the 2nd UCS domain at a given site would be 2.
  • Section 5 is 00 for a WWNN
  • Section 6 is the assigned GIDs (allows for 255 unique GIDs)
    • We start at 01 so that the server number matches the ID.
  • The 3rd server in the 2nd UCS domain in Chicago would therefore get a WWNN of:
    • 20:00:00:25:B5:22:00:03
  • Create WWPN Pools with the following convention:
    • This is only a suggestion, follow customer needs. Larger environments will be limited by the 255 host limit.  Please plan accordingly.
    • You MUST place the A or B identifier in the place that the WWNN has a 0 digit in it.

The WWPN pool value convention is used to provide a contiguous range of WWPN address values.  The WWPN address consists of 16 hex digits and is built using the following table:

Section 1 | Section 2 | Section 3 | Section 4 | Section 5 | Section 6
20:00     | 00:25:B5  | X         | Y         | 0A/0B     | XX

 

  • Section 1 identifies the ID as an initiator
  • Section 2 is the Cisco OUI
  • Section 3 is the site code
    • For example, Las Vegas is 1 & Chicago is 2, etc.
  • Section 4 is the domain code
    • This is per UCS domain and increments; e.g., the 2nd UCS domain at a given site would be 2.
  • Section 5 is either 0A or 0B depending on the fabric
  • Section 6 is the assigned GIDs (allows for 255 unique GIDs)
    • We start at 01 so that the server number matches the ID.
  • The 3rd server in the 2nd UCS domain in Chicago would therefore get an A-fabric WWPN of:
    • 20:00:00:25:B5:22:0A:03
  • Create IQN Pools with the following prefix:
    • iqn.2014-05.cisco.com:25B5
      • Why?
        • Certain storage vendors will not work correctly without this type of formatted IQN.
        • This exact prefix identifies it as a Cisco device, and 25B5 is part of the Cisco registered OUI.
    • The suffix does not matter.
      • However it’s recommended to identify it with the future hostname or other descriptive name and number identifier.
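The WWNN and WWPN conventions differ only in Section 5, so one helper can build both. The sketch below is hypothetical and assumes single-hex-digit site and domain codes and IDs from 01 to FF; passing no fabric gives the WWNN (00), passing “A” or “B” gives the matching WWPN (0A/0B).

```python
from typing import Optional

# Hypothetical helper implementing the WWNN/WWPN convention.
WWN_PREFIX = "20:00"
CISCO_OUI = "00:25:B5"


def build_wwn(site: int, domain: int, server_id: int,
              fabric: Optional[str] = None) -> str:
    """WWNN when fabric is None (Section 5 = 00), WWPN for 'A'/'B' (0A/0B)."""
    if not (0 <= site <= 0xF and 0 <= domain <= 0xF):
        raise ValueError("site and domain codes must each fit in one hex digit")
    if not 1 <= server_id <= 0xFF:
        raise ValueError("server_id must be 1-255 (01-FF)")
    if fabric not in (None, "A", "B"):
        raise ValueError("fabric must be None, 'A' or 'B'")
    section5 = "00" if fabric is None else f"0{fabric}"
    return f"{WWN_PREFIX}:{CISCO_OUI}:{site:X}{domain:X}:{section5}:{server_id:02X}"


if __name__ == "__main__":
    # 3rd server in the 2nd UCS domain in Chicago (site 2).
    print(build_wwn(2, 2, 3))       # 20:00:00:25:B5:22:00:03  (WWNN)
    print(build_wwn(2, 2, 3, "A"))  # 20:00:00:25:B5:22:0A:03  (WWPN, fabric A)
```

Because the WWNN always has 00 where the WWPN carries 0A or 0B, the node and port names for the same server can never collide.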

HBA Templates

  • Name should be simple if using only 2 HBAs.
    • Why?
      • This will keep it easy to understand and use.
      • Either way the name should make sense.
    • HBA-A & HBA-B is a typically used option.
  • Ensure the VSAN selected is correct.
  • Select Updating Template.
    • Why?
      • This ensures any changes to the template are propagated to all attached items.
      • Ensure the implications of this are understood.
  • Ensure the QoS Policy is set to the previously create FC Policy.

Server Tab

Policies

  • Create a BIOS policy that sets the following:
    • Why?
      • These items are typically requested by customers and recommended by the industry.
    • Set Resume AC on Power Loss to “Last-State”.
    • Set DDR Mode to “Performance-Mode”.
    • Set Boot Option Retry to “Enabled”.
    • Set Quiet Boot to “Disabled”.
  • When creating Boot Policy ensure “Enforce vNIC/vHBA” is checked.
    • Why?
      • This ensures the server boots from the exact vNIC/vHBA you intended.
    • This means you MUST enter the names exactly as they will be created in the Service Profile Templates!!
      • Why?
        • If you don’t, you will get an error relating to the boot policy.
    • Ensure that you always have CD/DVD as the first boot device.
      • Why?
        • This allows the use of ISO media even if there is an OS on the blade.
        • Otherwise the admin would have to try to catch the boot selection keypress.
    • Unless the operating system requires UEFI, use Legacy boot mode.
      • Why?
        • If UEFI mode is selected, many operating systems will not boot.
    • If doing SAN boot, odd-numbered servers should boot off SAN Head A and even-numbered servers off Head B.
      • Why?
        • This balances the blades across the heads and reduces the impact of a boot storm.
      • Be sure to set both the A & B HBAs, with target WWPNs from both storage heads on their respective fabrics.
      • The drawback is that this requires multiple Service Profile Templates, so discuss it with the customer.
  • Create a Host Firmware Package and set it to the same Firmware revision as the FIs.
  • Create a Maintenance Policy with a setting of “User Ack”.
    • Why?
      • This prevents any updates from rebooting all of the UCS blades at once.
  • Create a Power Control Policy with a setting of “No Cap”, unless required by customer.
    • Why?
      • This is used because the majority of customers do not set up or use the power capping features.
  • Create a Scrub Policy with all Settings to No, unless required by Customer.
    • Policy does not actually do a data wipe.
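The odd/even SAN boot rule can be written out per server. The sketch below is one reading of the SAN boot bullets, under stated assumptions: each HBA’s primary target is a port on the server’s preferred head and its secondary target is a port on the other head, both on that HBA’s own fabric. The HBA-A/HBA-B names follow the HBA template suggestion, and the target strings are placeholders for the real array WWPNs.

```python
from typing import Dict


def san_boot_layout(server_number: int) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: boot targets per HBA, odd servers prefer Head A."""
    if server_number < 1:
        raise ValueError("server numbers start at 1")
    preferred = "A" if server_number % 2 == 1 else "B"
    other = "B" if preferred == "A" else "A"
    layout = {}
    for fabric in ("A", "B"):
        # Each HBA only sees targets on its own fabric, one port per head.
        layout[f"HBA-{fabric}"] = {
            "primary": f"Head {preferred} port on Fabric {fabric}",
            "secondary": f"Head {other} port on Fabric {fabric}",
        }
    return layout


if __name__ == "__main__":
    for server in (3, 4):
        print(server, san_boot_layout(server)["HBA-A"])
    # 3 {'primary': 'Head A port on Fabric A', 'secondary': 'Head B port on Fabric A'}
    # 4 {'primary': 'Head B port on Fabric A', 'secondary': 'Head A port on Fabric A'}
```

Because the preferred head alternates, odd and even servers need different boot policies, which is where the extra Service Profile Templates come from.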

Pools

  • Create a UUID Suffix Pool with the following:
    • If you are following the MAC/WWNN/WWPN conventions specified in this guide, set the Prefix to “Other”.
      • Using the Site and Domain codes, enter them into the first two places in the Prefix.
      • Using the Site and Domain codes, enter them into the first two places in the Suffix.

Service Profile Templates

  • Ensure the Name is descriptive to its purpose as well as any unique features.
  • Select “Updating Template”.
    • Why?
      • This ensures any changes to the template are propagated to all attached items.
      • Ensure the implications of this are understood.
  • Use “Expert” mode for adding vNICs.
    • Why?
      • Choosing anything other than “Expert” will leave out important network info.
  • Name the vNICs the same thing as the vNIC Template if possible.
    • Why?
      • This keeps things consistent and easy to troubleshoot.
  • Ensure you select the appropriate Adapter Policy.
  • Use “Expert” mode for adding vHBAs.
    • Why?
      • Choosing anything other than “Expert” will leave out important info.
  • Name the vHBAs the same thing as the vHBA Template if possible.
    • Why?
      • This keeps things consistent and easy to troubleshoot.
  • Ensure you select the appropriate Adapter Policy.
  • Select “Let system perform placement” for vNIC/vHBA Placement.
  • Set the Firmware Management policy on the “Server Assignment” page.
  • Set the BIOS, Power Control & Scrub Policy on the last page.
  • Ensure the names of the profiles are descriptive.

Service Profiles

  • Ensure the names of the profiles are descriptive.
    • Ideally, use the server’s hostname as the profile name, if possible.
  • Put the hostname in the “User Label” section.
