I attended the Cisco Plus Canada Roadshow in Calgary recently and sat in on a day of presentations related to Cisco's data center/cloud offerings. The sessions were quite good and I ended up taking quite a few notes. I thought I'd blog my notes in order to share what was presented.

The four sessions were:

  • Journey to the Cloud
  • Cisco UCS
  • Data Center Networking
  • Powering the Cloud

Journey to the Cloud

Presented by Ronnie Scott (Cisco)

According to Cisco's own measurements:

  • 77% of IT time spent keeping lights on (opex spend)
  • 23% spent delivering new capabilities (capex spend)

Drivers for adopting cloud:

  • Reduce IT cost
  • Simplify IT operations
  • Improve pace of delivery
  • Better align IT resources to business needs

Many keywords used when describing "cloud": ubiquitous, convenient, on-demand, shared pool, rapidly provisioned, minimal management.

  • These keywords are taken from the NIST definition of cloud computing (PDF)
  • If you're only running big servers with vSphere, you're not operating a cloud. You're operating a virtualized environment.
  • Virtualization is a stepping stone to cloud
  • To be a cloud, the environment must:
      • Be provisioned on-demand using self service
      • Have broad network access
      • Have pooled resources
      • Be able to rapidly expand
      • Be measurable

Virtualization is a stepping stone towards a cloud architecture. However the level of virtualization must be broad and complete. Everything must be virtualized:

  • Server/compute
  • Network
  • Storage

To run a real cloud, enterprises need to move to a capacity planning model and away from a per-project spend/resource allocation model. Set high watermarks (HWM) on the pool of resources and buy more/plan for more when an HWM is hit.

  • New applications/services being introduced as part of a project should consume resources from the existing pool of compute/storage/network. No (or minimal) need to purchase equipment that will only run this new app/service.
  • IT budgeting and spending models need to change. With a cloud architecture, capital outlay is necessary when capacity planning says so, not when the project plan for a new app says so.
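
To make the high watermark idea concrete, here is a minimal sketch in Python. The 80% threshold, the pool names and the pool sizes are all my own made-up values, not numbers from the presentation.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    name: str
    capacity: float  # total units in the pool (hypothetical units)
    used: float      # units currently allocated to apps/services

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def pools_over_watermark(pools, high_watermark=0.80):
    """Return the names of pools whose utilization has reached the HWM."""
    return [p.name for p in pools if p.utilization >= high_watermark]


# Hypothetical pools: only compute has crossed the 80% line.
pools = [
    ResourcePool("compute", capacity=1000, used=850),
    ResourcePool("storage", capacity=500, used=200),
    ResourcePool("network", capacity=400, used=300),
]
print(pools_over_watermark(pools))  # ['compute']
```

The point is that the purchase trigger is the pool crossing the line, not the arrival of a new project.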

Cloud delivery models:

  • SaaS/Application - Aimed at end users; provides applications at scale
  • PaaS/Platform - Aimed at developers; provides execution platform at scale
  • IaaS/Infrastructure - Aimed at system administrators; provides infrastructure at scale

IaaS/PaaS - Targeted at running applications with variable resource demand (VDI, DR, application test/dev).

SaaS - Delivers functional apps with low per user/transaction cost (Email, UC, web hosting)

Cisco has a new cloud model: virtual private cloud

  • Cloud services that simulate private cloud in public cloud infrastructure

Cisco sees a model for the future where cloud environments utilize a brokerage service with the intelligence to move workloads between public/private/community clouds. Each application would be tagged with criteria that specify the level of service it requires. The broker would use this information to dynamically place the workload into a certain cloud environment based on configured policy.
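
A rough sketch of how tag-based placement might look. This is purely illustrative: the tag names, the policy format and the "first matching rule wins" behaviour are my assumptions, not anything Cisco showed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    tags: frozenset  # criteria attached to the application


# Hypothetical policy: ordered (tag, target cloud) rules, first match wins.
POLICY = [
    ("crown-jewel", "private"),
    ("regulated", "community"),
    ("burstable", "public"),
]
DEFAULT_CLOUD = "private"  # assumption: untagged workloads stay in-house


def place(workload, policy=POLICY, default=DEFAULT_CLOUD):
    """Pick a cloud for the workload based on its tags and the policy."""
    for tag, cloud in policy:
        if tag in workload.tags:
            return cloud
    return default


erp = Workload("erp", frozenset({"crown-jewel", "burstable"}))
batch = Workload("nightly-batch", frozenset({"burstable"}))
print(place(erp), place(batch))  # private public
```

Ordering the rules means a "crown jewel" app never leaves the private cloud even if it is also tagged as burstable.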

Cisco's cloud strategy:

  • Build the base: core, "crown jewel" apps run in-house on enterprise private cloud
  • Rent the spike: workload spikes satisfied with public/community clouds

Elements of a private cloud

  • Self service portal: users order services
  • Service delivery automation: delivery of those ordered services is automated
  • Resource management: resource allocation to the ordered services is managed
  • Operations management: integration of the service with operations tools, help desk tools, SLA monitoring, etc
  • Life cycle management: cradle to grave. The ordered services need to be deprovisioned at some point in their lifetime too.

Architecture for private cloud

  • Resources (compute, storage, bandwidth)
  • Resource management (vCenter, UCS Manager (UCSM), BMC, etc)
  • Service management (vCloud Director, BMC)
  • Customer access, self provisioning tools (BMC, Cisco Intelligent Automation for Cloud (CIAC))

The service management layer needs to talk to the resource management layer in order to provision the physical resources from the pool. The elements in the resource management layer need APIs to make this possible. Cisco products like UCS and the Nexus line all have rich, complete APIs which make this automation possible.
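
Here is a small sketch of that layering. Everything in it is hypothetical: `ResourceManager`, `InMemoryPool` and `provision` are names I invented to show the shape of the interaction, and no real vCenter, UCSM or BMC API is being called.

```python
from typing import Protocol


class ResourceManager(Protocol):
    """What the service management layer expects from each resource manager."""
    def allocate(self, amount: int) -> bool: ...
    def release(self, amount: int) -> None: ...


class InMemoryPool:
    """Stand-in for a real resource manager sitting behind an API."""
    def __init__(self, free: int):
        self.free = free

    def allocate(self, amount: int) -> bool:
        if amount > self.free:
            return False
        self.free -= amount
        return True

    def release(self, amount: int) -> None:
        self.free += amount


def provision(order: dict, managers: dict) -> bool:
    """Allocate every resource in the order, or roll back and fail."""
    done = []
    for kind, amount in order.items():
        if managers[kind].allocate(amount):
            done.append((kind, amount))
        else:
            for k, a in done:  # undo partial allocations
                managers[k].release(a)
            return False
    return True


managers = {"compute": InMemoryPool(4), "storage": InMemoryPool(100)}
print(provision({"compute": 2, "storage": 50}, managers))  # True
print(provision({"compute": 2, "storage": 80}, managers))  # False
```

The second order fails on storage, so the compute it had already grabbed is handed back to the pool.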

Steps for moving to private cloud:

  • Consolidate
  • Virtualize
  • Automate
  • Self service provisioning

Cisco UCS

Presented by Willem van Schaik (Cisco)

Nexus 5000 virtual chassis

  • Nexus 5k acts like a supervisor
  • Nexus 2k fabric extender (FEX) acts like remote line card
  • Nexus 5k + 2k model was extended to Cisco UCS
  • FEX is only network piece installed in the chassis
  • Management of the system is done via the Fabric Interconnect (fancy Nexus 5000s running UCS Manager software on top of NX-OS)

10GbE benefits

  • More bandwidth is nice, but not the biggest win
  • Low latency is nice, but not the biggest win
  • Biggest win is consolidating multiple networks and traffic onto one cable
  • Builds on "virtualize everything" concept (virtualizing the network)
  • Wire once, use many
  • Or as Cisco says, wire for bandwidth not connectivity

Generic rackmount servers vs. generic blade servers:

  • Most rack server components (NIC ports, storage ports, etc) duplicated 1:1 on a blade
  • Same number of ports and connections to manage in a 16x blade enclosure as with 16 rackmount servers
  • Additionally, every blade enclosure is (at best) its own point of management (worst case, every single blade is its own point of management)
  • No opex savings with blades

With UCS:

  • All points of management moved to single spot: top of rack in the Fabric Interconnect
  • No mgmt point in back of the blade enclosure (only thing in the back is a dumb fabric extender/FEX)
  • UCS chassis are like a storage disk shelf. Plug and use, nothing to configure, no management points on the shelf/chassis
  • One point of management for N chassis/blades

UCS Manager (UCSM)

  • Management software for entire UCS environment
  • Runs on Fabric Interconnects
  • RBAC (Storage, server, network teams all use same console)
  • Full API for integrating with other tools. UCSM GUI written on top of the XML API
  • Other tools can use the API to do the same tasks as UCSM (eg, software from CA)
  • API allows management from any platform. Apps exist for iPhone/iPad, Android and Playbook
  • Central firmware mgmt for all components
  • Fabric Interconnect, Converged Network Adapters, FEX, etc
  • Firmware can be staged (redundant firmware banks) in hours and applied on a scheduled or off hours basis

Service profile

  • Wraps all the server parameters and policies into a template
  • Lives in UCSM not on the blade
  • Creates the personality of the server
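
To illustrate "lives in UCSM, not on the blade", here is a toy model. The field names are my own picks for the kind of identity a profile carries; this is not the UCSM object model or its API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceProfile:
    # Illustrative fields only; the real profile holds many more settings.
    name: str
    mac_address: str
    wwpn: str
    boot_order: tuple
    firmware_policy: str


@dataclass
class Blade:
    slot: str
    profile: Optional[ServiceProfile] = None  # a bare blade has no personality


def associate(profile: ServiceProfile, blade: Blade) -> None:
    if blade.profile is not None:
        raise ValueError(f"blade {blade.slot} already has a profile")
    blade.profile = profile


def move(profile: ServiceProfile, old: Blade, new: Blade) -> None:
    """Move the server's personality to different hardware."""
    if old.profile is not profile:
        raise ValueError("profile is not associated with the old blade")
    associate(profile, new)  # raises before we touch the old blade
    old.profile = None


web01 = ServiceProfile("web01", "00:25:B5:00:00:01", "20:00:00:25:B5:00:00:01",
                       ("san", "lan"), "fw-policy-a")
blade_a, blade_b = Blade("1/1"), Blade("1/2")
associate(web01, blade_a)
move(web01, blade_a, blade_b)
print(blade_a.profile is None, blade_b.profile.name)  # True web01
```

Because the MAC, WWPN and boot order travel with the profile, the replacement blade comes up looking like the same server.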

The new 6200 series Fabric Interconnect

  • Unified ports
  • Port can be configured for data center bridging or Fibre Channel
  • 6248 based on Nexus 5548
  • Expect to see 6296 based on Nexus 5596

UCS 2.0 increases bandwidth to blades

  • 8 port FEX (up from 4 ports in first version)
  • Port channel can now be created between the FEX and the upstream Fabric Interconnect
  • No more pinning a blade to a specific upstream port on the FEX
  • 1 blade can use up to 40Gb of bandwidth out of the chassis (limited by the bandwidth from the FEX to the blade; see the bullet point right below)
  • From FEX down to blade, increased lanes on the highway from 1x to 4x 10Gb lanes
  • Also in a port channel
  • 40Gb per port per blade
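
The bandwidth arithmetic above, worked through. I'm assuming every link is 10Gb and that all 8 FEX uplinks are bundled into one port channel; the variable names are mine.

```python
LANE_GBPS = 10  # each lane/port is 10Gb, per the notes above


def port_channel_gbps(links: int, link_gbps: int = LANE_GBPS) -> int:
    """Aggregate bandwidth of a port channel made of identical links."""
    return links * link_gbps


fex_to_fi = port_channel_gbps(8)        # 8 port FEX up to the Fabric Interconnect
fex_to_blade = port_channel_gbps(4)     # 4x 10Gb lanes from the FEX down to a blade
first_gen_blade = port_channel_gbps(1)  # single 10Gb lane in the first version

# A single blade is capped by the narrower of the two hops.
blade_max_out_of_chassis = min(fex_to_fi, fex_to_blade)
print(fex_to_fi, fex_to_blade, blade_max_out_of_chassis)  # 80 40 40
```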

VM-FEX: virtual NICs on the Palo card become remote ports on the Fabric Interconnect

  • FEX idea cascades from Fabric Interconnect to FEX and from FEX to Virtual Interface Card (Palo) on the blade
  • 802.1Qbh/VN-Tag (pre-standard) used to identify traffic to and from individual VMs
  • Fabric Interconnect creates a logical interface for each VM (based on the VN-Tag). Other end of that logical interface is the virtual NIC created on the Palo card.
  • All switching done in the FI, ALWAYS. Even traffic between two VMs on the same vSphere host must flow up to the FI and then back down
  • Allows policy, control and monitoring of all traffic
  • VMDirectPath mode gives best performance
  • Switching all done in ASICs
  • In this mode, vMotion is supported under ESXi 5

Data Center Networking

Presented by Ronnie Scott (Cisco)

Fibre Channel over Ethernet is standardized. Period. Competitors are spreading FUD if they say otherwise.

Fibre Channel encoding is inefficient

  • FC uses 8b/10b encoding
  • 4Gb FC != 4Gb throughput
  • 10Gb Ethernet uses a more efficient encoding scheme, so 4Gb of FCoE traffic is more efficient on the wire than native 4G FC
  • 16G FC will have more than double the real throughput of 8G FC because it adopts the same encoding used with 10GbE.
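
A quick calculation of the encoding overhead. The notes don't name the 10GbE scheme; I'm filling in 64b/66b, which is what 10GBASE-R Ethernet uses. Treating the nominal "4Gb" figure as the raw line rate is a simplification on my part.

```python
def encoding_efficiency(data_bits: int, line_bits: int) -> float:
    """Fraction of bits on the wire that carry actual data."""
    return data_bits / line_bits


FC_8B10B = encoding_efficiency(8, 10)     # 8 data bits sent as 10 line bits
ETH_64B66B = encoding_efficiency(64, 66)  # 64 data bits sent as 66 line bits


def payload_gbps(line_rate_gbps: float, efficiency: float) -> float:
    """Data throughput left over after encoding overhead."""
    return line_rate_gbps * efficiency


print(f"8b/10b:  {FC_8B10B:.1%} efficient")
print(f"64b/66b: {ETH_64B66B:.1%} efficient")
print(f"4 Gb line rate with 8b/10b carries {payload_gbps(4, FC_8B10B):.1f} Gb of data")
```

So 8b/10b gives up 20% of the wire to encoding, while 64b/66b gives up about 3%.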

Spanning Tree vs. Virtual Port Channel

  • 50% of network bandwidth is wasted in an STP network
  • STP is absolutely necessary though to eliminate loops
  • VPC allows all links to be forwarding
  • Two upstream switches "collude and lie" to southbound switch (VPC == "lies and deceit protocol")
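
Where the 50% figure comes from, assuming the common case of an access switch dual-homed with two 10Gb uplinks (my assumption; the function name is mine too):

```python
def forwarding_gbps(uplinks: int, link_gbps: float, blocked: int) -> float:
    """Usable uplink bandwidth once blocked links are taken out."""
    return (uplinks - blocked) * link_gbps


stp = forwarding_gbps(uplinks=2, link_gbps=10, blocked=1)  # STP blocks one link
vpc = forwarding_gbps(uplinks=2, link_gbps=10, blocked=0)  # VPC forwards on both

wasted_fraction = 1 - stp / vpc
print(stp, vpc, wasted_fraction)  # 10 20 0.5
```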

FabricPath

  • New method of transporting frames in the data center
  • No STP
  • All links active
  • Layer 2 adjacency between end devices over a routed, Layer 3 fabric
  • Spine and leaf design
  • High bisectional bandwidth
  • FP is based on TRILL, which was standardized before anyone had an implementation
  • According to Cisco, TRILL is missing some things that should have been in the standard and needs changes to add them
  • Likely to see a TRILL v2 at some point
  • Work already underway on next version of TRILL
  • FabricPath is enabled with one command per interface. No other configuration needed.
  • Don't be surprised if FabricPath is the default interface mode in the data center one day

Overlay Transport Virtualization - OTV

  • Layer 2 adjacency between end devices in different data centers across a Layer 3 data center interconnect
  • Manages traffic tromboning of FHRP VIP (FHRP becomes active in both DCs)
  • Prevents STP BPDUs from crossing the DC interconnect
  • Enables vMotion between data centers
  • vMotion between data centers has distance/latency limits!
  • Kudos to Ronnie for actually bringing this up. Most discussions on OTV throw around the idea of long distance vMotion as a cure-all for any DR problems you might have.
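
To put a rough number on the distance limit, here is a back-of-the-envelope calculation. It only counts propagation delay in fiber (light travels at roughly 200,000 km/s in glass) and ignores equipment and queuing delay. The 5 ms round-trip budget is a placeholder I picked, not a figure from the talk; check VMware's documentation for the limit that applies to your version.

```python
FIBER_KM_PER_MS = 200.0  # roughly 2/3 the speed of light, one way


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over a fiber path of the given length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def within_budget(distance_km: float, max_rtt_ms: float) -> bool:
    return propagation_rtt_ms(distance_km) <= max_rtt_ms


MAX_RTT_MS = 5.0  # placeholder budget, see the note above

for km in (100, 500, 1000):
    print(km, "km:", propagation_rtt_ms(km), "ms RTT, ok =", within_budget(km, MAX_RTT_MS))
```

Real fiber paths are rarely straight lines, so the usable distance is shorter than the map suggests.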

Location/ID Separation Protocol - LISP

  • Decouple location from device identification
  • IP addresses today do both: they define the network (location) of a device and its identification (how do I talk to that device)
  • By decoupling the two, a device can have its location changed without changing its ID (ie, without changing how clients identify and connect to the device)
  • Allows you to move workloads around in the cloud and make sure those services and devices can be reached no matter where they are
  • No load balancing, no DNS changes, no client side changes. Works all in the network.
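
A toy model of the decoupling. LISP calls the identity an EID (endpoint identifier) and the location an RLOC (routing locator); the `MappingSystem` class below is my own simplification of a mapping system, and the addresses are documentation-range examples.

```python
class MappingSystem:
    """Toy EID-to-RLOC map; real LISP maps prefixes and caches lookups."""

    def __init__(self):
        self._rloc_by_eid = {}

    def register(self, eid: str, rloc: str) -> None:
        """Record (or update) where an identity currently lives."""
        self._rloc_by_eid[eid] = rloc

    def lookup(self, eid: str) -> str:
        """Find the location to tunnel traffic to for this identity."""
        return self._rloc_by_eid[eid]


mapping = MappingSystem()
server_eid = "10.1.1.10"  # what clients connect to; never changes

mapping.register(server_eid, "198.51.100.1")  # workload lives in DC 1
before = mapping.lookup(server_eid)

mapping.register(server_eid, "203.0.113.1")   # workload moved to DC 2
after = mapping.lookup(server_eid)

print(server_eid, before, after)
```

Clients keep using 10.1.1.10 throughout; only the mapping changed, which is why no DNS or client side changes are needed.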

Powering the Cloud

Presented by Willem van Schaik (Cisco)

From above: one of the elements of cloud is automation. In order to build automation:

  • You must understand all processes needed to provision a server/app today
  • The provisioning involves people from all teams (network, server, storage, application support). Everyone must be involved in setting the automation rules and workflows.

Cisco tools:

  • newScale catalog: self service portal
  • Cisco Tidal orchestrator: automation piece

Cisco bought newScale and Tidal to get best of breed self service and orchestration tools.

Orchestration must be full life cycle. You need to deprovision the stuff too at some point! Don't forget.
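
One way to keep yourself honest about this is to model the service life cycle as a state machine and check that every state can eventually reach deprovisioning. The states and transitions below are my own invention for illustration; they are not how newScale or Tidal model it.

```python
from enum import Enum


class State(Enum):
    ORDERED = "ordered"
    PROVISIONED = "provisioned"
    RUNNING = "running"
    DEPROVISIONED = "deprovisioned"


# Hypothetical workflow: which states each state may move to.
TRANSITIONS = {
    State.ORDERED: {State.PROVISIONED},
    State.PROVISIONED: {State.RUNNING, State.DEPROVISIONED},
    State.RUNNING: {State.DEPROVISIONED},
    State.DEPROVISIONED: set(),
}


def advance(current: State, target: State) -> State:
    """Move to the target state, rejecting transitions the workflow lacks."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def can_reach(start: State, goal: State) -> bool:
    """Depth-first search over the transition table."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state == goal:
            return True
        if state not in seen:
            seen.add(state)
            stack.extend(TRANSITIONS[state])
    return False


# Every state should have a way out to DEPROVISIONED.
print(all(can_reach(s, State.DEPROVISIONED) for s in State))  # True
```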

Amazon cloud is cheap when compared to in-house infrastructure that is not being fully utilized (as is the case the majority of the time). Your private cloud is cheaper than Amazon if you're running at a very high level of utilization. If running at high utilization though, how do you handle spikes in load? Burst your spikes into the public cloud (ties back to "rent the spike" above). This requires rules though for which workloads can/should be moved into public cloud.
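
A sketch of the utilization argument with made-up numbers. The monthly cost, server count and public price are all hypothetical, and the model ignores everything except a fixed private cost versus a pay-per-hour public price.

```python
def private_cost_per_used_hour(monthly_cost: float, capacity_hours: float,
                               utilization: float) -> float:
    """Fixed private cost spread over the hours you actually use."""
    return monthly_cost / (capacity_hours * utilization)


def break_even_utilization(monthly_cost: float, capacity_hours: float,
                           public_price_per_hour: float) -> float:
    """Utilization at which private and public cost per used hour are equal."""
    return monthly_cost / (capacity_hours * public_price_per_hour)


# Hypothetical: 100 servers x 720 hours/month, $36,000/month all-in,
# versus a public cloud charging $1.00 per server-hour.
MONTHLY_COST = 36_000.0
CAPACITY_HOURS = 100 * 720
PUBLIC_PRICE = 1.00

print(break_even_utilization(MONTHLY_COST, CAPACITY_HOURS, PUBLIC_PRICE))
for u in (0.2, 0.9):
    print(u, round(private_cost_per_used_hour(MONTHLY_COST, CAPACITY_HOURS, u), 2))
```

With these numbers the private cloud only wins above 50% utilization: at 20% each used hour costs $2.50, at 90% it costs about $0.56.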

Using Cisco's own measurements, they flipped the 77% opex/23% capex ratio to 40% opex (keeping the lights on) and 60% capex (introducing new services) by moving to a full cloud architecture.

Buying a Vblock is like buying a car: there are limited options to choose from on the order sheet. You have to pick from the options the manufacturer has made available.

Flexpod is a set of reference designs (for Exchange, for Oracle, etc). You order all the parts yourself, which means you can customize the order, and you integrate the pieces on your own. Flexpod also contains benchmark information to help you with capacity planning.

Final thoughts: This stuff is not easy. Start small. Seek help from Cisco on the really hairy parts. If you can realize full cloud, the benefits are big in cost and time savings.