Blogging the Cloud Track at Cisco Plus 2011

I attended the Cisco Plus Canada Roadshow in Calgary recently and sat in on a day of presentations related to Cisco's data center/cloud offerings. The sessions were quite good and I ended up taking quite a few notes. I thought I'd blog my notes in order to share what was presented.

The four sessions were:

Journey to the Cloud
Cisco UCS
Data Center Networking
Powering the Cloud

Journey to the Cloud

Presented by Ronnie Scott (Cisco)

According to Cisco's own measurements:

77% of IT time spent keeping lights on (opex spend)
23% spent delivering new capabilities (capex spend)

Drivers for adopting cloud:

Reduce IT cost
Simplify IT operations
Improve pace of delivery
Better align IT resources to business needs

Many keywords used when describing "cloud": ubiquitous, convenient, on-demand, shared pool, rapidly provisioned, minimal management.

This is taken from the NIST definition of cloud computing (PDF).
If you're only running big servers with vSphere, you're not operating a cloud. You're operating a virtualized environment.
Virtualization is a stepping stone to cloud
To be a cloud, the environment must:
Be provisioned on-demand using self service
Have broad network access
Have pooled resources
Be able to rapidly expand
Be measurable

Virtualization is a stepping stone towards a cloud architecture. However the level of virtualization must be broad and complete. Everything must be virtualized:

Server/compute
Network
Storage

To run a real cloud, enterprises need to move to a capacity planning model and away from a per project spend/resource allocation model. Set high watermarks on the pool of resources and buy more/plan for more when the HWM is hit.

New applications/services being introduced as part of a project should consume resources from the existing pool of compute/storage/network. No (or minimal) need to purchase equipment that will only run this new app/service.
IT budgeting and spending models need to change. With a cloud architecture, capital outlay is necessary when capacity planning says so, not when the project plan for a new app says so.
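To make the high watermark idea concrete, here's a minimal sketch of a capacity check. The pool names, units and the 80% threshold are all my own made-up values, not anything Cisco presented.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """A shared pool of one resource type (compute, storage or network)."""
    name: str
    capacity: float              # total units in the pool
    used: float                  # units currently consumed
    high_watermark: float = 0.8  # hypothetical threshold: 80% of capacity

    @property
    def utilization(self) -> float:
        return self.used / self.capacity

    def needs_expansion(self) -> bool:
        """True once utilization has reached the high watermark."""
        return self.utilization >= self.high_watermark


def pools_to_expand(pools):
    """Return the names of the pools whose high watermark has been hit."""
    return [p.name for p in pools if p.needs_expansion()]


# Hypothetical pools; a new project just draws from these.
pools = [
    ResourcePool("compute-ghz", capacity=1000, used=850),
    ResourcePool("storage-tb", capacity=200, used=90),
    ResourcePool("network-gbps", capacity=160, used=140),
]
print(pools_to_expand(pools))
```

The point of the sketch is that the purchase trigger is the state of the pool, not the arrival of a new project.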

Cloud delivery models:

SaaS/Application - Aimed at end users; provides applications at scale
PaaS/Platform - Aimed at developers; provides execution platform at scale
IaaS/Infrastructure - Aimed at system administrators; provides infrastructure at scale

IaaS/PaaS - Targeted at running applications with variable resource demand (VDI, DR, application test/dev).

SaaS - Delivers functional apps with low per user/transaction cost (Email, UC, web hosting)

Cisco has a new cloud model: virtual private cloud

Cloud services that simulate private cloud in public cloud infrastructure

Cisco sees a model for the future where cloud environments utilize a brokerage service with intelligence to move workloads between public/private/community clouds. Applications would be tagged with criteria that specify the level of service they require. The broker would use this information to dynamically place each workload into a certain cloud environment based on configured policy.
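Here's a rough sketch of how I picture such a broker working. Cisco didn't show any implementation, so the tags (a minimum trust level and a capacity demand), the trust ranking and the "cheapest eligible cloud wins" policy are purely my assumptions.

```python
from dataclasses import dataclass

# Hypothetical ranking of cloud types, most controlled first.
TRUST_LEVEL = {"private": 3, "community": 2, "public": 1}


@dataclass
class Cloud:
    name: str
    kind: str             # "private", "community" or "public"
    free_capacity: int    # arbitrary capacity units
    cost_per_unit: float


@dataclass
class Workload:
    name: str
    demand: int           # capacity units required
    min_trust: int        # lowest TRUST_LEVEL this workload tolerates


def place(workload, clouds):
    """Return the cheapest cloud that satisfies the workload's tags, or None."""
    candidates = [
        c for c in clouds
        if TRUST_LEVEL[c.kind] >= workload.min_trust
        and c.free_capacity >= workload.demand
    ]
    return min(candidates, key=lambda c: c.cost_per_unit, default=None)


clouds = [
    Cloud("in-house", "private", free_capacity=10, cost_per_unit=1.0),
    Cloud("partner", "community", free_capacity=50, cost_per_unit=0.8),
    Cloud("big-public", "public", free_capacity=1000, cost_per_unit=0.5),
]
print(place(Workload("billing-db", demand=5, min_trust=3), clouds).name)
print(place(Workload("test-env", demand=20, min_trust=1), clouds).name)
```

A workload tagged as needing a private cloud stays in-house, while an untagged test environment lands wherever it's cheapest.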

Cisco's cloud strategy:

Build the base - Core, "crown jewel" apps run in-house on enterprise private cloud
Rent the spike - Workload spikes satisfied with public/community clouds

Elements of a private cloud

Self service portal - Users order services
Service delivery automation - Delivery of those ordered services is automated
Resource management - Resource allocation to the ordered services is managed
Operations management - Integration of the service with operations tools, help desk tools, SLA monitoring, etc
Life cycle management - Cradle to grave: the ordered services need to be deprovisioned at some point in their lifetime too

Architecture for private cloud

Resources (compute, storage, bandwidth)
Resource management (vCenter, UCS Manager (UCSM), BMC, etc)
Service management (vCloud Director, BMC)
Customer access, self provisioning tools (BMC, Cisco Intelligent Automation for Cloud (CIAC))

The service management layer needs to talk to the resource management layer in order to provision the physical resources from the pool. The elements in the resource management layer need APIs to make this possible. Cisco products like UCS and the Nexus line all have rich, complete APIs which make this automation possible.

Steps for moving to private cloud:

Consolidate
Virtualize
Automate
Self service provisioning

Cisco UCS

Presented by Willem van Schaik (Cisco)

Cisco UCS

Fastest growing business ever at Cisco
$1bn business
2nd in blade server market in North America
Cisco Reaches 10,000 UCS Customers

Nexus 5000 virtual chassis

Nexus 5k acts like a supervisor
Nexus 2k fabric extender (FEX) acts like remote line card
Nexus 5k + 2k model was extended to Cisco UCS
FEX is only network piece installed in the chassis
Management of the system is done via the Fabric Interconnect (fancy Nexus 5000s running UCS Manager software on top of NX-OS)

10GbE benefits

More bandwidth is nice, but not the biggest win
Low latency is nice, but not the biggest win
Biggest win is consolidating multiple networks and traffic onto one cable
Builds on "virtualize everything" concept (virtualizing the network)
Wire once, use many
Or as Cisco says, wire for bandwidth not connectivity

Generic rackmount servers vs. generic blade servers:

Most rack server components (NIC ports, storage ports, etc) duplicated 1:1 on a blade
Same number of ports and connections to manage in a 16x blade enclosure as with 16 rackmount servers
Additionally, every blade enclosure is (at best) its own point of management (worst case, every single blade is its own point of management)
No opex savings with blades

With UCS:

All points of management moved to single spot: top of rack in the Fabric Interconnect
No mgmt point in back of the blade enclosure (only thing in the back is a dumb fabric extender/FEX)
UCS chassis are like a storage disk shelf. Plug and use, nothing to configure, no management points on the shelf/chassis
One point of management for N chassis/blades

UCS Manager (UCSM)

Management software for entire UCS environment
Runs on Fabric Interconnects
RBAC (Storage, server, network teams all use same console)
Full API for integrating with other tools. UCSM GUI written on top of the XML API
Other tools can use the API to do the same tasks as UCSM (eg, software from CA)
API allows management from any platform. Apps exist for iPhone/iPad, Android and Playbook
Central firmware mgmt for all components
Fabric Interconnect, Converged Network Adapters, FEX, etc
Firmware can be staged (redundant firmware banks) during business hours and applied on a scheduled or off-hours basis
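To give a feel for what "other tools can use the API" means, here's a small sketch that builds two XML API requests and reads the session cookie out of a login response. The method names (aaaLogin, configResolveClass), the attribute names and the computeBlade class are as I recall them from Cisco's XML API documentation, so treat them as assumptions and check the programmer's guide before relying on them. The helper function names are my own, and nothing is actually sent over the network here.

```python
import xml.etree.ElementTree as ET


def login_request(user, password):
    """Build the body of an aaaLogin request (names recalled, not verified)."""
    elem = ET.Element("aaaLogin", inName=user, inPassword=password)
    return ET.tostring(elem, encoding="unicode")


def blade_query(cookie):
    """Build a configResolveClass request asking for all blade objects."""
    elem = ET.Element(
        "configResolveClass",
        cookie=cookie,
        classId="computeBlade",
        inHierarchical="false",
    )
    return ET.tostring(elem, encoding="unicode")


def cookie_from(response_xml):
    """Pull the session cookie out of a login response."""
    return ET.fromstring(response_xml).get("outCookie")


# Made-up response, only to show the flow: log in, keep the cookie, query.
fake_response = '<aaaLogin response="yes" outCookie="abc123" />'
cookie = cookie_from(fake_response)
print(login_request("admin", "secret"))
print(blade_query(cookie))
```

In a real tool these bodies would be POSTed over HTTP(S) to UCS Manager, which is the same path the GUI itself takes.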

Service profile

Wraps all the server parameters and policies into a template
Lives in UCSM not on the blade
Creates the personality of the server
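A toy model helped me think about this. The profile is just data held by the manager, and the blade takes on that identity only while the two are associated. The field list below (UUID, MACs, WWPNs, boot order, firmware policy) is my own illustrative selection, and the class and method names are hypothetical rather than anything from UCSM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """The server's 'personality', stored in the manager, not on the blade."""
    name: str
    uuid: str
    mac_addresses: tuple
    wwpns: tuple
    boot_order: tuple
    firmware_policy: str


class Manager:
    """Hypothetical stand-in for UCSM's profile-to-blade bookkeeping."""

    def __init__(self):
        self.association = {}  # blade slot -> ServiceProfile

    def associate(self, profile, blade):
        # Detach the profile from any blade it is currently on, then attach
        # it to the new blade: one profile, one blade at a time.
        for slot, current in list(self.association.items()):
            if current == profile:
                del self.association[slot]
        self.association[blade] = profile

    def identity_of(self, blade):
        """Return the profile a blade is running as, or None if it is bare."""
        return self.association.get(blade)


web01 = ServiceProfile(
    name="web01",
    uuid="00000000-0000-0000-0000-000000000001",
    mac_addresses=("00:25:b5:00:00:01",),
    wwpns=("20:00:00:25:b5:00:00:01",),
    boot_order=("san", "lan"),
    firmware_policy="standard",
)
mgr = Manager()
mgr.associate(web01, "chassis-1/blade-3")
mgr.associate(web01, "chassis-2/blade-1")  # e.g. after a hardware failure
print(mgr.identity_of("chassis-2/blade-1").name)
```

Moving the profile moves the whole identity, so the replacement blade comes up looking like the original server to the network and the SAN.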

The new 6200 series Fabric Interconnect

Unified ports
Port can be configured for data center bridging or Fibre Channel
6248 based on Nexus 5548
Expect to see 6296 based on Nexus 5596

UCS 2.0 increases bandwidth to blades

8 port FEX (up from 4 ports in first version)
Port channel can now be created between the FEX and the upstream Fabric Interconnect
No more pinning a blade to a specific upstream port on the FEX
1 blade can use up to 40Gb of bandwidth out of the chassis (limited by the bandwidth from the FEX to the blade; see the bullet point right below)
From FEX down to blade, increased lanes on the highway from 1x to 4x 10Gb lanes
Also in a port channel
40Gb per port per blade

VM-FEX: virtual NICs on the Palo card become remote ports on the Fabric Interconnect

FEX idea cascades from Fabric Interconnect to FEX and from FEX to Virtual Interface Card (Palo) on the blade
802.1Qbh/VN-Tag (pre-standard) used to identify traffic to and from individual VMs
Fabric Interconnect creates a logical interface for each VM (based on the VN-Tag). Other end of that logical interface is the virtual NIC created on the Palo card.
All switching done in the FI, ALWAYS. Even traffic between two VMs on the same vSphere host must flow up to the FI and then back down
Allows policy, control and monitoring of all traffic
VMDirectPath mode gives best performance
Switching all done in ASICs
In this mode, vMotion is supported under ESXi 5

Data Center Networking

Presented by Ronnie Scott (Cisco)

Fibre Channel over Ethernet is standardized. Period. Competitors are spreading FUD if they say otherwise.

Fibre Channel encoding is inefficient

FC uses 8b/10b encoding
4Gb FC != 4Gb throughput
10Gb Ethernet uses more efficient encoding scheme. 4Gb FCoE is more efficient on the wire than native 4G FC
16G FC will roughly double the real throughput of 8G FC without doubling the line rate, because it adopts the same encoding used with 10GbE.
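The encoding arithmetic is easy to check. 8b/10b carries 8 data bits in every 10 bits on the wire (80% efficient), while the 64b/66b scheme used by 10GbE carries 64 in every 66 (about 97%). The line rates below weren't part of the presentation; they're the nominal figures as I know them, so treat them as my assumption.

```python
def payload_rate(line_rate_gbaud, data_bits, coded_bits):
    """Usable data rate in Gb/s for a given line rate and line encoding."""
    return line_rate_gbaud * data_bits / coded_bits


# Nominal line rates in Gbaud (assumed, not from the session).
links = {
    "4G FC (8b/10b)": payload_rate(4.25, 8, 10),
    "8G FC (8b/10b)": payload_rate(8.5, 8, 10),
    "10GbE (64b/66b)": payload_rate(10.3125, 64, 66),
    "16G FC (64b/66b)": payload_rate(14.025, 64, 66),
}
for name, rate in links.items():
    print(f"{name}: {rate:.2f} Gb/s")
```

With these figures the payload rate of 16G FC comes out at about twice that of 8G FC, even though the line rate rises by only about 65%. The encoding change makes up the difference.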

Spanning Tree vs. Virtual Port Channel

50% of network bandwidth is wasted in an STP network
STP is absolutely necessary though to eliminate loops
VPC allows all links to be forwarding
Two upstream switches "collude and lie" to southbound switch (VPC == "lies and deceit protocol")

FabricPath

New method of transporting frames in the data center
No STP
All links active
Layer 2 adjacency between end devices over a routed, Layer 3 fabric
Spine and leaf design
High bisectional bandwidth
FP based on TRILL which was standardized before anyone had an implementation
According to Cisco, TRILL needs changes to add things that should have been in the standard but weren't
Likely to see a TRILL v2 at some point
Work already underway on next version of TRILL
FabricPath is enabled with one command per interface. No further configuration needed.
Don't be surprised if FabricPath is the default interface mode in the data center one day

Overlay Transport Virtualization - OTV

Layer 2 adjacency between end devices in different data centers across a Layer 3 data center interconnect
Manages traffic tromboning of FHRP VIP (FHRP becomes active in both DCs)
Prevents STP BPDUs from crossing the DC interconnect
Enables vMotion between data centers
vMotion between data centers has distance/latency limits!
Kudos to Ronnie for actually bringing this up. Most discussions on OTV throw around the idea of long distance vMotion as a cure-all to any DR problems you might have.

Location/ID Separation Protocol - LISP

Decouple location from device identification
IP addresses today do both: they define the network (location) of a device and its identification (how do I talk to that device)
By decoupling the two, a device can have its location changed without changing its ID (ie, without changing how clients identify and connect to the device)
Allows you to move workloads around in the cloud and make sure those services and devices can be reached no matter where they are
No load balancing, no DNS changes, no client side changes. Works all in the network.
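The decoupling can be pictured as a lookup table that the network maintains. In LISP terms the identity is the EID (endpoint identifier) and the location is the RLOC (routing locator); those terms come from the protocol spec rather than my notes. The class below is a toy of my own and all addresses are from documentation ranges.

```python
class MappingSystem:
    """Toy EID-to-RLOC database; real LISP distributes this in the network."""

    def __init__(self):
        self._map = {}

    def register(self, eid, rloc):
        """Record (or update) where an endpoint currently lives."""
        self._map[eid] = rloc

    def resolve(self, eid):
        """Return the locator that traffic for this endpoint should go to."""
        return self._map[eid]


mapping = MappingSystem()
app_eid = "192.0.2.10"  # what clients connect to; never changes

mapping.register(app_eid, "198.51.100.1")  # workload starts in DC 1
before = mapping.resolve(app_eid)

mapping.register(app_eid, "203.0.113.1")   # workload moves to DC 2
after = mapping.resolve(app_eid)

print(app_eid, before, "->", after)
```

Clients keep talking to the same address before and after the move. Only the mapping changes, which is why no DNS or client-side changes are needed.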

Powering the Cloud

Presented by Willem van Schaik (Cisco)

From above: one of the elements of cloud is automation. In order to build automation:

You must understand all processes needed to provision a server/app today
The provisioning involves people from all teams (network, server, storage, application support). Everyone must be involved in setting the automation rules and workflows.

Cisco tools:

newScale catalog: self service portal
Cisco Tidal orchestrator: automation piece

Cisco bought newScale and Tidal to get best of breed self service and orchestration tools.

Orchestration must be full life cycle. You need to deprovision the stuff too at some point! Don't forget.

Amazon's cloud is cheap when compared to in-house infrastructure that is not being fully utilized (as is the case the majority of the time). Your private cloud is cheaper than Amazon if you're running at a very high level of utilization. If you're running at high utilization though, how do you handle spikes in load? Burst your spikes into the public cloud (ties back to "rent the spike" above). This requires rules, though, for which workloads can/should be moved into the public cloud.
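The utilization argument is simple arithmetic: in-house capacity costs the same whether it's busy or idle, so the cost of each hour you actually use goes up as utilization goes down. Here's a sketch with made-up prices (no real Amazon or Cisco figures were given).

```python
def private_cost_per_used_hour(fixed_cost_per_hour, utilization):
    """Effective cost of one *used* capacity-hour on in-house gear."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return fixed_cost_per_hour / utilization


def break_even_utilization(fixed_cost_per_hour, public_price_per_hour):
    """Utilization above which in-house is cheaper than renting."""
    return fixed_cost_per_hour / public_price_per_hour


# Hypothetical prices per capacity-hour.
IN_HOUSE = 0.06  # paid whether the capacity is used or not
PUBLIC = 0.10    # paid only for hours actually used

print(break_even_utilization(IN_HOUSE, PUBLIC))
print(private_cost_per_used_hour(IN_HOUSE, 0.3))
print(private_cost_per_used_hour(IN_HOUSE, 0.9))
```

With these numbers the in-house environment only wins above roughly 60% utilization, which is why running the base hot and renting the spikes makes sense.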

Using Cisco's own measurements, they flipped the 77% opex/23% capex ratio to 40% opex (keeping the lights on) and 60% capex (introducing new services) by moving to a full cloud architecture.

Buying a Vblock is like buying a car: there are limited options to choose from on the order sheet. You have to pick from the options the manufacturer has made available.

Flexpod is a set of reference designs (for Exchange, for Oracle, etc). You order all the parts yourself, which means you can customize the order, and you integrate the pieces on your own. Flexpod also contains benchmark information to help you with capacity planning.

Final thoughts: This stuff is not easy. Start small. Seek help from Cisco on the really hairy parts. If you can realize full cloud, the benefits are big in cost and time savings.
