For the past few months I've been involved in a case study project with some colleagues at Cisco where we've been researching which software skills are most relevant to Cisco's pre-sales engineers. We're all freaking experts at Outlook, of course (that's a joke), but we were interested in the areas of programming, automation, orchestration, databases, analytics, and so on. The end goal of the project was to identify those relevant skills, come up with a plan to assess the current skillset in the field, perform a gap analysis, and then put forward recommendations on how to close the gap.
This probably sounds really boring and dry, and I don't blame you for thinking that, but I actually chose this case study topic from a list of 8 or so. My motivation was largely selfish: I wanted to see first-hand the outcome of this project because I wanted to know how best to align my own training, study, and career in the software arena. I already believed that, to stay relevant as my career moves along, software skills would be essential. It was just a question of what type of skills and in which specific areas.
I spent a long time creating my first Spark bot, Zpark. The first commit was in August and the first release was posted in January. So, six months elapsed time. It's also over-engineered. I mean, all it does is post messages back and forth between a back-end system and some Spark spaces and I ended up with something so complex that I had to draw a damn block diagram in the user guide to give people a fighting chance at comprehending how it works.
Its internals could've been much simpler. But that was part of the point of creating the bot: examining the proper architecture for a scalable application, learning about new technologies for building my own API, learning about message brokers, pulling my hair out over git's eccentricities and ultimately, having enough material to write this blog post.
In this post I'm going to break down the different functional components of Zpark and discuss what each one does and why (or why not) that component is necessary. If I can achieve one goal, it will be to retire to a tropical island ASAP. If I can achieve a second goal, it will be to give aspiring bot creators (like yourself, presumably) a strong mental model of a Spark bot to aid their development.
Cisco Encrypted Traffic Analytics (ETA) sounds just a little bit like magic the first time you hear about it. Cisco is basically proposing that when you turn on ETA, your network can (magically!) detect malicious traffic (e.g., malware, trojans, ransomware) inside encrypted flows. Further, Cisco proposes that ETA can differentiate legitimate encrypted traffic from malicious encrypted traffic.
The immediate mental model that springs to mind is that of a web proxy that intercepts HTTP traffic. In order to intercept TLS-encrypted HTTPS traffic, there's a complicated dance that has to happen around building a Certificate Authority, distributing the CA's public certificate to every device that will connect through the proxy and then actually configuring the endpoints and/or network to push the HTTPS traffic to the proxy. This is often referred to as "man-in-the-middle" (MiTM) because the proxy actually breaks into the encrypted session between the client and the server. In the end, the proxy has access to the clear-text communication.
Is ETA using a similar method and breaking into the encrypted session?
In this article, I'm going to use an analogy to describe how ETA does what it does. Afterwards, you should feel more comfortable about how ETA works and not be worried about any magic taking place in your network. :-)
For a long while now I've been brainstorming how I could leverage the API that's present in the Cisco Spark collaboration platform to create a bot. There are lots of goofy and fun examples of bots (e.g., Gifbot) that I might be able to draw inspiration from, but I wanted to create something that would provide high value to myself and anyone else who chooses to download and use it. The idea finally hit me after I started using Zabbix for system monitoring. Since Zabbix also has a feature-rich API, all the pieces were in place to create a bot that would act as a bit of middleware between Zabbix and Spark. I call the bot: Zpark.
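Zpark's actual code isn't shown here. As a minimal sketch of the Zabbix-to-Spark half of that middleware, the Python below turns an alert into a POST request for the Spark messages API (the `https://api.ciscospark.com/v1/messages` endpoint with a bearer token, as the API was published during the Spark era; it has since moved under the Webex name). The `alert` dictionary keys, the token, and the room ID are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Spark-era "create message" endpoint; the API has since been rebranded under Webex.
SPARK_MESSAGES_URL = "https://api.ciscospark.com/v1/messages"


def build_spark_message(token: str, room_id: str, alert: dict) -> urllib.request.Request:
    """Turn a (hypothetical) Zabbix alert dict into a Spark 'create message' request.

    The keys 'severity', 'host', and 'subject' are assumptions: in Zabbix the
    alert text is whatever the action's message template produces.
    """
    text = f"**{alert['severity']}**: {alert['host']} - {alert['subject']}"
    body = json.dumps({"roomId": room_id, "markdown": text}).encode("utf-8")
    return urllib.request.Request(
        SPARK_MESSAGES_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # the bot's access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_spark_message(
    "BOT_TOKEN", "ROOM_ID",
    {"severity": "High", "host": "web01", "subject": "CPU load is too high"},
)
# urllib.request.urlopen(req) would actually post the message to the space.
```

Keeping message construction separate from sending makes it easy to test the formatting without a live Spark account.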
If you're an IT professional and you have at least a minimal awareness of what Cisco is doing in the market and you don't live under a rock, you would've heard about the major launch that took place in June: "The network. Intuitive." The anchor solution to this launch is Cisco's Software Defined Access (SDA) in which the campus network becomes automated, highly secure, and highly scalable.
The launch of SDA is what's called a "Tier 1" launch where Cisco's corporate marketing muscle is fully exercised in order to generate as much attention and interest as possible. As a result, there's a lot of good high-level material floating around right now about SDA. What I'm going to do in this post is lift the hood on the solution and explain what makes the SDA network fabric actually work.
I want to draw some attention to a new document I've written titled "Troubleshooting Cisco Network Elements with the USE Method". In it, I explain how I've taken a model for troubleshooting a complex system (the USE Method, by Brendan Gregg) and applied it to Cisco network devices. By applying the USE Method, a network engineer can methodically troubleshoot a network element (NE) in order to determine why the NE is not performing, acting, or functioning as it should.
So... I'm a little embarrassed to admit this but I only very recently found out that there are significant differences in how Virtual Port Channels (vPC) behave on the Nexus 5k vs the Nexus 7k when it comes to forming routing adjacencies over the vPC.
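The USE Method boils down to one loop: for every resource, check its Utilization, Saturation, and Errors. A minimal Python sketch of that checklist follows; the resource names, the metric values, and the 80% utilization threshold are hypothetical, and collecting the real numbers from a network element (via CLI, SNMP, or telemetry) is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    utilization: float  # fraction of capacity/time busy, 0.0 to 1.0
    saturation: float   # work the resource can't service yet (queue depth, drops, ...)
    errors: int         # error events (CRC errors, failed allocations, ...)


def use_check(resources, util_threshold=0.8):
    """Walk every resource and report U, S, E findings in that order."""
    findings = []
    for r in resources:
        if r.utilization >= util_threshold:
            findings.append((r.name, "utilization", r.utilization))
        if r.saturation > 0:  # any queuing or dropping is worth a look
            findings.append((r.name, "saturation", r.saturation))
        if r.errors > 0:
            findings.append((r.name, "errors", r.errors))
    return findings


ne = [
    Resource("cpu", utilization=0.95, saturation=0, errors=0),
    Resource("Gi1/0/1", utilization=0.30, saturation=120, errors=4),
    Resource("tcam", utilization=0.40, saturation=0, errors=0),
]
for finding in use_check(ne):
    print(finding)
```

The value of the method is the exhaustive walk: a resource with no findings (like `tcam` here) has been positively ruled out rather than skipped.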
I've read the vPC Best Practice whitepaper and have often referred others to it and also referred back to it myself from time to time. What I failed to realize is that I should've been taking the title of this paper more literally: it is 100% specific to the Nexus 7k. The behaviors the paper describes, particularly around the data plane loop prevention protections for packets crossing the vPC peer-link, are specific to the n7k and are not necessarily repeated on the n5k.
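The core of that peer-link protection, as it's commonly described for the Nexus 7000, is a single rule: a frame that arrived over the peer-link may not be forwarded out a vPC member port unless the peer switch's corresponding vPC member port is down. A simplified Python model of that rule follows (port types and argument names are hypothetical, and platform-specific exceptions are ignored); as noted above, the Nexus 5000 doesn't necessarily behave the same way.

```python
from enum import Enum


class PortType(Enum):
    VPC_MEMBER = "vpc-member"  # port belonging to a vPC
    ORPHAN = "orphan"          # single-attached, non-vPC port


def may_egress(arrived_on_peer_link: bool, egress: PortType, peer_member_up: bool) -> bool:
    """Simplified vPC loop-prevention check for one egress port."""
    if not arrived_on_peer_link:
        return True  # the rule only constrains traffic that crossed the peer-link
    if egress is PortType.ORPHAN:
        return True  # an orphan port is reachable only through this switch
    # Crossed the peer-link and headed to a vPC member: allowed only if the
    # peer can't deliver it itself because its member port is down.
    return not peer_member_up
```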
At Cisco's GSX conference at the start of FY17, the DevNet team made a programming scavenger hunt by posting daily challenges that required using things like containers, Cisco Shipped, Python, and RESTful APIs in Cisco software in order to solve puzzles. In order to submit an answer, the team created an API that contestants had to use (in effect creating another challenge that contestants had to solve).
This post contains the artifacts I created while solving some of the challenges.
I know it's a cliché and I know I'm biased because I have an @cisco.com email address, but I've truthfully never seen anything like CPOC before. And the customers I've worked with at CPOC haven't either. It's extremely gratifying to take something you built "on paper" and prove that it works; to take it to the next level and work out those final kinks that the paper design just didn't account for.
If you want more information about CPOC, get in touch with me or leave a comment below. Or ask your Cisco SE (and if they don't know, have them get in touch with me).
Anyways, on to the point of this post. When I was building the topology for the customer, I kept notes about random things I ran into that I wanted to remember later or those "oh duh!" moments that I probably should've known the answer to but had forgotten or overlooked at the time. This post is just a tidy-up of those notes, in no particular order.
I wanted to jot down some quick notes relating to running a virtual Firepower sensor on ESXi and how to validate that all the settings are correct for getting traffic from the physical network down into the sensor.
Firepower is the name of Cisco's (formerly Sourcefire's) so-called Next-Gen IPS. The IPS comes in many form-factors, including beefy physical appliances, integrated into the ASA firewall, and as a discrete virtual machine.
Since the virtual machine (likely) does not sit in line with the traffic that needs to be monitored, traffic needs to be fed into the VM via some method such as a SPAN port or a tap of some sort.
The idea for this post came from someone I was working with recently. Thanks Fan (and Carson, and Shree) :-)
In Service Software Upgrade (ISSU) is a method of upgrading software on a switch without interrupting the flow of traffic through the switch. The conditions for successfully completing an ISSU are usually pretty strict and if you don't comply, the hitless upgrade can all of a sudden become impacting.
The conditions for ISSU on the Nexus 5000 are pretty well documented (cisco.com link); however, there are a couple of bits of knowledge that are not. This post is a reminder of the ISSU conditions you need to comply with and a call-out to the bits of information that aren't so well documented.
The oft-requested and long-awaited arrival of TACACS+ support in Cisco's Identity Services Engine (ISE) is finally here, starting in version 2.0. I've been able to play with this feature in the lab and wanted to blog about it so that existing ISE and ACS (Cisco's Access Control Server, the long-time de facto TACACS+ server) users know what to expect.
Below are five facts about how TACACS+ works in ISE 2.0.
I will be presenting at the Cisco Connect Canada tour in Edmonton and Calgary on November 3rd and 5th, respectively. My presentation is about that three letter acronym that everyone loves to hate: SDN :-)
I will talk about SDN in general terms and describe what it really means; what we're really doing in the network when we say that it's "software defined". No unicorns or fairy tales here, just engineering.
Next I'll talk about three areas where Cisco is introducing programmability into its data center solutions:
Application Centric Infrastructure
Virtual Topology System
Below are the notes I made for myself while researching these topics and preparing for the presentation. At the bottom of this post is a Q&A section with some frequently asked questions.
At the time that I'm writing this I've been working at Cisco for just over 3 years as a Systems Engineer. Prior to that I worked for multiple Cisco customers and was heavily involved in Cisco technologies. I know what a monster cisco.com is and how hard it can be to find what you're looking for.
Since starting at Cisco, the amount of time I've spent on cisco.com has shot up dramatically. Add to that studying for my CCIE and it goes up even more. In fact, cisco.com is probably the number 1 or 2 site I visit on a daily basis (in close competition with Google/searching).
After spending all this time on the site and given how vast the site is and how hard it can be to find that specific piece of information you're looking for, I'm writing this post as an aid to help other techies, like myself, use the site more effectively.
"I need a clarification, where if a member link fails, what will happen to the traffic already sent over that link ? Is there any mechanism to notify the upper layer about the loss and ask it to resend ? How this link failure will be handled for data traffic and control traffic ?"
— Mohamed Anwar
I think his questions are really important because he hits on two really key aspects of a failure event: what happens in the data plane and what happens in the control plane.
A network designer needs to bear both of these aspects in mind as part of their design. Overlooking either aspect will almost always open the network up to additional risk.
I think it's well understood that port channels add resiliency in the data plane (I cover some of that in the previous article). What may not be well understood is that port channels also contribute to a stable control plane! I'll talk about that below. I'll also address Mohamed's question about what happens to traffic on the failed link.
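To illustrate the control-plane point, here is a minimal Python model of a port channel (all names are hypothetical). A routing protocol peers over the logical interface, so it only sees an event when the bundle as a whole changes state, not when an individual member link fails. The `min_links` threshold is similar in spirit to the min-links setting many platforms offer; the default of 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PortChannel:
    members: dict       # member interface name -> True if the link is up
    min_links: int = 1  # bundle stays up while at least this many members are up

    def is_up(self) -> bool:
        return sum(self.members.values()) >= self.min_links


def fail_member(pc: PortChannel, member: str) -> bool:
    """Take one member link down.

    Returns True only if the logical port channel changed state, which is the
    only event a routing adjacency running over the bundle would see.
    """
    was_up = pc.is_up()
    pc.members[member] = False
    return pc.is_up() != was_up


po1 = PortChannel({"Eth1/1": True, "Eth1/2": True})
print(fail_member(po1, "Eth1/1"))  # member lost, bundle still up: no adjacency event
print(fail_member(po1, "Eth1/2"))  # last member lost: bundle down, adjacency drops
```

Compare this with two separate routed links: there, every single link failure is an adjacency loss and a reconvergence event.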
I was preparing a presentation the other day about the high-level differences between IOS, IOS-XE and NX-OS, and one of the things I included in the presentation was the various platform and branch identifiers that are used in each OS. It's just a bit of trivia that I thought would be interesting and might come in handy one day. I'm posting the information I collected below so everyone can reference it.
As a follow-on to my previous article on onePK — Cisco onePK: Now I Get It — I recorded a screencast in which I talk about what a onePK-enabled network is capable of. I also demonstrate two applications which make use of onePK to gather telemetry from the network and also program the network.
MTU Checker - Verifies that, when the MTU of an interface is changed on the CLI, the adjoining interface's MTU matches.
Routing For Dollars - Programs the forwarding table of the routers in the network based on the cost (in dollars) of the various links in the network.
Disclaimer: The opinions and information expressed in this blog article are my own and not necessarily those of Cisco Systems.
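The two onePK applications' implementations aren't shown here, and the sketch below does not use the onePK API. It's plain Python illustrating the logic each one implies: a check that both ends of a link agree on MTU, and a Dijkstra shortest-path computation where each link's weight is its dollar cost. Router names, interface names, and costs are hypothetical, and links are modeled as one-directional entries for brevity.

```python
import heapq


def mtu_mismatches(links, mtu):
    """MTU Checker idea: return links whose two ends disagree on MTU.

    links: list of (interface_a, interface_b); mtu: interface -> MTU in bytes.
    """
    return [(a, b) for a, b in links if mtu[a] != mtu[b]]


def cheapest_path(graph, src, dst):
    """Routing For Dollars idea: Dijkstra with dollar cost as the link weight.

    graph: node -> list of (neighbor, dollars). Returns (total_dollars, path) or None.
    """
    queue = [(0, src, [src])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, dollars in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + dollars, neighbor, path + [neighbor]))
    return None


wan = {
    "R1": [("R2", 100), ("R3", 10)],  # the direct R1-R2 circuit is expensive
    "R3": [("R2", 20)],
}
print(cheapest_path(wan, "R1", "R2"))
print(mtu_mismatches([("R1:Gi0/0", "R3:Gi0/1")], {"R1:Gi0/0": 1500, "R3:Gi0/1": 9000}))
```

The cheapest path here is R1, R3, R2 at $30, even though it takes an extra hop; programming forwarding by dollars rather than hop count is exactly the kind of policy a routing protocol's default metric won't express on its own.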
I had an opportunity recently to sit in a Cisco onePK lab and it opened my eyes to exactly what Cisco is doing with onePK, why it's going to be so important as Software Defined Networking (SDN) continues to gain traction, and why onePK is different than what anyone else is doing in the industry.
onePK is a key element within Cisco's announced Open Network Environment SDN strategy. onePK is an easy-to-use toolkit for development, automation, rapid service creation and more. It enables you to access the valuable data inside your network via easy-to-use APIs.
Since having my own eyes opened, I've been pondering how to explain my new-found understanding in a way that others will grasp, particularly business decision makers (BDMs) and technical decision makers (TDMs). I'm really, really struggling to come up with a good analogy for BDMs. I'm still working on that one. Surprisingly, I'm also struggling to come up with a sound analogy that will work with the majority of TDMs that I know. Maybe I shouldn't be so surprised at that, since all the TDMs I deal with are on the infrastructure side of things (networks, storage, compute, platform) and really don't deal with software. There's a gap there that I somehow need to bridge. I'm still pondering how to successfully do that.
However, there is a slice of the TDM population that I believe I can reach right now. These folks, like myself, have software and network experience. Maybe through open source projects, previous careers, or just mucking about with LAMP stacks in their own lab/home network, they understand programming semantics, APIs, and extending the functionality of third-party software.
I'm going to use a popular open source software package to draw some parallels with what Cisco onePK will soon allow organizations to do to their networks.
Here's a topic that comes up more and more now that FabricPath is getting more exposure and people are getting more familiar with the technology: Can FabricPath be used to interconnect data centers?
FabricPath has some characteristics that make it appealing for DCI. Namely, it extends Layer 2 domains while maintaining Layer 3 (ie, routing) semantics. End host MAC addresses are learned via a control plane, FP frames contain a Time To Live (TTL) field that purges looping frames from the network, and there is no such thing as a blocked link: all links are forwarding and Equal Cost Multi-Pathing (ECMP) is used within the fabric. In addition, since FabricPath does not mandate a particular physical network topology, it can be used in spine/leaf architectures within the data center or over point-to-point connections between data centers.
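Classic Ethernet frames carry no TTL, so a frame caught in a forwarding loop circulates until the loop itself is broken. The minimal Python simulation below shows the difference a TTL makes: each switch decrements the TTL as it handles the frame, and the frame is discarded once the TTL reaches zero. The switch names and the starting TTL value are hypothetical, not FabricPath's actual defaults.

```python
from itertools import cycle


def trace_looping_frame(loop, ttl):
    """Follow a frame around a forwarding loop until its TTL runs out.

    loop: switch IDs in the order the looping frame visits them.
    Returns the switches that handled the frame; the last one drops it.
    """
    handled = []
    for switch in cycle(loop):
        handled.append(switch)
        ttl -= 1  # each switch decrements the TTL when it handles the frame
        if ttl <= 0:
            return handled  # TTL exhausted: the frame is discarded here


print(trace_looping_frame(["S1", "S2", "S3"], ttl=5))
```

However the loop forms, the damage is bounded by the TTL instead of lasting until spanning tree or an operator intervenes.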
Similar to my previous post on the Nexus 2000 (Nexus 2000 Model Number Cheat Sheet), this post will explain what the letters and numbers mean in the Nexus 7000 IO module part numbers. This will allow you to quickly identify the characteristics of the card just by looking at the part number which in turn should help you out as you're building BOMs and picking the right card for the job.
Update July 2, 2013: Updated to reflect release of the Nexus 7700 and F3 modules.
This is the third article in my series on Data Center Interconnection (DCI). In the first (Why is there a "Wrong Way" to Interconnect Data Centers?) I wrote about the risks associated with DCI when the method chosen is to stretch Layer 2 domains between the data centers.
In the second article (DCI: Why is Stretched Layer 2 Needed?) I wrote about why the need exists for stretching Layer 2 domains between sites and also touched on why it's such a common element in many DCI strategies.
I've been working on something that at this point in my career I never thought I'd be doing: another Cisco Certified Network Associate (CCNA) certification. The CCNA Voice, to be exact. Now that I'm in a job role where I'm expected to be somewhat of a jack-of-all-trades, I can no longer avoid learning voice :-) For a long time I've focused on just the underlying network bits and left the voice "stuff" to others. Since I now need to talk intelligently about Cisco voice solutions, products, and architectures, I decided to go through the CCNA Voice curriculum as a way to establish some foundational knowledge.
This post is about the tools and methods I used to build a small lab to support my studies.
I'm not sure why I've taken such an interest in mDNS, service discovery, and the Bonjour protocol, but I have. It probably has something to do with my not being able to use AirPlay at home for such a long time because, like any true network geek, I put my wireless devices on a separate VLAN from my home media devices. I mean, duh. So now I keep an eye out for different methods of enabling mDNS in the network, in anticipation of my own experience in my home network becoming my customers' experience in their enterprise networks.
Do you ever find yourself in a conversation with someone where you attempt to explain a concept in detail and you realize that you don't know that concept at the level of detail that you thought you did? That happened to me recently. I thought I had a better handle on TRILL and FabricPath than I really did. Since I retain things far better when I write them down, I'm going to blog the differences between TRILL and FabricPath when it comes to address learning and what role the control plane plays in building the network topology.
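The reason separate VLANs broke AirPlay comes down to where mDNS queries are sent: UDP port 5353 on 224.0.0.251 (RFC 6762), an address inside the Local Network Control Block (224.0.0.0/24), which routers don't forward between subnets. Here's a minimal Python sketch that builds (but doesn't send) a standard mDNS PTR query; `_airplay._tcp.local` is used as an example service name.

```python
import ipaddress
import struct

MDNS_GROUP = ipaddress.ip_address("224.0.0.251")  # RFC 6762
MDNS_PORT = 5353
LOCAL_NETWORK_CONTROL = ipaddress.ip_network("224.0.0.0/24")  # not forwarded by routers


def build_mdns_query(service: str) -> bytes:
    """Build a one-question DNS message asking for PTR records of `service`."""
    # Header: ID=0 (as multicast mDNS queries use), flags=0, QDCOUNT=1, other counts 0
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in service.split(".")
    ) + b"\x00"  # the root label terminates the name
    question = qname + struct.pack("!2H", 12, 1)  # QTYPE=PTR (12), QCLASS=IN (1)
    return header + question


query = build_mdns_query("_airplay._tcp.local")
print(MDNS_GROUP in LOCAL_NETWORK_CONTROL)  # True: this group stays on the local segment
```

Sending `query` to `(str(MDNS_GROUP), MDNS_PORT)` over a UDP socket would only reach devices on the same VLAN, unless something in the network relays service discovery between segments.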
Service Profile Renaming
Yes, finally, you can rename service profiles. No more struggling to name your profiles perfectly the first time. When a profile is renamed, all the unique attributes including the MACs, WWNs, UUID, etc, are preserved. This can be done when the server is live and online without any impact.
VM-FEX for Microsoft Hyper-V and KVM
In addition to vSphere, VM-FEX (which I've written about here) is now available when using the Hyper-V or KVM hypervisors on UCS.
A colleague of mine pointed something out the other day: the numbers and letters that make up the Nexus 2000 (FEX) model actually have meaning! No, I haven't been living under a rock. I think it's pretty clear that with a model number like "2248TP-E" the "22" indicates this is the 2200 series FEX and the "48" indicates it's got 48 ports. But what about the letters that follow the numbers?
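Here's a minimal Python sketch that splits a FEX model number into the parts identified so far: the series digits, the port count, and whatever letters follow. The regular expression assumes model strings shaped like "2248TP-E" (no ordering-code prefix), and it deliberately doesn't decode the letters, since that's what the rest of the post explains.

```python
import re

FEX_MODEL = re.compile(r"(2\d)(\d{2})([A-Z]+)(?:-([A-Z0-9]+))?")


def parse_fex_model(model: str) -> dict:
    """Split a model like '2248TP-E' into series, port count, and trailing letters."""
    m = FEX_MODEL.fullmatch(model.upper())
    if m is None:
        raise ValueError(f"unrecognized FEX model: {model!r}")
    series, ports, letters, suffix = m.groups()
    return {
        "series": series + "00",  # "22" -> the 2200 series
        "ports": int(ports),      # "48" -> 48 ports
        "letters": letters,       # meaning covered by the post, not decoded here
        "suffix": suffix,         # e.g. "E" in 2248TP-E, or None
    }


print(parse_fex_model("2248TP-E"))
```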
As a follow-up to my previous article on Port Channels titled "4 Types of Port Channels and When They're Used" I wanted to talk a bit about the long-standing rule that says you should always create your Etherchannel (EC) bundles with a number of links that works out to a power of two (ie, 2, 4, or 8 links). That rule is less applicable today than it used to be.
The other day I was catching up on recorded content from Cisco Live! and I saw mention of yet another implementation of port channels (this time called Enhanced Virtual Port Channels). I thought it would make a good blog entry to describe the differences between them, where each is used, and which platforms support each one.
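The power-of-two rule comes from how older Catalyst platforms spread traffic: the load-balancing hash selects one of 8 buckets, and the buckets are divided among the member links. When the number of links doesn't divide 8 evenly, some links own more buckets, and therefore more flows, than others. A minimal Python sketch, assuming a simple round-robin bucket assignment (real platforms may assign buckets differently):

```python
def bucket_distribution(num_links: int, num_buckets: int = 8) -> list:
    """Assign hash buckets to member links round-robin; return buckets per link."""
    counts = [0] * num_links
    for bucket in range(num_buckets):
        counts[bucket % num_links] += 1
    return counts


for links in range(2, 9):
    print(links, bucket_distribution(links))
```

With 3 links the split is 3:3:2, so two links each carry 37.5% of the buckets and the third only 25%; with 4 links it's even. Raising `num_buckets` (to 256, say) shows how the imbalance shrinks to at most one bucket's worth when the hash has more buckets to work with.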
Cisco's Identity Services Engine (ISE) is a powerful rule-based engine for enabling policy-based network access to users and devices. ISE allows policy enforcement around the Who?, What?, and When? of network access.
Who is this user? A guest? An internal user? A member of the Finance department?
What device is the user bringing onto the network? A corporate PC? A Mac? A mobile device?
When are they connecting? Are they connecting to the secure network during regular business hours or at 02:00 in the morning?
These questions can all be answered easily within ISE and are all standard policy conditions that are relatively easy to implement. In the post below I'm going to focus on the How? — How is the user or device connecting to the network? Asked another way, the question is Wired? or Wireless?
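One common way to answer the How? question is the RADIUS NAS-Port-Type attribute that the switch or wireless controller sends with each request: RFC 2865 defines value 15 as Ethernet and 19 as Wireless - IEEE 802.11. Which condition the post builds in ISE isn't stated here, so treat this minimal Python sketch (with a hypothetical attribute dictionary standing in for a RADIUS request) as an illustration of the idea, not ISE configuration.

```python
# NAS-Port-Type values from RFC 2865, section 5.41
NAS_PORT_TYPE_ETHERNET = 15
NAS_PORT_TYPE_WIRELESS_80211 = 19


def connection_medium(radius_attrs: dict) -> str:
    """Answer the 'How?' question from a (hypothetical) dict of RADIUS attributes."""
    port_type = radius_attrs.get("NAS-Port-Type")
    if port_type == NAS_PORT_TYPE_ETHERNET:
        return "wired"
    if port_type == NAS_PORT_TYPE_WIRELESS_80211:
        return "wireless"
    return "unknown"  # e.g. a VPN session, or the attribute is missing


print(connection_medium({"User-Name": "alice", "NAS-Port-Type": 19}))
```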
FabricPath is Cisco's proprietary, TRILL-based technology for encapsulating Ethernet frames across a routed network. Its goal is to combine the best aspects of a Layer 2 network with the best aspects of a Layer 3 network.
Layer 2 plug and play characteristics
Layer 2 adjacency between devices
Layer 3 routing and path selection
Layer 3 scalability
Layer 3 fast convergence
Layer 3 Time To Live field to drop looping packets
Layer 3 failure domain isolation
An article on FabricPath could go into a lot of detail and be many pages long but I'm going to concentrate on five facts that I found particularly interesting as I've learned more about FabricPath.
I read an excellent blog post by Scott Lowe (@scott_lowe) this week on Single Root I/O Virtualization (SR-IOV) titled "What is SR-IOV?". It's an older post but it did a great job of solidifying my understanding and filling in the knowledge gaps. One thing that stuck out was this bit:
SR-IOV requires support in the BIOS as well as in the operating system instance or hypervisor that is running on the hardware.
We're all hardcore network engineers here, right? We all sling packets using nothing but the CLI on our gear? We've all got the "CLI OR DIE" bumper sticker? OK. We're all on the same page then. So, when you're configuring Cisco Identity Services Engine (ISE) and the documentation says it's mandatory to enable "ip http server" on your switches in order to do central web authentication (CWA) (ie, the captive portal for authenticating users on guest devices), that probably makes you uncomfortable, right?
Fear not. It's not as bad as it sounds. I'll explain why.
The shared services area of the network is meant to provide common services — such as DNS, DHCP, and Internet access — to multiple logical networks/VRFs/customers. Cisco publishes a validated design for shared services that describes the use of multiple virtual firewalls and routers to provide connectivity between the shared services module and the VRFs in the network. I'm going to describe a method of collapsing the shared services firewalls and virtual routers into a single instance running on a single box using some of the features found in Juniper's Junos platform.
I attended the Cisco Plus Canada Roadshow in Calgary recently and sat in on a day of presentations related to Cisco's data center/cloud offerings. The sessions were quite good and I ended up taking quite a few notes. I thought I'd blog my notes in order to share what was presented.
A great little "feature" of Cisco's Identity Services Engine is that out of the box, the administrator account expires after 45 days if the password is not changed during that time. The documentation says that if you have trouble logging in you should click the "Problem logging in?" link and use the default administrative user/pass. This is of course ridiculous and does not work.
Below are the steps for properly resetting an admin password and for changing the security policy so the lockout doesn't happen again.
This post is going to provide a very basic introduction to configuring VRFs on Cisco IOS and Juniper's Junos. There are so many configuration combinations and options for virtual routing that it would be impossible to go through everything in great detail. At the end of the post I'll provide links to documentation where you can get detail if you want it.
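Conceptually, each VRF is its own routing table, which is what lets two customers use overlapping address space on the same router. Here's a minimal Python model of that idea (not IOS or Junos syntax; the VRF names, prefixes, and next hops are hypothetical), doing a longest-prefix match within a single VRF:

```python
import ipaddress


def vrf_lookup(tables: dict, vrf: str, destination: str):
    """Longest-prefix match inside one VRF's routing table; None if no route."""
    addr = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in tables[vrf].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


routing_tables = {
    "CUST_A": {"10.0.0.0/8": "192.168.1.1", "10.1.0.0/16": "192.168.1.2"},
    "CUST_B": {"10.0.0.0/8": "172.16.0.1"},  # overlaps CUST_A without conflict
}
print(vrf_lookup(routing_tables, "CUST_A", "10.1.2.3"))
print(vrf_lookup(routing_tables, "CUST_B", "10.1.2.3"))
```

The same destination resolves to different next hops depending on which VRF the lookup happens in; the IOS and Junos configuration is essentially about creating those tables and binding interfaces to them.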
I read two interesting articles on VTP (Cisco's VLAN Trunking Protocol) this week.
The first is an older article from networkworld.com that reminds us all that VTP clients are also capable of updating VLANs on the network, not just servers.
When I first heard that a VTP client can update a VTP server under the right conditions, I was frankly a non-believer. No way. I'd seen evidence to the contrary in several Cisco documents.
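The reason a client can do this is that, in VTP versions 1 and 2, a switch accepts any advertisement from its own domain that carries a higher configuration revision number, regardless of whether the sender is a server or a client. A simplified Python model of that rule follows; it ignores the VTP password, pruning, and version 3's primary-server mechanism, and the switch names and VLANs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VtpSwitch:
    name: str
    mode: str       # "server", "client", or "transparent"
    domain: str
    revision: int   # configuration revision number
    vlans: frozenset


def receive_advertisement(switch: VtpSwitch, sender: VtpSwitch) -> bool:
    """Simplified VTPv1/v2 acceptance rule. Returns True if the VLAN database was replaced.

    Note that sender.mode is never consulted: a client's advertisement counts too.
    """
    if switch.mode == "transparent" or switch.domain != sender.domain:
        return False  # transparent switches (and other domains) don't apply it
    if sender.revision > switch.revision:
        switch.revision = sender.revision
        switch.vlans = sender.vlans
        return True
    return False


core = VtpSwitch("core", "server", "LAB", revision=10, vlans=frozenset({1, 10, 20, 30}))
lab = VtpSwitch("lab-sw", "client", "LAB", revision=25, vlans=frozenset({1}))
print(receive_advertisement(core, lab), sorted(core.vlans))
```

A lab switch carrying a stale but higher-numbered revision wipes VLANs 10, 20, and 30 from the server, which is why resetting the revision number before connecting a switch to a production domain matters.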