Explain Cisco ETA to Me in a Way That Even My Neighbor Can Understand It

Cisco Encrypted Traffic Analytics (ETA) sounds just a little bit like magic the first time you hear about it. Cisco is basically proposing that when you turn on ETA, your network can (magically!) detect malicious traffic (i.e., malware, trojans, ransomware, etc.) inside encrypted flows. Further, Cisco proposes that ETA can differentiate legitimate encrypted traffic from malicious encrypted traffic.

Uhmm, how?

The immediate mental model that springs to mind is that of a web proxy that intercepts HTTP traffic. In order to intercept TLS-encrypted HTTPS traffic, there's a complicated dance that has to happen around building a Certificate Authority, distributing the CA's public certificate to every device that will connect through the proxy and then actually configuring the endpoints and/or network to push the HTTPS traffic to the proxy. This is often referred to as "man-in-the-middle" (MiTM) because the proxy actually breaks into the encrypted session between the client and the server. In the end, the proxy has access to the clear-text communication.

Is ETA using a similar method and breaking into the encrypted session?

In this article, I'm going to use an analogy to describe how ETA does what it does. Afterwards, you should feel more comfortable about how ETA works and not be worried about any magic taking place in your network.

Putting a Letter in the Post

I'm going to compare encrypted communication with putting a letter into the post.

Small sidebar: ETA looks specifically at TLS encrypted traffic so that's what I'm drawing an analogy to; I won't claim that this makes sense for other encrypted protocols.

If I write a letter to you, then it means I've probably lost all access to email for some bizarre reason. It also means I have a choice to make: I could either write it on something like a postcard where the letter, postage, and addressing are all on the "outside", or I could write the letter on a piece of paper and put the paper inside an envelope that has the postage and addressing on it.

The former option might be simpler, but at the cost of allowing anyone to read my letter as it's transported from me to you. The latter option is very much like encrypting data traffic: the encryption is a wrapper around the data that protects the contents from being read by unintended third parties. (You'll have to give me some leeway here and assume that while an envelope can easily be ripped open while the letter is in transit, this generally doesn't happen and if it does happen to you, then what the hell are you into to deserve that kind of attention?!)

Now we're going to assume that while the letter is waiting in your mailbox, that neighbor of yours who is always a little too interested in what's happening on your side of the fence gets their hands on it and starts to inspect it. This is the same neighbor that you're going to explain ETA to later on so remember to retrieve your mail from them at that time.

What sort of information can they learn about this piece of mail without ripping the envelope open? (They're nosy, not rude)

Watching Mail Delivery

Let's assume Nosy Neighbor has been watching your mailbox for some time. They've been taking note of how much mail you receive, the size of each piece of mail, the weight, and how often you receive something from the same sender. They might notice patterns such as:

Mail from the same sender in a regular office-sized envelope that's received roughly every 4 weeks: likely some sort of account statement.
Mail from the same sender that's received in a larger, letter-sized envelope just once a year: could be some sort of annual report, perhaps for an investment.
Multiple pieces of mail received with each being from a unique sender all within a somewhat short window of time and each having the same size and weight: might indicate people sending RSVPs for an event such as a wedding.
Mail received in a cardboard box: very unlikely to be a letter and is likely delivery of a product, perhaps from an online retailer (which would be confirmed in the next section).

Data traffic can be inspected using a similar methodology. In ETA this is called the Sequence of Packet Lengths and Times (SPLT). SPLT observes the following metrics for the first few packets in an encrypted flow:

Packet length (size). This is like looking at the weight and dimensions of a piece of mail.
Time interval between packets. This is like looking at how often a similar piece of mail arrives from the same sender.
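To make those two metrics concrete, here is a minimal sketch of turning a flow into an SPLT-style sequence. The `Packet` type, the `splt` function, the choice of ten packets, and the sample flow are all assumptions made for illustration; the article does not describe how ETA encodes this data internally.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float   # seconds since the start of the capture
    length: int        # payload size in bytes
    outbound: bool     # True = client to server


def splt(packets: List[Packet], first_n: int = 10) -> List[Tuple[int, float]]:
    """Return (signed_length, gap_seconds) for the first `first_n` packets.

    Outbound lengths are positive and inbound lengths are negative; the gap
    is the time since the previous packet (0.0 for the first packet).
    """
    features = []
    previous_time = None
    for packet in packets[:first_n]:
        gap = 0.0 if previous_time is None else packet.timestamp - previous_time
        signed_length = packet.length if packet.outbound else -packet.length
        features.append((signed_length, round(gap, 6)))
        previous_time = packet.timestamp
    return features


# Made-up flow: a small request followed by larger responses.
flow = [
    Packet(0.000, 517, True),
    Packet(0.030, 1400, False),
    Packet(0.031, 1400, False),
    Packet(0.045, 126, True),
]
print(splt(flow))
```

Notice that nothing here touches the packet contents: size, direction, and timing are all visible from the "outside" of the envelope.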

What this looks like visually is something like this:

The vertical lines below the horizontal line represent traffic from Google to the client and the vertical lines above the horizontal line represent the opposite direction.

You can see that when you represent the first few packets in a time-series like this that there is a noticeable pattern. And a small reminder: ETA isn't looking at packet contents at this point (that's the next section) and yet there's still a discernible pattern visible. Compare the benign traffic pattern above with this malicious pattern:

This is the pattern for a piece of malware called Bestafera (which, surprisingly, I cannot find any reference to link to). Even at a glance it's obvious that this is a very different application than the first. The time intervals between the packets are very different and the direction of traffic is opposite as well: this malware is exfiltrating data out to the destination (traffic above the horizontal line is from client to server) whereas Google was sending data towards the client.

The SPLT data is used to detect anomalous behavior: traffic patterns that don't conform to typical, benign traffic in the environment. As the example above illustrates, if typical SPLT patterns reflect users doing common work activities such as browsing Facebook and tracking their Bitcoin portfolio, then it would be rather suspicious to see bulky traffic flowing out of the organization and perhaps even more so when the SPLT pattern shows the upload happening without much preamble between client and server.
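As a rough illustration of the "bulky traffic flowing out" idea, the sketch below flags a flow when most of the bytes in its first few packets head away from the client. This is a toy rule with a made-up threshold and made-up flows, not ETA's actual classifier, which the article only describes at a high level.

```python
from typing import List, Tuple

# (signed_length, gap_seconds): positive = client to server, negative = server to client
Splt = List[Tuple[int, float]]


def outbound_fraction(flow: Splt) -> float:
    """Fraction of the observed bytes that left the client."""
    total = sum(abs(length) for length, _ in flow)
    sent = sum(length for length, _ in flow if length > 0)
    return sent / total if total else 0.0


def looks_like_bulk_upload(flow: Splt, threshold: float = 0.8) -> bool:
    """Toy rule: flag flows whose first packets are mostly outbound bytes."""
    return outbound_fraction(flow) > threshold


# Hypothetical flows: ordinary browsing versus an upload with little preamble.
browsing = [(517, 0.0), (-1400, 0.03), (-1400, 0.001), (126, 0.014), (-1400, 0.02)]
upload = [(300, 0.0), (-150, 0.02), (1400, 0.001), (1400, 0.001), (1400, 0.001)]

print(looks_like_bulk_upload(browsing))  # False
print(looks_like_bulk_upload(upload))    # True
```

A rule this crude would also flag a perfectly innocent upload, which is exactly the weakness of looking at SPLT alone.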

A logical question at this point would be, "What if I'm uploading something to my cloud storage provider? How will I update my Gram feed? Does ETA classify that traffic as malicious?" That's why ETA looks at more than just SPLT: it also inspects the traffic itself.

Inspecting the Envelope

Let's make some more assumptions about your neighbor (And why not? It's been fun so far). Let's assume they have done all they want to do with recording weight, size, etc, and they now pick up a piece of mail and start to inspect the envelope/packaging. What sort of data do they have access to?

Sender name and address
Recipient name and address
Is the sender a business?
Is the recipient a business?
Is the addressing hand-written or typed?
Does it have a postage stamp or a shipping label?
Does the envelope bend? Is there something hard and inflexible inside?
Does the envelope have one of those little windows on it?

You can start to see how a lot can be inferred about the contents of the mail from what's on the outside.

How does this work with encrypted traffic though, considering it's... uhh... encrypted and whatnot? Well, while the actual data (i.e., the letter) is inaccessible, there is some addressing and initial information that is exchanged in the clear, prior to encryption, in the opening messages of the TLS handshake. Here's an example of what that looks like when you browse to packetmischief.ca:

I only expanded a couple of the branches in order to fit everything in the screenshot, but even still, there's a lot of detail here right in the clear.

The TLS version being proposed (v1.2)
The list of cipher suites that the client is proposing to use
The server name that the client is connecting to (www.packetmischief.ca)

In the collapsed branches of the packet there's also information about renegotiation parameters, elliptic curve algorithms, and a lot more.
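The fields listed above travel in the client's first handshake message, the TLS ClientHello. Below is a minimal sketch of reading them straight from the bytes on the wire. It assumes a single, unfragmented record that includes an extensions block; `parse_client_hello` and `build_sample_hello` are names made up for this example, and the sample message is synthetic rather than a real capture.

```python
import struct
from typing import Dict, List, Optional


def parse_client_hello(record: bytes) -> Dict[str, object]:
    """Pull the clear-text fields out of a single TLS ClientHello record."""
    if record[0] != 0x16 or record[5] != 0x01:
        raise ValueError("not a TLS handshake record carrying a ClientHello")
    pos = 5 + 4                                   # skip record and handshake headers
    (version,) = struct.unpack_from("!H", record, pos)
    pos += 2 + 32                                 # version, then the 32-byte random
    pos += 1 + record[pos]                        # session id
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    suites = list(struct.unpack_from(f"!{suites_len // 2}H", record, pos))
    pos += suites_len
    pos += 1 + record[pos]                        # compression methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    extensions: List[int] = []
    server_name: Optional[str] = None
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        extensions.append(ext_type)
        if ext_type == 0:                         # server_name extension
            # list length (2), name type (1), name length (2), then the name
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            server_name = record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return {"version": version, "cipher_suites": suites,
            "extensions": extensions, "server_name": server_name}


def build_sample_hello(host: str) -> bytes:
    """Build a tiny synthetic ClientHello offering two cipher suites and an SNI."""
    name = host.encode("ascii")
    sni = struct.pack("!HBH", len(name) + 3, 0, len(name)) + name
    extensions = struct.pack("!HH", 0, len(sni)) + sni
    body = (struct.pack("!H", 0x0303) + bytes(32) + b"\x00"
            + struct.pack("!HHH", 4, 0xC02F, 0x009C)
            + b"\x01\x00"
            + struct.pack("!H", len(extensions)) + extensions)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


hello = parse_client_hello(build_sample_hello("www.packetmischief.ca"))
print(hex(hello["version"]), hello["cipher_suites"], hello["server_name"])
```

The version value `0x0303` is how TLS 1.2 is encoded on the wire. No key material is needed to read any of this, because the ClientHello is sent before encryption begins.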

If we look beyond these specific details and compare how different browsers and different TLS clients and libraries initialize a TLS session, some patterns start to emerge that can be used as signatures to identify, say, Firefox's NSS library vs a specific version of OpenSSL.

This TLS metadata and signature is used to help identify the type of application or library that has initiated the TLS session. If the signature indicates the client is a library like NSS or BoringSSL, then it's likely the client is a web browser. If the signature indicates a certain version of OpenSSL, then it might be possible to correlate that to something malicious that bundles that specific OpenSSL version. If the certificate from the server is self-signed, then that's a clue. And so on.
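One publicly documented way to turn this kind of metadata into a signature is the open-source JA3 method, which joins the values a client offers into one string and hashes it. The sketch below follows that style; it is not a description of ETA's own signature format, it skips details of the real JA3 method such as ignoring GREASE values, and the field values and the label are invented for the example.

```python
import hashlib
from typing import Dict, Sequence


def ja3_string(version: int, cipher_suites: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Join the offered values: commas between fields, dashes within a field."""
    groups = (cipher_suites, extensions, curves, point_formats)
    return ",".join([str(version)] + ["-".join(str(v) for v in g) for g in groups])


def ja3_style_fingerprint(version: int, cipher_suites: Sequence[int],
                          extensions: Sequence[int], curves: Sequence[int],
                          point_formats: Sequence[int]) -> str:
    """MD5 of the joined string, used only as a compact lookup key."""
    raw = ja3_string(version, cipher_suites, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Two hypothetical clients that offer different suites, extensions, and curves.
browser_like = ja3_style_fingerprint(771, [4865, 4866, 49195], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
other_client = ja3_style_fingerprint(771, [49199, 156, 47], [0, 10, 11], [23, 24], [0])

# Hypothetical lookup table mapping fingerprints to what produced them.
known_clients: Dict[str, str] = {browser_like: "browser-like client (hypothetical label)"}

for fingerprint in (browser_like, other_client):
    print(fingerprint, known_clients.get(fingerprint, "unknown client"))
```

Because the order of the offered values is part of the string, two libraries that support the same cipher suites but list them differently end up with different fingerprints.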

But now the logical question is, how does ETA eliminate false positives? How does ETA know that OpenSSL version such-and-such is an indicator of something suspicious? And does ETA provide an ETA on these answers?

The Magic Cloud

If there is any actual magic involved in ETA, this is where it would come into play. And I say that because all I can really do for this section of the post is do some hand waving, throw out the magic words "machine learning", and hope that you nod and smile. I don't have a good understanding of how ML works and I believe neither does most of the general population. Hence... magic.

The bit I do understand is this: there are a bunch of fancy algorithms running in Cisco's cloud that get sent some of the data that ETA collects from the network (not everything is sent up because, privacy). This data is used to train the algorithms to recognize what looks benign and what looks malicious for your specific environment. Part of the power of having all this run in Cisco's cloud is that the ML platform's algorithms are further informed by the security intelligence data that Cisco collects from its global footprint of security appliances, firewalls, sensors, and partners (to the tune of 1.5 million malware samples, 600 billion email messages, and 16 billion web requests every day [Source: Cisco Live! BRKSEC-2010]). This is what the marketing material is describing when it says a "global threat map" is used to protect the "local environment".

Cisco calls their machine learning platform Cisco Cognitive Threat Analytics (CTA) so when you see that name, whether in the context of ETA or some other security solution, just think "big compute brain in the cloud".

This big cloudy brain is using all of the data at its disposal plus all of its training on your specific data to classify each encrypted flow based on its SPLT and TLS signatures. And it can do this with impressive results: over 99% accuracy with only 1 false positive in 10,000 TLS sessions.
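To put "1 false positive in 10,000 TLS sessions" into perspective, the arithmetic below scales that rate to a few session volumes. The volumes are hypothetical and chosen only to show how the count grows; the rate is the one quoted above.

```python
def expected_false_positives(benign_sessions: int,
                             false_positive_rate: float = 1 / 10_000) -> float:
    """Expected number of benign sessions wrongly flagged at the given rate."""
    return benign_sessions * false_positive_rate


# Hypothetical daily session counts for small, medium, and large networks.
for sessions in (10_000, 1_000_000, 50_000_000):
    flagged = expected_false_positives(sessions)
    print(f"{sessions:>12,} sessions -> about {flagged:,.0f} false positives")
```

Even a very low rate adds up on a busy network, which is one reason the results are meant to be one input among several rather than the only line of defense.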

No Thanks, I Have my Secure Web Gateway

Yep, fair enough. But consider this: do you only have a web gateway? Hopefully the answer is no and you have multiple security assets deployed to achieve a layered architecture. Layers, layers, layers. You can't rely on any one asset to protect you.

Further consider how hard it's becoming to apply secure web gateway technology to TLS encrypted sessions.

First, there are the long-running headaches of running a CA and deploying the CA's certificate to all of the client devices in the environment. All of the platforms, all of the operating systems. With all of their different management tools. And then there's the pain of redirecting flows to the gateway (WCCP anyone?!). But even if you've invested the time and money and have that all working, there's still...
Security mechanisms such as HTTP Public Key Pinning (HPKP) that instruct a web client to only connect to the site if the certificate chain the site presents contains one of a specific set of pinned public keys, typically those of particular CAs. Hint: the certificate your web gateway presents to your clients will be signed by your internal CA whose key will not be in that set and your users will call to let you know how this displeases them.
A new version of TLS, v1.3, adds some changes to the protocol that break web gateway (aka, "middleboxes") behavior with respect to intercepting and inspecting TLS-encrypted sessions. Even after the horrible initial breakage was fixed with draft 22 of the protocol, the spec still makes changes that will limit the information that web gateways are able to see in the TLS exchange making them less effective.
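To see why pinning and interception collide, here is a minimal sketch of the check behind HPKP as specified in RFC 7469, where a pin is the base64-encoded SHA-256 digest of a certificate's DER-encoded public key information. The function names are made up for this example and the byte strings are placeholders standing in for real keys.

```python
import base64
import hashlib
from typing import Iterable, Set


def pin_sha256(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def chain_matches_pins(chain_spkis: Iterable[bytes], pinned: Set[str]) -> bool:
    """True if any public key in the presented chain is in the pinned set."""
    return any(pin_sha256(spki) in pinned for spki in chain_spkis)


# Placeholder byte strings stand in for real DER-encoded public keys.
site_key, public_ca_key = b"site-key", b"public-ca-key"
gateway_key, internal_ca_key = b"gateway-key", b"internal-ca-key"

# The site pins the public CA that normally issues its certificate.
pins = {pin_sha256(public_ca_key)}

print(chain_matches_pins([site_key, public_ca_key], pins))        # True: direct connection
print(chain_matches_pins([gateway_key, internal_ca_key], pins))   # False: re-signed by the gateway
```

The chain presented by an intercepting gateway contains none of the pinned keys, so the check fails no matter how carefully the internal CA was rolled out.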

And also keep in mind that ETA is embedded in the network so it sees traffic to/from the Internet and also traffic that remains entirely inside your firewall (web gateways can't do anything about the traffic they don't see).

You should now be able to explain Cisco ETA to anyone who has a post box. Thanks for reading. Have a groovy day.

Disclaimer: The opinions and information expressed in this blog article are my own and not necessarily those of Cisco Systems.