The AWS Cloud Development Kit (CDK) is an "open source software development framework to define your cloud application resources using familiar programming languages". When CDK launched in 2019, I remember reading the announcement and thinking, "Ok, AWS wants their own Terraform-esque tool. No surprise given how popular Terraform is." Months later, my friend and colleague Matt M. was telling me how he was using CDK in a project he was working on and how crazy cool it was.

I finally decided to give CDK a go for one of my projects. Here is what I discovered.

Composing and sharing

A key concept in CDK is that everything is a construct. A construct represents one or more cloud components and can be as small as a single resource or as large as a multi-account distributed application. Constructs can be nested, so one construct can use other constructs. Constructs are composed into stacks, which are deployed to AWS.

This concept of constructs becomes really powerful when you think about reusable infrastructure-as-code artifacts. For example, consider a scenario where you have to deploy an AWS Virtual Private Cloud (VPC) multiple times (perhaps in different accounts, or into different dev/test/prod environments). And let's say that you always want a specific Security Group configured in that VPC which allows ingress traffic from a jump host. The VPC and Security Group can both be defined in a single CDK construct; the construct is made of the CDK code that defines this infrastructure. The code below demonstrates such a construct:

from aws_cdk import core as cdk
from aws_cdk.aws_ec2 import (
    Peer,
    Port,
    Protocol,
    SecurityGroup,
    Vpc
)

class CdkNowIGetIt(cdk.Construct):

    def __init__(self, scope: cdk.Stack, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Create the VPC resource.
        self._vpc = Vpc(self, "MyVPC", cidr="10.10.0.0/16")
        # Create a Security Group within the VPC that is used to allow
        # management traffic from designated jump hosts.
        self._sg = SecurityGroup(self, "MySG", vpc=self._vpc,
                                 allow_all_outbound=False,
                                 description="Management traffic from jump boxes",
                                 security_group_name="jumpbox-mgmt-traffic")

        # Add ingress rules to the Security Group for the jump host
        # 10.255.0.10 to TCP/22 and TCP/3389.
        self._sg.add_ingress_rule(peer=Peer.ipv4("10.255.0.10/32"),
                                  connection=Port(protocol=Protocol.TCP,
                                                  string_representation="host1",
                                                  from_port=22,
                                                  to_port=22))
        self._sg.add_ingress_rule(peer=Peer.ipv4("10.255.0.10/32"),
                                  connection=Port(protocol=Protocol.TCP,
                                                  string_representation="host1",
                                                  from_port=3389,
                                                  to_port=3389))

This construct can be consumed by a CDK app to instantiate copies of the infrastructure. Since each copy is being created from the same code blueprint, they all end up looking the same, just as desired.

On the topic of composition, the code above is itself an example. The Vpc and SecurityGroup classes are CDK constructs authored by AWS and shipped in the aws-cdk.aws-ec2 Python package (imported as aws_cdk.aws_ec2). They are composed together to form a new construct called CdkNowIGetIt.

This idea can be taken even further. CDK constructs can be packaged and shared. So the example of deploying multiple copies of the VPC can be expanded to actually sharing the construct with other builders or engineers to allow them to deploy their own copies of the infrastructure. And coming back to the composability of CDK constructs, those engineers could compose the VPC construct together with their own or other third-party constructs to build their entire infrastructure stack. Imagine having your own library of constructs that are vetted and approved for use in your environment that developers then consume in their code. This would really help ensure consistency, repeatability, and governance of the infrastructure.

In the example below, I show one possible method for packaging the CDK construct from above which has been written in Python:

~/git/cdk-now-i-get-it% python3 setup.py sdist
running sdist
[...]
Writing cdk_now_i_get_it-0.0.1/setup.cfg
Creating tar archive
~/git/cdk-now-i-get-it% ls -l dist
total 4
-rw-r--r--  1 joel  joel  2299 Apr  4 15:50 cdk_now_i_get_it-0.0.1.tar.gz

AWS CloudFormation and Terraform both have their own concept of modules which have varying degrees of composability and reusability. In practice, I see composition happening much more with Terraform than CloudFormation, so I feel that Terraform and CDK are fairly evenly matched here.

CDK excels on the next point, however.

Native language

As the CDK website says, "AWS CDK uses the familiarity and expressive power of programming languages for modeling your applications". In other words, instead of using a language such as YAML, JSON, or something bespoke to model the infrastructure, CDK uses native TypeScript, Python, and other supported programming languages to define the infrastructure.

This ability opens up an amazing amount of possibilities. Not only can the code describe cloud infrastructure, but it can do anything else the language is capable of as well.

  • Conditionals (if some condition is true, build the infrastructure this way, else, build it that way)
  • Loops (!) (for i in 1..10, build a cloud resource)
  • Unit tests (mock the API calls; did my conditions, loops, and calls all execute as expected?)
  • Integrations with other systems (look up parameters in a third-party data source)
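To make the unit-testing point a little more concrete, here is a minimal sketch using only the Python standard library. It takes a simpler route than mocking API calls: the conditional and the loop live in a plain function that can be tested without touching CDK or AWS at all. The function name plan_ingress_rules and the port range check are my own assumptions for illustration; they are not part of CDK or of the constructs in this post.

```python
import ipaddress
import unittest


def plan_ingress_rules(jump_host, mgmt_ports):
    """Return (peer_cidr, port) pairs, or [] when no jump host is given.

    Hypothetical helper: it only decides *which* rules are wanted. A
    construct could then loop over the result and create the real rules.
    """
    if not jump_host:
        return []
    # A bare address such as "10.255.0.10" is normalised to "10.255.0.10/32".
    peer = str(ipaddress.ip_network(jump_host))
    rules = []
    for port in mgmt_ports:
        port = int(port)
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        rules.append((peer, port))
    return rules


class PlanIngressRulesTest(unittest.TestCase):
    def test_no_jump_host_means_no_rules(self):
        self.assertEqual(plan_ingress_rules("", ["22"]), [])
        self.assertEqual(plan_ingress_rules(None, ["22"]), [])

    def test_one_rule_per_port(self):
        self.assertEqual(
            plan_ingress_rules("10.255.0.10", ["22", "3389"]),
            [("10.255.0.10/32", 22), ("10.255.0.10/32", 3389)],
        )

    def test_rejects_out_of_range_port(self):
        with self.assertRaises(ValueError):
            plan_ingress_rules("10.255.0.10/32", ["70000"])
```

Saved to a file, the tests can be run with python3 -m unittest. Keeping the decision logic separate from the resource creation is a design choice, not a CDK requirement, but it means the interesting branches can be exercised in milliseconds.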

Building on the example VPC construct from above, consider that instead of hardcoding the CIDR range of the VPC, the IP address of the jump host, and the ports the jump host is allowed to connect to, you want to dynamically acquire those values by looking them up in a database. For good measure, let's also throw in some conditions and loops.

Here is what the modified construct looks like:

from aws_cdk import core as cdk
from aws_cdk.aws_ec2 import (
    Peer,
    Port,
    Protocol,
    SecurityGroup,
    Vpc
)

class CdkNowIGetIt(cdk.Construct):

    def __init__(self, scope: cdk.Stack, construct_id: str,
                 vpc_cidr: str, jump_host: str, mgmt_ports: list,
                 **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # args:
        # - vpc_cidr (str): The CIDR range for the VPC.
        # - jump_host (str): An optional IP address for the jump host. If this
        #                    is not specified, the Security Group will not be
        #                    created.
        # - mgmt_ports (list): A list of TCP ports which the jump host is
        #                      allowed to connect to.

        # Create the VPC resource with the given CIDR range.
        self._vpc = Vpc(self, "MyVPC", cidr=vpc_cidr)

        # Security Group only created if the jump host parameter was
        # specified.
        if jump_host is not None and len(jump_host) > 0:
            self.create_sg(jump_host, mgmt_ports)

    def create_sg(self, jump_host, mgmt_ports):
        # Create a Security Group within the VPC that is used to allow
        # management traffic from designated jump hosts.
        self._sg = SecurityGroup(self, "MySG", vpc=self._vpc,
                                 allow_all_outbound=False,
                                 description="Management traffic from jump boxes",
                                 security_group_name="jumpbox-mgmt-traffic")

        # Add ingress rules to the Security Group
        for port in mgmt_ports:
            self._sg.add_ingress_rule(peer=Peer.ipv4(jump_host),
                                      connection=Port(protocol=Protocol.TCP,
                                                      string_representation="jump",
                                                      from_port=int(port),
                                                      to_port=int(port)))

And here is the CDK app code that calls the above construct:

from aws_cdk import core as cdk

from cdk_now_i_get_it.cdk_now_i_get_it_v2 import (
    CdkNowIGetIt as CdkNowIGetIt_v2
)

class MyStackv2(cdk.Stack):

    def __init__(self, scope: cdk.App, id: str, vpc_cidr: str,
                 jump_host: str, ports: list, **kwargs):
        super().__init__(scope, id, **kwargs)

        self._network = CdkNowIGetIt_v2(self, "CdkNowIGetItv2",
                                        vpc_cidr,
                                        jump_host,
                                        ports)

def get_params_from_database():
    # Ok, this isn't exactly a database, but it makes the point. This could
    # also be a call to a relational DB, an API call to an IPAM system, or
    # anything else.
    import csv
    with open("network.csv", newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            # Skip blank lines; match on the VPC name in the first column.
            if row and row[0] == "MyVPC":
                return {
                    "vpc_cidr": row[1],
                    "jump_host": row[2],
                    "ports": row[3].split(":")
                }
    # Fail loudly instead of silently returning None when no row matches.
    raise LookupError("No MyVPC row found in network.csv")

app = cdk.App()

params = get_params_from_database()
stack_v2 = MyStackv2(app, "MyStackv2",
                     params["vpc_cidr"],
                     params["jump_host"],
                     params["ports"])
app.synth()

The network.csv file represents the configuration "database" and holds the VPC CIDR range, the jump host IP address, and the TCP ports the jump host is allowed to connect to. (Note: the full code is available on GitHub.)
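To give a feel for the file layout the lookup expects, here is a small self-contained sketch that parses one row from a string instead of from disk. The sample row is an assumption built from the values used earlier in this post (the real network.csv is in the repo), and parse_network_row is a hypothetical stand-in for the file-based lookup.

```python
import csv
import io

# Assumed contents of network.csv: name, VPC CIDR, jump host, ports.
# Ports are separated by ":" because "," is already the column delimiter.
SAMPLE_NETWORK_CSV = "MyVPC,10.10.0.0/16,10.255.0.10/32,22:3389\n"


def parse_network_row(csv_text, vpc_name):
    """Return the parameters for vpc_name, or None if no row matches."""
    for row in csv.reader(io.StringIO(csv_text), delimiter=","):
        if row and row[0] == vpc_name:
            return {
                "vpc_cidr": row[1],
                "jump_host": row[2],
                "ports": row[3].split(":"),
            }
    return None


params = parse_network_row(SAMPLE_NETWORK_CSV, "MyVPC")
print(params)
# {'vpc_cidr': '10.10.0.0/16', 'jump_host': '10.255.0.10/32', 'ports': ['22', '3389']}
```

Note that the ports come back as strings, which is why the construct converts each one with int(port) before building the ingress rule.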

Proven defaults

The fact that CDK provides proven defaults in its constructs is vastly underrated. Consider again the creation of a VPC. A VPC on its own isn't much use; it needs subnets, one or more gateways, one or more route tables, security groups, and at least one network access control list. There could be upwards of a dozen additional resources that all need to be defined in the code.

By providing proven defaults, CDK creates these necessary resources as part of the Vpc construct. The way this works varies by construct, but the Vpc construct takes a CIDR range as a parameter, which is assigned to the VPC. The construct then carves this CIDR up into 3 public and 3 private subnets, provides a NAT gateway per zone, and creates an internet gateway for the VPC.
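To illustrate what "carving up" a CIDR range means, here is a rough sketch using Python's ipaddress module. This is not CDK's own allocation code: the choice of /19 blocks and the public-then-private ordering are assumptions I'm making purely to show the arithmetic of fitting six subnets into a /16.

```python
import ipaddress

vpc = ipaddress.ip_network("10.10.0.0/16")

# Six subnets are needed (3 public + 3 private). Splitting a /16 into /19s
# yields eight equal blocks, the smallest power of two that is >= 6.
blocks = list(vpc.subnets(new_prefix=19))

# Assumed ordering for illustration: public subnets first, then private.
kinds = ["public"] * 3 + ["private"] * 3
for kind, block in zip(kinds, blocks):
    print(f"{kind:8} {block}  ({block.num_addresses} addresses)")

# Two of the eight blocks are left over for future use.
spare = blocks[len(kinds):]
```

Each /19 holds 8192 addresses, so doing this by hand for every VPC is exactly the sort of bookkeeping a sensible default saves you from.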

These defaults don't work in every case, so it is possible to customize this behavior. But the best part is how easy that customization is. Customization here does not mean building or defining additional constructs. Again, this will depend on the construct, but for the Vpc construct it's just a matter of passing some parameters to the construct to modify its default behavior:

from aws_cdk import core as cdk
from aws_cdk.aws_ec2 import (
    Peer,
    Port,
    Protocol,
    SecurityGroup,
    SubnetConfiguration,
    SubnetType,
    Vpc
)

class CdkNowIGetIt(cdk.Construct):

    def __init__(self, scope: cdk.Stack, construct_id: str,
                 vpc_cidr: str, jump_host: str, mgmt_ports: list,
                 subnet_len: int, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # args:
        # - vpc_cidr (str): The CIDR range for the VPC.
        # - jump_host (str): An optional IP address for the jump host. If this
        #                    is not specified, the Security Group will not be
        #                    created.
        # - mgmt_ports (list): A list of TCP ports which the jump host is
        #                      allowed to connect to.
        # - subnet_len (int): The prefix length for subnet CIDR addresses.

        # Create the VPC resource. The VPC does not have an internet gateway,
        # or NAT gateway. Subnets are created in 2 zones.
        subnets = [SubnetConfiguration(name="MyVPC-Private",
                                       subnet_type=SubnetType.ISOLATED,
                                       cidr_mask=subnet_len)]
        self._vpc = Vpc(self, "MyVPC", cidr=vpc_cidr,
                        max_azs=2,
                        nat_gateways=None,
                        subnet_configuration=subnets)
        ...

Summary

CDK rocks. Its use of native programming languages makes it incredibly powerful. The ability to use the full power of the language you're writing your CDK code in makes CDK unique among infrastructure-as-code tools. And the proven defaults reduce the time it takes to get your code into usable shape. Need a VPC? It could be as little as one line of code. Creating a Lambda function? CDK will create the IAM role for you.

Reference

All of the code samples in this post are available on GitHub at github.com/knightjoel/cdk-now-i-get-it

Disclaimer: The opinions and information expressed in this blog article are my own and not necessarily those of Amazon Web Services or Amazon, Inc.