EKS Terraform and Helm

17/09/18, by Dario Ferrer

This post focuses heavily on the Kubernetes load balancer concept. It is the first post in a series about managing Kubernetes services in AWS and GCP using a pure Infrastructure as Code approach and SRE best practices. It is far from being the only post on the topic, but we decided to write it precisely because of the lack of detailed technical information about some of the aspects surrounding Kubernetes (K8s) in the “cloud world”.

Over the last few years, a big chunk of the DevOps and SRE communities has been adopting Docker container pipelines deployed on Kubernetes in the cloud, mainly AWS and GCP. Kubernetes is a great tool, and I love it too, but during this K8s transition some good SRE best practices are being relegated to second place, and that should not be happening.

Infrastructure as Code

Since Terraform appeared in our lives, some of us trashed CloudFormation (AWS), Deployment Manager (GCP), and other tools like Canonical Juju, custom code, scripts, etc. Now we are speaking about pure Infrastructure as Code. It’s easy to think of infrastructure as the network, VMs, firewall rules, etc. Following this principle, the non-infrastructure bits would be the data and the application code. If this is the case, then we would use Terraform or a similar tool to define the infrastructure, and then some combination of deployment tools and configuration management solutions (e.g. Jenkins + Puppet) to provision our applications. Well, this boundary is not so clear when you start talking about Kubernetes. There are some components of K8s that we consider part of the infrastructure, like network layers or load balancers. There is an interesting debate on whether these components should be considered part of the application, but I’m going to focus on the technical aspect of it.

Load Balancers

I think it is worth putting some emphasis on this topic, as it is not clearly explained in the Kubernetes documentation. First of all, the concept of “load balancer” varies between cloud providers: a GCP global HTTP load balancer is not the same as an AWS ALB. Kubernetes itself also has a kind of load balancer built in, the Ingress Controller, which basically distributes traffic once the connection has reached the K8s cluster. In the official K8s documentation, the “load balancer” is considered part of the K8s cluster. It has some integrations via Service Annotations to use specific resources from the different cloud providers, like an already existing SSL certificate, for example. When you “deploy” this “Service” in your cluster, what it really does is create an internal K8s pod running the Ingress Controller and “expose” it on the IP addresses of the cluster’s “worker” instances (virtual machines). It then provisions a specific type of cloud native load balancer (depending on the Annotations) and attaches that load balancer to the cluster’s worker instances. So this “Kubernetes” load balancer is actually a cloud native load balancer attached to the Kubernetes cluster. Personally, I found it very confusing that the documentation treats the “External Load Balancer” as a Kubernetes native component. In my opinion this happens because K8s was born in the cloud (GCP) and was designed to integrate with cloud native resources.
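The flow described above can be sketched with a minimal Service manifest. This is an illustrative sketch, not our production config: the names, selector, and certificate ARN are placeholders.

```yaml
# Sketch: a Service of type LoadBalancer. On AWS, the cloud controller
# sees this object, provisions an external load balancer, and registers
# the cluster's worker instances behind it.
apiVersion: v1
kind: Service
metadata:
  name: ingress-example        # placeholder name
  annotations:
    # Example annotation: reuse an existing ACM certificate (ARN is a placeholder)
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:REGION:ACCOUNT:certificate/ID"
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress         # assumed pod label for the Ingress Controller
  ports:
    - port: 443
      targetPort: 80
```

The annotations are where the cloud-provider-specific behaviour is wired in; without them you get the provider’s default load balancer type.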

  • Can we deploy the cloud load balancer separately and connect it to the Kubernetes cluster?

Yes

  • Which way is better?

If you are comfortable with K8s deployment commands and have an integrated way of deploying them, via Jenkins or Spinnaker, you may be happy with Kubernetes deploying and managing your cloud load balancers. In our case, we like to deploy all the infrastructure using Terraform, a pure “Infrastructure as Code” approach; we can then deploy the cloud resources related to the load balancer, such as CDN distributions, firewall rules, subnets, etc., using Terraform as well. There are some other resources, like the network plugin used in K8s, that follow the same pattern (native cloud vs. Kubernetes internal), but they are out of the scope of this article.

Example

Confusing? Let’s have a look at an example. This is a Helm Kubernetes package deployed on AWS EKS; I’ll showcase the two ways of deploying the “Load Balancer” service.

For the EKS cluster itself I used this EKS Terraform module, along with this VPC module for creating 3 private and 3 public AWS VPC subnets, each (public/private) pair of subnets in a different AZ (availability zone). Note: I’m not including the whole Terraform code here, just the relevant snippets; please also note that they contain user-defined variables, so a straight copy and paste won’t work for you. The EKS Terraform module brings up a Kubernetes cluster that uses the AWS VPC CNI plugin. This means that each pod gets an IP address directly attached to an ENI (virtual network interface) belonging to one of the K8s workers in the cluster, so the workers will have one or more ENIs with several secondary IP addresses. These addresses are part of the private subnet ranges that we defined in our VPC. The security groups attached to those ENIs have been created in Terraform as well.
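As a rough sketch (versions, CIDRs, and variable names here are illustrative, not the exact production code), the VPC and EKS module wiring looks like this:

```hcl
# Sketch only: illustrative values, adjust to your own environment.
module "vpc" {
  source          = "terraform-aws-modules/vpc/aws"
  name            = "eks-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
}

module "eks" {
  source       = "terraform-aws-modules/eks/aws"
  cluster_name = "${var.cluster_name}"
  subnets      = ["${module.vpc.private_subnets}"]
  vpc_id       = "${module.vpc.vpc_id}"
}
```

The workers land in the private subnets; the public subnets are where the Internet-facing load balancer will live.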

[Diagram: EKS CNI network ENIs — workers with ENIs holding secondary pod IP addresses]

Helm

We use Helm to package our Kubernetes deployments. In our template we use a public Helm chart for the Ingress Controller “Deployment”.

Extract of requirements.yaml:

dependencies:
  - name: nginx-ingress
    version: 0.15.0
    repository: https://kubernetes-charts.storage.googleapis.com

Load Balancer Managed by Terraform

This chart has a lot of default parameters that can be overridden. We pass the following config in values.yaml.

nginx-ingress:
  controller:
    hostNetwork: true
    service:
      type: NodePort
      annotations:
        kubernetes.io/ingress.class: nginx
      nodePorts:
        http: "30284"
      targetPorts:
        http: http

Right, the most important variable passed here is nginx-ingress.controller.service.type=NodePort. This prevents K8s from creating an AWS load balancer; instead it simply creates an internal K8s “Service” that exposes the Ingress Controller pod and listens on the workers’ ENI IPs. This all lives on “behind the NAT” internal IP addresses. So, where is the Internet-facing load balancer? It is defined directly in Terraform, and it has a Terraform-created ACM SSL certificate and a Terraform-created Route53 DNS Alias record. Here is the Terraform code to define it.

acm.tf:

resource "aws_acm_certificate" "eks_certificate" {
  domain_name       = "*.${var.domain_name}"
  validation_method = "DNS"

  tags = {
    Name          = "wildcard.${var.domain_name}"
    Environment   = "${var.environment}"
    Description   = "Wildcard certificate for ${var.domain_name}"
    ManagedBy     = "Terraform"
  }
}

data "aws_route53_zone" "zone" {
  name         = "${var.domain_name}"
}

resource "aws_route53_record" "acm_validator" {
  name    = "${aws_acm_certificate.eks_certificate.domain_validation_options.0.resource_record_name}"
  type    = "${aws_acm_certificate.eks_certificate.domain_validation_options.0.resource_record_type}"
  zone_id = "${data.aws_route53_zone.zone.id}"
  records = ["${aws_acm_certificate.eks_certificate.domain_validation_options.0.resource_record_value}"]
  ttl     = 60
}

resource "aws_acm_certificate_validation" "dns_validation" {
  certificate_arn         = "${aws_acm_certificate.eks_certificate.arn}"
  validation_record_fqdns = ["${aws_route53_record.acm_validator.fqdn}"]
}

route53.tf:

resource "aws_route53_record" "myapp" {
  zone_id = "${data.aws_route53_zone.zone.id}"
  name    = "myapp.${var.domain_name}"
  type    = "A"

  alias {
    name                   = "${module.eks_alb.dns_name}"
    zone_id                = "${module.eks_alb.load_balancer_zone_id}"
    evaluate_target_health = true
  }
}

For the load balancer, we decided to use an AWS ALB (Application Load Balancer), which terminates the HTTPS connections and handles SSL using the ACM certificate. It creates a target group using a port defined in the ${var.target_groups} variable, which we set to 30284; this is the same port that we specified for the Ingress Controller Service described above. Please note that we are also using a Terraform module from the Terraform Module Registry in the snippet below.

alb.tf:

module "eks_alb" {
  source                   = "terraform-aws-modules/alb/aws"
  version                  = "v3.4.0"
  load_balancer_name       = "eks-mayara-services"
  security_groups          = ["${module.eks.worker_security_group_id}", "${aws_security_group.alb_eks_sg.id}"]
  subnets                  = "${module.vpc.public_subnets}"
  tags                     = "${map("Environment", "${var.environment}")}"
  vpc_id                   = "${module.vpc.vpc_id}"
  https_listeners          = "${list(map("certificate_arn", "${aws_acm_certificate.eks_certificate.arn}", "port", 443))}"
  https_listeners_count    = "1"
  http_tcp_listeners       = "${list(map("port", "80", "protocol", "HTTP"))}"
  http_tcp_listeners_count = "1"
  target_groups            = "${var.target_groups}"
  target_groups_count      = "${length(var.target_groups)}"
  log_bucket_name          = "${aws_s3_bucket.log_bucket.id}"
}
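For reference, the ${var.target_groups} variable consumed by the module above could be defined along these lines. The key names follow the ALB module’s list-of-maps convention, but treat this as a sketch rather than the exact variable we use:

```hcl
# Sketch: one target group whose backend port matches the Ingress
# Controller's NodePort (30284). Health check path is an assumption.
variable "target_groups" {
  type        = "list"
  description = "Target groups for the EKS ingress ALB"

  default = [{
    name              = "eks-ingress"
    backend_protocol  = "HTTP"
    backend_port      = 30284
    health_check_path = "/healthz"
  }]
}
```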

OK, so we have a Kubernetes Service listening on the private IPs of the worker instances and a public-facing ALB, and all the infra, networking, security groups, etc., has been defined and controlled using Terraform. Great, but how do you attach the ALB to the K8s cluster? We just have to attach the autoscaling group or groups created for the EKS workers to the ALB target group created in the ALB module:

resource "aws_autoscaling_attachment" "asg_attachment_eks" {
  autoscaling_group_name = "${element(module.eks.workers_asg_names, count.index)}"
  alb_target_group_arn   = "${element(module.eks_alb.target_group_arns, count.index)}"
  count                  = "${length(var.target_groups)}"
}

As you can see, this is also managed by Terraform. That’s it: this works, it is pure Infrastructure as Code, and it can be deployed on multiple environments.

Load Balancer Managed by Kubernetes

The other way of getting a similar solution is to let K8s create the cloud load balancer; we control the load balancer type and characteristics through the “Annotations” passed to the Ingress Controller Service. There are various (some of them very confusing) documentation sources out there; this is a list of Annotations that I found for configuring an ALB. In our example, we instead created a classic ELB via the Ingress Controller before switching to the full Terraform solution described above; you will need slightly different annotations for the Ingress Controller, but the functionality will be very similar.

So, this is what the Helm values.yaml looks like:

nginx-ingress:
  rbac:
    serviceAccountName: "ingress"
  controller:
    service:
      type: LoadBalancer
      targetPorts:
        http: http
        https: http
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:412633938064:certificate/076eaf7b-c86b-4320-a54b-fakeid987"
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
        service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '3600'

This works as well: an ELB is created and attached directly to the workers’ ASG, but:

You have to make sure that all your subnets are properly tagged. I found very little documentation on this at the time of deploying it; I tagged the subnets in Terraform as follows:

public_subnet_tags = {
  "kubernetes.io/role/elb" = ""
  "KubernetesCluster"      = "${var.cluster_name}"
}

private_subnet_tags = {
  "kubernetes.io/role/internal-elb" = ""
  "KubernetesCluster"               = "${var.cluster_name}"
}

As you can see in the Terraform snippet, each subnet has two tags: the first with an empty value and the key kubernetes.io/role/elb or kubernetes.io/role/internal-elb, and the second with the key KubernetesCluster and the actual name of your K8s cluster as the value. Kubernetes will create the load balancer using the AWS API; it will also grab info about the subnets, security groups, ACM, etc., so you need to create an IAM role with permissions to do all that, attach it to the workers’ Auto Scaling group or instances, and then create an RBAC mapping so that Kubernetes auth also allows the Ingress Controller pod to access the AWS API. This implies a bit of a security concern, as traditionally Kubernetes applications are defined directly by the application developers, who may not necessarily be aware of the security implications of managing such infrastructure. We created and applied the RBAC config in Terraform. In the following snippet we can see a new ServiceAccount called ingress with permissions to do all the required tasks.

rbac.yaml:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: ingress-role
rules:
- apiGroups: [""]
  resources: ["secrets", "configmaps", "services", "endpoints"]
  verbs:
    - get
    - watch
    - list
    - proxy
    - use
    - redirect
    - create
    - update
- apiGroups: [""]
  resources: ["pods"]
  verbs:
    - list
- apiGroups: [""]
  resources: ["events"]
  verbs:
    - redirect
    - patch
    - post
    - create
    - update
- apiGroups:
    - "extensions"
  resources:
    - "ingresses"
    - "ingresses/status"
  verbs:
    - get
    - watch
    - list
    - proxy
    - use
    - redirect
    - update
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: ingress-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ingress-role
subjects:
- kind: ServiceAccount
  name: ingress
  namespace: kube-system

You’ll need to do a kubectl apply -f rbac.yaml to apply that, or integrate it into Terraform with a local provider. As you can see, we first defined a role and then created and bound the ServiceAccount to the role. In the values.yaml above, we pass this property to the Service:

nginx-ingress.rbac.serviceAccountName: "ingress"
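On the AWS side, the worker-node permissions mentioned above could be sketched as an IAM policy like the following. This is an illustrative sketch: the policy name, the `worker_iam_role_name` output, and the action list are assumptions, and you should scope the actions down for production:

```hcl
# Sketch: an illustrative policy attached to the EKS worker role so the
# Ingress Controller pod can create and wire up the ELB via the AWS API.
# The action list is a broad assumption; restrict it to what you need.
resource "aws_iam_role_policy" "worker_elb" {
  name = "eks-worker-elb"
  role = "${module.eks.worker_iam_role_name}" # assumed module output

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "elasticloadbalancing:*",
      "ec2:DescribeSubnets",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeInstances",
      "acm:DescribeCertificate",
      "acm:ListCertificates"
    ],
    "Resource": "*"
  }]
}
EOF
}
```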

Last but not least, we have to pass all those already-created parameters (cert ARN, subnets, SGs, etc.) to the Service annotations. This can be automated with proper CD tools, of course, but it adds unnecessary complexity.

So that is it: now a Kubernetes deployment will create the ELB for you. If you’re using Helm, like us, helm install . will deploy the whole app + ELB.

