
Setting up private GKE automation using Terraform

Introduction

One of the most important decisions when creating a GKE cluster is whether it will be public or private. Nodes in a public cluster are assigned both public and private IP addresses and can be reached from anywhere on the internet. Nodes in a private cluster, by contrast, are assigned only private IP addresses, which isolates them from inbound and outbound internet traffic: clients cannot reach the nodes directly, and the nodes can only reach the internet through Cloud NAT. This provides better security.

Secure GKE Private Cluster

Today, we are going to learn about and deploy a GKE private cluster. In GKE, a private cluster is one whose nodes are isolated from inbound and outbound internet traffic because they are assigned internal IP addresses only. Private clusters in GKE can expose the control plane endpoint either as a publicly accessible address or as a private address.

Please keep in mind that nodes in a private cluster are assigned private IPs only. This means they are isolated from inbound and outbound communication until you configure Cloud NAT. The NAT service allows nodes in the private network to reach the internet, enabling them to download the required images from Docker Hub or another public registry; if you restrict both incoming and outgoing traffic, use a private registry instead. One more point to remember is that private clusters can have private endpoints as well as public endpoints.

Kubernetes (K8s) is one of the best container orchestration platforms today, but it is also complex. Because Kubernetes aims to be a general solution for specific problems, it has many moving parts. Each of these parts must be understood and configured to meet your unique requirements.

In this demo, we will create the following resources:

  • A network named vpc-h-cluster-1.
  • A subnetwork named subnet1.
  • A private cluster named gke-h-cluster-1 with private nodes and client access to the public endpoint.
  • A managed node pool with n nodes, where n depends on the environment.
  • A Cloud NAT gateway named nat-h-cluster-1.

To provide outbound internet access for your private nodes, for example to pull images from an external registry, create a Cloud Router and configure Cloud NAT on it. Cloud NAT lets private clusters establish outbound connections to the internet to send and receive packets.

Pre-requisite

1. A GCP Account with one Project.

2. A Service Account (SA) with the appropriate permissions to provision the resources.

3. The gcloud CLI configured with the service account for the Terraform deployment (a minimal sketch follows this list).

4. Terraform
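If the service account is not yet configured in gcloud, a minimal sketch looks like this (the key file path and project ID are placeholders; adjust them to your environment):

# Authenticate gcloud with the service account key used for Terraform
gcloud auth activate-service-account --key-file=/path/to/sa-key.json
gcloud config set project <project-id>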

Now we’re ready to get started:

Here is the Terraform configuration code that will be used to deploy our entire setup.  

Following is the directory hierarchy:  

Devops-toolchain
├── GKE_cluster_commission.sh
├── README.md
├── gke.tf
├── keys
│   └── env.key
├── nat.tf
├── outputs.tf
├── route.tf
├── terraform.tfstate
├── terraform.tfvars
├── variables.tf
└── vpc.tf

Step 1. The Terraform source files are placed in the GitLab repository. Please check out the repository to get a better understanding:

https://gitlab.com/tweeny-dev/devops-toolchain/-/tree/GKE-Cluster/AWS_to_GCP_migration_config

Step 2. Following is the code for provisioning the Virtual Private Cloud (VPC) with a custom subnet.

vpc.tf

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.27.0"
    }
  }

  required_version = ">= 0.14"
}

provider "google" {
  project = var.project_id
  region  = var.region
  # credentials = file(var.auth_file)
}

# VPC
resource "google_compute_network" "vpc" {
  name                    = "vpc-${var.cluster_name}"
  # Creating a custom subnet for the VPC
  auto_create_subnetworks = "false"
}

resource "google_compute_subnetwork" "subnet" {
  name          = "subnet1"
  # We are creating a custom subnet for the us-central1 region only
  region        = "us-central1"
  network       = google_compute_network.vpc.name
  ip_cidr_range = "10.0.0.0/24"
}
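The subnet above only defines the primary range; because the cluster definition in the next step uses an empty ip_allocation_policy block, GKE creates the secondary Pod and Service ranges automatically. If you prefer to manage those ranges yourself, a minimal sketch looks like the following (the range names and CIDRs are illustrative, not part of the original setup):

resource "google_compute_subnetwork" "subnet" {
  name          = "subnet1"
  region        = "us-central1"
  network       = google_compute_network.vpc.name
  ip_cidr_range = "10.0.0.0/24"

  # Optional: explicitly managed secondary ranges for Pods and Services
  secondary_ip_range {
    range_name    = "pods-range"
    ip_cidr_range = "10.4.0.0/14"
  }
  secondary_ip_range {
    range_name    = "services-range"
    ip_cidr_range = "10.8.0.0/20"
  }
}

You would then reference the range names via cluster_secondary_range_name and services_secondary_range_name inside the cluster's ip_allocation_policy block.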

Step 3. The following code creates the cluster and the node pool.

gke.tf

1resource "google_container_cluster" "primary" {
2  name     = var.cluster_name
3  location = var.region
4
5  # We can't create a cluster with no node pool defined, but we want to only use
6  # separately managed node pools. So we create the smallest possible default
7  # node pool and immediately delete it.
8  remove_default_node_pool = true
9  initial_node_count       = 1
10
11  network    = google_compute_network.vpc.name
12  subnetwork = google_compute_subnetwork.subnet.name
13  private_cluster_config {
14    # Disabled private endpoint and public endpoint is enabled 
15    enable_private_endpoint = "false"
16  # Making the nodes as private by which they won't have public ip allocated
17    enable_private_nodes    = "true"
18    master_ipv4_cidr_block  = var.master_ipv4_cidr_block
19
20}
21
22ip_allocation_policy {
23  }
24dynamic "master_authorized_networks_config" {
25    for_each = var.master_authorized_networks_config
26    content {
27      dynamic "cidr_blocks" {
28        for_each = lookup(master_authorized_networks_config.value, "cidr_blocks", [])
29        content {
30          cidr_block   = cidr_blocks.value.cidr_block
31          display_name = lookup(cidr_blocks.value, "display_name", null)
32        }
33      }
34    }
35  }
36}
37
38#Separately Managed Node Pool
39resource "google_container_node_pool" "primary_nodes" {
40  name       = google_container_cluster.primary.name
41  location   = var.region
42  node_locations = var.node_locations
43  cluster    = google_container_cluster.primary.name
44  node_count = var.gke_num_nodes
45# we have the autoscaling enabled to scale the application when needed
46  autoscaling {
47    min_node_count = var.min_node
48    max_node_count = var.max_node
49  }
50
51  management {
52    auto_repair  = "true"
53    auto_upgrade = "true"
54  }
55  node_config {
56
57    labels = {
58      env = var.project_id
59    }
60
61    # preemptible  = true
62    image_type   = var.image_type
63    machine_type = var.machine_type
64    tags         = ["gke-node", "${var.project_id}-gke"]
65    metadata = {
66      disable-legacy-endpoints = "true"
67    }
68        service_account = var.service_account
69    oauth_scopes = [
70      "https://www.googleapis.com/auth/cloud-platform"
71    ]
72  }
73}

Step 4. Now it's time to create the NAT gateway for our private cluster. Here we have used a module that is capable of creating the NAT gateway.

nat.tf

1module "cloud-nat" { 
2  source                             = "terraform-google-modules/cloud-nat/google" 
3  version                            = "~> 2.0" 
4  project_id                         = var.project_id 
5  region                             = var.region 
6  router                             = google_compute_router.router.name 
7  name                               = "nat-${var.cluster_name}" 
8  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES" 
9} 

Step 5: The NAT gateway needs a Cloud Router to communicate with the outside network. Let's define the router configuration.

route.tf

1resource "google_compute_router" "router" {
2  project = var.project_id
3  name    = "router-${var.cluster_name}"
4  network = google_compute_network.vpc.name
5  region  = var.region
6}
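Once the configuration has been applied (Step 9), one way to verify the router and NAT gateway from the CLI is shown below; the names follow the router-<cluster-name> and nat-<cluster-name> convention used above, and the region matches this demo:

gcloud compute routers list --regions=us-central1
gcloud compute routers nats list --router=router-<cluster-name> --region=us-central1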

Step 6: We have used variables throughout the Terraform configuration, so now we need to declare them.

variables.tf

1variable "project_id" {
2  description = "project id"
3}
4
5variable "region" {
6  description = "region"
7}
8
9
10variable "gke_num_nodes" {
11  description = "number of gke nodes"
12}
13
14variable "master_ipv4_cidr_block" {
15  description = "The IP range in CIDR notation (size must be /28) to use for the hosted master network"
16  type        = string
17  default     = "10.13.0.0/28"
18
19}
20variable "master_authorized_networks_config" {
21  description = <<EOF
22  The desired configuration options for master authorized networks. Omit the nested cidr_blocks attribute to disallow external access (except the cluster node IPs, which GKE automatically whitelists)
23  ### example format ###
24  master_authorized_networks_config = [{
25    cidr_blocks = [{
26      # We are not restricting the access to control pane. If needed then we need define CIDR range to allow the access.
27      cidr_block   = "0.0.0.0/0"
28      display_name = "example_network"
29    }],
30  }]
31EOF
32  type        = list(any)
33  default     = []
34}
35variable "cluster_name" {
36  description = "cluster name"
37  type        = string
38
39}
40
41variable "image_type" {
42  description = "container image type"
43  type        = string
44  default     = "cos_containerd"
45
46}
47
48variable "machine_type" {
49  description = "node image type"
50  type        = string
51  default     = "e2-standard-2"
52}
53
54variable "node_locations" {
55  description = "Zone names on which nodes will be provisioned"
56  type = list(string)
57}
58
59variable "min_node" {
60  description = "minimum no of nodes for autoscaling"
61  type        = number
62}
63
64variable "max_node" {
65  description = "maximum no of nodes for autoscaling"
66  type        = number
67}
68
69variable "service_account" {
70  description = "service account name"
71  type = string
72}

Step 7: We have only declared some of the variables without defining their values. Terraform provides a way to define secret variables in the terraform.tfvars file. We have the following variables defined there:

project_id = "<project-id>"
region     = "us-central1"
service_account = "<service-account-name>"

Step 8: Finally, we want to print the basic cluster information as output after successful provisioning.

Here is the information.

outputs.tf

1output "region" {
2  value       = var.region
3  description = "GCloud Region"
4}
5
6output "project_id" {
7  value       = var.project_id
8  description = "GCloud Project ID"
9}
10
11output "kubernetes_cluster_name" {
12  value       = google_container_cluster.primary.name
13  description = "GKE Cluster Name"
14}
15
16output "kubernetes_cluster_host" {
17  value       = google_container_cluster.primary.endpoint
18  description = "GKE Cluster Host"
19}

Step 9: We have prepared a parameterized shell script that runs the Terraform code to provision the private GKE cluster.

GKE_cluster_commission.sh

#!/bin/bash

# This script takes 2 parameters: CLUSTER_NAME and ENVIRONMENT (dev or prod)
cd "$BASE_PROJECT_PATH"
ENV=$2
ROOT_DIR="$(pwd)"
rm -rf .terraform
if [[ $2 == "dev" ]]
then
        # DEV cluster deployment with 2 AZs and 2 nodes
        terraform init
        terraform validate
        terraform plan -var "cluster_name=$1" -var-file "$ROOT_DIR/vars/$ENV.tfvars" -out=".$ENV.plan"
        terraform apply ".$ENV.plan"
elif [[ $2 == "prod" ]]
then
        # PROD cluster deployment with 3 AZs and 3 nodes
        terraform init
        terraform validate
        terraform plan -var "cluster_name=$1" -var-file "$ROOT_DIR/vars/$ENV.tfvars" -out=".$ENV.plan"
        terraform apply ".$ENV.plan"
else
        echo "Please enter dev or prod as the 2nd parameter"
fi

Execute the following command to run the script. The source code directory is set in a separate file (keys/env.key), so we need to source it before running the script.

source ./keys/env.key && bash GKE_cluster_commission.sh <cluster-name> <environment>
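The env.key file is expected to export the environment variables the script relies on; at minimum this is BASE_PROJECT_PATH. A sketch of its contents (the path is a placeholder, the actual file may differ) would be:

# keys/env.key -- illustrative
export BASE_PROJECT_PATH=/path/to/Devops-toolchain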

For this demo, we are using a local Terraform state file, which is limited to the local system. However, we can keep the Terraform state file in a cloud location (AWS S3, GCS, etc.) by defining a backend configuration; this keeps versions of the state file and makes it accessible to multiple people.
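For example, a minimal GCS backend configuration (the bucket name and prefix here are placeholders) could be added to the existing terraform block in vpc.tf:

terraform {
  backend "gcs" {
    bucket = "<state-bucket-name>"
    prefix = "gke/private-cluster"
  }
}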

Step 10: Terraform will take up to 20 minutes to provision the cluster, and if everything goes well, we will see that the following resources have been created.

Command: terraform state list
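The exact listing depends on the module internals, but based on the configuration above it should contain at least entries such as:

google_compute_network.vpc
google_compute_subnetwork.subnet
google_compute_router.router
google_container_cluster.primary
google_container_node_pool.primary_nodes

plus the NAT resources created inside the cloud-nat module (addresses beginning with module.cloud-nat).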

Google Cloud Console screenshots (captions):

  • Private cluster status is green and ready to use
  • Cluster is regional and has the public endpoint enabled
  • Private cluster with custom VPC/subnet
  • Custom subnet with primary and secondary IP ranges
  • Cloud NAT

Now the question is: how do you access and connect to your private cluster?

Since the public endpoint is enabled in this private cluster, we can connect to the cluster from anywhere outside the GCP network. We can restrict access to the cluster by defining a CIDR range in the authorized networks. In this demo, we have allowed all networks to connect to the cluster in the authorized networks (0.0.0.0/0).

It is also possible to restrict access to only the specific instances that need it by listing their IP addresses in the authorized networks, for example:

master_authorized_networks_config = [{
  cidr_blocks = [{
    cidr_block = "10.20.30.40/32"
  },
  {
    cidr_block = "36.78.134.80/32"
  }]
}]

As our cluster has the public endpoint enabled and accepts connections from any network, we can connect to the cluster from any machine with the following command.

gcloud container clusters get-credentials <cluster-name> --region us-central1 --project <project-id>
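Once the credentials are fetched (this assumes kubectl and the gke-gcloud-auth-plugin are installed locally), a quick way to confirm connectivity is to list the nodes; with private nodes, the EXTERNAL-IP column should show <none>:

kubectl get nodes -o wide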

Deploy sample application to Kubernetes cluster

Now that the cluster has been created and accessed successfully, it's time to deploy a sample application. We are deploying an nginx application to test the cluster.

Run the following command to create the deployment

kubectl create deployment nginx --image=nginx

The following command will expose nginx through a LoadBalancer:

kubectl expose deployment nginx --type=LoadBalancer --port=80

The nginx deployment has been created and exposed through an external load balancer.


Now go ahead, copy the public IP of the load balancer, and open it in a browser. You will see the nginx welcome page.
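The public IP can be read from the service's EXTERNAL-IP column once GCP finishes provisioning the load balancer (it shows <pending> until then):

kubectl get service nginx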
