Containers, Kubernetes & Databases

Unless you’ve been living in one for the past couple of years, you’ve probably been hearing about how containers and Kubernetes are the new way to deploy and manage applications.

Applications are being broken down into small services or ‘microservices’, and with all these new microservices comes the requirement to manage or ‘orchestrate’ them. This is where Kubernetes comes in.

Microsoft have publicly released SQL Server 2017 & 2019 as containers on Docker Hub.

As database specialists with great interest in new technologies, our tech experts Simon Lane and Tim Reddish followed Microsoft’s guide on how to deploy this container on AKS. Read their step-by-step findings below.

Microsoft’s guide seems very straightforward, but after reading it we wondered if – since Kubernetes originated at Google – we could leverage Google Cloud’s regional subnets, regional Kubernetes (GKE) and regional persistent disks to deploy SQL Server onto GCP in a failover configuration.

In theory, GCP’s regional persistent disks, which synchronously replicate between zones, should allow a database deployment running on a regional Kubernetes cluster to fail over to another availability zone in the event of a zonal failure. The regional disk should be automatically mounted on the new node, allowing the database Pod to start in the failover zone.

We found that, with small modifications to Microsoft’s config, we could achieve exactly this.

Let’s start by creating a regional Kubernetes cluster from GCP Cloud Shell using a simple Bash script. We’re using preemptible nodes to save money on this dev project.


#!/bin/bash
CLUSTER=dev-cluster-1
PROJECT=xyn-development
REGION=europe-west2

# Find the latest Kubernetes master version available in this region
LATEST_VERSION=$(gcloud beta container get-server-config --region ${REGION} --format='value(validMasterVersions[0])')

# Create a regional VPC-native cluster with preemptible nodes & masters around the London area
gcloud beta container clusters create ${CLUSTER} \
  --cluster-version=${LATEST_VERSION} \
  --machine-type=n1-standard-4 \
  --num-nodes=1 \
  --preemptible \
  --region=${REGION} \
  --project=${PROJECT} \
  --enable-ip-alias \
  --node-locations=europe-west2-a,europe-west2-b

# Use gcloud to authenticate kubectl
gcloud container clusters get-credentials ${CLUSTER} --project ${PROJECT} --region ${REGION}

After completion, the output should look something like this:

NAME           LOCATION      MASTER_VERSION  MASTER_IP       MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
dev-cluster-1  europe-west2  1.11.8-gke.6    35.197.203.109  n1-standard-4  1.11.8-gke.6  2          RUNNING
Fetching cluster endpoint and auth data.
kubeconfig entry generated for dev-cluster-1.

Now that we have a VPC-native cluster running across two zones, we can start the configuration.
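
If we want to confirm that the nodes really are spread across both zones, we can list them along with their zone label. This is just a quick sanity check; on clusters of this vintage the zone is exposed as failure-domain.beta.kubernetes.io/zone, while newer versions use topology.kubernetes.io/zone.

$ kubectl get nodes --label-columns=failure-domain.beta.kubernetes.io/zone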

First we have to define the storage with a StorageClass, setting the ‘replication-type’ parameter to ‘regional-pd’:

$ kubectl apply -f sc.yaml

sc.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: repd-eu-west2ab
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
  zones: europe-west2-a, europe-west2-b
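
As a quick check, we can confirm the StorageClass exists and that its parameters include the regional replication type:

$ kubectl get storageclass repd-eu-west2ab
$ kubectl describe storageclass repd-eu-west2ab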

Now we need to create a PersistentVolumeClaim, a request for storage that references the storage class:

$ kubectl apply -f pvc.yaml

pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssqldb-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: repd-eu-west2ab

Check the PVC:


$ kubectl get pvc
NAME          STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mssqldb-pvc   Bound     pvc-3a38cce4-77c2-11e9-b69f-42010a9a0083   200Gi      RWO            repd-eu-west2ab   50s
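
Behind the scenes, the provisioner has created a regional persistent disk in the project. To verify this we can look at the dynamically created PersistentVolume and list the GCE disks; the disk backing this PVC should be scoped to the europe-west2 region rather than a single zone. The generated disk name will differ per cluster, so the filter below is just illustrative.

$ kubectl get pv
$ gcloud compute disks list --filter="name~pvc" --project xyn-development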

At this point we can start the database. We will use a Deployment based on Microsoft’s YAML file. This will start a Pod with a single replica, requesting storage through the PVC we created earlier:


deployment.yaml:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2017-latest
        ports:
        - containerPort: 1433
        env:
        - name: MSSQL_PID
          value: "Developer"
        - name: ACCEPT_EULA
          value: "Y"
        - name: MSSQL_SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssqldb-pvc

This deployment also needs a password passed to it. Kubernetes allows us to store one as a Secret:


$ kubectl create secret generic mssql --from-literal=SA_PASSWORD="DTab4VkVebb8T5Fm2019"
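
If needed, we can check that the secret is in place. kubectl stores the value base64-encoded, so we decode it to verify:

$ kubectl get secret mssql -o jsonpath='{.data.SA_PASSWORD}' | base64 --decode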

Deploy the instance:


$ kubectl apply -f deployment.yaml
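
One way to wait for the Pod to become ready before moving on is to watch the rollout:

$ kubectl rollout status deployment/mssql-deployment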

Let’s view where the Pod is deployed:


$ kubectl get pods -l app=mssql -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP           NODE                                           NOMINATED NODE
mssql-deployment-69477bbb4c-kht9z   1/1       Running   0          1m        10.20.1.11   gke-dev-cluster-1-default-pool-2f445e15-4hpv   

To access the instance externally, let’s set up a TCP load balancer. This will be a regional resource that doesn’t proxy connections:


$ kubectl apply -f loadbalancer.yaml

loadbalancer.yaml

apiVersion: v1
kind: Service
metadata:
  name: mssql-lb-service
spec:
  type: LoadBalancer
  selector:
    app: mssql
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433

$ kubectl get services
NAME               TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)          AGE
kubernetes         ClusterIP      10.24.0.1     <none>           443/TCP          17m
mssql-lb-service   LoadBalancer   10.24.7.143   35.230.138.226   1433:30381/TCP   4m

Another way to find the load balancer IP:


$ kubectl get svc mssql-lb-service -o jsonpath='{.status.loadBalancer.ingress[].ip}'
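
For convenience we can capture that IP in a shell variable and reuse it in the sqlcmd calls that follow:

$ EXTERNAL_IP=$(kubectl get svc mssql-lb-service -o jsonpath='{.status.loadBalancer.ingress[].ip}')
$ echo ${EXTERNAL_IP}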

Create an ingress firewall rule allowing port 1433 – ***Warning – this allows all internet IP ranges; restrict to source IPs if needed***:


$ gcloud compute firewall-rules create allow-sql-server --allow tcp:1433
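
To lock this down, the rule can instead be created with a source range; 203.0.113.0/24 below is just a placeholder, so substitute your own office or VPN range:

$ gcloud compute firewall-rules create allow-sql-server --allow tcp:1433 --source-ranges=203.0.113.0/24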

Let’s now log in to SQL Server from a Linux server using the SQL Server command-line tool sqlcmd:


$ sqlcmd -W -S tcp:35.230.138.226 -U sa -P DTab4VkVebb8T5Fm2019

1> select @@version
2> go

Microsoft SQL Server 2017 (RTM-CU14) (KB4484710) - 14.0.3076.1 (X64)
        Mar 12 2019 19:29:19
        Copyright (C) 2017 Microsoft Corporation
        Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS)

(1 rows affected)

A healthy SQL Server instance isn’t much fun, so let’s try to bring it down by simulating a zonal failure. We can do this by deleting the managed instance group (MIG) that is running this Pod. (Under normal circumstances we do not touch the instances and instance groups that run the GKE cluster.)

GKE will have created two zonal managed instance groups, which is how the cluster scales horizontally. If we find the node our SQL Server is running on, we can simulate a zonal failure by manually deleting that zone’s MIG.

Find the node instance the database Pod is currently running on:


$ kubectl get pods -l app=mssql -o jsonpath='{.items..spec.nodeName}'
gke-dev-cluster-1-default-pool-2f445e15-4hpv

In the GCP web console we can see that this instance is in a MIG called gke-dev-cluster-1-default-pool-2f445e15-grp.
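
The same information is available from the command line if you prefer not to use the console:

$ gcloud compute instance-groups managed list --filter="name~dev-cluster-1"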

By deleting this MIG manually we can simulate a zone disaster:


$ gcloud compute instance-groups managed delete gke-dev-cluster-1-default-pool-2f445e15-grp  --zone=europe-west2-a

We are logged out of SQL Server when the Pod is destroyed. After a couple of minutes the Pod has migrated to another zone and node instance, and the regional disk has been mounted in the new zone:


$ kubectl get pods -l app=mssql -o jsonpath='{.items..spec.nodeName}'
gke-dev-cluster-1-default-pool-7cd1df38-11vq
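
At this point it’s worth confirming that the Pod has been rescheduled and is still bound to the same PersistentVolumeClaim:

$ kubectl get pods -l app=mssql -o wide
$ kubectl get pvc mssqldb-pvc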

Let’s log in again:


$ sqlcmd -S tcp:35.242.164.155 -U sa -P DTab4VkVebb8T5Fm2019

1> select @@version
2> go

Microsoft SQL Server 2017 (RTM-CU14) (KB4484710) - 14.0.3076.1 (X64)
        Mar 12 2019 19:29:19
        Copyright (C) 2017 Microsoft Corporation
        Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS)

(1 rows affected)

We can look at SQL Server’s startup messages in the error log:


# sqlcmd -W -S tcp:35.230.138.226 -U sa -P DTab4VkVebb8T5Fm2019
1> exec sys.sp_readerrorlog 0,1, 'Starting';
2> go
LogDate ProcessInfo Text
------- ----------- ----
2019-05-16 11:14:36.390 Server SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2019-05-16 11:14:36.930 spid6s Starting up database 'master'.
2019-05-16 11:14:37.850 spid6s SQL Server Audit is starting the audits. This is an informational message. No user action is required.
2019-05-16 11:14:38.030 spid23s Always On: The availability replica manager is starting. This is an informational message only. No user action is required.
2019-05-16 11:14:38.040 spid6s Starting up database 'msdb'.
2019-05-16 11:14:38.040 spid9s Starting up database 'mssqlsystemresource'.
2019-05-16 11:14:38.060 spid9s Starting up database 'model'.
2019-05-16 11:14:39.150 spid9s Starting up database 'tempdb'.

Logging in to the SQL Server container shows us the processes:


$ kubectl exec -it mssql-deployment-69477bbb4c-9krsm -- /bin/bash

# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.1 148380 23732 ?        Ssl  11:14   0:00 /opt/mssql/bin/sqlservr
root           9  1.1  5.5 4564944 854468 ?      Sl   11:14   1:21 /opt/mssql/bin/sqlservr
root        1048  0.2  0.0  18244  3180 ?        Ss   13:09   0:00 /bin/bash
root        1059  0.0  0.0  34424  2812 ?        R+   13:09   0:00 ps aux

There are a number of things we’d still like to cover, such as the performance of running SQL Server on Kubernetes, and we’ll pick these up in future articles.

To benchmark the performance of this SQL Server deployment in GKE, we’d like to use HammerDB to run an online transaction processing (OLTP) benchmark. Of the various tools available, HammerDB is the leading benchmarking and load-testing software for the world’s most popular databases and works well with SQL Server.

SQL Server running on GKE with a regional persistent disk for DR purposes recovered relatively quickly during our simulated failure. But what happens when this recovery time isn’t fast enough for mission-critical systems? The go-to on-premises option for HA in SQL Server in recent years has been Always On availability groups, and this is also an option for SQL Server running in Kubernetes.