Unless you’ve been living in one for the past couple of years, you’ve probably been hearing about how containers and Kubernetes are the new way to deploy and manage applications.
Applications are being broken down into small services, or ‘microservices’, and with all these new microservices comes the need to manage, or ‘orchestrate’, them. This is where Kubernetes comes in.
Microsoft have publicly released SQL Server 2017 and 2019 as containers on Docker Hub.
As database specialists with a great interest in new technologies, our tech experts Simon Lane and Tim Reddish followed Microsoft’s guide on how to deploy this container on AKS. Read their step-by-step findings below.
Microsoft’s guide seems very straightforward, but after reading it we wondered if – since Kubernetes originated at Google – we could leverage Google Cloud’s regional subnets, regional Kubernetes (GKE) and regional persistent disks to deploy SQL Server onto GCP in a failover configuration.
In theory, GCP’s regional persistent disks, which synchronously replicate between zones, should allow a database deployment running on a regional Kubernetes cluster to fail over to another availability zone in the event of a zonal failure; the regional disk should be automatically mounted on the new node, allowing the database Pod to start in the failover zone.
We found that, with small modifications to Microsoft’s config, we could achieve this.
Let’s start by creating a regional Kubernetes cluster from GCP Cloud Shell using a simple Bash script. We’re using preemptible nodes to save money on this dev project.
#!/bin/bash
CLUSTER=dev-cluster-1
PROJECT=xyn-development
REGION=europe-west2
# Find the latest Kubernetes master version available in this region
LATEST_VERSION=$(gcloud beta container get-server-config --region ${REGION} --format='value(validMasterVersions[0])')
# Create a regional VPC-native cluster with preemptible nodes & masters around the London area
gcloud beta container clusters create ${CLUSTER} \
  --cluster-version=${LATEST_VERSION} \
  --machine-type=n1-standard-4 \
  --num-nodes=1 \
  --preemptible \
  --region=${REGION} \
  --project=${PROJECT} \
  --enable-ip-alias \
  --node-locations=europe-west2-a,europe-west2-b
# Use gcloud to authenticate kubectl
gcloud container clusters get-credentials ${CLUSTER} --project ${PROJECT} --region ${REGION}
After completion the output should be something like:
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
dev-cluster-1 europe-west2 1.11.8-gke.6 35.197.203.109 n1-standard-4 1.11.8-gke.6 2 RUNNING
Fetching cluster endpoint and auth data.
kubeconfig entry generated for dev-cluster-1.
Now that we have a VPC-native cluster running across two zones, we can start the configuration.
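As an optional sanity check we can list the nodes together with their zone label to confirm the cluster really does span both zones (on clusters of this vintage the label is failure-domain.beta.kubernetes.io/zone; newer Kubernetes versions use topology.kubernetes.io/zone instead):
$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone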
First, we define the storage with a StorageClass, using a parameter called ‘replication-type’ set to ‘regional-pd’:
$ kubectl apply -f sc.yaml
sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: repd-eu-west2ab
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
  zones: europe-west2-a, europe-west2-b
Now we need to create a PersistentVolumeClaim, a request for storage that references the storage class:
$ kubectl apply -f pvc.yaml
pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mssqldb-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: repd-eu-west2ab
Check the PVC:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mssqldb-pvc Bound pvc-3a38cce4-77c2-11e9-b69f-42010a9a0083 200Gi RWO repd-eu-west2ab 50s
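We can also describe the bound PersistentVolume to confirm it was provisioned as a regional disk replicated across both zones (the volume name is the one from the output above and will differ in your environment):
$ kubectl describe pv pvc-3a38cce4-77c2-11e9-b69f-42010a9a0083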
At this point we can start the database. We will use a Deployment based on Microsoft’s YAML file. This will start a Pod with a single replica, requesting a volume via the PVC we created earlier:
deployment.yaml:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mssql-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mssql
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2017-latest
        ports:
        - containerPort: 1433
        env:
        - name: MSSQL_PID
          value: "Developer"
        - name: ACCEPT_EULA
          value: "Y"
        - name: MSSQL_SA_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mssql
              key: SA_PASSWORD
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: mssqldb-pvc
This deployment also needs a password passed to it, which Kubernetes allows us to store as a Secret:
$ kubectl create secret generic mssql --from-literal=SA_PASSWORD="DTab4VkVebb8T5Fm2019"
Deploy the instance:
$ kubectl apply -f deployment.yaml
Let’s view where the Pod is deployed:
$ kubectl get pods -l app=mssql -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
mssql-deployment-69477bbb4c-kht9z 1/1 Running 0 1m 10.20.1.11 gke-dev-cluster-1-default-pool-2f445e15-4hpv
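If the Pod is still being created, we can wait for the Deployment to finish rolling out before moving on (an optional convenience step, not part of Microsoft’s guide):
$ kubectl rollout status deployment/mssql-deployment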
To reach the instance externally, let’s set up a TCP load balancer. This will be a regional resource that doesn’t proxy connections:
$ kubectl apply -f loadbalancer.yaml
loadbalancer.yaml
apiVersion: v1
kind: Service
metadata:
  name: mssql-lb-service
spec:
  type: LoadBalancer
  selector:
    app: mssql
  ports:
  - protocol: TCP
    port: 1433
    targetPort: 1433
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.24.0.1 <none> 443/TCP 17m
mssql-lb-service LoadBalancer 10.24.7.143 35.230.138.226 1433:30381/TCP 4m
Another way to find the load balancer IP:
$ kubectl get svc mssql-lb-service -o jsonpath='{.status.loadBalancer.ingress[].ip}'
Create an ingress firewall rule allowing port 1433. ***Warning – this allows all internet IP ranges; restrict to source IPs if needed***:
$ gcloud compute firewall-rules create allow-sql-server --allow tcp:1433
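To follow the warning above and restrict access, the rule could instead be created with a source range (the 203.0.113.0/24 range below is just a placeholder; substitute your own client IPs):
$ gcloud compute firewall-rules create allow-sql-server --allow tcp:1433 --source-ranges=203.0.113.0/24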
Let’s now log in to SQL Server from a Linux machine with the SQL Server command-line tool sqlcmd:
$ sqlcmd -W -S tcp:35.230.138.226 -U sa -P DTab4VkVebb8T5Fm2019
1> select @@version
2> go
Microsoft SQL Server 2017 (RTM-CU14) (KB4484710) - 14.0.3076.1 (X64)
Mar 12 2019 19:29:19
Copyright (C) 2017 Microsoft Corporation
Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS)
(1 rows affected)
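As an optional extra step (not part of Microsoft’s guide), we can create a small test database in the same session so that we can check the data survives the failover later; the name failover_test is arbitrary:
1> CREATE DATABASE failover_test;
2> go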
A healthy SQL Server instance isn’t much fun, so let’s try to bring it down by simulating a zonal failure. We can do this by deleting the managed instance group (MIG) that is running this Pod. (Under normal circumstances we would not touch the instances and instance groups that run the GKE cluster.)
GKE will have created two zonal managed instance groups, which is how the cluster scales horizontally. If we find the node our SQL Server Pod is running on, we can simulate a zonal failure by manually deleting the MIG in that zone.
Find the node instance the database Pod is currently running on:
$ kubectl get pods -l app=mssql -o jsonpath='{.items..spec.nodeName}'
gke-dev-cluster-1-default-pool-2f445e15-4hpv
In the GCP web console we can see that this instance is in a MIG called gke-dev-cluster-1-default-pool-2f445e15-grp.
By deleting this MIG manually we can simulate a zone disaster:
$ gcloud compute instance-groups managed delete gke-dev-cluster-1-default-pool-2f445e15-grp --zone=europe-west2-a
We are logged out of SQL Server when the Pod is destroyed. After a couple of minutes the Pod has migrated to another zone and node instance, and the disk has been mounted in the new zone:
$ kubectl get pods -l app=mssql -o jsonpath='{.items..spec.nodeName}'
gke-dev-cluster-1-default-pool-7cd1df38-11vq
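While the failover was in progress we could also have watched the Pod being rescheduled in real time (purely observational; output varies from run to run):
$ kubectl get pods -l app=mssql -o wide --watch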
Let’s log in again:
$ sqlcmd -S tcp:35.242.164.155 -U sa -P DTab4VkVebb8T5Fm2019
1> select @@version
2> go
Microsoft SQL Server 2017 (RTM-CU14) (KB4484710) - 14.0.3076.1 (X64)
Mar 12 2019 19:29:19
Copyright (C) 2017 Microsoft Corporation
Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS)
(1 rows affected)
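If the optional failover_test database was created before the failure, we can confirm it is still present, which shows the data followed the regional disk across zones:
1> SELECT name FROM sys.databases;
2> go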
We can look at the SQL Server startup messages in the error log:
# sqlcmd -W -S tcp:35.230.138.226 -U sa -P DTab4VkVebb8T5Fm2019
1> exec sys.sp_readerrorlog 0,1, 'Starting';
2> go
LogDate ProcessInfo Text
------- ----------- ----
2019-05-16 11:14:36.390 Server SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2019-05-16 11:14:36.930 spid6s Starting up database 'master'.
2019-05-16 11:14:37.850 spid6s SQL Server Audit is starting the audits. This is an informational message. No user action is required.
2019-05-16 11:14:38.030 spid23s Always On: The availability replica manager is starting. This is an informational message only. No user action is required.
2019-05-16 11:14:38.040 spid6s Starting up database 'msdb'.
2019-05-16 11:14:38.040 spid9s Starting up database 'mssqlsystemresource'.
2019-05-16 11:14:38.060 spid9s Starting up database 'model'.
2019-05-16 11:14:39.150 spid9s Starting up database 'tempdb'.
Logging in to the SQL Server container shows us the processes:
$ kubectl exec -it mssql-deployment-69477bbb4c-9krsm -- /bin/bash
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 148380 23732 ? Ssl 11:14 0:00 /opt/mssql/bin/sqlservr
root 9 1.1 5.5 4564944 854468 ? Sl 11:14 1:21 /opt/mssql/bin/sqlservr
root 1048 0.2 0.0 18244 3180 ? Ss 13:09 0:00 /bin/bash
root 1059 0.0 0.0 34424 2812 ? R+ 13:09 0:00 ps aux
There are a number of things we’d still like to cover, such as the performance of running SQL Server on Kubernetes, and we’ll pick these up in future articles.
To benchmark the performance of this SQL Server deployment in GKE, we’d like to use HammerDB to run an online transaction processing (OLTP) benchmark. Of the varied tools available, HammerDB is the leading benchmarking and load-testing software for the world’s most popular databases and works well with SQL Server.
SQL Server running on GKE with a regional persistent disk for DR purposes recovered relatively quickly during our simulated failure. But what happens when this recovery time isn’t fast enough for mission-critical systems? The go-to on-premises option for HA in SQL Server in recent years has been Always On availability groups, and this is also an option for SQL Server running in Kubernetes.