Zero RPO for TKG? How to get Synchronous Disaster Recovery for your Tanzu Cluster

KubeCon and VMware Explore are coming up. One of our most popular sessions from VMware Explore (and VMworld) is the Stretched Cluster for VMware/vVols session. Now, you may notice that SRM and other DR solutions do not work with Tanzu, but I want all of you to know that PX-DR Sync, or Metro-DR, is supported for Tanzu. This gives you ZERO RPO when failing over stateful workloads from one cluster to another, for example from one vSphere cluster to another, each running TKG.

Metro-DR

More information on how to set up Sync-DR with Tanzu can be found on our docs page:

https://docs.portworx.com/operations/operate-kubernetes/disaster-recovery/px-metro/

Pay close attention to the docs: Tanzu has some special setup steps because of the way Cloud Drives are created and managed with raw CNS volumes.

Metro-DR does this with a shared etcd between the two distinct TKG clusters. That etcd can run at a third site, where you would run the “witness node”. I run it in a standalone admin k8s cluster that hosts all my internal services, like etcd, ExternalDNS, Harbor and more. Note that this etcd is used only by Portworx Enterprise; it is not the etcd that Kubernetes uses.

Slightly better image of Metro-DR

At the end of the process you have two TKG clusters and one Portworx cluster. We use async schedules to copy the Kubernetes objects between clusters, while the data is copied synchronously between nodes, limited only by latency (the maximum for Sync-DR is 10 ms). This means the Postgres or Cassandra deployment in the picture above is copied on a schedule, and on the non-live (target) cluster it is scaled to 0 replicas. The RPO is 0 because every write is replicated synchronously; the RTO depends on how fast you can spin up the replicas on the target.
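Because Sync-DR is only supported up to 10 ms of latency, it is worth measuring the round trip between sites before you start. Below is a minimal sketch, assuming the summary line printed by Linux iputils or macOS `ping`; `check_sync_dr_latency` is a hypothetical helper, not a Portworx tool, and average ping RTT is only a rough proxy for what Portworx itself will see.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ping` output on stdin, finds the summary line
# ("rtt min/avg/max/mdev = ..." on Linux, "round-trip min/avg/max/stddev = ..."
# on macOS) and compares the average RTT with a limit (default 10 ms).
# Exit codes: 0 = within limit, 1 = over limit, 2 = no summary line found.
check_sync_dr_latency() {
  local max_ms="${1:-10}"
  awk -F'/' -v max="$max_ms" '
    /^(rtt|round-trip)/ {
      # With "/" as the separator, field 5 holds the average RTT in ms.
      avg = $5 + 0
      found = 1
    }
    END {
      if (!found) { print "no ping summary found"; exit 2 }
      printf "avg RTT %.3f ms (limit %s ms)\n", avg, max
      if (avg <= max + 0) exit 0
      exit 1
    }'
}
```

For example, `ping -c 20 <node ip at the other site> | check_sync_dr_latency 10` succeeds only when the average round trip is within 10 ms.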

Portworx Enterprise and Metro-DR work with any storage target supported by Tanzu (vSAN, NFS datastores, VMFS datastores, other vVols). Even so, the SPBM and vVols integrations from Pure Storage with the FlashArray are the most used anywhere. The integration effort and collaboration between Pure and VMware Engineering is amazing; Cody Hosterman and his team have done some amazing things. Metro-DR works great with Pure vVols and is the perfect cloud-native complement to your stretched vVols VMs using FlashArray ActiveCluster. If you are interested in using both together, let your Pure Storage team know or send me a message on Twitter and I will track them down for you.

Setting up Portworx on a Tanzu Kubernetes Grid aka TKG Cluster

First, this process works today on clusters created with the standalone TKG tool, not on clusters created through the embedded management cluster. For clarity, I call those embedded-management clusters TKC or TKC Guest Clusters. They run as VMs too, but you can’t add block devices to them outside of Cloud Native Storage (VMware’s CSI driver). At least I couldn’t.

TKG deploys using a Photon 3.0 template. When I wrote this blog and recorded the demo, the latest version was TKG 1.2.1 and the k8s template was 1.19.3-vmware.

Check the release notes here: https://docs.portworx.com/reference/release-notes/portworx/#improvements-4

First, generate base64-encoded versions of your vCenter user and password.

# Update the following items in the Secret template below to match your environment:

VSPHERE_USER: Use output of printf <vcenter-server-user> | base64
VSPHERE_PASSWORD: Use output of printf <vcenter-server-password> | base64

Save the following as vsphere-secret.yaml, substituting your own base64-encoded vCenter user and password (from above).


apiVersion: v1
kind: Secret
metadata:
  name: px-vsphere-secret
  namespace: kube-system
type: Opaque
data:
  VSPHERE_USER: YWRtaW5pc3RyYXRvckB2c3BoZXJlLmxvY2Fs
  VSPHERE_PASSWORD: cHgxLjMuMEZUVw==


Run kubectl apply on this spec after you have updated the template above with your user and password.
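If you would rather not base64-encode by hand, the sketch below generates the same Secret from plain-text values. It assumes a shell with the coreutils `base64` command; `b64` and `make_px_vsphere_secret` are hypothetical helper names. kubectl can also do the encoding for you with `kubectl create secret generic px-vsphere-secret -n kube-system --from-literal=VSPHERE_USER=... --from-literal=VSPHERE_PASSWORD=...`. Either way, passing a password on the command line leaves it in your shell history.

```bash
#!/usr/bin/env bash
# Encode a value without a trailing newline: printf '%s' (unlike echo) adds
# none, and tr strips any line wrapping that base64 inserts into long output.
b64() { printf '%s' "$1" | base64 | tr -d '\n'; }

# Hypothetical helper: print the px-vsphere-secret manifest for the given
# plain-text vCenter user ($1) and password ($2).
make_px_vsphere_secret() {
  local user="$1" pass="$2"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: px-vsphere-secret
  namespace: kube-system
type: Opaque
data:
  VSPHERE_USER: $(b64 "$user")
  VSPHERE_PASSWORD: $(b64 "$pass")
EOF
}

# Usage: make_px_vsphere_secret 'administrator@vsphere.local' '<password>' > vsphere-secret.yaml
```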

Follow these steps:

# create a new TKG cluster
tkg create cluster tkg-portworx-cluster -p dev -w 3 --vsphere-controlplane-endpoint-ip 10.21.x.x 

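# Get the credentials for your cluster and switch kubectl to its context
tkg get credentials tkg-portworx-cluster
kubectl config use-context tkg-portworx-cluster-admin@tkg-portworx-cluster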

# Apply the secret and the operator for Portworx
kubectl apply -f vsphere-secret.yaml
kubectl apply -f 'https://install.portworx.com/2.6?comp=pxoperator'

# Generate your spec first at https://central.portworx.com and save it as tkg-px.yaml
kubectl apply -f tkg-px.yaml

# Wait until everything is up and running
watch kubectl get pod -n kube-system

# Check pxctl status
PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status
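Instead of watching pods by hand, you could poll `pxctl status` until the node reports that it is operational. This is a sketch only: it assumes the status output contains the text "PX is operational" (check what your Portworx version prints), `wait_for_px` and `px_is_operational` are hypothetical names, and the retry count and delay are arbitrary defaults.

```bash
#!/usr/bin/env bash
# Succeeds if the pxctl status text on stdin reports an operational node.
# The exact wording matched here is an assumption; adjust it if needed.
px_is_operational() {
  grep -q 'PX is operational'
}

# Poll the first Portworx pod until pxctl reports operational, or give up.
# $1 = attempts (default 30), $2 = seconds between attempts (default 20).
wait_for_px() {
  local attempts="${1:-30}" delay="${2:-20}" pod i
  for ((i = 1; i <= attempts; i++)); do
    pod=$(kubectl get pods -l name=portworx -n kube-system \
      -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
    if [ -n "$pod" ] && kubectl exec "$pod" -n kube-system -- \
        /opt/pwx/bin/pxctl status 2>/dev/null | px_is_operational; then
      echo "Portworx is operational on $pod"
      return 0
    fi
    sleep "$delay"
  done
  echo "Portworx not operational after $attempts attempts" >&2
  return 1
}
```

Run `wait_for_px` (or, say, `wait_for_px 60 10`) right after applying tkg-px.yaml.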

You can now create your own StorageClass or use one of the premade ones:

kubectl get sc
NAME                             PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
default (default)                csi.vsphere.vmware.com          Delete          Immediate           false                  7h50m
px-db                            kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-db-cloud-snapshot             kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-db-cloud-snapshot-encrypted   kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-db-encrypted                  kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-db-local-snapshot             kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-db-local-snapshot-encrypted   kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-replicated                    kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
px-replicated-encrypted          kubernetes.io/portworx-volume   Delete          Immediate           false                  7h44m
stork-snapshot-sc                stork-snapshot                  Delete          Immediate           false                  7h44m

Now Deploy Kube-Quake

The example.yaml is from my fork of the quake-kube repo on GitHub, where I redirected the data to a persistent volume.

kubectl apply -f https://raw.githubusercontent.com/2vcps/quake-kube/master/example.yaml
deployment.apps/quakejs created
service/quakejs created
configmap/quake3-server-config created
persistentvolumeclaim/quake3-content created

k get pod
NAME                      READY   STATUS              RESTARTS   AGE
quakejs-668cd866d-6b5sd   0/2     ContainerCreating   0          7s
k get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
quake3-content   Bound    pvc-6c27c329-7562-44ce-8361-08222f9c7dc1   10Gi       RWO            px-db          2m

k get pod
NAME                      READY   STATUS    RESTARTS   AGE
quakejs-668cd866d-6b5sd   2/2     Running   0          2m27s

k get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                         AGE
kubernetes   ClusterIP      100.64.0.1     <none>        443/TCP                                         20h
quakejs      LoadBalancer   100.68.210.0   <pending>     8080:32527/TCP,27960:31138/TCP,9090:30313/TCP   2m47s

Now point your browser to http://<some node ip>:32527 (the NodePort mapped to port 8080).
Or, if you have the LoadBalancer up and running, go to http://<LoadBalancer IP>:8080.
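To find the NodePort without reading the table, you can ask the API server directly with `kubectl get svc quakejs -o jsonpath='{.spec.ports[?(@.port==8080)].nodePort}'`. If you already have the PORT(S) string from `kubectl get svc`, the hypothetical helper below extracts the NodePort mapped to a given service port.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a PORT(S) value such as
# "8080:32527/TCP,27960:31138/TCP" and a service port, print the NodePort.
# Prints nothing if the port is not in the list.
nodeport_for() {
  local ports="$1" want="$2"
  # One mapping per line, then split "port:nodePort/proto" on ':' and '/'.
  printf '%s\n' "$ports" | tr ',' '\n' |
    awk -F'[:/]' -v p="$want" '$1 == p { print $2; exit }'
}

# Usage: nodeport_for "8080:32527/TCP,27960:31138/TCP,9090:30313/TCP" 8080
```

With the output shown above, `nodeport_for "8080:32527/TCP,27960:31138/TCP,9090:30313/TCP" 8080` prints 32527.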

Portworx and TKG – Portworx Scalable Storage in TKG

Portworx + Pure Storage = awesome

I have recently been pretty occupied with learning TKG and, oh yeah, also Portworx. I wanted to share what I have learned so far about getting Portworx up and running in a TKG cluster. So without too much introduction, let’s dive right in.

Create a new cluster

You need 3 worker nodes for Portworx.

tkg create cluster px1 --plan=dev -w 3

Install Portworx

Get IPs for the Ansible inventory
TKG uses DHCP for all of the deployed Kubernetes VMs, which is fine. This command creates an inventory.ini so you can run Ansible playbooks against the cluster. Remember to update inventory.ini if you add nodes.

kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' | awk -v ORS='\n' '{ for (i = 1; i <= NF; i++) print $i }' >inventory.ini
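A variant of the command above also puts the hosts in a group and sets the SSH user, so later Ansible commands no longer need `-u capv`. This is a sketch: the group name `tkg_nodes` and the `make_inventory` helper are hypothetical, and it assumes the nodes report an ExternalIP address, as the original command does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the space-separated IPs printed by the kubectl
# jsonpath query on stdin and write an Ansible INI inventory to stdout.
make_inventory() {
  echo "[tkg_nodes]"
  # One address per line (awk also handles the missing trailing newline).
  awk '{ for (i = 1; i <= NF; i++) print $i }'
  # Group variables: TKG nodes use the capv user.
  printf '\n[tkg_nodes:vars]\nansible_user=capv\n'
}

# Usage:
# kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' \
#   | make_inventory > inventory.ini
```

A playbook with `hosts: all`, like the one below, still matches every host in the group.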

Run the Ansible Playbook
This playbook installs the Linux headers that the TKG Photon template does not include. Copy this playbook and save it as playbook.yaml, for example.

---
- hosts: all
  become: yes
  tasks:
  - name: install linux-devel headers for the running kernel
    raw: tdnf install -y linux-devel-$(uname -r)

Then run the playbook against the inventory:

ansible-playbook -i inventory.ini -b -v playbook.yaml -u capv

Notice that the username for the TKG nodes is capv.

# Follow this link from Portworx for more details.

https://docs.portworx.com/cloud-references/auto-disk-provisioning/vsphere/

Create the vsphere credentials in a secret

Create a vsphere-secret.yaml file and paste in the YAML below, making sure to replace the credentials with your own, generated with the base64 commands shown in the comments.

#VSPHERE_USER: Use output of printf <vcenter-server-user> | base64
#VSPHERE_PASSWORD: Use output of printf <vcenter-server-password> | base64
apiVersion: v1
kind: Secret
metadata:
  name: px-vsphere-secret
  namespace: kube-system
type: Opaque
data:
  VSPHERE_USER: YWRtaW5pc3RyYXRvckB2c3BoZXJlLmxvY2Fs
  VSPHERE_PASSWORD: cHgxLjMuMEZUVw==

Then apply the secret

kubectl apply -f vsphere-secret.yaml

# Hostname or IP of your vCenter server

export VSPHERE_VCENTER=vc01.fsa.lab


# Prefix of your shared ESXi datastore(s) names. Portworx will use datastores whose names match this prefix to create disks.

export VSPHERE_DATASTORE_PREFIX=px1


# Change this to the port number vSphere services are running on if you have changed the default port 443

export VSPHERE_VCENTER_PORT=443

export VSPHERE_DISK_TEMPLATE=type=thin,size=200

export VER=$(kubectl version --short | awk -Fv '/Server Version: /{print $3}')

curl -fsL -o px-spec.yaml "https://install.portworx.com/2.6?kbver=$VER&c=portworx-demo-cluster&b=true&st=k8s&csi=true&vsp=true&ds=$VSPHERE_DATASTORE_PREFIX&vc=$VSPHERE_VCENTER&s=%22$VSPHERE_DISK_TEMPLATE%22"

kubectl apply -f px-spec.yaml

The curl command at the end of this code block creates the px-spec.yaml file that installs Portworx in your cluster. Notice all the variables that have to be set for this to work: if you skip any of them, you will have problems.
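Because an unset variable silently produces a broken URL, a small guard can fail fast before curl runs. A minimal bash sketch: `require_vars` and `px_spec_url` are hypothetical helpers, and the URL is copied verbatim from the curl command above (the Portworx 2.6 endpoint).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every named variable that is empty or unset.
# Uses bash indirect expansion (${!name}); returns 1 if anything is missing.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Build the same spec URL as the curl command above, but only if all inputs are set.
px_spec_url() {
  require_vars VER VSPHERE_VCENTER VSPHERE_DATASTORE_PREFIX VSPHERE_DISK_TEMPLATE || return 1
  printf '%s' "https://install.portworx.com/2.6?kbver=$VER&c=portworx-demo-cluster&b=true&st=k8s&csi=true&vsp=true&ds=$VSPHERE_DATASTORE_PREFIX&vc=$VSPHERE_VCENTER&s=%22$VSPHERE_DISK_TEMPLATE%22"
}

# Usage:
# url=$(px_spec_url) && curl -fsL -o px-spec.yaml "$url" && kubectl apply -f px-spec.yaml
```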

Create a repl = 3 storage class or whatever you want to test.

Copy the text below to a new file called px-repl3-sc.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-repl3-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "3"

Then apply the new StorageClass

kubectl apply -f px-repl3-sc.yaml
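To test the new StorageClass, you need a PersistentVolumeClaim that references it. The sketch below prints one; `make_pvc`, the claim name and the 10Gi default size are placeholders, not something Portworx requires. Once the claim is bound, `pxctl volume list` on a Portworx node should show the volume with an HA level of 3.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a PVC manifest.
# $1 = claim name, $2 = size (default 10Gi), $3 = StorageClass (default px-repl3-sc).
make_pvc() {
  local name="$1" size="${2:-10Gi}" sc="${3:-px-repl3-sc}"
  cat <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: $name
spec:
  storageClassName: $sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $size
EOF
}

# Usage: make_pvc px-test-pvc 10Gi | kubectl apply -f -
```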

Installing PX-Backup also gets you the PX-Central UI:

helm repo add portworx http://charts.portworx.io/ && helm repo update
helm install px-backup portworx/px-backup --namespace px-backup --create-namespace --set persistentStorage.enabled=true,persistentStorage.storageClassName="px-repl3-sc"

This will get you up and running on a trial license, which is enough to experiment with and learn Portworx. If you are new to Helm, make sure to learn more about it at helm.sh.

Kubespray and vSphere VMs

I build and destroy Kubernetes clusters nearly weekly, and doing it on VMs makes this super easy. I also need to demo Pure Service Orchestrator, so having in-guest iSCSI is a must. Following this repo should give any vSphere admin an easy way to learn kubectl, Helm and PSO (of course, PSO works with Pure FlashArray and FlashBlade). It uses Terraform to create the VMs and Kubespray to install k8s. Ansible can also be used to automate a few package installs and updates.

I am going to try something new: instead of recreating the GitHub README, I will just share the repo link.

https://github.com/2vcps/tf4vsphere

Migrate Persistent Data into PKS with Pure vVols

In my VMworld session this week I discussed some of the architectural decisions to make when deploying PKS on vSphere, but my demo focused on what comes next: once PKS is up and running, how do you move existing data into it?

Using the Pure FlashArray and vVols, we are able to automate that process and quickly move data from another k8s cluster into PKS. It is not limited to that, but this is the use case I started with.

Part 1 of the demo shows taking the persistent data from a deployment on another k8s cluster and cloning it over the vVol that is created by the vSphere Cloud Provider with PKS. vVols are particularly important because they keep the data in a native format and make copying, replication and snapshotting much easier.

Part 2 is the same process just scripted using Python and Ansible.

Demo Part 1 – Manual process of migrating data into PKS

Demo Part 2 – Using Python and Ansible to migrate data into PKS

How to automate the Migration with some Python and Ansible

The code I used is available from code.purestorage.com, which also links to the GitHub repo https://github.com/PureStorage-OpenConnect/k8s4vvols

They let me on a stage. Again. 🙂

Use PKS Enterprise on VMware SDDC and Pure Storage

Pivotal Container Service (PKS) provides a deeply integrated Kubernetes (k8s) architecture for the VMware SDDC. It is a joint engineering project from VMware and Pivotal. In my conversations with current and potential Pure Storage customers about Kubernetes, I often get asked how Pure Storage can help a PKS Enterprise environment. The good news is there is a very easy path to using k8s with Pure + VMware + PKS.

The Architecture

Using Pure with PKS is actually very straightforward. Since the Pure FlashArray is already a leading choice for VMware environments, supporting PKS is nothing out of the ordinary.

Once you understand the underlying technology that integrates PKS into VMware, you may soon realize that highly reliable, stateless and shared storage is the best choice when deploying PKS.

The choice of driver for delivering storage (shown in the graphic above) is up to you. The vSphere Cloud Provider automates the creation and management of the virtual disks presented to containers in PKS. It supports vVols, which opens up great possibilities for your PKS environment. Pure Service Orchestrator uses a direct connection to Pure Storage FlashArrays, FlashBlades and Cloud Block Stores. It is installed with a single Helm command or a Kubernetes Operator, and it includes Smart Provisioning to place volumes on the optimal storage device in your fleet.

Which tool you choose will be dictated by your workload. It is not an exclusive choice either; it is easy to do both. After VMworld I hope to publish the details on how to install PSO on PKS. If you have really good GitHub search-fu, you may be able to find the BOSH deployment.

Highly Reliable

Pure Storage has measured six nines (99.9999%) of uptime across its customer base. Many storage solutions for container environments require hours of planning and weeks of proper implementation to provide high availability. Do not spend time re-architecting your storage infrastructure for PKS. Spend your time delivering k8s to your customers so they can deliver innovation for your business. Use the Pure Storage devices you already have. You may not even need a whole new dedicated array (don’t tell sales I said that).

Stateless Arrays for Stateful Data

Migrating data should be eliminated from your daily tasks. As FlashArrays move further into the future, the data always stays in place; the ability to keep data in place across multiple hardware generations is a proven benefit of Pure. Migrating persistent storage in k8s, even on VMware, is a non-trivial task. Depending on your scale, it could take weeks of planning and careful, flawless execution to accomplish non-disruptively. The underlying hardware should not be a concern for delivering applications, and Pure Storage has made this a reality since the FlashArray debuted 7 years ago.

Shared Storage

Delivering highly reliable data across multiple PKS and vSphere clusters, allowing applications to fail over if the compute in an availability zone becomes unavailable, is key to delivering a cloud experience for your k8s rollout. While the Pure sales teams would gladly help you acquire a FlashArray per vSphere cluster hosting PKS, this is simply unneeded in nearly all situations, especially as you start on your Kubernetes journey.

But Why PURE?

Simple: vVols on the FlashArray, combined with the PKS integration with vSphere, enable mobility of data and freedom unavailable on a legacy datastore. Have a group that rolled their own k8s? FlashArray can clone their persistent data instantly into PKS using vVols. Need to copy data from a bare-metal (non-VM) k8s cluster to PKS? Pure vVols make this possible. Have multiple k8s clusters within PKS today that require the same data for test/dev/prod? Pure Storage enables this nearly instantly. Pure Storage FlashArray snapshots and clones move at the speed of an API call from any of our SDKs, from Python to PowerShell to Ansible to Terraform and more, giving you an easy way to fit Pure Storage into your Infrastructure as Code tools.

You could probably spend the next 5 hours reading blogs and papers about all the other benefits of Pure Storage, and they all apply to your PKS on vSphere environment, but I wanted to provide a few examples directly related to operating PKS on Pure.

VMworld 2019 Session

In my session at VMworld in San Francisco, I will demonstrate how Pure Storage is able to instantly migrate persistent volumes from “other” k8s clusters to PKS. Make sure you make it to this session if you are considering PKS.

Thanks, @CodyHosterman. I am Incorrigible.

When Mr. Top 10 vBlogger mentions you and your VMworld session, it is always appropriate to say thank you. If you are interested in what is going on with Pure Storage at VMworld, be sure to read through Cody’s post to see all of our sessions. I will have some demos in the booth of Kubernetes on VMware vSphere with PKS (and more), so please be sure to come by and check them out.

Unsure what Cody means…

VMworld 2018 in Las Vegas

I was going to write my own post, but Cody Hosterman already did a great one.

Cody’s VMworld 2018 and Pure Storage Blog

The sessions are filling up so it will be a good idea to register and get there early. I am very excited about talking about Kubernetes on vSphere. It will follow my journey of learning containers and Kubernetes over the last 2 years or so. Hope everyone learns something.

Last year, here I am talking about containers in front of a container. Boom!

UNMAP – Do IT!

Pretty sure my friend Cody Hosterman has talked about this until he turned blue in the face. Just a point I want to quickly reiterate here for the record: run UNMAP on your vSphere datastores.

Read this if you are running Pure Storage, but even if you run other arrays (especially all-flash) find a way to do UNMAP on a regular basis:

http://www.codyhosterman.com/2016/01/flasharray-unmap-script-with-the-pure-storage-powershell-sdk-and-poweractions/

Additionally, start to learn the ins-n-outs of vSphere 6 and automatic unmap!

http://blog.purestorage.com/direct-guest-os-unmap-in-vsphere-6-0-2/
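If you want to script UNMAP across several VMFS datastores from an ESXi shell, esxcli provides `esxcli storage vmfs unmap -l <datastore label>`. The loop below is a sketch that defaults to a dry run and only prints the commands; set DRY_RUN=0 to execute them. The function name and datastore labels are hypothetical, and Cody’s PowerShell script linked above is the more complete option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run (or, by default, just print) an esxcli UNMAP
# for each VMFS datastore label passed as an argument.
unmap_datastores() {
  local ds
  for ds in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: show what would be executed.
      echo "esxcli storage vmfs unmap -l $ds"
    else
      esxcli storage vmfs unmap -l "$ds"
    fi
  done
}

# Usage: DRY_RUN=0 unmap_datastores px1-ds01 px1-ds02
```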

Speaking of In-N-Out… I want a Double-Double before I start Whole30.

in-n-out

Register: VMUG Webinar and Pure Storage September 22

Register here: http://tinyurl.com/pq5fd9k

On September 22 at 1:00pm Eastern time, Pure Storage and VMware will be highlighting the results of an ESG Lab Validation paper. The study on consolidating workloads with VMware and Pure Storage used a single FlashArray //m50 and deployed five virtualized mission-critical workloads: VMware Horizon View, Microsoft Exchange Server, Microsoft SQL Server (OLTP), Microsoft SQL Server (data warehouse) and Oracle (OLTP). While I won’t steal all the thunder, it is good to note that all of this was run with zero tuning on the applications. Want out of the business of tweaking and tuning everything to get just a little more performance from your application? Problem solved. Plus, check out the FlashArray’s consistent performance even during failures.

Tier 1 workloads in 3u of Awesomeness

You can see in the screenshot the results of running tier-one applications on an array made to withstand the real-world ups and downs of the datacenter. Things happen to hardware, and even to software, but it is good to see the applications still doing great. We always tell customers that what matters is not how fast the array is in a pristine benchmark, but how it responds when things are not going well: when a controller loses power or a drive (or two) fails. That is what sets Pure Storage apart (that and data reduction and real Evergreen Storage).

Small note: this is another proven environment with block sizes near 32K. This one hung out between 20K and 32K, so don’t fall for 4K or 8K nonsense benchmarks. When the blocks hit the array from VMware, this is just what we see.

Register for the Webinar
http://tinyurl.com/pq5fd9k
You can win a GoPro too.