Storage Quotas in Kubernetes

One question I keep getting since we released Pure Service Orchestrator is, “How do we control how much storage a developer or user can deploy?”

I played around with some of the settings from the K8s documentation for quotas and limits. I uploaded these into my gists on GitHub.

git clone git@gist.github.com:d0fba9495975c29896b98531b04badfd.git
#create the namespace as a cluster-admin
kubectl create -f dev-ns.yaml
#create the quota in that namespace
kubectl -n development create -f storage-quota.yaml
#or if you want to create CPU and Memory and other quotas too
kubectl -n development create -f quota.yaml

This limits users in that namespace to a certain number of Persistent Volume Claims (PVCs) and/or a total amount of requested storage. Both can be useful in scenarios where you don’t want someone to create 10,000 1Gi volumes on an array or create one giant 100Ti volume.
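If you would rather not clone the gist, here is a minimal sketch of what a storage-only quota could look like. The file name, the quota name, and the limits (10 claims, 500Gi total) are assumptions for illustration and may not match what is in the gist.

```shell
# Sketch only: write an example ResourceQuota manifest to disk.
# The name and the limits below are illustrative assumptions.
cat > storage-quota-example.yaml <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota-example
spec:
  hard:
    # Maximum number of PVCs allowed in the namespace
    persistentvolumeclaims: "10"
    # Maximum sum of storage requested across all PVCs
    requests.storage: 500Gi
EOF

# Apply it as a cluster-admin (requires a running cluster):
#   kubectl -n development create -f storage-quota-example.yaml
# Then compare current usage against the limits:
#   kubectl -n development describe quota storage-quota-example
```

With limits like these, the 10,000 x 1Gi scenario is stopped by the claim count and the single 100Ti volume is stopped by the requests.storage total.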

Credit to dilbert.com. When I searched for quotas on the internet, this made me laugh. I work with salespeople a lot.

VMworld 2018 in Las Vegas

I was going to write my own post, but Cody Hosterman already did a great one.

Cody’s VMworld 2018 and Pure Storage Blog

The sessions are filling up so it will be a good idea to register and get there early. I am very excited about talking about Kubernetes on vSphere. It will follow my journey of learning containers and Kubernetes over the last 2 years or so. Hope everyone learns something.

Last year, here I am talking about containers in front of a container. Boom!

Getting Started with Pure Service Orchestrator and Helm

Why Pure Service Orchestrator?

At Pure we have been working hard to develop a persistent data layer that meets our customers’ expectations for ease of use and simplicity. The first iteration of this was the release of the Docker and Kubernetes plugins.

The plugins provided automated storage provisioning, which solved a portion of the problem. All the while, we were working on the service that resided within those plugins: a service that would allow us to bring together the management of many arrays, both block and file.

The new Pure Service Orchestrator allows smart provisioning over many arrays. It gives developers on-demand persistent storage, placed on the best array or according to your label-based policies.

To install you can use the traditional shell script as described in the readme file here.

The second way, which may fit into your own software deployment strategy, is using Helm. Since Helm provides a very quick and simple way to install, and it may be new to you, the rest of this post covers how to get started with PSO using Helm.

Installing Helm

Please be sure to install Helm using the correct RBAC instructions.

I describe the process in my blog here.

http://54.88.246.86/2018/03/27/getting-started-with-helm-for-k8s/

Also, get acquainted with the official Helm documentation at the following site:

https://docs.helm.sh/using_helm/

Once Helm is fully functioning with your Kubernetes cluster, run the following commands to set up the Pure Storage Helm repo:

helm repo add pure https://purestorage.github.io/helm-charts
helm repo update
helm search pure-k8s-plugin

Additionally, you need to create a YAML file with the following format and contents:

arrays:
  FlashArrays:
    - MgmtEndPoint: "1.2.3.4"
      APIToken: "a526a4c6-18b0-a8c9-1afa-3499293574bb"
      Labels:
        rack: "22"
        env: "prod"
    - MgmtEndPoint: "1.2.3.5"
      APIToken: "b526a4c6-18b0-a8c9-1afa-3499293574bb"
  FlashBlades:
    - MgmtEndPoint: "1.2.3.6"
      APIToken: "T-c4925090-c9bf-4033-8537-d24ee5669135"
      NFSEndPoint: "1.2.3.7"
      Labels:
        rack: "7b"
        env: "dev"
    - MgmtEndPoint: "1.2.3.8"
      APIToken: "T-d4925090-c9bf-4033-8537-d24ee5669135"
      NFSEndPoint: "1.2.3.9"
      Labels:
        rack: "6a"

You can run a dry run of the installation if you want to see the output but not change anything on your cluster. It is important to remember the path to the yaml file you created above.

helm install --name pure-storage-driver pure/pure-k8s-plugin -f <your_own_dir>/yourvalues.yaml --dry-run --debug

If you are satisfied with the output of the dry run you can run the install now.

helm install --name pure-storage-driver pure/pure-k8s-plugin -f <your_own_dir>/yourvalues.yaml

Please check the GitHub page hosting the Pure Storage repo for more detail.

https://github.com/purestorage/helm-charts/tree/master/pure-k8s-plugin#how-to-install

Setting the Default StorageClass

Since we do not want to assume you only have Pure Storage in your environment, we do not force ‘pure’ as the default StorageClass in Kubernetes.

If you already installed the plugin via Helm and need to set the default class to pure, run this command.

kubectl patch storageclass pure -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

If you have another storage class set as the default and you wish to change it to Pure, you must first remove the default annotation from the other StorageClass and then run the command above. Having two defaults will produce undesired results. To remove the default annotation, run this command.

kubectl patch storageclass <your-class-name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

Read more about these commands from the K8s documentation.

https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/
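Before and after patching, it is worth confirming that exactly one class is marked as the default. kubectl get storageclass flags the default class with “(default)” next to its name, so a small filter can count them. The sample output below is made up for illustration (the class and provisioner names are assumptions), and the function name is hypothetical; on a real cluster you would pipe kubectl into it instead.

```shell
# Count the StorageClasses flagged "(default)" in `kubectl get storageclass`
# output read from stdin. Anything other than 1 deserves a second look.
count_default_classes() {
  grep -c '(default)'
}

# Hypothetical sample output, not taken from a real cluster.
sample_output='NAME             PROVISIONER
pure (default)   pure-provisioner
pure-file        pure-provisioner'

default_count=$(printf '%s\n' "$sample_output" | count_default_classes)
echo "default StorageClasses: $default_count"

# On a live cluster (requires kubectl access):
#   kubectl get storageclass | count_default_classes
```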

Demo

If you are a visual learner, check out these two demos showing the Helm installation in action.

Updating your Array information

If you need to add a new FlashArray or FlashBlade simply add the information to your YAML file and update via Helm. You may edit the config map within Kubernetes and there are good reasons to do it that way, but for simplicity we will stick to using helm for changes to the array info YAML file. Once your file contains the new array or label run the following command.

helm upgrade pure-storage-driver pure/pure-k8s-plugin -f <your_own_dir>/yourvalues.yaml --set ...

Upgrading using Helm

With the same general process you can use the following command and update the version of Pure Service Orchestrator.

helm upgrade pure-storage-driver pure/pure-k8s-plugin -f <your_own_dir>/yourvalues.yaml --version <target version>

Upgrading from the legacy plugin to the Helm version

Follow the instructions here:

https://github.com/purestorage/helm-charts/tree/master/pure-k8s-plugin#how-to-upgrade-from-the-legacy-installation-to-helm-version

There are a few platform specific considerations you should make if you are using any of the following.

Containerized Kubelet (Some flavors of K8s do this, Rancher and Openshift are two).
CentOS/RHEL Atomic Linux
CoreOS
OpenShift
OpenShift Containerized Deployment

Be certain to read through the notes if you use any of these platform versions.

https://github.com/purestorage/helm-charts/tree/master/pure-k8s-plugin#how-to-upgrade-from-the-legacy-installation-to-helm-version

https://github.com/purestorage/helm-charts/tree/master/pure-k8s-plugin#platform-specific-considerations

Creating a Helm Repo with Github

The next step in learning Helm is being able to take an existing Helm package and put it in your own repo.

There are ways to do this with GitHub Pages. I don’t really want to mess with that right now, so how can I use a GitHub repo to host my changes to the deployment?

For installing helm and an additional demo please see part 1 of this series.

http://54.88.246.86/2018/03/27/getting-started-with-helm-for-k8s/


Getting Started with Helm for K8s

Over the last few weeks I was setting up Kubernetes in the lab. One thing I quickly learned was that managing and editing YAML files for deployments, services and persistent volume claims became confusing and hard. Even when I had things committed in GitHub, sometimes I would make edits, not push them, and then rebuild my K8s cluster.

The last straw was when 2 of our Pure developers said that editing yaml in vi wasn’t very cool and to start using helm.

Needless to say, that was good advice. I still have to remember to push my repos to GitHub. Now my demonstration applications are more “cloud native”. I can create and edit them in one environment, use helm install in another, and have it just work.


Using Snapshots with the Pure Storage Plugin for Kubernetes

One request from customers is to not only provision persistent storage for Kubernetes but also integrate into workflows that may need to snap and copy the data for different environments, much like we do with PowerShell or Python for SQL and Oracle environments to accelerate development or QA. Pure has enabled snapshots using the Pure Provisioner as part of our Kubernetes Plugin.

In this demo I am showing how I can take a user’s data directory for JupyterHub and clone it for another user to take advantage of all the benefits of Pure’s snapshots and clones. You instantly get access to a copy of the dataset. The dataset doesn’t take up room on the backend storage. Only globally unique changes will grow the volume. In this use case the Data Science team will see increases in productivity as they are not waiting for data to download from the cloud or copy from another place on the array.

The command to run the snap using kubectl is below:

kubectl exec <pure provisioner pod name> -- snapshot create -n <namespace> <pvc-claim-name>

Kubernetes and the Pure Storage FlexVolume Plugin

First, if you are using Pure Storage and Kubernetes, make life easier and take a look at our plugin. It is now at version 1.2.2 and GA.

https://hub.docker.com/r/purestorage/k8s/

Make sure to follow the directions on the page to pull and install the plugin. If you are using OpenShift, pay special attention to the Readme. I will post more on this in the near future.

Cockroach DB as our Persistent Database

I want to simulate a very simple database that I can easily use in a container, and one that is not the same old thing. I built a Go app that writes to a database over and over to demonstrate the inner workings of the plugin, not necessarily to supply a performance test.

To learn more about the steps I use in the video to deploy and manage CRDB in K8s please check out this link. https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html

With that said, please check out how to deploy and scale a database with a persistent data platform from a Pure FlashArray. Watch this in Full screen to make the CLI commands easier to see.

What you are seeing in the video:

Deploy the initial 3 pods with volumes automatically created and connected on the Pure FA.
Initialize the cluster.
Fail a node and watch K8s redeploy a new container and re-attach the data volume.
Run a load generation application as a K8s Job.
Scale the DB cluster out to 8 nodes.

What is next?

This is a really easy and quick demo, but it shows the ease of using the Pure Plugin to manage the persistent data, making sure you do not lose data in the event of app crashes, and easily scaling. This can all be done via policy, and the deployment can be made even easier using Helm. In a future post we will see how we can take advantage of these methods and keep the same highly available, high performance and very easy to use persistent data platform for your application.

Four Resources that Got Me Started with Kubernetes

In the last post I mentioned there are resources that I have already gone through that do a better job than me in helping you understand containers and Kubernetes.
So if you are a virtualization admin like me and want to make 2018 the year you know enough to be dangerous I suggest the following resources.

Do Nigel Poulton’s Docker Deep Dive. A foundational understanding of containers will help the orchestration parts make sense. https://app.pluralsight.com/library/
Read Nigel’s The Kubernetes Book
Do Kubernetes the Hard Way. Once you see this the options that make K8s easier will seem a lot cooler and you will understand what they do in the background.
Go and Play with Docker and Kubernetes. Free sandboxes for you to try out.
https://labs.play-with-docker.com/
https://labs.play-with-k8s.com/

Start thinking: Does this app need a VM or a container? Once you are asking the question you will begin to think critically about the choices.

I am not sure we all need to move 100% off of VMs today. Starting to ask the questions will help prepare us to provide these services to our customers when the workloads and workflows that require them arise.

Anaconda with Jupyter Notebooks on Kubernetes

WARNING: YAML Heavy post. Sorry.

So I have been internally debating the best way to share this latest little thing I was working on and learning. My goal over 2018 is to post more on migrating applications from virtual machines to containers managed by K8s. That transition isn’t for everything and has definitely required diving more into applications. There are many Kubernetes concepts I am going to skip over as others may already have explained them better. I do plan on doing a quick and easy vSphere to K8s guide to help us VCPs and other virtual admins get started.

OK, getting started. Define some concepts

Anaconda, Conda for short.

Conda is a python package and environment manager for Data Science. You can download Anaconda here:
https://www.anaconda.com/download/

I wanted to keep it running in my lab and even though it works just fine on my local laptop, I switch between PC and Mac (2 of them) and wanted my environment (and data) available from a central place. Plus, I can’t learn Kubernetes without real applications to run.

Jupyter

Jupyter is an open source web application that allows you to display interactive code, equations and visualizations. I use it for Data Analytics in Python.

http://jupyter.org/

So Jupyter is an application that can run in your conda environment. I want to run it as a container with persistent NFS storage in my Kubernetes cluster in my basement. Notebooks are the files that contain the code and visualizations. I can post notebooks to GitHub to allow others to test my work. In the GitHub repo, I included a very basic file with some Python. Once you have this all running you can play with it if you would like.

So how do we get it to run? ContinuumIO, the keepers of Anaconda, provide a container image and some basic instructions for running the container on Docker. I googled for ways that people provide this in a cluster environment. In the near future JupyterHub will be the solution for you if you want multi-tenant Jupyter deployments with OAuth and all kinds of fancy features I do not need in my tiny lab.

The following files are all available on my GitHub at Conda-K8s. This worked in my environment with Kubernetes 1.9. Your mileage may vary depending on access rights, version and anything you do that I don’t know about.

First, create the persistent volume. You will need to create and edit the following nfs-pv.yaml file.

nfs-pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: conda-notebooks
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # FIXME: use the right IP and the right path
    server: 192.168.x.x
    path: "/nfs/repos/yourvalidpath"

First make sure you edit the file with your NFS server IP and a valid, already created path to your NFS share. This is where your Jupyter notebook data will be stored. If the pod crashes or the host server dies, it will start elsewhere in the cluster and your data will persist. Brilliant!


If you want an automated way to create, mount and manage these volumes with Pure Storage, check out our awesome FlexVolume plugin for Kubernetes. Right now we will focus on making it work with any NFS path. This is manual and slow, so if you are serious about analytics get the plugin, and a FlashBlade.

$ kubectl create -f nfs-pv.yaml

Then to view if your volume is ready run:

$ kubectl get pv

Output for my system

NAME                                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                        STORAGECLASS   REASON    AGE
claim-jowings                        10Gi       RWX            Retain           Released   jupyter4me/hub-db-dir                                                 3d
conda-notebooks                      100Gi      RWX            Retain           Bound      default/conda-claim                                                   3d
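If you are scripting this, you may want to check the STATUS column rather than read the table by eye. The helper below is a small sketch that pulls the status for one volume out of kubectl get pv output. The function name is made up for illustration, and the sample rows are the two lines from my output above with the spacing trimmed.

```shell
# Print the STATUS column (5th field) for the PV whose NAME matches $1.
# Reads `kubectl get pv` output from stdin.
pv_status() {
  awk -v name="$1" '$1 == name { print $5 }'
}

# Sample rows copied from the output above.
sample_rows='claim-jowings     10Gi    RWX   Retain   Released   jupyter4me/hub-db-dir   3d
conda-notebooks   100Gi   RWX   Retain   Bound      default/conda-claim     3d'

status=$(printf '%s\n' "$sample_rows" | pv_status conda-notebooks)
echo "conda-notebooks is $status"

# On a live cluster (requires kubectl access):
#   kubectl get pv --no-headers | pv_status conda-notebooks
```

A freshly created volume shows Available; it moves to Bound once a matching claim exists, which is the state captured in my output.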

Now that the volume object is created, we can create the “claim”.
I am not going to get into the why of doing this, but as far as my tiny brain can understand it, it is the way K8s manages which application can connect to which persistent volume. Notice how the requests section of the YAML is asking for 100Gi, the size of my volume in the last step.

nfs-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: conda-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 100Gi

kubectl create -f nfs-pvc.yaml

To view the results

kubectl get pvc

Finally we can create the pod. The pod is the most basic component Kubernetes uses to schedule an application. It can be just one container. It can be more, but for now we won’t get into what all that means.

conda-pod.yaml

kind: Pod
apiVersion: v1
metadata:
  generateName: conda-
  labels:
    app: conda
spec:
  volumes:
    - name: conda-volume
      persistentVolumeClaim:
        claimName: conda-claim
  containers:
    - name: conda
      image: continuumio/anaconda3
      env:
      - name: JUPYTERCMD
        value: "/opt/conda/bin/conda install jupyter nb_conda -y --quiet && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='*' --port=8888 --no-browser --allow-root"
      command: ["bash"]
      args: ["-c","$(JUPYTERCMD)"]
      ports:
        - containerPort: 8888
          name: "http-server"
      volumeMounts:
        - mountPath: "/opt/notebooks"
          name: conda-volume

If you take a look at the file above, there are some things we are doing to get conda and jupyter to work. First notice the “env” section I created. I didn’t want to create a custom container image but rather use the default image provided by continuumio. I don’t want to accidentally become reliant on my own proprietary image. Without the command and the arguments in the $JUPYTERCMD environment variable, the container starts, has nothing to do, and shuts down. K8s sees this as a failure so it starts it again (and again and again). Also, in the volumes section we are telling the pod to use the “conda-claim” we created in the last step. Under containers, the volumeMounts declaration tells K8s to mount the PV to the mountPath inside the container.
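The command/args trick is easier to see outside of Kubernetes. Kubernetes substitutes $(JUPYTERCMD) with the variable’s value before the container starts, so bash receives the whole string as the single argument to -c and handles the && itself. The sketch below mimics that with a harmless stand-in; the echo steps are placeholders I made up, not the real conda and jupyter commands.

```shell
# Stand-in for the JUPYTERCMD value: two steps chained with &&.
# The real value installs packages and then starts the notebook server.
JUPYTERCMD="echo 'step 1: install' && echo 'step 2: serve notebooks'"

# Equivalent of command: ["bash"], args: ["-c", "<expanded value>"].
# sh is used here so the sketch runs anywhere; the pod uses bash.
output=$(sh -c "$JUPYTERCMD")
echo "$output"
```

In the real pod the second step is a long-running server, so the container stays up. If the chain exits, the container exits with it, which is what triggers the restart loop described above.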

kubectl create -f conda-pod.yaml

Now let’s see what the results look like:

kubectl get pod
NAME                                     READY     STATUS    RESTARTS   AGE
conda-742lc                              1/1       Running   0          2d

Very good, the pod is running and we have a “READY 1/1”

We need a few things to connect to the Jupyter notebook. Run the following command and notice the output. It gives you a URL with a token to access the web app. Obviously localhost is not going to work from my remote workstations. Save that token for later though.

$ kubectl logs conda-742lc


Package plan for installation in environment /opt/conda:

The following NEW packages will be INSTALLED:

    _nb_ext_conf:     0.4.0-py36_1         
    nb_anacondacloud: 1.4.0-py36_0         
    nb_conda:         2.2.1-py36h8118bb2_0 
    nb_conda_kernels: 2.1.0-py36_0         
    nbpresent:        3.0.2-py36h5f95a39_1 

The following packages will be UPDATED:

    anaconda:         5.0.1-py36hd30a520_1  --> custom-py36hbbc8b67_0
    conda:            4.3.30-py36h5d9f9f4_0 --> 4.4.7-py36_0         
    pycosat:          0.6.2-py36h1a0ea17_1  --> 0.6.3-py36h0a5515d_0 

+ /opt/conda/bin/jupyter-nbextension enable nbpresent --py --sys-prefix
Enabling notebook extension nbpresent/js/nbpresent.min...
      - Validating: OK
+ /opt/conda/bin/jupyter-serverextension enable nbpresent --py --sys-prefix
Enabling: nbpresent
- Writing config: /opt/conda/etc/jupyter
    - Validating...
      nbpresent  OK

+ /opt/conda/bin/jupyter-nbextension enable nb_conda --py --sys-prefix
Enabling notebook extension nb_conda/main...
      - Validating: OK
Enabling tree extension nb_conda/tree...
      - Validating: OK
+ /opt/conda/bin/jupyter-serverextension enable nb_conda --py --sys-prefix
Enabling: nb_conda
- Writing config: /opt/conda/etc/jupyter
    - Validating...
      nb_conda  OK

[I 17:09:25.393 NotebookApp] [nb_conda_kernels] enabled, 3 kernels found
[I 17:09:25.399 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[W 17:09:25.421 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 17:09:26.044 NotebookApp] [nb_anacondacloud] enabled
[I 17:09:26.050 NotebookApp] [nb_conda] enabled
[I 17:09:26.095 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 17:09:26.095 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
[I 17:09:26.098 NotebookApp] Serving notebooks from local directory: /opt/notebooks
[I 17:09:26.098 NotebookApp] 0 active kernels 
[I 17:09:26.098 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/?token=08938eb3b2bc00f350c43f7535e38f6aa339f5915e12d912
[I 17:09:26.098 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 17:09:26.099 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=08938<blah blah blah
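If you do not want to scroll through the log to find the token, a little sed can pull it out. This is a sketch that assumes the log line format shown above (a token= query parameter followed by hex characters). The function name is made up, and the sample line is copied from my output.

```shell
# Extract the hex token that follows "token=" from a Jupyter log line.
extract_token() {
  sed -n 's/.*token=\([0-9a-f][0-9a-f]*\).*/\1/p'
}

# Sample line copied from the log output above.
log_line='[I 17:09:26.098 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/?token=08938eb3b2bc00f350c43f7535e38f6aa339f5915e12d912'

token=$(printf '%s\n' "$log_line" | extract_token)
echo "$token"

# On a live cluster (requires kubectl access; your pod name will differ):
#   kubectl logs conda-742lc | extract_token | head -n 1
```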

We must create a “service” in Kubernetes in order for the application to be accessible. There is a ton to learn about services and ingress into applications. Since I am running on a private cluster, not on Google or Amazon, I am going to use the simplest way for this post to create external access. That is done using the “type” under the spec. See how it says NodePort? Also, I am not specifying an inbound port (you can do that if you want). I am just telling it to find the app called “conda” and forward traffic to TCP 8888.

conda-svc.yaml

kind: Service
apiVersion: v1
metadata:
  name: conda-svc
spec:
  type: NodePort
  ports:
    - port: 8888
  selector:
    app: conda

kubectl create -f conda-svc.yaml

This creates the service from the file. This is actually a cool concept that allows the inbound traffic management (ingress) to be disaggregated from the application pod/deployment. That means I can swap versions of the app without changing the inbound rules or load balancers (LB is a whole book unto itself). To see my services now I run:

$ kubectl get svc
NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
conda-svc                         NodePort    10.98.67.191     <none>        8888:32250/TCP    2d
kubernetes                        ClusterIP   10.96.0.1        <none>        443/TCP           36d
mc-nash-minecraft                 NodePort    10.105.112.153   <none>        25565:31642/TCP   31d
mc-shea-minecraft                 NodePort    10.111.206.174   <none>        25565:31048/TCP   31d
mc-survival-minecraft             NodePort    10.99.46.7       <none>        25565:31723/TCP   31d
prom-2vcps-prometheus-server-np   NodePort    10.104.173.0     <none>        80:31400/TCP      30d

Great, now we see the service is forwarding port 32250 (yours will be different) to 8888. Using the node port type I can actually hit any node in my cluster and my K8s CNI will forward the traffic.
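If you would rather have a predictable port than a random one, the Service spec accepts a nodePort field. The manifest below is a sketch of that variation. The file name, Service name and port 30888 are arbitrary choices for illustration, and the port has to fall inside your cluster’s NodePort range (30000-32767 by default).

```shell
# Sketch only: same idea as conda-svc.yaml, with a fixed node port.
# File name, Service name and port 30888 are illustrative assumptions.
cat > conda-svc-fixed-port.yaml <<'EOF'
kind: Service
apiVersion: v1
metadata:
  name: conda-svc-fixed
spec:
  type: NodePort
  ports:
    - port: 8888        # Service port; targetPort defaults to the same value
      nodePort: 30888   # port opened on every node
  selector:
    app: conda
EOF

# Create it (requires a running cluster):
#   kubectl create -f conda-svc-fixed-port.yaml
```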

Now just go to the following URL and paste your token from earlier.

http://<a node ip>:32250/
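To avoid reading the port off the table each time, the PORT(S) value can be split with plain shell parameter expansion. This sketch uses the 8888:32250/TCP value from my output. The node IP is a placeholder you need to replace, and the variable names are just for illustration.

```shell
# PORT(S) value copied from the `kubectl get svc` output above.
ports='8888:32250/TCP'

node_port=${ports#*:}        # drop "8888:"  -> 32250/TCP
node_port=${node_port%%/*}   # drop "/TCP"   -> 32250

# Placeholder address; substitute the IP of any node in your cluster.
node_ip='192.0.2.10'
url="http://${node_ip}:${node_port}/"
echo "$url"

# On a live cluster the port can be read directly (requires kubectl access):
#   kubectl get svc conda-svc -o jsonpath='{.spec.ports[0].nodePort}'
```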

In my GitHub repo for this project I included a basic notebook file that shows some Python code to simulate coin flips many, many times. Feel free to “upload” and play with it and have fun with Data Science on Jupyter / Conda running in a K8s cluster.

vSphere Container Hosts Storage Networking

In the last couple of days I had a couple of questions from customers implementing some kind of container host on top of vSphere. Each was doing it to make use of either the Kubernetes or the Docker Volume Plugin for Pure Storage. First, there was a little confusion about whether the actual container needs to have iSCSI access to the array. The container needs network access for sure (I mean, if you want someone to use the app), but it does not need access to the iSCSI network. Side note: iSCSI is not required to use the persistent storage plugins for Pure. Fibre Channel is supported. iSCSI for a Pure FlashArray, or NFS (10G network) for FlashBlade, may just be an easy path with an existing vSphere setup.

To summarize all that: The container host VM needs access to talk directly to the storage. I accomplish this today with multiple vnics but you can do it however you like. There may be some vSwitches, physical nics and switches in the way, but the end result should be the VM talking to the FlashArray or FlashBlade.

More information on configuring our plugins is here:

Docker/DCOS/Mesos – https://store.docker.com/plugins/pure-docker-volume-plugin
Kubernetes and OpenShift – https://hub.docker.com/r/purestorage/k8s/

Basically the container host needs to be able to talk to the MGMT interface of the array to do its automation of creating host objects and volumes and connecting them together (and removing them when you are finished). The thing to know is that the plugin does all the work for you. Then, when your application manifest requests the storage, the plugin mounts the device to the required mount point inside the container. The app (container) does not know or care anything about iSCSI, NFS or Fibre Channel (and it should not).

Container HOST Storage Networking

Container hosts as VM’s Storage Networking

If you are setting up iSCSI in vSphere for Pure, you should probably go see Cody’s pages on doing this. Most of it is a good foundation for what I am about to share.

https://www.codyhosterman.com/pure-storage-vmware-overview/flasharray-and-vmware-best-practices/iscsi-setup/

Make sure you can use MPIO, and follow the Linux best practices for Pure Storage inside your container hosts.

Do it the good old (new) gui way

So what I normally do is set up 2 new port groups on my VDS.

Something like iscsi-1 and iscsi-2. I know, I am very original and creative.

Set the uplink for the Portgroup

We used to set up “in-guest iSCSI” for VMs that needed array-based snapshot features way back in the day. This is basically the same piping. After creating the new port groups, edit the settings in the HTML5 GUI as shown below.

Set the Failover Order

Go for iSCSI-1 on Uplink 1 and iSCSI-2 on Uplink 2

I favor putting the other Uplink into “Unused” as this gives me the straightest troubleshooting path in case something downstream isn’t working. You can put it in “standby” and probably be just fine.