Monitor your Kubernetes Cluster

by Kevin W Monroe on 16 January 2018

This article originally appeared on Kevin Monroe’s blog

Keeping an eye on logs and metrics is a necessary evil for cluster admins. The benefits are clear: metrics help you set reasonable performance goals, while log analysis can uncover issues that impact your workloads. The hard part, however, is getting a slew of applications to work together in a useful monitoring solution.

In this post, I’ll cover monitoring a Kubernetes cluster with Graylog (for logging) and Prometheus (for metrics). Of course, that’s not as simple as wiring three things together. In fact, it’ll end up looking like this:

As you know, Kubernetes isn’t just one thing — it’s a system of masters, workers, networking bits, etc(d). Similarly, Graylog comes with a supporting cast (apache2, mongodb, etc), as does Prometheus (telegraf, grafana, etc). Connecting the dots in a deployment like this may seem daunting, but the right tools can make all the difference.

I’ll walk through this using conjure-up and the Canonical Distribution of Kubernetes (CDK). I find the conjure-up interface really helpful for deploying big software, but I know some of you hate GUIs and TUIs and probably other UIs too. For those folks, I’ll do the same deployment again from the command line.

Before we jump in, note that Graylog and Prometheus will be deployed alongside Kubernetes and not in the cluster itself. Things like the Kubernetes Dashboard and Heapster are excellent sources of information from within a running cluster, but my objective is to provide a mechanism for log/metric analysis whether the cluster is running or not.

The Walk Through

First things first, install conjure-up if you don’t already have it. On Linux, that’s simply:

sudo snap install conjure-up --classic

There’s also a brew package for macOS users:

brew install conjure-up

You’ll need at least version 2.5.2 to take advantage of the recent CDK spell additions, so be sure to sudo snap refresh conjure-up or brew update && brew upgrade conjure-up if you have an older version installed.
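
If you’d rather check the version requirement in a script, a comparison along these lines may help. This is only a sketch: version_at_least is a hypothetical helper name, it assumes your sort supports -V (version sort), and the installed value is a placeholder that you would replace with whatever version your install reports.

```shell
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Assumes `sort -V` (version sort) is available on this machine.
version_at_least() {
  # The smaller version sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value: substitute the version your conjure-up install reports.
installed="2.5.2"

if version_at_least "$installed" "2.5.2"; then
  echo "conjure-up is new enough for the CDK spell additions"
else
  echo "please refresh or upgrade conjure-up"
fi
```

A plain string comparison would get versions like 2.10.0 wrong, which is why the sketch leans on version sort.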

Once installed, run it:

conjure-up

You’ll be presented with a list of various spells. Select CDK and press Enter.

At this point, you’ll see additional components that are available for the CDK spell. We’re interested in Graylog and Prometheus, so check both of those and hit Continue.

You’ll be guided through various cloud choices to determine where you want your cluster to live. After that, you’ll see options for post-deployment steps, followed by a review screen that lets you see what is about to be deployed:

In addition to the typical K8s-related applications (etcd, flannel, load-balancer, master, and workers), you’ll see additional applications related to our logging and metric selections.

The Graylog stack includes the following:

apache2: reverse proxy for the graylog web interface
elasticsearch: document database for the logs
filebeat: forwards logs from K8s master/workers to graylog
graylog: provides an api for log collection and an interface for analysis
mongodb: database for graylog metadata

The Prometheus stack includes the following:

grafana: web interface for metric-related dashboards
prometheus: metric collector and time series database
telegraf: sends host metrics to prometheus

You can fine-tune the deployment from this review screen, but the defaults will suit our needs. Click Deploy all Remaining Applications to get things going.

The deployment will take a few minutes to settle as machines are brought online and applications are configured in your cloud. Once complete, conjure-up will show a summary screen that includes links to various interesting endpoints for you to browse:

Exploring Logs

Now that Graylog has been deployed and configured, let’s take a look at some of the data we’re gathering. By default, the filebeat application will send both syslog and container log events to graylog (that’s /var/log/*.log and /var/log/containers/*.log from the kubernetes master and workers).
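
If you’re curious which files those two patterns actually match on a given master or worker, something like the following could be run on that machine. It’s a rough sketch: list_filebeat_logs is a hypothetical helper, and the optional root argument exists only so the function can be pointed at a test directory.

```shell
# Hypothetical helper: list files matched by the default filebeat log paths.
# $1 is an optional root directory (empty means the real filesystem root).
list_filebeat_logs() {
  root="${1:-}"
  for f in "$root"/var/log/*.log "$root"/var/log/containers/*.log; do
    # An unmatched glob stays literal, so only print paths that exist.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Example, run on a master or worker:
#   list_filebeat_logs
```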

Grab the apache2 address and graylog admin password as follows:

juju status --format yaml apache2/0 | grep public-address
    public-address: <your-apache2-ip>
juju run-action --wait graylog/0 show-admin-password
    admin-password: <your-graylog-password>
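
If you want just the address in a shell variable rather than the whole matching line, a small awk filter is one way to get it. This is a sketch, not part of the CDK tooling: extract_public_address is a hypothetical helper name, and it assumes the YAML output keeps the public-address: <value> shape shown above.

```shell
# Hypothetical helper: print the first public-address value from YAML on stdin.
extract_public_address() {
  awk '$1 == "public-address:" { print $2; exit }'
}

# With a live model, you could pipe juju's output through the helper:
#   apache2_ip=$(juju status --format yaml apache2/0 | extract_public_address)
```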

Browse to http://<your-apache2-ip> and log in with admin as the username and <your-graylog-password> as the password. Note: if the interface is not immediately available, please wait; the reverse proxy configuration may take up to 5 minutes to complete.
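
Rather than refreshing the browser while the proxy settles, you could poll from a terminal. Here is a minimal sketch of a retry loop: wait_for is a hypothetical helper, and the commented example assumes curl is installed and that APACHE2_IP holds <your-apache2-ip>.

```shell
# Hypothetical helper: retry a command until it succeeds or a timeout passes.
# Usage: wait_for <timeout-seconds> <interval-seconds> <command> [args...]
wait_for() {
  wf_timeout=$1
  wf_interval=$2
  shift 2
  wf_deadline=$(( $(date +%s) + wf_timeout ))
  until "$@"; do
    # Give up once the deadline has passed.
    if [ "$(date +%s)" -ge "$wf_deadline" ]; then
      return 1
    fi
    sleep "$wf_interval"
  done
  return 0
}

# Example: poll every 10 seconds for up to 5 minutes.
#   wait_for 300 10 curl -fsS -o /dev/null "http://$APACHE2_IP"
```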

Once logged in, head to the Sources tab to get an overview of the logs collected from our K8s master and workers:

Drill into those logs by clicking the System / Inputs tab and selecting Show received messages for the filebeat input:

From here, you may want to play around with various filters or set up Graylog dashboards to help identify the events that are most important to you. Check out the Graylog Dashboard docs for details on customizing your view.

Exploring Metrics

Our deployment exposes two types of metrics through our grafana dashboards: system metrics include things like cpu/memory/disk utilization for the K8s master and worker machines, and cluster metrics include container-level data scraped from the K8s cAdvisor endpoints.

Grab the grafana address and admin password as follows:

juju status --format yaml grafana/0 | grep public-address
    public-address: <your-grafana-ip>
juju run-action --wait grafana/0 get-admin-password
    password: <your-grafana-password>

Browse to http://<your-grafana-ip>:3000 and log in with admin as the username and <your-grafana-password> as the password. Once logged in, check out the cluster metric dashboard by clicking the Home drop-down box and selecting Kubernetes Metrics (via Prometheus):

We can also check out the system metrics of our K8s host machines by switching the drop-down box to Node Metrics (via Telegraf).

The Other Way

As alluded to in the intro, I prefer the wizard-y feel of conjure-up to guide me through complex software deployments like Kubernetes. Now that we’ve seen the conjure-up way, some of you may want to see a command line approach to achieve the same results. Still others may have deployed CDK previously and want to extend it with the Graylog/Prometheus components described above. Regardless of why you’ve read this far, I’ve got you covered.

The tool that underpins conjure-up is Juju. Everything that the CDK spell did behind the scenes can be done on the command line with Juju. Let’s step through how that works.

Starting From Scratch

If you’re on Linux, install Juju like this:

sudo snap install juju --classic

For macOS, Juju is available from brew:

brew install juju

Now set up a controller for your preferred cloud. You may be prompted for any required cloud credentials:

juju bootstrap

We then need to deploy the base CDK bundle:

juju deploy canonical-kubernetes

Starting From CDK

With our Kubernetes cluster deployed, we need to add all the applications required for Graylog and Prometheus:

## deploy graylog-related applications
juju deploy xenial/apache2
juju deploy xenial/elasticsearch
juju deploy xenial/filebeat
juju deploy xenial/graylog
juju deploy xenial/mongodb

## deploy prometheus-related applications
juju deploy xenial/grafana
juju deploy xenial/prometheus
juju deploy xenial/telegraf
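
If you expect to repeat this step, the eight deploy commands can be folded into a loop. The sketch below prints the commands by default so you can review them first. The run wrapper, the DRY_RUN variable, and deploy_monitoring_apps are hypothetical names of my own, not part of Juju.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Application names are the same ones deployed individually above.
deploy_monitoring_apps() {
  for app in apache2 elasticsearch filebeat graylog mongodb \
             grafana prometheus telegraf; do
    run juju deploy "xenial/$app"
  done
}

# Review the commands first, then set DRY_RUN=0 to run them for real:
#   deploy_monitoring_apps
```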

Now that the software is deployed, relate the applications so they can communicate:

## relate graylog applications
juju relate apache2:reverseproxy graylog:website
juju relate graylog:elasticsearch elasticsearch:client
juju relate graylog:mongodb mongodb:database
juju relate filebeat:beats-host kubernetes-master:juju-info
juju relate filebeat:beats-host kubernetes-worker:juju-info

## relate prometheus applications
juju relate prometheus:grafana-source grafana:grafana-source
juju relate telegraf:prometheus-client prometheus:target
juju relate kubernetes-master:juju-info telegraf:juju-info
juju relate kubernetes-worker:juju-info telegraf:juju-info

At this point, all the applications can communicate with each other, but we have a bit more configuration to do (e.g., setting up the apache2 reverse proxy, telling prometheus how to scrape k8s, importing our grafana dashboards, etc):

## configure graylog applications
juju config apache2 enable_modules="headers proxy_html proxy_http"
juju config apache2 vhost_http_template="$(base64 <vhost-tmpl>)"
juju config elasticsearch firewall_enabled="false"
juju config filebeat \
  logpath="/var/log/*.log /var/log/containers/*.log"
juju config filebeat logstash_hosts="<graylog-ip>:5044"
juju config graylog elasticsearch_cluster_name="<es-cluster>"

## configure prometheus applications
juju config prometheus scrape-jobs="<scraper-yaml>"
juju run-action --wait grafana/0 import-dashboard \
  dashboard="$(base64 <dashboard-json>)"

Some of the above steps need values specific to your deployment. You can get these in the same way that conjure-up does:

<vhost-tmpl>: fetch our sample template from github
<graylog-ip>: juju run --unit graylog/0 'unit-get private-address'
<es-cluster>: juju config elasticsearch cluster-name
<scraper-yaml>: fetch our sample scraper from github; substitute appropriate values for K8S_PASSWORD and K8S_API_ENDPOINT
<dashboard-json>: fetch our host and k8s dashboards from github
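
The substitution for <scraper-yaml> can be scripted. The following is a sketch under a few assumptions: the sample scraper contains the literal tokens K8S_PASSWORD and K8S_API_ENDPOINT, the values contain no newlines, and render_scraper, escape_sed_replacement, and the scraper.yaml filename are hypothetical names used for illustration.

```shell
# Escape characters that are special in a sed replacement: \, &, and the | delimiter.
escape_sed_replacement() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Hypothetical helper: read a scraper template on stdin and fill in both values.
# $1 = value for K8S_PASSWORD, $2 = value for K8S_API_ENDPOINT
render_scraper() {
  rs_password=$(escape_sed_replacement "$1")
  rs_endpoint=$(escape_sed_replacement "$2")
  sed -e "s|K8S_PASSWORD|$rs_password|g" \
      -e "s|K8S_API_ENDPOINT|$rs_endpoint|g"
}

# Example (scraper.yaml is a hypothetical local copy of the sample scraper):
#   juju config prometheus \
#     scrape-jobs="$(render_scraper "$password" "$endpoint" < scraper.yaml)"
```

Escaping the values first keeps a password that contains & or | from being misread by sed.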

Finally, you’ll want to expose the apache2 and grafana applications to make their web interfaces accessible:

## expose relevant endpoints
juju expose apache2
juju expose grafana

Now that we have everything deployed, related, configured, and exposed, you can log in and poke around using the same steps from the Exploring Logs and Exploring Metrics sections above.

The Wrap Up

My goal here was to show you how to deploy a Kubernetes cluster with rich monitoring capabilities for logs and metrics. Whether you prefer a guided approach or command line steps, I hope it’s clear that monitoring complex deployments doesn’t have to be a pipe dream. The trick is to figure out how all the moving parts work, make them work together repeatably, and then break/fix/repeat for a while until everyone can use it.

This is where tools like conjure-up and Juju really shine. Leveraging the expertise of contributors to this ecosystem makes it easy to manage big software. Start with a solid set of apps, customize as needed, and get back to work!

Give these bits a try and let me know how it goes. You can find enthusiasts like me on Freenode IRC in #conjure-up and #juju. Thanks for reading!
