Keeping an eye on logs and metrics is a necessary evil for cluster admins. The benefits are clear: metrics help you set reasonable performance goals, while log analysis can uncover issues that impact your workloads. The hard part, however, is getting a slew of applications to work together in a useful monitoring solution.
In this post, I’ll cover monitoring a Kubernetes cluster with Graylog (for logging) and Prometheus (for metrics). Of course that’s not just wiring 3 things together. In fact, it’ll end up looking like this:
As you know, Kubernetes isn’t just one thing — it’s a system of masters, workers, networking bits, etc(d). Similarly, Graylog comes with a supporting cast (apache2, mongodb, etc), as does Prometheus (telegraf, grafana, etc). Connecting the dots in a deployment like this may seem daunting, but the right tools can make all the difference.
I’ll walk through this using conjure-up and the Canonical Distribution of Kubernetes (CDK). I find the conjure-up interface really helpful for deploying big software, but I know some of you hate GUIs and TUIs and probably other UIs too. For those folks, I’ll do the same deployment again from the command line.
Before we jump in, note that Graylog and Prometheus will be deployed alongside Kubernetes and not in the cluster itself. Things like the Kubernetes Dashboard and Heapster are excellent sources of information from within a running cluster, but my objective is to provide a mechanism for log/metric analysis whether the cluster is running or not.
The Walk Through
First things first, install conjure-up if you don’t already have it. On Linux, that’s simply:
sudo snap install conjure-up --classic
There’s also a brew package for macOS users:
brew install conjure-up
You’ll need at least version 2.5.2 to take advantage of the recent CDK spell additions, so be sure to sudo snap refresh conjure-up or brew update && brew upgrade conjure-up if you have an older version installed.
Once installed, run it:
conjure-up
You’ll be presented with a list of various spells. Select CDK and press Enter.
At this point, you’ll see additional components that are available for the CDK spell. We’re interested in Graylog and Prometheus, so check both of those and hit Continue.
You’ll be guided through various cloud choices to determine where you want your cluster to live. After that, you’ll see options for post-deployment steps, followed by a review screen that lets you see what is about to be deployed:
In addition to the typical K8s-related applications (etcd, flannel, load-balancer, master, and workers), you’ll see additional applications related to our logging and metric selections.
The Graylog stack includes the following:
apache2: reverse proxy for the graylog web interface
elasticsearch: document database for the logs
filebeat: forwards logs from K8s master/workers to graylog
graylog: provides an api for log collection and an interface for analysis
mongodb: database for graylog metadata
The Prometheus stack includes the following:
grafana: web interface for metric-related dashboards
prometheus: metric collector and time series database
telegraf: sends host metrics to prometheus
You can fine tune the deployment from this review screen, but the defaults will suite our needs. Click Deploy all Remaining Applications to get things going.
The deployment will take a few minutes to settle as machines are brought online and applications are configured in your cloud. Once complete, conjure-up will show a summary screen that includes links to various interesting endpoints for you to browse:
Exploring Logs
Now that Graylog has been deployed and configured, let’s take a look at some of the data we’re gathering. By default, the filebeat application will send both syslog and container log events to graylog (that’s /var/log/*.log and /var/log/containers/*.log from the kubernetes master and workers).
Grab the apache2 address and graylog admin password as follows:
Browse to http://<your-apache2-ip> and login with admin as the username and <your-graylog-password> as the password. Note: if the interface is not immediately available, please wait as the reverse proxy configuration may take up to 5 minutes to complete.
Once logged in, head to the Sources tab to get an overview of the logs collected from our K8s master and workers:
Drill into those logs by clicking the System / Inputs tab and selecting Show received messages for the filebeat input:
From here, you may want to play around with various filters or setup Graylog dashboards to help identify the events that are most important to you. Check out the Graylog Dashboard docs for details on customizing your view.
Exploring Metrics
Our deployment exposes two types of metrics through our grafana dashboards: system metrics include things like cpu/memory/disk utilization for the K8s master and worker machines, and cluster metrics include container-level data scraped from the K8s cAdvisor endpoints.
Grab the grafana address and admin password as follows:
Browse to http://<your-grafana-ip>:3000 and login with admin as the username and <your-grafana-password> as the password. Once logged in, check out the cluster metric dashboard by clicking the Home drop-down box and selecting Kubernetes Metrics (via Prometheus):
We can also check out the system metrics of our K8s host machines by switching the drop-down box to Node Metrics (via Telegraf)
The Other Way
As alluded to in the intro, I prefer the wizard-y feel of conjure-up to guide me through complex software deployments like Kubernetes. Now that we’ve seen the conjure-up way, some of you may want to see a command line approach to achieve the same results. Still others may have deployed CDK previously and want to extend it with the Graylog/Prometheus components described above. Regardless of why you’ve read this far, I’ve got you covered.
The tool that underpins conjure-up is Juju. Everything that the CDK spell did behind the scenes can be done on the command line with Juju. Let’s step through how that works.
Starting From Scratch
If you’re on Linux, install Juju like this:
sudo snap install juju --classic
For macOS, Juju is available from brew:
brew install juju
Now setup a controller for your preferred cloud. You may be prompted for any required cloud credentials:
juju bootstrap
We then need to deploy the base CDK bundle:
juju deploy canonical-kubernetes
Starting From CDK
With our Kubernetes cluster deployed, we need to add all the applications required for Graylog and Prometheus:
At this point, all the applications can communicate with each other, but we have a bit more configuration to do (e.g., setting up the apache2 reverse proxy, telling prometheus how to scrape k8s, importing our grafana dashboards, etc):
Now that we have everything deployed, related, configured, and exposed, you can login and poke around using the same steps from the Exploring Logs and Exploring Metrics sections above.
The Wrap Up
My goal here was to show you how to deploy a Kubernetes cluster with rich monitoring capabilities for logs and metrics. Whether you prefer a guided approach or command line steps, I hope it’s clear that monitoring complex deployments doesn’t have to be a pipe dream. The trick is to figure out how all the moving parts work, make them work together repeatably, and then break/fix/repeat for a while until everyone can use it.
This is where tools like conjure-up and Juju really shine. Leveraging the expertise of contributors to this ecosystem makes it easy to manage big software. Start with a solid set of apps, customize as needed, and get back to work!
Give these bits a try and let me know how it goes. You can find enthusiasts like me on Freenode IRC in #conjure-up and #juju. Thanks for reading!
For a native integration for Canonical’s Kubernetes platform, Juju was the perfect fit, and the charm makes consuming CloudCasa seamless for users. […]
Canonical is excited to announce that we are collaborating with OpenAirInterface (OAI) to drive the development and promotion of open source software for open radio access networks (Open RAN). Canonical will bring automation in software lifecycle management to OAI’s RAN stack, alongside additional infrastructure capabilities. This will be […]
Kubernetes is the open source, industry-standard platform for deploying, managing and scaling containerized applications – and applications on Kubernetes are easier with operators. […]