A Deep Dive into Kubernetes Metrics — Part 6: kube-state-metrics

Bob Cotton
Published in FreshTracks.io · 5 min read · Aug 20, 2018


This is Part 6 (and the last) of a multi-part series about all the metrics you can gather from your Kubernetes cluster.

In Part 5, I dug deeply into the metrics exposed by the Kubernetes data store, etcd. In this installment, I will cover the metrics derived from the Kubernetes API server by installing the excellent kube-state-metrics package.

Kube-state-metrics interrogates the Kubernetes API server and exposes the state of all the Kubernetes objects it finds there. Because it is an exporter, kube-state-metrics publishes this state in the Prometheus metrics exposition format. The package could just as well have been named the Kubernetes state exporter.

The metrics exposed by kube-state-metrics differ from the usual USE or RED metrics exposed by other services. Kube-state-metrics, as the name implies, exposes the state of the Kubernetes objects in the cluster. As we will see, these are not traditional Prometheus counters or gauges; in fact, some of these series remain constantly at the value 1.0!

What does state mean in this context? This depends on which Kubernetes object we are talking about, but in general when using kube-state-metrics you can gather the following state about your cluster:

  • Counts of each object type
  • All of the Kubernetes labels and their values attached to each object
  • The creation time (as an epoch) of each object
  • Some generic, object specific “info”
  • Other states specific to the object in question

As of this writing, kube-state-metrics tracks 19 Kubernetes object types.

Object Counts

Often you will want to know how many objects of a given type are present in your cluster. Suppose you want to know how many Pods are running at any given time. Because kube_pod_status_phase is 1 when a pod is in the given phase and 0 otherwise, summing it counts the matching Pods:

sum(kube_pod_status_phase{phase="Running"})

Several other attributes about Pods are gathered; the running state is just one. To get a count of Pods in the Failed phase, use this query:

sum(kube_pod_status_phase{phase="Failed"})
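Because each of these series also carries a namespace label, you can break the count down per namespace. A sketch, assuming the standard kube-state-metrics labels:

sum(kube_pod_status_phase{phase="Running"}) by (namespace)

This returns one running-Pod count per namespace.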

Kubernetes Labels

Labels are a Kubernetes superpower. Because Kubernetes manages the placement of everything on the cluster for you, you don’t know ahead of time which Node things will be deployed on. By applying labels, you can select those things by label value. This is very powerful.

Being able to aggregate metrics by the same runtime labels you are applying to Kubernetes objects is also very powerful. Because labels are applied to objects at runtime, it can be difficult to extract these labels from within the application. It is a bad practice to try to fetch and expose runtime environment values (like labels) from within your services. However, those runtime values can be important to your application metrics and monitoring system.

Fortunately, kube-state-metrics exposes all the Kubernetes labels for us. Let’s see how this can help us.

For this example, we are going to take a Kubernetes label from a given pod and add that label to a custom metric from a service running in that pod.

The following snippet is from a deployed service called “frontdoor”. The pod template applies the Kubernetes labels version and team:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontdoor-deployment
spec:
  selector:
    matchLabels:
      team: blue
      version: 1.0.1
  template:
    metadata:
      labels:
        version: 1.0.1
        team: blue

[...]

According to the documentation about Pod metrics from kube-state-metrics, the series kube_pod_labels contains the following Prometheus labels:

pod=<pod-name>
namespace=<pod-namespace>
label_POD_LABEL=<POD_LABEL>

Where POD_LABEL corresponds with every Kubernetes label on the pod.

From the above example deployment one of the Prometheus series exposed from kube-state-metrics would look like this:

kube_pod_labels{
  [...]
  label_name="frontdoor",
  label_version="1.0.1",
  label_team="blue",
  namespace="default",
  pod="frontdoor-xxxxxxxxx-xxxxxx",
} = 1

This pod has three labels: name, version, and team. Each pod label is translated into a Prometheus series label of the same name, prefixed with “label_”, with the pod label’s value carried over as the series label’s value. The value of this series is always 1.0. That’s cool! Now what can you do with it?

Now suppose the frontdoor service exposes metrics about the HTTP requests it services. You just rolled out a fix for a number of errors, and you want to see how version 1.0.1 is behaving. You can’t include the version of the service in the metric definition because it isn’t exposed to the middleware you are using. How do we join these Prometheus series so we know which version was running and can look at the change in performance?

The first step is to use group_left on the pod name to join the two series, then expose the version label in the query.

sum(
  rate(http_request_count{code=~"^(?:5..)$"}[5m])
) by (pod)
* on (pod) group_left(label_version) kube_pod_labels

Let’s break that down. The first part is a typical rate query. It shows the per-pod rate of responses whose status code starts with 5 (the 5xx errors).

sum(
  rate(http_request_count{code=~"^(?:5..)$"}[5m])
) by (pod)

Next we multiply by the value of kube_pod_labels, which is always 1.0. Numerically a no-op, but it brings the kube_pod_labels series to the party.

sum(
  rate(http_request_count{code=~"^(?:5..)$"}[5m])
) by (pod)
* kube_pod_labels

Next, on (pod) joins the two series on the pod label. Finally, group_left(label_version) copies the label_version label onto the result.
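Putting it all together, you can aggregate the error rate by version rather than by pod to compare releases. A sketch, reusing the hypothetical http_request_count series from above:

sum(
  rate(http_request_count{code=~"^(?:5..)$"}[5m])
  * on (pod) group_left(label_version) kube_pod_labels
) by (label_version)

With one series per deployed version, a before-and-after comparison of 1.0.1 is straightforward.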

Object Creation Time

Often it is helpful to know when objects in Kubernetes were created. Kube-state-metrics exposes a creation time for almost all the objects it tracks. The metric name follows the pattern kube_<OBJECT>_created and includes labels for the name of the object and the namespace where it lives. The value is a Unix epoch timestamp in seconds.

For example, the CronJob creation series is called kube_cronjob_created.

To calculate the average age, in seconds, of all running CronJobs, use this query:

avg(time() - kube_cronjob_created)

This might be helpful if you have a CronJob that is taking too long to run.
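You can also filter on the age directly. For example, this sketch returns one series for each CronJob created more than a day (86400 seconds) ago:

(time() - kube_cronjob_created) > 86400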

Generic “Info”

Some of the objects tracked by kube-state-metrics have an “info” series, typically named kube_<OBJECT>_info, which carries identifying information about that object. For example, the Pod info series, kube_pod_info, shows the following:

pod=<pod-name> 
namespace=<pod-namespace>
host_ip=<host-ip>
pod_ip=<pod-ip>
node=<node-name>
created_by_kind=<created_by_kind>
created_by_name=<created_by_name>
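Because kube_pod_info also holds the constant value 1.0, it works as a join target just like kube_pod_labels. For example, to attach the node name to a per-pod metric (a sketch, reusing the hypothetical http_request_count series from earlier):

sum(rate(http_request_count[5m])) by (pod)
  * on (pod) group_left(node) kube_pod_info

The result is the per-pod request rate with the hosting Node’s name as an extra label.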

State Specific to Each Kubernetes Object

The rest of the data tracked by kube-state-metrics is specific to each object type. For example, Deployments expose the number of available replicas as kube_deployment_status_replicas_available.

Wrapping Up

Kube-state-metrics is an excellent addition to your Kubernetes/Prometheus monitoring toolkit. Much of the state of your Kubernetes cluster is available within your Prometheus system. These states allow you to create interesting alerts and dashboards. It’s worth exploring what’s available.

FreshTracks simplifies Kubernetes visibility. Hosted Prometheus and Grafana technology with machine learning enriched data for the best Day-2 Kubernetes metrics experience.
