Load Testing Prometheus Metric Ingestion
Prometheus is a popular open-source project for time-series ingestion, storage, and querying. The Prometheus ecosystem is expanding rapidly, with alternative implementations such as Cortex and Thanos emerging to meet the diverse use cases of adopting organizations. These implementations share a common exposition format, which is in the process of being formalized as the OpenMetrics standard.
It is important for organizations to understand their series-throughput limits in order to scale a metrics ingestion system properly, but alternative implementations have different performance and scaling characteristics. Avalanche is a simple metrics generation tool that can be used to load test metric ingestion throughput: it serves a metrics endpoint with configurable series generation, seen in the example below.
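For illustration, the generated endpoint output looks roughly like the following Prometheus exposition-format text (the metric and label names here are representative placeholders, not Avalanche's exact naming scheme):

```
# TYPE avalanche_metric_0 gauge
avalanche_metric_0{label_key_0="label_val_0",series_id="0"} 42
avalanche_metric_0{label_key_0="label_val_0",series_id="1"} 17
```

Each configured metric name is rendered with one line per series, distinguished by label values such as a series identifier.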
Running Avalanche
The easiest way to run an Avalanche server is with the public Docker image.
docker run -p 9001:9001 quay.io/freshtracks.io/avalanche
Avalanche supports several flags for configuring the generated series:
metric-count
Configure the number of metric names exposed in the endpoint. This option is useful for testing how the number of series present on the endpoint impacts ingestor performance.
series-count
Configure the number of series per metric. The total number of series rendered to the endpoint equals (metric-count * series-count). This option is useful for testing how the (metric-count/series-count) ratio impacts ingestor performance.
label-count
Configure the number of labels present in each series.
value-interval
Update series values every {interval} seconds.
series-interval
Update the ‘series_id’ label every {interval} seconds, cycling series. This option is helpful for testing series creation and termination performance.
metric-interval
Update metric names every {interval} seconds, cycling series. This option is helpful for testing series creation and termination performance.
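Putting the flags above together, an invocation might look like the following sketch (the `--flag=value` syntax and these particular values are assumptions for illustration; this configuration would render 500 * 10 = 5,000 total series):

```
docker run -p 9001:9001 quay.io/freshtracks.io/avalanche \
  --metric-count=500 \
  --series-count=10 \
  --label-count=5 \
  --value-interval=30 \
  --series-interval=60 \
  --metric-interval=120
```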
Measuring Ingestion Performance
Configure Prometheus to scrape itself, plus add the following scrape config for Avalanche:
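A minimal scrape config for Avalanche might look like the sketch below (the job name, scrape interval, and target address are placeholders; point the target at wherever the Avalanche container is exposed):

```yaml
scrape_configs:
  - job_name: 'avalanche'
    scrape_interval: 15s
    static_configs:
      # Avalanche container mapped to port 9001 on the local host
      - targets: ['localhost:9001']
```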
Some interesting Prometheus metrics to watch while scraping Avalanche include:
- scrape_duration_seconds
- prometheus_target_sync_*
- prometheus_target_scrape_*
- prometheus_tsdb_*
- prometheus_local_storage_*
- prometheus_rule_*
Clustering
Real-life Prometheus instances are likely to scrape multiple hosts and endpoints. This behavior can be easily replicated with a Kubernetes deployment. An example deployment configuration to deploy 5 instances of Avalanche looks like this:
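A sketch of such a Deployment is shown below; the resource names, labels, and the `prometheus.io/*` discovery annotations are illustrative conventions, not part of Avalanche itself:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: avalanche
  labels:
    app: avalanche
spec:
  replicas: 5          # five Avalanche instances to scrape
  selector:
    matchLabels:
      app: avalanche
  template:
    metadata:
      labels:
        app: avalanche
      annotations:
        # Common convention for annotation-based Prometheus discovery
        prometheus.io/scrape: "true"
        prometheus.io/port: "9001"
    spec:
      containers:
        - name: avalanche
          image: quay.io/freshtracks.io/avalanche
          ports:
            - containerPort: 9001
```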
Next, configure Prometheus to run as a pod in your cluster and update the Prometheus scrape config to use Kubernetes service discovery to find each exposed Avalanche endpoint with a scrape target config as shown in the example config below:
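One way to sketch that scrape config is with pod-role service discovery and a relabel rule keyed off the scrape annotation (the job name and annotation convention here are assumptions matching the example Deployment above):

```yaml
scrape_configs:
  - job_name: 'avalanche-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```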
Once Prometheus is running in a pod and scraping the Avalanche endpoints, Node Exporter and cAdvisor metrics can provide insight into its performance and resource utilization. Resource usage generally correlates with the total series/second ingested, and "prometheus_target_interval_length_seconds" will exceed the requested scrape interval when Prometheus is under-provisioned.
Alternative Implementations
Avalanche is useful for guiding parameter tuning, scaling, and development of other metrics collection systems that adopt the OpenMetrics standard. One example is Cortex, which has much different performance characteristics than Prometheus and exposes a host of metrics useful for performance investigations, prefixed with cortex_.
FreshTracks uses Avalanche to test and scale our Cortex system, and we’ve found that operations such as rule evaluation, chunk flushing, and series termination are particularly useful to track because certain metric workload combinations can significantly degrade ingestion performance. This data has been useful to help us proactively avoid and tune for those combinations.