Load Testing Prometheus Metric Ingestion

Dave Thompson
FreshTracks.io
Aug 30, 2018 · 3 min read


Prometheus is a popular open source project for time series ingestion, storage, and querying. The Prometheus ecosystem is rapidly expanding with alternative implementations, such as Cortex and Thanos, that meet the diverse use cases of adopting organizations. These implementations share a common exposition format, which is in the process of being formalized as the OpenMetrics standard.

Organizations need to understand series throughput limits to properly scale their metrics ingestion systems, but each implementation has different performance and scaling characteristics. Avalanche is a simple metrics generation tool for load testing metric ingestion throughput. It serves a metrics endpoint with configurable series generation, as shown in the example below.

Example metrics generated by Avalanche.
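The output is ordinary Prometheus exposition format; the sketch below shows its general shape, with illustrative metric names, label names, and values:

# HELP avalanche_metric_mmmmm_0_0 A tasty metric morsel
# TYPE avalanche_metric_mmmmm_0_0 gauge
avalanche_metric_mmmmm_0_0{cycle_id="0",label_key_kkkkk_0="label_val_vvvvv_0",series_id="0"} 72
avalanche_metric_mmmmm_0_0{cycle_id="0",label_key_kkkkk_0="label_val_vvvvv_0",series_id="1"} 18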

Running Avalanche

The easiest way to run an Avalanche server is with the public Docker image.

docker run -p 9001:9001 quay.io/freshtracks.io/avalanche

Avalanche supports several flags for configuring the generated series:

metric-count

Configure the number of metric names exposed on the endpoint. This option is useful for testing how the number of series present on the endpoint impacts ingestor performance.

series-count

Configure the number of series per metric. The total number of series rendered on the endpoint equals metric-count * series-count. This option is useful for testing how the metric-count/series-count ratio impacts ingestor performance.

label-count

Configure the number of labels present in each series.

value-interval

Update series values every {interval} seconds.

series-interval

Update the ‘series_id’ label every {interval} seconds, cycling series. This option is helpful for testing series creation and termination performance.

metric-interval

Update metric names every {interval} seconds, cycling series. This option is helpful for testing series creation and termination performance.
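Putting these together, a run that exercises several of the flags at once might look like the following (the values here are arbitrary, chosen only for illustration):

docker run -p 9001:9001 quay.io/freshtracks.io/avalanche \
  --metric-count=500 \
  --series-count=10 \
  --label-count=5 \
  --value-interval=30 \
  --series-interval=300 \
  --metric-interval=3600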

Measuring Ingestion Performance

Configure Prometheus to scrape itself, and add the following scrape config for Avalanche:

Avalanche scrape target
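A minimal version of such a scrape config, assuming Avalanche is reachable at localhost:9001, might look like this:

scrape_configs:
  - job_name: 'avalanche'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9001']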

Some interesting Prometheus metrics to watch while scraping Avalanche include the following (a sample query is shown after the list):

  • scrape_duration_seconds
  • prometheus_target_sync_*
  • prometheus_target_scrape_*
  • prometheus_tsdb_*
  • prometheus_local_storage_*
  • prometheus_rule_*
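For example, assuming Prometheus 2.x, the TSDB counter prometheus_tsdb_head_samples_appended_total gives a direct view of the sample ingestion rate:

rate(prometheus_tsdb_head_samples_appended_total[5m])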

Clustering

Real-life Prometheus instances are likely to scrape many hosts and endpoints. This behavior is easy to replicate with a Kubernetes Deployment. An example configuration that deploys five Avalanche instances looks like this:

Avalanche deployment config
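A minimal sketch of such a Deployment, assuming the public image and port 9001 used above, might be:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: avalanche
spec:
  replicas: 5
  selector:
    matchLabels:
      app: avalanche
  template:
    metadata:
      labels:
        app: avalanche
    spec:
      containers:
      - name: avalanche
        image: quay.io/freshtracks.io/avalanche
        ports:
        - containerPort: 9001
          name: metrics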

Next, run Prometheus as a pod in your cluster and update its scrape config to use Kubernetes service discovery to find each exposed Avalanche endpoint, as in the example config below:

Avalanche pod scrape target config
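A sketch of a pod-based service-discovery scrape config, assuming the Avalanche pods carry the app=avalanche label from the Deployment above, might look like this:

- job_name: 'avalanche-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    action: keep
    regex: avalanche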

Node Exporter and cAdvisor metrics can provide insight into the performance and resource utilization of Prometheus once it is running in a pod and scraping Avalanche endpoints. Resource usage generally correlates with the total series per second ingested, and prometheus_target_interval_length_seconds will exceed the requested scrape interval when Prometheus is under-provisioned.
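One way to watch for that drift (prometheus_target_interval_length_seconds is a summary, so the available quantile label values depend on your Prometheus version) is:

prometheus_target_interval_length_seconds{quantile="0.99"}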

Alternative Implementations

Avalanche is useful for guiding parameter tuning, scaling, and development of other metrics collection systems that adopt the OpenMetrics standard. An example is Cortex, which has very different performance characteristics than Prometheus and exposes a host of metrics, prefixed with cortex_, that are useful for performance investigations.
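For instance, a query along these lines (cortex_ingester_ingested_samples_total is one such metric; exact names vary across Cortex versions) tracks sample throughput at the ingesters:

rate(cortex_ingester_ingested_samples_total[5m])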

FreshTracks uses Avalanche to test and scale our Cortex system. We've found metrics covering operations such as rule evaluation, chunk flushing, and series termination particularly useful to track, because certain metric workload combinations can significantly degrade ingestion performance, and this data has helped us proactively tune for and avoid those combinations.
