FaaS for Scalable Prometheus Rule Evaluation (Part 1)

Published in FreshTracks.io, Sep 24, 2018

Prometheus (prometheus.io) is a popular open source application for storing and querying high-dimensional time series data. Prometheus contains a limited set of built-in statistical functions, but more sophisticated forecasting and machine learning methods require external computation. The more computationally intensive analysis methods can be moved into FaaS (function as a service) functions for simpler deployment and easier scaling.

App Concept

This post walks through the creation of an application with the goal of forecasting cryptocurrency price changes. This is a difficult data set to forecast in real life, but the example application exposes simple concepts that can be applied to any time series data analysis. The focus of this post is not on the effectiveness of the models themselves, but rather on the techniques used to execute the models in a scalable fashion.

A working example of the application is available at www.predictatron.net and the code is available at github.com/limscoder/predictatron.

Select a currency, prediction method, and time range to visualize forecasts. The cyan series shows the actual value, while the black series shows the predicted value.

Running Prometheus

The first step of building the Predictatron app is to retrieve cryptocurrency price data and persist it in a format that’s convenient to query. Prometheus is the obvious choice for this task. The easiest way to run Prometheus is to run the Docker container. Predictatron.net runs the container on a Google Compute Engine instance that is configured to start the container automatically on startup.

The Predictatron repo contains a customized Prometheus container with scrape targets configured for the app, and the built container is published publicly at gcr.io/starry-tracker-215615/prom.

The Prometheus configuration for scraping cryptocurrency price data.

Populating Source Data

Once Prometheus is up and running, the next step is to expose data on an endpoint to scrape. The Predictatron Prometheus container is configured to scrape metrics data from an endpoint located at us-central1-starry-tracker-215615.cloudfunctions.net/coin. This endpoint is running as a Google Cloud Function.

Prometheus scrapes the endpoint at the configured interval (1 minute in this example). Every time the endpoint is scraped, it invokes a FaaS function that retrieves current cryptocurrency price data from CoinMarketCap and returns it in the Prometheus exposition format. The function’s code is written in Python and can be seen in the metrics/coin directory of the repo.

Cryptocurrency price data in Prometheus exposition format
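To make the output format concrete, here is a minimal sketch of how a function might render prices as exposition text. It is not the code from the metrics/coin directory: the helper name `to_exposition`, the HELP text, and the `eth_usd` metric are assumptions for illustration, and the prices are made-up sample values rather than real data.

```python
def to_exposition(prices, help_text="Latest price in USD"):
    """Render a {metric_name: value} mapping as Prometheus exposition text.

    Hypothetical helper: the metric names and HELP text are illustrative.
    """
    lines = []
    for name in sorted(prices):
        # Each metric gets optional HELP/TYPE comment lines, then one sample line.
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(prices[name])}")
    # The exposition format expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values only; a real function would fetch these from a price API.
    print(to_exposition({"btc_usd": 6543.21, "eth_usd": 215.5}), end="")
```

A gauge is used here because a price can move both up and down between scrapes.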

Series Inspection and Forecasting

Cryptocurrency price data appears as a series once Prometheus is successfully scraping the coin endpoint.

Time series data queried by metric name `btc_usd`.

Prometheus provides several built-in aggregation operators and functions for analyzing data, including sum, avg, increase, and holt_winters. The predict_linear function uses simple linear regression over a range of samples to forecast the series value a given number of seconds in the future, and that’s the forecasting method used by the linear option in Predictatron. The following Prometheus query predicts the series value 15 minutes (900 seconds) in the future using 6 hours of historical data.

predict_linear(btc_usd[6h], 900)
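For intuition about what a query like this computes, the sketch below fits an ordinary least-squares line through (timestamp, value) samples and extrapolates it past the evaluation time. It illustrates the technique only and is not Prometheus's implementation; the function name `linear_forecast` and the sample values are made up.

```python
def linear_forecast(samples, eval_time, horizon):
    """Fit a least-squares line to (timestamp, value) samples and return the
    fitted value at eval_time + horizon (both in seconds)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    # Measure time relative to eval_time, so the intercept is the fitted value "now".
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must not all share one timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon


if __name__ == "__main__":
    # Three made-up samples one minute apart, rising by 1.0 per minute.
    samples = [(0, 10.0), (60, 11.0), (120, 12.0)]
    print(linear_forecast(samples, eval_time=120, horizon=900))
```

With perfectly linear input the forecast continues the same slope, so 15 more minutes at 1.0 per minute adds 15 to the last fitted value.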

The predict_linear computations are quick enough that they can be queried on-demand, but it can be helpful to record forecast results as their own series in order to evaluate the effectiveness of historical forecasting. The following recording rule configuration will persist the forecast as a new series.

The derived series is named predict_linear:btc_usd and the predict_past and predict_future labels are added to identify the forecast’s parameters. The prediction series can now be queried with a time offset to compare against the source series to determine historic prediction error.
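A rule of the shape described here could be written out as a rule file along the following lines. This is a sketch under assumptions rather than the configuration from the Predictatron repo: the group name `predictatron` and the `"6h"` value for predict_past are guesses, and the helper `render_recording_rule` is hypothetical. Only the series name, the query, and the predict_future="900" label come from the text.

```python
def render_recording_rule(metric, window, horizon_s, group="predictatron"):
    """Return YAML text for a rule group holding one predict_linear recording rule.

    The group name and the label values are assumptions made for illustration.
    """
    record = f"predict_linear:{metric}"
    expr = f"predict_linear({metric}[{window}], {horizon_s})"
    lines = [
        "groups:",
        f"  - name: {group}",
        "    rules:",
        f"      - record: {record}",
        f"        expr: {expr}",
        "        labels:",
        # Label values are quoted so that YAML reads them as strings.
        f'          predict_past: "{window}"',
        f'          predict_future: "{horizon_s}"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_recording_rule("btc_usd", "6h", 900), end="")
```

Generating the file from a template like this makes it straightforward to emit one rule per currency or per forecast horizon.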

Prediction error: `(predict_linear:btc_usd{predict_future="900"} offset 15m) - ignoring(predict_past, predict_future) btc_usd`
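The same comparison can be reproduced outside Prometheus, for example when evaluating exported data. The sketch below assumes each forecast is keyed by the time it was recorded and is compared with the value observed once its horizon has elapsed. The function names and sample numbers are illustrative only.

```python
def prediction_errors(forecasts, actuals, horizon_s):
    """Pair each forecast with the value observed horizon_s seconds later.

    forecasts: {recorded_at: value predicted for recorded_at + horizon_s}
    actuals:   {timestamp: observed value}
    Returns {target_timestamp: predicted - observed} where both exist.
    """
    errors = {}
    for recorded_at, predicted in forecasts.items():
        target = recorded_at + horizon_s
        if target in actuals:
            errors[target] = predicted - actuals[target]
    return errors


def mean_absolute_error(errors):
    """Average magnitude of the errors, or None when there are none."""
    if not errors:
        return None
    return sum(abs(e) for e in errors.values()) / len(errors)


if __name__ == "__main__":
    # Made-up numbers: two forecasts and the values seen 900 seconds later.
    forecasts = {0: 105.0, 60: 110.0}
    actuals = {900: 100.0, 960: 112.0}
    errs = prediction_errors(forecasts, actuals, horizon_s=900)
    print(errs, mean_absolute_error(errs))
```

Forecasts whose target time has not been observed yet are skipped rather than counted as errors.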

FaaS for Rule Evaluation

Prometheus recording rules are sufficient for simple operations like predict_linear, but implementers will run into a couple of problems with this method. The first is that more complex rule expressions take longer to execute and scaling becomes an issue as the number of recording rules grows. The second problem is that implementers may want to invoke more complex operations that can’t be expressed as a Prometheus query expression.

The Predictatron app contains an ARIMA series forecasting method that can’t be easily expressed as a Prometheus query expression. FaaS offers a straightforward way to implement and scale more complex rule evaluations. Predictatron uses a function that defines an ARIMA model in Python.

ARIMA model in Python using `statsmodels` package.
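As a rough sketch of what such a function might look like (not the code from the Predictatron repo), the example below fits an ARIMA model with the `statsmodels` package and renders the forecast as a labeled sample. The model order `(1, 1, 0)`, the metric name `arima_btc_usd`, and both helper names are assumptions. It also assumes one sample per minute, so that 15 steps ahead corresponds to 900 seconds.

```python
def arima_forecast(values, steps=15, order=(1, 1, 0)):
    """Fit an ARIMA model to a list of values and return the forecast
    `steps` samples ahead. The order is an arbitrary illustrative choice."""
    # Imported lazily so the formatting helper below works without statsmodels.
    from statsmodels.tsa.arima.model import ARIMA

    fitted = ARIMA(values, order=order).fit()
    # forecast() returns one value per future step; keep the last one.
    return float(list(fitted.forecast(steps=steps))[-1])


def forecast_metric(name, value, labels):
    """Render one forecast as a labeled gauge in Prometheus exposition text."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"# TYPE {name} gauge\n{name}{{{label_str}}} {value}\n"


if __name__ == "__main__":
    # Made-up history; a real function would query recent samples from Prometheus.
    history = [6500.0 + (i % 7) * 3.0 + i * 0.5 for i in range(120)]
    value = arima_forecast(history)
    print(forecast_metric("arima_btc_usd", value, {"predict_future": "900"}), end="")
```

Keeping the model fit separate from the output formatting means the model can be swapped without touching the scrape-facing part of the function.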

The function is deployed to a FaaS platform and exposes the forecast result in the Prometheus exposition format. Prometheus is configured to scrape the function endpoint and records the forecast as a new series. The FaaS function is auto-scaled, but different models and parameters can also easily be sharded across multiple function endpoints to ensure that each individual scrape completes within the desired duration.
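One simple way to shard is to assign each model and parameter combination to an endpoint by hashing a stable key. The sketch below is one possible approach, not how Predictatron does it: the key format and both function names are hypothetical.

```python
import hashlib


def shard_for(job_key, shard_count):
    """Map a job key to a shard index that stays stable across processes."""
    digest = hashlib.sha256(job_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def assign_jobs(job_keys, shard_count):
    """Group job keys into shard_count lists, one list per function endpoint."""
    shards = [[] for _ in range(shard_count)]
    for key in job_keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical keys in the form model:metric:horizon_seconds.
    jobs = ["arima:btc_usd:900", "arima:eth_usd:900", "linear:btc_usd:900"]
    for index, keys in enumerate(assign_jobs(jobs, shard_count=2)):
        print(index, keys)
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would move jobs between endpoints on every restart.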

FaaS functions are a useful strategy to deploy and scale any type of series rule evaluation and are not limited to forecasting analysis. The architecture of Predictatron is diagrammed below.

Stateful Models

The linear regression and ARIMA models used in this example were chosen because they are simple to understand and computationally inexpensive, so they can be executed on-demand. Both are univariate models: they look only at the historical values of the time series being forecast. Univariate models in general are inadequate for forecasting a series like cryptocurrency prices, since the events causing price movement are not present in the series itself.

Stay tuned for Part 2 of this blog series where we will investigate more powerful and computationally intensive multivariate forecasting models. These models are too slow to train on-demand and require stateful execution, necessitating a different approach than FaaS rule evaluation.