InfluxDb top X offenders

InfluxDb top X offenders

I have an InfluxDb time series database which I am storing a metric called api_calls. This metric has two pertinent measurements, call_invoked and call_failed. Each measurement also has a tag with a vendor id associated with it.
Every minute, I have a service that collates the number of times I invoked call_invoked (fired when I start the API call) and call_failed (fired when I get an error message from the API call) and stores them into an InfluxDb. So basically, every minute I get a record (per tag, as I understand it) in the InfluxDb “table” (for lack of a better term) that has these two values associated with it.
I have been scratching my head trying to figure out how to show the top ten vendor id’s with the largest percentage of API failures.
How can I do this? I have a strong SQL background but everything I’ve tried has either not worked at all or definitely not worked correctly.


Solution 1:

I’ve made some guesses about your schema, and built up a solution using the CLI. I’m using InfluxDB v1.0.2 here.

Related:  In Influxdb, how do I concat values in my query?

Note that TOP() only became a working function around January 2016.

If your version of InfluxDB is older than this, the following won’t work.

First, some sample data:

USE foo

INSERT api_calls,vendor_id=1 call_invoked=3i,call_failed=3i 1483228860000000000
INSERT api_calls,vendor_id=2 call_invoked=3i,call_failed=8i 1483228860000000000
INSERT api_calls,vendor_id=3 call_invoked=3i,call_failed=5i 1483228860000000000
INSERT api_calls,vendor_id=4 call_invoked=3i,call_failed=9i 1483228860000000000
INSERT api_calls,vendor_id=5 call_invoked=3i,call_failed=10i 1483228860000000000
INSERT api_calls,vendor_id=6 call_invoked=3i,call_failed=1i 1483228860000000000
INSERT api_calls,vendor_id=7 call_invoked=3i,call_failed=6i 1483228860000000000
INSERT api_calls,vendor_id=8 call_invoked=3i,call_failed=9i 1483228860000000000
INSERT api_calls,vendor_id=9 call_invoked=3i,call_failed=6i 1483228860000000000
INSERT api_calls,vendor_id=10 call_invoked=3i,call_failed=5i 1483228860000000000

Then run the following query. Note that the WHERE time > x AND time < y clause could be replaced with something like WHERE time > now() - 1h.

SELECT TOP(call_failed,3), vendor_id FROM api_calls WHERE time > '2017-01-01T00:00:00Z' AND time < '2017-01-01T00:05:00Z' GROUP BY time(1m)

Which gives a result of:

name: api_calls
time                    top     vendor_id
2017-01-01T00:01:00Z    10      5
2017-01-01T00:01:00Z    9       4
2017-01-01T00:01:00Z    9       8

Note that every minute that has no data returns one row, but where there is data, there are 3 rows returned.

If we tell the CLI we want to peek at the JSON, we can type format json, and when we repeat that same query we get this:

Related:  Query Influxdb based on tags?

Does that help?

Edit – Re-worked for percentage.

Apologies – I noticed you were looking for percentage, not Top X count.

In InfluxDB, that’s two problems: the first is how to generate the percentages.

A note about nesting operators in functions: InfluxDB does not yet generally guarantee that nesting functions or operators within function arguments will work. Some do, many don’t. Of importance here: TOP() only allows field keys or tags as arguments, and not operations on tags (e.g. call_failed / total_calls). You’ll need to perform an extra query to generate the percentages, and you’ll have to store them. You can either calculate them on the “raw”, minute-by-minute values:

SELECT (call_failed / (call_failed + call_invoked)) AS pct_fail INTO api_calls FROM api_calls GROUP BY vendor_id

(GROUP BY time is implicit: re-run that query from the CLI without the INTO clause to see how).

Or you can summarize each e.g. hour:

SELECT (sum(call_failed) / (sum(call_failed) + sum(call_invoked))) AS pct_fail INTO api_calls_hourly FROM api_calls GROUP BY time(1h), vendor_id

That can be done as a one-off for existing data. For any newly arriving data, a continuous query can be used:

CREATE CONTINUOUS QUERY fail_pct_calc ON foo BEGIN SELECT (sum(call_failed) / (sum(call_failed) + sum(call_invoked))) AS pct_fail INTO api_calls_summary FROM api_calls GROUP BY time(1h), vendor_id END

(continuous queries require a GROUP BY time() clause).

Related:  How to change location of Influxdb storage folder?

There’s no technical requirement to push Continuous Query results into a new measurement – you could SELECT the calculation results back INTO api_calls, for example. But putting the raw data and the summary results in the same measurement leads to query results that have a lot of nulls. It’s often tidier in InfluxDB to push the results to a new measurement.

From there, use TOP() as before:

Edited – fixed ‘FROM’ measurement

SELECT TOP(pct_fail,3), vendor_id FROM api_calls_summary WHERE time > '2017-01-01T00:00:00Z' AND time < '2017-01-01T00:05:00Z' GROUP BY time(1m)