InfluxDb top X offenders
I have an InfluxDb time series database which I am storing a metric called api_calls. This metric has two pertinent measurements, call_invoked and call_failed. Each measurement also has a tag with a vendor id associated with it.
Every minute, I have a service that collates the number of times I invoked call_invoked (fired when I start the API call) and call_failed (fired when I get an error message from the API call) and stores them into an InfluxDb. So basically, every minute I get a record (per tag, as I understand it) in the InfluxDb “table” (for lack of a better term) that has these two values associated with it.
I have been scratching my head trying to figure out how to show the top ten vendor id’s with the largest percentage of API failures.
How can I do this? I have a strong SQL background but everything I’ve tried has either not worked at all or definitely not worked correctly.
I’ve made some guesses about your schema, and built up a solution using the CLI. I’m using InfluxDB v1.0.2 here.
TOP() only became a working function around January 2016.
If your version of InfluxDB is older than this, the following won’t work.
First, some sample data:
CREATE DATABASE foo USE foo PRECISION rfc3339 INSERT api_calls,vendor_id=1 call_invoked=3i,call_failed=3i 1483228860000000000 INSERT api_calls,vendor_id=2 call_invoked=3i,call_failed=8i 1483228860000000000 INSERT api_calls,vendor_id=3 call_invoked=3i,call_failed=5i 1483228860000000000 INSERT api_calls,vendor_id=4 call_invoked=3i,call_failed=9i 1483228860000000000 INSERT api_calls,vendor_id=5 call_invoked=3i,call_failed=10i 1483228860000000000 INSERT api_calls,vendor_id=6 call_invoked=3i,call_failed=1i 1483228860000000000 INSERT api_calls,vendor_id=7 call_invoked=3i,call_failed=6i 1483228860000000000 INSERT api_calls,vendor_id=8 call_invoked=3i,call_failed=9i 1483228860000000000 INSERT api_calls,vendor_id=9 call_invoked=3i,call_failed=6i 1483228860000000000 INSERT api_calls,vendor_id=10 call_invoked=3i,call_failed=5i 1483228860000000000
Then run the following query. Note that the
WHERE time > x AND time < y clause could be replaced with something like
WHERE time > now() - 1h.
SELECT TOP(call_failed,3), vendor_id FROM api_calls WHERE time > '2017-01-01T00:00:00Z' AND time < '2017-01-01T00:05:00Z' GROUP BY time(1m)
Which gives a result of:
name: api_calls --------------- time top vendor_id 2017-01-01T00:00:00Z 2017-01-01T00:01:00Z 10 5 2017-01-01T00:01:00Z 9 4 2017-01-01T00:01:00Z 9 8 2017-01-01T00:02:00Z 2017-01-01T00:03:00Z 2017-01-01T00:04:00Z
Note that every minute that has no data returns one row, but where there is data, there are 3 rows returned.
If we tell the CLI we want to peek at the JSON, we can type
format json, and when we repeat that same query we get this:
Does that help?
Edit – Re-worked for percentage.
Apologies – I noticed you were looking for percentage, not Top X count.
In InfluxDB, that’s two problems: the first is how to generate the percentages.
A note about nesting operators in functions: InfluxDB does not yet generally guarantee that nesting functions or operators within function arguments will work. Some do, many don’t. Of importance here:
TOP() only allows field keys or tags as arguments, and not operations on tags (e.g. call_failed / total_calls). You’ll need to perform an extra query to generate the percentages, and you’ll have to store them. You can either calculate them on the “raw”, minute-by-minute values:
SELECT (call_failed / (call_failed + call_invoked)) AS pct_fail INTO api_calls FROM api_calls GROUP BY vendor_id
GROUP BY time is implicit: re-run that query from the CLI without the
INTO clause to see how).
Or you can summarize each e.g. hour:
SELECT (sum(call_failed) / (sum(call_failed) + sum(call_invoked))) AS pct_fail INTO api_calls_hourly FROM api_calls GROUP BY time(1h), vendor_id
That can be done as a one-off for existing data. For any newly arriving data, a continuous query can be used:
CREATE CONTINUOUS QUERY fail_pct_calc ON foo BEGIN SELECT (sum(call_failed) / (sum(call_failed) + sum(call_invoked))) AS pct_fail INTO api_calls_summary FROM api_calls GROUP BY time(1h), vendor_id END
(continuous queries require a
GROUP BY time() clause).
There’s no technical requirement to push Continuous Query results into a new measurement – you could
SELECT the calculation results back
INTO api_calls, for example. But putting the raw data and the summary results in the same measurement leads to query results that have a lot of nulls. It’s often tidier in InfluxDB to push the results to a new measurement.
From there, use
TOP() as before:
Edited – fixed ‘FROM’ measurement
SELECT TOP(pct_fail,3), vendor_id FROM api_calls_summary WHERE time > '2017-01-01T00:00:00Z' AND time < '2017-01-01T00:05:00Z' GROUP BY time(1m)