Minkirri ZWiki TsdbDoneRight


I've been looking at TSDB (time-series database) options, and it's shocking how every one I've seen makes some fundamental mistakes in how it handles timeseries.

Counters are better than Gauges or Deltas

This is more about the sensors than the TSDB, but the TSDB sometimes biases people in their choices of how to implement their sensors.

A Gauge sensor in theory tells you the instantaneous value at the sample-time. Taken literally, this means it cannot tell you anything about what happened between samples; the value could have shot up and/or down by huge amounts. In practice, Gauge sensors nearly never give an instantaneous value, but rather the average value over a preceding time period. An AC power meter doesn't give you the volts * amps at that instant, because that would be useless; instead it gives you what is effectively the average of that over a past interval spanning multiple 50Hz cycles, and probably over the expected/target sampling interval. The underlying system being measured also has limits to how fast it can change. If we know all these characteristics of the Gauge sensor and what it's measuring, we can make more accurate assumptions about what happened between samples, but that means we need extra metadata. Missed or lost samples leave a big gap with no data.
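As a toy illustration of why a meter reports an average rather than the instantaneous product, here is a Python sketch. It assumes a 50 Hz supply, a purely resistive load, and illustrative peak values of 325 V and 10 A; the function names are hypothetical, and no real meter is claimed to work exactly like this.

```python
import math

MAINS_HZ = 50.0  # assumed mains frequency

def instantaneous_power(t, v_peak=325.0, i_peak=10.0):
    """Volts * amps at a single instant, assuming a purely resistive load."""
    phase = 2 * math.pi * MAINS_HZ * t
    return (v_peak * math.sin(phase)) * (i_peak * math.sin(phase))

def averaged_gauge(t_end, window=1.0, steps=10_000):
    """What a meter-like gauge reports at t_end: the mean power over the
    preceding window, approximated with evenly spaced readings."""
    dt = window / steps
    start = t_end - window
    total = sum(instantaneous_power(start + k * dt) for k in range(steps))
    return total / steps

# Instantaneous readings swing between about 3250 W and 0 W within 5 ms...
print(round(instantaneous_power(0.005)), round(instantaneous_power(0.010)))
# ...while the windowed average stays at about 1625 W wherever it is taken.
print(round(averaged_gauge(1.0)), round(averaged_gauge(1.37)))
```

Because the window covers a whole number of cycles, the average does not depend on when the sample is taken, which is exactly why a single instantaneous reading would be useless.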

Deltas attempt to fix the problems of Gauges by reporting the change in value between samples. This effectively gives you the average rate between samples. However, any dropped/lost/missed sample corrupts the next one, which looks like a small delta over a longer-than-usual period, giving a rate that is too low. Some systems try to solve this by including the start and end times of the period the delta covers. However, that's extra data to send/store/process, and a dropped sample still leaves a time-gap where the rate is completely unknown.
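Here is a minimal Python sketch of that failure mode. It assumes a hypothetical sensor with a steady true rate of 5 units/s that reports every 10 s and resets its delta after each send, and a receiver that divides each delta by the time since the last sample it actually saw. All names and numbers are illustrative.

```python
TRUE_RATE = 5.0   # assumed steady rate of the measured system, units/s
INTERVAL = 10.0   # assumed reporting interval, s

def delta_samples(n, dropped):
    """(time, delta) pairs that reach the receiver.

    The sensor resets its delta after every send, so the delta carried by
    a lost sample is gone for good.
    """
    received = []
    for i in range(1, n + 1):
        t = i * INTERVAL
        delta = TRUE_RATE * INTERVAL  # change since the previous send
        if i not in dropped:
            received.append((t, delta))
    return received

def receiver_rates(samples):
    """Naive receiver: rate = delta / time since the last sample it saw."""
    rates = []
    prev_t = 0.0
    for t, delta in samples:
        rates.append(delta / (t - prev_t))
        prev_t = t
    return rates

rates = receiver_rates(delta_samples(5, dropped={3}))
print(rates)  # [5.0, 5.0, 2.5, 5.0]
```

The sample after the gap reports half the true rate, and the 50 units carried by the lost sample never reach the receiver at all.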

A counter accurately tells you the average rate between any two samples. This means you don't require any extra metadata about the sensor or system to interpret the data accurately. Dropped points don't corrupt the timeseries or leave unknown time-gaps; they just sacrifice some time-resolution. This also means you can reduce your storage, at only the cost of resolution, simply by dropping intermediate samples. Counters can wrap, but wraps can be detected and correctly accounted for knowing only the maximum possible rate. Counters can also be reset, but resets are relatively rare and can nearly always be detected (the counter goes backwards by too little to be explained by a wrap at no more than the maximum possible rate). A reset leaves a small unknown period between the last sample and the reset sample, which is usually when the system was rebooting.
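The following Python sketch computes the average rate between two counter samples, including wrap and reset detection. It assumes an unsigned 32-bit counter and a caller-supplied maximum possible rate; `counter_rate` and `COUNTER_MOD` are hypothetical names, and treating an implausibly large wrapped delta as a reset is one reasonable policy, not a standard.

```python
COUNTER_MOD = 2 ** 32  # assumed: an unsigned 32-bit counter

def counter_rate(t0, c0, t1, c1, max_rate):
    """Average rate between two counter samples (t0, c0) and (t1, c1).

    Returns None when the counter stepped backwards by too little to be a
    wrap at no more than max_rate, i.e. when it looks like a reset.
    """
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = c1 - c0
    if delta >= 0:
        return delta / dt
    # The counter went backwards: assume a single wrap and check plausibility.
    wrapped = delta + COUNTER_MOD
    if wrapped <= max_rate * dt:
        return wrapped / dt
    return None  # too large to be a wrap, so treat it as a reset

samples = [(0, 100), (10, 150), (20, 250), (30, 300)]
full = [counter_rate(*a, *b, max_rate=1000) for a, b in zip(samples, samples[1:])]
print(full)  # [5.0, 10.0, 5.0]
# Dropping the intermediate samples only loses resolution:
print(counter_rate(*samples[0], *samples[-1], max_rate=1000))  # 6.666666666666667
print(counter_rate(0, COUNTER_MOD - 20, 10, 30, max_rate=100))  # 5.0 (wrap)
print(counter_rate(0, 5000, 10, 40, max_rate=100))  # None (reset)
```

The coarse rate is the time-weighted mean of the three fine rates, which is why thinning a counter series costs resolution but not accuracy.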

Nearly any kind of measurement can be done as a counter. The integral of any value gives you a "counter" where the difference between two values divided by the time between them is the average value for that time period. So a sensor that counts people in a room can be converted into a counter by summing the count of people every second to give you the accumulated people-seconds. Any two samples of that counter can give you the average number of people in the room during that period.
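A minimal Python sketch of the people-seconds example, assuming the sensor reads the head-count once per second and keeps a running total; the function names are hypothetical.

```python
def people_seconds_counter(counts, period=1.0):
    """Integrate per-second people counts into a people-seconds counter.

    counts[k] is the number of people seen in the k-th period; the result
    is a list of (time, counter) samples starting at (0.0, 0.0).
    """
    samples = [(0.0, 0.0)]
    total = 0.0
    for k, n in enumerate(counts, start=1):
        total += n * period
        samples.append((k * period, total))
    return samples

def average_between(a, b):
    """Average gauge value between two counter samples (t, counter)."""
    (t0, c0), (t1, c1) = a, b
    return (c1 - c0) / (t1 - t0)

samples = people_seconds_counter([2, 2, 4, 4, 3])
print(average_between(samples[0], samples[-1]))  # 3.0 people over 5 s
print(average_between(samples[2], samples[4]))   # 4.0 people over seconds 2..4
```

Any pair of samples works, so the same counter answers "how full was the room?" at whatever resolution the stored samples allow.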

Samples represent the time before, not after

Many TSDBs give the value of a timeseries at a particular point in time by returning the value of the sample before that time. But Gauge, Delta, and Counter sensors all give you a value that describes the time period before that sample, so it's more accurate to use the sample after the time you want the value for. It's even more accurate to linearly interpolate between the samples before and after.
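A minimal Python sketch of the three lookups, assuming samples are (time, value) pairs sorted by time and that each value describes the period ending at its timestamp. The function names are hypothetical and are not any particular TSDB's API.

```python
from bisect import bisect_left, bisect_right

# Samples are (time, value) pairs sorted by time; each value is assumed to
# describe the period ending at its timestamp.

def value_before(samples, t):
    """Common TSDB behaviour: the last sample at or before t."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    if i == 0:
        raise ValueError("no sample at or before t")
    return samples[i - 1][1]

def value_after(samples, t):
    """The first sample at or after t, i.e. the one whose period covers t."""
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == len(samples):
        raise ValueError("no sample at or after t")
    return samples[i][1]

def value_interpolated(samples, t):
    """Linear interpolation between the samples either side of t."""
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == len(samples):
        raise ValueError("no sample at or after t")
    t1, v1 = samples[i]
    if t1 == t:
        return v1
    if i == 0:
        raise ValueError("no sample before t")
    t0, v0 = samples[i - 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

samples = [(0, 10.0), (10, 20.0), (20, 40.0)]
print(value_before(samples, 15))        # 20.0
print(value_after(samples, 15))         # 40.0
print(value_interpolated(samples, 15))  # 30.0
```

For a sensor that only reports when its value changes, the value holds until the next sample, so `value_before` is the appropriate lookup there.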

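There is one exception to this: "value changed" gauge sensors that only report samples when the value changes. For these, the value holds until the next sample, so the sample before the requested time is the right one. These are pretty rare.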
