Sampled vs. Aggregate Data
Most of our performance data is sampled: we poll each device once per collection interval (no more than once every five minutes). Many metrics report the instantaneous value of the relevant performance metric at the time of collection (e.g. a server at 25% memory usage at that moment); we then report the average of the values observed during each hour of collection. Because we do not collect continuously, observed values are approximate; we recommend collecting for at least a month to reduce the impact of sampling error.
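As a minimal sketch of this hourly aggregation, the five-minute instantaneous samples collected during an hour are simply averaged (the sample values below are hypothetical):

```python
from statistics import mean

# Twelve hypothetical 5-minute samples of memory usage (%) taken over one hour.
samples = [25.0, 27.5, 30.0, 26.0, 24.5, 25.5, 28.0, 29.0, 27.0, 26.5, 25.0, 26.0]

# The reported hourly value is the mean of the instantaneous samples.
hourly_average = mean(samples)
```

Any spike that occurs between polls is invisible to this scheme, which is why the observed values are approximate and longer collection windows reduce sampling error.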
Some metrics, however, are recorded by the operating system as a count of events since system restart (or since the counter last rolled over). When the counter is stored as a 32-bit integer, we measure the rate of events by polling the absolute counter twice within a short period, then extrapolate from this sampled rate to estimate the average rate over the whole collection interval. With enough data points, these sampled values converge toward the "true" average. When the operating system provides 64-bit counters (SSH and some SNMP implementations), we can calculate the number of events between polling intervals, effectively counting events from the whole interval rather than extrapolating an approximate rate from a sample.
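A minimal sketch of the counter-based rate calculation, assuming a hypothetical helper `counter_rate` and modular arithmetic to handle a single 32-bit rollover between the two polls:

```python
def counter_rate(first: int, second: int, interval_seconds: float,
                 counter_bits: int = 32) -> float:
    """Events per second derived from two absolute counter readings.

    If the second reading is smaller than the first, assume the counter
    wrapped around exactly once; modular subtraction recovers the delta.
    """
    modulus = 1 << counter_bits
    delta = (second - first) % modulus
    return delta / interval_seconds

# Normal case: 60 events over a 60-second gap between polls.
steady = counter_rate(100, 160, 60)

# Rollover case: the 32-bit counter wrapped between readings,
# so the raw difference is negative but the true delta is 12.
wrapped = counter_rate(2**32 - 6, 6, 10)
```

This per-second rate is then extrapolated across the collection interval; with 64-bit counters the delta between regular polls is taken directly, so no extrapolation is needed.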