1. yearol+(OP) 2025-12-05 23:22:36
Critical high-level stats such as error counts should be scraped more frequently than every 30 seconds. It's important to support scrape intervals at multiple time granularities: a small set of the most critical stats should be scraped closer to every 10s or 15s.
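A minimal sketch of what that could look like in a Prometheus scrape config. The job names, targets, and metric names are hypothetical; the relabel rule keeps the fast job from duplicating every series from the same target:

    scrape_configs:
      - job_name: critical-stats          # hypothetical job name
        scrape_interval: 10s              # tight interval for the critical set
        static_configs:
          - targets: ['app:9100']         # placeholder target
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'app_errors_total|app_requests_total'  # hypothetical critical metrics
            action: keep                  # drop everything else from this job
      - job_name: everything-else
        scrape_interval: 30s              # default cadence for the long tail
        static_configs:
          - targets: ['app:9100']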

Prometheus has an unaddressed flaw [0]: the range you pass to its rate functions must be at least 2x the scrape interval, because computing a rate requires at least two samples inside the window. This means that if you scrape at 30s intervals, your rate charts won't reflect a change until a minute after it happens.
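Concretely, assuming a hypothetical counter app_errors_total:

    # 30s scrape interval: the range must be >= 1m to guarantee
    # two samples, so changes smear out over a full minute.
    rate(app_errors_total[1m])

    # 10s scrape interval: a 30s range already holds ~3 samples,
    # so the chart reacts in roughly a third of the time.
    rate(app_errors_total[30s])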

[0] - https://github.com/prometheus/prometheus/issues/3746

replies(1): >>rossju+Tf
2. rossju+Tf 2025-12-06 01:38:11
>>yearol+(OP)
"Scrape" intervals (and the plumbing through to analysis intervals) are chosen precisely because of the denoising function aggregation provides.

Most scaled analysis systems provide precise control over the type of aggregation used within the analyzed time slices. There are many possibilities, each suited to a different purpose.
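In PromQL terms, for example (node_load1 is the standard node_exporter load metric):

    # Two aggregations over the same 1m slices, for different purposes:
    avg_over_time(node_load1[1m])   # smooths noise to show the trend
    max_over_time(node_load1[1m])   # preserves spikes that averaging hides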

High frequency events are often collected into distributions and the individual timestamps are thrown away.
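In Prometheus, for instance, that's what a histogram metric is: the client buckets each observation and exports only cumulative per-bucket counters, so the individual events and their timestamps never leave the process. A sketch, assuming a conventionally named http_request_duration_seconds histogram:

    # p99 latency reconstructed from bucket counters over 5m slices.
    # The individual request timings are long gone; only the
    # per-bucket counts survive.
    histogram_quantile(
      0.99,
      sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
    )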
