I'm working on an alerting solution that uses Logstash to stream AWS CloudFront logs from an S3 bucket into Graphite after doing some minor processing.
Since multiple events with the same timestamp can occur (multiple events within a second), I elected to use Carbon Aggregator to count these events per second.
The problem I'm facing is that the aggregated whisper database seems to be dropping data. The normal whisper file sees all of it, but of course it cannot account for more than 1 event per second.
I'm running this setup in Docker on an EC2 instance, which isn't hitting any resource limit (CPU, memory, network, or disk).
I've checked every log I could find inside the containers, as well as the output of docker logs, but nothing jumps out.
I've set the logstash output to display the lines on stdout (not missing any) and to send them to graphite on port 2023, which is set to be the line-by-line receiver for Carbon Aggregator:
aggregation-rules.conf is set to a very simple count per second:
test.<user>.total1s (1) = count test.<user>.total
and the matching storage-schemas.conf entry keeps one-second resolution for 24 hours:
pattern = .*
retentions = 1s:24h
Happy to share more of my configuration as you request it.
I've hit a brick wall with this. I've tried many different things, but I still can't see all of the data in the aggregated whisper db.
Any help is very much appreciated.

Carbon Aggregator isn't designed to do what you are trying to do. For this use case you'd want to use statsd to count the events per second.
Carbon Aggregator is meant to aggregate across different series. It quantizes each point it sees on the input to a timestamp before any aggregation happens, so you are still only going to get a single value per second from the aggregator. statsd, by contrast, will take any number of counter increments and total them up each flush interval.
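To make the statsd suggestion concrete, here is a minimal Python sketch of what a counter increment looks like on the wire. It assumes a statsd daemon listening on its default UDP port 8125 on localhost; the metric name test.someuser.total and the helper names counter_line and increment are made up for illustration and are not part of your setup.

```python
import socket

STATSD_HOST = "127.0.0.1"  # assumption: statsd runs on the same host
STATSD_PORT = 8125         # statsd's default UDP port


def counter_line(metric: str, value: int = 1) -> bytes:
    """Build a statsd counter packet of the form '<metric>:<value>|c'."""
    return f"{metric}:{value}|c".encode("ascii")


def increment(sock: socket.socket, metric: str, value: int = 1) -> None:
    """Send one counter increment over UDP (fire-and-forget)."""
    sock.sendto(counter_line(metric, value), (STATSD_HOST, STATSD_PORT))


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Three events arriving in the same second: statsd sums the
    # increments and emits one total per flush interval.
    for _ in range(3):
        increment(sock, "test.someuser.total")  # hypothetical metric name
    sock.close()
```

In practice you may not need custom code at all, since Logstash has a statsd output plugin that can send these increments for you. Note that statsd flushes every 10 seconds by default, so for per-second counts you would likely have to lower its flushInterval to 1000 ms and keep the 1s retention in Graphite.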


