This is about using the streamstats command with a "by" clause while also specifying window=N, which tells it to compute the statistics using only the N most recent rows.
The Splunk docs for streamstats say that the window takes the "by" field into account.
See "More examples" at http://docs.splunk.com/Documentation/Splunk/6.0.1/SearchReference/Streamstats
Specifically, it says:
Example 1: Compute the average value of foo for each value of bar including only the only 5
events with that value of bar.
... | streamstats avg(foo) by bar window=5 global=f
However, this does not seem to be the case. When I use window=N with a by clause, the windowing logic seems to ignore the by clause: it looks only at the previous N rows (5 in the example above), regardless of what value they have for the "by" field. Depending on your sort order, those rows may or may not have the same value for the "by" field as the current row. When streamstats then calculates the statistics over those rows, it does correctly discard the rows whose "by" field doesn't match.
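To make the difference concrete, here is a minimal Python sketch of the two behaviors as I understand them. It is only a model of what I observe versus what the docs describe, not Splunk's actual implementation; the function names and the sample rows are made up for illustration, and the window is assumed to include the current row.

```python
from collections import defaultdict, deque

def windowed_avg_global(rows, window):
    """Model of the observed behavior: a single window over the last
    `window` rows of any group, from which rows whose `bar` differs
    from the current row are then discarded."""
    out = []
    for i, (bar, _foo) in enumerate(rows):
        recent = rows[max(0, i - window + 1): i + 1]   # includes the current row
        same = [f for b, f in recent if b == bar]
        out.append(sum(same) / len(same))
    return out

def windowed_avg_per_group(rows, window):
    """Model of the documented behavior: a separate window holding the
    last `window` rows for each value of `bar`."""
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for bar, foo in rows:
        history[bar].append(foo)
        out.append(sum(history[bar]) / len(history[bar]))
    return out

# Hypothetical (bar, foo) rows, already in stream order.
rows = [("a", 1), ("b", 10), ("a", 3), ("b", 20), ("a", 5)]
print(windowed_avg_global(rows, 2))     # [1.0, 10.0, 3.0, 20.0, 5.0]
print(windowed_avg_per_group(rows, 2))  # [1.0, 10.0, 2.0, 15.0, 4.0]
```

With interleaved groups and a small window, the single shared window never sees more than one row per group, so the "average" is just the current value. That matches the confusing results I am getting.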
The end result is confusion!
Does anyone know whether the docs are wrong or whether this is a bug in streamstats?
And can anyone think of a workaround? I basically need this to process rows that have _time, deviceName, and a field called isBlank that is either 1 or 0.
| streamstats current=f window=24 sum(isBlank) as rollingBlankHourCount by deviceName
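For reference, this is the result I am hoping that search produces, written as a small Python sketch. It assumes the rows are already sorted by _time and that current=f means the current row is excluded from its own sum; the function name and sample data are hypothetical, and returning None for a device's first row is just my stand-in for an empty field.

```python
from collections import defaultdict, deque

def rolling_blank_count(rows, window=24):
    """For each (deviceName, isBlank) row, sum isBlank over the previous
    `window` rows of the same device, excluding the current row."""
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for device, is_blank in rows:
        past = history[device]
        out.append(sum(past) if past else None)  # no earlier rows for this device yet
        past.append(is_blank)                    # current row becomes history afterwards
    return out

# Hypothetical rows in time order, with a window of 2 to keep it short.
rows = [("d1", 1), ("d2", 0), ("d1", 1), ("d1", 0), ("d2", 1), ("d1", 1)]
print(rolling_blank_count(rows, window=2))  # [None, None, 1, 2, 0, 1]
```

Note that the documented example above passes global=f and my search does not, so it may be worth checking whether that accounts for the difference.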