I'm seeing some really weird behavior with streamstats on Splunk 5.0.4 running on CentOS.
I have a lookup table that contains data from indexing Nessus plugins and cross-referencing them against a variety of other security data sources (CVSS metrics, Exploit-DB, OSVDB, etc.). Since there can be multiple CVE IDs with different severity levels associated with a particular plugin, each plugin has one line in the lookup table for each CVE ID associated with it.
The lookup table that contains our open vulnerabilities is roughly 2 GB, so we can't really import it into a dashboard without crushing performance. Instead, I'm taking the vulnerability data that I have in a summary index and repopulating it with the fields from my lookup tables that users will want to see in a dashboard.
For some types of data, like the signature name, I only want one reference to it. Based on what I've read here, the best way to dedup a multivalued field is to use streamstats and values(field).
The problem is that when I run streamstats a second time, on the multivalued field that contains the plugin name (and likewise the "solution" to the vulnerability), I'm seeing data that doesn't match the vulnerability ID I'm using in the lookup. For example, if the plugin "Sendmail < 2.2.4" has seven CVE IDs in the lookup table, the plugin name field contains the same name seven times after the lookup. I'd expect streamstats to reduce that to a single value, but after it runs I see three or four different plugin names or solutions that are not associated with that ID.
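To make the expected and observed results concrete, here is a minimal Python sketch (not Splunk code). It contrasts deduplicating a multivalued field inside each event with a running aggregate that collects distinct values across every event seen so far, which is how streamstats behaves when it has no by clause or window. The event list, the IDs 1001 to 1003, and the names "Hypothetical plugin B" and "Hypothetical plugin C" are made up for illustration; only "Sendmail < 2.2.4" comes from the example above.

```python
def per_event_dedup(events, field):
    """Collapse repeated values inside each event's multivalued field."""
    return [sorted(set(event[field])) for event in events]


def running_values(events, field):
    """Collect distinct values across all events seen so far.

    This models a running aggregate with no grouping: each event's
    result also contains values that came from earlier events.
    """
    seen = set()
    results = []
    for event in events:
        seen.update(event[field])
        results.append(sorted(seen))
    return results


# Hypothetical events after the plugin-name lookup: one event per
# vuln_id, with the name repeated once per CVE row in the lookup table.
events = [
    {"vuln_id": "1001", "signature_name": ["Sendmail < 2.2.4"] * 7},
    {"vuln_id": "1002", "signature_name": ["Hypothetical plugin B"] * 2},
    {"vuln_id": "1003", "signature_name": ["Hypothetical plugin C"]},
]

expected = per_event_dedup(events, "signature_name")
observed = running_values(events, "signature_name")

for event, exp, obs in zip(events, expected, observed):
    print(event["vuln_id"], "expected:", exp, "running:", obs)
```

In this sketch the first event looks the same either way, but every later event picks up names that belong to earlier vuln_ids, which resembles the mismatch described above.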
The search is:
index=si-vulnerabilities source="SI: Daily Stats" tag=internal
| lookup local=true nessus_last_scan_lookup dest,hostname OUTPUT last_scan
| convert timeformat="%m/%d/%Y %H:%M:%S" ctime(last_scan)
| rangemap field=severity Critical=5-5, High=4-4, Medium=3-3, Low=2-2, Info=0-1
| search range="Critical"
| dedup dest,hostname,domainname,protocol,dest_port
| lookup local=true open_vulnerabilities_lookup dest,hostname,domainname,protocol,dest_port,severity OUTPUT vuln_id,scanner
| streamstats values(vuln_id) AS vuln_id, values(scanner) AS scanner
| fields last_scan,scanner,dest,hostname,domainname,protocol,dest_port,vuln_id,signature_name,solution
| mvexpand vuln_id
| lookup local=true nessus_plugin_reference_lookup nessus_id AS vuln_id OUTPUT nessus_plugin_name AS signature_name
| streamstats values(signature_name) AS signature_name2
| table last_scan,scanner,dest,hostname,domainname,protocol,dest_port,vuln_id,signature_name,signature_name2,solution
I realize that I can just define a new lookup that points to the same file I'm using now with max_matches set to 1, but the streamstats behavior here seems weird...
Thx.
C