I'm indexing a bunch of CSV files provided by an external vendor over ftp ( mapped or synched to my local drive ) there may be duplicate rows across different files. the requirement is to take the row from the file with the latest timestamp. I can achieve this by either:
a) ensuring that the order in which splunk indexes my data is in the same order of the file timstamps. can someone suggest how I can do this without having to rewrite in a script the entire 'scan directory for updated files' logic that splunk nicely provides?
b) Can I add an extra field 'fileTimeStamp'? how would I specify this into my props.conf?
c) lookup the file timestamps as a 'lookup' at search time. but if a file is newly updated at search time, but it has not been indexed yet, I may see misleading results.
suggestions please?