I am trying to read log files from a Hadoop cluster. These are unstructured files that can currently only be filtered after indexing, using regex searches. However, my input data is huge, the throughput requirement is very high, and the result is only a small portion of the input. Is it possible to filter the input data before it is indexed by Hunk, so that I can avoid searching unnecessary data?
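For illustration, this is roughly the kind of pre-filter I have in mind: a simple Hadoop Streaming mapper that keeps only the lines matching a regex before anything is handed to the virtual index. The pattern here is just a placeholder for whatever the real filter would be, not my actual search:

```python
#!/usr/bin/env python
# Sketch of a Hadoop Streaming mapper that drops non-matching lines
# so that only the relevant fraction of the raw logs reaches the index.
import re
import sys

# Placeholder pattern; the real filter would match whatever events I need.
PATTERN = re.compile(r"ERROR|FATAL")

for line in sys.stdin:
    if PATTERN.search(line):
        sys.stdout.write(line)
```

Ideally, though, I would like Hunk itself to apply such a filter (or an equivalent configuration) rather than maintaining a separate MapReduce pre-processing step.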