I have a few indexes with around 2.5 billion events each. Unfortunately, we don't have much CPU to sort through this massive amount of data and make it meaningful in a dashboard. We're currently setting up a summary index, but the requirements and fields can change at any time, which means we'd have to re-summarize that data.
So my question is: can we use Amazon EMR as a temporary boost in horsepower to map and reduce this data back into the summary index? How difficult would this be?