Let me preface this question by saying that we don't have any major performance issues at the moment.
Our Splunk environment runs on a VM with 8 cores and 16 GB of RAM, using iSCSI NetApp storage. Again, everything is running fine, except that we sometimes hit "Max number of concurrent searches reached". According to the docs, that maximum is derived from the number of CPUs on the system, unless you override it in a config file somewhere (not recommended).
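If I'm reading the docs right, the limit comes from a couple of settings in the [search] stanza of limits.conf, roughly base_max_searches + (max_searches_per_cpu x number of CPUs). With what I believe are the defaults (6 and 1, but check limits.conf.spec for your version), our 8-core VM would work out like this:

    [search]
    base_max_searches = 6        # assumed default
    max_searches_per_cpu = 1     # assumed default
    # => 6 + (1 x 8 cores) = 14 concurrent historical searches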
My first thought was to throw more CPU at the system. Then I recalled that in a VMware environment you can run into CPU contention: if you assign 8 vCPUs to the VM, the hypervisor has to co-schedule them, so the VM can end up waiting for enough physical cores in the cluster to be free before it gets to run. So, as counter-intuitive as it may seem, sometimes fewer CPUs is faster than more CPUs.
We logged into the VMware console and verified that we are, in fact, experiencing significant CPU contention. Our admins have actually recommended reducing the number of assigned CPUs.
I know Splunk has a pretty clear formula for the recommended number of CPUs, but that seems to apply to a physical environment. Is anyone else here running Splunk in a virtual environment? If so, how are you dealing with CPU contention?
My other question: I need a workaround for the "maximum number of concurrent searches" issue. If we reduce the number of CPUs, we'll hit that warning even more often. Is it really a bad idea to override that formula in the config file and manually raise the maximum number of concurrent searches allowed?
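To be concrete, I assume the override would just be a bump to those same settings in a local limits.conf, something along these lines (values made up purely for illustration):

    [search]
    # hypothetical values, just to show what I mean
    base_max_searches = 10
    max_searches_per_cpu = 2

That's exactly the kind of change I'm hesitant to make without knowing the downsides.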
Thanks!
(And before someone recommends it, moving Splunk to a physical server is not an option for us, nor is using direct-attached storage.)