Splunk started crashing with crash log entries like this:
[build 182037] 2013-11-14 11:02:27
Received fatal signal 6 (Aborted).
Cause:
Signal sent by PID 4283 running under UID 0.
Crashing thread: archivereader
Registers:
RIP: [0x00007F4DC9918B25] gsignal + 53 (/lib/libc.so.6)
RDI: [0x00000000000010BB]
RSI: [0x00000000000010F1]
RBP: [0x00007F4DC9A2E74C]
RSP: [0x00007F4DB7BEE068]
RAX: [0x0000000000000000]
RBX: [0x00007FFF8B497801]
RCX: [0xFFFFFFFFFFFFFFFF]
RDX: [0x0000000000000006]
R8: [0x00007F4DB7BFF700]
R9: [0x00007F4DC9A306B4]
R10: [0x0000000000000008]
R11: [0x0000000000000206]
R12: [0x00000000012747F9]
R13: [0x00000000013871A0]
R14: [0x00007F4DC9A2E74C]
R15: [0x00000000000006DC]
EFL: [0x0000000000000206]
TRAPNO: [0x0000000000000000]
ERR: [0x0000000000000000]
CSGSFS: [0x0000000000000033]
OLDMASK: [0x0000000000000000]
OS: Linux
Arch: x86-64
Backtrace:
[0x00007F4DC9918B25] gsignal + 53 (/lib/libc.so.6)
[0x00007F4DC991C670] abort + 384 (/lib/libc.so.6)
[0x00007F4DC99119F1] __assert_fail + 241 (/lib/libc.so.6)
[0x0000000000D025F3] _ZN14PolledReadPipeD2Ev + 147 (splunkd)
[0x0000000000D4D261] _ZN12PipeToLoggerD2Ev + 97 (splunkd)
[0x0000000000AA0880] _ZN14ArchiveContext7processERK8PathnameP13ISourceWriter + 1216 (splunkd)
[0x0000000000AA0E95] _ZN14ArchiveContext9readFullyEP13ISourceWriterRb + 1221 (splunkd)
[0x000000000083CFA2] _ZN16ArchiveProcessor20haveReadAsNonArchiveE14FileDescriptorlPK3Str + 578 (splunkd)
[0x000000000083EE53] _ZN16ArchiveProcessor4mainEv + 2755 (splunkd)
[0x0000000000D81A2D] _ZN6Thread8callMainEPv + 61 (splunkd)
[0x00007F4DC9C719CA] ? (/lib/libpthread.so.0)
[0x00007F4DC99CE21D] clone + 109 (/lib/libc.so.6)
Linux / css-prod-back.scartel.dc / 2.6.32-45-server / #99-Ubuntu SMP Tue Oct 16 16:41:38 UTC 2012 / x86_64
Last few lines of stderr (may contain info on assertion failure, but also could be old):
2013-11-14 11:00:55.156 +0400 splunkd started (build 182037)
Cannot open manifest file inside "/opt/splunk/var/lib/splunk/audit/db/db_1384412390_1384412390_623/rawdata": No such file or directory
splunkd: /opt/splunk/p4/splunk/branches/6.0.0/src/util/EventLoop.cpp:1756: virtual PolledReadPipe::~PolledReadPipe(): Assertion `!isActive()' failed.
2013-11-14 11:02:25.275 +0400 splunkd started (build 182037)
Cannot open manifest file inside "/opt/splunk/var/lib/splunk/audit/db/db_1384412455_1384412455_624/rawdata": No such file or directory
splunkd: /opt/splunk/p4/splunk/branches/6.0.0/src/util/EventLoop.cpp:1756: virtual PolledReadPipe::~PolledReadPipe(): Assertion `!isActive()' failed.
/etc/debian_version: squeeze/sid
glibc version: 2.11.1
glibc release: stable
Last errno: 11
Threads running: 40
argv: [splunkd -p 8089 start]
Thread: "archivereader", did_join=0, ready_to_run=Y, main_thread=N
First 8 bytes of Thread token @0x7f4db7c2b330:
00000000 00 f7 bf b7 4d 7f 00 00 |....M...|
00000008
x86 CPUID registers:
0: 0000000A 756E6547 6C65746E 49656E69
1: 00010676 04040800 000CE3BD BFEBFBFF
2: 05B0B101 005657F0 00000000 2CB4304E
3: 00000000 00000000 00000000 00000000
4: 00000000 00000000 00000000 00000000
5: 00000040 00000040 00000003 00002220
6: 00000001 00000002 00000001 00000000
7: 00000000 00000000 00000000 00000000
8: 00000400 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07280202 00000000 00000000 00000503
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000001 20000800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 20202020 20202020 45202020
80000004: 30353435 20402020 30302E33 007A4847
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 18008040 00000000
80000007: 00000000 00000000 00000000 00000000
80000008: 00003026 00000000 00000000 00000000
terminating...
We've already tried a repair with splunk-fsck, but it had no effect, and in fact it reported no corrupted indexes. The directory named in the crash message exists, but I'm not sure about the manifest file. Here's what is in that directory:
root@css-prod-back:/opt/splunk/bin# ls /opt/splunk/var/lib/splunk/audit/db/db_1384412390_1384412390_623/rawdata/
0 slicesv2.dat
And here's another directory that Splunk doesn't complain about:
root@css-prod-back:/opt/splunk/bin# ls /opt/splunk/var/lib/splunk/audit/db/db_1351635002_1351550304_461/rawdata/
journal.gz slicemin.dat slices.dat slicesv2.dat
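To check whether other buckets look like the first one, something along these lines might help. It is only a sketch: it assumes that a missing journal.gz in rawdata is what separates the affected buckets from the healthy ones, which is just my guess from the two listings above and not something I have confirmed. The function name find_incomplete_buckets is made up for this example.

```python
import os

# journal.gz appears in the healthy bucket's rawdata listing but not in the
# failing one. Treating it as the marker of a complete bucket is an assumption.
EXPECTED = {"journal.gz"}


def find_incomplete_buckets(db_dir, expected=EXPECTED):
    """Map each db_* bucket under db_dir to the expected rawdata files it lacks.

    Buckets that contain every expected file are left out of the result.
    """
    result = {}
    for name in sorted(os.listdir(db_dir)):
        rawdata = os.path.join(db_dir, name, "rawdata")
        # Only look at bucket directories that actually have a rawdata folder.
        if not name.startswith("db_") or not os.path.isdir(rawdata):
            continue
        missing = set(expected) - set(os.listdir(rawdata))
        if missing:
            result[name] = sorted(missing)
    return result
```

Calling find_incomplete_buckets("/opt/splunk/var/lib/splunk/audit/db") would then return a dictionary of bucket names and the files each one lacks. It only reads directory listings and does not modify anything.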
It seems to be the same problem as here: http://answers.splunk.com/answers/108806/splunkd-keeps-on-crashing-crashing-thread-archivereader?page=1&focusedAnswerId=111215#111215 Unfortunately, it's not the square-bracket case described here: http://answers.splunk.com/answers/110135/splunk-crashing
So how can we rebuild this "missing" manifest file (if that really is the cause)? What is its exact name? Why could this happen?
Thanks in advance for any help; this is vital for us.