Hi,
I am having a lot of trouble with a multi-line event I'm trying to analyse, and with the regex needed to extract fields from it.
An example of an event (SQL query output):
---- Identification ----
Date : Mon Apr 28 19:00:00 DFT 2014
Hostname : MYHOST01
Script : RQ_TB.ksh
Version courante : 1.0
------------------------
Database Connection Information
Database server = DB2/AIX64 9.5.3
SQL authorization ID = MYF0001
Local database alias = MYF0002
--
-- SUMMARY OF USER TABLE DATA SIZES
--
select CURRENT SERVER as DBNAME, CURRENT TIMESTAMP as CURRENT_TIMESTAMP, S.USER_DATA_L_SIZE_KB, DEC( (S.USER_DATA_L_SIZE_KB/1073741824.0), 31, 11 ) as USER_DATA_L_SIZE_TB, COALESCE( CEIL( DEC( (S.USER_DATA_L_SIZE_KB/1073741824.0), 31, 11 ) ), 1 ) as USER_DATA_L_ENTITLEMENT_REQ_TB from ( select ( sum(A.DATA_OBJECT_L_SIZE) + sum(A.LONG_OBJECT_L_SIZE) + sum(A.LOB_OBJECT_L_SIZE) + sum(XML_OBJECT_L_SIZE) ) as USER_DATA_L_SIZE_KB from SYSIBMADM.ADMINTABINFO as A, ( select TABSCHEMA, TABNAME, OWNER, OWNERTYPE, TYPE, STATUS, TABLEID, TBSPACEID from SYSCAT.TABLES where OWNERTYPE = 'U' and TYPE IN ('G', 'H', 'L', 'S', 'T', 'U') ) as T where A.TABNAME = T.TABNAME and A.TABSCHEMA = T.TABSCHEMA ) as S
DBNAME CURRENT_TIMESTAMP USER_DATA_L_SIZE_KB USER_DATA_L_SIZE_TB USER_DATA_L_ENTITLEMENT_REQ_TB
------------------ -------------------------- -------------------- --------------------------------- ---------------------------------
MYF0002 2014-04-28-19.00.01.768168 110325200 0.10274834930 1.
1 record(s) selected.
--
-- BREAKDOWN OF USER TABLE DATA SIZES
--
select rtrim(A.TABSCHEMA) SCHEMA, rtrim(A.TABNAME) TABLENAME, sum(A.DATA_OBJECT_L_SIZE) as DATA_OBJECT_L_SIZE_KB, sum(A.LONG_OBJECT_L_SIZE) as LONG_OBJECT_L_SIZE_KB, sum(A.LOB_OBJECT_L_SIZE) as LOB_OBJECT_L_SIZE_KB, sum(XML_OBJECT_L_SIZE) as XML_OBJECT_L_SIZE_KB, ( sum(A.DATA_OBJECT_L_SIZE) + sum(A.LONG_OBJECT_L_SIZE) + sum(A.LOB_OBJECT_L_SIZE) + sum(XML_OBJECT_L_SIZE) ) as USER_DATA_L_SIZE_KB, T.COMPRESSION, T.PCTPAGESSAVED as Taux_de_compression from SYSIBMADM.ADMINTABINFO as A, ( select TABSCHEMA, TABNAME, OWNER, OWNERTYPE, TYPE, STATUS, COMPRESSION, TABLEID, TBSPACEID, PCTPAGESSAVED from SYSCAT.TABLES where OWNERTYPE = 'U' and TYPE IN ('G', 'H', 'L', 'S', 'T', 'U') ) as T where A.TABNAME = T.TABNAME and A.TABSCHEMA = T.TABSCHEMA group by A.TABSCHEMA, A.TABNAME, T.COMPRESSION, T.PCTPAGESSAVED order by A.TABSCHEMA, A.TABNAME
SCHEMA TABLENAME DATA_OBJECT_L_SIZE_KB LONG_OBJECT_L_SIZE_KB LOB_OBJECT_L_SIZE_KB XML_OBJECT_L_SIZE_KB USER_DATA_L_SIZE_KB COMPRESSION TAUX_DE_COMPRESSION
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- --------------------- --------------------- -------------------- -------------------- -------------------- ----------- -------------------
SCHEMA01 ADVISE_INDEX 128 0 144 0 272 N -1
SCHEMA01 ADVISE_INSTANCE 128 0 0 0 128 N -1
SCHEMA01 ADVISE_MQT 128 0 144 0 272 N -1
SCHEMA01 ADVISE_PARTITION 128 0 144 0 272 N -1
SCHEMA01 ADVISE_TABLE 128 0 144 0 272 N -1
SCHEMA01 ADVISE_WORKLOAD 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_ARGUMENT 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_DIAGNOSTIC 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_DIAGNOSTIC_DATA 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_INSTANCE 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_OBJECT 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_OPERATOR 128 0 0 0 128 N -1
SCHEMA01 EXPLAIN_PREDICATE 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_STATEMENT 128 0 144 0 272 N -1
SCHEMA01 EXPLAIN_STREAM 128 0 144 0 272 N -1
SCHEMA02 TE_DEC_ENC_REM 14523392 0 0 0 14523392 N 0
Because I need to be able to extract all the data within the event, I'm indexing it as a single multi-line event with the following configuration:
props.conf:
[db2compress]
# your settings
MAX_EVENTS=100000
NO_BINARY_CHECK=1
TIME_FORMAT=%a %b %d %H:%M:%S DFT %Y
TIME_PREFIX=Date :
REPORT-extract_regfields = regfields
EXTRACT-hostname = (?i)Hostname : (?P<HOSTNAME>\w+)
EXTRACT-database_server = (?i)Database server = (?P<DATABASE_SERVER>[0-9a-zA-Z/]+)
EXTRACT-sql_auth_id = (?i)SQL authorization ID = (?P<SQL_AUTH_ID>\w+)
EXTRACT-database_alias = (?i)Local database alias = (?P<DATABASE_ALIAS>\w+)
EXTRACT-entitlement = (?im)^\w+\s+\d+\-\d+\-\d+\-\d+\.\d+\.\d+\.\d+\s+\d+\s+\d+\.\d+\s+(?P<ENTITLEMENT>[^\.]+)
EXTRACT-size_KB = (?im)^(?:[^\.\n]*\.){3}\d+\s+(?P<SIZE_KB>[^ ]+)
EXTRACT-size_TB = (?im)^\w+\s+\d+\-\d+\-\d+\-\d+\.\d+\.\d+\.\d+\s+\d+\s+(?P<SIZE_TB>[^ ]+)
transforms.conf:
[regfields]
REGEX = (?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)
MV_ADD = True
Events are indexed successfully as multi-line events, and everything seems to be OK.
BUT the regex used to extract fields from the schema detail:
(?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)
does not seem to do the job: when I try to compute some simple stats with Splunk, I get impossible results (for example with a simple "stats count(TABLENAME) by SCHEMA").
When I check in detail with "stats values(DATA_OBJECT_L_SIZE_KB) by SCHEMA,TABLENAME", for example, I see that the result contains every value of that field from the whole event, and not only the value for that particular table as it should. The association between the keys is lost.
When I run "table SCHEMA,TABLENAME,DATA_OBJECT_L_SIZE_KB", for example, the data looks correct, but even a stats after the table command reports wrong results.
So I think the issue is in my regex, but this is driving me crazy and I can't work out why...
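To illustrate what I think I am seeing, here is a small sketch in plain Python (not Splunk). It assumes that a multivalue extraction behaves like one independent list per field attached to the single event, which is only my understanding of MV_ADD. The three rows are copied from the event above, the pattern is the same as the transforms.conf REGEX, and the variable names are made up for the illustration.

```python
import re

# Three rows copied from the sample event above.
event = """SCHEMA01 ADVISE_INDEX 128 0 144 0 272 N -1
SCHEMA01 ADVISE_INSTANCE 128 0 0 0 128 N -1
SCHEMA02 TE_DEC_ENC_REM 14523392 0 0 0 14523392 N 0"""

# Same pattern as the REGEX in transforms.conf.
ROW = re.compile(
    r"(?im)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+"
    r"(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+"
    r"(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+"
    r"(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+"
    r"(?P<TAUX_DE_COMPRESSION>[\-]*\d+)"
)

# One dict per matched row: this is the row-level view I actually want.
rows = [m.groupdict() for m in ROW.finditer(event)]

# Assumed event-level view: one independent list of values per field.
multivalue = {name: [r[name] for r in rows] for name in ROW.groupindex}

# Grouping at event level: each distinct SCHEMA is paired with ALL the
# sizes found in the event, because nothing ties a size to its own row.
by_schema_event_level = {
    schema: sorted(set(multivalue["DATA_OBJECT_L_SIZE_KB"]))
    for schema in set(multivalue["SCHEMA"])
}

# Grouping at row level: each SCHEMA only gets the sizes of its own rows.
by_schema_row_level = {}
for r in rows:
    by_schema_row_level.setdefault(r["SCHEMA"], set()).add(
        r["DATA_OBJECT_L_SIZE_KB"]
    )

print(by_schema_event_level)
print(by_schema_row_level)
```

If that assumption is right, the regex itself matches every row correctly, and the wrong stats come from the values being stored per event rather than per row.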
If I restrict the regex to a single match (max_match=1, with the (?m-s) flags) using a command such as:
index=db2compress sourcetype=db2compress
| rex max_match=1 "(?m-s)^(?P<SCHEMA>\w+)\s+(?P<TABLENAME>\w+)\s+(?P<DATA_OBJECT_L_SIZE_KB>\d+)\s+(?P<LONG_OBJECT_L_SIZE_KB>\d+)\s+(?P<LOB_OBJECT_L_SIZE_KB>\d+)\s+(?P<XML_OBJECT_L_SIZE_KB>\d+)\s+(?P<USER_DATA_L_SIZE_KB>\d+)\s+(?P<COMPRESSION>\w+)\s+(?P<TAUX_DE_COMPRESSION>[\-]*\d+)"
then of course I only get the first row, so matching across all lines is required. I suspect the problem has something to do with line breaks or something like that, but everything I have tried has failed.
Thank you VERY VERY much for any help!