How to configure filebeat.yml dissect processor for pipe separated multiline logs - filebeat

I have my filebeat.yml successfully reading a log file. Each entry in the log is multiline, and pipe separated. Something like:
datetime | blurb | blurb2 | <?xml><maintag .....
more xml
more xml
more xml
</maintag>
The multiline processor is working correctly and creating , but I'm then wanting to use a dissect processor to strip out just the 4th part - the xml.
I have tried variants of:
processors:
- dissect:
field: "message"
tokenizer: "${sw.date} | ${sw.blurb1} | ${sw.blurb2} | ${sw.message_xml}"
target_prefix: ""
But when I start filebeat, it's throwing:
{"log.level":"error","#timestamp":"2022-10-06T08:51:42.612Z","log.origin":{"file.name":"instance/beat.go","file.line":1022},"message":"Exiting: Failed to start crawler: starting input failed: could not unpack config: missing field accessing 'filebeat.inputs.1.processors' (source:'filebeat.yml')","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: Failed to start crawler: starting input failed: could not unpack config: missing field accessing 'filebeat.inputs.1.processors' (source:'filebeat.yml')
Can anyone advise what I'm getting wrong? The message suggests to me a missing field in my dissect processor definition, but from the docs it looks right to me?
Many thanks!

Ack! Found it! Would really be useful if I could learn the difference between $ and % in my tokenizer!

Related

Writing apache beam pCollection to bigquery causes type Error

I have a simple beam pipeline, as follows:
with beam.Pipeline() as pipeline:
output = (
pipeline
| 'Read CSV' >> beam.io.ReadFromText('raw_files/myfile.csv',
skip_header_lines=True)
| 'Split strings' >> beam.Map(lambda x: x.split(','))
| 'Convert records to dictionary' >> beam.Map(to_json)
| beam.io.WriteToBigQuery(project='gcp_project_id',
dataset='datasetID',
table='tableID',
create_disposition=bigquery.CreateDisposition.CREATE_NEVER,
write_disposition=bigquery.WriteDisposition.WRITE_APPEND
)
)
However upon runnning I get a typeError, stating the following:
line 2147, in __init__
self.table_reference = bigquery_tools.parse_table_reference(if isinstance(table,
TableReference):
TypeError: isinstance() arg 2 must be a type or tuple of types
I have tried defining a TableReference object and passing it to the WriteToBigQuery class but still facing the same issue. Am I missing something here? I've been stuck at this step for what feels like forever and I don't know what to do. Any help is appreciated!
This probably occurred since you installed Apache Beam without the GCP modules. Please make sure to do following (in a virtual environment).
pip install apache-beam[gcp]
It's a weird error though so feel free to file a Github issue against the Apache Beam project.

KIBANA - WAZUH pattern index

I have a project to install wazuh as FIM on linux, AIX and windows.
I managed to install Manager and all agents on all systems and I can see all three connected on the Kibana web as agents.
I created test file on the linux agent and I can find it also on web interface, so servers are connected.
Here is test file found in wazuh inventory tab
But, I am not recieving any logs if I modify this test file.
This is my settings in ossec.conf under syscheck on agent server>
<directories>/var/ossec/etc/test</directories>
<directories report_changes="yes" check_all="yes" realtime="yes">/var/ossec/etc/test</directories>
And now I ma also strugling to understand meanings of index patterns, index templates and fields.
I dont understand what they are and why we need to set it.
My settings on manager server - /usr/share/kibana/data/wazuh/config/wazuh.yml
alerts.sample.prefix: 'wazuh-alerts-*'
pattern: 'wazuh-alerts-*'
On the kibana web I also have this error when I am trying to check ,,events,, -the are no logs in the events.
Error: The field "timestamp" associated with this object no longer exists in the index pattern. Please use another field.
at FieldParamType.config.write.write (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:627309)
at http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:455052
at Array.forEach (<anonymous>)
at writeParams (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:455018)
at AggConfig.write (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:355081)
at AggConfig.toDsl (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:355960)
at http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:190748
at Array.forEach (<anonymous>)
at agg_configs_AggConfigs.toDsl (http://MYIP:5601/42959/bundles/plugin/data/kibana/data.plugin.js:1:189329)
at http://MYIP:5601/42959/bundles/plugin/wazuh/4.2.5-4206-1/wazuh.chunk.6.js:55:1397640
Thank you.
About FIM:
here you can find the FIM documentation in case you don't have it:
https://documentation.wazuh.com/current/user-manual/capabilities/file-integrity/fim-configuration.html
https://documentation.wazuh.com/current/user-manual/reference/ossec-conf/syscheck.html.
The first requirement for this to work would be to ensure a FIM alert is triggered, could you check the alerts.json file on your manager? It is usually located under /var/ossec/logs/alerts/alerts.json In order to test this fully I would run "tail -f /var/ossec/logs/alerts/alerts.json" and make a change in yout directory , if no alerts is generated, then we will need to check the agent configuration.
About indexing:
Here you can find some documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html
https://www.elastic.co/guide/en/kibana/current/managing-index-patterns.html#scripted-fields
https://documentation.wazuh.com/current/user-manual/kibana-app/reference/elasticsearch.html
Regarding your error, The best way to solve this is to delete the index. To do this:
got to Kibana -> Stack management -> index patterns and there delete wazuh-alerts-*.
Then if you enter to Wazuh App the health check will create it again or you can follow this to create your index:
Go to kibana -> stack management -> index pattern and select Create index pattern.
Hope this information helps you.
Regards.
thank you for your answer.
I managed to step over this issue, but I hit another error.
When I check tail -f /var/ossec/logs/alerts/alerts.json I got never ending updating, thousands lines with errors like.
{"timestamp":"2022-01-31T12:40:08.458+0100","rule":{"level":5,"description":"Systemd: Service has entered a failed state, and likely has not started.","id":"40703","firedtimes":7420,"mail":false,"groups":["local","systemd"],"gpg13":["4.3"],"gdpr":["IV_35.7.d"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"X.X.X.X"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629208.66501653","full_log":"Jan 31 12:40:07 MYAGENTSERVERNAME systemd: Unit rbro-cbs-adapter-int.service entered failed state.","predecoder":{"program_name":"systemd","timestamp":"Jan 31 12:40:07","hostname":"MYAGENTSERVERNAME"},"decoder":{"name":"systemd"},"location":"/var/log/messages"}
But, I can also find alert if I change monitored file. (file> wazuhtest)
{"timestamp":"2022-01-31T12:45:59.874+0100","rule":{"level":7,"description":"Integrity checksum changed.","id":"550","mitre":{"id":["T1492"],"tactic":["Impact"],"technique":["Stored Data Manipulation"]},"firedtimes":1,"mail":false,"groups":["ossec","syscheck","syscheck_entry_modified","syscheck_file"],"pci_dss":["11.5"],"gpg13":["4.11"],"gdpr":["II_5.1.f"],"hipaa":["164.312.c.1","164.312.c.2"],"nist_800_53":["SI.7"],"tsc":["PI1.4","PI1.5","CC6.1","CC6.8","CC7.2","CC7.3"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"x.x.xx.x"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629559.67086751","full_log":"File '/var/ossec/etc/wazuhtest' modified\nMode: realtime\nChanged attributes: size,mtime,inode,md5,sha1,sha256\nSize changed from '61' to '66'\nOld modification time was: '1643618571', now it is '1643629559'\nOld inode was: '786558', now it is '786559'\nOld md5sum was: '2dd5fe4d08e7c58dfdba76e55430ba57'\nNew md5sum is : 'd8b218e9ea8e2da8e8ade8498d06cba8'\nOld sha1sum was: 'ca9bac5a2d8e6df4aa9772b8485945a9f004a2e3'\nNew sha1sum is : 'bd8b8b5c20abfe08841aa4f5aaa1e72f54a46d31'\nOld sha256sum was: '589e6f3d691a563e5111e0362de0ae454aea52b7f63014cafbe07825a1681320'\nNew sha256sum is : '7f26a582157830b1a725a059743e6d4d9253e5f98c52d33863bc7c00cca827c7'\n","syscheck":{"path":"/var/ossec/etc/wazuhtest","mode":"realtime","size_before":"61","size_after":"66","perm_after":"rw-r-----","uid_after":"0","gid_after":"0","md5_before":"2dd5fe4d08e7c58dfdba76e55430ba57","md5_after":"d8b218e9ea8e2da8e8ade8498d06cba8","sha1_before":"ca9bac5a2d8e6df4aa9772b8485945a9f004a2e3","sha1_after":"bd8b8b5c20abfe08841aa4f5aaa1e72f54a46d31","sha256_before":"589e6f3d691a563e5111e0362de0ae454aea52b7f63014cafbe07825a1681320","sha256_after":"7f26a582157830b1a725a059743e6d4d9253e5f98c52d33863bc7c00cca827c7","uname_after":"root","gname_after":"root","mtime_before":"2022-01-31T09:42:51","mtime_after":"2022-01-31T12:45:59","inode_before":786558,"inode_after":786559,"diff":"1c1\n< dadadadadad\n---\n> dfsdfdadadadadad\n","changed_attributes":["size","mtime","inode","md5","sha1","sha256"],"event":"modified"},"decoder":{"name":"syscheck_integrity_changed"},"location":"syscheck"}
{"timestamp":"2022-01-31T12:46:08.452+0100","rule":{"level":3,"description":"Log file rotated.","id":"591","firedtimes":5,"mail":false,"groups":["ossec"],"pci_dss":["10.5.2","10.5.5"],"gpg13":["10.1"],"gdpr":["II_5.1.f","IV_35.7.d"],"hipaa":["164.312.b"],"nist_800_53":["AU.9"],"tsc":["CC6.1","CC7.2","CC7.3","PI1.4","PI1.5","CC7.1","CC8.1"]},"agent":{"id":"003","name":"MYAGENTSERVERNAME","ip":"x.x.xx.x"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629568.67099280","full_log":"ossec: File rotated (inode changed): '/var/ossec/etc/wazuhtest'.","decoder":{"name":"ossec"},"location":"wazuh-logcollector"}
Also I can see this alert in messages logs on the manager server>
Jan 31 12:46:10 MYMANAGERSERVERNAME filebeat[186670]: 2022-01-31T12:46:10.379+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07610e0563729bf, ext:10888984451164, loc:(*time.Location)(0x55958e3622a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"dd9ff0c5-d5a9-4a0e-b1b3-0e9d7e8997ad","hostname":"MYMANAGERSERVERNAME","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"MYMANAGERSERVERNAME","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-alerts-4.x-"},"fileset":{"name":"alerts"},"host":{"name":"MYMANAGERSERVERNAME"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":127261462},"message":"{"timestamp":"2022-01-31T12:46:08.452+0100","rule":{"level":3,"description":"Log file rotated.","id":"591","firedtimes":5,"mail":false,"groups":["ossec"],"pci_dss":["10.5.2","10.5.5"],"gpg13":["10.1"],"gdpr":["II_5.1.f","IV_35.7.d"],"hipaa":["164.312.b"],"nist_800_53":["AU.9"],"tsc":["CC6.1","CC7.2","CC7.3","PI1.4","PI1.5","CC7.1","CC8.1"]},"agent":{"id":"003","name":"xlcppt36","ip":"10.74.96.34"},"manager":{"name":"MYMANAGERSERVERNAME"},"id":"1643629568.67099280","full_log":"ossec: File rotated (inode changed): '/var/ossec/etc/wazuhtest'.","decoder":{"name":"ossec"},"location":"wazuh-logcollector"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::706-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc00095ea90), Source:"/var/ossec/logs/alerts/alerts.json", Offset:127262058, Timestamp:time.Time{wall:0xc076063e1f1b1286, ext:133605185, loc:(*time.Location)(0x55958e3622a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x2c2, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-alerts-4.x-{2022.01.31||/d{yyyy.MM.dd|UTC}}>] must not contain the following characters [ , ", *, \, <, |, ,, >, /, ?]"}
Here is output form apps check.
curl "http://localhost:9200"
{
"version" : {
"number" : "7.14.2",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "6bc13727ce758c0e943c3c21653b3da82f627f75",
"build_date" : "2021-09-15T10:18:09.722761972Z",
"build_snapshot" : false,
"lucene_version" : "8.9.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
filebeat test output
elasticsearch: http://127.0.0.1:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 127.0.0.1
dial up... OK
TLS... WARN secure connection disabled
talk to server... OK
version: 7.14.2
So .. I can see alerts coming from Agent, but Its not reaching Kibana yet. On the kibana web I can see agent active and connected.

Unable to extract data with double pipe delimiter in Pig Script

I am trying to extract data which is pipe delimited in Pig. Following is my command
L = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('||');
Iam getting following error
2016-08-04 23:58:21,122 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[||]'
My input sample file has exactly 5 lines as following
POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0
POS_TIBCO||HDFS||POS_LOG||2||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||3||7806||2016-07-18||1||0||5
POS_TIBCO||HDFS||POS_LOG||4||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99
I tried several options like using the backslash before delimiter(\||,\|\|) but everything failed. Also, I tried with schema but got the same error.I am using Horton works(HDP2.2.4) and pig (0.14.0).
Any help is appreciated. Please let me know if you need any further details.
I have faced this case, and by checking PigStorage code source, i think PigStorage argument should be parsed into only one character.
So we can use this code instead:
L0 = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('|');
L = FOREACH L0 GENERATE $0,$2,$4,$6,$8,$10,$12,$14,$16;
Its helpful if you know how many column you have, and it will not affect performance because it's map side.
When you load data using PigStorage, It only expects single character as delimiter.
However if still you want to achieve this you can use MyRegExLoader-
REGISTER '/path/to/piggybank.jar'
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('||')
as (movieid:int, title:chararray, genre:chararray);

Tail and grep combo to show text in a multi line pattern

Not sure if you can do this through tail and grep. Lets say I have a log file that I would like to tail. It spits out quite a bit of information when in debug mode. I want to grep for information pertaining to only my module, and the module name is in the log like so:
/*** Module Name | 2014.01.29 14:58:01
a multi line
dump of some stacks
or whatever
**/
/*** Some Other Module Name | 2014.01.29 14:58:01
this should show up in the grep
**/
So as you can imagine, the number of lines that would be spit out pertaining to "Module Name" could be 2, or 50 until the end pattern appears (**/)
From your description, it seems like this should work:
tail -f log | awk '/^\/\*\*\* Module Name/,/^\*\*\//'
but be wary of buffering issues. (Lines printed to the file will very likely see high latency before actually being printed.)
I think you will have to install pcregrep for this. Then this will work:
tail -f logfile | pcregrep -M "^\/\*\*\* Some Other Module Name \|.*(\n|.)*?^\*\*/$"
This worked in testing, but for some reason, I had to append your example text to the log file over 100 times before output started showing up. However, all of the "Some other Module Name" data that was written to the log file as soon as this line was invoked was eventually printed to stdout.

Unexpected error while loading data

I am getting an "Unexpected" error. I tried a few times, and I still could not load the data. Is there any other way to load data?
gs://log_data/r_mini_raw_20120510.txt.gzto567402616005:myv.may10c
Errors:
Unexpected. Please try again.
Job ID: job_4bde60f1c13743ddabd3be2de9d6b511
Start Time: 1:48pm, 12 May 2012
End Time: 1:51pm, 12 May 2012
Destination Table: 567402616005:myvserv.may10c
Source URI: gs://log_data/r_mini_raw_20120510.txt.gz
Delimiter: ^
Max Bad Records: 30000
Schema:
zoneid: STRING
creativeid: STRING
ip: STRING
update:
I am using the file that can be found here:
http://saraswaticlasses.net/bad.csv.zip
bq load -F '^' --max_bad_record=30000 mycompany.abc bad.csv id:STRING,ceid:STRING,ip:STRING,cb:STRING,country:STRING,telco_name:STRING,date_time:STRING,secondary:STRING,mn:STRING,sf:STRING,uuid:STRING,ua:STRING,brand:STRING,model:STRING,os:STRING,osversion:STRING,sh:STRING,sw:STRING,proxy:STRING,ah:STRING,callback:STRING
I am getting an error "BigQuery error in load operation: Unexpected. Please try again."
The same file works from Ubuntu while it does not work from CentOS 5.4 (Final)
Does the OS encoding need to be checked?
The file you uploaded has an unterminated quote. Can you delete that line and try again? I've filed an internal bigquery bug to be able to handle this case more gracefully.
$grep '"' bad.csv
3000^0^1.202.218.8^2f1f1491^CN^others^2012-05-02 20:35:00^^^^^"Mozilla/5.0^generic web browser^^^^^^^^
When I run a load from my workstation (Ubuntu), I get a warning about the line in question. Note that if you were using a larger file, you would not see this warning, instead you'd just get a failure.
$bq show --format=prettyjson -j job_e1d8636e225a4d5f81becf84019e7484
...
"status": {
"errors": [
{
"location": "Line:29057 / Field:12",
"message": "Missing close double quote (\") character: field starts with: <Mozilla/>",
"reason": "invalid"
}
]
My suspicion is that you have rows or fields in your input data that exceed the 64 KB limit. Perhaps re-check the formatting of your data, check that it is gzipped properly, and if all else fails, try importing uncompressed data. (One possibility is that the entire compressed file is being interpreted as a single row/field that exceeds the aforementioned limit.)
To answer your original question, there are a few other ways to import data: you could upload directly from your local machine using the command-line tool or the web UI, or you could use the raw API. However, all of these mechanisms (including the Google Storage import that you used) funnel through the same CSV parser, so it's possible that they'll all fail in the same way.