How to filter JSON data in the filebeat.yml file - filebeat

While using the kafka input, I want to produce output only when the JSON data contains a specific string.
I tried setting "include_lines" in filebeat.yml, but the data was not filtered properly.
With the filebeat.yml settings below, when data-set1 and data-set2 are fed in, both data-set1 and data-set2 are output.
I expected only data-set1 to appear in the output, but it did not work that way.
What mistake did I make?
Part of filebeat.yml:
filebeat.inputs:
- type: kafka
  hosts:
    - qa-parkbae-01.hanpda.com:9092,
    - qa-parkbae-02.hanpda.com:9092,
    - qa-parkbae-03.hanpda.com:9092
  topics: ["parkbae-test-topic1"]
  group_id: "test123"
  ssl.enabled: false
  include_lines: ['\"event\":\"basket\"']
Input data-set1:
{"id":"parkbae","event":"basket","data":"test1"}
Input data-set2:
{"id":"parkbae","event":"ball","data":"test2"}
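If include_lines does not behave as expected here, one hedged alternative (a sketch only, assuming the Kafka record body arrives in the message field) is to drop every event that does not contain the target string, using a drop_event processor under the same input:
  processors:
    - drop_event:
        when:
          not:
            contains:
              message: '"event":"basket"'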

Related

Extract/filter syslog-ng logs on Linux

I have configured syslog-ng to receive logs from another machine. The logs arrive every minute but contain unwanted entries; how do I filter the unrequired messages out of the raw data?
Example:
date=2021-06-01 time=10:01:01 ABC="1" cde=2 Xyz="aaa" name=UK
date=2021-06-01 time=10:01:02 ABC="3" cde=5 name=USA
date=2021-06-01 time=10:01:03 ABC="4" cde=2
The output of syslog-ng needs to be as below:
2020-06-01/data-20200601.log:
date=2021-06-01 time=10:01:01 ABC="1" cde=2 Xyz="aaa" name=UK
date=2021-06-01 time=10:01:02 ABC="3" cde=5 XyZ="" name=USA
date=2021-06-01 time=10:01:03 ABC="4" cde=2 XyZ="" name=""
That is, filter based on KEY=, and if a value is missing, the KEY= should still be logged with "" (so a missing value is not shifted to the left); that way I can filter later as needed.
I tried to parse it with awk and sed, but the log file generated by syslog ("data-20200601.log") is around 10 GB, and it took a long time to get this output:
2021-06-01,10:01:01,1,2,aaa,UK
2021-06-01,10:01:02,3,5,,USA
2021-06-01,10:01:03,4,,,,
syslog-ng has a parser called kv-parser() that would extract all such key=value parts into syslog-ng name-value pairs.
log {
    source(some_source);
    parser { kv-parser(); };
    destination { file("this_is_where_all_logs_go" template("${name} ${ABC}")); };
};
In the template section, as you can see, you can reference the extracted name-value pairs, using the normal syslog-ng syntax.
You can even format a series of name-value pairs into JSON or other structured formats using $(format-json), $(format-welf), etc.

How to configure a filebeat.yml dissect processor for pipe-separated multiline logs

I have my filebeat.yml successfully reading a log file. Each entry in the log is multi-line and pipe-separated, something like:
datetime | blurb | blurb2 | <?xml><maintag .....
more xml
more xml
more xml
</maintag>
The multiline processor is working correctly and creating the combined events, but I then want to use a dissect processor to strip out just the 4th part - the XML.
I have tried variants of:
processors:
  - dissect:
      field: "message"
      tokenizer: "${sw.date} | ${sw.blurb1} | ${sw.blurb2} | ${sw.message_xml}"
      target_prefix: ""
But when I start filebeat, it's throwing:
{"log.level":"error","@timestamp":"2022-10-06T08:51:42.612Z","log.origin":{"file.name":"instance/beat.go","file.line":1022},"message":"Exiting: Failed to start crawler: starting input failed: could not unpack config: missing field accessing 'filebeat.inputs.1.processors' (source:'filebeat.yml')","service.name":"filebeat","ecs.version":"1.6.0"}
Exiting: Failed to start crawler: starting input failed: could not unpack config: missing field accessing 'filebeat.inputs.1.processors' (source:'filebeat.yml')
Can anyone advise what I'm getting wrong? The message suggests to me a missing field in my dissect processor definition, but from the docs it looks right to me?
Many thanks!
Ack! Found it! Would really be useful if I could learn the difference between $ and % in my tokenizer!
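For reference, the dissect tokenizer uses %{key} markers rather than ${key}. A corrected sketch of the processor (field names kept from the question; exact placement under the input is assumed) would be:
processors:
  - dissect:
      field: "message"
      tokenizer: "%{sw.date} | %{sw.blurb1} | %{sw.blurb2} | %{sw.message_xml}"
      target_prefix: ""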

Delimit BigQuery REGEXP_EXTRACT strings in Google Cloud Build YAML script

I have a complex query that creates a View within the BigQuery console.
I have simplified it to the following to illustrate the issue
SELECT
REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1,
REGEXP_REPLACE(FIELD2, r"\'", "") AS F2,
FROM `project.mydataset.mytable`
Now I am trying to automate the creation of the view with cloud build.
I cannot work out how to delimit the strings inside the regex so that they work with both YAML and SQL.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  args: [
    'mk',
    '--use_legacy_sql=false',
    '--project_id=${_PROJECT_ID}',
    '--expiration=0',
    '--view=
    REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1 ,
    REGEXP_REPLACE(FIELD2, r"\'", "") AS F2,
    REGEXP_EXTRACT(FIELD3, r"\[(\d{3,12}).*\]") AS F3
    FROM `project.mydataset.mytable`"
    '${_TARGET_DATASET}.${_TARGET_VIEW}'
  ]
I get the following error
Failed to trigger build: failed unmarshalling build config
cloudbuild/build-views.yaml: json: cannot unmarshal number into Go
value of type string
I have tried using Cloud Build substitution parameters, and as many combinations of SQL and YAML escape sequences as I can think of to find a working solution.
Generally, you want to use block scalars in such cases, as they do not process any special characters inside them and are terminated via indentation.
I have no idea how the command is supposed to look, but here's something that's at least valid YAML:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  args:
    - 'mk'
    - '--use_legacy_sql=false'
    - '--project_id=${_PROJECT_ID}'
    - '--expiration=0'
    - >- # folded block scalar; newlines are folded into spaces
      --view=
      REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1,
      REGEXP_REPLACE(FIELD2, r"\'", "") AS F2,
      REGEXP_EXTRACT(FIELD3, r"\[(\d{3,12}).*\]") AS F3
      FROM `project.mydataset.mytable`"
      '${_TARGET_DATASET}.${_TARGET_VIEW}'
    - dummy value to show that the scalar ends here
A folded block scalar is started with >; the following minus tells YAML not to append the final newline to its value.
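As a standalone illustration of the folding behavior (not part of the build config above), the scalar below loads as the single string "--view= SELECT 1 FROM x" with no trailing newline:
example: >-
  --view=
  SELECT 1
  FROM x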

Use dotted YAML variables file in Ansible

I'm trying to achieve the following using Ansible:
Define a YAML file with some variables in the dotted format inside it (variables.yml):
database.hosts[0]: "db0"
database.hosts[1]: "db1"
database.hosts[2]: "db2"
foo.bar: 1
foo.baz: 2
Load the variables in variables.yml by using the include_vars module in my playbook (playbook.yml) and print them in a tree structure:
- hosts: all
  gather_facts: not
  tasks:
    - name: "Loading vars"
      run_once: true
      include_vars:
        file: 'variables.yml'
    - name: "Testing"
      debug:
        msg: "{{ foo }}"
    - name: "Testing"
      debug:
        msg: "{{ database }}"
Running this results in the following error:
fatal: [host0]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'foo' is undefined\n\nThe error appears to be in '.../playbook.yml': line 9, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: \"Testing\"\n ^ here\n"}
Which makes it clear that each property in the YAML file has been loaded as a separate property and not as properties within two trees rooted in database and foo.
Of course, the playbook works as expected if I specify the properties as follows:
database:
  hosts:
    - "db0"
    - "db1"
    - "db2"
foo:
  bar: 1
  baz: 2
However, I need the YAML variables file to be in the dotted format instead of in the classic indented format. Is there any way to achieve this? E.g.: a module different from include_vars or some configuration that I can add to the ansible.cfg file? I have already tried to use hash_behaviour=merge, but that didn't help.
Q: "I need the YAML variables file to be in the dotted format instead of in the classic indented format. Is there any way to achieve this?"
A: No. It's not possible. See Creating valid variable names.

Split a JSON file using Apache Pig

I have a JSON input file that needs to be split into multiple files based on a keyword and the output should also retain the same JSON format.
Example:
The keyword here is the value of the object EVT.NAME. Depending on the value, the record should be routed to the corresponding output.
The input has three different values (KEYPRESS, TUNE, TRICK), so 3 different output files should be created.
Input:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672866844,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TUNE","ETS":1402672867117,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402672868600,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672868888,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402673179313,"VALUE":{"KEY":"FAST_FORWARD"}},"HOST":"XXX"}
Output 1:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672866844,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"KEYPRESS","ETS":1402672868888,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
Output 2:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TUNE","ETS":1402672867117,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
Output 3:
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402672868600,"VALUE":{"KEY":"PLAY"}},"HOST":"XXX"}
{"PV":"1.0","DEV":{"DEV_ID":"P0100011103"},"EVT":{"NAME":"TRICK","ETS":1402673179313,"VALUE":{"KEY":"FAST_FORWARD"}},"HOST":"XXX"}
You can use JsonLoader and JsonStorage. See this article - http://joshualande.com/read-write-json-apache-pig.
-- The loader schema is assumed here to mirror the sample records, so EVT.NAME is addressable for the split:
table = LOAD 'file.json'
    USING JsonLoader('PV:chararray, DEV:tuple(DEV_ID:chararray), EVT:tuple(NAME:chararray, ETS:long, VALUE:tuple(KEY:chararray)), HOST:chararray');