How to filter JSON data in the filebeat.yml file

While using the Kafka input, I want to output an event only when the JSON data contains a specific string.
I tried setting "include_lines" in filebeat.yml, but the data was not filtered properly.
With the filebeat.yml settings shown below, when data-set1 and data-set2 are input, both data-set1 and data-set2 are output.
I expected only data-set1 to be output, but it wasn't.
What mistake did I make?
Part of filebeat.yml:
- type: kafka
  topics: ["parkbae-test-topic1"]
  group_id: "test123"
  ssl.enabled: false
  include_lines: ['\"event\":\"basket\"']
input data-set1 :
input data-set2 :


extract/filter syslog-ng log on linux

I have configured syslog-ng to receive logs from another machine. The logs arrive every minute, but the lines do not all contain the same keys. How can I filter unneeded messages out of the raw data?
date=2021-06-01 time=10:01:01 ABC="1" cde=2 Xyz="aaa" name=UK
date=2021-06-01 time=10:01:02 ABC="3" cde=5 name=USA
date=2021-06-01 time=10:01:03 ABC="4" cde=2
The output of syslog-ng needs to be as below:
date=2021-06-01 time=10:01:01 ABC="1" cde=2 Xyz="aaa" name=UK
date=2021-06-01 time=10:01:02 ABC="3" cde=5 Xyz="" name=USA
date=2021-06-01 time=10:01:03 ABC="4" cde=2 Xyz="" name=""
That is, I want to filter based on KEY=, and if a key is missing from a line, it should still be logged as KEY="" (so that the remaining values are not shifted to the left). That way I can filter later as needed.
I tried parsing with awk and sed, but the log file generated by syslog-ng ("data-20200601.log") is around 10 GB, and it took a long time to produce this output.
syslog-ng has a parser called kv-parser() that would extract all such key=value parts into syslog-ng name-value pairs.
log {
    parser { kv-parser(); };
    destination { file("this_is_where_all_logs_go" template("${name} ${ABC}")); };
};
In the template section, as you can see, you can reference the extracted name-value pairs, using the normal syslog-ng syntax.
You can even format a series of name-value pairs into JSON or other structured formats using $(format-json), $(format-welf), etc.
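For checking the expected output on a small sample outside syslog-ng, the padding rule from the question can be sketched in Python. This is only an illustration, not part of the syslog-ng answer: the function name `normalize` and the `KEYS` list are assumptions, with the key order taken from the sample lines above.

```python
import re

# Keys in the order they should appear in every output line
# (assumed from the sample log lines in the question).
KEYS = ["date", "time", "ABC", "cde", "Xyz", "name"]

# A value is either a double-quoted string or a run of non-space characters.
_PAIR = re.compile(r'(\w+)=("[^"]*"|\S*)')


def normalize(line, keys=KEYS):
    """Return the line with every key present; missing keys become KEY=""."""
    found = dict(_PAIR.findall(line))
    return " ".join("{}={}".format(key, found.get(key, '""')) for key in keys)


if __name__ == "__main__":
    print(normalize('date=2021-06-01 time=10:01:03 ABC="4" cde=2'))
```

For a 10 GB file, the kv-parser() approach inside syslog-ng is likely the better fit, since it avoids a second pass over the data.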

How to configure filebeat.yml dissect processor for pipe separated multiline logs

I have my filebeat.yml successfully reading a log file. Each entry in the log is multiline, and pipe separated. Something like:
datetime | blurb | blurb2 | <?xml><maintag .....
more xml
more xml
more xml
The multiline processor is working correctly, but I then want to use a dissect processor to extract just the fourth part, the XML.
I have tried variants of:
- dissect:
    field: "message"
    tokenizer: "${} | ${sw.blurb1} | ${sw.blurb2} | ${sw.message_xml}"
    target_prefix: ""
But when I start filebeat, it's throwing:
{"log.level":"error","@timestamp":"2022-10-06T08:51:42.612Z","log.origin":{"":"instance/beat.go","file.line":1022},"message":"Exiting: Failed to start crawler: starting input failed: could not unpack config: missing field accessing 'filebeat.inputs.1.processors' (source:'filebeat.yml')","":"filebeat","ecs.version":"1.6.0"}
Exiting: Failed to start crawler: starting input failed: could not unpack config: missing field accessing 'filebeat.inputs.1.processors' (source:'filebeat.yml')
Can anyone advise what I'm getting wrong? The message suggests to me a missing field in my dissect processor definition, but from the docs it looks right to me?
Many thanks!
Ack! Found it! The dissect tokenizer expects keys written as %{...}, not ${...}. It would have been really useful to know the difference between $ and % in my tokenizer!
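To see what a tokenizer of the form "%{} | %{sw.blurb1} | %{sw.blurb2} | %{sw.message_xml}" is meant to extract, here is a rough Python sketch of the same split. It is not how Filebeat implements dissect; the function name `dissect_pipe_entry` is hypothetical, and the field names are the ones from the question.

```python
def dissect_pipe_entry(message):
    """Split a multiline entry into its first three fields and the XML remainder.

    Mimics the intent of "%{} | %{sw.blurb1} | %{sw.blurb2} | %{sw.message_xml}":
    the first field (the datetime) is skipped, and everything after the third
    " | " delimiter, including later lines, is kept as the XML.
    """
    parts = message.split(" | ", 3)
    if len(parts) != 4:
        raise ValueError("expected at least three ' | ' delimiters")
    _skipped, blurb1, blurb2, xml = parts
    return {
        "sw.blurb1": blurb1,
        "sw.blurb2": blurb2,
        "sw.message_xml": xml,
    }


if __name__ == "__main__":
    entry = "datetime | blurb | blurb2 | <?xml><maintag>\nmore xml\nmore xml"
    print(dissect_pipe_entry(entry)["sw.message_xml"])
```

Limiting the split to three delimiters keeps any " | " that happens to occur inside the XML in the last field.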

Delimit BigQuery REGEXP_EXTRACT strings in Google Cloud Build YAML script

I have a complex query that creates a View within the BigQuery console.
I have simplified it to the following to illustrate the issue
SELECT
REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1,
FROM `project.mydataset.mytable`
Now I am trying to automate the creation of the view with cloud build.
I cannot work out how to delimit the strings inside the regex so that they work with both YAML and SQL.
- name: ''
entrypoint: 'bq'
args: [
REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1 ,
REGEXP_EXTRACT(FIELD3, r"\[(\d{3,12}).*\]") AS F3
FROM `project.mydataset.mytable`"
I get the following error
Failed to trigger build: failed unmarshalling build config
cloudbuild/build-views.yaml: json: cannot unmarshal number into Go
value of type string
I have tried using Cloud Build substitution parameters, and as many combinations of SQL and YAML escape sequences as I can think of to find a working solution.
Generally, you want to use block scalars in such cases, as they do not process any special characters inside them and are terminated via indentation.
I have no idea how the command is supposed to look, but here's something that's at least valid YAML:
- name: ''
  entrypoint: 'bq'
  args:
  - 'mk'
  - '--use_legacy_sql=false'
  - '--project_id=${_PROJECT_ID}'
  - '--expiration=0'
  - >- # folded block scalar; newlines are folded into spaces
    REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1,
    REGEXP_EXTRACT(FIELD3, r"\[(\d{3,12}).*\]") AS F3
    FROM `project.mydataset.mytable`
  - dummy value to show that the scalar ends here
A folded block scalar starts with >; the minus that follows tells YAML not to append the final newline to its value.

Use dotted YAML variables file in Ansible

I'm trying to achieve the following using Ansible:
Define a YAML file with some variables in the dotted format inside it (variables.yml)
database.hosts[0]: "db0"
database.hosts[1]: "db1"
database.hosts[2]: "db2"
foo.bar: 1
foo.baz: 2
Load the variables in variables.yml by using the include_vars module in my playbook (playbook.yml) and print them in a tree structure
- hosts: all
  gather_facts: no
  tasks:
    - name: "Loading vars"
      run_once: true
      include_vars:
        file: 'variables.yml'
    - name: "Testing"
      debug:
        msg: "{{ foo }}"
    - name: "Testing"
      debug:
        msg: "{{ database }}"
Running this results in the following error:
fatal: [host0]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'foo' is undefined\n\nThe error appears to be in '.../playbook.yml': line 9, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: \"Testing\"\n ^ here\n"}
Which makes it clear that each property in the YAML file has been loaded as a separate property and not as properties within two trees rooted in database and foo.
Of course, the playbook works as expected if I specify the properties as follows:
database:
  hosts:
    - "db0"
    - "db1"
    - "db2"
foo:
  bar: 1
  baz: 2
However, I need the YAML variables file to be in the dotted format instead of in the classic indented format. Is there any way to achieve this? E.g.: a module different from include_vars or some configuration that I can add to the ansible.cfg file? I have already tried to use hash_behaviour=merge, but that didn't help.
Q: "I need the YAML variables file to be in the dotted format instead of in the classic indented format. Is there any way to achieve this?"
A: No. It's not possible. See Creating valid variable names.
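If the dotted file itself cannot change, one possible workaround (not part of the answer above) is to convert it to the nested form before Ansible reads it. The sketch below assumes the dotted keys have already been loaded into a flat Python dict; the names `parse_path` and `nest` are hypothetical helpers, and only the `a.b` and `a[0]` notations from the question are handled.

```python
import re

# Matches either a plain name segment or a [N] list index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(dotted_key):
    """Split 'database.hosts[0]' into ['database', 'hosts', 0]."""
    return [name if name else int(index)
            for name, index in _TOKEN.findall(dotted_key)]


def _child(node, step, default):
    """Return node[step], creating it (and padding lists) when missing."""
    if isinstance(step, int):
        while len(node) <= step:
            node.append(None)
        if node[step] is None:
            node[step] = default
        return node[step]
    return node.setdefault(step, default)


def nest(flat):
    """Turn {'foo.bar': 1, 'database.hosts[0]': 'db0'} into nested dicts/lists."""
    root = {}
    for dotted_key, value in flat.items():
        path = parse_path(dotted_key)
        node = root
        # Walk every step except the last, creating containers as needed.
        for step, following in zip(path, path[1:]):
            default = [] if isinstance(following, int) else {}
            node = _child(node, step, default)
        last = path[-1]
        if isinstance(last, int):
            while len(node) <= last:
                node.append(None)
        node[last] = value
    return root


if __name__ == "__main__":
    print(nest({"database.hosts[0]": "db0", "foo.bar": 1}))
```

The nested result could then be written out as ordinary indented YAML and loaded with include_vars as in the working example.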

Split JSON file using apache PIG

I have a JSON input file that needs to be split into multiple files based on a keyword and the output should also retain the same JSON format.
The keyword here is the value of the object EVT.NAME. Depending on that value, each record should be routed to the corresponding output.
Input has three different values (KEYPRESS,TUNE,TRICK), so 3 different output files should be created.
Output 2:
Output 3:
You can use JsonLoader and JsonStorage:
table = LOAD 'file.json'
USING JsonLoader('KEYPRESS:chararray, TUNE:chararray, TRICK:chararray');
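The routing logic itself, independent of Pig, can be sketched in Python to make the intended result concrete. This is an illustration under assumptions: the input layout is not shown above, so the sketch assumes one JSON object per line with the event name at EVT.NAME, and the function name `split_by_event_name` is hypothetical.

```python
import json
from collections import defaultdict


def split_by_event_name(lines):
    """Group JSON records by their EVT.NAME value.

    Assumes each non-empty line is one JSON object shaped like
    {"EVT": {"NAME": "..."}, ...}. Returns a dict that maps each name
    to a list of JSON strings, so every group keeps the JSON format.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        groups[record["EVT"]["NAME"]].append(json.dumps(record))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        '{"EVT": {"NAME": "KEYPRESS"}, "id": 1}',
        '{"EVT": {"NAME": "TUNE"}, "id": 2}',
        '{"EVT": {"NAME": "TRICK"}, "id": 3}',
    ]
    for name, records in split_by_event_name(sample).items():
        print(name, len(records))
```

Each group could then be written to its own file, which corresponds to one output per distinct EVT.NAME value.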