Filebeat: how to create a new field from the path?

I would like to add a new field extracted from the path that is being used. I have two paths, see below.
paths:
  - /home/*/app/logs/*.log
  # - /home/v209/app/logs/*.log
  # - /home/v146/app/logs/*.log
fields:
  campaign: v209
fields_under_root: true
I would like the new campaign field to contain only the folder name, like v209 or v146. Any idea how to do this in Filebeat?
Thank you in advance!

Here are three suggested solutions, tested with Filebeat 7.17.3.
1) Static configuration of campaign field per input
filebeat.inputs:
  - type: filestream
    id: v209
    paths:
      - "/home/v209/app/logs/*.log"
    fields:
      campaign: v209
    fields_under_root: true
  - type: filestream
    id: v146
    paths:
      - "/home/v146/app/logs/*.log"
    fields:
      campaign: v146
    fields_under_root: true
output.console:
  pretty: true
Explanation: This solution is simple. Each filestream input sets the campaign field from a static config.
Pros/Cons: You have to add a new input with its own campaign value every time you add a new path. For dynamic environments this can become a serious operational problem, but it is dead simple to implement.
2) Dynamically extract campaign name from file path
processors:
  - dissect:
      tokenizer: "/%{key1}/%{campaign}/%{key3}/%{key4}/%{key5}"
      field: "log.file.path"
      target_prefix: ""
  - drop_fields:
      when:
        has_fields: ['key1', 'key3', 'key4', 'key5']
      fields: ['key1', 'key3', 'key4', 'key5']
Explanation: These processors run on top of your filestream or log input messages. The dissect processor tokenizes the path string and extracts each element of the full path, and the drop_fields processor then removes every field of no interest, keeping only the second path element (the campaign id).
Pros/Cons: Assuming your path structure is stable, you don't have to change anything when new files appear under /home/*/app/logs/*.log.
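For example, for a file at /home/v209/app/logs/app.log, the tokenizer yields key1=home, campaign=v209, key3=app, key4=logs, key5=app.log, and after drop_fields only campaign: v209 remains on the event.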
3) Script your way around
If you wish to set up more custom parsing logic, I'd suggest trying out the script processor and hacking your way forward until your requirements are met:
https://www.elastic.co/guide/en/beats/filebeat/7.17/processor-script.html
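For illustration only (this sketch is not from that docs page; it assumes the processor's standard process(event) entry point with event.Get/event.Put), a script that pulls the second path segment could look roughly like this:
processors:
  - script:
      lang: javascript
      source: |
        function process(event) {
          var path = event.Get("log.file.path");
          if (!path) {
            return;
          }
          // For "/home/v209/app/logs/app.log", split("/") gives
          // ["", "home", "v209", ...], so index 2 is the campaign folder.
          var parts = path.split("/");
          if (parts.length > 2) {
            event.Put("campaign", parts[2]);
          }
        }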

Related

DBT test configuration for a particular scenario

Hello, could anyone help me simulate this scenario? For example, I want to validate these 3 fields on my table, "symbol_type", "symbol_subtype", and "taker_symbol", and return their unique combination/result.
I tried to use this command, however it's not working properly in my test. I'm not sure if this is the correct syntax for my scenario. Your response is highly appreciated.
Expected result: these 3 fields should return my unique combination using dbt commands.
I'd recommend that you either:
use the generate_surrogate_key (docs) macro in the model, or
use the dbt_utils.unique_combination_of_columns (docs) generic test.
For the first case, you would need to define the following in the model:
select
  {{ dbt_utils.generate_surrogate_key(['symbol_type', 'symbol_subtype', 'taker_symbol']) }} as hashed_key_,
  (...)
from your_model
This would create a hashed value of the three columns. You could then use a unique test in your YAML file.
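For example, the corresponding YAML entry could look like this (hashed_key_ is just the alias used in the snippet above):
# your model's YAML file
- name: your_model_name
  columns:
    - name: hashed_key_
      tests:
        - unique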
For the second case, you would only need to add the generic test in your YAML file as follows:
# your model's YAML file
- name: your_model_name
  description: ""
  tests:
    - dbt_utils.unique_combination_of_columns:
        combination_of_columns:
          - symbol_type
          - symbol_subtype
          - taker_symbol
Both these approaches will let you check whether the combination of the three columns is unique over the whole model's output.
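To run only these tests, something like the following should work (on dbt 0.21+, where --select replaced --models):
dbt test --select your_model_name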

How to persist column descriptions in BigQuery tables

I have created models in my dbt (data build tool) project where I have specified column descriptions. My dbt_project.yml file is shown below:
models:
  sakila_dbt_project:
    # Applies to all files under models/example/
    +persist_docs:
      relation: true
      columns: true
    events:
      materialized: table
      +schema: examples
I have added +persist_docs, as described by dbt, as the fix to make column descriptions appear, but still no descriptions show up on the BigQuery table.
My models/events/events.yml looks like this
version: 2
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null
      - name: user-id
        quote: true
        description: The user who performed the event
        tests:
          - not_null
What am I missing?
P.S. I'm using dbt version 0.21.0.
Looks consistent with the required format as shown in the docs:
dbt_project.yml
models:
..<resource-path>:
....+persist_docs:
......relation: true
......columns: true
models/schema.yml
version: 2
models:
..- name: dim_customers
....description: One record per customer
....columns:
......- name: customer_id
........description: Primary key
Maybe spacing? I converted the spaces to periods in the examples above because the number of spaces is unforgivingly specific for yml files.
I've started using the vscode yml formatter because of how often I run into spacing issues on these keys in both the schema.yml and the dbt_project.yml
Otherwise, this isn't for a source or an external table, right? Those are the only two artifacts that persist_docs is unsupported for.
Sources: persist_docs unsupported (see the sources tab of the persist_docs docs)
External tables: unsupported (can't find it in the docs again, but I read it today in the docs or a GitHub issue)
Apache Spark: also unsupported (irrelevant here); see the Apache Spark profile
Also, if you're going to be working with persist_docs a lot, check out this macro example persist_docs_op that Jeremy left for a run-operation to update your persisted docs in case that's all you changed!
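If you want to verify outside the BigQuery UI, one option is to query the column metadata after a fresh dbt run; the project and dataset names below are placeholders for your own:
select column_name, description
from `your-project.your_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
where table_name = 'events'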

BigQuery dbt_external_tables external data configuration

I need some help when using the dbt_external_tables package.
I realized that in the CSV I have in GCS, some lines appear to have line breaks, and this is causing issues when trying to query the table created by the macro.
When doing this configuration of the external table manually, the BigQuery UI has two options:
Allow jagged rows (CSV)
Allow quoted newlines (CSV)
I usually set both options to true, and sometimes that solves the issues.
I don't know how to do this using dbt_external_tables.
This is important as I am receiving errors like this when trying to query the table created by dbt: "Error while reading table: kpi-process.file_csv.History, error message: CSV table references column position 9, but line starting at position:10956 contains only 7 columns."
The dbt-external-tables package supports passing a dictionary of options for BigQuery external tables, which maps to the options documented here. In your case, it sounds like you want to turn on allow_jagged_rows and allow_quoted_newlines, so you can specify them like so:
version: 2
sources:
  - name: my_external_source
    tables:
      - name: my_external_table
        external:
          location: 'gs://bucket/path/*'
          options:
            format: csv
            allow_jagged_rows: true
            allow_quoted_newlines: true
And dbt will template a DDL statement accordingly:
create or replace external table my_external_source.my_external_table
options (
  format = 'csv',
  allow_jagged_rows = true,
  allow_quoted_newlines = true,
  uris = ['gs://bucket/path/*']
)
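Keep in mind that the package creates these external tables via a dedicated run-operation rather than dbt run; per the dbt_external_tables README, that is:
dbt run-operation stage_external_sources
or, to force recreation:
dbt run-operation stage_external_sources --vars "ext_full_refresh: true"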

Defining BigQuery dbt sources with special characters in the table name?

After reviewing both of the below resources:
Source configurations
BigQuery configurations
I was unable to find an answer to this question:
Given a standard dbt project directory, I am defining a sources.yml that points to pre-existing BigQuery tables whose names contain special characters.
sources.yml:
version: 2
sources:
  - name: bigquery
    tables:
      - name: `fa--task.dataset.addresses`
      - name: `fa--task.dataset.devices`
      - name: `fa--task.dataset.orders`
      - name: `fa--task.dataset.payments`
Using backticks, as in `fa--task.dataset.orders`, works directly in a select statement:
(select * from `fa--task.dataset.orders`)
but is not recognized as valid YAML in sources.
The desired result would be something like:
{{ sources('bigquery','`fa--task.dataset.addresses`') }}
Try this!
version: 2
sources:
  - name: bigquery # are you sure you want to name it this? usually we name things after the data source, like 'stripe' or 'salesforce'
    schema: dataset
    database: fa--task
    tables:
      - name: addresses
      - name: devices
      - name: orders
      - name: payments
Then in your models you can do:
select * from {{ source('bigquery', 'addresses') }}
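With dbt's default quoting on BigQuery, that should compile to roughly:
select * from `fa--task`.`dataset`.`addresses`
so the hyphens in the project name are handled for you.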
It might be worth checking out the guide on sources to wrap your head around what's happening here, as well as the docs for source properties, which contain the list of keys available under the sources: key.

Import an xx.sql file to execute using Ebean

Is there any way to execute SQL statements read directly from SQL files (xx.sql) in Ebean?
For example, if I had a SQL file containing several SQL statements (values already written in the file), is there any way to execute this file using Ebean?
You have at least two options out of the box:
Play evolutions are meant for updating the DB schema, so you can also use them for inserting initial data (as long as it is flat and does not contain relations to objects not created yet). A sample evolution for MySQL:
# --- !Ups
INSERT INTO your_table (some_field) VALUES ('New value');
# --- !Downs
DELETE FROM your_table WHERE some_field = 'New value';
Use the Global object and insert initial data the common Ebean way:
public void onStart(Application app) {
    if (YourModel.find.findRowCount() == 0) {
        YourModel newItem = new YourModel();
        newItem.someField = "New value";
        newItem.save();

        YourModel newItem2 = new YourModel();
        // etc....
    }
}
For the second approach, you can check how the Zentask sample's Global object reads a YAML file holding the initial data (the sample file is placed in the conf directory).
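From memory (check the sample's Global.java for the exact code), that Zentask bootstrap looks roughly like this:
@Override
public void onStart(Application app) {
    if (Ebean.find(Project.class).findRowCount() == 0) {
        // play.libs.Yaml parses conf/initial-data.yml into a map of lists
        Map<String, List<Object>> all =
                (Map<String, List<Object>>) Yaml.load("initial-data.yml");
        Ebean.save(all.get("projects"));
        Ebean.save(all.get("tasks"));
    }
}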
Edit:
Take a closer look at initial-data.yml: there are also relations between tasks and projects, so they have fixed id values. You need to do the same in your YAML:
projects:
  - !!models.Project
    id: 1
    name: Play 2.0
    folder: Play framework

tasks:
  - !!models.Task
    title: Fix the documentation
    done: false
    folder: Todo
    project: !!models.Project
      id: 1
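Finally, to answer the literal question of executing raw SQL read from an xx.sql file, here is a minimal sketch using Ebean's SqlUpdate API. The conf/init.sql path is hypothetical, the naive split on ';' assumes no semicolons inside string literals, and the import matches the com.avaje.ebean package of that Ebean/Play era:
import com.avaje.ebean.Ebean;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SqlFileRunner {
    // Reads the whole file and executes each ';'-separated statement via Ebean.
    public static void run() throws IOException {
        String sql = new String(Files.readAllBytes(Paths.get("conf/init.sql")),
                StandardCharsets.UTF_8);
        for (String statement : sql.split(";")) {
            if (!statement.trim().isEmpty()) {
                Ebean.createSqlUpdate(statement).execute();
            }
        }
    }
}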