elastalert2 - alert text jinja templates - which variables are available?

I'd like to make our monitoring system a bit more "business user friendly". I am using elastalert2 for monitoring. The mails it generates by default are highly cryptic and my colleagues outside of technology do not understand them at all.
I've been trying to play with alert_text to give them a better description of what happened. Unfortunately, I can't find any documentation of which variables are available in the Jinja templates. As a result, the only thing I can print out is the number of hits, not the name of the rule or the time period the hits apply to.
Is there someone who has some experience with that?

In your elastalert2 rule definition, you can use the alert_text_args field to define some fields you would like to use in your alert_text.
For example:
elastalert2:
  rules:
    some_test_rule: |-
      <snip>
      include: ["elasticsearch", "hostname", "@timestamp", "message", "username", "connection_id"]
      alert_text: |
        Error_message: {3}
        User: {2}
        instance: {0}
        time: {1}
        session_id: {4}
      alert_text_type: alert_text_only
      alert_text_args: ["hostname", "@timestamp", "username", "message", "connection_id"]
Reference: elastalert2 rule types documentation
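Under the hood, the alert_text_args values are substituted into the {0}, {1}, ... placeholders of alert_text with ordinary Python string formatting. A minimal sketch (the match document below is made up for illustration):

```python
# Sketch of how alert_text_args fills the numbered placeholders in
# alert_text. Field names and values are illustrative only.
alert_text = (
    "Error_message: {3}\n"
    "User: {2}\n"
    "instance: {0}\n"
    "time: {1}\n"
    "session_id: {4}"
)

# A hypothetical match returned by Elasticsearch for this rule.
match = {
    "hostname": "web-01",
    "@timestamp": "2023-01-01T12:00:00Z",
    "username": "jdoe",
    "message": "Connection refused",
    "connection_id": "abc-123",
}

alert_text_args = ["hostname", "@timestamp", "username", "message", "connection_id"]
rendered = alert_text.format(*[match.get(f, "<MISSING>") for f in alert_text_args])
print(rendered)
```

So the index in each placeholder refers to the position of the field name in alert_text_args, which is why the two lists must be kept in sync.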

Related

Filebeat: how to create new field from the path?

I would like to add a new field extracted from the log file path. I have two paths, see below.
paths:
  - /home/*/app/logs/*.log
  # - /home/v209/app/logs/*.log
  # - /home/v146/app/logs/*.log
fields:
  campaign: v209
fields_under_root: true
I would like to create the new field campaign containing only the folder name, like v209 or v146. Any idea how to do this in Filebeat?
Thank you in advance!
Here are three suggested solutions tested with Filebeat 7.1.3
1) Static configuration of campaign field per input
filebeat.inputs:
  - type: filestream
    id: v209
    paths:
      - "/home/v209/app/logs/*.log"
    fields:
      campaign: v209
    fields_under_root: true
  - type: filestream
    id: v146
    paths:
      - "/home/v146/app/logs/*.log"
    fields:
      campaign: v146
    fields_under_root: true
output.console:
  pretty: true
Explanation: This solution is simple. Each file input will have a field set (campaign) based on a static config.
Pros/Cons: This option has the problem of having to add a new campaign field every time you add a new path. For dynamic environments, this can pose a serious operational problem but it's dead simple to implement.
2) Dynamically extract campaign name from file path
processors:
  - dissect:
      tokenizer: "/%{key1}/%{campaign}/%{key3}/%{key4}/%{key5}"
      field: "log.file.path"
      target_prefix: ""
  - drop_fields:
      when:
        has_fields: ['key1', 'key3', 'key4', 'key5']
      fields: ['key1', 'key3', 'key4', 'key5']
Explanation: These processors work on top of your filestream or log input messages. The dissect processor will tokenize your path string and extract each element of your full path. The drop_fields processor will remove all fields of no interest and only keep the second path element (campaign id).
Pros/Cons: Assuming your path structure is stable, with this solution you don't have to do anything when new files appear under /home/*/app/logs/*.log.
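To check which token the tokenizer above assigns to campaign, the dissect step can be simulated in plain Python (the sample path is made up; Filebeat does the real work at ingest time):

```python
# Simulate the dissect tokenizer "/%{key1}/%{campaign}/%{key3}/%{key4}/%{key5}"
# applied to a sample log.file.path, followed by the drop_fields step.
def dissect(path):
    parts = path.lstrip("/").split("/")
    keys = ["key1", "campaign", "key3", "key4", "key5"]
    fields = dict(zip(keys, parts))
    # drop_fields removes key1/key3/key4/key5, keeping only campaign
    return {"campaign": fields["campaign"]}

print(dissect("/home/v209/app/logs/app.log"))  # campaign is the 2nd path element
```

This makes it easy to see that, for any path matching /home/*/app/logs/*.log, the second path element lands in campaign.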
3) Script your way around
If you wish to set up more custom parsing logic, I'd suggest trying out the script processor and hacking away until your requirements are met:
https://www.elastic.co/guide/en/beats/filebeat/7.17/processor-script.html

How to programmatically list available Google BigQuery locations?

How to programmatically list available Google BigQuery locations? I need a result similar to what is in the table of this page: https://cloud.google.com/bigquery/docs/locations.
As @shollyman has mentioned:
The BigQuery API does not expose the equivalent of a list locations call at this time.
So, you should consider filing a feature request on the issue tracker.
In the meantime, I wanted to add Option 3 to the two already proposed by @Tamir.
This is a somewhat naïve option with its pros and cons, but depending on your specific use case it can be useful and easily adapted to your application.
Step 1 - load the page's HTML (https://cloud.google.com/bigquery/docs/locations)
Step 2 - parse and extract the needed info
Obviously, this is super simple to implement in any client of your choice.
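As a sketch of the same parse-and-extract idea in a general-purpose client, here is the regex approach in Python, run against an inline HTML fragment (the fragment is a made-up stand-in for the real locations page, whose layout may differ):

```python
import re

# Made-up HTML fragment standing in for the downloaded locations page.
html = """
<table>
<tr><th>Area</th><th>Region name</th><th>Region description</th></tr>
<tr><td>Americas</td><td>us-east1</td><td>South Carolina</td></tr>
<tr><td>Europe</td><td>europe-west1</td><td>Belgium</td></tr>
</table>
"""

# Same strategy as the BigQuery query below: grab the table, split into
# rows, then pull the cell values out of each row (skipping the header).
table = re.search(r"<table>(.*?)</table>", html, re.S).group(1)
rows = re.findall(r"<tr>(.*?)</tr>", table, re.S)
locations = [re.findall(r"<td>(.*?)</td>", row) for row in rows[1:]]
print(locations)
```

Like any scraping approach, this silently breaks if the page layout changes, which is the main con of Option 3.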
As I am a huge BigQuery fan, I went through a proof of concept using the BigQuery tool Magnus.
I've created a workflow with just two tasks:
API Task - to load the page's HTML into the variable var_payload
and
BigQuery Task - to parse and extract the wanted info out of the HTML
The whole workflow is as simple as those two tasks.
The query I used in the BigQuery Task is:
CREATE TEMP FUNCTION decode(x STRING) RETURNS STRING
LANGUAGE js
OPTIONS (library=["gs://my_bucket/he.js"])
AS """
return he.decode(x);
""";
WITH t AS (
  SELECT
    REGEXP_EXTRACT_ALL(
      REGEXP_REPLACE(html, r'\n|<strong>|</strong>|<code>|</code>', ''),
      r'<table>(.*?)</table>'
    )[OFFSET(0)] x
  FROM (SELECT '''<var_payload>''' AS html)
)
SELECT
  pos,
  line[SAFE_OFFSET(0)] Area,
  line[SAFE_OFFSET(1)] Region_Name,
  decode(line[SAFE_OFFSET(2)]) Region_Description
FROM (
  SELECT
    pos,
    REGEXP_EXTRACT_ALL(line, r'<td>(.*?)</td>') line
  FROM t,
  UNNEST(REGEXP_EXTRACT_ALL(x, r'<tr>(.*?)</tr>')) line
  WITH OFFSET pos
  WHERE pos > 0
)
As you can see, I used the he library. From its README:
he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would ...
After the workflow is executed and those two steps are done, the result is in project.dataset.location_extraction, and we can query this table to make sure we've got what was expected.
Note: obviously, the parsing and extraction of the locations info is quite simplified, and it can surely be improved to be more robust to changes in the source page layout.
Unfortunately, there is no API that provides the list of supported BigQuery locations.
I see two options which might work for you:
Option 1
You can manually maintain a list and expose it to your client via an API or any other means your application supports (you will need to follow BigQuery product updates to keep this list current)
Option 2
If your use case is to provide a list of the locations you are using to store your own data, you can call datasets.list to get a list of locations and display/use it in your app
{
  "kind": "bigquery#dataset",
  "id": "id1",
  "datasetReference": {
    "datasetId": "datasetId",
    "projectId": "projectId"
  },
  "location": "US"
}
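A minimal sketch of Option 2 in Python: given entries shaped like the datasets.list response above (the entries below are made up; a real call would page through the BigQuery REST API or a client library), collect the distinct locations:

```python
# Made-up datasets.list response entries for illustration.
datasets = [
    {"kind": "bigquery#dataset", "id": "p:d1",
     "datasetReference": {"datasetId": "d1", "projectId": "p"}, "location": "US"},
    {"kind": "bigquery#dataset", "id": "p:d2",
     "datasetReference": {"datasetId": "d2", "projectId": "p"}, "location": "EU"},
    {"kind": "bigquery#dataset", "id": "p:d3",
     "datasetReference": {"datasetId": "d3", "projectId": "p"}, "location": "US"},
]

# De-duplicate the location field across all datasets.
locations = sorted({d["location"] for d in datasets})
print(locations)
```

Note that this only yields locations you already store data in, not the full list of locations BigQuery supports.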

How to use Regex to find a specific string between two special characters?

I'm currently migrating SQL data between two forum systems (Woltlab Burning Board 2 -> MyBB 1.8).
I need to change the internal links from http://example.com/thread.php?threadid=XYZ
to http://example.com/showthread.php?tid=ABC. Thread IDs will change between the two systems, so I'm not able to do a simple string replace.
I already catch all posts containing http://example.com/thread.php?threadid=. Now I need to get the unique ID into a variable. As the whole post string can also contain external links (e.g. http://google.com), I can't just match any numbers.
I would like to capture the thread ID from the string http://example.com/thread.php?threadid=XYZ, which is part of a bigger string (a forum post). I guess regex could be used for this.
Any help would be highly appreciated! Thanks!
In PowerShell, the following will capture the thread ID:
$URI = "http://example.com/thread.php?threadid=2438&Query=GetSomething?2345"
$tid = ($URI | select-string -pattern "threadid=(\d+)").matches.groups.value[1]
$tid
If there is no thread ID to be captured, the $tid assignment will throw an error.
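For comparison, the same capture in Python, using the example URL from above:

```python
import re

uri = "http://example.com/thread.php?threadid=2438&Query=GetSomething?2345"

# Capture the digits immediately following "threadid="; the match stops
# at the first non-digit, so the "&Query=..." part is never included.
m = re.search(r"threadid=(\d+)", uri)
tid = m.group(1) if m else None
print(tid)
```

Unlike the PowerShell version, this handles the no-match case explicitly: tid is simply None when the pattern is absent, rather than raising an error.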

Printing watson entity found in text (input)

I have a situation where the user asks "I want to buy US dollars". I have already defined the intent for the question "I want to buy". What I need is to identify which currency the user is talking about (buying).
For that, I created an entity "money" with a value "currency" and its synonyms (US dollar, euros, yen, ...).
The issue is that the node recognizes @items:buying and @money:currency. How can I get which currency was found, and use it in the context/output?
I tried a couple of expressions, but they always return an empty value.
entities[0] returns only the "buying" entity, the first recognized one. I need the second one, specifically by name, in order to customize my conversation flow.
Thanks a lot.
Thanks a lot.
To resolve this, first switch on the @sys-currency system entity.
After that, this example should work once training is complete.
Condition: @sys-currency
Response: Currency: <? @sys-currency.unit ?>. Total: <? @sys-currency ?>
However, it assumes that the currency is written correctly. For example:
20 USD
$20
20 dollars
More details here:
https://www.ibm.com/watson/developercloud/doc/conversation/system-entities.html#sys-currency-entity
To address the other point of finding the value of the recognised text of the entity, you would use:
<? entities[0].literal ?>

Am I training my wit.ai bot correctly?

I'm trying to train my Wit.ai bot to recognize someone's first name. I'm not very sure that I understand how the NLP works, so I'll give you an example.
I defined a lot of expressions like "My name is XXXX", "Everybody calls me XXXX".
In the "Understanding" table I added an entity named "contact_name" and added almost 50 keywords like "Michel, John, Mary...".
I set the trait as "free-text" and "keywords".
I'm not sure whether this process is correct. So, I ask you:
Does context like "My name is..." matter for the NLP? I mean, will it help the bot predict that after this expression a first name will probably follow?
Is it right to add some 50 values to an entity, or is that completely wrong?
What do you suggest as a training process in order to get someone's first name?
You have done it right by keeping the entity's search strategy as "free-text" and "keywords". But adding keyword examples to the entity doesn't make much sense, because a person's name is not a keyword.
So, I would recommend a training strategy which is as follows:
Create various templates of the message like, "My name is XYZ", "I am XYZ", "This is XYZ" etc. (all possible introduction messages you could think of)
Remove all keywords and expressions for the entity you created and add these two keywords:
"a b c d e f g h i j k l m n o p q r s t u v w x y z"
"XYZ" (can give any name but maintain this name same for validating the templates)
In the 'Understanding' tab enter the messages and extract the name into the entity ("contact_name" in your case) and validate them
Similarly, validate all message templates keeping the name as "XYZ"
After you have done this for all templates your bot will be able to recognise any name in a given template of the message
The logic behind this is that your entity is free-text plus keyword, which means the bot first tries to match a keyword; if nothing matches, it tries to find the word in the same position as in the templates. Keeping the name the same across validations helps train the bot on the templates and learn the position where the name will usually be found.
Hope this works; I have tried this and it worked for me. I am not sure how the bot trains in the background. I recommend you start a new app and do this exercise.
Comment if there is any problem.
wit.ai has a pre-trained entity extraction method called wit/contact, which
Captures free text that's either the name or a clear reference to a
person, like "Paul", "Paul Smith", "my husband", "the dentist".
It works well even without any training data.
To read more about the method, refer to duckling.