Joining events into a single row - Splunk

I have some events that capture the times when different jobs start or end. Here are some sample events capturing the start and end times of jobs:
[
  {
    "appName": "a1",
    "eventName": "START",
    "eventTime": "t1"
  },
  {
    "appName": "a1",
    "eventName": "END",
    "eventTime": "t2"
  },
  {
    "appName": "a1",
    "eventName": "START",
    "eventTime": "t3"
  },
  {
    "appName": "a2",
    "eventName": "START",
    "eventTime": "t4"
  }
]
I am looking to visualize this information in a table showing the latest start and end times of each application, something like this -
AppName | Last Start Time | Last End Time
a1      | t3              | t2
a2      | t4              | null
The above table assumes t3 comes after t1. How do I get to this? I am able to extract the latest events into separate rows with
stats latest(eventTime) by appName, eventName
but I need them combined into one single row per application.

Create separate fields for start and end times, then use stats to get the latest for each.
| eval start_time=if(eventName="START", eventTime, null())
| eval end_time=if(eventName="END", eventTime, null())
| stats latest(start_time) as last_start, latest(end_time) as last_end by appName
Note: latest() picks the value from the most recent event by _time, so eventTime itself does not need to be in epoch format. If you want the maximum eventTime value instead (for example, when events can arrive out of order), convert it to epoch with strptime() in an eval and use max().
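Putting it together, a minimal end-to-end sketch (index=jobs is a placeholder for wherever these events are indexed, and appName, eventName, eventTime are assumed to be already-extracted fields):
index=jobs
| eval start_time=if(eventName="START", eventTime, null())
| eval end_time=if(eventName="END", eventTime, null())
| stats latest(start_time) as "Last Start Time", latest(end_time) as "Last End Time" by appName
| rename appName as AppName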

Related

Add computed field to Query in Grafana using JSON API as data source

What I am trying to achieve:
I would like to have a time series chart showing the total number of members in my club at any time. This member count should be calculated by using the field "Eintrittsdatum" (joining-date) and "Austrittsdatum" (leaving-date). I’m thinking of it as a running sum - every filled field with a joining-date means +1 on the member count, every leaving-date entry is a -1.
Data structure
I’m calling the API of webling.ch with a secret key. This is my data structure with sample data per member:
[
  {
    "type": "member",
    "meta": {
      "created": "2020-03-02 11:33:00",
      "createuser": {
        "label": "Joana Doe",
        "type": "user"
      },
      "lastmodified": "2022-12-06 16:32:56",
      "lastmodifieduser": {
        "label": "Joana Doe",
        "type": "user"
      }
    },
    "readonly": true,
    "properties": {
      "Mitglieder ID": 99,
      "Anrede": "Dear",
      "Vorname": "Jon",
      "Name": "Doe",
      "Strasse": "Doeington Street",
      "Adresszusatz": null,
      "PLZ": "9999",
      "Ort": "Doetown",
      "E-Mail": "jon.doe#doenet.net",
      "Telefon Privat": null,
      "Telefon Geschäft": null,
      "Mobile": "099 877 54 54",
      "Geschlecht": "m",
      "Geburtstag": "1966-03-10",
      "Mitgliedschaftstyp": "Aktivmitgliedschaft",
      "Eintrittsdatum": "2020-03-01",
      "Austrittsdatum": null,
      "Passfoto": null,
      "Wordpress Benutzername": null,
      "Wohnhaft im Glarnerland": false,
      "Lat": "43.1563379",
      "Long": "6.0474622"
    },
    "parents": [
      240
    ],
    "children": {},
    "links": {
      "debitor": [
        2124,
        3056,
        3897
      ],
      "attendee": [
        2576
      ]
    },
    "id": 1815
  }
]
Grafana data source
I am using the “JSON API” by Marcus Olsson: GitHub - grafana/grafana-json-datasource: A data source plugin for loading JSON APIs into Grafana.
Grafana v9.3.1 (89b365f8b1) on Linux
My current approach
Queries:
Query C - uses a filter on the source-API to only show entries with "Eintrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Eintrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Query D - uses a filter on the source-API to only show entries with "Austrittsdatum" IS NOT EMPTY
Field 1 (alias "datum") has a JSONata-Query of:
properties.Austrittsdatum
Field 2 (alias "names") should return the full name and has a query of:
$map($.properties, function($v) {(
($v.Vorname&" "&$v.Name);
)})
Field 3 (alias "value") should return "1" for every entry and has a query of:
$map($.properties, function($v) {(
(1);
)})
Here's a screenshot to clarify things
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-1.png)
Transformations:
My applied transformations
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-2.png)
What's working
I can correctly gather the number of members added/subtracted per day.
What's not working
I can't get the graph to display the way I want: I'd like to have a running sum of these numbers instead of the following two graphs.
Time series graph with merged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-3.png)
Time series graph with unmerged queries
(https://zigerschlitzmakers.ch/wp-content/uploads/2023/01/ScreenshotGrafana-4.png)
I can't get the names to display within the tooltip of the data points (really not THAT necessary).
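Coming back to the running sum: for reference, a variation I have not applied yet (just a sketch). If Query D's "value" field returned -1 instead of 1, using the same JSONata pattern with the sign flipped,
$map($.properties, function($v) {(
(-1);
)})
then, after merging both queries and sorting by "datum", the member count at any point in time would simply be the running total of the "value" field.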

How to understand the field "groups" and the agg "GROUPING" in EnumerableAggregate

I am new to Calcite and I am using it to convert a SQL query into an optimized plan, which I will then translate into a dataflow graph in an execution engine. One challenge is the translation of the different RelNodes (e.g., Filter, Project, Aggregate, Calc, etc.). I am having difficulty understanding the EnumerableAggregate RelNode. Specifically, consider the following example, where I defined a table T as
create table T (src int, dst int, label int, time int);
and wrote a toy query as
select count(distinct dst), sum(distinct label), count(*)
from T
where dst > 1
group by src
having src = 0;
I will obtain an optimized plan which contains two EnumerableAggregate RelNodes and here is the first EnumerableAggregate RelNode:
{
  "id": "2",
  "relOp": "org.apache.calcite.adapter.enumerable.EnumerableAggregate",
  "group": [ 0, 1, 2 ],
  "groups": [
    [ 0, 1 ], [ 0, 2 ], [ 0 ]
  ],
  "aggs": [
    {
      "agg": {
        "name": "COUNT",
        "kind": "COUNT",
        "syntax": "FUNCTION_STAR"
      },
      "type": {
        "type": "BIGINT",
        "nullable": false
      },
      "distinct": false,
      "operands": [],
      "name": "EXPR$2"
    },
    {
      "agg": {
        "name": "GROUPING",
        "kind": "GROUPING",
        "syntax": "FUNCTION"
      },
      "type": {
        "type": "BIGINT",
        "nullable": false
      },
      "distinct": false,
      "operands": [ 0, 1, 2 ],
      "name": "$g"
    }
  ]
}
I think I understand why there are two Aggregate RelNodes: because of the distinct on dst in count and the distinct on label in sum, the optimizer first groups the data by (1) the group key src and (2) the two distinct columns (dst and label) in order to remove duplicates, and then the second Aggregate calculates the count and sum.
What I do not understand is how the first Aggregate processes the input data: what does the field groups (i.e., [0, 1], [0, 2] and [0]) do, what does the agg function GROUPING do, and how many columns are there in the output of the first Aggregate?
For example, given the following input data: [[2,3,4,0], [2,3,4,1], [3,2,4,2], [3,2,4,3], [5,6,7,4], [5,6,7,5]], I think the data will first be divided into three groups: [[2,3,4,0], [2,3,4,1]], [[3,2,4,2], [3,2,4,3]] and [[5,6,7,4], [5,6,7,5]]. But what is the next step?
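For what it's worth, my current guess is that the first Aggregate corresponds to the following plain SQL, where column indices 0, 1, 2 refer to src, dst, label after the filter (please correct me if this reading is wrong):
select src, dst, label,
       count(*) as "EXPR$2",
       grouping(src, dst, label) as "$g"
from T
where dst > 1
group by grouping sets ((src, dst), (src, label), (src));
That is, I suspect "groups" lists grouping sets and GROUPING returns a bitmap telling which grouping set each output row belongs to, but I am not sure.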
Any help would be appreciated. Thanks!

How to use Trino/Presto to query Redis

I have a simple string and a hash stored in Redis:
get test
"1"
hget htest first
"first hash"
I'm able to see the "table" test, but there are no columns
trino> show columns from redis.default.test;
Column | Type | Extra | Comment
--------+------+-------+---------
(0 rows)
and obviously I can't get a result from a select
trino> select * from redis.default.test;
Query 20210918_174414_00006_dmp3x failed: line 1:8: SELECT * not allowed from relation
that has no columns
I see in the documentation that I might need to create a table definition file, but I wasn't able to create one that works.
I tried a few variations of this; here is one, for example:
{
  "tableName": "test",
  "schemaName": "default",
  "value": {
    "dataFormat": "json",
    "fields": [
      {
        "name": "number",
        "mapping": 0,
        "type": "INT"
      }
    ]
  }
}
Any idea what I am doing wrong?
I focused on the string since it's simpler, but I also need to query the hash.
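For what it's worth, the next variation I plan to try is a raw-format definition along these lines (a sketch based on my reading of the docs, not verified; I am not sure about the types or whether the value field needs a mapping):
{
  "tableName": "test",
  "schemaName": "default",
  "key": {
    "dataFormat": "raw",
    "fields": [
      {
        "name": "redis_key",
        "type": "VARCHAR",
        "hidden": false
      }
    ]
  },
  "value": {
    "dataFormat": "raw",
    "fields": [
      {
        "name": "number",
        "type": "VARCHAR"
      }
    ]
  }
}
with the file placed in the directory set by redis.table-description-dir and the table listed in redis.table-names, then casting the VARCHAR value in the query.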

Select JSON object that appears more than once

I am trying to write a query to return all trains that have more than one etapesSupervision.
My table has a column called DETAIL; in this column I can find the JSON for my train.
"nomTrain": "EVOL99",
"compositionCourtLong": "LONG",
"sillons": [{
"numeroTrain": "EVOL99"
}],
"sillonsV4": [{
"refSillon": "sillons/4289505/2"
}],
"branchesStif": [{
"data": "49",
"data": "BP",
"data": "ORIGINE"
} ],
"etapesSupervision": [{
"data": "PR/0087-758896-00",
"data": "PR/0087-758607-BV",
"superviseur": "1287",
"uoSuperviseur": "B"
},
{
"data": "PR/0087-758607-BV",
"data": "PR/0087-001479-BV",
"superviseur": "1287",
"uoSuperviseur": "B"
}],
This is the query I wrote:
select * from course where CODE_LIGNE_COMMERCIALE='B'
--and ref = 'train/2018-11-12'
and instr(count(train.detail,'"etapesSupervision":'))> 1 ;
Using this, I return trains with only one etapesSupervision.
The thing is the column DETAIL is JSON, so I feel like I can't do a lot with it.
I also tried with LIKE, but it doesn't work either.
Thank you for your comments.
This is the query that worked:
select data,data,data
from train
where
length(DETAIL) - length(replace(DETAIL,'uoSuperviseur',null)) > 20 ;
And this way I have only trains that have more than one supervisor: 'uoSuperviseur' is 13 characters long, so a single occurrence only shortens DETAIL by 13, while two or more occurrences shorten it by at least 26, which is why the threshold is 20.
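If this is Oracle 12c or later and DETAIL holds valid JSON, an alternative I have not tested would be to check directly for a second array element (SQL/JSON array indexes are zero-based), reusing the table and column names from the first query:
select *
from course
where CODE_LIGNE_COMMERCIALE = 'B'
and json_exists(DETAIL, '$.etapesSupervision[1]');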
Thanks again

Elasticsearch: how to quickly find results when searching array fields

My JSON data:
{
  "date": 1484219926,
  "uid": "1234567",
  "interest": [
    "2000001",
    "2000002",
    "....",
    "2000xxx"
  ],
  "other": "xxxx"
}
The search I want is equivalent to the following SQL:
select count(*)
from xxxxx
where date > time1 and
date < time2 and
interest="20000xxx"
The field interest may have 500 items. I want to get the search result quickly; what should I do? The total data may be 2 billion documents.
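A sketch of the equivalent count request in filter context (assuming the index is named xxxxx as in the SQL above, interest is mapped as a keyword field, and time1/time2 are placeholder epoch values like the date field):
GET /xxxxx/_count
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "date": { "gt": "time1", "lt": "time2" } } },
        { "term": { "interest": "20000xxx" } }
      ]
    }
  }
}
Filter clauses skip scoring and can be cached, which helps at this data volume.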