KQL: Determine if a datetime falls in any of a set of time windows

I'm looking for the timestamp of a specific error in a log so that I can identify which other events occurred within N seconds of it in other tables.
I'm able to construct a table (set) of datetime intervals/windows, but I'm struggling to determine whether a datetime from another table falls within any of the intervals in the set.
// Create a table of time windows (interval) +/- 5 seconds of target error
let intervals = k8slogs
| where Message contains "my specific error"
| project begin=datetime_add('second', -5, env_time), end=datetime_add('second', 5, env_time);
// Show all messages within 5 seconds of "my specific error"
k8slogs
| union logs1
| union logs2
| where env_time // is in any 'window' from above query
| project env_time, Message
| order by env_time asc
I also tried looking into the around() function, but wasn't able to come up with a solution.
Here is another, simplified example with sample data:
k8slogs
| env_time | message |
|:-------- |:----------------------- |
| 15 | "my specific exception" |
| 45 | "my specific exception" |
logs1
| env_time | message |
|:-------- |:----------------------- |
| 11 | "another error" |
| 35 | "hello world" |
intervals
| begin | end |
|:----- |:------:|
| 10 | 20 |
| 40 | 50 |
Desired query result:
| env_time | message |
|:-------- |:----------------------- |
| 11 | "another error" |
| 15 | "my specific exception" |
| 45 | "my specific exception" |

One technique that might work is joining on the time window; see the relevant article. You can join with the other tables and then filter on the time interval you are interested in; see the example in the section titled "Rewrite the query to account for the time window".

// Data sample generation. Not part of the solution
let t1 = materialize(range record_id from 1 to 20 step 1 | extend env_time = ago(1h*rand()), Message = strcat(case(rand()<0.5, "my specific error:", "some other error:"), tostring(record_id)));
let t2 = materialize(range record_id from 1 to 100 step 1 | extend env_time = ago(1h*rand()), Message = strcat("logs1 : ", tostring(record_id)));
let t3 = materialize(range record_id from 1 to 100 step 1 | extend env_time = ago(1h*rand()), Message = strcat("logs2 : ", tostring(record_id)));
let k8slogs = view(){t1};
let logs1 = view(){t2};
let logs2 = view(){t3};
// Solution starts here
let time_window = 5s;
k8slogs
| where Message contains "my specific error"
// Duplicate each error row into its own time bin and both neighboring bins,
// so the join below also catches log rows just across a bin boundary
| mv-expand i = range(-1, 1) to typeof(int)
| extend env_time_bin = bin(env_time + i * time_window, time_window)
| project-away i
| project-rename error_env_time = env_time, error_message = Message, error_record_id = record_id
| join kind=inner
( union withsource=table (k8slogs | where not(Message contains "my specific error") | as k8slogs), logs1, logs2
| extend env_time_bin = bin(env_time, time_window)
| project-rename log_env_time = env_time, log_message = Message, log_record_id = record_id
) on env_time_bin
// The bin-level join is coarse; keep only rows truly within the window
| where abs(log_env_time - error_env_time) <= time_window
| project-away env_time_bin*
| order by log_env_time asc
| error_record_id | error_env_time | error_message | table | log_record_id | log_env_time | log_message |
|---|---|---|---|---|---|---|
| 3 | 2022-05-18T11:53:48.7571282Z | my specific error:3 | logs2 | 33 | 2022-05-18T11:53:52.2075125Z | logs2 : 33 |
| 18 | 2022-05-18T12:05:10.2440591Z | my specific error:18 | logs1 | 48 | 2022-05-18T12:05:06.1936749Z | logs1 : 48 |
| 6 | 2022-05-18T12:11:11.8643195Z | my specific error:6 | logs2 | 15 | 2022-05-18T12:11:08.0750978Z | logs2 : 15 |
| 1 | 2022-05-18T12:38:15.4453636Z | my specific error:1 | k8slogs | 11 | 2022-05-18T12:38:12.473693Z | some other error:11 |
| 1 | 2022-05-18T12:38:15.4453636Z | my specific error:1 | logs1 | 73 | 2022-05-18T12:38:18.8940209Z | logs1 : 73 |
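For completeness, here is a hedged alternative sketch that answers the literal question ("is env_time inside any window?") with a cross join against the intervals table from the question. It fans every log row out against every interval, so it is only reasonable for a small set of windows; the distinct guards against duplicates when windows overlap.

let intervals = k8slogs
| where Message contains "my specific error"
| project begin = datetime_add('second', -5, env_time),
          end   = datetime_add('second', 5, env_time);
union k8slogs, logs1, logs2
| extend dummy = 1
| join kind=inner (intervals | extend dummy = 1) on dummy  // poor man's cross join
| where env_time between (begin .. end)
| distinct env_time, Message
| order by env_time asc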

Related

PowerBI / SQL Query to verify records

I am working on a PowerBI report that pulls information from SQL, and I cannot find a way to solve my problem in PowerBI or work out how to write the required code. My first table, Certifications, lists certifications and the trainings that must be completed to hold an active certification.
My second table, UserCertifications, lists UserIDs, certifications, and the trainings associated with each certification.
How can I write SQL code or a PowerBI measure that tells me whether a user has all the required trainings for a certification? I.e., if UserID 1 has the A certification, how can I verify that they have TrainingIDs 1, 10, and 150 associated with it?
(Screenshots of the Certifications and UserCertifications tables omitted; the sample data is reproduced in the answer below.)
Here is a DAX pattern to test whether one list of values contains at least all of another.
Certifications:

| Certification | TrainingID |
|---------------|------------|
| A             | 1          |
| A             | 10         |
| A             | 150        |
| B             | 7          |
| B             | 9          |
UserCertifications:

| UserID | Certification | Training |
|--------|---------------|----------|
| 1      | A             | 1        |
| 1      | A             | 10       |
| 1      | A             | 300      |
| 2      | A             | 150      |
| 2      | B             | 9        |
| 2      | B             | 90       |
| 3      | A             | 7        |
| 4      | A             | 1        |
| 4      | A             | 10       |
| 4      | A             | 150      |
| 4      | A             | 1000     |
In the above scenario, DAX needs to determine whether the mandatory trainings (Certifications[TrainingID]) for each Certifications[Certification] have been completed within each UserCertifications[UserID] && UserCertifications[Certification] partition.
Here, DAX should return true only for UserCertifications[UserID] = 4, as that is the only user who completed at least all of the mandatory trainings.
One way to achieve this is with the following measure:
areAllMandatoryTrainingCompleted =
VAR _alreadyCompleted =
    CONCATENATEX (
        UserCertifications,
        UserCertifications[Training],
        "-",
        UserCertifications[Training]
    ) // what has been completed in the fact table; the fourth argument is important, as it fixes the sort order
VAR _0 =
    MAX ( UserCertifications[Certification] )
VAR _supposedToComplete =
    CONCATENATEX (
        FILTER ( Certifications, Certifications[Certification] = _0 ),
        Certifications[TrainingID],
        "-",
        Certifications[TrainingID]
    ) // what must be completed per the Certifications table; again, the fourth argument fixes the sort order
VAR _isMandatoryTrainingCompleted =
    CONTAINSSTRING ( _alreadyCompleted, _supposedToComplete ) // CONTAINSSTRING ( <within_text>, <search_text> ); returns true/false
RETURN
    _isMandatoryTrainingCompleted
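The question also asks for SQL. A minimal sketch of the classic relational-division approach, assuming the tables are literally named Certifications(Certification, TrainingID) and UserCertifications(UserID, Certification, Training) as in the sample data:

-- For each (user, certification) pair, keep it only if no required
-- training is missing from that user's rows.
SELECT uc.UserID, uc.Certification
FROM (SELECT DISTINCT UserID, Certification FROM UserCertifications) AS uc
WHERE NOT EXISTS (
    SELECT 1
    FROM Certifications AS c
    WHERE c.Certification = uc.Certification
      AND NOT EXISTS (
          SELECT 1
          FROM UserCertifications AS done
          WHERE done.UserID = uc.UserID
            AND done.Certification = uc.Certification
            AND done.Training = c.TrainingID
      )
);

On the sample data this returns only (UserID 4, certification A), matching the expected result.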

Apply condition for a particular group: SQL

Currently I have to provide a report in BigQuery for Google Analytics in the format:
ClientID, SessionID, Source, Medium, SessionNumber
The outcome has to be a table where, for every ClientID, SessionNumber contains the ordinal of an event, ordered by date.
The problem is that a conversion occurs during a session, and the SessionNumber values for a particular ClientID do not end at the event where the conversion occurs.
[For example, there are 30 events for ClientID = 1, and the conversion occurs at the 15th event (ordinal from SessionNumber).]
My question is how to delete all the data in the table after the SessionNumber ordinal at which the conversion occurs, so that it becomes possible to see when the last click occurred and to see the assisted conversion.
How do I apply a condition to a particular group of data (every ClientID, in this case; remember that each ClientID has its own number of events and corresponding ordinals)?
Example:

| ClientID | SessionID | Source | Medium | SessionNumber | Goal Achieved (1 if yes) |
|----------|-----------|--------|--------|---------------|--------------------------|
| 1        | 456       | google | cpc    | 1             | 0                        |
| 1        | 456       | google | cpc    | 2             | 0                        |
| 1        | 456       | google | cpc    | 3             | 1                        |
| 1        | 456       | google | cpc    | 4             | 0                        |
| 2        | 234       | ...    | ...    | ...           | ...                      |
| ...      | ...       | ...    | ...    | ...           | ...                      |
So I have to get rid of all the rows after the SessionNumber at which the conversion occurred, within each ClientID (the conversion occurs after the goal is achieved; I have already created code for that). Is there any standard SQL syntax that would let me create such a condition for every ClientID?
The thing is that within one session a conversion may occur and the user can then do other things as well, but I only need the records from the start of the session until the conversion occurs.
I would like to return this table (from the example):

| ClientID | SessionID | Source | Medium | SessionNumber | Goal Achieved (1 if yes) |
|----------|-----------|--------|--------|---------------|--------------------------|
| 1        | 456       | google | cpc    | 1             | 0                        |
| 1        | 456       | google | cpc    | 2             | 0                        |
| 1        | 456       | google | cpc    | 3             | 1                        |
| 2        | 234       | ...    | ...    | ...           | ...                      |
| ...      | ...       | ...    | ...    | ...           | ...                      |
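One way to express this in BigQuery standard SQL is a window function that finds the first converting ordinal per ClientID, then a filter that keeps only rows up to and including it. A hedged sketch, assuming a source table called sessions with columns named as in the example (the table path and exact column names are placeholders):

-- Keep each client's rows up to and including the first conversion.
SELECT * EXCEPT (first_goal)
FROM (
  SELECT
    *,
    -- earliest SessionNumber per ClientID at which the goal was achieved
    MIN(IF(GoalAchieved = 1, SessionNumber, NULL))
      OVER (PARTITION BY ClientID) AS first_goal
  FROM `project.dataset.sessions`
)
WHERE first_goal IS NULL           -- clients that never converted keep everything
   OR SessionNumber <= first_goal  -- otherwise drop rows after the conversion
ORDER BY ClientID, SessionNumber;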

Adding a field to differentiate parts of tables

I have several gigabytes of ArduCopter binary flight logs. Each log is a series of messages.
MessageType1: param1, param2, param3
MessageType2: param3, param4, param5, param6
...
The logs are self-describing in the sense that the first time a message type appears in a log, it lists the names of its params.
MessageType1: timestamp, a, b
MessageType1: value 1, value 2, value 3
MessageType2: timestamp, c, d, e
MessageType1: value 4, value 5, value 6
MessageType1: value 7, value 8, value 9
MessageType2: value 10, value 11, value 12, value 13
I have written a Python script that takes the logs apart and creates a table for each message type in an SQLite database, where the message type is the table name and the parameter names are the column names.
Table MessageType1
| Flight Index | Timestamp | a | b |
|--------------|-----------|-------|---------|
| ... | | | |
| "Flight 1" | 111 | 14725 | 10656.0 |
| "Flight 1" | 112 | 57643 | 10674.0 |
| "Flight 1" | 113 | 57157 | 13674.0 |
| ... | | | |
| "Flight 2" | 111 | 56434 | 16543.7 |
| "Flight 2" | 112 | 56434 | 16543.7 |
Table MessageType2
| Flight Index | Timestamp | c | d | e |
|--------------|-----------|-------|---------|--------|
| ... | | | | |
| "Flight 1" | 111 | 14725 | 10656.0 | 462642 |
| "Flight 1" | 112 | 57643 | 10674.0 | 426428 |
| "Flight 1" | 113 | 57157 | 13674.0 | 642035 |
| ... | | | | |
| "Flight 2" | 111 | 56434 | 16543.7 | 365454 |
| "Flight 2" | 112 | 56434 | 16543.7 | 754632 |
| ... | | | | |
For a single log this database is good enough, but I would like to add several logs, meaning messages of the same type from several logs go into a single table.
For this I added a "Flight Index" column, which is what I would like to have, but:
Each log processed should have a unique identifier.
The identifier should be minimal in size, as I'm dealing with tables that may have millions of rows.
I'm thinking of adding the flight index as an integer and just incrementing the number when processing logs; if the database already exists, taking the last row of a table and using its index + 1. Is this optimal, or is there a SQL-native way of doing it?
Am I doing something wrong in general? I'm not experienced with SQL.
EDIT: added a second table and example messages to show that message types don't have the same number of parameters.
You can achieve this with two tables
Table 1
Flights
Flight name, Flight number, date, device, etc. (any other data points make sense)
"Flight 1", 1, 1/1/2018,...
"Flight 2", 2, 1/2/2018,...
Table 2
Flight_log
Flight_number, timestamp, parameter1, parameter2,
1,111,14725,10656.0
1,112,57643,10674.0
1,113,57157,13674.0
...
2,111,56434,16543.7
2,112,56434,16543.7
Before you load the Flight_log table you should have an entry in the Flights table; you can do a "lookup" to get the Flight_number from the Flights table.
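There is also a SQL-native way to get the incrementing identifier: in SQLite, an INTEGER PRIMARY KEY column is an alias for the rowid and is assigned automatically, so there is no need to read back the last row's index by hand. A minimal sketch, with illustrative table and column names:

-- One Flights row per processed log file; flight_id is auto-assigned.
CREATE TABLE IF NOT EXISTS Flights (
    flight_id   INTEGER PRIMARY KEY,
    flight_name TEXT,
    flight_date TEXT
);

CREATE TABLE IF NOT EXISTS MessageType1 (
    flight_id INTEGER NOT NULL REFERENCES Flights(flight_id),
    timestamp INTEGER,
    a         INTEGER,
    b         REAL
);

-- Register a log, then tag its messages with the new id:
INSERT INTO Flights (flight_name, flight_date) VALUES ('Flight 1', '2018-01-01');
INSERT INTO MessageType1 (flight_id, timestamp, a, b)
VALUES (last_insert_rowid(), 111, 14725, 10656.0);  -- or cursor.lastrowid in Python

An integer key stays small even across millions of rows, which also addresses the size concern.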
After reading about data normalization I ended up with the following database (schema diagram not reproduced here). This minimizes the number of tables. I could have created 35 tables (one for each message type) with the right parameters as columns for each, but that would make the database more fragile if the parameters in a message change.
EDIT: replaced the image, as datamodler got fixed.

Queried top 15 faults; need the accumulated downtime from another column

I'm currently trying to query up a list of the top 15 occurring faults on a PLC in the warehouse. I've gotten that part down:
SELECT TOP 15 Fault_Number, Fault_Message, COUNT(*) AS FaultCount
FROM Faults_Stator
WHERE T_stamp > DATEADD(hour, -18, GETDATE())
GROUP BY Fault_Number, Fault_Message
ORDER BY FaultCount DESC
However, I now need the accumulated downtime of the faults in the top 15 list, which lives in another column, "Fault_duration". How would I go about doing this? Thanks in advance, you've all helped me so much already.
+--------------+---------------------------------------------+------------+
| Fault Number | Fault Message | FaultCount |
+--------------+---------------------------------------------+------------+
| 122 | ST10: Part A&B Failed | 23 |
| 4 | ST16: Part on Table B | 18 |
| 5 | ST7: No Spring Present on Part A | 15 |
| 6 | ST7: No Spring Present on Part B | 12 |
| 8 | ST3: No Pin Present B | 8 |
| 1 | ST5: No A Housing | 5 |
| 71 | ST4: Shuttle Right Not Loaded | 4 |
| 144 | ST15: Vertical Cylinder did not Retract | 3 |
| 98 | ST8: Plate Loader Can not Retract | 3 |
| 72 | ST4: Shuttle Left Not Loaded | 2 |
| 94 | ST8: Spring Gripper Cylinder did not Extend | 2 |
| 60 | ST8: Plate Loader Can not Retract | 1 |
| 83 | ST6: No A Spring Present | 1 |
| 2 | ST5: No B Housing | 1 |
| 51 | ST4: Vertical Cylinder did not Extend | 1 |
+--------------+---------------------------------------------+------------+
I know I wouldn't be using the same query, but I'm at a loss as to how to do this next step.
Fault_duration is a column that records how long each fault lasted, in ms. I'm trying to have those durations accumulated next to the corresponding fault, so the first offender would have its 23 individual fault occurrences summed next to it in another column.
You should be able to use the SUM aggregate:
SELECT TOP 15 Fault_Number, Fault_Message, COUNT(*) AS FaultCount, SUM(Fault_duration) AS FaultDuration
FROM Faults_Stator
WHERE T_stamp > DATEADD(hour, -18, GETDATE())
GROUP BY Fault_Number, Fault_Message
ORDER BY FaultCount DESC

SQL table join based on aggregated/equal values

I have a rather horrible two-table dataset at hand that I need to create a join query for. It's best to show an example:
+------+------+----------+
| Time | Sent | Received |
+------+------+----------+
|    1 |  100 |     NULL |
|    2 | NULL |      100 |
|    3 |   50 |     NULL |
|    4 | NULL |       40 |
|    5 | NULL |       10 |
|    6 |  400 |      200 |
|    7 |  100 |      200 |
|    8 | NULL |      100 |
|    9 |  500 |      500 |
+------+------+----------+
Assuming 'Time' above is in hours: 'Sent' shows the number of items sent in that hour, and 'Received' shows the number received. The problem is that items will likely not arrive in the same hour they were sent (though they can).
I need to match each received amount against the appropriate sent amount to find the time the received items were sent.
Using the above:
The 100 received at time 2 is obviously the items sent in hour 1, so it would be assigned to hour 1.
The 50 sent at time 3 arrived in two batches (40 and 10 at times 4 and 5 respectively), so the received 40/10 should be lumped into the time 3 category.
The amounts received at times 6 and 7 (200 each) correspond to the 400 order at time 6 (note that half the order was received in the same hour it was sent; this can happen).
Also at time 7 a new order was sent, which corresponds to the amount received at time 8.
Also at time 9 an order of 500 was sent and received in the same hour.
Below is an example of what the output would look like (Note that there are other values associated with each 'Received' row but they are orthogonal to the task and will just be summed to provide meaning)
+------+----------+
| Time | Received |
+------+----------+
| 1 | 100 |
| 3 | 50 |
| 6 | 400 |
| 7 | 100 |
| 8 | 100 |
| 9 | 500 |
+------+----------+
I have been trying to wrap my head around this for a while. If I could do this outside of SQL, I would write a function that loops through each 'Sent' value incrementally through time, loops through 'Received' until the values match, assigns those Received values to the Time index, and then deletes both the sent and received entries from the array (or notes where the loop got to and continues from there).
Unfortunately the project doesn't allow that scope; this must be done as much in SQL as possible. I am really at a loss and hoping there is some SQL functionality I have overlooked. Any help is much appreciated.
If this is in SQL Server, you can use a WHILE loop (see the documentation). Your project might look something like this:
CREATE TABLE #temp ([Time] int, [Received] int)
DECLARE @i int = 1
DECLARE @value int = 0
WHILE @i <= 9
BEGIN
    SET @value = (SELECT [Received] FROM [table] WHERE [Time] = @i)
    -- Your logic here
    INSERT INTO #temp ...
    SET @i = @i + 1  -- increment the counter, or the loop never terminates
END
SELECT * FROM #temp
DROP TABLE #temp
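A set-based alternative is also worth sketching: because consumption here is strictly first-in-first-out, each sent batch occupies a contiguous range of units in the running total of 'Sent', each received batch a contiguous range in the running total of 'Received', and overlapping ranges tell you which receipts belong to which dispatch. This sketch assumes window functions plus GREATEST/LEAST (PostgreSQL, MySQL 8, BigQuery, SQL Server 2022+) and a table named flows shaped like the example above (name hypothetical):

-- A sent batch spans units [s_start, s_end); a received batch spans
-- [r_start, r_end); the overlap is the quantity that pairs them up.
WITH sent AS (
  SELECT [Time],
         SUM(Sent) OVER (ORDER BY [Time]) - Sent AS s_start,
         SUM(Sent) OVER (ORDER BY [Time])        AS s_end
  FROM flows
  WHERE Sent IS NOT NULL
),
recv AS (
  SELECT SUM(Received) OVER (ORDER BY [Time]) - Received AS r_start,
         SUM(Received) OVER (ORDER BY [Time])            AS r_end
  FROM flows
  WHERE Received IS NOT NULL
)
SELECT s.[Time],
       SUM(LEAST(s.s_end, r.r_end) - GREATEST(s.s_start, r.r_start)) AS Received
FROM sent AS s
JOIN recv AS r
  ON r.r_end > s.s_start
 AND r.r_start < s.s_end
GROUP BY s.[Time]
ORDER BY s.[Time];

On the sample data this assigns 100 to hour 1, 50 to hour 3, 400 to hour 6, 100 to hour 7, and 500 to hour 9, matching the walkthrough above.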